How Amazon Streams Live Video to Tens of Millions of People

Plus, what it's like to find a developer job in 2024, how to learn hard things and more.

February 12, 2024

Hey Everyone!

Today we’ll be talking about

How Amazon Streams Live Video to Tens of Millions of People
- Amazon Prime regularly broadcasts live video to tens of millions of users
- In order to be competitive with TV, Prime Video needs 99.999% reliability (less than 26 seconds of downtime per month)
- We’ll talk about the Ingestion, Encoding, Packaging and Distribution stages
- They deploy each system in at least two AWS Regions and also use other redundancy
Tech Snippets
- Is the “Modern Data Stack” still a useful idea by Tristan Handy
- Finding a New Developer Job in the 2024 Market
- An Interactive Intro to CRDTs
- What it was like working for GitLab
- Making Hard Things Easy by Julia Evans

How Amazon Prime Live Streams Video to Tens of Millions of Users

Prime Video is Amazon’s streaming platform, where you can watch from their catalog of thousands of movies and TV shows.

One of the features they provide is live video streaming where you can watch TV stations, sports games and more. A few years ago, the NFL (American Football League) struck a deal with Amazon to make Prime Video the exclusive broadcaster of Thursday Night Football. Viewership averaged over 10 million users with some games getting over 100 million views.

As you’ve probably experienced, the most frustrating thing that can happen when you’re watching a game is having a laggy stream. With TV, the expectation is extremely high reliability and very rare interruptions.

The Prime Video live streaming team set out to achieve the same with a goal of 5 9’s (99.999%) of availability. This translates to less than 26 seconds of downtime per month and more reliability than most of Boeing’s airplanes (just kidding but uhhh).

Ben Foreman is the Global Head of Live Channels Architecture at Amazon Prime Video. He wrote a fantastic blog post delving into how Amazon achieved these reliability figures and what tech they’re using.

If you want fully editable, spaced-repetition flash cards on all the core concepts we discuss in Quastor, check out Quastor Pro. It’s super useful for becoming a better backend developer and also for system design-style interviews.

Tech Stack

We’re talking about Amazon Prime Video here, so you probably won’t be surprised to hear that they’re using AWS for their tech stack. They make use of AWS Elemental, a suite of AWS services for video providers to build their platforms on AWS.

When you’re building a system like Amazon Prime Video, there’s several steps you have to go through

Video Ingestion
Encoding
Packaging
Delivery

We’ll break down each of these and talk about the tech AWS uses.

Video Ingestion

The first step is to ingest the raw video feed from the recording studio/event venue. AWS asks their partners to deliver multiple feeds of the raw video so that they can immediately switch to a backup if one of the feeds fails.

This feed goes to AWS Elemental MediaConnect, a service that allows for the ingestion of live video in AWS Cloud. MediaConnect can then distribute the video to other AWS services or to some destination outside of AWS.

It supports a wide range of video transmission protocols like Zixi and RTP. The content is also encrypted so there’s no unauthorized access or modification of the feed.

Encoding

The raw feed from the original source is typically very large and not optimized for transmission or playback.

Video codecs solve this problem by compressing/decompressing digital video so it’s easier to store and transmit. Commonly used codecs include H.265, VP9, AV1 and more. Each codec comes with its own strengths/weaknesses in terms of compression efficiency, speed and video quality.

During the encoding stage, multiple versions of the video files are created where each has different sizes and is optimized for different devices.

This will be useful during the delivery stage for adaptive bitrate streaming, where AWS can deliver different versions of the video stream depending on the user’s network conditions. If the user is traveling and moves from an area of good-signal to poor-signal, then AWS can quickly switch the video feed from high-quality to low-quality to prevent any buffering.

For encoding, Prime video uses AWS Elemental MediaLive.

Packaging

The next stage is packaging, where the encoded video streams are organized into formats suitable for delivery over the internet. This is also where you add in things like DRM (digital rights management) protections to prevent (or at least reduce) any online piracy around the video stream.

In order to stream your encoded video files on the internet, you’ll need to use a video streaming protocol like MPEG-DASH or HLS. These are all adaptive bitrate streaming protocols, so they’ll adapt to the bandwidth and device capabilities of the end user to minimize any buffering. This way, content can be delivered to TVs, mobile phones, computers, gaming consoles, tablets and more.

The output of the packaging stage is a bunch of small, segmented video files (each chunk is around 2 to 10 seconds) and a manifest file with metadata on the ordering of the chunks, URLs, available bitrates (quality levels), etc.

This data gets passed on to a content delivery network.

Delivery

The final stage is the delivery stage, where the manifest file and the video chunks are sent to end users. In order to minimize latency, you’ll probably be using a Content Delivery Network like AWS CloudFront, Cloudflare, Akamai, etc.

Prime Video uses AWS CloudFront and users can download from a multitude of different CDN endpoints so there’s reliability in case any region goes down.

Achieving 5 9’s of Reliability

The key to achieving high availability is redundancy. If you have a component with a 1% rate of failure, then you can take two of those components and set them up in a configuration where one will immediately step in if the other fails.

Now, your system will only fail if both of these components go down (which is a 0.01% probability assuming the components are independent… although this might not be the case).

With Amazon Prime Video, they deploy each system in at least two AWS Regions (these regions are designed to be as independent as possible so one going down doesn’t bring down the other region).

AWS Elemental also provides an in-Region redundancy models. This deploys redundant Elemental systems in the same region at a reduced cost. If one of the systems fails for whatever reason, then it can seamlessly switchover to the other system.

Each of the AWS Elemental systems provided an SLA of 3 9’s. By utilizing redundancy and parallelizing all their components in different availability zones, Amazon Prime Video is able to achieve an expected uptime of 99.999% (5 9’s).

For more details, read the full article here.

Tech Snippets

Finding a New Software Developer Job

In October of last year, Henrik Warne was unfortunately laid off from his company. He was able to land a new position after a few months of looking and he wrote a great blog post delving into his experience.

In total, he was in contact with 30 companies and was able to get offers from 3. He used LinkedIn, Demando, Hacker News and a few other sources for job postings.

He noticed that there’s now a big focus on having experience in a given language. Many recruiters have a hard requirement that you need at least two or three years experience in the given language from the hiring company.

henrikwarne.com/2024/02/11/finding-a-new-software-developer-job

Making Hard Things Easy

Julia Evans was a Staff Software Engineer at Stripe and she gave a fantastic talk delving into her process for learning difficult things like the intricacies of DNS, HTTP, bash and more.

One fantastic point she makes is that learning new things *is a skill*. In other words, the more time you dedicate towards learning new things, the better/faster you’ll get at picking new tech up!

You also have to learn to love confusion. Being confused means that you’re about to learn something new. Having a growth mindset can be extremely helpful.

Check the post for more tips.

jvns.ca/blog/2023/10/06/new-talk--making-hard-things-easy

What it was like working for a High Growth Startup (GitLab)

Yorick Peterse joined GitLab in October 2015 when the company was just 30 employees. When he left in 2021, the company had over 2000 employees and he was pretty burned out.

He wrote a great blog post delving into his experience at a high growth startup and what he learned.

yorickpeterse.com/articles/what-it-was-like-working-for-gitlab

An Interactive Intro to CRDTs

CRDTs are data structures that are used for peer-to-peer state synchronization without a central server. Examples of applications that use CRDTs are Figma, SoundCloud, Azure CosmosDB, Redis and more.

This is a fantastic, interactive intro to CRDTs and how they work with code examples in TypeScript.

jakelazaroff.com/words/an-interactive-intro-to-crdts

Is the "Modern Data Stack" Still a Useful Idea?

Tristan Handy is the founder of dbt Labs, the creator of data build tool (dbt).

In this article he delves into the “Modern Data Stack”, where you leverage cloud-native services for data workflows.

While the idea was super hyped in 2021/2022 with companies like Snowflake, it’s become less relevant in 2024 as being “cloud-native” is no longer unique and traditional tools have adopted to the cloud.

He also delves into market cycles and how that’s changed in the industry.

roundup.getdbt.com/p/is-the-modern-data-stack-still-a