How Amazon Streams Live Video to Tens of Millions of People
Plus, what it's like to find a developer job in 2024, how to learn hard things and more.
Hey Everyone!
Today we’ll be talking about:
How Amazon Streams Live Video to Tens of Millions of People
Amazon Prime regularly broadcasts live video to tens of millions of users
In order to be competitive with TV, Prime Video needs 99.999% reliability (less than 26 seconds of downtime per month)
We’ll talk about the Ingestion, Encoding, Packaging and Delivery stages
They deploy each system in at least two AWS Regions and add in-Region redundancy on top of that
Tech Snippets
Is the “Modern Data Stack” still a useful idea? by Tristan Handy
Finding a New Developer Job in the 2024 Market
An Interactive Intro to CRDTs
What it was like working for GitLab
Making Hard Things Easy by Julia Evans
How Amazon Prime Live Streams Video to Tens of Millions of Users
Prime Video is Amazon’s streaming platform, where you can watch from their catalog of thousands of movies and TV shows.
One of the features they provide is live video streaming, where you can watch TV stations, sports games and more. A few years ago, the NFL (National Football League) struck a deal with Amazon to make Prime Video the exclusive broadcaster of Thursday Night Football. Games averaged over 10 million viewers, with some getting over 100 million views.
As you’ve probably experienced, the most frustrating thing that can happen when you’re watching a game is having a laggy stream. With TV, the expectation is extremely high reliability and very rare interruptions.
The Prime Video live streaming team set out to achieve the same with a goal of 5 9’s (99.999%) of availability. This translates to less than 26 seconds of downtime per month and more reliability than most of Boeing’s airplanes (just kidding but uhhh).
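If you want to check where the "26 seconds" figure comes from, it's just the unavailable fraction multiplied by the length of a month. Here's a quick sketch, assuming a 30-day month:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def allowed_downtime_seconds(availability: float, days: float = 30) -> float:
    """Maximum downtime (in seconds) allowed over `days` at the given availability."""
    return (1 - availability) * days * SECONDS_PER_DAY

# Compare three, four and five nines over a 30-day month
for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {allowed_downtime_seconds(a):.1f} s per 30-day month")
```

Three nines gives you about 43 minutes of downtime a month and four nines about 4.3 minutes, while five nines leaves just under 26 seconds (25.92 s).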
Ben Foreman is the Global Head of Live Channels Architecture at Amazon Prime Video. He wrote a fantastic blog post delving into how Amazon achieved these reliability figures and what tech they’re using.
Tech Stack
We’re talking about Amazon Prime Video here, so you probably won’t be surprised to hear that they’re using AWS for their tech stack. They make use of AWS Elemental, a suite of AWS services for video providers to build their platforms on AWS.
When you’re building a system like Amazon Prime Video, there are several steps you have to go through:
Video Ingestion
Encoding
Packaging
Delivery
We’ll break down each of these and talk about the tech Prime Video uses for each one.
Video Ingestion
The first step is to ingest the raw video feed from the recording studio/event venue. AWS asks their partners to deliver multiple feeds of the raw video so that they can immediately switch to a backup if one of the feeds fails.
This feed goes to AWS Elemental MediaConnect, a service for ingesting live video into the AWS Cloud. MediaConnect can then distribute the video to other AWS services or to destinations outside of AWS.
It supports a wide range of video transmission protocols, like Zixi and RTP. The content is also encrypted to prevent unauthorized access to, or modification of, the feed.
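Redundant feeds only help if something combines or switches between them. One common approach, similar in spirit to SMPTE ST 2022-7 seamless protection switching, is to merge two copies of the same stream packet by packet, so a packet lost on one path gets filled in from the other. This is a minimal batch sketch, assuming each feed is a list of (sequence number, payload) pairs; it ignores RTP's 16-bit sequence-number wraparound and real-time arrival ordering, and it isn't a description of how MediaConnect works internally.

```python
from typing import Iterable

Packet = tuple[int, bytes]  # (sequence number, payload)

def merge_redundant_feeds(*feeds: Iterable[Packet]) -> list[Packet]:
    """Combine redundant copies of one stream, keeping a single copy of each
    sequence number. A packet missing from one feed is taken from another."""
    merged: dict[int, bytes] = {}
    for feed in feeds:
        for seq, payload in feed:
            merged.setdefault(seq, payload)  # the first copy seen wins
    return sorted(merged.items())

primary = [(1, b"a"), (2, b"b"), (4, b"d")]  # packet 3 was lost on this path
backup = [(1, b"a"), (3, b"c"), (4, b"d")]   # packet 2 was lost on this path
print(merge_redundant_feeds(primary, backup))
# [(1, b'a'), (2, b'b'), (3, b'c'), (4, b'd')]
```

Neither feed is complete on its own, but the merged output is, which is why asking partners for multiple feeds pays off.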
Encoding
The raw feed from the original source is typically very large and not optimized for transmission or playback.
Video codecs solve this problem by compressing/decompressing digital video so it’s easier to store and transmit. Commonly used codecs include H.264, H.265, VP9, AV1 and more. Each codec comes with its own strengths/weaknesses in terms of compression efficiency, speed and video quality.
During the encoding stage, multiple versions of the video are created, each at a different resolution and bitrate, so there’s a version suited to every device and network condition.
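To get a feel for how big "very large" is, here's the back-of-the-envelope arithmetic for uncompressed 1080p video at 60 frames per second. It assumes 8-bit 4:2:0 chroma subsampling (an average of 12 bits per pixel) and a hypothetical 6 Mbit/s target for the compressed stream; actual encoder settings for Prime Video aren't given in the source.

```python
def raw_bitrate_bps(width: int, height: int, fps: float, bits_per_pixel: float) -> float:
    """Bitrate of uncompressed video, in bits per second."""
    return width * height * fps * bits_per_pixel

# 8-bit 4:2:0: 8 bits of luma per pixel + two chroma planes at quarter resolution = 12 bits
raw = raw_bitrate_bps(1920, 1080, 60, 12)
encoded = 6_000_000  # assumed bitrate of a 1080p60 encode, for illustration only

print(f"raw: {raw / 1e9:.2f} Gbit/s, encoded: {encoded / 1e6:.0f} Mbit/s, "
      f"ratio ≈ {raw / encoded:.0f}:1")
```

That's roughly 1.5 Gbit/s raw versus a few Mbit/s after encoding, a reduction of a couple hundred times, which is what makes streaming over home internet connections possible.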
This will be useful during the delivery stage for adaptive bitrate streaming, where the viewer’s video player switches between versions of the stream depending on network conditions. If the user is traveling and moves from an area with a good signal to one with a poor signal, the player can quickly drop from a high-quality to a low-quality version to prevent buffering.
For encoding, Prime Video uses AWS Elemental MediaLive.
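Here's a minimal sketch of the simplest kind of rendition selection a player might do: measure throughput, keep some headroom, and pick the best version that fits. The encoding ladder, the names and the 0.8 safety factor are all hypothetical, and real players often use more sophisticated logic (for example, basing decisions on how full the buffer is).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_bps: int

# Hypothetical encoding ladder; real ladders are tuned per title and device.
LADDER = [
    Rendition("1080p", 6_000_000),
    Rendition("720p", 3_000_000),
    Rendition("480p", 1_200_000),
    Rendition("240p", 400_000),
]

def choose_rendition(measured_bps: float, ladder=LADDER, safety: float = 0.8) -> Rendition:
    """Pick the highest-bitrate rendition that fits within `safety` times the
    measured throughput; fall back to the lowest rung if nothing fits."""
    budget = measured_bps * safety
    fitting = [r for r in ladder if r.bitrate_bps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_bps)
    return min(ladder, key=lambda r: r.bitrate_bps)

print(choose_rendition(10_000_000).name)  # good signal
print(choose_rendition(2_000_000).name)   # weaker signal
```

The safety margin keeps the player from picking a rendition that only just fits, since throughput on a moving connection can drop before the next measurement comes in.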
Packaging
The next stage is packaging, where the encoded video streams are organized into formats suitable for delivery over the internet. This is also where you add in things like DRM (digital rights management) protections to prevent (or at least reduce) any online piracy around the video stream.
In order to stream your encoded video files on the internet, you’ll need to use a video streaming protocol like MPEG-DASH or HLS. Both are adaptive bitrate streaming protocols, so playback adapts to the bandwidth and device capabilities of the end user to minimize buffering. This way, content can be delivered to TVs, mobile phones, computers, gaming consoles, tablets and more.
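With HLS, the player starts by downloading a master playlist that lists every available rendition along with its bandwidth and resolution, and then chooses among them. Here's a minimal generator sketch; the renditions and URIs are hypothetical, and a production playlist would usually also include a CODECS attribute for each variant.

```python
def master_playlist(renditions: list[tuple[int, str, str]]) -> str:
    """Build an HLS master playlist. Each entry is (bandwidth_bps, resolution, uri)."""
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(uri)  # each variant points at its own media playlist
    return "\n".join(lines) + "\n"

print(master_playlist([
    (6_000_000, "1920x1080", "1080p/index.m3u8"),
    (3_000_000, "1280x720", "720p/index.m3u8"),
    (1_200_000, "854x480", "480p/index.m3u8"),
]))
```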
The output of the packaging stage is a bunch of small, segmented video files (each chunk is around 2 to 10 seconds) and a manifest file with metadata on the ordering of the chunks, URLs, available bitrates (quality levels), etc.
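For a live stream, the per-rendition manifest is a sliding window: the packager keeps appending new chunks and dropping old ones, and the player keeps reloading the file. Here's a minimal sketch of an HLS media playlist for that case. The `segment_N.ts` naming and the 5-segment window are assumptions for illustration.

```python
import math

def live_media_playlist(durations: list[float], window: int = 5) -> str:
    """HLS media playlist for a live stream. Only the newest `window` segments
    are listed; EXT-X-MEDIA-SEQUENCE tells the player which one comes first."""
    first = max(0, len(durations) - window)
    recent = durations[first:]
    # Target duration must be at least every segment's duration (rounded up here).
    target = max((math.ceil(d) for d in recent), default=1)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows decimal EXTINF durations
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first}",
    ]
    for seq, d in enumerate(recent, start=first):
        lines.append(f"#EXTINF:{d:.3f},")
        lines.append(f"segment_{seq}.ts")  # hypothetical segment naming
    # No #EXT-X-ENDLIST tag: the stream is still live, so players keep reloading.
    return "\n".join(lines) + "\n"

print(live_media_playlist([6.0] * 8))
```

After eight 6-second chunks, the playlist only advertises segments 3 through 7, so new viewers join near the live edge instead of at the start of the broadcast.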
This data gets passed on to a content delivery network.
Delivery
The final stage is the delivery stage, where the manifest file and the video chunks are sent to end users. In order to minimize latency, you’ll probably be using a Content Delivery Network like Amazon CloudFront, Cloudflare, Akamai, etc.
Prime Video uses Amazon CloudFront, and users can download from many different CDN endpoints, so playback keeps working if any one region goes down.
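The source doesn't describe how Prime Video's players pick between CDN endpoints, but the basic idea of endpoint failover on the client side can be sketched like this. The endpoint hostnames are hypothetical, and the `fetch` function is injected so the demo runs without a network.

```python
from typing import Callable, Sequence

def fetch_with_failover(path: str, endpoints: Sequence[str],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Request `path` from each CDN endpoint in order; return the first success."""
    errors: list[str] = []
    for base in endpoints:
        url = f"{base.rstrip('/')}/{path.lstrip('/')}"
        try:
            return fetch(url)
        except OSError as exc:  # connection errors, timeouts, urllib's URLError/HTTPError
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all CDN endpoints failed:\n" + "\n".join(errors))

# Demo with a fake fetcher where the first (hypothetical) CDN endpoint is down.
def fake_fetch(url: str) -> bytes:
    if "cdn-a" in url:
        raise ConnectionError("connection refused")
    return b"video bytes from " + url.encode()

data = fetch_with_failover("live/segment_42.ts",
                           ["https://cdn-a.example.com", "https://cdn-b.example.com"],
                           fake_fetch)
print(data)
```

Against a real network, you could pass something like `lambda url: urllib.request.urlopen(url, timeout=2).read()` as `fetch`. A production player would also remember which endpoint failed rather than retrying it for every chunk.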
Achieving 5 9’s of Reliability
The key to achieving high availability is redundancy. If you have a component with a 1% rate of failure, then you can take two of those components and set them up in a configuration where one will immediately step in if the other fails.
Now, your system will only fail if both of these components go down (which is a 0.01% probability assuming the components are independent… although this might not be the case).
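Written out, with $p = 0.01$ as the failure probability of a single component and assuming the two copies fail independently:

```latex
P(\text{down}) = p \cdot p = (0.01)^2 = 10^{-4} = 0.01\%,
\qquad
A = 1 - p^2 = 0.9999 = 99.99\%
```

More generally, with $n$ independent redundant copies the availability is $A_n = 1 - p^n$, so every extra copy multiplies the chance of an outage by another factor of $p$.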
With Amazon Prime Video, they deploy each system in at least two AWS Regions (these regions are designed to be as independent as possible so one going down doesn’t bring down the other region).
AWS Elemental also provides an in-Region redundancy model. This deploys redundant Elemental systems in the same Region at a reduced cost. If one of the systems fails for whatever reason, traffic can seamlessly switch over to the other system.
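The source doesn't say how Elemental detects a failure and switches over. A common pattern for an active/standby pair is heartbeat monitoring, sketched below under that assumption. The pipeline names and the 2-second timeout are hypothetical, and `now` can be passed in explicitly so the behavior is easy to test.

```python
import time
from typing import Optional

class RedundantPair:
    """Active/standby pair: traffic follows the active pipeline until its
    heartbeats go stale, then switches to the standby if the standby is healthy."""

    def __init__(self, names: tuple[str, str], timeout_s: float = 2.0):
        self.names = names
        self.timeout_s = timeout_s
        self.active = names[0]
        self.last_beat = {n: time.monotonic() for n in names}

    def heartbeat(self, name: str, now: Optional[float] = None) -> None:
        """Record that `name` reported in as healthy."""
        self.last_beat[name] = time.monotonic() if now is None else now

    def healthy(self, name: str, now: float) -> bool:
        return now - self.last_beat[name] <= self.timeout_s

    def current(self, now: Optional[float] = None) -> str:
        """Return the pipeline that should carry traffic, switching over if needed."""
        now = time.monotonic() if now is None else now
        if not self.healthy(self.active, now):
            standby = self.names[1] if self.active == self.names[0] else self.names[0]
            if self.healthy(standby, now):
                self.active = standby  # switch over to the standby pipeline
        return self.active

pair = RedundantPair(("pipeline-0", "pipeline-1"))
pair.heartbeat("pipeline-0", now=0.0)
pair.heartbeat("pipeline-1", now=0.0)
pair.heartbeat("pipeline-1", now=3.0)  # pipeline-0 has gone quiet
print(pair.current(now=3.5))
```

Because the standby is already running, switching over is just a matter of changing which output is used, which is what makes the switch fast enough to go unnoticed by viewers.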
Each of the AWS Elemental systems provides an SLA of 3 9’s (99.9%). By running redundant copies of every component in parallel, across different Availability Zones and Regions, Amazon Prime Video is able to achieve an expected uptime of 99.999% (5 9’s).
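Here's a rough sketch of why redundancy gets you from three nines per component to five nines overall. It assumes four stages (ingestion, encoding, packaging, delivery) that must all be up, two independent copies per stage, and instant failover; real failures are rarely fully independent, so treat the result as an upper bound rather than Prime Video's actual calculation.

```python
from math import prod

def parallel(availability: float, copies: int) -> float:
    """Availability of `copies` independent redundant components (any one is enough)."""
    return 1 - (1 - availability) ** copies

def series(*availabilities: float) -> float:
    """Availability of stages that all have to be up at the same time."""
    return prod(availabilities)

STAGE_SLA = 0.999  # three nines per component
STAGES = 4         # ingestion, encoding, packaging, delivery

single = series(*[STAGE_SLA] * STAGES)
redundant = series(*[parallel(STAGE_SLA, 2)] * STAGES)
print(f"no redundancy: {single:.6f}")     # 0.996006
print(f"2x per stage:  {redundant:.6f}")  # 0.999996
```

Chaining four three-nines components without redundancy actually makes things worse (about 99.6%), while doubling each stage pushes the pipeline past 99.999%.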
For more details, read the full article here.