How Netflix Implements Load Shedding

How Netflix classifies traffic and sheds load to ensure a smooth experience during periods of high usage.

Hey Everyone!

Today we’ll be talking about

  • How Netflix Implemented Load Shedding

    • Categorizing Traffic as High Priority or Low Priority

    • Implementing Load Shedding in their API Gateway

    • Testing using Principles from Chaos Engineering

  • Tech Snippets

    • Why Consensus is Harder Than It Looks

    • How to Coach a Mentee from Junior to Senior Engineer

    • Advice for Engineering Managers Who Want to Climb the Ladder

    • Computer Graphics from Scratch

How Netflix Implemented Load Shedding

The API gateway sits between the client and the backend; it handles things like rate limiting, authentication, monitoring, and routing requests to the various backend services.

One crucial feature an API gateway can handle is load shedding, where it starts ignoring certain requests during times of high stress (a spike in user traffic or something going wrong with your backend).

Non-critical requests like logging or background requests can be dropped/shed by the API gateway so that critical requests (that directly impact the user experience) have fewer failures.

Netflix built and maintains a popular API Gateway called Zuul, and they gave a great talk at AWS re:Invent 2021 about how they designed and tested Zuul’s prioritized load shedding feature for their internal use.

Here’s a summary of the talk

During times of high stress, Netflix uses load shedding to minimize the impact on end users.

For Netflix, high-stress periods can be due to failures in the backend (under-scaled services, network issues, cloud provider outages, etc.) or spikes in traffic (a new season of Love is Blind).

Despite all the effort Netflix engineers put into developing resiliency, there are still many different incidents that degrade user experience.

With load-shedding, Netflix will start ignoring low-priority requests. For example, Zuul will start dropping requests for show trailers.

When you’re scrolling through Netflix, the default behavior of the website is to autoplay the trailer of whatever movie/TV show you have selected.

If the system is under severe strain, then Netflix will ignore these requests and the client will fall back to just displaying the show’s cover image (and not playing any trailer).

This doesn’t really hurt the user experience that much and it allows the system to prioritize requests that directly relate to the streaming experience for users.
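Purely as an illustration (not Netflix’s actual client code), here’s a sketch of how a client might fall back when a trailer request gets shed. The endpoint, the status handling, and the image URL are all assumptions:

```python
import requests  # assumption: a simple HTTP client, just for illustration


def load_preview(title_id: str) -> dict:
    """Hypothetical client fallback: try the trailer, fall back to cover art."""
    try:
        # Assumed endpoint; a shed request would come back as a non-200 response.
        resp = requests.get(f"https://api.example.com/trailers/{title_id}", timeout=2)
        if resp.status_code == 200:
            return {"type": "trailer", "url": resp.json()["stream_url"]}
    except requests.RequestException:
        pass  # network error or timeout: treat it like a shed request
    # Fallback keeps the UI functional, just without the autoplaying trailer.
    return {"type": "cover_image", "url": f"https://img.example.com/{title_id}.jpg"}
```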

In order to implement prioritized load shedding, Netflix engineers went through 3 steps:

  1. Define a Request Taxonomy - Create a way to categorize requests by priority and assign a score to each request that describes how critical the request is to the user streaming experience.

  2. Implement the Load Shedding Algorithm - Netflix chose to implement it in their API Gateway, Zuul. They use a cubic function to determine the rate at which they load shed requests.

  3. Validate Assumptions using Fault Injection - The Chaos Engineering discipline started at Netflix and the company uses those principles for testing system resilience in a scientific way.

We’ll go through each of these steps in more detail.

Define a Request Taxonomy

Netflix created a scoring system from 0 to 100 that assigns a priority to a request, with 0 being the highest priority and 100 being the lowest priority.

The score was based on 4 dimensions

  • Functionality - What functionality gets impacted if this request gets throttled? Is it important to the user experience? For example, if logging-related requests are throttled then that doesn’t really hurt the user experience.

  • Throughput - Some traffic is higher throughput than others. Logs, background requests, events data, etc. are higher throughput and they contribute to a large percentage of load on the system. Throttling them will have a bigger impact on reducing load. 

  • Criticality - If this request gets throttled, is there a well-defined fallback that still delivers an acceptable user experience? For example, if the client’s request for the movie trailer gets blocked, then the fallback is to just show the image for the movie. This is acceptable.

  • Request State - Was the request initiated by the user? Or was the request initiated by the Netflix app?

Using these dimensions, the API gateway assigns a priority score to every request that comes in.
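The talk doesn’t share the exact scoring formula, but a minimal sketch of how these four dimensions might combine into a 0–100 score (0 = most critical) could look like the following. The fields and weights are assumptions for illustration only:

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Assumed fields capturing the four dimensions described above.
    functionality: str      # e.g. "playback", "trailer", "logging"
    throughput_class: str   # "high" or "normal"
    has_fallback: bool      # is there an acceptable degraded experience?
    user_initiated: bool    # user action vs. background/app-initiated


def priority_score(req: Request) -> int:
    """Hypothetical scoring: 0 = highest priority, 100 = lowest priority."""
    score = 0
    if req.functionality == "logging":
        score += 40          # throttling logging barely affects the user
    elif req.functionality == "trailer":
        score += 20
    if req.throughput_class == "high":
        score += 20          # high-throughput traffic frees up the most load
    if req.has_fallback:
        score += 20          # a well-defined fallback makes shedding safer
    if not req.user_initiated:
        score += 20          # background/app traffic goes before user traffic
    return min(score, 100)
```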

Load Shedding Algorithm

The first decision was where to implement the load shedding algorithm. Netflix decided to put the logic in their API Gateway, Zuul.


When a request comes in to Zuul, the first thing that the gateway does is execute a set of inbound filters. These filters are responsible for decorating the incoming request with extra information. This is where the priority score is computed and added to the request.
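Zuul’s real inbound filters are written in Java; purely to illustrate the “decorate the request” step, here’s a tiny sketch where a precomputed score gets attached to the request. The header name and function are assumptions:

```python
def decorate_with_priority(headers: dict, score: int) -> dict:
    """Hypothetical inbound-filter step: attach the priority score as a header
    so later throttling stages can read it without recomputing it."""
    headers = dict(headers)                      # don't mutate the caller's dict
    headers["x-request-priority"] = str(score)   # assumed header name
    return headers
```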

With the priority score attached, Zuul can now do global throttling. This is where Zuul throttles any request whose priority score is above a certain threshold (i.e., the lower-priority requests). Global throttling is meant to protect the API gateway itself, and it’s triggered by metrics on the gateway: concurrent requests, connection count and CPU utilization.
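Here’s a minimal sketch of that global-throttling decision, assuming a fixed priority threshold and the three gateway-level metrics mentioned above. All of the limits are assumptions:

```python
def should_shed_globally(priority: int,
                         concurrent_requests: int,
                         connection_count: int,
                         cpu_utilization: float,
                         priority_threshold: int = 80) -> bool:
    """Hypothetical global throttle: protect the gateway itself when any
    of its own resource metrics cross a limit."""
    gateway_overloaded = (
        concurrent_requests > 10_000      # assumed limit
        or connection_count > 20_000      # assumed limit
        or cpu_utilization > 0.85         # assumed limit
    )
    # Shed only low-priority traffic (higher score = lower priority).
    return gateway_overloaded and priority > priority_threshold
```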

Netflix also implemented service throttling, where Zuul can shed requests bound for a specific microservice it’s routing to. Zuul monitors the error rate and concurrent requests for each microservice; if a threshold is crossed for those metrics, Zuul reduces load on that service by throttling traffic with a priority score above a certain level.
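Service throttling follows the same pattern, but per backend service. A sketch, assuming Zuul tracks the error rate and concurrent requests for each service it routes to (the thresholds are illustrative):

```python
def should_shed_for_service(priority: int,
                            service_error_rate: float,
                            service_concurrent_requests: int,
                            priority_threshold: int) -> bool:
    """Hypothetical per-service throttle: if one backend is struggling,
    shed low-priority requests bound for that backend only."""
    service_overloaded = (
        service_error_rate > 0.05               # assumed: more than 5% errors
        or service_concurrent_requests > 2_000  # assumed concurrency limit
    )
    return service_overloaded and priority > priority_threshold
```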

To calculate that priority threshold, Netflix uses a cubic function of the overload percentage. When the overload percentage is at 35%, Netflix sheds any request with a priority score above ~95. When the overload percentage reaches 80%, the API Gateway sheds any request with a priority score greater than ~50.
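Netflix didn’t publish the exact curve, but a cubic of the form threshold = 100 × (1 − overload³) is consistent with both data points above (it gives ≈96 at 35% overload and ≈49 at 80%). A sketch, treating that formula as an assumption:

```python
def priority_threshold(overload_pct: float) -> float:
    """Assumed cubic: maps overload (0.0-1.0) to the priority score above which
    requests get shed. At 0% overload nothing is shed (threshold 100);
    at 100% overload everything is shed (threshold 0)."""
    return 100.0 * (1.0 - overload_pct ** 3)


print(round(priority_threshold(0.35)))  # ~96: only the very lowest priorities shed
print(round(priority_threshold(0.80)))  # ~49: roughly half the priority range shed
```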

Validating Assumptions using Chaos Testing

A Fault Injection experiment is where you methodically introduce disruptive events (spike in traffic, CPU load, increased latency, etc.) in your testing or production environments and observe how the system responds.

Netflix routinely runs these types of experiments in their production environment and has built tools like Chaos Monkey and ChAP (Chaos Automation Platform) to make this testing easier.

They created a failure injection point in Zuul that allowed them to shed any request based on a configured priority. This way, they could manually simulate a load-shedding event and get an idea of exactly how shedding certain requests affected the user.
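Conceptually, such a failure injection point might look like a config flag that forces requests above a chosen priority to be shed even when the system is healthy. The config shape and names here are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultInjectionConfig:
    # Hypothetical config: when set, shed everything above this priority
    # even if the system is healthy, to observe the user-facing effect.
    forced_shed_above_priority: Optional[int] = None


def should_shed(priority: int, real_threshold: int,
                fault: FaultInjectionConfig) -> bool:
    if fault.forced_shed_above_priority is not None:
        return priority > fault.forced_shed_above_priority  # injected shedding
    return priority > real_threshold                         # normal behavior
```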

Engineers staged an A/B test that allocates a small number of production users to either a control or a treatment group for 45 minutes. During that window, they throttle a range of priorities for the treatment group and measure the impact on the playback experience.
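As a rough sketch of the allocation step (the sample rate, bucket names, and throttled priority range are all assumptions, not Netflix’s actual experiment setup):

```python
import hashlib


def allocate(user_id: str, sample_rate: float = 0.01) -> str:
    """Hypothetical A/B allocation: a stable hash puts ~1% of users into the
    experiment, split evenly between control and treatment for the 45-minute window."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    if bucket >= sample_rate * 10_000:
        return "not_in_experiment"
    return "treatment" if bucket % 2 == 0 else "control"


# For the treatment group, throttle an assumed priority range (e.g. scores 60-100)
# and compare playback metrics against the control group.
```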

This allows Netflix to quickly determine how the load shedding system is performing across a variety of client devices, client versions, locations, etc.

Tech Snippets