How Netflix Implements Load Shedding

How Netflix classifies traffic and sheds load to ensure a smooth experience during periods of high usage.

Hey Everyone!

Today we’ll be talking about

  • How Netflix Implemented Load Shedding

    • Categorizing Traffic as High Priority or Low Priority

    • Implementing Load Shedding in their API Gateway

    • Testing using Principles from Chaos Engineering

  • Tech Snippets

    • Why Consensus is Harder Than It Looks

    • How to Coach a Mentee from Junior to Senior Engineer

    • Advice for Engineering Managers Who Want to Climb the Ladder

    • Computer Graphics from Scratch

How Netflix Implemented Load Shedding

The API gateway sits between the client and the backend; it handles things like rate limiting, authentication, monitoring, and routing requests to the various backend services.

One crucial feature an API gateway can handle is load shedding, where it starts ignoring certain requests during times of high stress (a spike in user traffic or something going wrong with your backend).

Non-critical requests like logging or background requests can be dropped/shed by the API gateway so that critical requests (that directly impact the user experience) have fewer failures.

Netflix built and maintains a popular API Gateway called Zuul, and they gave a great talk at AWS re:Invent 2021 about how they designed and tested Zuul’s prioritized load shedding feature for their internal use.

Here’s a summary of the talk

During times of high stress, Netflix uses load shedding to minimize the impact on end users.

For Netflix, high-stress periods can be due to failures in the backend (under-scaled services, network issues, cloud provider outages, etc.) or spikes in traffic (a new season of Love is Blind).

Despite all the effort Netflix engineers put into developing resiliency, there are still many different incidents that degrade user experience.

With load-shedding, Netflix will start ignoring low-priority requests. For example, Zuul will start dropping requests for show trailers.

When you’re scrolling through Netflix, the default behavior of the website is to autoplay the trailer of whatever movie/TV show you have selected.

If the system is under severe strain, then Netflix will ignore these requests and the client will fall back to just displaying the show’s cover image (and not playing any trailer).

This doesn’t really hurt the user experience that much and it allows the system to prioritize requests that directly relate to the streaming experience for users.
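Purely as an illustration (not Netflix’s actual client code), here’s a sketch of how a client might fall back when a trailer request gets shed. The endpoint, the status handling, and the image URL are all assumptions:

```python
import requests  # assumption: a simple HTTP client, just for illustration


def load_preview(title_id: str) -> dict:
    """Hypothetical client fallback: try the trailer, fall back to cover art."""
    try:
        # Assumed endpoint; a shed request would come back as a non-200 response.
        resp = requests.get(f"https://api.example.com/trailers/{title_id}", timeout=2)
        if resp.status_code == 200:
            return {"type": "trailer", "url": resp.json()["stream_url"]}
    except requests.RequestException:
        pass  # network error or timeout: treat it like a shed request
    # Fallback keeps the UI functional, just without the autoplaying trailer.
    return {"type": "cover_image", "url": f"https://img.example.com/{title_id}.jpg"}
```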

In order to implement prioritized load shedding, Netflix engineers went through 3 steps:

  1. Define a Request Taxonomy - Create a way to categorize requests by priority and assign a score to each request that describes how critical the request is to the user streaming experience.

  2. Implement the Load Shedding Algorithm - Netflix chose to implement it in their API Gateway, Zuul. They use a cubic function to determine the rate at which they load shed requests.

  3. Validate Assumptions using Fault Injection - The Chaos Engineering discipline started at Netflix and the company uses those principles for testing system resilience in a scientific way.

We’ll go through each of these steps in more detail.

Define a Request Taxonomy

Netflix created a scoring system from 0 to 100 that assigns a priority to a request, with 0 being the highest priority and 100 being the lowest priority.

The score was based on 4 dimensions

  • Functionality - What functionality gets impacted if this request gets throttled? Is it important to the user experience? For example, if logging-related requests are throttled then that doesn’t really hurt the user experience.

  • Throughput - Some traffic is higher throughput than others. Logs, background requests, events data, etc. are higher throughput and they contribute to a large percentage of load on the system. Throttling them will have a bigger impact on reducing load. 

  • Criticality - If this request gets throttled, is there a well-defined fallback that still delivers an acceptable user experience? For example, if the client’s request for the movie trailer gets blocked, then the fallback is to just show the image for the movie. This is acceptable.

  • Request State - Was the request initiated by the user? Or was the request initiated by the Netflix app?

Using these dimensions, the API gateway assigns a priority score to every request that comes in.
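The talk doesn’t share the exact scoring formula, but a minimal sketch of how these four dimensions might combine into a 0–100 score (0 = most critical) could look like the following. The fields and weights are assumptions for illustration only:

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Assumed fields capturing the four dimensions described above.
    functionality: str      # e.g. "playback", "trailer", "logging"
    throughput_class: str   # "high" or "normal"
    has_fallback: bool      # is there an acceptable degraded experience?
    user_initiated: bool    # user action vs. background/app-initiated


def priority_score(req: Request) -> int:
    """Hypothetical scoring: 0 = highest priority, 100 = lowest priority."""
    score = 0
    if req.functionality == "logging":
        score += 40          # throttling logging barely affects the user
    elif req.functionality == "trailer":
        score += 20
    if req.throughput_class == "high":
        score += 20          # high-throughput traffic frees up the most load
    if req.has_fallback:
        score += 20          # a well-defined fallback makes shedding safer
    if not req.user_initiated:
        score += 20          # background/app traffic goes before user traffic
    return min(score, 100)
```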

Load Shedding Algorithm

The first decision was where to implement the load shedding algorithm. Netflix decided to put the logic in their API Gateway, Zuul.


When a request comes in to Zuul, the first thing that the gateway does is execute a set of inbound filters. These filters are responsible for decorating the incoming request with extra information. This is where the priority score is computed and added to the request.
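Zuul’s real inbound filters are written in Java; purely to illustrate the “decorate the request” step, here’s a tiny sketch where a precomputed score gets attached to the request. The header name and function are assumptions:

```python
def decorate_with_priority(headers: dict, score: int) -> dict:
    """Hypothetical inbound-filter step: attach the priority score as a header
    so later throttling stages can read it without recomputing it."""
    headers = dict(headers)                      # don't mutate the caller's dict
    headers["x-request-priority"] = str(score)   # assumed header name
    return headers
```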

With the priority score attached, Zuul can now do global throttling. This is where Zuul throttles any request whose priority score is above a certain threshold (i.e., the lower-priority requests). Global throttling is meant to protect the API gateway itself, and it’s triggered by metrics on the gateway: concurrent requests, connection count and CPU utilization.
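Here’s a minimal sketch of that global-throttling decision, assuming a fixed priority threshold and the three gateway-level metrics mentioned above. All of the limits are assumptions:

```python
def should_shed_globally(priority: int,
                         concurrent_requests: int,
                         connection_count: int,
                         cpu_utilization: float,
                         priority_threshold: int = 80) -> bool:
    """Hypothetical global throttle: protect the gateway itself when any
    of its own resource metrics cross a limit."""
    gateway_overloaded = (
        concurrent_requests > 10_000      # assumed limit
        or connection_count > 20_000      # assumed limit
        or cpu_utilization > 0.85         # assumed limit
    )
    # Shed only low-priority traffic (higher score = lower priority).
    return gateway_overloaded and priority > priority_threshold
```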

Netflix also implemented service throttling, where Zuul can shed requests bound for a specific microservice it’s routing to. Zuul monitors the error rate and concurrent requests for each microservice; if a threshold is crossed for those metrics, Zuul reduces load on that service by throttling traffic with a priority score above a certain level.
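Service throttling follows the same pattern, but per backend service. A sketch, assuming Zuul tracks the error rate and concurrent requests for each service it routes to (the thresholds are illustrative):

```python
def should_shed_for_service(priority: int,
                            service_error_rate: float,
                            service_concurrent_requests: int,
                            priority_threshold: int) -> bool:
    """Hypothetical per-service throttle: if one backend is struggling,
    shed low-priority requests bound for that backend only."""
    service_overloaded = (
        service_error_rate > 0.05               # assumed: more than 5% errors
        or service_concurrent_requests > 2_000  # assumed concurrency limit
    )
    return service_overloaded and priority > priority_threshold
```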

To calculate that priority threshold, Netflix uses a cubic function of the overload percentage. When the overload percentage is at 35%, Netflix sheds any request with a priority score above ~95. When the overload percentage reaches 80%, the API Gateway sheds any request with a priority score greater than ~50.
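Netflix didn’t publish the exact curve, but a cubic of the form threshold = 100 × (1 − overload³) is consistent with both data points above (it gives ≈96 at 35% overload and ≈49 at 80%). A sketch, treating that formula as an assumption:

```python
def priority_threshold(overload_pct: float) -> float:
    """Assumed cubic: maps overload (0.0-1.0) to the priority score above which
    requests get shed. At 0% overload nothing is shed (threshold 100);
    at 100% overload everything is shed (threshold 0)."""
    return 100.0 * (1.0 - overload_pct ** 3)


print(round(priority_threshold(0.35)))  # ~96: only the very lowest priorities shed
print(round(priority_threshold(0.80)))  # ~49: roughly half the priority range shed
```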

Validating Assumptions using Chaos Testing

A Fault Injection experiment is where you methodically introduce disruptive events (spike in traffic, CPU load, increased latency, etc.) in your testing or production environments and observe how the system responds.

Netflix routinely runs these types of experiments in their production environment and has built tools like Chaos Monkey and ChAP (Chaos Automation Platform) to make this testing easier.

They created a failure injection point in Zuul that allowed them to shed any request based on a configured priority. This way, they could manually simulate a load-shedding event and get an idea of exactly how shedding certain requests affected the user.
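Conceptually, such a failure injection point might look like a config flag that forces requests above a chosen priority to be shed even when the system is healthy. The config shape and names here are assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultInjectionConfig:
    # Hypothetical config: when set, shed everything above this priority
    # even if the system is healthy, to observe the user-facing effect.
    forced_shed_above_priority: Optional[int] = None


def should_shed(priority: int, real_threshold: int,
                fault: FaultInjectionConfig) -> bool:
    if fault.forced_shed_above_priority is not None:
        return priority > fault.forced_shed_above_priority  # injected shedding
    return priority > real_threshold                         # normal behavior
```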

Engineers staged an A/B test that allocates a small number of production users to either a control or a treatment group for 45 minutes. During that window, they throttle a range of priorities for the treatment group and measure the impact on the playback experience.
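As a rough sketch of the allocation step (the sample rate, bucket names, and throttled priority range are all assumptions, not Netflix’s actual experiment setup):

```python
import hashlib


def allocate(user_id: str, sample_rate: float = 0.01) -> str:
    """Hypothetical A/B allocation: a stable hash puts ~1% of users into the
    experiment, split evenly between control and treatment for the 45-minute window."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    if bucket >= sample_rate * 10_000:
        return "not_in_experiment"
    return "treatment" if bucket % 2 == 0 else "control"


# For the treatment group, throttle an assumed priority range (e.g. scores 60-100)
# and compare playback metrics against the control group.
```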

This allows Netflix to quickly determine how the load shedding system is performing across a variety of client devices, client versions, locations, etc.

Tech Snippets