How Netflix Implements Load Shedding
How Netflix classifies traffic and sheds load to ensure a smooth experience during periods of high usage.
Today we’ll be talking about
- How Netflix Implemented Load Shedding
- Categorizing Traffic as High Priority or Low Priority
- Implementing Load Shedding in their API Gateway
- Testing using Principles from Chaos Engineering
- Tech Snippets
- How Uber uses Zig
- Write Your Own Miniature Redis With Python
- 3 Innovations While Unifying Pinterest's Key-Value Storage
- Why Go Does Not Need a Java Style GC
- A Curated List of Software and Architecture Related Design Patterns
How Netflix Implemented Load Shedding
The API Gateway sits between the client and the backend, and it handles things like rate limiting, authentication, monitoring, and routing requests to the various backend services.
There are a ton of different tools you can use for an API Gateway, like Nginx, Zuul (by Netflix), Envoy (by Lyft), offerings from all the major cloud providers, and more.
One feature that many API gateways provide is load shedding, where you can configure the gateway to automatically drop certain requests. This is crucial when you face a spike in traffic or when something’s wrong with your backend (and you can’t handle the usual load).
Non-critical requests like logging or background requests can be dropped/shed by the API gateway so that critical requests (that impact the user experience) have fewer failures.
Netflix built and maintains a popular API Gateway called Zuul, and they gave a great talk at AWS re:Invent 2021 about how they designed and tested Zuul’s prioritized load shedding feature for their internal use.
Here’s a summary
Despite all the effort Netflix engineers put into developing resiliency, there are still many different incidents that degrade user experience.
Whether it’s something like under-scaled services, network blips, cloud provider outages, bugs in code or something else, engineers need to ensure that the end user experience is minimally affected. Netflix is a movie/tv-show streaming website, so this means that users should still be able to stream their movies and TV shows on their phone/laptop/TV/gaming console.
To ensure high availability, Netflix uses the load-shedding technique where low priority requests are dropped when the system is under severe strain.
One example of a droppable request is a request for a show trailer. When a user is scrolling through Netflix, the website autoplays the trailer of whatever movie/TV show the user currently has selected.
If the system is under severe strain, then Netflix will ignore these requests and the client will fall back to just displaying the show’s image without playing a trailer. This doesn’t result in a severe degradation of the user experience, and it allows the system to prioritize requests that directly relate to the streaming experience.
In order to implement prioritized load shedding, Netflix engineers went through 3 steps
- Define a Request Taxonomy - Create a way to categorize requests by priority and assign a score to each request that describes how critical the request is to the user streaming experience.
- Implement the Load Shedding Algorithm - Netflix chose to implement it in their API Gateway, Zuul.
- Validate Assumptions using Fault Injection - The Chaos Engineering discipline started at Netflix and the company uses those principles for testing system resilience in a scientific way.
We’ll go through each of these steps
Define a Request Taxonomy
Netflix created a scoring system from 0 to 100 that assigns a priority to a request, with 0 being the highest priority and 100 being the lowest priority.
The score was based on 4 dimensions
- Functionality - What functionality gets impacted if this request gets throttled? Is it important to the user experience? For example, if logging-related requests are throttled then that doesn’t really hurt the user experience.
- Throughput - Some traffic is higher throughput than others. Logs, background requests, events data, etc. are higher throughput and they contribute to a large percentage of load on the system. Throttling them will have a bigger impact on reducing load.
- Criticality - If this request gets throttled, is there a well-defined fallback that still delivers an acceptable user experience? For example, if the client’s request for the movie trailer gets blocked, then the fallback is to just show the image for the movie. This is acceptable.
- Request State - Was the request initiated by the user? Or was the request initiated by the Netflix app?
Using these dimensions, the API gateway assigns a priority score to every request that comes in.
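Netflix didn’t publish the exact scoring formula, but as a rough sketch, an inbound gateway filter combining the four dimensions might look like this (the dimension ratings and equal weighting below are illustrative assumptions, not Netflix’s actual logic):

```python
# Hypothetical sketch of a priority-scoring filter.
# 0 = highest priority, 100 = lowest priority (safest to shed).
# The ratings and equal weighting are assumptions for illustration.

def priority_score(functionality: int, throughput: int,
                   criticality: int, user_initiated: bool) -> int:
    """Each dimension is rated 0-100, where higher means 'safer to shed'."""
    # App-initiated (background) traffic is safer to shed than user-initiated.
    request_state = 0 if user_initiated else 100
    # Equal weighting of the four dimensions is an assumption.
    score = (functionality + throughput + criticality + request_state) / 4
    return round(score)

# A log-upload request: unimportant functionality, high throughput,
# good fallback, app-initiated -> low priority (high score).
log_request = priority_score(90, 95, 85, user_initiated=False)

# A "start playback" request: core functionality, user-initiated
# -> high priority (low score).
play_request = priority_score(5, 10, 5, user_initiated=True)
```

The key design point is that the score is computed once, at the edge, and then travels with the request, so every downstream throttling decision can be a cheap numeric comparison.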
Load Shedding Algorithm
The first decision was where to implement the load shedding algorithm. Netflix decided to put the logic in their API Gateway, Zuul.
When a request comes in to Zuul, the first thing that the gateway does is execute a set of inbound filters. These filters are responsible for decorating the incoming request with extra information. This is where the priority score is computed and added to the request.
With the priority score information, Zuul can now do global throttling. This is where Zuul will throttle requests below a certain priority threshold. This is meant to protect the API gateway itself. The metrics used to trigger global throttling are concurrent requests, connection count and CPU utilization.
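As a sketch of that decision (the threshold values below are invented for illustration, not Netflix’s numbers), global throttling boils down to a check like:

```python
# Illustrative sketch of Zuul-style global throttling.
# The limit values are made up for the example.

GLOBAL_LIMITS = {
    "concurrent_requests": 10_000,
    "connection_count": 20_000,
    "cpu_utilization": 0.85,  # fraction of CPU in use
}

def should_shed(request_priority: int, metrics: dict,
                priority_threshold: int = 95) -> bool:
    """Drop low-priority requests when any gateway metric exceeds its limit."""
    overloaded = any(metrics[name] > limit
                     for name, limit in GLOBAL_LIMITS.items())
    # 0 is the highest priority and 100 the lowest, so requests "below the
    # threshold" are those with a score numerically greater than it.
    return overloaded and request_priority > priority_threshold

# Gateway under load: too many concurrent requests.
metrics = {"concurrent_requests": 12_000,
           "connection_count": 5_000,
           "cpu_utilization": 0.60}
```

With these hypothetical metrics, a priority-99 background request would be shed while a priority-10 playback request would pass through untouched.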
Netflix also implemented service throttling, where they can load shed requests for specific microservices that Zuul is talking to. Zuul will monitor the error rate and concurrent requests for each of the microservices. If a threshold is crossed for those metrics, then Zuul will reduce load on the service by throttling traffic above a certain priority level.
To calculate the priority threshold, Netflix uses a cubic function. When the overload percentage is at 35%, the gateway sheds any request with a priority score above ~95. When the overload percentage reaches 80%, the gateway sheds any request with a priority score above ~50.
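The exact curve wasn’t published beyond those two data points, but a cubic of the form threshold = 100 · (1 − overload³) reproduces them closely, so a sketch of the threshold function might look like:

```python
# Sketch of a cubic shedding curve fitted to the two data points from the
# talk (35% overload -> shed scores above ~95, 80% -> shed above ~50).
# The exact function Netflix uses was not published; this form is an assumption.

def priority_threshold(overload: float) -> float:
    """Return the priority score above which requests get shed.

    overload is a fraction in [0, 1]: 0 = no overload, 1 = fully overloaded.
    """
    return 100 * (1 - overload ** 3)

print(round(priority_threshold(0.35)))  # -> 96 (only very low-priority traffic shed)
print(round(priority_threshold(0.80)))  # -> 49 (roughly half the range is sheddable)
```

The cubic shape means shedding ramps up gently at first (small overloads only drop the least important traffic) and then aggressively as the system approaches saturation.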
Validating Assumptions using Chaos Testing
A Fault Injection experiment is where you methodically introduce disruptive events (spike in traffic, CPU load, increased latency, etc.) in your testing or production environments and observe how the system responds.
Netflix routinely runs these types of experiments in its production environment and has built tools like Chaos Monkey and ChAP (Chaos Automation Platform) to make this testing easier.
They created a failure injection point in Zuul that allowed them to shed any request based on a configured priority. This let them manually simulate a load-shedded experience and see exactly how shedding certain requests affected users.
Engineers staged an A/B test that allocates a small number of production users to either a control or a treatment group for 45 minutes. During that period, they throttle a range of priorities for the treatment group and measure the impact on playback experience.
This allows Netflix to quickly determine how the load shedding system is performing across a variety of client devices, client versions, locations, etc.
For more details, you can watch the full talk here.
A Technical Dive on Managing Time Series Data
The amount of time series data companies are storing has been increasing exponentially. From stock prices in finance to sensor data in IoT to user analytics data in consumer apps… the possibilities are endless.
How do you structure and store time series data? How do time series workloads differ from typical OLTP or full text search? What does this mean for your choice of database?
The engineers who created InfluxDB wrote a fantastic technical ebook with answers to all these questions.
- Write your own miniature Redis with Python - Redis is an in-memory key-value store that supports quite a few data types like strings, sets, geospatial indexes and more. This is a great blog post where you’ll see the implementation of a toy version of Redis that provides an API for GET, SET, DELETE, FLUSH and supports multiple data types.
- 3 Innovations While Unifying Pinterest’s Key-Value Storage - Previously, Pinterest ran hundreds of key-value database systems based on four different key-value storage engines: Terrapin, Rockstore, UserMetaStore and Rocksandra. Having all these instances meant duplicated development effort and unnecessarily high operational overhead. They migrated all of their key-value data to a single, unified storage system based on Rockstore. This blog post explains how and why they did that.
- Go Does Not Need a Java Style GC - This is an awesome blog post that delves into the history of Java’s garbage collectors and some misconceptions around them. It explains why modern languages like Go and Rust don’t need complex garbage collectors like Java’s and how they implement automatic memory management instead.
- A curated list of software and architecture related design patterns - This is a great GitHub repo with a ton of resources on software architecture. It has resources on cloud architectures and patterns, serverless strategies, SQL/NoSQL databases and more.
You are given the root node of a Binary Search Tree and two values, L and R.
Return the sum of values of all the nodes with a value between L and R inclusive.
The Binary Search Tree will have unique values.
As a reminder, here’s our last question
You are given an array nums of distinct positive integers.
Return the number of tuples (a, b, c, d) such that
a * b = c * d
where a, b, c and d are elements of nums and
a != b != c != d
The naive solution would be to use 4 nested for loops.
We iterate through nums in 4 nested loops with each loop representing a, b, c and d respectively.
Then, we can count the number of tuples where a * b = c * d.
This would result in a time complexity of O(n^4).
We can do better by using more memory in exchange for less time.
We start by splitting up our calculations.
First, we calculate every possible product of a * b.
We use a nested for loop (only 2 loops this time) with the first loop representing a and the second loop representing b.
In our inner for loop, we calculate the possible products for a * b.
We’ll store all of these products in a hash table where the keys of the hash table are the products and the values are how many times each of those products appeared as a result of a * b.
After, we’ll have a second nested for loop, with the first loop representing c and the second loop representing d.
We calculate the value of c * d and then check our hash table for how many times this product appeared.
Then, we can add up how many times the product appeared to a counter that keeps track of this.
However, we must remember to avoid double counting due to the condition that
a != b != c != d
In order to do this, we subtract 2 from the count we read out of the hash table (to exclude the two ordered pairs where a = c, b = d and where a = d, b = c).
After tallying up all the counts, we can return the total number of tuples.
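The steps above can be implemented directly in Python using a Counter keyed by product:

```python
from collections import Counter

def count_tuples(nums: list[int]) -> int:
    """Count tuples (a, b, c, d) with a * b = c * d, all four elements distinct."""
    # Pass 1: count every ordered pair product a * b.
    products = Counter()
    for i, a in enumerate(nums):
        for j, b in enumerate(nums):
            if i != j:
                products[a * b] += 1

    # Pass 2: for each ordered pair (c, d), look up how many (a, b) pairs share
    # its product, subtracting the 2 overlapping ordered pairs
    # (a = c, b = d) and (a = d, b = c) to satisfy the distinctness condition.
    total = 0
    for i, c in enumerate(nums):
        for j, d in enumerate(nums):
            if i != j:
                total += products[c * d] - 2
    return total

print(count_tuples([2, 3, 4, 6]))  # -> 8, e.g. (2, 6, 3, 4) and its orderings
```

This brings the time complexity down from O(n^4) to O(n^2), at the cost of O(n^2) space for the product table.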