How the BBC uses Serverless

The BBC website must be able to tolerate huge spikes in load. To do this, its engineers rely on AWS Lambda functions.

Hey Everyone!

Today we’ll be talking about

  • How the BBC uses Serverless

    • The BBC’s shift to AWS and the structure of their backend

    • Choosing cloud functions vs. virtual machines

    • Optimizing cloud function memory profiles and cold starts

  • Tech Snippets

    • Papers We Love - A Community for going through Academic Computer Science Papers

    • Lessons Learned from 15 years of Open Source Development

    • Building a Fast, Concurrent Database in Rust

    • How to Make Your Code Reviewer Fall in Love With You


How the BBC uses Serverless

The BBC is the UK's national broadcaster, and the world's oldest and largest, with more than 20,000 employees.

The website operates at a massive scale, with over half of the UK’s population using the site every week (along with tens of millions of additional users from across the world). They have content in 44 different languages and have hundreds of different page types (news articles, food recipes, videos, etc.).

Until a few years ago, the website was written in PHP and hosted in two datacenters near London. However, the engineering team has since rebuilt the website on AWS using newer technologies like ReactJS.

The website relies heavily on Functions as a Service (FaaS) for scaling, specifically AWS Lambda functions.

Jonathan Ishmael is the Lead Technical Architect at the BBC, and he wrote a great series of blog posts on why the BBC chose serverless and how their backend works.

Here’s a summary.

Before getting into the choice of serverless, it’s important to get some context about the type of workloads that the BBC website has to serve.

Traffic to the website can fluctuate greatly depending on current events, social media traffic, etc. These events can be predictable (a traffic spike during a national election) but they can also be random.

During the 2019 London Bridge attack, requests for the BBC’s coverage of the event caused a 3x increase in traffic in a single minute (from 4,000 requests per second to 12,000). Within the next few minutes, traffic climbed again, reaching 20,000 requests per second.

If there’s an unexpected, consequential event then the BBC’s article about it can quickly start trending on social media. This brings a massive amount of traffic.

It’s extremely important that the BBC website be able to quickly scale up with these traffic patterns, so that people can get access to information during an emergency.

The BBC’s Backend

The BBC’s backend stack can be divided into several layers.

Traffic Management Layer

All traffic to the BBC website goes to the Global Traffic Manager, which is a web server based on Nginx. This layer handles thousands of requests per second and is run on AWS EC2 instances.

The layer handles caching, sanitizing requests and forwarding traffic to the relevant backend services.

The EC2 instances run with 50% reserve capacity available for handling bursts of traffic. They don’t have a CPU intensive workload, so AWS autoscaling works well for high traffic events.

Web Rendering Layer

The BBC uses ReactJS for their website. They make use of React’s server-side rendering feature to reduce the initial page load time when someone first visits the website. The Web Rendering layer is where the server side rendering happens.

This rendering process is quite compute-intensive, which puts significant strain on the system whenever traffic to the BBC website shoots up (the rendering layer becomes the stress point). AWS EC2 autoscaling typically takes a few minutes to add capacity, which is too slow for the rendering layer, since traffic to the site can double in less than a minute.

Therefore, the BBC relies on AWS Lambda functions for rendering, as they can scale up much faster. Approximately 2,000 Lambda invocations run every second to create the BBC website, and AWS automatically provisions more compute when there’s a burst of traffic (at the price of a small cold start time, discussed below).

Business Layer

The Rendering Layer focuses solely on presentation, and it fetches data through a REST API provided by the Business Layer.

The BBC has a wide variety of content types (TV shows, movies, weather forecasts, etc.) and each one has different data / business logic.

The Business Layer is responsible for taking data from all the various BBC backend systems and transforming it into a common data model for the Web Rendering layer.

The REST API is run on EC2 instances while Lambda functions are used for the compute-intensive task of transforming data from all the different systems into a common data model.

The EC2 instances also handle intermediate caching to reduce load on the Lambda functions.

Platform and Content Production

The last two layers provide a wide range of services and tools that allow content to be created, controlled, stored and processed.

Optimizing Performance

The BBC team wanted to make sure they were optimizing their serverless functions to reduce cost and improve user experience. We’ll go through a couple of the things they did.

Caching

As discussed above, the BBC has two layers that rely on serverless functions: the web rendering layer and the Business Layer.

However, they made sure to put an intermediate caching layer between the two, so that the rendering functions don’t call the business logic functions directly.

If they didn’t, the rendering function would sit idle while the business logic function was working. Serverless functions are billed in GB-seconds (the number of seconds your function runs multiplied by the amount of RAM allocated), so any time spent idle is money wasted.
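To make the billing model concrete, here's a rough sketch in Python (the price per GB-second below is illustrative, not a quoted AWS rate):

```python
# Rough sketch of the GB-second billing model. The rate below is
# illustrative, not a quoted AWS price.
PRICE_PER_GB_SECOND = 0.0000166667  # example rate in USD

def invocation_cost(duration_s, memory_gb):
    """Cost of one invocation: run time multiplied by allocated memory."""
    return duration_s * memory_gb * PRICE_PER_GB_SECOND

# A 1 GB rendering function that does 0.3s of real work but runs for
# 0.5s total (0.2s idle, waiting on a business logic call) pays for
# the full 0.5s:
busy_only = invocation_cost(0.3, 1.0)
with_idle = invocation_cost(0.5, 1.0)
assert with_idle > busy_only  # idle time is billed just like busy time
```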

The caching layers ensure that most business logic serverless functions can complete in under 50 milliseconds, reducing idle time for the rendering function.
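This is essentially the cache-aside pattern between the two layers. Here's a minimal Python sketch; the class and function names are hypothetical, not the BBC's actual code:

```python
import time

class TTLCache:
    """Tiny in-memory cache with a time-to-live, standing in for the
    intermediate caching layer (the BBC's real implementation differs)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return None  # missing or expired

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

def fetch_article_data(article_id, cache, load_from_business_layer):
    """Rendering side: check the cache first, so the rendering function
    isn't left idle (and billed) while business logic runs."""
    data = cache.get(article_id)
    if data is None:
        data = load_from_business_layer(article_id)  # slow path
        cache.set(article_id, data)
    return data
```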

Memory Profile

When you’re working with Lambda functions, the main configurable parameter is the amount of RAM each Lambda instance has (from 128 MB to 10 GB).

The amount of memory you select will impact the available vCPUs, which impacts your response time.

Although the BBC only needed ~200 megabytes for their React app, they found that 1 gigabyte of RAM gave them the optimal price/performance point.
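The trade-off can be sketched with some back-of-the-envelope arithmetic. The durations below are invented for illustration; only the billing formula (memory times duration) matches Lambda's model:

```python
# Hypothetical durations for the same render at different memory sizes.
# More memory buys more vCPU, so the render finishes faster -- but the
# numbers here are invented for illustration.
profiles = {
    0.5: 0.60,  # 512 MB -> 600 ms render
    1.0: 0.30,  # 1 GB   -> 300 ms render
    2.0: 0.28,  # 2 GB   -> 280 ms (diminishing returns)
}

def gb_seconds(memory_gb, duration_s):
    """Lambda's billable unit: allocated memory times run time."""
    return memory_gb * duration_s

for mem, dur in profiles.items():
    print(f"{mem} GB: {dur * 1000:.0f} ms, {gb_seconds(mem, dur):.2f} GB-s")
# Here 1 GB costs the same GB-seconds as 512 MB (0.30 vs 0.30) while
# halving latency, whereas 2 GB nearly doubles the cost for a tiny gain.
```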

Cold Start Times

When you make the first request to a serverless function, your cloud provider has to copy your code bundle to a physical machine and launch a container for you. This process is referred to as a cold start and you’ll have to wait for the Lambda function to spin up before you can get your response.

After the request is done, the cloud platform will keep the instance alive for 15-20 minutes (this differs based on provider) so any subsequent requests will not have to deal with a cold start time.

However, if you have a sudden burst in traffic, your cloud provider will have to spin up new instances to run your functions on. This means additional cold start times (although it’s still faster than using EC2 autoscaling).

Factors that impact cold start time include the RAM allocated per Lambda function (discussed above), the size of the code bundle, and the time taken to start the runtime associated with your code (you can write your function in Java, Go, Python, JavaScript and more).

You are not charged for any of the compute that happens during the cold start process, so engineers at the BBC took advantage of this. They used that time to establish network connections to all the APIs that they needed and also loaded any JavaScript requirements into memory.
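The BBC's functions are Node-based, but the pattern translates; here's a hypothetical Python Lambda handler that does its one-time setup at module scope, so the work happens during the (unbilled) cold start rather than on every request:

```python
import json

# Module-level code runs once per container, during the cold start, and
# that initialization time isn't billed -- so expensive one-time setup
# (warming connections, loading templates/config) belongs here, not in
# the handler. This is a hypothetical sketch: the names and values below
# are invented for illustration, not the BBC's code.
CONFIG = {"locale": "en-GB"}  # stand-in for loading real requirements

def handler(event, context):
    """Per-request work only: module-level state is already warm."""
    path = event.get("path", "/")
    return {
        "statusCode": 200,
        "body": json.dumps({"rendered": path, "locale": CONFIG["locale"]}),
    }
```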

Additionally, they tuned the RAM allocated per Lambda to minimize cold start time. They found that a 512 MB memory profile increased cold start time by 3x compared to a 1 GB memory profile, which is part of the reason they went with 1 GB of RAM.

They ended up with an average cold start time of ~250 milliseconds, with a peak of 1-2 seconds.

Performance

The BBC is running over 100 million serverless function invocations per day with 90% of the invocations taking less than 220 milliseconds (for the rendering functions).

In terms of scalability, they’ve been able to go from a cold system at 0 requests/sec (with everything uncached) to 5,000 requests/sec within a few minutes.

For more details, you can read the full article here.

How did you like this summary?

Your feedback really helps me improve curation for future emails.


Tech Snippets

  • Papers We Love - This is an awesome community of computer scientists that hosts weekly meetups to discuss various papers in the academic CS field. There are chapters in Seattle, New York, San Francisco, Pune, Berlin, Chicago, Mumbai, London, Singapore, and a ton of other cities/countries. You can view their YouTube channel here.

  • Lessons learned from 15 years of SumatraPDF, an open source Windows app - Krzysztof Kowalczyk is the creator of SumatraPDF, an open source e-reader for Windows (it lets you read PDF, XPS, EPUB, MOBI, CBZ and other file formats). He’s been working on SumatraPDF for 15 years now, and he wrote a great blog post on the lessons he learned about the code, product and business model of open source. In the past, Krzysztof worked at Microsoft, Palm, BitTorrent and a few small Silicon Valley startups.

  • Building a fast, concurrent database in Rust - Noria is an open source storage backend written in Rust that’s designed for read-heavy applications. Noria pre-computes and caches relational query results so that reads are extremely fast. It also automatically keeps the cache up-to-date as the underlying data changes. This talk is from Jon Gjengset (a PhD student at MIT’s Parallel and Distributed Operating Systems Group) on Rust and Noria. If you’d like to learn more about Noria, check out the paper a group at MIT published on its implementation.

  • How to Make Your Code Reviewer Fall in Love With You - A great checklist of things you should do when getting your code reviewed by a teammate. The golden rule is to value your reviewer's time. You can do this by writing a clear changelist description, separating functional and non-functional changes, minimizing lag between rounds of review, and more. Check out the blog post for the full list.

Interview Question

You are given an array of k linked lists.

Each list is sorted in ascending order.

Merge all the linked lists into one sorted linked list and return it.

Previous Question

As a refresher, here’s the previous question

You are given a non-negative number n.

Count the number of primes less than n and return the count.

Solution

There are quite a few prime number sieves designed to quickly find prime numbers.

One of the most efficient is the Sieve of Eratosthenes.

Here’s a fantastic explanation of how the sieve works. It's much easier to understand if you can see it visually rather than explained in a bunch of text.

We implement the sieve using an array of booleans.

We set all the booleans to True, since we start by assuming every number is prime.

Then, we use a for loop to iterate through all the values from 2 to sqrt(n).

For every prime number we encounter (i.e., any value whose boolean has not been set to False), we iterate through all the multiples of that prime number and mark each multiple as False (marking it as a composite number).

At the end, we can get the count by counting the number of True booleans. This can be done with Python's sum function, since True counts as 1.
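Putting the steps above together, a straightforward Python implementation might look like this:

```python
import math

def count_primes(n: int) -> int:
    """Count the primes strictly less than n with the Sieve of Eratosthenes."""
    if n < 3:
        return 0
    # Start by assuming every number is prime; 0 and 1 are not.
    is_prime = [True] * n
    is_prime[0] = is_prime[1] = False
    # Any composite below n has a prime factor no larger than sqrt(n).
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            # Mark every multiple of i as composite, starting at i*i
            # (smaller multiples were already marked by smaller primes).
            for multiple in range(i * i, n, i):
                is_prime[multiple] = False
    # True counts as 1, so sum() yields the number of primes.
    return sum(is_prime)

print(count_primes(10))  # 4 -> the primes below 10 are 2, 3, 5, 7
```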