How Grab Implemented Rate Limiting
We'll talk about Bloom Filters, Roaring Bitmaps and the Sliding Window Rate Limiting Algorithm. Plus, Basecamp's first year of results after shifting away from the cloud.
Hey Everyone!
Today we’ll be talking about
How Grab Implemented Rate Limiting
Grab needs to rate limit the number of notifications a user gets from the app. Too many notifications and the user might get annoyed.
They sort users into different groups and keep track of them with the Roaring Bitmap data structure (based on compressing bitmaps in a clever way). We’ll also talk about Bloom Filters and why Grab didn’t go that route
Using the Sliding Window Rate Limiting algorithm and Redis’s sorted set data structure to keep track of how many notifications a user has gotten.
Tech Snippets
Basecamp’s First Year Results from Shifting Away from the Cloud
How to quickly get into flow when coding
How to scale an app to 10 million users on AWS
Making $30k+ in side income by Writing Online as a Software Engineer
The Importance of Linchpins, and Why You Should Become One
How Grab Implemented Rate Limiting
Grab is one of the largest companies in Southeast Asia with a market cap of over $12 billion USD. They’re a “super app”, where you can use Grab to book a taxi, order food delivery, send money to friends, get a loan and much more.
Having all these verticals means that different divisions in the company can sometimes compete for user attention.
A product manager in the Grab Food division might want to send users a notification with a 10% off coupon on their next food delivery. Meanwhile, someone in Grab Taxi might try to send users a notification on how they just added a new Limousine option.
Bombarding users with all these notifications from the different divisions is a really great way to piss them off. This could result in users turning off marketing communication for the Grab app entirely.
Therefore, Grab has strict rate limiting in place for the amount of notifications they can send to each user.
Naveen and Abdullah are two engineers at Grab and they wrote a terrific blog post delving into how this works.
Rate Limiting Communication to Users
Grab has different rate limiting levels depending on how often a user uses the app.
If someone just downloaded the app and is only using it for food delivery, then they should receive a small amount of notifications.
On the other hand, if someone is a Grab power-user and they’re constantly using taxi services, food delivery and other products, then they should have much higher rate limits in place.
To deal with this, Grab broke up their user base into different segments based on the amount of interaction the customer has with the app.
To limit the amount of notifications, they built a rate limiter that would check which segment a user was in, and would rate limit the amount of comms that user was sent based on their interaction with the app.
Grab has close to 30 million monthly users and sends hundreds of millions of notifications per day.
We’ll talk about how Grab built this rate limiter to work at scale.
Storing and Retrieving Member Segment Information
The first core problem Grab was trying to solve was to track membership in a set. They have millions of users and want to place these users into different groups based on characteristics (how much the user uses the app). Then, given a user they want to check if that user is in a certain group.
Because they have tens of millions of active users, they were looking for something more space efficient than a set or a bitmap/bit array. Instead, they wanted something that could track set membership without having memory usage grow linearly with the user base size. They also wanted insertions/lookups for the set to be logarithmic time complexity or better.
Two space-efficient ways of keeping track of which items are in a set are
Bloom Filter
Roaring Bitmaps
We’ll delve into both of these and talk about the trade-offs Grab considered.
Bloom Filter
A bloom filter will tell you with certainty if an element is not in a group of elements. However, it can have false positives, where it indicates an element is in the set when it’s really not.
For example, you might create a bloom filter to store usernames (string value). The bloom filter will tell you with certainty if a certain username has not been added to the set.
On the other hand, if the bloom filter says that a username has been added, then you have to check to make sure it’s not a false positive.
Under the hood, Bloom Filters work with a bit array (an array of 0s and 1s).
You start with a bit array full of 0s. When you want to add an item to the Bloom Filter, then you hash the item to convert it to an integer and map that integer to a bit in your array. You set that bit (flip it to 1).
When you want to check if an item is inside your bit-array, you repeat the process of hashing the item and mapping it to a bit in your array. Then, you check if that bit in the array is set (has a value of 1). If it’s not (the bit has a value of 0), then you know with 100% certainty that the item is not inside your bloom filter.
However, if the bit is set, then all you know is that the item could possibly be in your set. Either the item has already been added to the bloom filter, or another item with the same hashed bit value has been added (a collision). If you want to know with certainty if an item has been added to the group, then you’ll have to put in a second layer to check.
Another downside of Bloom Filters is that you cannot remove items from the Bloom Filter. If you add an item, and later on want to remove it from the filter, you can’t just set the bit’s value from 1 to 0. There might be other values in the Bloom Filter that have hashed to that same bit location, so setting the 1 to a 0 would cause them to be removed as well.
Due to the issues with collisions and the inability to remove items from the set, Grab decided against using Bloom Filters.
Roaring Bitmap
Instead of Bloom Filters, a data structure that can be used to represent a set of integers is a roaring bitmap. This is a much newer data structure and was first published in 2016.
Roaring bitmaps will let you store a set of integers but they solve the two problems with bloom filters
You definitively know whether or not an item is in the roaring bitmap
You can remove items from the roaring bitmap
They’re similar to the bitmap data structure but they’re compressed in a very clever way that makes them significantly more space efficient while also being efficient with insertions/deletions.
Basically, when you try to insert an integer into the roaring bitmap, it will first break the number down into smaller parts (the most significant bits and the least significant bits) and then store each smaller part in a container. This container is either an array, a bitmap or a run-length encoding depending on the specific characteristics of the content of the container.
If you’re interested in delving into exactly how roaring bitmaps work, this is the best article I found that explains them in a simple, visual manner.
Grab developed a microservice that abstracts the roaring bitmap’s implementation and provides an API to check set membership and get a list of all the elements in the set.
Tracking number of Comms sent to each user
After identifying the appropriate segment, they apply the specific limits to the user using a distributed rate limiter.
In order to do this, they have to keep track of all the users who’ve recently gotten notifications and count how many notifications have been sent to each of those users.
For handling this, they tested out Redis (using AWS Elasticache) and DynamoDB.
They decided to go with Redis because
Higher throughput at Lower Latency
Cost Effective
Better at handling Spiky Rate Limiting Workloads at Scale
Using Redis, they implemented the Sliding Window Rate Limiting algorithm.
This is a pretty common rate limiting strategy where you divide up time into small increments (depending on your use case this could be a second, a minute, an hour, etc.) and then count the number of requests sent within the window.
If the number of requests goes beyond the rate-limiting threshold, then the next request will get denied. As time progresses, the window “slides” forward. Any past request that is outside the current window is no longer counted towards the limit.
Grab built this with the Sorted Set data structure in Redis. The algorithm takes O(log n) time complexity where n is the number of request logs stored in the sorted set.
For more details, you can read the full article here.
How To Remember Concepts from Quastor Articles
We go through a ton of engineering concepts in the Quastor summaries and tech dives.
In order to help you remember these concepts for your day to day job (or job interviews), I’m creating flash cards that summarize these core concepts from all the Quastor summaries and tech dives.
The flash cards are completely editable and are meant to be integrated in a Spaced-Repetition Program (Anki) so that you can easily remember the concepts forever.
This is an additional perk for Quastor Pro members in addition to the technical deep dive articles.
Quastor Pro is only $12 a month and you should be able to expense it with your job’s learning & development budget. Here’s an email you can send to your manager. Thanks for the support!