How Grab Categorizes Tens of Millions of Users in Milliseconds
Technologies discussed include Apache Spark and ScyllaDB. Plus, lessons on starting a bootstrapped business as a solo founder.
Today we'll be talking about
How Grab Segments Tens of Millions of Users in Milliseconds
Grab is one of the largest tech companies in Southeast Asia with tens of millions of monthly users
They need to quickly categorize these users based on preset rules and user behavior. They also need to share this category data with other backend services in an efficient way.
We’ll talk about how they built this and why they use Apache Spark and ScyllaDB
Lessons from Successful One Person Startups
Engineer’s Codex wrote a terrific blog post delving into lessons he learned while talking to solo founders of successful bootstrapped businesses (doing more than $20k per month in profit)
Founders of these businesses tend to ship fast, invest in marketing and stay focused
How the CTO of Amazon takes Notes
Software Engineering Templates for Design Docs, Postmortem Reviews, PRs and more
Transformers Explained from Scratch
Anatomy of a Terminal Emulator
The fastest way to get promoted is to work on projects that have a big impact on your company.
Big impact => better performance review => promotions and bigger bonuses.
But how do you know what work is useful?
The key is in combining your abilities as a developer with product skills.
If you have a good sense of product, then you can understand what users want and which features will help the company get more engagement, revenue and profit.
Product for Engineers is a fantastic newsletter that’s dedicated to helping you learn these exact skills.
It’s totally free and they send out curated lessons for developers on building new features users love, how to run successful A/B tests, how to find product-market fit and much more.
How Grab Segments Tens of Millions of Users in Milliseconds
Grab is one of the largest tech companies in Southeast Asia with over 30 million monthly users. The company started as a ride-sharing platform but they’ve expanded into a “super-app” with financial services, food delivery, mobile payments and more.
One important backend feature in the Grab app is their segmentation platform. This allows them to group users/drivers/restaurants into segments (sub-groups) based on certain attributes.
They might have a segment for drivers with a perfect 5 star rating or a segment for the penny-pinchers who only order from food delivery when they’re given a 25% off coupon (i.e. me).
Grab uses these segments for a variety of features
Experimentation - Grab can set feature flags to only show certain buttons/screens to users in a certain segment.
Blacklisting - When a driver goes on the Grab app to find jobs, the Drivers service will call the Segmentation Platform to make sure the driver isn’t blacklisted.
Marketing - Grab’s communications team uses the Segmentation platform to determine which users get certain marketing communications.
Grab creates many different segments, so the platform needs to handle a write-heavy workload. At the same time, many other backend services query the platform to find out which users are in a certain segment, so the ability to handle lots of reads is also crucial.
The Segmentation Platform handles up to 12k read QPS (queries per second) and 36k write QPS with a P99 latency of 40 ms (99% of requests are answered within 40 milliseconds).
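To make the P99 figure concrete, here is a minimal sketch of how a 99th-percentile latency can be computed from raw request timings using the nearest-rank method. The latency samples and the `percentile` helper are made up for illustration; they are not Grab's data or tooling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds (1,000 requests).
latencies_ms = [12] * 950 + [35] * 45 + [250] * 5

p99 = percentile(latencies_ms, 99)
slowest = percentile(latencies_ms, 100)
print(f"P99 = {p99} ms, slowest request = {slowest} ms")
```

In this made-up sample, the P99 stays under 40 ms even though a handful of requests take far longer, which is why P99 is usually reported alongside (not instead of) the worst-case latency.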
Jake Ng is a senior software engineer at Grab and he wrote a fantastic blog post delving into the architecture of Grab’s system and some problems they had to solve.
Segmentation Platform Architecture
The Segmentation Platform consists of two major subsystems
Segment Creation - Grab team members can create new segments with certain rules (for example, only include users who have logged onto the app every day for the last 2 weeks). The Segment Creation system is responsible for identifying all the users who fit those criteria and putting them in the segment.
Segment Serving - Backend services at Grab can query the Segmentation Platform to get a list of all the users who are in a certain segment.
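The two subsystems can be sketched in a few lines of plain Python. This is only an illustration of the idea, using the "logged in every day for the last 2 weeks" rule from above; the function names, user IDs and in-memory data are hypothetical, and Grab's real system runs on Spark and ScyllaDB rather than Python dictionaries.

```python
from datetime import date, timedelta


def logged_in_every_day(login_dates, today, days=14):
    """True if the user logged in on each of the `days` days ending at `today`."""
    required = {today - timedelta(days=i) for i in range(days)}
    return required <= set(login_dates)


def build_segment(logins_by_user, today):
    """Segment creation: collect the IDs of all users who satisfy the rule."""
    return {
        user_id
        for user_id, dates in logins_by_user.items()
        if logged_in_every_day(dates, today)
    }


today = date(2024, 1, 31)  # arbitrary example date
logins_by_user = {
    "user-1": [today - timedelta(days=i) for i in range(14)],             # full streak
    "user-2": [today - timedelta(days=i) for i in range(14) if i != 3],   # missed one day
    "user-3": [today],                                                    # single login
}

daily_actives = build_segment(logins_by_user, today)

# Segment serving: another service asks whether a user is in the segment.
is_member = "user-1" in daily_actives
```

Only `user-1` ends up in the segment, because the other two users are missing at least one day in the 14-day window.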
For creating segments, Grab makes use of Apache Spark. In a past article, we did a deep dive on Spark that you can check out here.
Spark is one of the most popular big data processing frameworks out there. It runs on top of your data storage layer, so you can use Spark to process data stored on AWS S3, Cassandra, MongoDB, MySQL, Postgres, Hadoop Distributed File System, etc.
With Spark, you chain together multiple transformations on your data (map, filter, union, join, etc.). Transformations are evaluated lazily: nothing runs until you call an action (such as count or collect) on your dataset, at which point Spark creates jobs to execute the transformations.
Segment creation at Grab is powered by Spark jobs. Whenever a Grab team creates a segment, Spark will retrieve data from the data lake, clean/validate it and then populate the segment with users who fit the criteria.
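The "chain transformations, then call an action" pattern is easier to see in code. The sketch below is a toy, pure-Python stand-in and not the Spark API: `LazyDataset`, the sample rows and the "at least one order" rule are all invented for illustration. It only mimics the way transformations are recorded first and executed when an action is called.

```python
class LazyDataset:
    """Toy stand-in for a Spark dataset: transformations are recorded, not run."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also recorded without touching the data.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: only now are the recorded steps applied to the data.
        rows = iter(self._source)
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records which rows the cleaning step has actually processed


def normalize(row):
    calls.append(row["user_id"])
    return {**row, "country": row["country"].upper()}


# Hypothetical rows standing in for data-lake records.
raw = [
    {"user_id": 1, "country": "sg", "orders_last_30d": 12},
    {"user_id": 2, "country": "id", "orders_last_30d": 0},
    {"user_id": 3, "country": "sg", "orders_last_30d": 4},
]

pipeline = (
    LazyDataset(raw)
    .map(normalize)                                # clean / validate
    .filter(lambda r: r["orders_last_30d"] > 0)    # apply the segment rule
    .map(lambda r: r["user_id"])                   # keep only the user ID
)

ran_before_action = len(calls)        # nothing has executed yet
segment_members = pipeline.collect()  # the action triggers the work
```

Deferring execution until the action is what lets an engine like Spark look at the whole chain and plan the work before touching any data.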
For storing the segment data, Grab relies on ScyllaDB.
Previously, we delved into Cassandra when we talked about how Uber scaled the database to tens of thousands of nodes.
Cassandra is a NoSQL, distributed database created at Facebook and it took many ideas from Google Bigtable and Amazon’s Dynamo. It’s a wide column store that’s designed for write heavy workloads.
However, there are issues with Cassandra.
Performance Bottlenecks with Java - Cassandra is written in Java so it’s subject to garbage collection pauses. These pauses can cause unpredictable latency spikes.
Operational Complexity - Getting optimal performance out of a Cassandra setup can require deep knowledge of its internal workings and a lot of manual tuning. Understanding how to set the heap size, compaction strategies, cache settings, etc. can be very esoteric.
ScyllaDB was created in 2015 with the goal of being a “better version of Cassandra”. It’s designed to be a drop-in replacement as it’s fully compatible with Cassandra (supports Cassandra Query Language, has compatible data models, etc.).
It’s written in C++ for better performance and also comes with self-tuning features to make it easier to use than Cassandra.
Discord wrote a great blog post delving into the issues they had with Cassandra and why they switched to ScyllaDB.
Grab picked ScyllaDB because of how scalable it is (distributed with no single point of failure, similar to Cassandra) and its ability to meet their latency goals (they needed 99% of requests to be served within 80 milliseconds).
They have a set of Go services that power serving Segment data.
In order to ensure even balancing of data across ScyllaDB shards, they partition their database by User ID.
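Here is a rough sketch of why a high-cardinality key like User ID tends to balance well. It hashes each ID and maps it to one of a fixed number of shards. The shard count, the `shard_for` helper and the use of MD5 are assumptions made for illustration; ScyllaDB uses its own partitioner and token ring internally, so treat this as the general idea rather than the database's actual algorithm.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard by hashing it (MD5 is just a stand-in hash)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Distribute 80,000 made-up user IDs and count how many land on each shard.
counts = Counter(shard_for(f"user-{i}") for i in range(80_000))

for shard in sorted(counts):
    print(f"shard {shard}: {counts[shard]} users")
```

Because the hash scatters IDs roughly uniformly, each shard ends up with close to the same number of users, and the same ID always maps to the same shard, so lookups for a user go straight to one place.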
With this, the Segmentation Platform handles up to 12,000 reads per second and 36,000 writes per second with 99% of requests being served in under 40 milliseconds.
For a deeper dive, please check out the full blog post here.
Product for Engineers is a fantastic newsletter by PostHog that helps developers learn how to find product-market fit and build apps that users love.
A/B testing and experimentation are crucial for building a feature roadmap, improving conversion rates and accelerating growth. However, many engineers don’t understand the ins-and-outs of how to run these tests effectively.
This edition of Product for Engineers delves into A/B tests and discusses
The 5 traits of good A/B tests
How to think about statistical significance and p-values
Avoiding false positives
To hone your product skills, check out Product for Engineers below.
Lessons from Successful One Person Startups
Engineer’s Codex is a fantastic developer newsletter that covers topics in growing your career/income, real-world case studies, engineering research papers and more.
In a past edition, the newsletter delved into one-person companies that bootstrapped their way to earning more than $20k USD per month. The post discussed the strategies these entrepreneurs employed to stay motivated, minimize risk, and enhance their chances of success.
Here’s a summary of some of the tips
Ship Small and Fast
Releasing your work to the world can be daunting. By doing so, you expose yourself to potential criticism and vulnerability.
For this reason, many indie founders fall into a cycle of endless building ("just one more feature until it’s ready") to sidestep the intimidating process of launching their work and receiving real-world feedback.
On the other hand, successful indie hackers are constantly shipping new features/products and getting feedback as soon as possible. Often, they can go from idea to MVP in just a few days.
This approach lets them quickly discern whether they have a hit or miss on their hands, enabling them to pivot or double down accordingly.
Having a Marketing Strategy Beforehand
Successful indie hackers allocate as much time to marketing as they do to engineering. Many have built personal brands on Twitter, while others excel in paid ads, SEO, or email marketing.
Build In Public is a common marketing strategy that goes hand-in-hand with shipping small and fast. While running their business, these entrepreneurs openly share their progress on platforms like Twitter, inviting customers to accompany them on their journey.
Focus on your Unfair Advantages and Passions
Figure out what you have that other founders don’t. If you’re reading this newsletter, then one big unfair advantage you’ve got is the ability to code and quickly ship an MVP.
Another unfair advantage could be your network. Perhaps you’ve got a wide range of friends at various tech companies and they could intro you to potential clients or partners (if you’re selling a B2B product for example).
Before you start a business, figure out exactly what your unfair advantages are. See how you can best take advantage of them.
If you’re a one-person team, then you have no choice but to focus. Trying to build multiple products at the same time (and then quickly giving up when you don’t see initial traction) is a surefire path to failure.
The best strategy Engineer’s Codex saw revolved around two-month sprints. An engineer told him about how he’d work on an idea during nights and weekends for two months.
At the end of the two months, he’d re-evaluate progress and see if there was enough promise to continue.
These are just a couple of tips from the blog post.
For the full list, please read the post here.
You can subscribe to Engineer’s Codex here.