How Grab Categorizes Tens of Millions of Users in Milliseconds

Technologies discussed include Apache Spark and ScyllaDB. Plus, lessons on starting a bootstrapped business as a solo founder.

October 28, 2023

Hey Everyone!

Today we'll be talking about

How Grab Segments Tens of Millions of Users in Milliseconds
- Grab is one of the largest tech companies in Southeast Asia with tens of millions of monthly users
- They need to quickly categorize these users based on preset rules and user behavior. They also need to share this category data with other backend services in an efficient way.
- We’ll talk about how they built this and why they use Apache Spark and ScyllaDB
Lessons from Successful One Person Startups
- Engineer’s Codex wrote a terrific blog post delving into lessons he learned while talking to solo founders of successful bootstrapped businesses (doing more than $20k per month in profit)
- Founders of these businesses tend to ship fast, invest in marketing and stay focused
Tech Snippets
- How the CTO of Amazon takes Notes
- Software Engineering Templates for Design Docs, Postmortem Reviews, PRs and more
- Transformers Explained from Scratch
- Anatomy of a Terminal Emulator

How Grab Segments Tens of Millions of Users in Milliseconds

Grab is one of the largest tech companies in Southeast Asia with over 30 million monthly users. The company started as a ride-sharing platform but they’ve expanded into a “super-app” with financial services, food delivery, mobile payments and more.

One important backend feature in the Grab app is their segmentation platform. This allows them to group users/drivers/restaurants into segments (sub-groups) based on certain attributes.

They might have a segment for drivers with a perfect 5 star rating or a segment for the penny-pinchers who only order food delivery when they’re given a 25% off coupon (i.e. me).

Grab uses these segments for a variety of features

Experimentation - Grab can set feature flags to only show certain buttons/screens to users in a certain segment.
Blacklisting - When a driver goes on the Grab app to find jobs, the Drivers service will call the Segmentation Platform to make sure the driver isn’t blacklisted.
Marketing - Grab’s communications team uses the Segmentation platform to determine which users get certain marketing communications.
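
To make these use cases concrete, here is a minimal Python sketch of how a caller might check segment membership. The in-memory dictionary, the segment names and the function names are all hypothetical; the real platform is a set of backend services backed by a database, not a Python dict.

```python
# Hypothetical in-memory view of segment membership: user ID -> segment names.
SEGMENTS_BY_USER = {
    "driver-1": {"five_star_drivers"},
    "driver-2": {"blacklisted_drivers"},
    "user-7": {"coupon_only_customers", "new_checkout_experiment"},
}


def in_segment(user_id, segment):
    """Return True if the user belongs to the given segment."""
    return segment in SEGMENTS_BY_USER.get(user_id, set())


def can_take_jobs(driver_id):
    # Blacklisting: a driver in the blacklist segment is not offered jobs.
    return not in_segment(driver_id, "blacklisted_drivers")


def show_new_checkout(user_id):
    # Experimentation: only users in the experiment segment see the new screen.
    return in_segment(user_id, "new_checkout_experiment")


print(can_take_jobs("driver-1"), can_take_jobs("driver-2"))
print(show_new_checkout("user-7"), show_new_checkout("driver-1"))
```

Every feature boils down to the same question ("is this user in that segment?"), which is why a single shared platform can serve experimentation, blacklisting and marketing.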

Grab creates many different segments so the platform needs to handle a write-heavy workload. That being said, many other backend services are querying the platform for info on which users are in a certain segment, so the ability to handle lots of reads is also crucial.

The Segmentation Platform handles up to 12k read QPS (queries per second) and 36k write QPS with a P99 latency of 40 ms (99% of requests are answered within 40 milliseconds).
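
If P99 is new to you, here is a small Python sketch that computes it with the nearest-rank method. The latency samples and the `percentile` helper are made up for illustration; they are not Grab's data or tooling.

```python
import math


def percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least pct% of samples are <= it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slower ones.
latencies_ms = [10] * 98 + [40, 250]

print(percentile(latencies_ms, 99))   # 40
print(percentile(latencies_ms, 100))  # 250
```

Note that a P99 of 40 ms says nothing about the slowest 1% of requests. In this made-up sample, one request still took 250 ms.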

Jake Ng is a senior software engineer at Grab and he wrote a fantastic blog post delving into the architecture of Grab’s system and some problems they had to solve.

Segmentation Platform Architecture

The Segmentation Platform consists of two major subsystems

Segment Creation - Grab team members can create new segments with certain rules (for example, only include users who have logged onto the app every day for the last 2 weeks). The Segment Creation system is responsible for identifying all the users who fit those criteria and putting them in the segment.
Segment Serving - Backend services at Grab can query the Segmentation Platform to get a list of all the users who are in a certain segment.
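
As a rough illustration of the creation side, the "logged on every day for the last 2 weeks" rule could be evaluated like this. It's a plain-Python sketch with hypothetical function names and sample data; at Grab this work is done by Spark jobs over the data lake, not by a loop over a dictionary.

```python
from datetime import date, timedelta


def logged_in_every_day(login_dates, today, days=14):
    """True if login_dates covers each of the `days` days ending on `today`."""
    required = {today - timedelta(days=offset) for offset in range(days)}
    return required <= set(login_dates)


def build_segment(logins_by_user, today):
    """Return the sorted user IDs whose login history satisfies the rule."""
    return sorted(
        user_id
        for user_id, login_dates in logins_by_user.items()
        if logged_in_every_day(login_dates, today)
    )


today = date(2023, 10, 28)
logins_by_user = {
    # Logged in on each of the last 14 days.
    "user-1": [today - timedelta(days=i) for i in range(14)],
    # Logged in only every other day.
    "user-2": [today - timedelta(days=i) for i in range(0, 14, 2)],
}

print(build_segment(logins_by_user, today))  # ['user-1']
```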

Segment Creation

For creating segments, Grab makes use of Apache Spark. In a past article, we did a deep dive on Spark that you can check out here.

Apache Spark

Spark is one of the most popular big data processing frameworks out there. It runs on top of your data storage layer, so you can use Spark to process data stored on AWS S3, Cassandra, MongoDB, MySQL, Postgres, Hadoop Distributed File System, etc.

With Spark, you chain together multiple transformations on your data (map, filter, union, reduce, etc.). Then, you call an action on your dataset and Spark creates jobs to execute the transformations.
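
The key idea is that transformations are lazy: nothing runs until you call an action. Here is a toy, plain-Python sketch of that model. `LazyDataset` is a made-up class for illustration, not Spark's API, and it ignores everything that makes Spark useful in practice (distribution, partitioning, fault tolerance).

```python
class LazyDataset:
    """Toy stand-in for a Spark dataset: transformations are recorded, not run."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also just recorded for later.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: only now is the recorded chain actually executed.
        rows = iter(self._source)
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []


def double(x):
    calls.append(x)  # Record when the function actually runs.
    return x * 2


pipeline = LazyDataset(range(1, 6)).filter(lambda x: x % 2 == 1).map(double)
print(calls)               # [] because no transformation has run yet
print(pipeline.collect())  # [2, 6, 10]
```

Deferring execution until the action is what lets an engine like Spark look at the whole chain and plan the job before touching any data.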

Segment creation at Grab is powered by Spark jobs. Whenever a Grab team creates a segment, Spark will retrieve data from the data lake, clean/validate it and then populate the segment with users who fit the criteria.

For storing the segment data, Grab relies on ScyllaDB.

ScyllaDB

Previously, we delved into Cassandra when we talked about how Uber scaled the database to tens of thousands of nodes.

Cassandra is a NoSQL, distributed database created at Facebook and it took many ideas from Google Bigtable and Amazon’s Dynamo. It’s a wide column store that’s designed for write heavy workloads.

However, there are issues with Cassandra.

Performance Bottlenecks with Java - Cassandra is written in Java so it’s subject to garbage collection pauses. These pauses can cause unpredictable latency spikes and occasional delays.
Operational Complexity - Getting optimal performance out of a Cassandra setup can require deep knowledge of its internal workings and a lot of manual tuning. Knowing how to set the heap size, compaction strategies, cache settings, etc. can be very esoteric.

ScyllaDB was created in 2015 with the goal of being a “better version of Cassandra”. It’s designed to be a drop-in replacement as it’s fully compatible with Cassandra (supports Cassandra Query Language, has compatible data models, etc.).

It’s written in C++ for better performance and also comes with self-tuning features to make it easier to use than Cassandra.

Discord wrote a great blog post delving into the issues they had with Cassandra and why they switched to ScyllaDB.

Segment Serving

Grab picked ScyllaDB because of how scalable it is (distributed with no single point of failure, similar to Cassandra) and its ability to meet their latency goals (they needed 99% of requests to be served within 80 milliseconds).

They have a set of Go services that power serving Segment data.

In order to ensure even balancing of data across ScyllaDB shards, they partition their database by User ID.
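
Here is a rough Python sketch of why a high-cardinality key like User ID spreads data evenly. The shard count, the user ID format and the `shard_for` helper are assumptions for illustration, and SHA-256 is only a stand-in because it ships with Python; it is not the hash ScyllaDB uses internally.

```python
import hashlib
from collections import Counter


def shard_for(user_id, num_shards):
    """Map a user ID to a shard by hashing it (SHA-256 is just a stand-in)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 8  # Hypothetical shard count.

# Count how many of 80,000 made-up user IDs land on each shard.
counts = Counter(shard_for(f"user-{i}", NUM_SHARDS) for i in range(80_000))
for shard in sorted(counts):
    print(shard, counts[shard])
```

Each shard should end up with roughly 10,000 users. If you partitioned by something with few distinct values instead (like segment name), a handful of popular segments could overload a single shard.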

With this, the Segmentation Platform handles up to 12,000 reads per second and 36,000 writes per second with 99% of requests being served in under 40 milliseconds.

For a deeper dive, please check out the full blog post here.


Tech Snippets

How the CTO of Amazon takes Notes

A terrific blog post by Werner Vogels on the strategy he uses for taking notes during meetings.

Taking notes with pen and paper obviously gives you a resource you can use for later recall, but it also helps with maintaining focus and internalizing the important points.

Vogels uses the Cornell Method of taking notes, so he delves into how that works.

www.allthingsdistributed.com/2023/06/a-few-words-on-taking-notes.html

Transformers from Scratch

This is a fantastic deep dive into transformers, a deep learning architecture that’s crucial in LLMs, speech recognition, machine translation and much more.

The post is extremely thorough and assumes no prior knowledge of linear algebra, explaining the math from scratch. It covers dot products and matrix multiplication, then explains the transformer architecture and how it works.

e2eml.school/transformers.html

6 Software Engineering Templates I Wish I Had Sooner

Ryan Peterman writes an extremely useful blog on growing your career as a developer. In this post, he shares useful templates that you can make use of.

Templates include
- Direction Doc for setting the direction of the team
- Launch Post for when you have results and want to share them with stakeholders
- PR Summary
- Postmortem Review

and more

www.developing.dev/p/6-software-engineering-templates

Anatomy of a Terminal Emulator

This is a great blog post that goes through all the different components of a terminal and how they interact. It goes through interacting with the shell, drawing the UI and more.

Code examples are written in Rust, but they're comprehensible to non-Rust devs and also have explanations (or just ask ChatGPT to explain them to you).

poor.dev/blog/terminal-anatomy/?utm_source=blog.quastor.org&utm_medium=referral&utm_campaign=how-paypal-solved-their-thundering-herd-problem

A Coding Interview Roadmap

If you’re looking to interview at a FAANG company, you’ll need to pass their algorithmic questions.

The best way to prepare for these interviews is to follow a predefined roadmap of the topics/problems you need to know.

NeetCode built a fantastic, free roadmap that you can check out at this link.

neetcode.io/roadmap?utm_source=quastor

Lessons from Successful One Person Startups

Engineer’s Codex is a fantastic developer newsletter that covers topics in growing your career/income, real-world case studies, engineering research papers and more.

In a past edition, the newsletter delved into one-person companies that bootstrapped their way to earning more than $20k USD per month. The post discussed the strategies these entrepreneurs employed to stay motivated, minimize risk, and enhance their chances of success.

Here’s a summary of some of the tips

Ship Small and Fast

Releasing your work to the world can be daunting. By doing so, you expose yourself to potential criticism and vulnerability.

For this reason, many indie-founders can fall into a cycle of endless building ("just one more feature until it’s ready") to sidestep the intimidating process of launching their work and receiving real-world feedback.

On the other hand, successful indie hackers are constantly shipping new features/products and getting feedback as soon as possible. Often, they can go from idea to MVP in just a few days.

This approach lets them quickly discern whether they have a hit or miss on their hands, enabling them to pivot or double down accordingly.

Having a Marketing Strategy Beforehand

“First time founders are obsessed with product. Second time founders are obsessed with distribution.”

Justin Kan (co-founder of Twitch)

Successful indie hackers allocate as much time to marketing as they do to engineering. Many have built personal brands on Twitter, while others excel in paid ads, SEO, or email marketing.

Build In Public is a common marketing strategy that goes hand-in-hand with shipping small and fast. While running their business, these entrepreneurs openly share their progress on platforms like Twitter, inviting customers to accompany them on their journey.

Focus on your Unfair Advantages and Passions

Figure out what you have that other founders don’t. If you’re reading this newsletter, then one big unfair advantage you’ve got is the ability to code and quickly ship an MVP.

Another unfair advantage could be your network. Perhaps you’ve got a wide range of friends at various tech companies and they could intro you to potential clients or partners (if you’re selling a B2B product for example).

Before you start a business, figure out exactly what your unfair advantages are. See how you can best take advantage of them.

Focus

If you’re a one-person team, then you have no choice but to focus. Trying to build multiple products at the same time (and then quickly giving up when you don’t see initial traction) is a surefire path to failure.

The best strategy Engineer’s Codex saw revolved around two-month sprints. An engineer told him about how he’d work on an idea during nights and weekends for two months.

At the end of the two months, he’d re-evaluate progress and see if there was enough promise to continue.

These are just a few of the tips from the blog post.

For the full list, please read the post here.

You can subscribe to Engineer’s Codex here.