☕ Grab's Tech

How Grab handles Real-time Event Processing at massive scale (thousands of events per second) plus an amazing tool by GitHub and OpenAI

Hey Everyone,

Hope you’re all having a fantastic day!

Today we’ll be talking about

  • How Grab handles Real-time Event Processing at massive scale (thousands of events every second)

  • GitHub and OpenAI partner to create Co-Pilot, an “AI pair programmer”

  • Andrej Karpathy (head of Tesla AI) has an awesome talk on Tesla’s vision-only approach.

Real Time Event Processing at Grab

Grab is the largest transportation and food delivery company in Southeast Asia. As of May, their valuation was just under $40 billion U.S. dollars.

One of the Grab app’s features is to offer real-time rewards when a user takes a certain action (or series of actions).

For example, if you use the Grab app to get a ride to work in the morning, the app might immediately reward you with a ride reward that you can use in the evening for the ride back home.

If a Grab driver cancels on you, the Grab app will offer you ride reward points as a compensation.

Grab is able to do this because of Trident, Grab’s in-house, real-time IFTTT (If This Then That) engine.

Trident consumes events from multiple Kafka streams published by various backend services across Grab. Examples of events are GrabFood delivery orders, Transport rides, GrabPay payment processing, etc.

Trident then processes these events based on campaigns, which state what event should trigger what action under what conditions.

A campaign might be to count the number of orders a user has made. If he’s made 3 orders (condition), then give him a $5 delivery credit reward (action). If he’s made 2 orders, then send him an email saying he’ll get a reward after his next order.

An IFTTT engine is nothing new, but Trident has to operate on a massive scale.

Trident needs to handle more than 2,000 events every second and dispatches tens of millions of actions every day.

Here’s the overall system architecture of Trident.

The Trident Management UI is where a Grab employee can create/read/update/delete campaigns (specify what Grab rewards to offer for which events) and the Kafka streams are event streaming platforms where Grab’s backend services publish (and consume) events that have been executed.

The Trident System consumes data from the management backend and the Kafka Streams and then checks to make sure any duplicate events are filtered out. Each event is then cached in Redis.

The events are then passed to an internal SQS (a distributed message queueing service) where the events are queued and then processed.

The processing stage is called Rule Evaluation, where an event is checked through the rules for all the campaigns to see whether any campaigns match.

Grab uses a strategy of evaluating the simplest rules first (rules with the fewest necessary conditions) and using the results from that to avoid unnecessary Rule Evaluations.

An example of two rules might be Reward the user Grab points if he books a taxi in Singapore and Reward the user Grab points if he books a tax in Singapore between 9 and 10 am.

If an event doesn’t qualify for the first rule, then Trident’s Rule Evaluation can skip checking if the event satisfies the second rule.

If an event triggers an action, it gets persisted in MySQL.

Based on the Rule Evaluation, Trident will then trigger downstream services to run the actions that were triggered by the campaigns.

The load put on the Trident system is affected by 3 main factors

  1. Number of events produced in the Kafka Streams

  2. Number of campaigns running

  3. Nature of campaigns running (campaigns that trigger more actions cause higher load)

Trident handles additional load with horizontal scaling (the Trident consumers are all in an auto-scaling server group) to distribute load.

The engineers are also careful to rate limit downstream services (that are called by Trident) to avoid situations where Trident floods downstream services with too many actions during a period of high load.

Tech Snippets

  • GitHub and OpenAI create Copilot - a code completion tool on steroids.

    • Copilot is far more powerful than your typical code completion tool.

    • The demo video shows a programmer asking Copilot to use a web service to determine whether the sentiment of text is positive.

    • Copilot then inserts a code snippet that makes a POST request to text-processing.com (with the correct HTTP headers!!) to get the sentiment of a provided string.

  • Andrej Karpathy (Head of Tesla AI)’s talk at the CVPR 2021 Workshop on Autonomous Vehicles

    • Tesla recently shifted their self-driving systems to vision-only (no radar or lidar)

    • Tesla has been able to make this shift due to their massive dataset. They’ve combined this with a supercomputer that can do 1.8 EFLOPS (10^18 floating point operations per second) that they use for training the neural net. In case you're unfamiliar with an EFLOP, Tesla’s supercomputer is most likely the fifth most powerful supercomputer in the world.

    • Andrej talks about why Tesla is choosing a vision-only strategy and the technical challenges involved.

Interview Question

You are given a binary tree.

Each node in the tree has a next pointer that points to the next right node but the next pointers are currently set to Null.

Write a function that sets each next pointer to the correct node.

We’ll send a detailed solution tomorrow, so make sure you move our emails to primary, so you don’t miss them!

Gmail users—move us to your primary inbox

  • On your phone? Hit the 3 dots at the top right corner, click "Move to" then "Primary"

  • On desktop? Back out of this email then drag and drop this email into the "Primary" tab near the top left of your screen

Apple mail users—tap on our email address at the top of this email (next to "From:" on mobile) and click “Add to VIPs”

Previous Solution

As a refresher, here’s the last question

Write a function that takes a string num and an integer k.

The string num represents a non-negative integer.

The function should return the smallest possible integer after removing k digits from num.

You cannot reorder any of the digits! You can only remove them.

Solution

The brute force algorithm is to just scan the string from left to right (most significant to least significant) and remove a digit if it is larger than it’s right neighbor. We do this k times.

The issue with this approach is that removing a digit from a string takes O(n) time, making our algorithm O(k*n).

Converting the string to an array doesn’t help either, since removing an element from the middle of an array also takes O(n) time.

A data structure that does fit our use case is a stack.

We can iterate through the string and put each digit in our stack. If the next digit is smaller than our current digit (and we still haven’t removed k digits), then we’ll pop the current item off our stack and continue the iteration.

After the iteration, we’ll pop digits off our stack until we remove k digits and return the number without leading 0s.

If you want to practice more questions like this with top engineers at Google, Facebook, etc. then check out Interviewing.io.

You can book realistic mock interviews with senior FAANG engineers who will give you detailed and actionable feedback on exactly what you need to work on.

You don’t pay anything until you’re hired.

Check them out here.