How Booking.com scaled their Customer Review System

Plus, how to improve your ability to focus and why you should prefer complexity in data structures over code

July 22, 2024

Hey Everyone!

Today we’ll be talking about

How Booking.com Scaled Their Customer Review System
- Booking.com sharded their customer reviews across a cluster of servers. In order to scale this system, they needed to add more machines to the cluster.
- They talk about how they made it easy to scale the system up with Jump Consistent Hash, a consistent hashing algorithm developed at Google.
- We'll talk about the process they use to scale up the cluster to avoid any consistency/routing issues.
How to improve your Focus
- Andrew Huberman is a professor at the Stanford School of Medicine and he runs a great podcast where he gives actionable tips on how to improve your health. In this episode, he talks about how to improve your ability to focus.
- Using Binaural Beats to get into flow
- Working in 90 minute durations with a 10 minute break where you deliberately defocus
Tech Snippets
- 6 Best Practices to Manage Pull Request Creation and Feedback
- Good programmers focus on data structures over code
- Coding Interview Roadmap with specific LeetCode questions and solutions

How Booking.com Scaled their Customer Review System

Booking.com is one of the largest online travel agencies in the world; they booked over 35 million airplane tickets and a billion hotel room nights through the app/website in 2023.

Whenever you use Booking.com to book a hotel room/flight, you’re prompted to leave a customer review. So far, they have nearly 250 million customer reviews that need to be stored, aggregated and filtered through.

Storing these many reviews on a single machine is not possible due to the size of the data, so Booking.com has partitioned this data across several shards. They also run replicas of each shard for redundancy and have the entire system replicated across several availability zones. They wrote a great blog post on how they do this.

Sharding is done based on a field of the data, called the partition key. For this, Booking.com uses the internal ID of an accommodation. The hotel/property/airline’s internal ID would be used to determine which shard its customer reviews would be stored on.

A basic way of doing this is with the modulo operator.

If the accommodation internal ID is 10125 and you have 90 shards in the cluster, then customer reviews for that accommodation would go to shard 45 (equal to 10125 % 90).

Challenges with Scaling

The challenge with this sharding scheme comes when you want to add/remove machines to your cluster (change the number of shards).

Booking.com expected a huge boom in users during the summer. They forecasted that they would be seeing some of the highest traffic ever and they needed to come up with a scaling strategy.

However, adding new machines to the cluster will mean rearranging all the data onto new shards.

Let’s go back to our example with internal ID 10125. With 90 shards in our cluster, that accommodation would get mapped to shard 45. If we add 10 shards to our cluster, then that accommodation will now be mapped to shard 25 (equal to 10125 % 100).

This process is called resharding, and it’s quite complex with our current scheme. You have a lot of data being rearranged and you’ll have to deal with issues around ambiguity during the resharding process. Your routing layer won’t know if the 10125 accommodation was already resharded (moved to the new shard) and is now on shard 25 or if it’s still stuck in the processing queue and its data is still located on shard 45.

The solution to this is a family of algorithms called Consistent Hashing. These algorithms minimize the number of keys that need to be remapped when you add/remove shards to the system.

ByteByteGo did a great video on Consistent Hashing (with some awesome visuals), so I’d highly recommend watching that if you’re unfamiliar with the concept. Their explanation was the clearest out of all the videos/articles I read on the topic.

Using the Jump Hash Sharding Algorithm

For their sharding scheme, Booking now uses the Jump Hash Sharding algorithm, a consistent hashing algorithm that was created at Google. It’s extremely fast, takes minimal memory and is simple to implement (can be expressed in 5 lines of code).

With Jump Hash, Booking.com can rely on a property called monotonicity. This property states that when you add new shards to the cluster, data will only move from old shards to new shards; thus there is no unnecessary rearrangement.

With the previous scheme, we had 90 shards at first (labeled shard 1 to 90) and then added 10 new shards (labeled 91 to 100).

Accommodation ID 10125 was getting remapped from shard 45 to shard 25; it was getting moved from one of the old shards in the cluster to another old shard. This data transfer is pointless and doesn’t benefit end users.

What you want is monotonicity, where data is only transferred from shards 1-90 onto the new shards 91 - 100. This data transfer serves a purpose because it balances the load between the old and new shards so you don’t have hot/cold shards.

The Process for Adding New Shards

Booking.com set a clear process for adding new shards to the cluster.

They provision the new hardware and then have coordinator nodes that figure out which keys will be remapped to the new shards and loads them.

The resharding process begins and the old accommodation IDs are transferred over to the new shards, but the remapped keys are not deleted from the old shards during this process.

This allows the routing layer to ignore the resharding and continue directing traffic to remapped accommodation IDs to the old locations.

Once the resharding process is complete, the routing layer is made aware and it will start directing traffic to the new shards (the remapped accommodation ID locations).

Then, the old shards can be updated to delete the keys that were remapped.

Tech Snippets

6 Best Practices to Manage Pull Request Creation and Feedback

This is a useful blog post on the DoorDash Engineering blog with tips on how to minimize the number of bad PRs in your development cycle.

PRs should be short and have a clear title and description. Having short PRs helps programmers get feedback on their code more quickly, so it reduces the number of rewrites.

It also reduces friction in the code review process (you don't want to spend 2 hours reading someone else's PR).

doordash.engineering/2022/08/23/6-best-practices-to-manage-pull-request-creation-and-feedback

Good programmers focus on data structures over code

Linus Torvalds was quoted saying to “design your code around the data rather than the other way around“ when he was talking about Git’s design.

Engineer’s Codex wrote a great blog post delving into this quote further. When building you should start with the data and spend extra time thinking through that ahead of time. Complexity in the data structures is almost always preferable to complexity in the code.

read.engineerscodex.com/p/good-programmers-worry-about-data

A Coding Interview Roadmap for Big Tech

If you're preparing for coding interviews, this is a great roadmap that you can use to build your study plan.

It gives an ordering of the topics you need to study, what order to practice them in, and which LeetCode problems to solve. All the problems listed also come with a free video solution on YouTube by NeetCode.

neetcode.io/roadmap

How to Improve Your Ability to Focus

Andrew Huberman is a researcher at Stanford University and he has an amazing podcast (Huberman Lab) where he goes through current research on how you can live a happier, healthier and more productive life.

Being a programmer requires an ability to focus for long periods of time (get into a flow), so I found his podcast episode - Focus Tookit: Tools to Improve Your Focus & Concentration to be very useful.

Here's a quick summary of the tips he mentioned (based off peer-reviewed scientific literature).

Binaural Beats - There are playlists available on YouTube (or free apps if you google) that will play Binaural Beats (where you listen to two tones with slightly different frequencies at the same time). 40 Hz binaural beats have been shown to improve focus, attention and memory retention in a number of peer reviewed studies. Huberman recommends listening to 5-10 minutes of Binaural Beats prior to when you start your task to make it easier to get into a state of flow.
90 Minutes - The ideal duration for focused sessions is 90 minutes or less. Past that, fatigue begins to set in and the amount of focus people are able to dedicate begins to drop off. Therefore, Huberman sets a timer for 90 minutes when he begins a focused task and stops after that.
Defocus - After the 90 minutes (or less) focus session, you should spend 10-20 minutes where you deliberately defocus and give your brain/body a chance to rest. During this time, you should avoid focusing on any single thing (so avoid using your phone) and can work on menial tasks where your mind can wander (talk a short walk, do the dishes, wash the laundry, etc.). This is the best way to recharge for the next 90 minute focus session after the break.
Visual Field - A great deal of our cognitive focus is directed by our visual system. If you focus your eyes on a pen, you'll naturally start to focus on it and notice details about the pen. Cognitive focus tends to follow overt visual focus. Therefore, you can help ease yourself into a focused state by picking something in your room (part of the wall, an object, etc.) and staring at that object for 30 seconds to a few minutes (blinking is fine, don't try to force your eyes open). This helps you get into a focused state, and you can redirect your focus to your task after the 30 seconds is up.

These are a few of the tips Huberman mentions.

In the podcast, he also talks about using supplements like coffee, EPA, creatine and more.

Here's the full episode