How Notion sharded their Postgres Database

October 07, 2021

Hey Everyone,

Today we’ll be talking about

How Notion sharded their Postgres databases
- Why they sharded
- Picking a shard key
- Database Migration
Plus, a couple awesome tech snippets on
- How Bash works under the hood
- WebRTC explained
- Hands on Web Assembly

We also have a solution to our last coding interview question on arrays, plus a new question from Microsoft.

Quastor Daily is a free Software Engineering newsletter sends out FAANG Interview questions (with detailed solutions), Technical Deep Dives and summaries of Engineering Blog Posts.

How Notion sharded their Postgres Database

Notion is an app that is meant to serve as a personal (or corporate) workspace.

You can store notes, tasks, wikis, kanban boards and other things in a Notion workspace and you can easily share it with other users.

If you’ve been a Notion user for a while, you probably noticed that the app got extremely slow in late 2019 and 2020.

Earlier this year, Notion sharded their Postgres monolith into a fleet of horizontally scalable databases. The resulting performance boost was pretty big.

Amazing how much faster @NotionHQ feels lately. The "show don't tell" is strong with this one.
— Guillermo Rauch (@rauchg)
6:36 PM • Apr 27, 2021

Sharding a database means partitioning your data across multiple database instances.

This allows you to run your database on multiple computers and scale horizontally instead of vertically.

When to Shard?

Sharding your database prematurely can be a big mistake. It can result in an increased maintenance burden, new constraints in application code and little to no performance improvement (so a waste of engineering time).

However, Notion was growing extremely quickly, so they knew they’d have to implement sharding at some point.

The breaking point came when the Postgres VACUUM process began to stall consistently.

The VACUUM process clears storage occupied by dead tuples in your database.

When you update data in Postgres, the existing data is not modified. Instead, a new (updated) version of that data is added to the database.

This is because it’s not safe to directly modify existing data, as other transactions could be reading it.

This is called Multiversion Concurrency Control (MVCC).

At a later point, you can run the VACUUM process to delete the old, outdated data and reclaim disk space.

If you don’t regularly vacuum your database (or have Postgres run autovacuum, where it does this for you), you’ll eventually reach a transaction ID wraparound failure.

So, you must vacuum your database or it will eventually fail.

Having the VACUUM process consistently stall is not an issue that can be ignored.

Application-Level vs. Managed

Sharding can be divided into two approaches

Application-Level Sharding - You implement the data partitioning scheme in your application code. You might direct all American users to one database and all Asian users to another database.
Third-Party Sharding - You rely on a third party to handle the sharding for you. An example is Citus, an open source extension for Postgres.

Notion decided to go with Application-Level sharding.

They didn’t want to go with a third party solution because they felt it’s sharding logic would be opaque and hard to debug.

Shard Key

In order to shard a database, you have to pick a shard key. This determines how your data will be split up amongst the shards.

You want to pick a shard key that will equally distribute loads amongst all the shards.

If one shard is getting a lot more reads/writes than the others, that can make scaling very difficult.

Notion decided to partition their database by workspace. Workspaces are the folders that contain all the pages, tasks, notes, etc.

So, if you’re a student using Notion, you might have separate Workspaces for all your classes.

Each workspace is assigned a UUID upon creation, so that UUID space is partitioned into uniform buckets.

Each bucket goes to a different shard.

How many Shards?

Notion ended up going with 460 logical shards distributed across 32 physical databases (with 15 logical shards per database).

This allows them to handle their existing data and scale for the next two years (based off their projected growth).

Database Migration

After establishing how the sharded database works, you still have to migrate from the old database to the new distributed database.

Double-write: Incoming writes are applied to both the old and new databases.
Backfill: Migrate the old data to the new database.
Verification: Ensure the integrity of data in the new database.
Switch-over: Actually switch to the new database. This can be done incrementally, e.g. double-reads, then migrate all reads.

Read the full details in the blog post!

Tech Snippets

How Bash works under the hood - This is a great post on the architecture of Bash. It goes through the major components: input processing, parsing, word expansions / other command processing, and command execution.
Hands-on WebAssembly - This is an awesome tutorial that gets you started with WebAssembly. The only prerequisite is some general knowledge about web development. Crack out your favorite code editor and learn some WASM!
WebRTC explained - This is a free book that goes through WebRTC. It’s written by the maintainers of WebRTC and it talks about the protocol and APIs.

Interview Question

You are given the head of a linked list.

You are also given an integer n.

Remove the nth node from the end of the linked list and return it’s head.

Example

Input: head = [1, 2, 3, 4, 5], n = 2

Output: [1, 2, 3, 5]

We’ll send the solution in our next email, so make sure you move our emails to primary, so you don’t miss them!

Gmail users—move us to your primary inbox

On your phone? Hit the 3 dots at the top right corner, click "Move to" then "Primary"
On desktop? Back out of this email then drag and drop this email into the "Primary" tab near the top left of your screen
A pop-up will ask you “Do you want to do this for future messages from [email protected]” - please select yes

Apple mail users—tap on our email address at the top of this email (next to "From:" on mobile) and click “Add to VIPs”

Previous Solution

As a reminder, here’s our last question

You are given an integer array called nums. You are also given an integer called target.

Find 3 integers in nums such that the sum is closest to target.

Return the sum of the 3 integers.

Each input will have exactly one solution

Example

Input: nums = [-1, 2, 1, -4], target = 1

Output: 2

The sum that is closest to the target is (-1, 2, 1)

Here’s the question in LeetCode

Solution

We can solve this question by first sorting nums from smallest to greatest.

After sorting, we’ll iterate through all the integers in nums.

Our loop counter will be the variable i.

nums[i] will be one of the numbers in our sum.

Now, we have to find the other 2 numbers greater than nums[i] that minimize the distance between sum and target.

We do that with two pointers, L and R.

L points to nums[i + 1] and R points to the last number in nums.

Now, we take the sum of the integers at i, L and R.

If the sum is closer to the target than any of the previous sums, then we’ll store this sum as our closest.

If the sum is greater than our target, then we’ll decrement R. This will make the sum smaller.

If the sum is less than our target, then we’ll increment L. This will make the sum larger.

We’ll end the current iteration when L is greater than or equal to R.

Then, we’ll increment i.

After the for loop ends, we can return the closest sum.

Here’s the Python 3 code.