How PayPal Built a Database That Serves 350 Billion Requests Per Day

PayPal open sourced JunoDB, a distributed key value store. Also, how Dropbox reduced their search latency and why Val Town migrated away from Supabase

Hey Everyone!

Today we’ll be talking about

  • How PayPal Built a Distributed Database That Handles 350 Billion Requests Per Day

    • JunoDB is a distributed key value database that is used by almost every core backend service at PayPal

    • It uses RocksDB as the storage engine and provides strong consistency guarantees with low latency

    • We delve into the architecture and design choices the PayPal team made

  • Tech Snippets

    • Migrating Away From Supabase

    • Computer Graphics from Scratch (free textbook)

    • Improving Search Latency at Dropbox

    • Consensus is Harder than it Looks

How PayPal Built a Distributed Database to Serve 350 Billion Requests Per Day

This week, PayPal open-sourced JunoDB, a highly scalable NoSQL database that they built internally. It uses RocksDB as the underlying storage engine and serves 350 billion requests daily while maintaining 6 nines of availability (less than 3 seconds of downtime per month).

Yaping Shi, a principal software architect at PayPal, wrote a fantastic article delving into the architecture of JunoDB and the choices the team made. We’ll summarize the post and dig into some of the details.

Overview of JunoDB

JunoDB is a distributed key-value store that is used by nearly every core backend service at PayPal.

Some common use cases are

  • Caching - it’s often used as a temporary cache to store things like user preferences, account details, API responses, access tokens and more.

  • Idempotency - A popular pattern is to use Juno to make an operation idempotent and avoid duplicate processing. PayPal uses Juno to ensure that payments aren’t reprocessed during a retry and that notification messages aren’t sent twice (there’s a minimal sketch of this pattern after this list).

  • Latency Bridging - PayPal has other geographically distributed databases with high replication lag (the time it takes for a write to propagate to every node). JunoDB replicates data quickly, so it can step in to bridge those delays and enable near-instant, consistent reads everywhere.
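Here’s a minimal sketch of the idempotency pattern in Go. The client type and its CreateIfNotExists method are hypothetical stand-ins (backed here by an in-memory map so the example runs standalone); the core idea is a conditional create with a TTL, keyed on a unique payment ID.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrKeyExists signals the key was already written, i.e. the
// operation was already performed once.
var ErrKeyExists = errors.New("key already exists")

// junoClient is a hypothetical stand-in for a JunoDB client,
// backed by an in-memory map so this sketch runs standalone.
type junoClient struct {
	mu   sync.Mutex
	data map[string]time.Time // key -> expiry
}

// CreateIfNotExists writes the key only if it is absent or expired.
// A real key-value store performs this check atomically server-side.
func (c *junoClient) CreateIfNotExists(key string, ttl time.Duration) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if exp, ok := c.data[key]; ok && time.Now().Before(exp) {
		return ErrKeyExists
	}
	c.data[key] = time.Now().Add(ttl)
	return nil
}

func processPayment(c *junoClient, paymentID string) {
	// The unique payment ID acts as the idempotency key.
	if err := c.CreateIfNotExists("payment:"+paymentID, 24*time.Hour); err != nil {
		fmt.Println("duplicate, skipping:", paymentID)
		return
	}
	fmt.Println("processing payment:", paymentID)
}

func main() {
	c := &junoClient{data: map[string]time.Time{}}
	processPayment(c, "abc-123") // processed
	processPayment(c, "abc-123") // retry is detected and skipped
}
```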

Benefits of Key Value Databases

Key-value databases have a very simple data model compared to the relational paradigm. This results in several benefits

  • Easy Horizontal Scaling - Keys can be distributed across different shards in the database using consistent hashing so you can easily add additional machines.

  • Low Latency - Data is stored as key/value pairs so the database can focus on optimizing key-based access. Additionally, there is no specific schema being enforced, so the database doesn’t have to validate writes against any predefined structure.

  • Flexibility - As mentioned above, PayPal uses JunoDB for a variety of different use cases. Since the data model is just key/value pairs, they don’t have to worry about coming up with a schema and modifying it later as requirements change.

Design of JunoDB

JunoDB uses a proxy-based design. The database consists of 3 main components.

JunoDB Client Library - runs on the application backend service that is using JunoDB. The library provides an API for storage, retrieval and updating of application data. It’s implemented in Java, Go, C++, Node and Python so it’s easy for PayPal developers to use JunoDB regardless of which language they’re using.
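The post doesn’t show the client API itself, but as a rough sketch of the surface area such a library exposes, a Go client might look something like the interface below. Method names here are illustrative, not JunoDB’s actual API.

```go
package juno

import "time"

// Client sketches the kind of operations a JunoDB client library
// exposes to an application backend. Names are illustrative.
type Client interface {
	// Create stores a value under key, failing if the key already exists.
	Create(key, value []byte, ttl time.Duration) error
	// Get retrieves the value stored under key.
	Get(key []byte) ([]byte, error)
	// Update overwrites the value under an existing key.
	Update(key, value []byte, ttl time.Duration) error
	// Destroy removes the key.
	Destroy(key []byte) error
}
```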

JunoDB Proxy - Requests from the client library go to JunoDB Proxy servers, which distribute read/write requests across the Juno storage servers using consistent hashing. This ensures that storage servers can be added or removed while minimizing the amount of data that has to be moved. The Proxy layer stores and manages the shard mapping data in etcd, which is itself a distributed key-value store designed for configuration data; it keeps that data consistent across nodes using the Raft consensus algorithm.
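The shard map is just small configuration data, so reading it from etcd is straightforward. Here’s a minimal sketch using the official etcd Go client; the key name junodb/shardmap is made up for illustration and isn’t JunoDB’s real layout.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to the etcd cluster that holds the shard mapping.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Fetch the current shard map (illustrative key name).
	resp, err := cli.Get(ctx, "junodb/shardmap")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s -> %s\n", kv.Key, kv.Value)
	}
	// A proxy would also cli.Watch this key in a background
	// goroutine to pick up shard map changes live.
}
```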

JunoDB Storage Server - These are the servers that store the actual data. They accept requests from the proxy and keep key/value pairs in memory or persist them to disk. Each storage server runs RocksDB, a popular key/value storage engine that manages how data is actually written to and read from disk. RocksDB uses Log-Structured Merge (LSM) Trees as its underlying data structure, which generally enables much faster writes than the B+ Trees used in most relational databases (though the exact trade-offs depend on the workload). You can read a detailed breakdown here.
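To build intuition for why LSM-based engines favor writes, here’s a heavily simplified toy LSM tree in Go: writes land in an in-memory table (plus, in a real engine, a sequential write-ahead log), which is flushed as an immutable sorted run once it fills up. Real RocksDB adds compaction, bloom filters, and far more.

```go
package main

import (
	"fmt"
	"sort"
)

// memtable buffers recent writes in memory.
type memtable map[string]string

// sstable is an immutable, sorted run of key/value pairs on "disk".
type sstable []struct{ key, value string }

type lsmTree struct {
	mem       memtable
	sstables  []sstable // newest first
	threshold int
}

// Put is cheap: an in-memory insert (plus, in a real engine,
// a sequential append to a write-ahead log).
func (t *lsmTree) Put(key, value string) {
	t.mem[key] = value
	if len(t.mem) >= t.threshold {
		t.flush()
	}
}

// flush sorts the memtable and writes it out as a new sstable.
func (t *lsmTree) flush() {
	keys := make([]string, 0, len(t.mem))
	for k := range t.mem {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	run := make(sstable, 0, len(keys))
	for _, k := range keys {
		run = append(run, struct{ key, value string }{k, t.mem[k]})
	}
	t.sstables = append([]sstable{run}, t.sstables...)
	t.mem = memtable{}
}

// Get checks the memtable first, then sstables newest to oldest,
// so newer values shadow older flushed ones.
func (t *lsmTree) Get(key string) (string, bool) {
	if v, ok := t.mem[key]; ok {
		return v, true
	}
	for _, run := range t.sstables {
		i := sort.Search(len(run), func(i int) bool { return run[i].key >= key })
		if i < len(run) && run[i].key == key {
			return run[i].value, true
		}
	}
	return "", false
}

func main() {
	t := &lsmTree{mem: memtable{}, threshold: 2}
	t.Put("a", "1")
	t.Put("b", "2") // triggers a flush
	t.Put("a", "3") // newer value shadows the flushed one
	v, _ := t.Get("a")
	fmt.Println(v) // 3
}
```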

Guarantees

PayPal has very high requirements for scalability, consistency, availability and performance.

We’ll walk through each of these and talk about how JunoDB achieves them.

Scalability

Both the Proxy layer and the Storage layer in JunoDB are horizontally scalable. If there are too many incoming client connections, additional machines can be spun up and added to the proxy layer to keep latency low.

Storage nodes can also be spun up as the load grows. JunoDB uses consistent hashing to map keys to shards, so when the number of storage servers changes, you minimize the number of keys that need to be remapped to a new shard. This video provides a good explanation of consistent hashing, and there’s a minimal sketch below.
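For intuition, here’s a minimal consistent hash ring in Go: keys and nodes hash onto the same ring, and each key is owned by the first node clockwise from it, so adding a node only remaps the keys between it and its predecessor. (Production rings also use virtual nodes for better balance; node names here are made up.)

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	points []uint32          // sorted node positions on the ring
	nodes  map[uint32]string // position -> node name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// add places a node on the ring at its hash position.
func (r *ring) add(node string) {
	p := hashOf(node)
	r.points = append(r.points, p)
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	r.nodes[p] = node
}

// owner finds the first node clockwise from the key's position.
func (r *ring) owner(key string) string {
	p := hashOf(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= p })
	if i == len(r.points) { // wrap around the ring
		i = 0
	}
	return r.nodes[r.points[i]]
}

func main() {
	r := &ring{nodes: map[uint32]string{}}
	for _, n := range []string{"storage-1", "storage-2", "storage-3"} {
		r.add(n)
	}
	for _, k := range []string{"user:42", "token:9", "cart:7"} {
		fmt.Println(k, "->", r.owner(k))
	}
	// Adding a node only remaps keys that now fall in its arc.
	r.add("storage-4")
	for _, k := range []string{"user:42", "token:9", "cart:7"} {
		fmt.Println(k, "->", r.owner(k))
	}
}
```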

A large JunoDB cluster could comprise over 200 storage nodes and easily process over 100 billion requests daily.

Data Replication and Consistency

To ensure fault tolerance and low latency reads, data needs to be replicated across multiple storage nodes. These storage nodes need to be distributed across different availability zones (distinct, isolated data centers) and geographic regions.

However, replicating across zones and regions introduces replication lag between the storage nodes, which can lead to inconsistent reads.

JunoDB handles this with a quorum-based protocol: a read or write is only considered successful once a quorum (more than half) of the nodes in that shard’s read/write group confirms it. PayPal typically runs a configuration with 5 zones where at least 3 must confirm each read or write.
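The reason this yields consistent reads is the overlap guarantee: with N = 5 replicas, a write quorum W = 3 and a read quorum R = 3 satisfy W + R > N, so any read quorum shares at least one replica with the last successful write quorum. A tiny sketch of the arithmetic in Go:

```go
package main

import "fmt"

// quorumOverlaps reports whether every read quorum of size r must
// intersect every write quorum of size w among n replicas.
func quorumOverlaps(n, w, r int) bool {
	return w+r > n
}

func main() {
	n, w, r := 5, 3, 3 // PayPal's typical JunoDB zone configuration
	fmt.Println(quorumOverlaps(n, w, r))             // true: reads see the latest write
	fmt.Println("min overlap:", w+r-n, "replica(s)") // at least 1 shared replica
}
```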

Using this setup of multiple availability zones and replicas ensures very high availability and allows JunoDB to meet the guarantee of 6 nines.

Performance

JunoDB is able to maintain single-digit millisecond response times. PayPal shared benchmark results for a 4-node cluster under a ⅔ read, ⅓ write workload in the original post.

For more details, you can read the full article here.

Tech Snippets