How PayPal Built a Database that serves 350 Billion Requests Per Day
PayPal open-sourced JunoDB, a distributed key-value store. Also, how Dropbox reduced their search latency and why Val Town migrated away from Supabase
Hey Everyone!
Today we’ll be talking about
How PayPal Built a Distributed Database that handles 350 Billion Requests Per Day
JunoDB is a distributed key-value database that is used by almost every core backend service at PayPal
It uses RocksDB as the storage engine and provides strong consistency guarantees with low latency
We delve into the architecture and design choices the PayPal team made
Tech Snippets
Migrating Away From Supabase
Computer Graphics from Scratch (free textbook)
Improving Search Latency at Dropbox
Consensus is Harder than it Looks
How PayPal Built a Distributed Database to serve 350 billion requests per day
This week, PayPal open-sourced JunoDB, a highly scalable NoSQL database that they built internally. It uses RocksDB as the underlying storage engine and serves 350 billion requests daily while maintaining 6 nines of availability (less than 3 seconds of downtime per month).
Yaping Shi is a principal software architect at PayPal and she wrote a fantastic article delving into the architecture of JunoDB and the choices PayPal made. We’ll summarize the post and delve into some of the details.
Overview of JunoDB
JunoDB is a distributed key-value store that is used by nearly every core backend service at PayPal.
Some common use cases are
Caching - it’s often used as a temporary cache to store things like user preferences, account details, API responses, access tokens and more.
Idempotency - A popular pattern is to use Juno to make an operation idempotent and avoid duplicate processing. For example, PayPal uses Juno to ensure that a payment isn’t reprocessed during a retry and to avoid resending the same notification message (there’s a sketch of this pattern after this list).
Latency Bridging - PayPal runs other databases that are distributed geographically and can have significant replication lag (the delay in copying data between nodes in different regions). JunoDB’s reads and writes are fast enough to bridge those delays, enabling near-instant, consistent reads everywhere.
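To make the idempotency pattern concrete, here’s a minimal sketch using a toy in-memory store. The `memKV` type, the `setIfNotExists` operation, and the key naming are illustrative assumptions, not JunoDB’s actual client API:

```go
package main

import (
	"fmt"
	"sync"
)

// memKV is a stand-in in-memory key-value store; JunoDB's real client API differs.
type memKV struct {
	mu   sync.Mutex
	data map[string][]byte
}

// setIfNotExists stores the value only if the key is absent and reports
// whether this call claimed the key.
func (m *memKV) setIfNotExists(key string, value []byte) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, exists := m.data[key]; exists {
		return false
	}
	m.data[key] = value
	return true
}

// processPayment is idempotent: the first request for a payment ID claims the
// key, and any retry sees the existing key and skips the duplicate work.
func processPayment(kv *memKV, paymentID string) {
	if !kv.setIfNotExists("payment:"+paymentID, []byte("in-progress")) {
		fmt.Println("duplicate request, skipping:", paymentID)
		return
	}
	fmt.Println("processing payment:", paymentID)
	// ... perform the actual payment work here ...
}

func main() {
	kv := &memKV{data: map[string][]byte{}}
	processPayment(kv, "pay-123") // processed
	processPayment(kv, "pay-123") // retry detected as a duplicate
}
```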
Benefits of Key Value Databases
Key-value databases have a very simple data model compared to the relational paradigm. This results in several benefits
Easy Horizontal Scaling - Keys can be distributed across different shards in the database using consistent hashing so you can easily add additional machines.
Low Latency - Data is stored as key/value pairs so the database can focus on optimizing key-based access. Additionally, there is no specific schema being enforced, so the database doesn’t have to validate writes against any predefined structure.
Flexibility - As mentioned above, PayPal uses JunoDB for a variety of different use cases. Since the data model is just key/value pairs, they don’t have to worry about coming up with a schema and modifying it later as requirements change.
Design of JunoDB
JunoDB uses a proxy-based design. The database consists of 3 main components.
JunoDB Client Library - runs on the application backend service that is using JunoDB. The library provides an API for storage, retrieval and updating of application data. It’s implemented in Java, Go, C++, Node and Python so it’s easy for PayPal developers to use JunoDB regardless of which language they’re using.
JunoDB Proxy - Requests from the client library go to JunoDB Proxy servers, which distribute read/write requests to the various Juno storage servers using consistent hashing. This ensures that storage servers can be added or removed while minimizing the amount of data that has to be moved. The proxy layer stores and manages the shard mapping in etcd, a distributed key-value store designed for configuration data in distributed systems that uses the Raft consensus algorithm.
JunoDB Storage Server - These are the servers that store the actual data in the system. They accept requests from the proxy and store the key/value pairs in memory or persist them on disk. The storage servers run RocksDB, a popular embedded key-value store, as the storage engine, so it manages how each machine actually stores and retrieves data from disk. RocksDB uses Log Structured Merge (LSM) Trees as the underlying data structure, which generally enables much faster writes than the B+ Trees used in most relational databases. However, the tradeoff depends on a variety of factors. You can read a detailed breakdown here.
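To illustrate why LSM-based engines favor writes, here’s a toy sketch of the write path. This is a simplified model of the idea, not RocksDB’s actual implementation: writes land in an in-memory memtable that is periodically flushed to disk as sorted, immutable runs.

```go
package main

import (
	"fmt"
	"sort"
)

type kv struct{ key, value string }

// lsm is a toy LSM tree: writes go to the memtable, and full memtables are
// flushed as sorted runs. Sequential flushes (instead of in-place B+ tree
// updates) are what make LSM writes fast.
type lsm struct {
	memtable    map[string]string
	memtableMax int
	sstables    [][]kv // newest last; real engines store these on disk
}

func (db *lsm) put(key, value string) {
	db.memtable[key] = value
	if len(db.memtable) >= db.memtableMax {
		db.flush()
	}
}

// flush writes the memtable out as one sorted run and clears it.
func (db *lsm) flush() {
	run := make([]kv, 0, len(db.memtable))
	for k, v := range db.memtable {
		run = append(run, kv{k, v})
	}
	sort.Slice(run, func(i, j int) bool { return run[i].key < run[j].key })
	db.sstables = append(db.sstables, run)
	db.memtable = map[string]string{}
}

// get checks the memtable first, then the sorted runs from newest to oldest.
func (db *lsm) get(key string) (string, bool) {
	if v, ok := db.memtable[key]; ok {
		return v, true
	}
	for i := len(db.sstables) - 1; i >= 0; i-- {
		run := db.sstables[i]
		j := sort.Search(len(run), func(n int) bool { return run[n].key >= key })
		if j < len(run) && run[j].key == key {
			return run[j].value, true
		}
	}
	return "", false
}

func main() {
	db := &lsm{memtable: map[string]string{}, memtableMax: 2}
	db.put("user:1", "alice")
	db.put("user:2", "bob") // triggers a flush
	db.put("user:1", "alice-updated")
	v, _ := db.get("user:1")
	fmt.Println(v) // alice-updated (memtable shadows older runs)
}
```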
Guarantees
PayPal has very high requirements for scalability, consistency, availability and performance.
We’ll go through each of these and talk about how JunoDB achieves them.
Scalability
Both the Proxy layer and the Storage layer in JunoDB are horizontally scalable. If there are too many incoming client connections, additional machines can be spun up and added to the proxy layer to keep latency low.
Storage nodes can also be spun up as the load grows. JunoDB uses consistent hashing to map the keys to different shards. This means that when the number of storage servers changes, you minimize the amount of keys that need to be remapped to a new shard. This video provides a good explanation of consistent hashing.
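Here’s a bare-bones sketch of the consistent hashing idea: shards sit at positions on a hash ring, and a key belongs to the first shard clockwise from its hash, so adding or removing a shard only remaps the keys between it and its neighbor. The `ring` type and shard names are hypothetical, and real implementations like JunoDB’s add virtual nodes for a more even key distribution.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a bare-bones consistent-hash ring: each shard gets a position on
// the ring, and a key is owned by the first shard at or after its hash.
type ring struct {
	points []uint32          // sorted shard positions on the ring
	owner  map[uint32]string // position -> shard name
}

func hashOf(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(shards []string) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, s := range shards {
		p := hashOf(s)
		r.points = append(r.points, p)
		r.owner[p] = s
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// shardFor finds the first shard position clockwise from the key's hash,
// wrapping around to the start of the ring if necessary.
func (r *ring) shardFor(key string) string {
	h := hashOf(key)
	i := sort.Search(len(r.points), func(n int) bool { return r.points[n] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing([]string{"storage-1", "storage-2", "storage-3"})
	for _, key := range []string{"user:42", "token:abc", "pref:xyz"} {
		fmt.Println(key, "->", r.shardFor(key))
	}
}
```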
A large JunoDB cluster could comprise over 200 storage nodes and easily process over 100 billion requests daily.
Data Replication and Consistency
To ensure fault tolerance and low latency reads, data needs to be replicated across multiple storage nodes. These storage nodes need to be distributed across different availability zones (distinct, isolated data centers) and geographic regions.
However, replicating across zones and regions introduces replication lag between the storage nodes, which can lead to inconsistent reads.
JunoDB handles this by using a quorum-based protocol for reads and writes: a read or write is only considered successful once a majority of the nodes in that shard’s read/write group confirm it. PayPal typically uses a configuration with 5 zones where at least 3 must confirm each read or write. Because the read and write quorums overlap (3 + 3 > 5), a successful read is guaranteed to see the latest successful write.
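Here’s a rough sketch of the quorum check on the write path, using the 5-zone, 3-confirmation configuration described above. The replica functions and error handling are illustrative assumptions, not JunoDB’s actual protocol code:

```go
package main

import "fmt"

// quorumWrite sends a write to every zone's replica and succeeds only if at
// least `quorum` zones acknowledge it. With 5 zones and a quorum of 3 for
// both reads and writes, any read quorum overlaps any write quorum
// (3 + 3 > 5), so a successful read always sees the latest successful write.
func quorumWrite(replicas []func(key, value string) error, quorum int, key, value string) error {
	acks := 0
	for _, write := range replicas {
		if err := write(key, value); err == nil {
			acks++
		}
	}
	if acks < quorum {
		return fmt.Errorf("write failed: only %d of %d zones acknowledged (need %d)", acks, len(replicas), quorum)
	}
	return nil
}

func main() {
	// Simulate 5 zones where one replica is down.
	stores := make([]map[string]string, 5)
	replicas := make([]func(key, value string) error, 5)
	for i := range stores {
		i := i
		stores[i] = map[string]string{}
		replicas[i] = func(key, value string) error {
			if i == 4 {
				return fmt.Errorf("zone %d unavailable", i)
			}
			stores[i][key] = value
			return nil
		}
	}
	err := quorumWrite(replicas, 3, "user:42", "prefs-v2")
	fmt.Println("write error:", err) // nil: 4 of 5 zones acknowledged
}
```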
Using this setup of multiple availability zones and replicas ensures very high availability and allows JunoDB to meet the guarantee of 6 nines.
Performance
JunoDB is able to maintain single-digit millisecond response times. PayPal shared benchmark results for a 4-node cluster running a ⅔ read, ⅓ write workload.
For more details, you can read the full article here.