How LinkedIn Serves 5 Million User Profiles per Second
Using Couchbase as a caching layer to scale. Plus, lessons learned building an open source project to 250 million downloads, deep dive on caching in system design and more.
Hey Everyone!
Today we’ll be talking about
How LinkedIn Serves 5 Million User Profiles per Second
Why LinkedIn switched to Espresso, a document-oriented database
Scalability issues with Espresso
Introduction to Couchbase
How LinkedIn incorporated Couchbase as a Caching Layer
Caching Layer Design Principles
Tech Snippets
How to Find Great Senior Engineers
Lessons Learned Building an Open Source Project with 250 Million Downloads
Using the Pareto Principle for Managing Codebase Complexity
Deep Dive on Caching in System Design
How LinkedIn Serves 5 Million User Profiles per Second
LinkedIn has over 930 million users in 200 countries. At peak load, the site is serving nearly 5 million user profile pages a second (a user’s profile page lists things like their job history, skills, recommendations, cringy influencer posts, etc.)
The workload is extremely read-heavy: over 99% of requests are reads and fewer than 1% are writes (you probably spend a lot more time stalking on LinkedIn versus updating your profile).
To manage the increase in traffic, LinkedIn incorporated Couchbase (a distributed NoSQL database) into their stack as a caching layer. They’ve been able to serve 99% of requests with this caching system, which drastically reduced latency and cost.
Estella Pham and Guanlin Lu are Staff Software Engineers at LinkedIn and they wrote a great blog post delving into why the team decided to use Couchbase, challenges they faced, and how they solved them.
Here’s a summary with some additional context
LinkedIn stores the data for user profiles (and also a bunch of other stuff like InMail messages) in a distributed document database called Espresso, which was built in-house at LinkedIn. Prior to that, they used a relational database (Oracle), but they switched over to a document-oriented database for several reasons.
1. Horizontal Scalability
Document databases are generally much easier to scale than relational databases as sharding is designed into the architecture. The relational paradigm encourages normalization, where data is split into tables with relations between them. For example, a user’s profile information (job history, location, skills, etc.) could be stored in a different table compared to their post history (stored in a user_posts table). Rendering a user’s profile page would mean doing a join between the profile information table and the user posts table.
If you sharded your relational database vertically (placed different tables on different database machines) then getting all of a user’s information to render their profile page would require a cross-shard join (you need to check the user profile info table and also check the user posts table). This can add a ton of latency.
For sharding the relational database horizontally (spreading the rows from a table across different database machines), you would place all of a user’s posts and profile information on a certain shard based on a chosen sharding key (the user’s location, unique ID, etc.). However, there’s a significant amount of maintenance and infrastructure complexity that you’ll have to manage. Document databases are built to take care of this.
Document-oriented databases encourage denormalization by design, where related data is stored together in a single document. All the data related to a single document is stored on the same shard, so you don’t have to do cross-shard joins. You pick the sharding key and the document database will handle splitting up the data, rebalancing hot/cold shards, shard replication, etc.
With horizontal sharding (either for a relational database or a document database), picking the key that you use to shard is a crucial decision. You could shard by user geography (put the Canada-based users on one shard, India-based users on another, etc.), use a unique user ID (and hash it to determine the shard), etc.
For Espresso, LinkedIn picked the latter with hash-based partitioning. Each user profile has a unique identifier, and this is hashed using a consistent hashing function to determine which database shard it should be stored on.
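To make hash-based partitioning a bit more concrete, here's a minimal sketch of a consistent-hash ring in Python. This is not Espresso's implementation; the shard names, the member ID format, the use of MD5 and the number of virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash (Python's built-in hash() is randomized per process).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring mapping keys to (hypothetical) shard names."""

    def __init__(self, shards, vnodes=100):
        # Each shard gets several virtual points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, member_id: str) -> str:
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._points, _hash(member_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
print(ring.shard_for("member:12345"))
```

The appeal of a ring like this is that adding a shard only moves the keys that now land on the new shard; everything else stays where it was.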
2. Schema Flexibility
LinkedIn wanted to be able to iterate on the product quickly and easily add new features to a user’s profile. However, schema migrations in large relational databases can be quite painful, especially if the database is horizontally sharded.
On the other hand, document databases are typically schemaless or schema-flexible, so they don’t force every record into one rigid structure. You can have documents with very different types of data in them, and you can easily add new fields, change the types of existing fields or store new structures of data.
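As a small illustration of what that flexibility looks like in practice, here's a sketch with two made-up profile documents, where the second one carries a field the first one lacks. The field names and the render_headline helper are hypothetical and not taken from LinkedIn's data model.

```python
import json

# Two hypothetical profile documents; the second adds a field the first lacks.
old_profile = {"member_id": "m1", "name": "Ada", "skills": ["python"]}
new_profile = {
    "member_id": "m2",
    "name": "Grace",
    "skills": ["cobol"],
    "pronouns": "she/her",  # newly introduced field; m1 is left untouched
}


def render_headline(doc: dict) -> str:
    # Readers must tolerate missing fields, since nothing enforces their presence.
    pronouns = doc.get("pronouns")
    suffix = f" ({pronouns})" if pronouns else ""
    return f"{doc['name']}{suffix}"


for doc in (old_profile, new_profile):
    # Round-trip through JSON to mimic storing and loading a document.
    stored = json.dumps(doc)
    print(render_headline(json.loads(stored)))
```

The trade-off is that the burden of handling old and new shapes moves from the database into the application code that reads the documents.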
In addition to schema flexibility and sharding, you can view the full breakdown for why LinkedIn switched away from Oracle here (as you might’ve guessed, $$$ was another big factor).
Scalability Issues with Espresso
LinkedIn migrated off Oracle to Espresso in the mid-2010s and this worked extremely well. They were able to scale to 1.4 million queries per second by adding additional nodes to the cluster.
Espresso Architecture
However, they eventually reached the scaling limits of Espresso where they couldn’t add additional machines to the cluster. In any distributed system, there are shared components that are used by all the nodes in the cluster. Eventually, you will reach a point where one of these shared components becomes a bottleneck and you can’t resolve the issue by just throwing more servers at the system.
In Espresso’s case, the shared components are
Routing Layer - responsible for directing requests to the appropriate storage node
Metadata Store - manages metadata on node failures, replication, backups, etc.
Coordination Service - manages the distribution of data and work amongst nodes and node replicas.
and more.
LinkedIn reached the upper limits of these shared components so they couldn’t add additional storage nodes. Resolving this scaling issue would require a major re-engineering effort.
Instead, the engineers decided to take a simpler approach and add Couchbase as a caching layer to reduce pressure on Espresso. Profile requests are dominated by reads (over 99% reads), so the caching layer could significantly ease the QPS (queries per second) load on Espresso.
Brief Intro to Couchbase
Couchbase is the combination of two open source projects: Membase and CouchDB.
Membase - In the early 2000s, Memcached was developed as an in-memory key-value store. It quickly became very popular for use as a caching layer, and some developers created a new project called Membase that leveraged the caching abilities of Memcached and added persistence (writing to disk), cluster management features and more.
CouchDB - a document-oriented database that was created in 2005, amid the explosion in web applications. CouchDB stores data as JSON documents and it lets you read/write with HTTP. It’s written in Erlang and is designed to be highly scalable with a distributed architecture.
Couchbase is a combination of ideas from Membase and CouchDB, where you have the highly scalable caching layer of Membase and the flexible data model of CouchDB.
It’s both a key/value store and a document store, so you can perform Create/Read/Update/Delete (CRUD) operations using the simple API of a key/value store (add, set, get, etc.) but the value can be represented as a JSON document.
With this, you can access your data with the primary key (like you would with a key/value store), or you can use N1QL (pronounced nickel). This is an SQL-like query language for Couchbase that allows you to retrieve your data arbitrarily and also do joins and aggregation.
It also has full-text search capabilities to search for text in your JSON documents and also lets you set up secondary indexes.
LinkedIn’s Cache Design
So, LinkedIn was facing scaling issues with Espresso, where they could no longer horizontally scale without making architectural changes to the distributed database.
Instead, they decided they would incorporate Couchbase as a caching layer to reduce the read load placed on Espresso (remember that 99%+ of the requests are reads).
When the Profile backend service sends a read request to Espresso, it goes to an Espresso Router node. The Router nodes maintain their own internal cache (check the article for more details on this) and first try to serve the read from there.
If the user profile isn’t cached locally in the Router node, the Router sends the request to Couchbase, which results in either a cache hit or a cache miss (the profile wasn’t in the Couchbase cache).
Couchbase is able to achieve a cache hit rate of over 99%, but in the rare event that the profile isn’t cached the request will go to the Espresso storage node.
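The lookup order described above can be sketched roughly as follows. Plain dictionaries stand in for the Router cache, Couchbase and Espresso, and the ReadPath class is a hypothetical name. Copying a profile into the faster tiers after a slower lookup is an assumption of this sketch, not something the summary spells out.

```python
class ReadPath:
    """Toy model of the lookup order: Router cache, then Couchbase, then Espresso."""

    def __init__(self, router_cache, couchbase, espresso):
        # Plain dicts stand in for all three tiers in this sketch.
        self.router_cache = router_cache
        self.couchbase = couchbase
        self.espresso = espresso

    def get_profile(self, member_id):
        # 1. Router-local cache.
        if member_id in self.router_cache:
            return self.router_cache[member_id], "router"
        # 2. Couchbase caching layer.
        if member_id in self.couchbase:
            profile = self.couchbase[member_id]
            self.router_cache[member_id] = profile  # assumed: warm the local cache
            return profile, "couchbase"
        # 3. Cache miss: fall back to the source of truth.
        profile = self.espresso[member_id]
        self.couchbase[member_id] = profile  # assumed: fill the cache on a miss
        self.router_cache[member_id] = profile
        return profile, "espresso"


path = ReadPath({}, {"m1": {"name": "Ada"}}, {"m1": {"name": "Ada"}, "m2": {"name": "Grace"}})
print(path.get_profile("m1"))
print(path.get_profile("m2"))
```

The second element of the returned tuple is only there to show which tier served the request.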
For writes (when a user changes their job history, location, etc.), these are first done on Espresso storage nodes. The system is eventually consistent, so the writes are copied over to the Couchbase cache asynchronously.
In order to minimize the amount of traffic that gets directed to Espresso, LinkedIn used three core design principles
Guaranteed Resilience against Couchbase Failures
All-time Cached Data Availability
Strictly defined SLO on data divergence
We’ll break down each principle.
Guaranteed Resilience against Couchbase Failures
LinkedIn planned to use Couchbase to upscale (serve a significantly greater number of requests than what Espresso alone can handle), so it’s important that the Couchbase caching layer be as independent as possible from Espresso (the source of truth or SOT).
If Couchbase goes down for some reason, then sending all the request traffic directly to Espresso will certainly bring Espresso down as well.
To make the Couchbase caching layer resilient, LinkedIn implemented quite a few things
Couchbase Health Monitors to check on the health of all the different shards and immediately stop sending requests to any unhealthy shards (to prevent cascading failures).
Keep 3 replicas of each Couchbase shard, with one leader and two followers. Every request is fulfilled by the leader replica, and a follower immediately steps in if the leader fails or is too busy.
Implement retries where certain failed requests are retried in case the failure was temporary
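Here's a rough sketch of how retries and leader/follower failover might fit together. The Replica class, the healthy flag (standing in for a health monitor's verdict) and the retry count are all hypothetical; LinkedIn's actual client logic isn't described at this level of detail.

```python
class ReplicaUnavailable(Exception):
    """Raised by a replica that cannot serve the request right now."""


class Replica:
    def __init__(self, name, data, healthy=True):
        self.name = name
        self.data = data
        self.healthy = healthy  # stand-in for what a health monitor would report

    def get(self, key):
        if not self.healthy:
            raise ReplicaUnavailable(self.name)
        return self.data[key]


def read_with_failover(replicas, key, retries_per_replica=2):
    """Try the leader first (replicas[0]), then followers, retrying each a few times."""
    for replica in replicas:
        for _ in range(retries_per_replica):
            try:
                return replica.get(key), replica.name
            except ReplicaUnavailable:
                continue  # possibly transient: retry, then move to the next replica
    raise ReplicaUnavailable(f"no healthy replica for {key!r}")


data = {"m1": "Ada"}
shard = [
    Replica("leader", data, healthy=False),
    Replica("follower-1", data),
    Replica("follower-2", data),
]
print(read_with_failover(shard, "m1"))
```

A production version would likely add backoff between retries and a cap on total latency, which this sketch leaves out.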
All-time Cached Data Availability
Because LinkedIn needs to ensure that Couchbase is extremely resilient, they cache the entire Profile dataset in every data center. The size of the dataset and payload is small (and writes are infrequent), so this is feasible for them to do.
Updates that happen to Espresso are pushed to the Couchbase cache with Kafka. They didn’t mention how they calculate TTL (time to live - how long until cached data expires and needs to be re-fetched from Espresso) but they have a finite TTL so that data is re-fetched periodically in case any updates were missed.
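To show what a finite TTL means mechanically, here's a minimal in-memory sketch. The TTLCache class and the 60-second value are made up for illustration; as noted above, the actual TTL LinkedIn uses wasn't disclosed, and Couchbase handles expiry itself rather than through application code like this.

```python
import time


class TTLCache:
    """Toy cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the example can be tested without sleeping
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: caller should re-fetch from the source
            return None
        return value


cache = TTLCache(ttl_seconds=60)
cache.put("m1", {"name": "Ada"})
print(cache.get("m1"))
```

Once an entry expires, the next read misses and goes back to the source of truth, which is how a missed update eventually gets corrected.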
In order to minimize the possibility of cached data diverging from the source of truth (Espresso), LinkedIn will periodically bootstrap Couchbase (copy all the data over from Espresso to a new Couchbase caching layer and have that replace the old cache).
Strictly Defined SLO on Data Divergence
Different components have write access to the Couchbase cache. There’s the cache bootstrapper (for bootstrapping Couchbase), the updater and Espresso routers.
Race conditions among these components can lead to data divergence, so LinkedIn uses System Change Number (SCN) values where SCN is a logical timestamp.
For every database write in Espresso, the storage engine will produce an SCN value and persist it in the binlog (the log that records changes to the database). The order of the SCN values reflects the commit order, and they’re used to make sure that stale data doesn’t accidentally overwrite newer data due to a race condition.
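The idea of using SCNs to reject stale writes can be sketched like this. The VersionedCache class and put_if_newer method are hypothetical names, and the sketch is single-threaded; a real cache with several concurrent writers would need the compare-and-update step to be atomic.

```python
class VersionedCache:
    """Toy cache that keeps, per key, the value with the highest SCN seen so far."""

    def __init__(self):
        self._entries = {}  # key -> (scn, value)

    def put_if_newer(self, key, scn, value):
        current = self._entries.get(key)
        if current is not None and current[0] >= scn:
            return False  # stale or duplicate write: keep what we have
        self._entries[key] = (scn, value)
        return True

    def get(self, key):
        entry = self._entries.get(key)
        return None if entry is None else entry[1]


cache = VersionedCache()
cache.put_if_newer("m1", 10, "Staff Engineer")  # e.g. from the updater
cache.put_if_newer("m1", 7, "Senior Engineer")  # e.g. a late bootstrapper write
print(cache.get("m1"))
```

Here the late write with the lower SCN is dropped, so the cache keeps the value from the more recent commit regardless of which writer arrives last.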
For more details on LinkedIn’s caching layer with Couchbase, you can read the full article here.