How Zomato built a Petabyte Scale Logging System
We'll talk about building a highly scalable logging system with ClickHouse. Plus, how AWS S3 is showing its age, a guide to pair programming and more.
Hey Everyone!
Today we’ll be talking about
How Zomato built a Petabyte Scale Logging System
Zomato is the largest food-delivery app in South Asia with 80 million active users
They recently switched their logging infrastructure from Elasticsearch to ClickHouse
We’ll talk about why they did this and how they can handle 150 million events logged per minute
Tech Snippets
AWS S3 is showing its Age
How to get Buy-In with Push and Pull
Challenging Software Projects to Try
A Guide to Pair Programming
How Zomato built a Petabyte Scale Logging System
Zomato is the largest food-delivery app in South Asia, with over 80 million active users across 3,200+ cities. At peak load, the app handles thousands of food orders per minute.
With any app of this scale, observability is crucial for ensuring high availability. One core part of observability is logs.
Zomato’s backend services generate a huge amount of logs: over 50 terabytes of uncompressed log data per day, with a peak rate of 150 million events logged per minute.
Previously, Zomato used Elasticsearch (ELK Stack) to manage these logs but they started running into scaling problems.
Due to the exponential growth in log volume, managing the Elasticsearch clusters became increasingly difficult and costly. They had to over-provision servers to handle variable traffic patterns, which drove up costs while still delivering a poor experience.
To address these challenges, the Zomato team decided to switch to ClickHouse.
We’ll first give a brief introduction to ClickHouse. Then, we’ll delve into some of the strategies the Zomato team used to scale their logging infrastructure.
Introduction to ClickHouse
ClickHouse is a distributed, open-source database that was developed 12 years ago at Yandex (the Google of Russia) to power Metrica (a real-time analytics tool that’s similar to Google Analytics).
For some context, Yandex’s Metrica runs ClickHouse on a cluster of ~375 servers and they store over 20 trillion rows in the database (close to 20 petabytes of total data). So… it was designed with scale in mind.
ClickHouse is column-oriented with a significant emphasis on high write throughput and fast read operations. In order to minimize storage space (and enhance I/O efficiency), the database uses some clever compression techniques. It also supports a SQL-like query language to make it accessible to non-developers (BI analysts, etc.).
You can read about the architectural choices of ClickHouse here.
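To make the column-oriented model concrete, here's a minimal sketch of creating a log table from Go with the open-source clickhouse-go v2 driver. The schema, codecs, and sort key are illustrative assumptions (not Zomato's actual setup); the point is that each column is stored and compressed independently, so you can pick a codec suited to each column's data.

```go
package main

import (
	"context"
	"log"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	// Connect over ClickHouse's native protocol (port 9000 by default).
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"localhost:9000"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical log table. Because storage is columnar, each column
	// gets its own compression codec: Delta suits mostly-increasing
	// timestamps, ZSTD suits free-form message text.
	err = conn.Exec(context.Background(), `
		CREATE TABLE IF NOT EXISTS logs (
			timestamp DateTime CODEC(Delta, ZSTD),
			service   LowCardinality(String),
			level     LowCardinality(String),
			message   String CODEC(ZSTD)
		)
		ENGINE = MergeTree
		ORDER BY (service, timestamp)`)
	if err != nil {
		log.Fatal(err)
	}
}
```

Because similar values sit next to each other on disk, per-column codecs like these can shrink log data dramatically compared to row-oriented storage, which is a big part of how ClickHouse keeps I/O costs down.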
Zomato’s Logging System Architecture
Here’s the architecture of the setup they used.
Here’s a brief explanation of the diagram:
Log Collection - Zomato uses Filebeat to collect logs from Docker containers and EC2 instances. These logs are forwarded to Kafka for buffering. Custom-built Golang workers then pull the logs off Kafka and send them to ClickHouse (more on this below)
Data Storage - ClickHouse ingests and stores the log data. Zomato used a ClickHouse cluster of 10 m6g.16xlarge AWS EC2 nodes (this article is from July 2023, so this exact figure has probably changed)
UI App - Zomato built a custom dashboard that developers can use to view and filter the log data. They put a lot of effort into making the dashboard responsive, so it has a First Contentful Paint of 0.95 seconds and a Largest Contentful Paint of 1.9 seconds.
Efficient Insertions with Batching
ClickHouse ships with Kafka integrations (like the Kafka table engine) that can pull messages from Kafka directly into the database.
However, the Zomato team decided not to rely on these. Instead, they batched log messages before inserting them to reduce the overhead on ClickHouse; every insert creates a new data part that background merges must later compact, so many small inserts are far more expensive than a few large ones.
Each batch had a maximum size of 20,000 log entries; this kept the delay between generation of a log message and its insertion into ClickHouse to less than 5 seconds.
The Zomato team built Golang workers to handle this task.
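The article doesn't share the worker code, but here's a rough sketch of how such a batcher could look in Go: flush when the batch hits 20,000 entries or when a short timer fires, whichever comes first. The segmentio/kafka-go consumer, the clickhouse-go v2 batch API, the table schema, and the flush interval are all my assumptions, not Zomato's implementation.

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
	"github.com/segmentio/kafka-go"
)

// logEntry is a hypothetical shape for one log line coming off Kafka.
type logEntry struct {
	Timestamp time.Time `json:"timestamp"`
	Service   string    `json:"service"`
	Level     string    `json:"level"`
	Message   string    `json:"message"`
}

const (
	maxBatch   = 20_000          // flush once the batch hits 20,000 entries...
	flushEvery = 2 * time.Second // ...or after a short delay, whichever is first
)

func runWorker(ctx context.Context, conn driver.Conn, reader *kafka.Reader) error {
	msgs := make(chan kafka.Message, 1024)

	// One goroutine pulls from Kafka; the loop below batches and flushes.
	go func() {
		defer close(msgs)
		for {
			m, err := reader.ReadMessage(ctx)
			if err != nil {
				return // context cancelled or fatal reader error
			}
			msgs <- m
		}
	}()

	newBatch := func() (driver.Batch, error) {
		return conn.PrepareBatch(ctx,
			"INSERT INTO logs (timestamp, service, level, message)")
	}
	batch, err := newBatch()
	if err != nil {
		return err
	}
	count := 0

	flush := func() error {
		if count == 0 {
			return nil
		}
		if err := batch.Send(); err != nil {
			return err
		}
		count = 0
		batch, err = newBatch()
		return err
	}

	ticker := time.NewTicker(flushEvery)
	defer ticker.Stop()

	for {
		select {
		case m, ok := <-msgs:
			if !ok {
				return flush() // drain remaining entries on shutdown
			}
			var e logEntry
			if err := json.Unmarshal(m.Value, &e); err != nil {
				continue // skip malformed log lines
			}
			if err := batch.Append(e.Timestamp, e.Service, e.Level, e.Message); err != nil {
				return err
			}
			if count++; count >= maxBatch {
				if err := flush(); err != nil {
					return err
				}
			}
		case <-ticker.C:
			if err := flush(); err != nil {
				return err
			}
		}
	}
}
```

Flushing on a size or time threshold is what bounds the ingestion delay: even when traffic is light, no log sits in the buffer longer than the timer interval.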
Optimizing Searches with Skip Indexes
ClickHouse is extremely efficient when searching data by primary key, but queries involving other conditions can be costly (like any other database).
To minimize full table scans, ClickHouse provides “Skip” indexes that enable the database to skip reading significant chunks of data that are guaranteed to have no matching values.
One example of a skip index is a minmax index, where the database stores the minimum and maximum values of a column for every data block. When you query on that column, ClickHouse can automatically skip any data blocks whose min/max range falls entirely outside the values you’re querying for.
Another type of skip index relies on Bloom filters. A Bloom filter is a probabilistic data structure that can tell you with certainty that an element is not in a set; it can return false positives, but never false negatives.
This helps the database skip data blocks that are definitely not related to the query. You can read more about ClickHouse’s skip indexes here.
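As a concrete sketch, here's how both kinds of skip index could be declared from Go with the clickhouse-go driver, assuming the logs table has a numeric response_time_ms column and a string trace_id column. The index names, columns, and granularity values are illustrative, not Zomato's schema.

```go
package main

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// addSkipIndexes adds two data-skipping indexes to a hypothetical logs
// table. Note that ALTER TABLE ... ADD INDEX only applies to newly written
// parts; existing data needs a separate MATERIALIZE INDEX.
func addSkipIndexes(ctx context.Context, conn driver.Conn) error {
	// minmax: store the min/max of response_time_ms per block of granules,
	// so `WHERE response_time_ms > 5000` skips blocks whose entire range
	// sits below the threshold.
	if err := conn.Exec(ctx, `
		ALTER TABLE logs
		ADD INDEX idx_latency response_time_ms TYPE minmax GRANULARITY 4`); err != nil {
		return err
	}

	// bloom_filter(0.01): a Bloom filter per block with a ~1% false-positive
	// rate. Blocks that definitely lack a given trace_id are skipped; a false
	// positive only means a block is read unnecessarily, never a wrong result.
	return conn.Exec(ctx, `
		ALTER TABLE logs
		ADD INDEX idx_trace trace_id TYPE bloom_filter(0.01) GRANULARITY 4`)
}
```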
Monitoring & Resilience
For monitoring, ClickHouse comes with a lot of embedded instrumentation.
This let the Zomato team collect metrics via Prometheus and visualize them in Grafana.
They monitored health metrics like:
CPU and memory usage
Network usage
Data insertion/query times
How many insertions were getting rejected
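Much of that instrumentation is exposed through built-in system tables such as system.metrics and system.events, which is typically what Prometheus exporters scrape. As a sketch, here's how a few counters could be read directly with the clickhouse-go driver; these counter names are real ClickHouse events, but which ones Zomato actually tracked is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"localhost:9000"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// system.events holds cumulative counters. RejectedInserts counts
	// inserts rejected with a "Too many parts" error, which maps to the
	// "rejected insertions" metric mentioned above.
	rows, err := conn.Query(context.Background(), `
		SELECT event, value
		FROM system.events
		WHERE event IN ('Query', 'InsertQuery', 'RejectedInserts')`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var event string
		var value uint64
		if err := rows.Scan(&event, &value); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s = %d\n", event, value)
	}
}
```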
To ensure resiliency, the engineering team implemented load shedding. During times of stress, they would terminate any queries that consumed excessive resources in order to prioritize lightweight queries, keeping the system available for as many users as possible.
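The article doesn't detail how queries were terminated, but one natural mechanism in ClickHouse is to watch system.processes and issue KILL QUERY against anything that has been running too long (ClickHouse also supports per-query guards like max_execution_time). Here's a minimal sketch under that assumption, with an illustrative time threshold.

```go
package main

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// killHeavyQueries terminates queries that have been running longer than
// maxSeconds, freeing resources for lightweight queries during load spikes.
// The threshold (and this whole mechanism) is an illustrative assumption.
func killHeavyQueries(ctx context.Context, conn driver.Conn, maxSeconds float64) error {
	rows, err := conn.Query(ctx, `
		SELECT query_id, elapsed
		FROM system.processes
		WHERE elapsed > ?`, maxSeconds)
	if err != nil {
		return err
	}
	defer rows.Close()

	for rows.Next() {
		var queryID string
		var elapsed float64
		if err := rows.Scan(&queryID, &elapsed); err != nil {
			return err
		}
		// KILL QUERY is asynchronous: ClickHouse flags the query and it
		// terminates at its next cancellation checkpoint.
		if err := conn.Exec(ctx, "KILL QUERY WHERE query_id = ?", queryID); err != nil {
			return err
		}
	}
	return rows.Err()
}
```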
Impact
The switch to ClickHouse was crucial in helping Zomato scale their logging infrastructure.
Some achievements include:
99% of queries were answered within 10 seconds
Ingestion lag time of less than 5 seconds
Over a million dollars per year of cost savings compared to Elasticsearch
For more details, read the full article here.