How Zomato built a Petabyte Scale Logging System

We'll talk about building a highly scalable logging system with ClickHouse. Plus, how AWS S3 is showing its age, a guide to pair programming and more.

Arpan KG
May 23, 2024

Hey Everyone!

Today we’ll be talking about

How Zomato built a Petabyte Scale Logging System
- Zomato is the largest food-delivery app in South Asia with 80 million active users
- They recently switched their logging infrastructure from Elasticsearch to ClickHouse
- We’ll talk about why they did this and how they can handle 150 million events logged per minute
Tech Snippets
- AWS S3 is showing its Age
- How to get Buy-In with Push and Pull
- Challenging Software Projects to Try
- A Guide to Pair Programming

How Zomato built a Petabyte Scale Logging System

Zomato is the largest food-delivery app in South Asia with over 80 million active users and 3200+ cities. At peak load, the app is handling thousands of food orders per minute.

With any app of this scale, observability is crucial for ensuring high availability. One core part of observability is logs.

Zomato’s backend services generate a huge amount of logs. On a daily basis, there’s over 50 terabytes of uncompressed log files with a maximum rate of 150 million events logged per minute.

Previously, Zomato used Elasticsearch (ELK Stack) to manage these logs but they started running into scaling problems.

Due to the exponential growth in the amount of logs, managing the Elasticsearch clusters became increasingly difficult and costly. They had to over-provision servers to handle variable traffic patterns, which led to rising costs and a poor experience.

To address these challenges, the Zomato team decided to switch to ClickHouse.

We’ll first give a brief introduction to ClickHouse. Then, we’ll delve into some of the strategies the Zomato team used to scale their logging infrastructure.

Introduction to ClickHouse

ClickHouse is a distributed, open-source database that was developed 12 years ago at Yandex (the Google of Russia) to power Metrica (a real-time analytics tool that’s similar to Google Analytics).

For some context, Yandex’s Metrica runs ClickHouse on a cluster of ~375 servers and they store over 20 trillion rows in the database (close to 20 petabytes of total data). So… it was designed with scale in mind.

ClickHouse is column-oriented with a significant emphasis on high write throughput and fast read operations. In order to minimize storage space (and enhance I/O efficiency), the database uses some clever compression techniques. It also supports a SQL-like query language to make it accessible to non-developers (BI analysts, etc.).

You can read about the architectural choices of ClickHouse here.

Zomato’s Logging System Architecture

Here’s the architecture of the setup they used.

Here’s a brief explanation of the diagram

Log Collection - Zomato uses Filebeat to collect logs from docker containers and EC2 instances. These logs are forwarded to Kafka for buffering. Zomato uses custom-built Golang workers to collect these logs from Kafka and send them to ClickHouse (more on this below)
Data Storage - ClickHouse ingests and stores the log data. Zomato used a ClickHouse cluster with 10 M6g.16xlarge AWS EC2 nodes (this article is from July 2023, so this exact figure has probably changed).
UI App - Zomato built a custom dashboard that developers can use to view/filter through the log data. They put a lot of effort into making the dashboard responsive, so it has a First Contentful Paint score of 0.95 seconds and a Largest Contentful Paint score of 1.9 seconds.

Efficient Insertions with Batching

ClickHouse provides Kafka plugins so you can directly send Kafka messages to your ClickHouse database.

However, the Zomato team decided not to rely on these. Instead, they wanted to batch log messages when inserting them to reduce the overhead on ClickHouse.

Each batch had a maximum size of 20,000 log entries; this kept the delay between generation of a log message and its insertion into ClickHouse to less than 5 seconds.

The Zomato team built Golang workers to handle this task.

Optimizing Searches with Skip Indexes

ClickHouse is extremely efficient when searching data by primary key, but queries involving other conditions can be costly (like any other database).

To minimize full table scans, ClickHouse provides “Skip” indexes that enable the database to skip reading significant chunks of data that are guaranteed to have no matching values.

One example of a skip index is a minmax index. This is where the database will store the minimum/maximum values of a column for every datablock. Then, when you’re querying data based on that specific column, ClickHouse can automatically skip any datablocks that have values that fall outside of the query.

Another type of skip index relies on Bloom Filters. A Bloom Filter is a probabilistic data structure that tells you with certainty if an element is not in the set.

This helps the database skip data blocks that are definitely not related to the query. You can read more about ClickHouse’s skip indexes here.

Monitoring & Resilience

For monitoring, ClickHouse comes with a lot of embedded instrumentation.

This let the Zomato team collect metrics via Prometheus and visualize them in Grafana.

They monitored health metrics like

CPU and memory usage
Network usage
Data insertion/query times
How many insertions were getting rejected

To ensure resiliency, the engineering team implemented load shedding. Under times of stress, they would terminate any queries that consumed excessive resources in order to prioritize lightweight queries. This would ensure system availability to the most users.

Impact

The switch to ClickHouse was crucial in helping Zomato scale their logging infrastructure

Some achievements include:

99% of queries were answered within 10 seconds
Ingestion lag time of less than 5 seconds
Over a million dollars per year of cost savings compared to Elasticsearch

For more details, read the full article here.

Tech Snippets

AWS S3 Is Showing Its Age

AWS S3 has a huge amount of impressive engineering but it’s currently lacking many features that you can find in competitors.

This is a list of some of the feature gaps that S3 has compared to other competitors.

Some include
- lack of multi-region buckets compared to GCS
- no preconditions
- lack of appends

and more.

materializedview.io/p/s3-is-showing-its-age

How to get Buy-In with Push and Pull

Pull is where there’s a strong sense that the work being done is valuable and fulfilling a specific need. Push is where developers are instructed to perform a task in a specific manner (pushed to do it in a certain way).

Submitting a PR has lots of pull, since you need to submit it in order to get your code merged. Submitting a daily status report, on the other hand, doesn’t have much pull. It might seem like a meaningless task, so you might need push (everyone must submit a status report).

In order to get buy-in with engineering processes, you need a combination of pull and push. This is an interesting article on how to think about the trade off.

kellanem.com/notes/push-and-pull

Challenging Software Projects to Try

Coding through small challenges is a great way to improve your skills as a developer (and also a fun way to spend a sunday night).

This is a great list of 18 projects you can try out. Some of these are also great items to put on a resume.

Projects include
- Write a scientific calculator
- Write your own HTTP server in C
- Build a game engine for text-based games

www.andreinc.net/2024/03/28/programming-projects-ideas

A Guide to Pair Programming

This is a super comprehensive guide on Pair Programming by Tuple.

It has articles about why you should pair program, what to do your first session, anti-patterns when pairing, how to pair with a junior developer and more.

tuple.app/pair-programming-guide