How Canva Collects 25 Billion Events Per Day

An overview of AWS Kinesis and how Canva uses it to collect and process 25 billion events per day. Plus, the art of good code review, how to find coachable employees and more.

Hey Everyone!

Today we’ll be talking about

  • How Canva Collects 25 Billion Events Per Day

    • Brief Overview of AWS Kinesis

    • Architecture of Canva’s Data Pipeline

    • Why Canva picked Kinesis over AWS SQS and techniques Canva uses to minimize costs

  • Tech Snippets

    • Go is my hammer, and everything is a nail

    • Coachability: The Prerequisite To Growth

    • The art of good code review

You’ll often hear about the mythical “10x engineer” - the go-to person on the team whenever you need a feature shipped fast. However, 10x engineers aren’t just super-technical; they also have a great sense of what to build.

If you’re working on the wrong feature, then it doesn’t matter how fast you work. The company won’t see a big impact from your work.

Product for Engineers wrote a great article delving into the most impactful engineers and identified six common traits that they share.

Here are three of the traits:

  1. Always Prototyping and Experimenting - they ship MVPs early and often, iterate quickly based on feedback and aren’t afraid to pivot or kill features that aren’t working.

  2. Are Comfortable Writing - Clear writing skills are a must for documenting features, providing PR feedback, and making big technical decisions with RFCs.

  3. Understand the Broader Context - they understand the organization’s goals and align their decisions/work with the company’s strategy.

For the rest of the traits, check out the Product for Engineers newsletter.

They send out fantastic articles every month to help you develop the skills you need to deliver the most impact (and get promoted faster).

sponsored

How Canva Collects 25 Billion Events Per Day

Canva is an online graphic design platform that lets you create presentations, social media banners, infographics, logos and more. They have over 175 million monthly users and are valued at $26 billion.

In order to understand how people are using the platform, Canva’s mobile, web and desktop apps collect a wide range of events on user clicks, views, scrolls, etc.

Every day, Canva needs to collect and process over 25 billion events (800 billion events per month). This needs to be done with 99.999% uptime.

Last month, they published a fantastic blog post on how they built a data pipeline to handle this.

They talk about why they built the pipeline on AWS Kinesis and the specific techniques they use to minimize costs and latency.

Brief Overview of AWS Kinesis

AWS Kinesis is a family of services for processing and analyzing streaming data in real-time. It was launched in late 2013 and is composed of four main services: Data Streams, Data Firehose, Data Analytics and Video Streams.

Here’s a brief overview of the four services:

  • Data Streams - this service ingests and stores streaming data in real-time with sub-second latency. Kinesis Data Streams does not process the data, so you’ll need another tool (Apache Flink, Kinesis Data Analytics, Spark, etc.) for transformations and analytics. Kinesis Data Firehose can then send the processed data to destinations like AWS S3, MongoDB, etc.

  • Data Firehose - Firehose is primarily used for loading streaming data into data lakes, databases and analytics services. You can deliver your data to AWS S3, Redshift, Elasticsearch, Splunk and other data stores.

    However, Firehose can also handle data ingestion and basic transformations. A few months ago, AWS renamed the service from Amazon Kinesis Data Firehose to Amazon Data Firehose (its API and other functionality didn’t change).

  • Data Analytics - If you’d like to run complex transformations on the streaming data that’s been ingested through Data Streams, then you can do that with Kinesis Data Analytics.

    Under the hood, Data Analytics uses Apache Flink, so Amazon has also rebranded Kinesis Data Analytics as “Amazon Managed Service for Apache Flink” (but the core capabilities and purpose haven’t changed).

  • Video Streams - In addition to data, Kinesis can also be used for ingesting and storing live video. Kinesis Video Streams gives you the infrastructure to ingest and store video data. You can integrate it with other services to process and distribute the stored video.

Canva uses Kinesis Data Streams to ingest 25 billion events per day. From Kinesis, Canva sends the event data to Snowflake for processing.

Here’s how the data pipeline works…

Canva’s Data Pipeline for Collecting Events

Canva has iOS, Android, web and desktop applications. Each of these apps is instrumented to collect events and send them to Canva’s backend.

Canva’s servers will first validate the events and make sure that they conform to a predefined schema.
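The article doesn’t describe Canva’s schema format, so here is a minimal sketch of the validation step under a hypothetical one: each event type maps to its required fields and their expected Python types. The event types (`design_opened`, `button_clicked`), field names and `validate_event` function are illustrative assumptions, not Canva’s actual schema or code.

```python
from typing import Any

# Hypothetical schema registry: event type -> required field -> expected Python type.
EVENT_SCHEMAS: dict[str, dict[str, type]] = {
    "design_opened": {"user_id": str, "design_id": str, "timestamp_ms": int},
    "button_clicked": {"user_id": str, "button": str, "timestamp_ms": int},
}


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; an empty list means the event conforms."""
    event_type = event.get("type")
    schema = EVENT_SCHEMAS.get(event_type) if isinstance(event_type, str) else None
    if schema is None:
        return [f"unknown event type: {event_type!r}"]
    errors = []
    for field, expected in schema.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors


# Usage: only conforming events would continue on to batching.
good = {"type": "design_opened", "user_id": "u1", "design_id": "d1", "timestamp_ms": 1_700_000_000_000}
bad = {"type": "design_opened", "user_id": "u1"}
print(validate_event(good))  # []
print(validate_event(bad))   # ['missing field: design_id', 'missing field: timestamp_ms']
```

This sketch tolerates extra fields; a stricter schema could also reject unknown keys.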

They will then batch the events together (with a few hundred events per batch) and apply zstd compression. Then, Canva’s servers will send the events to a Kinesis Data Stream.
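Here is a sketch of the batch-then-compress step. It makes a few assumptions that are not from the article: a batch size of 250 (the article only says “a few hundred”), newline-delimited JSON as the serialization format, and `zlib` as a stand-in for zstd, because zstd isn’t in Python’s standard library. The helper names `batches`, `encode_batch` and `decode_batch` are hypothetical.

```python
import json
import zlib
from typing import Any, Iterable, Iterator

# Assumption: "a few hundred events per batch"; 250 is an illustrative value.
BATCH_SIZE = 250


def batches(events: Iterable[Any], size: int = BATCH_SIZE) -> Iterator[list[Any]]:
    """Group events into lists of at most `size` events."""
    batch: list[Any] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def encode_batch(batch: list[dict[str, Any]]) -> bytes:
    """Serialize a batch as newline-delimited JSON, then compress it.

    zlib stands in for zstd here; json.dumps escapes newlines inside strings,
    so "\n" safely separates events.
    """
    payload = "\n".join(json.dumps(e, separators=(",", ":")) for e in batch).encode()
    return zlib.compress(payload)


def decode_batch(blob: bytes) -> list[dict[str, Any]]:
    """Inverse of encode_batch, as a downstream consumer would apply it."""
    return [json.loads(line) for line in zlib.decompress(blob).decode().split("\n")]


events = [{"type": "button_clicked", "user_id": f"u{i % 50}", "button": "share"} for i in range(1000)]
records = [encode_batch(b) for b in batches(events)]
print(len(records))  # 4 records instead of 1000
```

Each element of `records` would become the data blob of one Kinesis record; the actual producer call is left out so the sketch runs without AWS credentials.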

From Kinesis, Canva has an ingestion worker that will read the events and enrich them with additional data. This worker will do things like

  • Add country-level geolocation data

  • Add user device details

  • Correct any timestamp issues

Canva runs this enrichment in a separate ingestion worker so that the collection endpoint on the server stays fast. Decoupling event collection from event enrichment helps them scale to 25 billion events per day.
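The article lists what the worker adds but not how, so the following is only a hypothetical sketch of the three enrichment steps. It assumes the collection endpoint attaches the client IP, user agent and server receipt time to each event; it uses a toy IP-prefix table in place of a real GeoIP database; and it treats client timestamps more than 24 hours away from receipt time as untrusted. All names and thresholds are illustrative. Because this runs in the worker that reads from Kinesis, none of these lookups add latency to the collection endpoint.

```python
from typing import Any

# Hypothetical lookup table; a real worker would query a GeoIP database.
IP_PREFIX_TO_COUNTRY = {"203.0.113.": "AU", "198.51.100.": "US"}

# Assumption: client clocks more than 24h off from server receipt time are untrusted.
MAX_CLOCK_SKEW_MS = 24 * 60 * 60 * 1000


def country_for_ip(ip: str) -> str:
    """Country-level geolocation via prefix match (illustration only)."""
    for prefix, country in IP_PREFIX_TO_COUNTRY.items():
        if ip.startswith(prefix):
            return country
    return "unknown"


def device_from_user_agent(user_agent: str) -> dict[str, str]:
    """Very rough platform classification from the user-agent string."""
    ua = user_agent.lower()
    if "iphone" in ua or "ipad" in ua:
        platform = "ios"
    elif "android" in ua:
        platform = "android"
    else:
        platform = "web"
    return {"platform": platform}


def enrich(event: dict[str, Any], client_ip: str, user_agent: str, received_at_ms: int) -> dict[str, Any]:
    """Return an enriched copy of `event`; the raw event is left untouched."""
    enriched = dict(event)
    enriched["country"] = country_for_ip(client_ip)
    enriched["device"] = device_from_user_agent(user_agent)
    ts = event.get("timestamp_ms")
    # Replace missing or implausible client timestamps with the server receipt time.
    if not isinstance(ts, int) or abs(ts - received_at_ms) > MAX_CLOCK_SKEW_MS:
        enriched["timestamp_ms"] = received_at_ms
        enriched["timestamp_corrected"] = True
    return enriched


raw = {"type": "design_opened", "user_id": "u1", "timestamp_ms": 0}
out = enrich(raw, "203.0.113.7", "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0)", 1_700_000_000_000)
print(out["country"], out["device"], out["timestamp_ms"])
```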

After enrichment, the events are written back to Kinesis. From there, Canva’s router sends them to Snowflake, which serves as the data store for Canva’s ML models, dashboards and data analytics.

Some of the event types are also sent to AWS SQS so they can be consumed by other backend services at Canva (that need to process the event data in real-time).
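The routing rule described above (every event goes to Snowflake, and some event types are also fanned out to SQS) can be sketched as a pure function. The set of real-time event types and the destination names are hypothetical placeholders; a real router would hand these lists to a Snowflake loader and an SQS client rather than return them.

```python
from collections import defaultdict
from typing import Any, Iterable

# Hypothetical: event types that some backend service needs in real time.
REALTIME_EVENT_TYPES = {"subscription_started", "design_shared"}


def route(events: Iterable[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Assign each enriched event to one or more destinations.

    Every event goes to the warehouse; selected types are also sent to a queue.
    A destination with no events is absent from the result.
    """
    destinations: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for event in events:
        destinations["snowflake"].append(event)
        if event.get("type") in REALTIME_EVENT_TYPES:
            destinations["sqs"].append(event)
    return dict(destinations)


out = route([{"type": "design_opened"}, {"type": "design_shared"}, {"type": "button_clicked"}])
print(len(out["snowflake"]), out["sqs"])  # 3 [{'type': 'design_shared'}]
```

Keeping the routing decision separate from the I/O makes it easy to test and to add new destinations without touching the warehouse path.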

Minimizing AWS Costs

  • AWS Kinesis over SQS - In the first version of the data pipeline, Canva used AWS SQS and SNS instead of Kinesis. These were easier to set up, but they were significantly more expensive. By switching to Kinesis Data Streams, Canva saw costs drop by 85%.

  • Event Compression - Canva’s servers will first batch the events (in groups of a few hundred events per batch) and apply zstd compression. These compressed batches will then be sent to Kinesis. Using this strategy (instead of sending each event as a separate record) saves Canva $600k every year in AWS costs.
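Much of the saving comes from sending far fewer Kinesis records. A back-of-the-envelope count shows the scale of that reduction, assuming 250 events per batch (the article only says “a few hundred”). It doesn’t try to reproduce the $600k figure, which depends on event sizes and Kinesis pricing details the article doesn’t give.

```python
EVENTS_PER_DAY = 25_000_000_000  # from the article
EVENTS_PER_BATCH = 250           # assumption: "a few hundred" per batch
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(events: int, per_batch: int) -> int:
    """Records needed when each record carries up to `per_batch` events (integer ceiling)."""
    return -(-events // per_batch)


unbatched = records_per_day(EVENTS_PER_DAY, 1)
batched = records_per_day(EVENTS_PER_DAY, EVENTS_PER_BATCH)
print(f"{unbatched:,} -> {batched:,} records/day")
# 25,000,000,000 -> 100,000,000 records/day
print(f"avg {unbatched // SECONDS_PER_DAY:,} -> {batched // SECONDS_PER_DAY:,} records/s")
# avg 289,351 -> 1,157 records/s
```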

Many engineering roles today need developers to get involved in product decisions, talk to users and analyze usage data. Understanding how to do this well is hard.

Product for Engineers wrote a fantastic blog post delving into some of the mistakes devs make when they’re trying to make decisions based on analytics data.

Some of the mistakes include

  • Making it too Complicated - It’s easy to get overwhelmed by the huge swath of data tools. Instead, start small. Pick a specific feature and track its usage with trends and retention. Use that to iterate.

  • Not Using Session Replays - Session replays are a fantastic tool for uncovering bugs, unexpected behavior and UX issues. They have a very high information density and aren’t just for PMs or marketers.

  • Only focusing on the Numbers - relying on data alone is like working with one hand tied behind your back. You also need qualitative data like surveys and user interviews. Combining the two will help you build better products.

For the rest of the mistakes, check out Product for Engineers. It’s a fantastic newsletter by PostHog that helps developers learn how to build apps that users love.

To hone your product skills and read more articles like this, check out Product for Engineers below.

sponsored

Tech Snippets