How Canva Saved Millions on Data Storage

An overview of AWS S3 and how Canva changed their setup. Plus, the problem with LangChain, no more Postgres VACUUM, Distributed Systems for Fun and Profit and more.

Hey Everyone!

Today we’ll be talking about

  • How Canva Saved Millions on Data Storage

    • A brief overview of AWS S3

    • How Canva uses S3

    • Analyzing Canva’s data access patterns and storage classes

    • Transitioning data to Glacier Instant Retrieval and the cost savings

  • Tech Snippets

    • The Problem with LangChain

    • No More Postgres VACUUM

    • Distributed Systems for Fun and Profit

    • LazyVim

    • There is No Data Engineering Roadmap

Double Your Pay in 6 Months With AI/ML Skills

Developers who can build, maintain and fine tune AI systems are in extremely high demand. Companies are hiring aggressively and offering compensation packages upward of $300k - $500k.

Interview Kickstart runs an awesome program where they reskill frontend/backend developers to ML engineers and then help them get jobs at FAANG-level companies.

The program has

  • FAANG+ AI/ML engineers & tech leads teach the real-world Machine Learning and Data Science skills you need

  • Individual coaching and 1:1 help with course material

  • 15 mock interviews with Hiring Managers at FAANG companies

  • Access to the Interview Kickstart platform with 10,000+ interview questions, timed tests, videos and more

Alumni of InterviewKickstart’s program routinely get $300k+ job offers, so it’s a fantastic way of investing in yourself.

To learn more about the FAANG interview process, ML engineering positions at top tech companies and about InterviewKickstart’s program, you should check out their free webinar.

sponsored

How Canva Saved Millions on Data Storage

Canva is an online platform that lets you easily create presentations, diagrams, social media posters, flyers and other graphics.

They have a ton of pre-built templates, stock photos/videos, fonts, etc. so you can quickly create a presentation that doesn’t look like it was designed by a 9 year old.

Canva has over 100 million monthly active users with tens of billions of designs created on the platform. They have over 75 million stock photos and graphics on the site.

They run most of their production workloads on AWS and are heavy users of services like

  • S3 - for storing graphics, photos, videos, etc.

  • ECS (Elastic Container Service) - for compute. For example, they use ECS for handling GPU-intensive tasks like image processing.

  • RDS (Relational Database Service) - for storing data on users and more.

  • DynamoDB - key-value store that Canva uses for storing media metadata (title, artist, keywords) and more

In November of 2021, AWS launched a new storage tier of S3 called Glacier Instant Retrieval. This offered low-cost archive storage that also had low latency (milliseconds).

Canva analyzed their data storage/access patterns and estimated the cost savings of switching to this tier. They also looked at the cost of switching and calculated the ROI.

The company was able to save $3.6 million annually by migrating over a hundred petabytes to S3 Glacier Instant Retrieval.

Josh Smith is an Engineering Manager at Canva and he wrote a fantastic blog post delving into how Canva tracked their data access patterns, estimated the ROI of switching, and the migration process.

Here’s a summary of the blog post with additional context

Brief Overview of AWS S3

(You might want to skim/skip over this section if you’re experienced with AWS)

AWS S3 (Simple Storage Service) is one of the first cloud services Amazon launched (back in 2006).

It’s an object storage service, so you can use it to store any type of data. It’s commonly used to store things like images, videos, log files and backups. You can store any file on S3 as long as the file is no larger than 5 terabytes.

S3 provides

  • Cost Effective - S3 can be a cheap way to store large amounts of data. There are different storage tiers based on your latency requirements (discussed below), and pricing is a couple of cents per gigabyte per month.

  • High Durability - AWS provides 11 9’s of durability, so it’s very safe and it’s extremely unlikely that you’ll lose data. That being said, you should still have backups.

  • Reliable - AWS provides at least 3 9’s of availability (99.9%), which equates to roughly 43 minutes of downtime in a 30-day month. If they’re down for longer than that, then Amazon will write you a heartfelt apology and compensate you for any losses your business incurred from their mistake. Just kidding, they’ll give you a small fraction of your bill back as AWS credits.
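
To make the availability number concrete, here is a minimal sketch of the arithmetic. The function name and the 30-day month are assumptions chosen for illustration.

```python
def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability)


# 99.9% availability over an assumed 30-day month
print(round(allowed_downtime_minutes(0.999), 1))
```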

With S3, you create a bucket (like a folder in a file system) and upload your files there. Each file (called an object) is identified by a key and, if versioning is enabled on the bucket, a version ID. You use the file’s bucket, key and version ID to access it.

The file is immutable, so if you want to change it then you’ll have to upload the entire changed file again.

Pricing is mainly based on

  • Storage - you’re charged per month per gigabyte you use

  • Requests - AWS charges you for each GET and PUT request made. Frequently uploading or accessing data will increase your costs.

  • Data Transfer - Charges apply when you move data out of S3. You’re billed per gigabyte that you move out, and these charges can add up quickly.
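
The three pricing components above can be combined into a rough monthly estimate. This is only a sketch: the `S3Prices` class, the `monthly_cost` function and every default price are hypothetical placeholders, so check the AWS pricing page for real numbers in your region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Prices:
    # Hypothetical prices in USD, for illustration only.
    storage_per_gb_month: float = 0.023
    per_1000_puts: float = 0.005
    per_1000_gets: float = 0.0004
    egress_per_gb: float = 0.09


def monthly_cost(stored_gb, puts, gets, egress_gb, prices=S3Prices()):
    """Break a month's bill into storage, request and data transfer charges."""
    storage = stored_gb * prices.storage_per_gb_month
    requests = (puts / 1000) * prices.per_1000_puts + (gets / 1000) * prices.per_1000_gets
    transfer = egress_gb * prices.egress_per_gb
    return {
        "storage": storage,
        "requests": requests,
        "transfer": transfer,
        "total": storage + requests + transfer,
    }


# Example workload: 1 TB stored, 1M PUTs, 10M GETs, 500 GB moved out of S3
for component, dollars in monthly_cost(1000, 1_000_000, 10_000_000, 500).items():
    print(f"{component}: ${dollars:.2f}")
```

With these placeholder prices, data transfer is the largest line item even though only half the stored volume leaves S3.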

AWS provides many different storage classes for storing your data. Each storage class has tradeoffs in terms of latency and pricing.

Some of the classes are

  • S3 Standard - general purpose option for storing data that is frequently accessed. Storage costs a couple of cents per gigabyte per month and GETs/PUTs are a fraction of a cent per 1,000 requests.

  • S3 Standard-Infrequent Access - This is for when you need to store data that isn’t accessed as frequently. Compared to S3 Standard, the cost per gigabyte per month of storage is cheaper, but the cost of uploading and accessing the data is more expensive.

  • S3 Glacier Instant Retrieval - Data storage per month is significantly cheaper compared to S3 Standard, but uploading and accessing data is also significantly more expensive.

  • S3 Glacier Deep Archive - This is the lowest cost storage option but retrieving the data can have a latency of hours.

Read more about the storage classes here.

You can create rules to automatically move data between storage classes using S3 lifecycle policies.
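
As a rough sketch of what such a rule looks like, the snippet below builds a lifecycle configuration as a plain Python dictionary in the shape the S3 API expects. The rule ID, the `projects/` prefix and the day counts are hypothetical and are not taken from Canva’s setup. With boto3, a dictionary like this can be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.

```python
import json


def build_lifecycle_rule(rule_id, prefix, transitions):
    """Build one lifecycle rule; `transitions` is a list of (days, storage_class)."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in transitions
        ],
    }


# Hypothetical rule: move objects under projects/ to Standard-IA after 30 days
# and to Glacier Instant Retrieval after 90 days.
lifecycle_config = {
    "Rules": [
        build_lifecycle_rule(
            "archive-old-projects",
            "projects/",
            [(30, "STANDARD_IA"), (90, "GLACIER_IR")],
        )
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```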

Alrighty, back to Canva.

How Canva uses S3

Canva stores over 230 petabytes in S3, with their largest bucket coming in at 45 petabytes.

They use many different storage tiers to minimize their cost.

  • S3 Standard - Canva stores stock photos/videos and templates in this storage tier. The data is accessed many times per day, so they need to minimize the cost and latency of PUTs/GETs.

  • S3 Standard-Infrequent Access - Canva uses this to store old user-created projects, images and media. A user will access their project very frequently when it’s first created. After a few weeks, the user will finish the project and rarely open it again. Therefore, the project will first be in S3 Standard and will be moved to S3 Standard-IA after a few weeks by an S3 lifecycle policy.

  • S3 Glacier Flexible Retrieval - Canva also archives logs and backups on S3. They rarely access this data and latency doesn’t matter so they use Glacier Flexible Retrieval. They still get the data within minutes/hours and it’s very cheap to store.

Migrating to S3 Glacier Instant Retrieval

In November 2021, AWS launched S3 Glacier Instant Retrieval. It offers extremely cheap storage per gigabyte per month. In addition, data in Glacier Instant Retrieval can be retrieved instantly (within milliseconds), whereas Glacier Flexible Retrieval can take minutes to hours. The downside is that retrieval for this storage class is extremely expensive (around 25 times more expensive compared to S3 Standard).

Canva had to figure out whether it would make financial sense to migrate data to the Glacier Instant Retrieval class. To do this, they used S3 Storage Class Analytics, which you can turn on at a per-bucket level.

With this, Canva made several observations

  • Retrievals of user project data fell dramatically after the first 15 days, suggesting that most users finish their projects within about two weeks.

  • The rate of retrieval for data in the S3 Standard-Infrequent Access class didn’t change over time. Users were equally likely to open up a past project a month after they finished it versus a year after they finished it.

  • For a typical bucket, around 10% of the data was stored in S3 Standard, whereas 90% was stored in S3 Standard-IA. However, 70% of all accessed data for that bucket came from S3 Standard.
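
The last observation can be turned into a rough per-gigabyte comparison. The sketch below assumes the bucket holds only these two storage classes and treats “share of accessed data” as a proxy for share of accesses; the function name is made up for illustration.

```python
def relative_access_density(share_of_data, share_of_accesses):
    """How much more often a gigabyte in tier A is accessed than one in tier B.

    Both arguments describe tier A; tier B is assumed to hold the remainder.
    """
    density_a = share_of_accesses / share_of_data
    density_b = (1 - share_of_accesses) / (1 - share_of_data)
    return density_a / density_b


# S3 Standard: 10% of the data, 70% of the accesses
ratio = relative_access_density(0.10, 0.70)
print(f"A gigabyte in S3 Standard is accessed about {ratio:.0f}x as often")
```

Under these assumptions, a gigabyte in S3 Standard is accessed about 21 times as often as a gigabyte in S3 Standard-IA, which is why the colder data is a candidate for a class with cheaper storage and pricier retrieval.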

Based on this (and some more data crunching), Canva decided that it would be cost effective to shift low-access data to Glacier Instant Retrieval.

Unfortunately, shifting S3 data from one storage class to another isn’t free. In fact, moving all of Canva’s 300 billion objects from other storage classes to the Glacier Instant Retrieval class would cost over $6 million. Not fun.

However, the cost of transferring data between storage classes is billed per 1,000 objects. Object size doesn’t matter, so you get the biggest bang for your buck by transferring the largest objects.

Based on this, Canva decided to target buckets with an average object size of 400 KB or more. For these buckets, the storage savings would pay back the storage class transition costs within 6 months.
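
Here is a minimal sketch of that break-even reasoning. The prices are assumptions picked for illustration (a transition fee of $0.02 per 1,000 objects and a storage saving of $0.0085 per gigabyte per month when moving from Standard-IA to Glacier Instant Retrieval); they are not taken from Canva’s post. The model also ignores retrieval and request charges, so treat the output as a rough estimate.

```python
# Assumed prices in USD, for illustration only.
TRANSITION_PRICE_PER_1000 = 0.02      # fee per 1,000 objects transitioned
SAVING_PER_GB_MONTH = 0.0125 - 0.004  # assumed Standard-IA price minus Glacier IR price


def total_transition_cost(object_count, price_per_1000=TRANSITION_PRICE_PER_1000):
    """One-off cost of transitioning `object_count` objects."""
    return object_count / 1000 * price_per_1000


def payback_months(avg_object_bytes,
                   price_per_1000=TRANSITION_PRICE_PER_1000,
                   saving_per_gb_month=SAVING_PER_GB_MONTH):
    """Months of storage savings needed to recover one object's transition fee."""
    transition_cost_per_object = price_per_1000 / 1000
    object_gb = avg_object_bytes / 1e9
    monthly_saving_per_object = object_gb * saving_per_gb_month
    return transition_cost_per_object / monthly_saving_per_object


print(f"Transitioning 300 billion objects: ${total_transition_cost(300e9):,.0f}")
for size_kb in (50, 400, 4000):
    months = payback_months(size_kb * 1000)
    print(f"{size_kb} KB average object: {months:.1f} months to break even")
```

Because the fee is per object while the saving is per gigabyte, the payback period shrinks in proportion to the average object size. Under these assumed prices, a 400 KB average lands just under 6 months.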

Conclusion

Canva has already transferred over 130 petabytes to S3 Glacier Instant Retrieval. It cost them $1.6 million to transition, but it’ll save them $3.6 million a year.

For more details, please read the full blog post here.

Tech Snippets