How Figma Migrated to Kubernetes

We'll talk about how (and why) Figma switched from AWS ECS to Kubernetes. Plus, how Netflix deals with the Noisy Neighbor problem, how Fireship became YouTube's favorite programmer and more.

Hey Everyone!

Today we’ll be talking about

  • How Figma Migrated to Kubernetes

    • Issues Figma faced with AWS ECS and why they wanted to move to Kubernetes

    • The migration process and strategies Figma used to make the switch easier

    • Results and future areas for improvement

  • Tech Snippets

    • How Netflix deals with the Noisy Neighbor problem

    • How to establish good internal documentation habits

    • Learning about Debuggers

    • How to get better at giving feedback

    • How Fireship became YouTube’s favorite programmer

Hacking Scale is a free bi-weekly newsletter about building and scaling software.

Get an engineering story explained in under 1000 words, with hand-drawn visuals, delivered right to your inbox from Better Stack engineers.

Past articles include

🎁 It’s free for early subscribers!

sponsored

How Figma Migrated to Kubernetes

Figma is a web-based design platform that lets users create complex UIs and mockups. The website launched in 2015 and has since grown to over 4 million users.

With this growth, the engineering team has had to deal with a ton of challenges, particularly around scaling their backend infrastructure.

Prior to 2024, Figma was running their servers in containers with AWS Elastic Container Service (ECS). However, the engineers were facing some limitations with ECS and were also interested in some of the features of Kubernetes.

The Figma team published a terrific blog post delving into these limitations, the advantages of Kubernetes and their migration process. We’ll be summarizing the post.

Limitations of AWS ECS

AWS Elastic Container Service is an extremely popular platform for running containerized workloads.

However, the Figma team started to face some limitations as they scaled.

  • Lack of Useful Primitives - Figma engineers had to spend time working around limitations of the ECS platform as their needs became more complex. For example, installing etcd was a pain because they had trouble maintaining stable network identifiers for containers. Kubernetes, on the other hand, provides StatefulSets, which maintain a unique network identifier for each of your pods.

  • Helm Charts - Helm is a package manager for Kubernetes. Various teams at Figma wanted to use certain OSS projects but installing and maintaining them on ECS would’ve required a ton of engineering time. With Kubernetes, they could just use the Helm chart. 

  • Misc Issues - The Figma team also dealt with many smaller inconveniences with ECS. For example, gracefully terminating a single EC2 machine was difficult with ECS on EC2. It’s easier with AWS Elastic Kubernetes Service, where you can cordon off the bad node and drain it, letting Kubernetes move that machine’s pods to a different node while respecting their shutdown routines.
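
To make the StatefulSet point more concrete, here is a minimal Python sketch of the naming convention Kubernetes documents for StatefulSet pods that sit behind a headless Service. Each pod keeps a fixed ordinal name and a matching DNS name across restarts, which is what a clustered system like etcd needs for its peer list. The names `etcd`, `etcd-headless` and `infra` are hypothetical and not taken from Figma's setup, and the cluster domain is assumed to be the common default `cluster.local`.

```python
def statefulset_pod_dns(statefulset: str, service: str, namespace: str,
                        replicas: int,
                        cluster_domain: str = "cluster.local") -> list[str]:
    """Return the stable DNS name of each pod in a StatefulSet.

    Uses the documented pattern
    <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>,
    where <service> is the headless Service governing the StatefulSet.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def etcd_initial_cluster(pod_dns_names: list[str], peer_port: int = 2380) -> str:
    """Build a value for etcd's --initial-cluster flag from stable pod names.

    The member name is taken to be the pod name (the first DNS label).
    Plain http and port 2380 are assumptions made for this sketch.
    """
    members = []
    for dns_name in pod_dns_names:
        member_name = dns_name.split(".", 1)[0]
        members.append(f"{member_name}=http://{dns_name}:{peer_port}")
    return ",".join(members)


if __name__ == "__main__":
    # Hypothetical three-member etcd cluster in an "infra" namespace.
    pods = statefulset_pod_dns("etcd", "etcd-headless", "infra", replicas=3)
    for name in pods:
        print(name)
    print(etcd_initial_cluster(pods))
```

Because the names depend only on the StatefulSet name and the ordinal, a pod that is rescheduled onto another machine comes back under the same identity, so the peer list never has to be rewritten.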

Advantages of Kubernetes

In addition to solving the pain points above, the Figma team saw several benefits from migrating to Kubernetes.

  • Avoiding Vendor Lock-In - With Elastic Kubernetes Service, Figma would be in a good middle ground. AWS would handle the infrastructure and control plane, which reduces operational headaches, while all of Figma's services would be written to run generically on Kubernetes. This would make it easy to move to another Kubernetes platform (or self-host) if they wanted.

  • CNCF Ecosystem - With Kubernetes, it would be much easier to install and manage open source tech from the Cloud Native Computing Foundation (CNCF). Figma was particularly interested in solutions for auto-scaling and service meshes.

  • Kubernetes is Popular - Many large companies run huge workloads on Kubernetes. This de-risks the platform and also makes it easier to hire engineers with prior experience scaling on Kubernetes.

Based on this, Figma created a roadmap to migrate to Kubernetes in 2023.

Migrating to Kubernetes

Migrating the most crucial Figma services over to Kubernetes took months to accomplish.

Here are some techniques the Figma team used to make the migration easier.

  • Investing in Load Testing - The engineering team created a “Hello World” service and scaled it up to simulate the largest services at Figma. This helped them understand how the cluster functioned at scale and what problems they might face during the migration. 

  • Rolling out Incrementally - Figma wanted a way to incrementally shift traffic from ECS to EKS, with the ability to roll back quickly. They accomplished this with weighted DNS: they set up separate DNS records for the ECS-hosted and EKS-hosted versions of each service and gradually shifted incoming traffic from ECS to EKS. AWS offers a weighted routing feature in Route 53 that you can use to implement this.

  • Running Services Early - The engineers wanted to put real workloads on Kubernetes as soon as possible. Doing this taught them much more than just testing in a staging environment. As mentioned earlier, they used weighted DNS to accomplish this safely. 

  • Standardizing Kubernetes YAML - To enforce consistency, Figma created a golden path for defining and configuring services (with customization options for special cases). 
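
The weighted DNS rollout described above can be sketched in a few lines of Python. This is a simulation of the idea, not Figma's tooling or the Route 53 API: with weighted routing, each record receives traffic in proportion to its weight divided by the sum of all weights. The record names `ecs` and `eks` and the rollout steps below are hypothetical.

```python
import random
from collections import Counter


def traffic_share(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of DNS answers per record: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one record needs a non-zero weight")
    return {name: weight / total for name, weight in weights.items()}


def resolve(weights: dict[str, int], rng: random.Random) -> str:
    """Pick one record for a single DNS query, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical rollout plan: start with all traffic on ECS, end with all on EKS.
# Rolling back means restoring an earlier step, e.g. {"ecs": 100, "eks": 0}.
ROLLOUT_STEPS = [
    {"ecs": 100, "eks": 0},
    {"ecs": 95, "eks": 5},
    {"ecs": 50, "eks": 50},
    {"ecs": 0, "eks": 100},
]

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the simulation is repeatable
    for step in ROLLOUT_STEPS:
        hits = Counter(resolve(step, rng) for _ in range(10_000))
        print(step, traffic_share(step), dict(hits))
```

One caveat with this approach is that resolvers cache DNS answers for the record's TTL, so a weight change (including a rollback) only takes full effect once cached answers expire.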

Results

After a year, the Figma team had migrated a majority of their highest-priority services to EKS.

Currently, they’re working on migrating the rest of the services and adding improvements. Some of the focus areas are supporting auto-scaling, integrating a service mesh and moving resources out of Terraform to be managed with AWS Controllers for Kubernetes (ACK).

Tech Snippets