Infrastructure as Code at Shopify


Today we’ll be talking about

  • Managing Infrastructure with Code at Shopify
    • Shopify’s Contact Center team relies on Twilio for routing tasks to their customer service reps. However, using Twilio’s website for configuration was becoming a bottleneck.
    • Engineers used Terraform, an open source Infrastructure as Code tool, to manage their Twilio configuration using code.
    • We’ll talk about how engineers created a Twilio Terraform Provider and the final results with Terraform.
  • Why Safe Programming Matters
    • Programming languages make different tradeoffs, and some languages make it easier to write bug-free code than others.
    • Programming safety can be divided into Memory Safety, Type Safety and Thread Safety.
    • We’ll go through all 3 of these categories.
  • Plus, some tech snippets on
    • Practical Frontend Architecture using ReactJS, NextJS, GraphQL and TypeScript
    • 10 Rules for Negotiating a Job Offer
    • A Gentle Introduction to Elliptic Curve Cryptography
    • Writing a Toy Traceroute from Scratch

We also have a solution to our last Microsoft interview question and a new question from Facebook.

Managing Infrastructure with Code at Shopify

Shopify is a tech company that helps businesses build e-commerce stores. If you want to build an e-commerce store to sell your handcrafted guitars, you could use Shopify to set up your website, manage customer information, handle payments/banking and more.

Jeremy Cobb is a software engineer at Shopify, where he works on the Contact Center team. They’re responsible for building the tooling that helps Shopify’s customer service team deal with all the support inquiries from businesses that use the platform.

He wrote a great blog post on how his team uses Terraform for configuration management. Terraform is an open source tool that lets you configure your infrastructure using code.

Here’s a summary

The Contact Center team builds the tooling that Shopify customer service agents use to handle support requests.

One tool the engineers rely on is Twilio’s TaskRouter service. Twilio is a company that builds programmable communication tools, so you can use Twilio’s API for sending emails, text messages, etc.

Shopify uses Twilio TaskRouter to handle routing communication tasks (voice, chat, etc.) to the most appropriate customer service agent based on a set of routing rules. For example, users in the US might get sent to a different customer service agent than users in Canada.

Previously, Shopify would configure these routing rules using Twilio’s website. However, the complexity of the rules grew and it became too much for a single person to manage.

Having multiple people manage the rules quickly became troublesome because the website doesn’t provide a clear history of changes or way to roll changes back.

To solve this, the Contact Center team decided to use Terraform to manage the configuration of Twilio TaskRouter.

Terraform is an open source tool that lets you write code to manage and configure your infrastructure and tooling. You can write the code in JSON or in HashiCorp Configuration Language (HCL), the configuration language that Terraform uses natively.

In order to use Terraform to manage your infrastructure, you need 3 things.

  1. A reliable API - The infrastructure/service (Twilio in Shopify’s case) will need a reliable API that you can send requests to in order to make changes. If the only way of configuring your infrastructure is through their website, then it’s not possible to use any infrastructure as code solutions.
  2. A Terraform Provider - In order to consume the infrastructure’s API, Terraform needs a Provider Plugin, which lets Terraform interface with external APIs. The Provider Plugin contains CRUD (create, read, update, delete) instructions for all the resources that the Provider manages. For example, the AWS Terraform Provider Plugin will have CRUD instructions for AWS EC2 resources. Official Providers are available for all the major cloud platforms (GCP, AWS, etc.). You can also create your own Provider for an external API that doesn’t already have one available.
  3. A Client Library - You’ll also want a separate library that the Terraform Provider can interface with to make API requests to the external infrastructure API. You could create a Terraform Provider Plugin that makes the API calls itself, but this is highly discouraged. It’s better to modularize the API calls in a separate client library.

So, Twilio TaskRouter provided a reliable API that the Shopify team could use to manage their rule configuration.

There was no TaskRouter Terraform Provider available at the time (Twilio has since developed their own) so the Shopify team built one themselves.

The Provider defines how Terraform should manage Twilio TaskRouter. It contains resource files for every type of resource in TaskRouter that Terraform has to manage; each resource file has CRUD instructions that tell Terraform how to manage it.

The Provider also has import instructions that let Terraform import existing infrastructure. This is useful if you already have infrastructure running and want to start using Terraform to manage it.

The Shopify team also built a client library that the Terraform Provider would use to make HTTP calls to Twilio’s API.
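To make the split between the Provider and the client library concrete, here is a rough conceptual sketch in Python. Real Terraform Providers are plugins built against Terraform's own plugin tooling, so none of this is Shopify's code or Terraform's API. Every name here (`FakeTaskRouterClient`, `WorkflowResource`, the `WF` id prefix) is hypothetical, and the client stores data in a dictionary instead of making HTTP calls.

```python
from itertools import count


class FakeTaskRouterClient:
    """Hypothetical client library: the only layer that would talk to the remote API.

    This stand-in keeps workflows in a dict instead of making HTTP requests.
    """

    def __init__(self):
        self._workflows = {}
        self._ids = count(1)

    def create_workflow(self, name, config):
        workflow_id = f"WF{next(self._ids)}"
        self._workflows[workflow_id] = {"name": name, "config": config}
        return workflow_id

    def get_workflow(self, workflow_id):
        return self._workflows.get(workflow_id)

    def update_workflow(self, workflow_id, name, config):
        self._workflows[workflow_id] = {"name": name, "config": config}

    def delete_workflow(self, workflow_id):
        self._workflows.pop(workflow_id, None)


class WorkflowResource:
    """Hypothetical resource file: CRUD instructions that delegate to the client."""

    def __init__(self, client):
        self.client = client

    def create(self, desired):
        return self.client.create_workflow(desired["name"], desired["config"])

    def read(self, workflow_id):
        return self.client.get_workflow(workflow_id)

    def update(self, workflow_id, desired):
        self.client.update_workflow(workflow_id, desired["name"], desired["config"])

    def delete(self, workflow_id):
        self.client.delete_workflow(workflow_id)
```

The resource never builds a request itself. It only maps each CRUD operation onto a client call, which is the modularization described in the list of requirements above.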

Using Terraform

With Terraform set up, Shopify could stop relying on Twilio’s website for configuring TaskRouter rules and instead write them using HCL (Terraform’s domain specific language).

This made changes to the infrastructure much easier to see and allowed Shopify to apply software engineering practices like pull requests and code reviews to their TaskRouter rules.

It also allowed non-developers to start configuring rule changes themselves. Business and support teams could write rule changes in HCL and create PRs instead of making a request and waiting for a developer to log onto Twilio’s website and change the config manually.

For more details on how Shopify created the Provider and on how they use Terraform, you can read the full article here.

Quastor is a free Software Engineering newsletter that sends out deep dives on interesting tech, summaries of technical blog posts, and FAANG interview questions and solutions.

Tech Snippets

  • Practical Frontend Architecture - Jared Gorski is a senior software engineer at Liferay. He wrote a great blog post that goes through the various parts of their frontend architecture and the purpose each one serves. He uses the RANT stack (ReactJS + TypeScript + NextJS) along with GraphQL.
  • 10 Rules for Negotiating a Job Offer - This is a great blog post on things you should keep in mind when negotiating your salary for a new job. One good rule mentioned is to understand what the company values. For some companies, negotiating a higher signing bonus will be a lot easier than asking for a higher salary. Many companies will also be a lot more willing to give you more stock rather than bumping up your cash compensation.
  • A Gentle Introduction to Elliptic Curve Cryptography - Elliptic curve cryptography is used in TLS, PGP and SSH. It’s also used heavily in cryptocurrencies like Bitcoin and Ethereum. This is a great 4-part blog series that gives an introduction to elliptic curves and their use in cryptography.
  • Writing a Toy Traceroute from Scratch - You can use traceroute to trace the route of packets from your computer to another computer. If you want to learn how traceroute works, this is a great blog post that dives into that. There’s also Python code that creates a toy version of Traceroute.

Why Safe Programming Matters

Deepu Sasidharan is a developer at Okta. He wrote a great blog post on programming safety and why it matters. 

Here’s a summary

When you’re talking about safety in programming, you’re primarily talking about three things

  1. Memory Safety
  2. Type Safety
  3. Thread Safety

Memory Safety

When you access a variable/array in a memory-safe language, it will check to make sure that you are indeed accessing what you meant to access.

Memory-unsafe languages (like C and C++) do not provide built-in protections against accessing or overwriting data in other parts of memory.

This can lead to tons of different kinds of hairy bugs/exploits like

  • Buffer Overflow - When your program is writing data to a buffer, it can overrun the buffer’s boundary and overwrite adjacent memory locations.
  • Dangling Pointers - When the object that a pointer references has been freed but the pointer still holds its old address. If the program later writes something new to that spot in memory, reading or writing through the stale pointer can cause unpredictable behavior.
  • Memory Leaks - When your program fails to release memory it no longer needs, so it ends up using more memory than is actually required.

And many more.
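As a small illustration of the bounds checking described above, here is a sketch in Python, which is a memory-safe language. The helper name `write_at` and the `neighbor` list are made up for this example. The out-of-bounds write is rejected with an `IndexError` instead of silently overwriting whatever sits next to the buffer.

```python
def write_at(buffer, index, value):
    """Try to write value at index; return True on success, False if out of bounds."""
    try:
        buffer[index] = value
        return True
    except IndexError:
        # The runtime checked the index against the buffer's length and refused.
        return False


buffer = [0] * 4      # valid indices are 0..3
neighbor = [9, 9]     # unrelated data that an overflow must not touch

ok_in_bounds = write_at(buffer, 3, 7)      # last valid slot
ok_out_of_bounds = write_at(buffer, 4, 7)  # one past the end
```

In C, the equivalent write one past the end of a 4-element array compiles fine and is undefined behavior at runtime, which is exactly how buffer overflows happen.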

Memory safety issues account for a large share of the security vulnerabilities found in major software projects.

If you look at the stats,

  • About 70% of the vulnerabilities that Microsoft assigns a CVE (Common Vulnerabilities and Exposures) number each year are memory-safety issues
  • Two-thirds of Linux kernel vulnerabilities are memory-safety issues
  • 90% of Android vulnerabilities are memory-safety issues

Type Safety

In a type-safe language, the compiler will validate types while compiling and throw an error if you try to assign the wrong type to a variable.

On the other hand, a language with poor type safety will not run these checks and will also do unintuitive things when you combine multiple types (JavaScript’s type coercion gives a ton of examples).

A lack of type safety can lead to tons of bugs at runtime that are hard to diagnose.
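Here is a rough sketch of the difference, with one caveat: Python is not statically typed, so it rejects the bad operation at runtime rather than at compile time. The function name `add` is just for illustration. The type hints mean a static checker such as mypy would flag the bad call before the program ever runs.

```python
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


try:
    # Passing a string where an int is expected. A static type checker
    # would report this line; at runtime Python raises TypeError.
    add("1", 1)  # type: ignore[arg-type]
    outcome = "coerced"
except TypeError:
    outcome = "rejected"
```

For comparison, JavaScript evaluates `"1" + 1` to the string `"11"` without any error, which is the kind of coercion mentioned above.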

Thread Safety

In a thread-safe language, you can access or modify the same memory from multiple threads simultaneously without worrying about data races (where the behavior of your program changes depending on which thread changes the memory first).

This is achieved using things like mutexes, thread synchronization and more.
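As a minimal sketch of the mutex approach, here is a shared counter in Python guarded by `threading.Lock`. The `Counter` class and `run` helper are names invented for this example. Note that Python leaves it up to the programmer to remember the lock, whereas a language that enforces thread safety would refuse to compile unsynchronized access.

```python
import threading


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def run(num_threads=8, increments=10_000):
    """Increment one shared counter from many threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run(8, 10_000)` always returns 80,000 no matter how the threads are scheduled.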

Deepu then discusses how Rust addresses all of these safety issues.

Read the full blog post for more.

Interview Question

Write an algorithm that searches for a target value in an m x n integer matrix.

The matrix will have the following properties

  • integers in each row are sorted in ascending order from left to right
  • integers in each column are sorted in ascending order from top to bottom

Previous Question

As a refresher, here’s the previous question

You are given the root to a binary search tree.

Find the second largest node in the tree and return it.


The brute force solution there would be to just do an in-order traversal of the tree.

Then, when we have a list of all the ordered nodes in the tree, we can just return the second largest element in the list.
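Here is one way the brute force approach could look in Python. The `Node` class and the function names are assumptions for this sketch, since the question doesn't specify a tree representation.

```python
class Node:
    """A binary search tree node (hypothetical shape for this example)."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def in_order(node, out):
    """Append every node in the subtree to out, smallest value first."""
    if node is None:
        return
    in_order(node.left, out)
    out.append(node)
    in_order(node.right, out)


def second_largest_brute_force(root):
    """Return the node with the second largest value, or None if there isn't one."""
    nodes = []
    in_order(root, nodes)
    return nodes[-2] if len(nodes) >= 2 else None
```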

This solution takes O(n) time and space. Can we do better?

We can!

Let’s say the problem said find the second smallest node in the tree and return it. How would we do that? Well, we could just quit our in-order traversal once we’ve visited two nodes!

However, our problem asks for the second largest node in the tree. So, how can we modify our in-order traversal to solve this variant with the same approach?

We reverse our in-order traversal!

With in-order traversal, we

  1. Traverse the left subtree
  2. Process the node
  3. Traverse the right subtree

We’ll reverse it! In a reverse in-order traversal, we

  1. Traverse the right subtree
  2. Process the node
  3. Traverse the left subtree

In a Binary Search Tree, reverse in-order traversal will give us the nodes sorted in largest-to-smallest order!

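Now, we can just quit once we’ve processed two nodes and return the second largest node in our tree. Because we stop early, we only walk down a single path of the tree, so this takes O(h) time and space, where h is the height of the tree.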
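Here is a sketch of the reverse in-order approach in Python. It uses an explicit stack instead of recursion so that stopping early is a plain `return`. The `Node` class and the name `second_largest` are assumptions for this example.

```python
class Node:
    """A binary search tree node (hypothetical shape for this example)."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def second_largest(root):
    """Return the node with the second largest value, or None if there isn't one."""
    visited = 0
    stack = []
    node = root
    while stack or node is not None:
        # Step 1: go as far right as possible, remembering the path.
        while node is not None:
            stack.append(node)
            node = node.right
        # Step 2: process the next node in largest-to-smallest order.
        node = stack.pop()
        visited += 1
        if visited == 2:
            return node
        # Step 3: continue into the left subtree.
        node = node.left
    return None
```

For a tree with root 5, left child 3 and right child 8 (whose children are 7 and 9), the traversal visits 9 first and then returns the node holding 8.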
