☕ Remote First / Leader Election Explained

Leader Election in System Design Explained. An Amazon Interview Question. The latest Silicon Valley company to shift to a remote-first workforce. Spotify open-sources an internal tool.

Hey,

Hope you’re having a fantastic day!

Interview Problem

Given an integer K, construct all possible binary search trees with K nodes.

We’ll send a detailed solution tomorrow, so move our emails to your primary inbox to make sure you don’t miss it!

Industry News

Spotify open-sources Klio, a powerful way to process audio files at scale

Klio is an ecosystem that allows you to process audio files easily and at scale. You can take audio files from event inputs (a Pub/Sub input subscription), download them onto worker machines, run processing algorithms on the files (such as ffmpeg or trained ML models), and then save the output to the data store of your choice.

This kind of processing gets tricky fast when you’re handling terabytes of data. Klio helps by providing a simple framework for it, so the work can be done in-house. Data scientists and researchers can then do their audio processing on the same infrastructure that engineers use for production systems.

The tool is built on Apache Beam and was developed by Spotify to help their engineers and audio scientists develop and deploy next-generation audio algorithms. An example of such an algorithm is Spotify’s Sing Along feature, a karaoke-like feature that uses AI to separate the vocals from the instrumental track within minutes of a song joining the catalog. Processing every song is no easy feat, as 40,000 songs are added per day to Spotify’s database of over 60 million songs.

Spotify has now open-sourced the tool and also has more usability features on the roadmap. The long-term goal is to allow non-technical people, like product managers, to utilize Klio and the powerful benefits it provides.

Dropbox announces that they will go remote-first

The COVID pandemic has caused a massive shift in the way that we all work. Dropbox has joined the trend of making this shift permanent, with their announcement that they will go “remote-first” after the pandemic. This means that all employees will work remotely (unlike Microsoft or Facebook, which have given employees the option of going remote). Keep in mind, Dropbox is not a small tech startup. The company has more than 2,800 employees and offices in several cities across the world.

In order to accommodate employees who prefer in-office work, Dropbox will be maintaining offices as “Dropbox Studios”. These offices will have a smaller footprint (since fewer employees are coming in), but all current employees will have access to a Dropbox Studio.

Dropbox is also making a big shift in how employees work, with “non-linear” workdays. Dropbox will have core collaboration hours that overlap between timezones, so employees can easily arrange meetings. Outside those collaboration hours, however, employees are free to design their own schedules.

This makes Dropbox one of the largest companies to adopt such a huge change. Another tech company that made a similar move is Coinbase, a cryptocurrency exchange. While they’re less than half the size of Dropbox (1,123 employees), they also weighed the tradeoffs involved with remote-first work and decided to pull the trigger.

Previous Solution

As a refresher, here’s the previous question

What is Leader Election in the context of System Design and Distributed Computing?

Solution

In large-scale Distributed Systems, you should always expect one of your servers to go down. How do you deal with servers randomly going down?

Redundancy!

Introduce redundancy into your system and have multiple nodes that accomplish the same task.

Now your problem shifts to coordinating all these nodes. You don’t want multiple nodes doing a task that only one node is supposed to do. For example, if you have redundancy in your payment processor, you don’t want multiple nodes processing the same payment, because you would accidentally charge your customer multiple times.

Leader Election is the solution to this. Your nodes first elect a leader, and then the leader alone carries out the task in question.
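
Here is a minimal sketch of that idea in Python. Everything in it is a hypothetical illustration: the Node class, the elect_leader function, and the "healthy node with the lowest ID wins" rule are assumptions made for the example, and it runs in a single process rather than across real machines. It only shows that once all nodes agree on one leader, each payment is handled exactly once, even after the first leader goes down.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    alive: bool = True
    charges: list = field(default_factory=list)

    def process_payment(self, payment_id: str, leader_id: int) -> None:
        # Only the elected leader does the work; followers stay idle.
        if self.alive and self.node_id == leader_id:
            self.charges.append(payment_id)


def elect_leader(nodes):
    """Toy rule (an assumption): the healthy node with the lowest ID wins."""
    healthy = [n.node_id for n in nodes if n.alive]
    if not healthy:
        raise RuntimeError("no healthy nodes available")
    return min(healthy)


def run_payment(nodes, payment_id):
    """Elect a leader, then offer the payment to every node."""
    leader_id = elect_leader(nodes)
    for node in nodes:
        node.process_payment(payment_id, leader_id)
    return leader_id


nodes = [Node(1), Node(2), Node(3)]
first_leader = run_payment(nodes, "payment-42")

# Simulate the leader crashing; the remaining nodes pick a new leader.
nodes[0].alive = False
second_leader = run_payment(nodes, "payment-43")

total_charges = sum(len(n.charges) for n in nodes)
print(first_leader, second_leader, total_charges)
```

The toy rule only works because every node here sees the same list of healthy nodes. Real systems cannot assume that, which is where the difficulty comes from.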

However, the coordination problem in Leader Election is hard to solve. Getting multiple machines to reach and maintain consensus on who the leader is becomes very difficult once machines can crash and network messages can be delayed or lost.

To solve this, several consensus algorithms are used in the industry. Some examples are ZAB, Raft, and Paxos. You’ll never be expected to implement these consensus algorithms yourself, but you’ll typically have to use a service that implements one of them under the hood.
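
To give a feel for what these algorithms guarantee, here is a heavily simplified sketch of one idea that Raft relies on: a candidate becomes leader only with votes from a majority of the cluster, and each node votes at most once per term. The Voter class and the campaign function are hypothetical names for this example, and the sketch leaves out networking, timeouts, and log replication entirely, so treat it as an illustration rather than an implementation of Raft.

```python
class Voter:
    def __init__(self, name):
        self.name = name
        self.voted_for = {}  # term -> candidate this node voted for

    def request_vote(self, term, candidate):
        # Grant the vote only if this node has not voted in this term yet
        # (or has already voted for this same candidate).
        granted_to = self.voted_for.setdefault(term, candidate)
        return granted_to == candidate


def campaign(candidate, term, reachable, cluster_size):
    """Return True if the candidate wins a strict majority of the cluster."""
    votes = sum(v.request_vote(term, candidate) for v in reachable)
    return votes > cluster_size // 2


cluster = [Voter(f"n{i}") for i in range(5)]

# Candidate A reaches n0, n1, n2 first and collects 3 of 5 votes.
a_elected = campaign("A", 1, cluster[:3], len(cluster))

# Candidate B asks everyone in the same term, but only n3 and n4 are
# still free to vote, so B cannot reach a majority.
b_elected = campaign("B", 1, cluster, len(cluster))

# In a later term, every node may vote again.
b_elected_later = campaign("B", 2, cluster, len(cluster))

print(a_elected, b_elected, b_elected_later)
```

Because any two majorities of the same cluster share at least one node, and that node votes only once per term, two candidates cannot both win the same term.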

One type of service that implements these algorithms is a Distributed Key/Value Store that provides a library for distributed coordination.

Examples are etcd, with its coordination primitives, and ZooKeeper, with its library of coordination recipes.

These Distributed Key/Value stores provide High Availability and Strong Consistency.

Now all your nodes can use the Key/Value Store to agree on who the current leader is and to elect future leaders. Because of the Strong Consistency guarantees of these Key/Value Stores, every node reads the same value for the leader, so you don’t have to worry about two nodes that both think they are the new leader.
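
A common way to build this is for every node to try to write its own name under a single "leader" key with an expiry time, and to keep renewing it while healthy. The sketch below shows the pattern with an in-memory stand-in for the store. The LeaseStore class and its put_if_absent, refresh, and get methods are hypothetical and are not the etcd or ZooKeeper API; real stores offer comparable building blocks, such as leases and transactions in etcd or ephemeral nodes in ZooKeeper. Time is passed in explicitly so the example is deterministic.

```python
import threading


class LeaseStore:
    """In-memory stand-in for a strongly consistent key/value store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, expires_at)

    def put_if_absent(self, key, value, ttl, now):
        """Atomically claim `key` unless an unexpired value already holds it."""
        with self._lock:
            current = self._data.get(key)
            if current is not None and current[1] > now:
                return False
            self._data[key] = (value, now + ttl)
            return True

    def refresh(self, key, value, ttl, now):
        """Extend the lease, but only if `value` still holds it."""
        with self._lock:
            current = self._data.get(key)
            if current is None or current[0] != value or current[1] <= now:
                return False
            self._data[key] = (value, now + ttl)
            return True

    def get(self, key, now):
        """Return the current holder of `key`, or None if the lease expired."""
        with self._lock:
            current = self._data.get(key)
            if current is None or current[1] <= now:
                return None
            return current[0]


store = LeaseStore()
TTL = 10  # seconds; an arbitrary value chosen for the example

# t=0: both nodes race for the same key; only one write succeeds.
a_won = store.put_if_absent("leader", "node-a", TTL, now=0)
b_won = store.put_if_absent("leader", "node-b", TTL, now=0)

# t=5: node-a is healthy and renews its lease until t=15.
a_renewed = store.refresh("leader", "node-a", TTL, now=5)

# t=20: node-a has crashed and stopped renewing, so node-b can take over.
b_took_over = store.put_if_absent("leader", "node-b", TTL, now=20)
leader_now = store.get("leader", now=20)

print(a_won, b_won, a_renewed, b_took_over, leader_now)
```

The expiry is what lets the system recover: if the leader dies, its claim on the key lapses on its own and another node can win the next write.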