Howdy!
Today we’ll be talking about how Airbnb migrated their continuous delivery process to Spinnaker, plus a past tech dive on Google File System.
We have a new question from Microsoft and a solution to our last question on dynamic programming. We give the top down and the bottom up DP solutions. The bottom up solution is slightly more space efficient.
In 2020, the 3 co-founders of RocaNews quit their jobs because they hated the news. Don't we all?
The news was negative, partisan, and alarmist. They wanted news that didn't obsess over politics. They wanted to know what was going on in the world, but wanted facts – not opinions. So they created RocaNews.
The free newsletter covers the most interesting events happening in the world, and is designed to fit into your day.
Join the 1.2 million+ people who start their morning with RocaNews!
sponsored
Airbnb has recently migrated from a Ruby on Rails monolith to a Services-Oriented Architecture (SOA).
This migration helped Airbnb scale their application, but it also introduced new challenges.
One of these challenges was around Airbnb's Continuous Delivery process and how they had to adapt it to the new services architecture.
Jens Vanderhaeghe is a senior software engineer at Airbnb and Manish Maheshwari is a lead product manager. They wrote a great blog post on Airbnb's new continuous delivery process and how they migrated.
Here’s a summary.
Previously, Airbnb used an internal service called Deployboard to handle their deploys. Deployboard worked great when Airbnb was using a Ruby on Rails monolith, but over the past few years the company has shifted to a services-oriented architecture.
A services-oriented architecture means decentralized deployments, where individual teams own their own deployment pipelines.
Airbnb needed something more templated, so that each team could quickly get a standard, best-practices pipeline, rather than building their own service from scratch.
Spinnaker is an open source continuous delivery platform that was developed internally at Netflix and further extended by Google.
Airbnb decided to adopt Spinnaker because it gave each team a standardized, best-practices pipeline out of the box, along with features like automated canary analysis.
Migrating to Spinnaker
Airbnb has a globally distributed team of thousands of software engineers. Getting all of them to shift to Spinnaker would be a challenge.
They were particularly worried about the Long-tail Migration Problem, where they could get 80% of teams to switch over to the new deployment system but then struggle to get the remaining 20% to switch over.
Being forced to maintain two deployment systems can become very costly and is a reliability/security risk because the legacy system gets less and less maintenance/attention over time.
To prevent this, Airbnb had a migration strategy that focused on 3 pillars.
Focus on Benefits
Airbnb started by encouraging teams to adopt Spinnaker voluntarily.
They did this by first onboarding a small group of early adopters. They identified a set of services that were prone to causing incidents and switched those teams over to Spinnaker.
Spinnaker's automated canary analysis, along with the other features it provided, quickly demonstrated its value to those teams.
These early adopters ended up becoming evangelists for Spinnaker and spread the word to other teams at Airbnb organically. This helped increase voluntary adoption.
Automated Onboarding
As more teams started adopting Spinnaker, the Continuous Delivery team at Airbnb could no longer keep up with demand. Therefore, they started building tooling to automate the onboarding process to Spinnaker.
They created an abstraction layer on top of Spinnaker that let engineers make changes to the CD platform through code (Infrastructure as Code). This allowed all continuous delivery configuration to be source controlled and managed by Airbnb's tools and processes.
Data
The Continuous Delivery team also put a great amount of effort into clearly communicating the value-add of adopting Spinnaker.
They created dashboards for every service that adopted Spinnaker to show metrics like number of regressions prevented, increase in deploy frequency, etc.
Final Hurdle
With this 3 pillar strategy, the vast majority of teams at Airbnb had organically switched over to Spinnaker.
However, adoption began to tail off as the company reached ~85% of deployments on Spinnaker.
At this point, the team decided to switch strategy to avoid the long-tail migration problem described above.
Their new plan consisted of four steps:
1. Stop the bleeding - Stop any new services/teams from being deployed using the old continuous delivery platform.
2. Announce deprecation date - Announce a deprecation date for the old continuous delivery platform and add a warning banner at the top of its UI.
3. Send out automated PRs - Airbnb has an in-house refactoring tool called Refactorator, which they used to send automated PRs that made switching to Spinnaker easier.
4. Deprecate and post-deprecation - On the deprecation date, they had code in place that blocked deploys from the old continuous delivery platform. However, they kept exemptions in place for emergencies where the old system had to be used.
Conclusion
With this strategy, Airbnb was able to get to the 100% finish line in the migration.
This migration serves as the blueprint for how other infrastructure-related migrations will be done at Airbnb.
Read the full article for more details.
Write a function that sorts the elements in a stack so that the smallest elements are on top.
You can use an additional temporary stack, but you cannot copy the elements into any other data structure.
The stack supports the standard operations: push, pop, peek, and isEmpty.
As a refresher, here’s the previous question:
You are walking up a staircase with n steps.
You can hop either 1 step, 2 steps, or 3 steps at a time (you have long legs).
Write a function to count how many possible ways you can walk up the stairs.
Solution
We can solve this question with Dynamic Programming.
Suppose we have a function numSteps(n) that tells us the number of possible ways to walk up a staircase with n steps.
Since our last hop is either 1, 2, or 3 steps, numSteps(n) = numSteps(n - 1) + numSteps(n - 2) + numSteps(n - 3).
Therefore, with top down DP, we can recursively compute each of those three subproblems and add the results together, memoizing each subproblem so it's only computed once.
Our base case is n == 0, where we return 1; if n goes negative, we return 0 because that hop overshot the staircase.
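Here's a short Python sketch of the top down approach; numSteps follows the notation above, and lru_cache handles the memoization.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoize so each subproblem is only computed once
def numSteps(n):
    if n == 0:
        return 1   # exactly one way: we've reached the top
    if n < 0:
        return 0   # this hop overshot the staircase
    return numSteps(n - 1) + numSteps(n - 2) + numSteps(n - 3)

print(numSteps(4))  # 7
```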
For Bottom Up DP, we’d typically use a memo table.
However, in this question we’re just accessing the last 3 values (n - 1, n - 2, n - 3).
Therefore, we can solve this question using constant space by just using 3 variables to keep track of those values. We don’t have to use an entire array to maintain our memo table.
Here’s the Bottom Up DP code.
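A Python version of that idea, keeping only the three most recent values instead of a full memo table:

```python
def numSteps(n):
    # ways(i - 1), ways(i - 2), ways(i - 3), starting from ways(0) = 1
    one_back, two_back, three_back = 1, 0, 0
    for _ in range(n):
        current = one_back + two_back + three_back  # ways(i) = sum of the previous three
        one_back, two_back, three_back = current, one_back, two_back
    return one_back

print(numSteps(4))  # 7
```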
The time complexity is linear and the space complexity is constant.
Here’s one of our past tech dives in case you missed it!
In 1998, the first Google index had 26 million pages. In 2000, the Google index reached a billion web pages. By 2008, Google was processing more than 1 trillion web pages.
As you might imagine, the storage needs required for this kind of processing were massive and rapidly growing.
To solve this, Google built Google File System (GFS), a scalable distributed file system written in C++. Even in 2003, the largest GFS cluster provided hundreds of terabytes of storage across thousands of machines and it was serving hundreds of clients concurrently.
GFS is a proprietary distributed file system, so you’ll only encounter it if you work at Google. However, Doug Cutting and Mike Cafarella implemented Hadoop Distributed File System (HDFS) based on Google File System and HDFS is used widely across the industry.
LinkedIn recently published a blog post on how they store 1 exabyte of data across their HDFS clusters. An exabyte is 1 billion gigabytes.
In this post, we’ll be talking about the goals of GFS and its design. If you’d like more detail, you can read the full GFS paper here.
The main goal for GFS was that it be big and fast. Google wanted to store extremely large amounts of data and also wanted clients to be able to quickly access that data.
In order to accomplish this, Google wanted to use a distributed system built of inexpensive, commodity machines.
Using commodity machines is great because then you can quickly add more machines to your distributed system (as your storage needs grow). If Google relied on specialized hardware, then there may be limits on how quickly they can acquire new machines.
To achieve the scale Google wanted, GFS would have to use thousands of machines. When you’re using that many servers, you’re going to have constant failures. Disk failures, network partitions, server crashes, etc. are an everyday occurrence.
Therefore, GFS needed to have systems in place for automatic failure recovery. An engineer shouldn’t have to get involved every time there’s a failure. The system should be able to handle common failures on its own.
The individual files that Google wanted to store in GFS are quite big, typically multiple gigabytes each, and this affected the chunk sizes and I/O operation assumptions that Google made for GFS.
GFS is designed for big, sequential reads and writes of data. Most files are mutated by appending new data rather than overwriting existing data and random writes within a file are rare. Because of that access pattern, appending new data was the focus of performance optimization.
A GFS cluster consists of a single master node and multiple chunkserver nodes.
The master node maintains the file system’s metadata and coordinates the system. The chunkserver nodes are where all the data is stored and accessed by clients.
Files are divided into 64 megabyte chunks and assigned a 64 bit chunk handle by the master node for identification. The chunks are then stored on the chunkservers with each chunk being replicated across several chunkservers for reliability and speed (the default is 3 replicas).
The master node keeps track of the file namespace, the mappings from files to chunks, and the locations of all the chunks. It also handles garbage collection of orphaned chunks and chunk migration between the chunkservers. The master periodically communicates with all the chunkservers through HeartBeat messages to collect each chunkserver's state and give it instructions.
An interesting design choice is the decision to use a single master node. Having a single master greatly simplified the design since the master could make chunk placement and replication decisions without coordinating with other master nodes.
However, Google engineers had to make sure that the single master node doesn’t become a bottleneck in the system.
Therefore, clients never read or write file data through the master node. Instead, the client asks the master which chunkservers it should contact. Then, the client caches this information for a limited time so it doesn’t have to keep contacting the master node.
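As a rough illustration, here's a Python sketch of that read flow. The Client class, the master.lookup call, and the chunkserver read call are hypothetical stand-ins, not the real GFS client API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS files are divided into 64 MB chunks

class Client:
    """Illustrative GFS-style client read path (hypothetical helper APIs, not real GFS code)."""

    def __init__(self, master):
        self.master = master
        # (filename, chunk index) -> (chunk handle, replica chunkservers), cached for a limited time
        self.chunk_cache = {}

    def read(self, filename, offset, length):
        chunk_index = offset // CHUNK_SIZE  # which chunk holds this byte offset
        key = (filename, chunk_index)

        if key not in self.chunk_cache:
            # Only metadata goes through the master; the reply is cached so the
            # client doesn't have to keep contacting the master node.
            self.chunk_cache[key] = self.master.lookup(filename, chunk_index)

        chunk_handle, replicas = self.chunk_cache[key]
        # File data is read directly from one of the replica chunkservers, never from the master.
        return replicas[0].read(chunk_handle, offset % CHUNK_SIZE, length)
```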
A mutation is an operation that changes the contents or the metadata of a chunk (so a write or an append operation).
In order to guarantee consistency amongst the replicas, GFS applies each mutation to all of a chunk's replicas in the same order.
As stated before, each chunk will have multiple replicas. The master will designate one of these replicas as the primary replica.
Here are the steps for performing a mutation to a chunk:
1. The client asks the master which chunkserver holds the lease for the chunk (the primary) and for the locations of the other replicas.
2. The master replies with the identity of the primary and the locations of the secondary replicas, and the client caches this information.
3. The client pushes the data to all of the replicas.
4. Once all the replicas acknowledge receiving the data, the client sends a write request to the primary. The primary assigns the mutation a serial number and applies it to its own state.
5. The primary forwards the write request to all the secondary replicas, which apply the mutation in the same serial number order.
6. The secondaries reply to the primary once they've completed the operation.
7. The primary replies to the client, reporting any errors (the client retries the mutation if it failed).
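To make the ordering idea in steps 4 through 7 concrete, here's a toy Python sketch. The PrimaryReplica and SecondaryReplica classes are hypothetical illustrations and only model the serial-number ordering, not the data push, leases, or failure handling.

```python
class SecondaryReplica:
    """Toy secondary: applies mutations in the serial number order chosen by the primary."""

    def __init__(self):
        self.applied = []

    def apply(self, serial, data):
        self.applied.append((serial, data))
        return True  # acknowledge to the primary


class PrimaryReplica:
    """Toy primary: assigns serial numbers so every replica applies mutations in the same order."""

    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.next_serial = 0
        self.applied = []

    def write(self, data):
        serial = self.next_serial  # the primary picks the global order for this chunk
        self.next_serial += 1
        self.applied.append((serial, data))

        # Forward to every secondary and wait for all of them to acknowledge.
        acks = [s.apply(serial, data) for s in self.secondaries]
        return all(acks)  # only report success to the client if every replica applied it
```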
GFS organizes files hierarchically in directories and identifies them by pathnames, like a standard file system. The master node keeps track of the mappings between files and chunks.
GFS provides the usual operations to create, delete, open, close, read and write files.
It also has snapshot and record append operations.
Snapshot lets you create a copy of a file or directory tree at low cost.
Record append allows multiple clients to append data to a file concurrently and it guarantees the atomicity of each individual client’s append.
To learn more about Google File System, read the full paper here.
If you’d like to read about the differences between GFS and HDFS, you can check that out here.