Tech Dive - Containers
A dive into virtual machines, containers and Docker. Plus, how Slack manages infrastructure projects, a guide to onboarding software engineers and more.
Hey Everyone!
Today we’ll be talking about
A Tech Dive on Virtual Machines, Containers and Docker
Why should you care?
Deploying Code before VMs and the issues involved
Intro to VMs and their benefits
Intro to containers and their pros/cons compared to VMs
Intro to Docker and key concepts
Alternatives to Docker and the future of containers
Tech Snippets
5 properties of a healthy software project
Build a CDN from Scratch
The Ultimate Guide to Onboarding Software Engineers
How Slack Manages Infrastructure Projects
Tech Dive - Containers and Docker
One of the biggest trends in tech over the last decade has been the rise of containers and the meteoric growth they’ve seen in adoption. They’ve completely changed the way companies ship software and have helped catalyze new technologies/paradigms like microservices-based architectures and serverless computing.
Nearly all of the backend dev buzzwords you’ll see thrown around nowadays rely (in some form) on containers.
In this tech dive, we’ll first talk about the rise of containers and what problems they solve. Afterwards, we’ll delve into Docker, how it works and its associated ecosystem.
This is the first part of our pro article on Containers and Docker.
We’ll be sending out the full article tomorrow, so you can get the full content by subscribing to Quastor Pro here. Thanks for the support, I really appreciate it!
Why should you care?
There’s a constant influx of new technology being created every day… so why should you care about containers?
A container is a technology that allows you to package your application code, dependencies, environment variables, configuration settings, etc. into a single bundle.
You can then share this bundle (called a container image) with other developers who need to run your application. It makes deploying your code significantly easier.
Some benefits of using containers are
Consistent Development Environments
Simplified Dependency Management
Faster Onboarding for New Devs
Reproducible Builds
Easier Scalability
Portability
Easier Versioning & Rollbacks
Configuration Management
These are all benefits of using container technologies like Docker or LXC.
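To make “packaging everything into a single bundle” concrete, here’s a minimal sketch of what that workflow looks like with Docker. The base image, filenames and port below are placeholder assumptions for a small Python web app; the key idea is that the recipe, not the host machine, defines the environment.

```bash
# Write a minimal Dockerfile: the base OS layer, dependencies, app code
# and start command are all described in one recipe.
# (The base image and filenames here are hypothetical.)
cat > Dockerfile <<'EOF'
FROM python:3.11-slim
WORKDIR /app

# install dependencies first so this layer gets cached between builds
COPY requirements.txt .
RUN pip install -r requirements.txt

# copy in the rest of the application code
COPY . .
CMD ["python", "app.py"]
EOF

# build the bundle into a named, versioned container image
docker build -t myapp:1.0 .

# anyone with Docker can now run the exact same bundle
docker run -p 8000:8000 myapp:1.0
```

Pushing an image like `myapp:1.0` to a registry is what makes the versioning, rollback and onboarding benefits above fall out almost for free.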
However, these benefits didn’t start with containers. Many of them came in the early 2000s with the precursor to containers: virtual machines (VMs).
In this article, we’ll first talk about life before virtual machines, why they came about, how they changed the game and their pros/cons.
Then, we’ll delve into the pros/cons of containers and why they’re so widely adopted. After that, we’ll talk about Docker (an open-source platform for creating, deploying and running apps with containers) and its ecosystem of tooling: Dockerfiles, Docker Engine, registries and more.
We’ll also explore alternatives to Docker and why you’d consider those. We’ll end with a discussion about the future of containers and where the industry is heading.
Deploying your Code Prior to VMs
Let’s say you’re building a small web application where you’re using Postgres for storage and Redis for caching. You’d like to deploy this application so that other people on the internet can use it.
There are several issues you’ll have to deal with
Server Costs
Portability
Server Costs
Prior to VMs/containers, you’d probably have to buy your own dedicated machine to run your application.
You can’t run it on your personal machine, since your day-to-day tasks would interfere with the application (and vice versa).
You could consider running your application on your buddy’s powerful server that has some spare capacity. However, this comes with its own challenges.
Dependency Conflicts - Your friend’s server might already be running Redis/Postgres, but on older versions that are incompatible with your application.
Resource Contention - Maybe your friend’s application periodically spikes in CPU usage. You can’t allocate dedicated CPU/memory to each application, so their spike becomes your problem.
Security Issues - Your friend might have security vulnerabilities in their code. If their application gets hacked, your data could be leaked too, since both apps run on the same OS.
Unfortunately, your only option is to buy your own machine. Even worse, you’ll also have to plan for future user growth. If your user base doubles, it would be quite wasteful to have to throw out all the old hardware and buy a new, beefier server just to keep up.
In addition to the excessive spending, another issue you’ll deal with is portability.
Portability
Let’s say you want to send your web app to a QA tester for debugging. That tester will also need Postgres and Redis installed on their machine.
Before they can even run your application, they’ll have to set up all the dependencies, utilities and scripts that you’re using. They’ll also have to make sure the versioning is correct, environment variables are set, configure security policies, etc.
This process can be extremely tedious and error-prone. It leads to the classic “it works on my machine!” argument between developers and DevOps.
And in our example so far, we just have a single QA tester. What happens when you have to deploy this app across hundreds of web servers?
Manually installing the dependencies, config settings and application on each server is incredibly tedious and error-prone.
And whenever there are app updates, dependency upgrades or configuration changes, you’ll have to make sure they’re correctly rolled out across every server in your fleet.
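To make the pain concrete, here’s a rough sketch of what that manual rollout might look like as a script. Everything in it (hosts.txt, the package names, paths and version number) is hypothetical; the fragility is the point.

```bash
#!/usr/bin/env bash
# Hypothetical manual rollout: push the same dependencies and app build
# to every server in the fleet over SSH, one machine at a time.
set -euo pipefail

APP_VERSION="1.4.2"   # made-up version for illustration

while read -r host; do
  echo "deploying to $host"

  # install dependencies; versions can silently drift between servers
  ssh "$host" "sudo apt-get update -qq && sudo apt-get install -y postgresql redis-server"

  # copy the new build over and restart; a failure halfway through the
  # loop leaves the fleet half-upgraded
  scp "builds/app-$APP_VERSION.tar.gz" "$host:/opt/app/"
  ssh "$host" "cd /opt/app && tar xzf app-$APP_VERSION.tar.gz && sudo systemctl restart app"
done < hosts.txt
```

One flaky SSH connection, or one server that was set up slightly differently from the others, and you’re debugging snowflake machines by hand.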
VMs were created to address both of these problems.
Virtual Machines
Virtual Machines allow you to run multiple “virtual computers” on a single physical computer. Each of these virtual computers (virtual machines) will have its own operating system with virtualized hardware. In other words, each VM behaves as if it were a standalone physical computer with its own dedicated amount of CPU, RAM and disk storage.
Virtual machines work by relying on a software layer called a hypervisor. Hypervisors can be split into Type 1 hypervisors (which run directly on a bare-metal server) and Type 2 hypervisors (which run on top of a host operating system). All your virtual machines run on top of the hypervisor, which allocates the physical hardware resources across the VMs.
When you want to run your application with a hypervisor, you bundle a guest operating system along with your application code, dependencies, libraries, etc. into a VM image.
After you’ve created your VM image, you can back it up on AWS S3, distribute it across all the servers in your fleet, store it in an online repository and more (you can treat it just as any other file). When someone needs to run your application, they can just download the VM image and run it using a hypervisor.
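As a rough sketch of that workflow using open-source tooling (QEMU/KVM on Linux here; the disk size, ISO filename and bucket name are placeholders):

```bash
# create an empty 20 GB virtual disk in QEMU's copy-on-write qcow2 format
qemu-img create -f qcow2 app-server.qcow2 20G

# boot an installer ISO to set up the guest OS, your app and its
# dependencies inside the image; -m and -smp give the VM 2 GB of RAM
# and 2 virtual CPU cores
qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 \
  -drive file=app-server.qcow2,format=qcow2 \
  -cdrom ubuntu-22.04-live-server-amd64.iso

# the finished image is just a file: back it up to S3, copy it to other
# hosts, and boot it anywhere a hypervisor is available
aws s3 cp app-server.qcow2 s3://my-vm-images/
```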
Prior to the 1990s, VMs were mostly used with mainframes or in academic research. The first virtual machines were created at IBM to help them with time sharing for their massive mainframe servers.
In 1998, VMware was founded, and they released VMware Workstation in 1999. This let you run virtual machines on the x86 architecture, the most common computing platform for personal and enterprise machines, which made running VMs far easier and more commonplace.
With Virtual Machines, you now have the following benefits
Cost Efficiency - multiple users can run their applications on a single physical server, each inside their own virtual machine. This lets you use the machine’s resources much more effectively.
Isolation & Security - VMs provide a high degree of isolation from each other. If an application in one VM gets hacked, the applications in the other VMs on the same host are still safe.
Flexibility & Portability - You can create VM images with everything you need to run your application. Another developer can download your VM image, run it and immediately get set up.
Virtualization exploded in popularity in the 2000s and helped spur the growth of cloud computing. If you wanted to rent a server in the cloud, you no longer had to rent the entire machine. Instead, you could rent just a small portion of the server’s computing resources. Amazon was the first cloud provider to take advantage of this with the launch of EC2 in 2006 (and it’s done rather well for them).
However, as VMs proliferated, some issues began to come up.
The main problem was how heavyweight and compute-intensive VMs were.
Each VM requires its own dedicated OS, which makes every VM
Resource Intensive - each VM occupies a large footprint on the underlying host machine
Maintenance-Heavy - you need to dedicate time to OS-level patching and management for every VM
Slow to Boot - booting up a VM can take tens of seconds or more
However, for many uses of VMs, you don’t actually need the entire OS. You could use something far more lightweight.
This is where containers come into play.
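You can feel the weight difference yourself. Assuming you have Docker installed, the one-liner below starts a fresh container from a tiny Linux image, runs a command and tears everything down. The first run also pulls the small alpine image; runs after that typically finish in well under a second, versus the tens of seconds a full VM takes to boot.

```bash
# start a throwaway container, run one command, then remove the
# container automatically when it exits
time docker run --rm alpine echo "hello from a container"
```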
This is the first part of our tech dive on Containers and Docker. We’ll be sending out the full article (2500+ words) to Quastor Pro subscribers tomorrow.
We’ll talk about containers and their pros/cons compared to VMs, plus Docker and the key concepts you should know (registries, Dockerfiles, Docker Engine, the Open Container Initiative, Podman, containerd and more).