A Dive Into Load Balancers
Hey Everyone!
Today we’ll be talking about
A Dive Into Load Balancers
Purpose of Load Balancers and the differences versus an API Gateway
L4 vs. L7 Load Balancers
Load Balancing Strategies (Round Robin, Least Connections, Hashing and more)
Tech Snippets
A collection of design patterns for building reliable, scalable and secure applications in the cloud
How the TypeScript compiler works
How to get promoted quickly as a developer
Writing a toy traceroute from scratch
Plus a solution to our last coding interview question
How Cloud Infrastructure Companies Minimize Downtime
NetApp is a cloud services company that provides storage infrastructure for data management, protection, analysis and more. The company has more than 11,000 employees and generates more than 5 billion dollars in revenue.
It’s crucial that engineers at NetApp minimize any downtime, so they’ve built robust SRE tooling to quickly identify and manage any outages.
At the core of their infrastructure is a time series platform that detects adverse trends in real time and alerts the SRE team about any failures within the environment. This stack is built with InfluxDB, Grafana, Telegraf and more.
Dustin Sorge is the Lead Site Reliability Engineer at NetApp and he gave a talk on how they keep track of key metrics around performance, what the best practices and techniques are for monitoring memory and CPU usage and how they use InfluxDB to detect trends and coordinate fixes faster.
sponsored
Load Balancing Strategies
The Purpose of Load Balancers
As your backend gets more traffic, you’ll eventually reach a point where vertically scaling your web server (upgrading your hardware) becomes too costly. You’ll have to scale horizontally and create a server pool of multiple machines that are handling incoming requests.
A load balancer sits in front of that server pool and directs incoming requests to the servers in the pool. If one of the web servers goes down, the load balancer will stop sending it traffic. If another web server is added to the pool, the load balancer will start sending it requests.
Load balancers can also handle other tasks like caching responses, handling session persistence (send requests from the same client to the same web server), rate limiting and more.
Typically, the web servers are hidden in a private subnet (keeping them secure) and users connect to the public IP of the load balancer. The load balancer is the “front door” to the backend.
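As a sketch of this behavior, here's a minimal (hypothetical) server pool that stops routing to servers that fail health checks; the server names, the modulo routing and the health-check mechanics are invented for illustration:

```python
# Hypothetical sketch: a load balancer front-end that only routes to
# healthy servers in the pool. Server names and health-check logic
# are made up for illustration.
class ServerPool:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)

    def mark_down(self, server):
        # Health checks failed: stop sending this server traffic.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Server recovered, or was newly added to the pool.
        if server not in self.servers:
            self.servers.append(server)
        self.healthy.add(server)

    def route(self, request_id):
        # Pick among healthy servers only (simple modulo here;
        # real routing strategies are discussed below).
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers")
        return candidates[request_id % len(candidates)]
```

Clients only ever see the pool's single entry point; which machine actually serves a request is invisible to them.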
Load Balancer vs. API Gateway
When you’re using a service-oriented architecture, you’ll have an API gateway that directs requests to the corresponding backend service. The API gateway also provides features like rate limiting, circuit breakers, monitoring, authentication and more, and it can act as the front door for your application instead of a load balancer.
An API gateway can cover much of what a load balancer provides, but a load balancer will usually be cheaper if you’re not using the extra functionality the gateway offers.
Here’s a great blog post that gives a detailed comparison of AWS API Gateway vs. AWS Application Load Balancer.
Types of Load Balancers
When you’re adding a load balancer, there are two main types you can use: layer 4 load balancers and layer 7 load balancers.
This is based on the OSI Model, where layer 4 is the transport layer and layer 7 is the application layer.
The main transport layer protocols are TCP and UDP, so an L4 load balancer makes routing decisions based on the packet headers for those protocols: the IP addresses and ports. You’ll frequently see the terms “4-tuple” or “5-tuple” hash when reading about L4 load balancers.
This hash is based on the 5-tuple that identifies a TCP/UDP connection:
Source IP
Source Port
Destination IP
Destination Port
Protocol Type
With a 5-tuple hash, you use all five of those fields to create a hash and then use that hash to determine which server to route the request to. A 4-tuple hash uses four of them.
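As an illustration (not how any particular load balancer implements it), a 5-tuple hash can be sketched in a few lines. SHA-1 stands in here for the much faster non-cryptographic hashes real load balancers use:

```python
import hashlib

# Sketch of L4-style tuple hashing: hash the connection's 5-tuple and
# use the result to pick a backend. A 4-tuple hash would simply drop
# one of the fields (often the protocol).
def pick_backend(src_ip, src_port, dst_ip, dst_port, proto, num_servers):
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{proto}".encode()
    digest = hashlib.sha1(key).digest()
    # Take 8 bytes of the digest as an integer, then map it to a server.
    return int.from_bytes(digest[:8], "big") % num_servers
```

Every packet on the same connection carries the same 5-tuple, so it always lands on the same backend, which is what makes this scheme work for connection-oriented traffic.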
Layer 7 load balancers operate at the application layer, so they have access to the HTTP headers. They can read data like the URL, cookies, content type and other headers, and an L7 load balancer can consider all of these things when making routing decisions.
Popular load balancers like HAProxy and NGINX can be configured to run at layer 4 or layer 7. AWS’s Elastic Load Balancing service provides the Application Load Balancer (ALB) and the Network Load Balancer (NLB), where ALB is layer 7 and NLB is layer 4 (there’s also the Classic Load Balancer, which supports both).
The main benefit of an L4 load balancer is that it’s quite simple. It uses just the IP address and port to make its decision, so it can handle a very high rate of requests per second. The downside is that it has no ability to make smarter load balancing decisions, and things like caching responses aren’t possible since it never sees the HTTP payload.
On the other hand, layer 7 load balancers can be a lot smarter and forward requests based on rules set up around the HTTP headers and the URL parameters. Additionally, you can do things like cache responses for GET requests for a certain URL to reduce load on your web servers.
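A hypothetical sketch of what L7-style routing rules look like; the pool names and the rules themselves are invented for illustration:

```python
# Sketch of L7 routing: inspect the request's URL path and headers
# and pick a backend pool. Pool names and rules are illustrative.
def route_request(path, headers):
    if path.startswith("/api/"):
        return "api-pool"
    if headers.get("Content-Type", "").startswith("video/"):
        return "media-pool"
    # Everything else goes to the default web servers.
    return "web-pool"
```

None of these decisions are possible at L4, where the balancer sees only IP addresses and ports.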
The downside of L7 load balancers is that they can be more complex and computationally expensive to run. However, CPU and memory are now fast and cheap enough that the performance advantage of L4 load balancers has become negligible in most situations.
Therefore, most general purpose load balancers operate at layer 7. However, you’ll also see companies use both L4 and L7 load balancers, where the L4 load balancers are placed before the L7 load balancers.
Facebook has a setup like this, where they use Shiv (an L4 load balancer) in front of Proxygen (an L7 load balancer). You can see a talk about this setup here.
Load Balancing Algorithms
Round Robin - This is usually the default load balancing method: servers are selected in round robin order, so you assign requests one by one to each web server and cycle back to the first server after going through the list. Many load balancers also support weighted round robin, where you assign each server a weight and distribute work proportionally (a more powerful machine gets a higher weight).
An issue with Round Robin scheduling comes up when the incoming requests vary in processing time. Round Robin doesn’t consider how much computation a request needs; it just sends it to the next server in the rotation. If the next server is stuck processing a time-consuming request, Round Robin will send it another job anyway. This can lead to work skew, where some machines in the pool run at far higher utilization than others.
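Both plain and weighted round robin can be sketched in a few lines (server names and weights here are illustrative, and this simple expansion hands a weight-2 server its two requests back to back rather than interleaving them):

```python
import itertools

# Plain round robin: cycle through the servers in order forever.
def round_robin(servers):
    return itertools.cycle(servers)

# Weighted round robin: a server with weight w appears w times per
# cycle, so it receives proportionally more requests.
def weighted_round_robin(weights):
    # weights: dict mapping server -> integer weight
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)
```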
Least Connections (Least Outstanding Requests) - With this strategy, you look at the number of active connections/requests a web server has and also look at server weights (based on how powerful the server's hardware is). Taking these two into consideration, you send your request to the server with the least active connections / outstanding requests. This helps alleviate the work skew issue that can come with Round Robin.
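A sketch of weighted least connections, picking the server with the fewest active connections relative to its weight (server names and numbers are illustrative):

```python
# Weighted least connections: normalize each server's active
# connection count by its weight (bigger weight = more powerful
# hardware) and pick the least loaded one.
def least_connections(active, weights):
    # active:  dict mapping server -> current connection count
    # weights: dict mapping server -> capacity weight
    return min(active, key=lambda s: active[s] / weights[s])
```

Unlike round robin, a server bogged down by a slow request accumulates connections and naturally stops receiving new work until it catches up.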
Hashing - In some scenarios, you’ll want certain requests to always go to the same server in the server pool. You might want all GET requests for a certain URL to go to a certain server in the pool or you might want all the requests from the same client to always go to the same server (session persistence). Hashing is a good solution for this.
You can define a key (like request URL or client IP address) and then the load balancer will use a hash function to determine which server to send the request to. Requests with the same key will always go to the same server, assuming the number of servers is constant.
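A minimal sketch of key-based hashing; hashlib is used here because Python's built-in hash() isn't stable across processes, and a load balancer needs the same key to map to the same server every time:

```python
import hashlib

# Key-based hashing for session persistence: requests with the same
# key (the client IP here) always map to the same server index, as
# long as the pool size stays fixed.
def server_for_key(key, num_servers):
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_servers
```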
Consistent Hashing - The issue with the hashing approach mentioned above is that adding/removing servers to the server pool will mess up the hashing scheme. Anytime a server is added, each request will get hashed to a new server. Consistent hashing is a strategy that’s meant to minimize the number of requests that have to be sent to a new server when the server pool size is changed. Here’s a great video that explains why consistent hashing is necessary and how it works.
There are different consistent hashing algorithms you can use; the most common is Ring hash. Maglev is another consistent hashing algorithm, published by Google in 2016, that has been serving Google’s traffic since 2008.
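Here's a minimal sketch of Ring hash: each server is hashed onto a ring of positions (several times, as "virtual nodes"), and a key is routed to the first server clockwise from the key's own hash. The virtual node count and the hash choice are illustrative; production implementations use many more virtual nodes per server:

```python
import bisect
import hashlib

def _hash(value):
    # Map a string to a position on the ring (MD5 used for illustration).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes=10):
        # Place vnodes points on the ring for each server, sorted by position.
        self.ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]

    def get(self, key):
        # Walk clockwise to the first server at or after the key's hash,
        # wrapping around the ring at the end.
        idx = bisect.bisect(self.positions, _hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

Because adding or removing a server only moves the keys between that server's ring positions and their neighbors, most keys keep their old assignment, which is exactly the property plain modulo hashing lacks.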
This is one of our past System Design posts from Quastor Pro. If you'd like Weekly Articles on System Design Concepts (in addition to the tech summaries), check out the premium version of Quastor!
Past Articles include
It's $12 a month. I'd highly recommend using your job's Learning & Development budget to pay for Quastor.
Tech Snippets
Cloud Design Patterns - This is a great list of cloud design patterns that you can employ when building cloud applications. Most of the patterns include code samples that show how to implement the pattern on Azure but the patterns are relevant to any distributed system and can be run on any cloud platform.
How the TypeScript Compiler Works - Orta Therox is an engineer at Microsoft working on TypeScript. He gave a great talk delving into how the TypeScript compiler works, going through the full pipeline of reading a .ts file, manipulating it in memory and then writing a JavaScript file. You can view the slides here.
How to Get Promoted as a Developer - This is a great video from Rahul Pandey (co-founder of Tech Career Growth) on how to get promoted quickly as a software engineer. He interviews the former VP of Engineering at Mixpanel about how a software engineer at the company went from Engineer 2 -> Senior Engineer -> Staff Engineer -> Tech Lead -> Principal Tech Lead in just 4 years. The key is to do work that has a large impact on the metrics your company is trying to improve. The video gives some tips on how to do that.
Writing a Toy Traceroute from Scratch - You can use traceroute to trace the route of packets from your computer to another computer. If you want to learn how traceroute works, this is a great blog post that dives into that. There’s also Python code that creates a toy version of Traceroute.
Get a look into what CTOs are reading - If you find Quastor useful, you should check out Pointer.io. It's a reading club for software developers that sends out super high quality engineering-related content. It's read by CTOs, engineering managers and senior engineers so you should definitely sign up if you're on that path (or if you want to go down that path in the future). It's completely free! (cross promo)
Interview Question
You are given an array nums of distinct positive integers.
Return the number of tuples (a, b, c, d) such that
a * b = c * d
where a, b, c and d are elements of nums and
a != b != c != d (all four are distinct elements)
Previous Question
As a reminder, here’s our last question
Given an array nums of distinct integers, return all the possible permutations. You can return the answer in any order.
Solution
We can solve this question with Backtracking.
The core idea is that we'll recursively traverse through nums and build every possible permutation using backtracking.
We have a function called _permute that takes in
index - this is an integer that indicates what position we're at in nums
path - this is the current ordering (permutation) that we're building up for nums
In our recursive function, we'll iterate through all the positions from the index to the end of nums.
For each position, we'll swap its element with the one at index, creating a new partial ordering.
Then, we'll explore all the permutations that start with that ordering by recursively calling our _permute function.
After exploring that branch, we swap the elements back to their original positions and continue with our loop.
When we've reached the end of the array (index is equal to the length of nums), then we can add the current permutation to our list of permutations.
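Here's one way to implement the approach described above in Python, where nums itself serves as the path being built up via swaps:

```python
# Backtracking: swap each candidate position into `index`, recurse on
# the rest of the array, then swap back to restore the original order.
def permute(nums):
    result = []

    def _permute(index):
        if index == len(nums):
            result.append(nums[:])  # record a completed permutation
            return
        for i in range(index, len(nums)):
            nums[index], nums[i] = nums[i], nums[index]  # choose
            _permute(index + 1)                          # explore
            nums[index], nums[i] = nums[i], nums[index]  # un-choose

    _permute(0)
    return result
```

This runs in O(n * n!) time since there are n! permutations and each one takes O(n) to copy.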
If you prefer visual explanations, here's the best YouTube video I found that covers this problem.