A Deep Dive into HTTP and its Evolution

We'll delve into HTTP/1, HTTP/2 and HTTP/3 and talk about the issues each of them solved. We'll end with a look at Facebook's and Dropbox's implementations of HTTP/3 and the results they saw.

Hey Everyone!

Today we’ll be talking about

  • How HTTP has Evolved over the Years

    • We’ll delve into the evolution of HTTP and how (and why) it’s changed over the past few decades

    • HTTP/0.9 with Tim Berners-Lee

    • Release of HTTP/1.0 and issues with short-lived TCP connections

    • HTTP/1.1 with Keep-Alive headers

    • The release of HTTP/2 with multiplexing to avoid Head of Line blocking

    • HTTP/3 and the introduction of QUIC

    • How Dropbox and Facebook implemented HTTP/3 and what performance gains they saw

  • Tech Snippets

    • How Facebook uses an LLM to generate Unit Tests

    • How Stripe Keeps Track of Billions of Payment Transactions

    • Why GitHub Copilot May Hurt Web Accessibility

    • Startup Resources Toolkit

    • Switching from a Software Engineer to Engineering Manager

If you joined Facebook in November of 2022, you would’ve seen your stock-based compensation more than quintuple in value.

But how do you pick the right company? The key is to stay up to date with the latest industry trends, regulations and ideas at the center of tech.

Semafor Tech is a fantastic new newsletter that’s designed to do exactly that. 

They’ll give you the answers to questions like…

  • Why did Facebook stock 5x over the last year?

  • What is Nvidia doing to maintain their dominance in AI chips?

  • How will Apple’s legal battles over the App Store affect DoorDash, Uber and TikTok?

To stay up to date with the latest trends in tech, sign up to Semafor Tech.

sponsored

The Evolution of HTTP

The web is built on top of HTTP and for nearly 20 years, HTTP/1.1 was the standard. The protocol worked incredibly well as the internet scaled up from tens of millions of users (1997) to billions of users (2015).

However, the protocol had some inefficiencies (we’ll explain below). In 2015, HTTP/2 was published as a new standard and has seen steadily growing adoption.

In 2022, HTTP/3 was officially published, with a big shift from HTTP running on TCP to running on QUIC, a protocol built on top of UDP. (Note - HTTP/3 was officially published as a standard in 2022, but it was supported by all major browsers by late 2020.)

In this article, we’ll dive through the history of HTTP and talk about its major evolutions and why certain changes were made. We’ll also delve into QUIC and the tradeoffs HTTP/3 makes.

Finally, we’ll end with a discussion of HTTP/3 at Dropbox and Facebook and talk about the results they saw when they switched from HTTP/2.

Note - This is a Quastor Pro article so free readers will only see a preview of the first part of the article.

You can view the full article by subscribing here. You also get spaced-repetition flash cards on all the concepts in Quastor.

A Brief Intro to HTTP

HTTP is an application layer protocol that allows machines to communicate over the internet through sending requests and receiving responses. HTTP sets the rules for how the messages should be formatted, how to send multiple requests at once, how clients/servers can cache messages and much more.
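
As a concrete example, a minimal HTTP/1.1 exchange looks roughly like this on the wire (the exact headers will vary by client and server). The client sends:

GET /index.html HTTP/1.1
Host: example.com
Accept: text/html

And the server responds with a status line, headers and the body:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1256

<html>...</html>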

If you have zero familiarity with the HTTP protocol, then I’d recommend this article by MDN.

Prior to HTTP/3, HTTP worked on top of the TCP protocol, a transport layer protocol. TCP is responsible for establishing the connection between client/server, dividing the HTTP messages into packets, reassembling them on the other machine, ensuring that packets are delivered correctly/uncorrupted and more.

For security reasons, you’d also want to make sure that the HTTP messages you sent were encrypted. This way, if they were intercepted in the middle, the attacker could not read/tamper with the specific contents. For this, you’d use TLS, which is responsible for authenticating the server, encrypting the messages being sent and checking the integrity of the data on the receiving end.
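
To make the layering concrete, here’s a minimal Python sketch (using only the standard library’s socket and ssl modules, with example.com as a stand-in host) that opens a TCP connection, wraps it in TLS and then sends an HTTP request over the encrypted channel:

import socket
import ssl

host = "example.com"  # stand-in host for the example

# TCP: establish the underlying connection (the 3-way handshake happens here)
tcp_conn = socket.create_connection((host, 443))

# TLS: authenticate the server and encrypt everything sent over the socket
context = ssl.create_default_context()
tls_conn = context.wrap_socket(tcp_conn, server_hostname=host)

# HTTP: the application-layer request, sent through the encrypted channel
tls_conn.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")

response = b""
while chunk := tls_conn.recv(4096):
    response += chunk

print(response.decode(errors="replace").splitlines()[0])  # e.g. HTTP/1.1 200 OK
tls_conn.close()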

History

HTTP/0.9

Tim Berners-Lee is known as the inventor of the web thanks to three key technologies he came up with

  • Uniform Resource Identifiers

  • HTML

  • HTTP

(Note - he also did a bunch of other stuff like create the first web client/server, start the W3C and more.)

HTTP (HyperText Transfer Protocol) was his way of easily transferring HTML documents across the web.

He submitted the first proposal for HTTP in 1990. At this point, the internet only had a few million users (mainly just academics discussing technical papers and getting into arguments on Usenet. Fun fact - Godwin’s Law was created in 1990 and referred specifically to the Usenet newsgroup discussions.)

The first specification for HTTP (now known as HTTP/0.9) was published in 1991. It was extremely simple with the entire specification being a few hundred words (you can read it in a couple minutes).

It defined things like

  • Request Mechanism - It only supported the GET method, used to request a document from a server. The request was just a single line: GET followed by the path to the HTML page, e.g. GET /webpage.html (there’s an example exchange after this list)

  • Response Mechanism - The response would just be the HTML file as a byte stream of ASCII characters.

  • TCP/IP - The protocol was defined to work over TCP/IP. The client initiates a TCP/IP connection to the host and the server will break the connection once the whole document has been transferred.

  • Idempotency - Requests are idempotent. The server would not store any information about the request.
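
Put together, a full HTTP/0.9 exchange was nothing more than a one-line request followed by the raw HTML document, something like:

GET /webpage.html

<html>
A simple page
</html>

Once the document was sent, the server closed the TCP connection.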

HTTP/1.0

As you’ve probably realized by now, the reception to the World Wide Web was super positive. The year 1994 was known as the “Year of the Web” and it saw an insane amount of traction. You might’ve seen that Jeff Bezos interview from 2001 where he recounts coming across the web’s startling growth rate of 2,300% back in 1994. That figure convinced him to leave his job to start Amazon.

By 1995, you had web browsers like Internet Explorer and Netscape Navigator (the precursor to Firefox).

Millions of normal people were starting to get on the web and the limitations of HTTP/0.9 were becoming super obvious. People wanted to do more with the web than just transfer HTML documents.

In 1996, HTTP/1.0 was finalized and published with RFC 1945.

This defined a lot of the specifications we use today with HTTP, including

  • Status Codes - 200, 404, 500, etc. sent at the beginning of the response

  • HTTP Headers - encode some metadata with your requests and responses

  • Content-Type - the Content Type header was added and you could send more than just HTML files as your response

  • POST - clients can use this to send data to the server (there’s an example exchange after this list)

and more.
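
Putting those pieces together, an HTTP/1.0 exchange could now look something like this (the endpoint, form data and sizes are made up for the example):

POST /signup HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 26

name=alice&email=a%40b.com

HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 1024

<html>...</html>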

Issues with HTTP/1.0

A big issue with HTTP/1.0 was how TCP connections were handled.

In 1996, RAM was an extremely limited resource and you didn’t want to waste it on unnecessary TCP connections. A server with thousands of users would have to waste a ton of precious RAM to keep thousands of TCP connections open. Additionally, websites were mainly just text, so you could get everything you needed to load the page with a single request/response.

Therefore, the protocol was designed so that the TCP connection would immediately close after you got your HTTP response. If you needed to make another HTTP request, then you’d open a new TCP connection.

Creating a new TCP connection every time is quite slow due to the 3-way handshake needed to initiate the connection and TCP slow start (where the amount of data sent per round trip is gradually ramped up toward the connection’s capacity).

This added a ton of latency when you had to send multiple requests to the server.
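
As a rough illustration, here’s a Python sketch of what fetching three resources looked like under HTTP/1.0 semantics: every request pays for a brand-new TCP connection (the host and paths are made up for the example):

import socket

HOST = "example.com"  # hypothetical server
PATHS = ["/index.html", "/logo.gif", "/about.html"]

for path in PATHS:
    # A new TCP connection (and 3-way handshake) for every single request
    conn = socket.create_connection((HOST, 80))
    conn.sendall(f"GET {path} HTTP/1.0\r\nHost: {HOST}\r\n\r\n".encode())

    response = b""
    while chunk := conn.recv(4096):
        response += chunk

    # The server closes the connection after the response,
    # so the next resource has to start from scratch
    conn.close()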

HTTP/1.1

This was quickly fixed in 1997 with the release of HTTP/1.1.

With HTTP/1.1, TCP connections became persistent by default (the behavior the Keep-Alive header had enabled as an opt-in extension in HTTP/1.0). The connection stays open after the response is delivered, so another request can be sent through that same connection.
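
Here’s a minimal sketch of the same loop from the HTTP/1.0 example above, but reusing a single connection via Python’s built-in http.client (which speaks HTTP/1.1 and keeps the connection open between requests); the host and paths are again placeholders:

import http.client

# One TCP connection, reused for every request (HTTP/1.1 persistent connection)
conn = http.client.HTTPConnection("example.com")

for path in ["/index.html", "/logo.gif", "/about.html"]:
    conn.request("GET", path)
    response = conn.getresponse()
    body = response.read()  # fully read each response before reusing the connection
    print(path, response.status, len(body))

conn.close()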

Additionally, HTTP/1.1 also had a feature called HTTP pipelining, where the client could send multiple HTTP requests one after another without having to wait for the responses.

Let’s say you needed image1, image2 and video1 to load a website. With pipelining, you could immediately send 3 requests with GET image1, GET image2 and GET video1 to the web server through a single TCP connection.

The server would get all the items in parallel and then respond with image1, image2 and video1. (However, the responses had to be in the same order that the data was requested in - we’ll talk more about this in the issues section)

Being able to do this simultaneously was (in theory) a lot faster than staggering the request/responses sequentially where you would do something like…

GET image1

Server responds with image1

GET image2

Server responds with image2

GET video1

Server responds with video1
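
With pipelining over a single connection, that same exchange would instead look something like this: all three requests go out back to back, and the responses come back in request order.

GET image1
GET image2
GET video1

Server responds with image1
Server responds with image2
Server responds with video1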

Another feature with HTTP/1.1 was Chunked Transfer Encoding which allowed you to stream large files (images, videos) that you were requesting.
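
On the wire, a chunked response looks roughly like this: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end of the body (the \r\n line endings are shown explicitly here):

HTTP/1.1 200 OK
Transfer-Encoding: chunked

5\r\n
Hello\r\n
7\r\n
, world\r\n
0\r\n
\r\n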

HTTP/1.1 was fundamental to the development of the internet and it was the standard from 1997 to 2015.

Issues with HTTP/1.1

As mentioned, HTTP/1.1 introduced the pipelining feature, where a client could send multiple requests at once through a single TCP connection.

However, the server was required to respond to pipelined requests in the order they were received.

If a server received requests for image1, image2 and video1, then the server would be required to respond with image1 first, then image2 and finally video1. 

This led to several issues

  • Complexity - Responding to requests in the same order that they were received was complicated to implement and bug-prone, especially if you had forward/reverse proxies.

  • Head of Line Blocking - If the server had to respond with image1, then image2 and finally video1, then any delays in retrieving image1 would delay image2 and video1. This limited the efficiency gains that the client got from pipelining the requests.

Because of these issues with pipelining, many clients just resorted to opening multiple TCP connections to retrieve multiple assets concurrently. This way, you didn’t have to deal with the head of line blocking problem since the requests were spread across different TCP connections.

However, this negated the benefit of the Keep-Alive header and meant that clients/servers were spinning up multiple TCP connections and going through the expensive 3-way handshake process again and again.

HTTP/2

Between 1997 and 2015, there was obviously a massive change in the internet ecosystem. We moved from static HTML/CSS websites to incorporating technologies like JavaScript, PHP, jQuery, React and more.

Websites started to require hundreds of resources and megabytes of data in order to fully render.

With this, the performance limitations of HTTP/1.1 became more apparent.

HTTP/2 was introduced in 2015 with a solution to this problem (along with a bunch of other features).

A big issue with HTTP/1.1 was how it didn’t use TCP connections efficiently; you had to wait for responses to be sent back in the same order as the requests.

HTTP/2 introduced the concept of multiplexing to fix this problem. Now, you can send multiple requests through a single TCP connection and receive their responses in any order.

So, if you needed image1, image2 and video1, then you could send

GET image1

GET image2

GET video1

The server could respond with these assets in any order. So, if it already had video1 cached, then it might respond with….

Server responds with video1

Server responds with image2

Server responds with image1

Or any other order. This was a lot easier to implement and reduced latency when requesting data.

HTTP/2 implemented this by introducing the concept of streams, which are virtual channels within a single TCP connection. When you want to request multiple items, then you do so within different streams within the same TCP connection. However, this did not completely solve the Head of Line blocking problem. We’ll delve into this in the HTTP/2 issues section.
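
As a rough sketch of what multiplexing looks like from the client side, here’s an example using the third-party httpx library for Python (installed with its optional http2 extra); the host and paths are placeholders, and the requests are multiplexed as separate streams over one connection when the server supports HTTP/2:

import asyncio
import httpx

async def main():
    # One client, one connection pool; http2=True enables HTTP/2 negotiation
    async with httpx.AsyncClient(http2=True) as client:
        urls = [
            "https://example.com/image1",
            "https://example.com/image2",
            "https://example.com/video1",
        ]
        # Fire all three requests at once; they can complete in any order
        responses = await asyncio.gather(*(client.get(url) for url in urls))
        for r in responses:
            print(r.url, r.status_code, r.http_version)

asyncio.run(main())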

In addition to multiplexing, HTTP/2 added a bunch of other features like

  • Pushing data to the client with Server Push

  • Efficient compression of HTTP header fields with HPACK (see the sketch after this list)

  • Request prioritization

And more.
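
As a small example of the header compression mentioned above, here’s a sketch using the third-party hpack package for Python (the header values are made up); the encoded block is much smaller than the plain-text headers, and repeated headers on later requests compress even further thanks to HPACK’s dynamic table:

from hpack import Encoder, Decoder

headers = [
    (":method", "GET"),
    (":path", "/image1"),
    (":authority", "example.com"),
    ("user-agent", "example-client/1.0"),
]

encoder = Encoder()
encoded = encoder.encode(headers)
print(len(encoded), "bytes after HPACK encoding")

# Decoding recovers the original header list
decoder = Decoder()
print(decoder.decode(encoded))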

This is the first part of our tech dive on HTTP.

In the next part of the article, we’ll delve into

  • Issues with HTTP/2

  • HTTP/3 and QUIC

  • How QUIC works

  • HTTP/3 at Dropbox and Facebook and the results they saw.

Tech Snippets