How Discord Can Serve Millions of Users From a Single Server

Plus, the Art of Programming by Edsger Dijkstra, What Every Developer Should Know about Unicode and more.

Hey Everyone!

Today we'll be talking about

  • How Discord Can Serve Millions of Users From a Single Server

    • Discord uses Elixir, a functional programming language that runs on the BEAM virtual machine

    • We’ll talk about Elixir, BEAM and why they’re useful for building highly scalable applications

    • Plus, we’ll talk about how Discord profiles their Elixir applications and what they did to scale them.

  • How to improve your Focus

    • Andrew Huberman is a professor at the Stanford School of Medicine and he runs a great podcast where he gives actionable tips on how to improve your health. In this episode, he talks about how to improve your ability to focus.

    • Using Binaural Beats to get into flow

    • Working in 90 minute durations with a 10 minute break where you deliberately defocus

    • Taking advantage of your visual system to get into a focused state

  • Tech Snippets

    • Introduction to the Art of Programming by Edsger Dijkstra

    • Actionable Steps to Maximize Impact by Engineer’s Codex

    • 7 Types of Difficult Colleagues and How to Deal With Them

    • What Every Software Dev Should Know About Unicode in 2023

    • Transitioning from an Intern to a Staff Engineer at Meta

    • How Smartphones Fragment Your Attention Span

How Discord Can Serve Millions of Users From a Single Server

Discord is a voice/video/text communication platform with tens of millions of users. It’s quite similar to Slack, except that it’s structured into servers. Discord servers (also called Guilds) are community spaces that have different text/voice channels for various topics.

These servers can get massive, with the largest Discord server having over 16 million users (this is the Midjourney server which you can use to generate images).

Whenever a message/update is posted on a Discord server, all the other members need to be notified. This can be quite a challenge to do for large servers. They can have thousands of messages being sent every hour and also have millions of users who need to be updated.

Yuliy Pisetsky is a Staff Software Engineer at Discord and he wrote a fantastic blog post delving into how they optimized this.

In order to minimize complexity and reduce costs, Discord has tried to push the limits of vertical scaling. They’ve been able to scale individual Discord backend servers from handling tens of thousands of concurrent users to nearly two million concurrent users per server.

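Two of the technologies that have been important for allowing Discord to do this are

  • BEAM - a virtual machine that’s used to run Erlang code

  • Elixir - a functional programming language that runs on top of BEAM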

We’ll talk about both of these and then delve into some of the techniques Discord engineers implemented for scaling.

Note - internally, engineers at the company refer to Discord servers as “Guilds”.

We’ll use this terminology in the summary because we’ll also be talking about Discord’s actual backend servers. Using the same term in two different contexts is obviously confusing, so we’ll say Discord guilds to mean the servers a user can join to chat with their friends. When we say Discord server, we’ll be talking about the computers that the company has in their cloud for running backend stuff.


BEAM

BEAM is a virtual machine that’s used to run Erlang code.

Erlang is a functional programming language designed at Ericsson (a pioneer in networking & telecom) in the 1980s. Other tech that Ericsson played a crucial role in developing includes Bluetooth, 4G, GSM and much more.

The Erlang language was originally developed for building telecom systems that needed extremely high levels of concurrency, fault tolerance and availability. 

Some of the design goals were

  • Concurrent Processes - Erlang should make concurrent programming easier and less error prone.

  • Robust - Ambulances, police, etc. rely on telecom so high availability is a must. Therefore, programs should quickly recover from failures and faults shouldn’t affect the entire system.

  • Easy to Update - Downtime must be avoided, so it was created with the capability to update code on a running system (hot swapping) so that it can be quickly and easily modified. 

Concurrency with BEAM

As mentioned, Erlang was designed with concurrency in mind, so the designers put a ton of thought into multi-threading.

To accomplish this, BEAM provides lightweight threads for writing concurrent code. These threads are actually called BEAM processes (yes, this is confusing but we’ll explain why they’re called processes) and they provide the following features

  • Independence - Each BEAM process manages its own memory (separate heap and stack). This is why they’re called processes instead of threads (operating system threads run in a shared memory space whereas OS processes run in separate memory spaces). Garbage collection also runs separately for each BEAM process, so cleaning up one process doesn’t slow down the entire system.

  • Lightweight - Processes in BEAM are designed to be lightweight and quick to spin up, so applications can run millions of processes concurrently without significant overhead.

  • Communication - As mentioned, processes in BEAM don’t share memory, so you avoid locks and the tricky race conditions that come with shared mutable state. Instead, processes communicate by sending messages to each other.
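To make the message-passing idea concrete, here’s a minimal sketch in Python (not Elixir, and not how BEAM is implemented). The `Process` class, the `STOP` sentinel and the helper functions are all made up for illustration, and a Python thread is far heavier than a BEAM process. The point is only that the counter’s state is touched by a single owner that handles one message at a time, so no lock is needed.

```python
import queue
import threading

STOP = object()  # sentinel message that tells a process to exit


class Process:
    """Toy stand-in for a BEAM process: private state plus a mailbox."""

    def __init__(self, handler, state):
        self._handler = handler
        self._state = state            # touched only by this process's own thread
        self._mailbox = queue.Queue()  # the only way other code can reach it
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def _loop(self):
        # Handle one message at a time, so the state needs no lock.
        while True:
            message = self._mailbox.get()
            if message is STOP:
                return
            self._state = self._handler(self._state, message)

    def stop(self):
        """Ask the process to exit and return its final state."""
        self.send(STOP)
        self._thread.join()
        return self._state


def add(total, amount):
    return total + amount


def send_ones(process, n):
    for _ in range(n):
        process.send(1)


counter = Process(add, 0)
senders = [threading.Thread(target=send_ones, args=(counter, 1000)) for _ in range(4)]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()

total = counter.stop()
print(total)  # 4000: every increment was applied, with no lock around the counter
```

Four senders write concurrently, but only the owning process ever updates the total, which is the property the Communication bullet describes.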

This is one of the reasons why Erlang/BEAM has been so popular for building large scale, distributed systems. WhatsApp also used Erlang to scale to a billion users with only 50 engineers.

However, one of the criticisms of Erlang has been the unconventional syntax and steep learning curve.

Elixir is a dynamic, functional programming language that was created in the early 2010s to solve this and add new features.


Elixir

Elixir is a functional language released in 2012 that runs on top of the BEAM virtual machine and is fully compatible with the Erlang ecosystem.

It was created with Ruby-inspired syntax to make it more approachable than Erlang (Jose Valim, the creator of Elixir, was previously a core contributor to Ruby on Rails).

To learn more about Elixir, Erlang and other BEAM languages, I’d highly recommend watching this conference talk.

With the background info out of the way, let’s go back to Discord.

Fanout Explained

With systems like Discord, Slack, Twitter, Instagram, etc. you need to fan out efficiently, where an update from one user needs to be sent out to thousands (or millions) of users.

This can be simple for the average user profile, but it’s extremely difficult if you’re trying to fan out updates from Cristiano Ronaldo to 600 million Instagram users.

Using Elixir as a Fanout System

With Discord, they need to send updates to all the members of a certain guild whenever someone sends a message or when a new person joins.

To do this, engineers use a single Elixir process (BEAM process) per Discord guild as a central routing point for everything that’s happening on that guild. Then, they use another process for each connected user’s client.

The Elixir process for the Discord guild keeps track of the sessions for users who are members of the guild. On every update, it will fan out the messages to all the connected user client processes. Those client processes will then forward the update over a websocket connection to the Discord user’s phone/laptop.
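Here’s a rough sketch of that routing shape, modelled in Python rather than Elixir. The `Guild` and `Session` classes and their method names are hypothetical, and plain method calls stand in for messages sent between BEAM processes. It’s only meant to show that one guild object does a delivery per connected session on every update.

```python
class Session:
    """Stands in for the per-user client process."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.sent_to_client = []  # stands in for the websocket connection

    def deliver(self, update):
        self.sent_to_client.append(update)


class Guild:
    """Stands in for the single process that routes everything for one guild."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # user_id -> Session

    def connect(self, session):
        self.sessions[session.user_id] = session

    def publish(self, update):
        """Fan one update out to every connected session; return the delivery count."""
        for session in self.sessions.values():
            session.deliver(update)
        return len(self.sessions)


guild = Guild("example-guild")
for user_id in ("alice", "bob", "carol"):
    guild.connect(Session(user_id))

deliveries = guild.publish({"type": "message", "author": "alice", "text": "hi"})
print(deliveries)  # 3: one delivery per connected session
```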

However, fanning out messages is very computationally expensive for large guilds. The amount of work needed to fan out a message increases proportionally to the number of people in the guild. Sending messages to 10x the number of people will take 10x the time.

Even worse, the amount of activity in a Discord guild increases proportionally with the number of people in the guild. A thousand-person guild sends 10x as many messages as a hundred-person guild.

This meant the amount of work needed to handle a single Discord guild was growing quadratically with the size of the guild.
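A quick back-of-the-envelope calculation shows the quadratic growth. This sketch assumes, purely for illustration, that every member posts the same number of messages and that every message goes to every member. The function name and the one-message-per-member default are made up.

```python
def fanout_deliveries(members, messages_per_member=1):
    """Deliveries needed if every member posts and every member must be notified."""
    messages = members * messages_per_member  # activity grows with guild size
    return messages * members                 # each message goes to every member


small = fanout_deliveries(100)
large = fanout_deliveries(1_000)
print(small, large, large // small)  # 10000 1000000 100
```

Growing the guild by 10x multiplies the fanout work by 100x.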

Here are the steps Discord took to address the problem and ensure support for larger Discord guilds with tens of millions of users.


Profiling

The first step was to get a good sense of what servers were spending their CPU/RAM on. Elixir provides a wide array of utilities for profiling your code.

Wall Time Analysis

The simplest way of understanding what an Elixir process is doing is by looking at its stack trace. Figure out where it’s slow and then delve deeper into why that’s happening.

For getting richer information, Discord engineers also instrumented the Guild Elixir process to record how many messages of each type it receives and how long each message takes to process.

From this, they had a distribution of which updates were the most expensive and which were the most bursty/frequent.
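This kind of instrumentation can be sketched as a small wrapper that counts and times each message type. The Python below is only an illustration of the idea, not Discord’s code. The `MessageStats` class and the message type names are hypothetical.

```python
import time
from collections import defaultdict


class MessageStats:
    """Counts messages per type and accumulates the time spent handling them."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.count = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def timed(self, message_type, handler, *args):
        """Run handler(*args) and record it under message_type."""
        start = self._clock()
        try:
            return handler(*args)
        finally:
            self.count[message_type] += 1
            self.total_seconds[message_type] += self._clock() - start

    def most_expensive(self):
        """Message types ordered by total handling time, highest first."""
        return sorted(self.total_seconds, key=self.total_seconds.get, reverse=True)


stats = MessageStats()
stats.timed("message_create", sum, range(1_000))
stats.timed("typing_start", len, "abc")
stats.timed("typing_start", len, "abcd")
print(dict(stats.count))  # {'message_create': 1, 'typing_start': 2}
```

Sorting by count gives the most frequent message types, and `most_expensive()` gives the ones that cost the most time in total.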

They spent engineering time figuring out how to minimize the cost of these operations.

Process Heap Memory Analysis

The team also investigated how servers are using RAM. Memory usage affects how powerful the hardware has to be and also how long garbage collection takes for clean up.

They used Erlang’s erts_debug.size (callable from Elixir as :erts_debug.size) for profiling. However, this was a problem for large objects, since it walks every single element in the object (it takes linear time).

Instead, Discord built a helper library that could sample large maps/lists and use that to produce an estimate of the memory usage.
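The sampling idea can be sketched like this. It’s a Python illustration under stated assumptions, not Discord’s helper library. The function names are made up, and `sys.getsizeof` only measures each element shallowly, so it’s a stand-in for a real per-element size function.

```python
import random
import sys


def exact_size(items):
    """Visit every element: accurate, but linear in len(items)."""
    return sum(sys.getsizeof(item) for item in items)


def estimated_size(items, sample_size=100, rng=None):
    """Measure a random sample and scale up: cost depends on sample_size only."""
    if not items:
        return 0
    rng = rng or random.Random()
    k = min(sample_size, len(items))
    indexes = rng.sample(range(len(items)), k)
    sampled_bytes = sum(sys.getsizeof(items[i]) for i in indexes)
    return round(sampled_bytes / k * len(items))


payloads = ["x" * 32 for _ in range(100_000)]
print(exact_size(payloads), estimated_size(payloads))
```

The two numbers match here because every element has the same size. With elements of varying sizes, the estimate trades some accuracy for touching only `sample_size` elements.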

This helped them identify high memory tasks/operations and eliminate/refactor them.

Ignoring Passive Sessions

One of the most straightforward ways to reduce the load on servers is to just do less work. Clarify exactly what the requirements are with other teams and see if anything can be eliminated.

Discord did just this and realized that Discord guilds have many inactive users. Sending them every update wasted server time, since those users weren’t checking the guild.

Therefore, they split users into passive vs. active guild members. They refactored the system so that a user wouldn’t get updates until they clicked into the guild.

Around 90% of the users in large guilds are passive, so this resulted in a major win with fanout work being 90% less expensive.
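Here’s a minimal sketch of the passive/active split, again in Python with made-up names (`Guild`, `Session`, `activate`). It assumes new connections start out passive and become active when the user opens the guild. The real system has more to it than this.

```python
class Session:
    def __init__(self, user_id):
        self.user_id = user_id
        self.received = []

    def deliver(self, update):
        self.received.append(update)


class Guild:
    def __init__(self):
        self.active = {}   # user_id -> Session currently looking at the guild
        self.passive = {}  # user_id -> Session connected but not looking

    def connect(self, session):
        # Assumption: new connections start passive until the user opens the guild.
        self.passive[session.user_id] = session

    def activate(self, user_id):
        """Called when the user clicks into the guild."""
        self.active[user_id] = self.passive.pop(user_id)

    def publish(self, update):
        """Fan out to active sessions only; return the delivery count."""
        for session in self.active.values():
            session.deliver(update)
        return len(self.active)


guild = Guild()
for n in range(10):
    guild.connect(Session(f"user-{n}"))
guild.activate("user-0")

print(guild.publish("hello"))  # 1 delivery instead of 10
```

With 9 of the 10 sessions passive, the fanout does a tenth of the work, which mirrors the 90% figure above.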

Splitting Fanout Across Multiple Processes

Previously, Discord was using a single Elixir process as the central routing point for all the updates that were happening on a Discord guild.

To scale this, they split up the work across multiple processes. They built a system called relays that sits between the guilds and the user session processes. The guild process still handled some of the operations, but could rely on the relays for other parts to improve scalability.

By utilizing multiple processes, they could split up the work across multiple CPU cores and utilize more compute resources for the larger guilds.
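The relay layer can be sketched as follows. This is a simplified Python model with hypothetical names, and `relay_capacity` is an invented tuning knob. Everything runs sequentially here, whereas on BEAM each relay would be its own process that the scheduler can place on a different core.

```python
class Session:
    def __init__(self, user_id):
        self.user_id = user_id
        self.received = []

    def deliver(self, update):
        self.received.append(update)


class Relay:
    """Owns a slice of the guild's sessions and fans out to just that slice."""

    def __init__(self):
        self.sessions = []

    def publish(self, update):
        for session in self.sessions:
            session.deliver(update)
        return len(self.sessions)


class Guild:
    def __init__(self, relay_capacity):
        self.relay_capacity = relay_capacity  # hypothetical tuning knob
        self.relays = []

    def connect(self, session):
        # Start a new relay whenever the newest one is full.
        if not self.relays or len(self.relays[-1].sessions) >= self.relay_capacity:
            self.relays.append(Relay())
        self.relays[-1].sessions.append(session)

    def publish(self, update):
        """Hand the update to each relay; return (guild_sends, total_deliveries)."""
        deliveries = sum(relay.publish(update) for relay in self.relays)
        return len(self.relays), deliveries


guild = Guild(relay_capacity=4)
for n in range(10):
    guild.connect(Session(f"user-{n}"))

print(guild.publish("hello"))  # (3, 10): the guild sends 3 times, relays deliver 10
```

The guild’s own work now scales with the number of relays rather than the number of sessions.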

This is the first part of the techniques Discord used to scale their system. You can read the full article here.


Tech Snippets

How to Improve Your Ability to Focus

Andrew Huberman is a researcher at Stanford University and he has an amazing podcast (Huberman Lab) where he goes through current research on how you can live a happier, healthier and more productive life.

Being a programmer requires an ability to focus for long periods of time (get into a flow), so I found his podcast episode - Focus Toolkit: Tools to Improve Your Focus & Concentration to be very useful.

Here's a quick summary of the tips he mentioned (based off peer-reviewed scientific literature).

  • Binaural Beats - There are playlists available on YouTube (or free apps if you google) that will play Binaural Beats (where you listen to two tones with slightly different frequencies at the same time). 40 Hz binaural beats have been shown to improve focus, attention and memory retention in a number of peer reviewed studies. Huberman recommends listening to 5-10 minutes of Binaural Beats prior to when you start your task to make it easier to get into a state of flow.

  • 90 Minutes - The ideal duration for focused sessions is 90 minutes or less. Past that, fatigue begins to set in and the amount of focus people are able to dedicate begins to drop off. Therefore, Huberman sets a timer for 90 minutes when he begins a focused task and stops after that.

  • Defocus - After the 90 minute (or shorter) focus session, you should spend 10-20 minutes where you deliberately defocus and give your brain/body a chance to rest. During this time, you should avoid focusing on any single thing (so avoid using your phone) and can work on menial tasks where your mind can wander (take a short walk, do the dishes, wash the laundry, etc.). This is the best way to recharge for the next 90 minute focus session after the break.

  • Visual Field - A great deal of our cognitive focus is directed by our visual system. If you focus your eyes on a pen, you'll naturally start to focus on it and notice details about the pen. Cognitive focus tends to follow overt visual focus. Therefore, you can help ease yourself into a focused state by picking something in your room (part of the wall, an object, etc.) and staring at that object for 30 seconds to a few minutes (blinking is fine, don't try to force your eyes open). This helps you get into a focused state, and you can redirect your focus to your task once that time is up.

These are a few of the tips Huberman mentions.

In the podcast, he also talks about using supplements like coffee, EPA, creatine and more.