How Discord's Live Streaming Works

We'll delve into Discord's Live Streaming Feature and how it works. Plus, Productivity Tips from Sam Altman, a Guide to Software Engineering Contracting and more!

Hey Everyone!

Today we’ll be talking about

  • How Discord’s Live Streaming Works

    • Capturing Video and Audio from the Streamer’s Device

    • Encoding the Livestream Feed to Minimize Network Load

    • Transmitting and Processing the Feed on Discord Servers

    • Decoding the Feed on Viewer Devices and Measuring Performance

  • Productivity Tips from Sam Altman

    • Spend lots of time thinking about what you work on

    • Sam’s three pillars of productivity

    • Overcommitting just a bit can help you stay focused and achieve more

  • Tech Snippets

    • How Facebook created the largest Memcached system in the world

    • Latency numbers every programmer should know (Visualized)

    • Guide to software engineering contracting in the UK

    • Monte-Carlo graph search explained from first principles

    • A practical book on Go

Imagine a bot that can monitor your stock portfolio’s latest headlines and send you an instant alert for any significant news. Or an app that can monitor the latest news stories and use GPT-4 to craft summaries without the clickbait.

With the Brave Search API, you can build both of these, and hundreds of other ideas, in a weekend!

You probably know Brave as the ad-blocking, privacy-focused web browser. However, they also have one of the fastest-growing independent search engines out there. Now, they’ve released the Brave Search API to let you incorporate this search engine in your apps.

  • Affordable - It’s much cheaper and easier to set up than the other big tech options. So it’s perfect for everything from small projects to large apps.

  • High Quality - Brave’s index is populated with sites that real people actually visit. No junk or clickbait farms, no SEO spam, and a much more human dataset.

  • Easy to Use - It’s quick and easy to set up. Data is structured for simple implementation across a wide range of apps, from NLP to complex analysis.

  • It’s Fast - The API is optimized for low latency, so it’s ideal for real-time apps like responsive search or chatbots.

You can use the API for free for up to 2,000 queries per month.

sponsored

How Discord’s Live Streaming Works

Discord is a communication platform for gamers that gives them a space to talk via text, voice and video. The app has become incredibly popular and now has hundreds of millions of active users.

One key feature in the app is Go Live, which allows a user to livestream their screen/apps/video game to other people in their Discord group. This feature was first released for the desktop app but has grown to support phones, gaming consoles and more.

Josh Stratton is an Engineering Manager at Discord and he wrote a fantastic blog post on how Discord’s live streaming feature works.

He goes through each step in the process: capturing video/audio, encoding, transmission to end-viewers and decoding the livestream feed.

We’ll delve into each of these steps and also talk about how Discord measures performance.

If you’d like to remember the live-streaming concepts we discuss in this article, check out Quastor Pro. You’ll get detailed Spaced-Repetition Anki Flash Cards on all the concepts covered in past Quastor articles.

Capture

The first step is capturing the video/audio that the streamer wants to share, whether it’s from an application, a game, or a YouTube video they’re watching.

We’ll break down both: capturing video and capturing audio.

Capturing Video

To capture video, Discord uses several strategies. They also have a robust fallback system so if one method fails, it’ll quickly switch over to the next method without interrupting the stream.

One strategy is to capture video using operating system methods. Discord didn’t state specifically what they use, but for Windows, this might be with the Desktop Duplication API (part of DirectX). For macOS, it could be with AVFoundation, a framework by Apple for working with video on macOS/iOS.

Another strategy Discord employs is DLL injection: inserting a custom library into the address space of the process being captured. Then, Discord can capture the graphical output directly from the application.
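Here’s a minimal sketch of what a capture fallback chain could look like. The strategy names, their ordering, and the CaptureError type are all hypothetical; the original post doesn’t describe Discord’s implementation at this level of detail.

```python
from typing import Callable, Sequence

# A capture strategy returns raw frame bytes, or raises if it cannot capture.
CaptureFn = Callable[[], bytes]


class CaptureError(Exception):
    """Raised by a strategy that cannot produce a frame."""


def capture_with_fallback(
    strategies: Sequence[tuple[str, CaptureFn]],
) -> tuple[str, bytes]:
    """Try each strategy in priority order and return the first frame produced."""
    errors = []
    for name, capture in strategies:
        try:
            return name, capture()
        except CaptureError as exc:
            errors.append(f"{name}: {exc}")
    raise CaptureError("all capture strategies failed: " + "; ".join(errors))


# Hypothetical strategies standing in for real capture backends.
def hooked_capture() -> bytes:
    raise CaptureError("could not hook into the target process")


def os_capture() -> bytes:
    return b"\x00" * 16  # placeholder frame data


method, frame = capture_with_fallback([
    ("hook", hooked_capture),   # assumed first choice
    ("os_api", os_capture),     # assumed fallback
])
print(method)  # os_api
```

The key idea is that a failure in one strategy is caught and recorded rather than surfaced to the user, so the stream keeps going with the next method.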

Capturing Audio

Discord uses OS-specific audio APIs to capture audio from the shared screen: the Core Audio APIs on Windows and Core Audio on macOS.

Audio usually comes from several processes (one playing music from the video game, another running a voice chat, another playing a YouTube video, etc.).

Discord captures audio from all these shared processes and all their children.
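Collecting a process and all of its children is a tree traversal. Here’s a minimal sketch using a hypothetical in-memory process table (parent PID mapped to child PIDs); a real implementation would query the operating system for this information instead.

```python
from collections import deque


def descendants(root_pid: int, children: dict[int, list[int]]) -> list[int]:
    """Return root_pid and every process below it, in breadth-first order."""
    seen = {root_pid}
    order = [root_pid]
    queue = deque([root_pid])
    while queue:
        pid = queue.popleft()
        for child in children.get(pid, []):
            if child not in seen:  # guard against a malformed table
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical process table: parent PID -> child PIDs.
process_table = {100: [101, 102], 102: [103], 200: [201]}
print(descendants(100, process_table))  # [100, 101, 102, 103]
```

Process 200 and its child are left out because they aren’t part of the shared process tree.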

Encoding

A single unencoded 1080p frame from a screenshare can be upwards of 6 megabytes. This means that a minute of unencoded video (assuming 30 frames per second) would be 10.8 gigabytes. That works out to well over a gigabit per second, which is far more than most streamers can upload.
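You can check these numbers yourself. The sketch below assumes 24-bit RGB (3 bytes per pixel, no alpha channel); the exact figure comes out slightly above the rounded 6 MB used in the text.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3  # assumption: 24-bit RGB, no alpha channel
FPS = 30
SECONDS = 60

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
minute_bytes = frame_bytes * FPS * SECONDS

print(f"{frame_bytes / 1e6:.1f} MB per frame")    # 6.2 MB per frame
print(f"{minute_bytes / 1e9:.1f} GB per minute")  # 11.2 GB per minute
```

With the rounded figure of 6 MB per frame, the same calculation gives 6 MB × 30 × 60 = 10,800 MB, or 10.8 GB per minute.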

Instead, Discord needs to use a codec to encode the video to transfer it over the network. The app currently supports VP8 and H264 video codecs, with HEVC and AV1 encoders available on specific platforms.

The final output quality (framerate, resolution, image quality) from the encoder is determined by how much bandwidth is available on the streamer’s network.

Network conditions are constantly changing, so the encoder needs to adapt in real time. If network conditions degrade drastically, the encoder will start dropping frames to reduce congestion.
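To make frame dropping concrete, here’s a toy sketch based on a simple send budget that refills at the available bandwidth. The function name, the half-second burst cap, and the numbers are all assumptions for illustration; this is not how Discord’s encoder is documented to work.

```python
def frames_to_send(frame_sizes_bytes: list[int], fps: int, bandwidth_bps: int) -> list[int]:
    """Return the indices of frames that fit the send budget; the rest are dropped."""
    refill_bits = bandwidth_bps // fps   # budget earned per frame interval
    max_budget_bits = bandwidth_bps // 2  # assumption: allow bursts of up to 0.5 s
    budget_bits = 0
    sent = []
    for index, size in enumerate(frame_sizes_bytes):
        budget_bits = min(max_budget_bits, budget_bits + refill_bits)
        cost_bits = size * 8
        if cost_bits <= budget_bits:
            budget_bits -= cost_bits
            sent.append(index)
        # Otherwise the frame is dropped and the budget keeps refilling.
    return sent


frames = [10_000] * 6  # six encoded frames of 10 kB each, at 10 fps

print(frames_to_send(frames, fps=10, bandwidth_bps=1_600_000))  # [0, 1, 2, 3, 4, 5]
print(frames_to_send(frames, fps=10, bandwidth_bps=400_000))    # [1, 3, 5]
```

When the available bandwidth falls to a quarter, every other frame gets dropped, so the effective frame rate halves instead of the stream stalling.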

Transmission

The livestream feed is sent from the streamer’s device to Discord’s servers. Their servers handle several tasks, like:

  • Quality Adjustments - the backend can adjust the stream’s quality based on each viewer’s bandwidth and device capabilities. Discord uses WebRTC bandwidth estimators to figure out how much data a viewer can reliably download.

  • Routing - Discord’s servers determine the most efficient path to deliver the stream to viewers, considering factors like network conditions and geographic location.
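As an illustration of per-viewer quality adjustment, here’s a sketch that picks a quality level for each viewer from a bandwidth estimate. The quality ladder, the bitrates, and the 80% headroom factor are hypothetical; the original post only says that quality is adjusted based on what each viewer can reliably download.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Hypothetical ladder, ordered from highest to lowest quality.
LADDER = [
    Rendition("1080p30", 6000),
    Rendition("720p30", 3000),
    Rendition("480p15", 1000),
]


def pick_rendition(estimated_kbps: int, headroom: float = 0.8) -> Rendition:
    """Pick the best rendition that fits within a fraction of the estimate."""
    usable_kbps = estimated_kbps * headroom
    for rendition in LADDER:
        if rendition.bitrate_kbps <= usable_kbps:
            return rendition
    return LADDER[-1]  # nothing fits, so fall back to the lowest quality


# Hypothetical viewers and their estimated download bandwidth in kbps.
viewers = {"alice": 10_000, "bob": 4_000, "carol": 500}
for viewer, kbps in viewers.items():
    print(viewer, pick_rendition(kbps).name)
# alice 1080p30
# bob 720p30
# carol 480p15
```

Leaving some headroom below the estimate means a small dip in a viewer’s bandwidth doesn’t immediately cause buffering.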

Decoding

When the encoded video reaches the viewer’s device, the device first needs to decode it. If the device has a hardware decoder (modern GPUs often come with their own built-in encoder/decoder hardware), then the Discord app will use that to limit CPU usage.

Audio and video are sent in separate RTP packets, so the app synchronizes them before playback.
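Synchronization is needed because each RTP stream counts time in its own units. Video payloads typically use a 90 kHz clock and Opus audio uses 48 kHz, and RTCP sender reports supply a reference point tying each stream’s RTP timestamp to a shared wall clock. Here’s a minimal sketch of that conversion. The ClockMapping type and the sample values are made up for illustration, and Discord’s actual implementation isn’t described in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockMapping:
    """One reference point tying an RTP timestamp to a shared wall clock."""
    rtp_timestamp: int
    wallclock_s: float
    clock_rate_hz: int


def presentation_time(rtp_timestamp: int, mapping: ClockMapping) -> float:
    """Convert a packet's RTP timestamp to seconds on the shared wall clock.

    Assumes the packet was sampled at or after the reference point.
    """
    # RTP timestamps are 32-bit and wrap around, so subtract modulo 2**32.
    delta_ticks = (rtp_timestamp - mapping.rtp_timestamp) % (1 << 32)
    return mapping.wallclock_s + delta_ticks / mapping.clock_rate_hz


video = ClockMapping(rtp_timestamp=900_000, wallclock_s=100.0, clock_rate_hz=90_000)
audio = ClockMapping(rtp_timestamp=48_000, wallclock_s=100.0, clock_rate_hz=48_000)

video_t = presentation_time(900_000 + 45_000, video)  # 0.5 s after the reference
audio_t = presentation_time(48_000 + 24_000, audio)   # 0.5 s after the reference
print(video_t, audio_t)  # 100.5 100.5
```

Once both streams are expressed on the same clock, the player can delay whichever one is ahead so that sound and picture line up.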

Measuring Performance

To provide the best live-streaming experience, the Discord team tracks several metrics to measure quality.

They consider factors like:

  • Frame rate

  • Consistent Frame Delivery

  • Latency

  • Image Quality

  • Network Utilization

And more.
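Frame rate and consistent frame delivery are worth measuring separately, since two streams can have the same average frame rate and feel very different. Here’s a small sketch that computes both from frame arrival times. The metric names and sample data are hypothetical, not Discord’s actual telemetry.

```python
from statistics import mean, pstdev


def frame_stats(arrival_times_ms: list[int]) -> dict[str, float]:
    """Summarize frame rate and pacing from at least two frame arrival times (ms)."""
    intervals = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    return {
        "fps": 1000 / mean(intervals),
        "interval_stddev_ms": pstdev(intervals),  # 0 means perfectly even pacing
        "worst_gap_ms": max(intervals),
    }


smooth = [0, 100, 200, 300, 400]   # a frame every 100 ms
stutter = [0, 50, 100, 350, 400]   # same frame count, one long stall

print(frame_stats(smooth))
print(frame_stats(stutter))
```

Both streams average 10 frames per second, but the second has a 250 ms gap that a viewer would notice as a stutter. That gap shows up in the interval spread and the worst-gap figure, not in the frame rate.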

They also monitor the CPU/memory usage on the devices of the live-streamer and the viewers to ensure that they’re not taking up too many resources.

A major challenge is finding the right balance between these metrics, such as the tradeoff between increasing frame rate and its impact on latency. Finding the right compromise that matches user expectations for video quality against their latency tolerance is difficult.

To gather user feedback, they survey users about how good the livestream quality is.

For more details, read the full article here.

With LLMs and Real-time Web Search, there’s a huge number of new possibilities for applications you can spin up in a weekend.

You could build a travel-deal finder that searches through different airlines and hotels for last-minute deals. Or you could create an AI research assistant that scans through web search results and gives you key findings and methodologies. There’s a countless number of side projects and applications that you can build.

The Brave Search API is an easy way to access the entire web through a fast API and use the search results in your application.


sponsored

Tech Snippets

Premium Content

Subscribe to Quastor Pro for long-form, in-depth articles on concepts in system design and backend engineering.

The articles are each thousands of words and also come with spaced-repetition (Anki) flash cards so you can remember all the core concepts discussed.

Past article content includes:

System Design Concepts

  • Measuring Availability

  • API Gateways

  • Database Replication

  • Load Balancing

  • API Paradigms

  • Database Sharding

  • Caching Strategies

  • Event Driven Systems

  • Database Consistency

  • Chaos Engineering

  • Distributed Consensus

Tech Dives

  • Redis

  • Postgres

  • Kafka

  • DynamoDB

  • gRPC

  • Apache Spark

  • HTTP

  • DNS

  • B Trees & LSM Trees

  • OLAP Databases

  • Database Engines

When you subscribe, you’ll also get Spaced Repetition (Anki) Flashcards for reviewing all the main concepts discussed in prior Quastor articles.

Productivity Tips from Sam Altman

Sam Altman is the CEO of OpenAI and was formerly the president of Y Combinator. He also briefly served as the CEO of Reddit (for 8 days).

He writes a terrific blog, and he’s previously published an interesting post on how he stays productive.

We’ll share the high level takeaways here.

What You Work On

If you’re working on the wrong thing, then it doesn’t matter how productive you are. Picking the right thing to work on is the most important element of productivity… and it’s often ignored.

Therefore, Sam leaves a ton of time on his schedule to think about what to work on. He does this by reading books, spending time with interesting people and spending time in nature.

Being around smart, productive, happy and positive people is extremely important. You should avoid spending time with the opposite kind of people, especially those who belittle your ambitions.

An important thing to remember is that you can learn anything you want and you can also get better quickly. The more you exercise this “skill” (by learning new things), the more you’ll trust it.

Three Pillars of Productivity

After figuring out what to work on, Sam describes his productivity system as having three pillars.

  1. Prioritize the Important Work

  2. Avoid Wasting Time

  3. Make Lots of Lists

Prioritization

Sam tries to prioritize in a way that generates momentum. Getting things done can be a positive feedback loop where getting something done makes you feel happier and makes it easier to get more done (since you’re in a good mental state).

Sam takes advantage of this by starting and ending each day with something he can really make progress on. He’s also ruthless about making sure that his most important projects get completed.

Avoid Wasting Time

Here are Sam’s tips for avoiding wasted time:

  1. Be ruthless about saying No to things that aren’t essential

  2. Avoid meetings. Most meetings are best scheduled for 15-20 minutes or 2 hours. The default of 30 minutes to 1 hour is usually wrong and leads to lots of wasted time.

  3. Avoid productivity porn. Many people spend too much time optimizing their system instead of asking whether they’re working on the right problems.

Lists

Sam highly recommends making lots of lists. He has lists for what he’d like to accomplish each year, each month and each day. They help him stay focused and understand what he needs to get done.

Overcommitting

In general, Sam has found it useful to overcommit a bit. Having slightly too much on your plate forces you to stay focused and avoid distractions. However, overcommitting by too much is disastrous.

For the rest of Sam’s tips, read the full blog post here.