How Pinterest Optimized Video Playback

An introduction to Adaptive Bitrate Streaming and how Pinterest was able to reduce startup latency for videos. Plus, the architecture of open source applications and how Anthropic was able to improve RAG.

September 20, 2024

Hey Everyone!

Today we’ll be talking about

How Pinterest Optimized Video Playback
- Introduction to Adaptive Bitrate Streaming, HLS and DASH
- Why Pinterest was experiencing high startup latency for videos
- Embedding the video manifest files in their metadata API and improving performance with caching
Tech Snippets
- Digital Signatures and how to avoid them
- The Architecture of Open Source Applications
- Anthropic’s new blog post on Contextual Retrieval for RAG

How Pinterest Optimized Video Playback

Pinterest is a social media platform that helps you discover ideas and inspiration related to whatever you’re interested in (cooking recipes, home decor, clothing, etc.).

The platform was launched in 2010 and it’s grown to over 500 million monthly active users. Pinterest is now publicly traded and valued at more than $20 billion.

As on every other social platform, video is one of the most popular content formats on Pinterest. When you’re serving videos to your users, one of your highest priorities should be to minimize buffering and startup delay. With modern attention spans, even a couple of seconds of buffering can cause a large number of users to leave your app.

Pinterest engineering published a great blog post on how they optimized video playback and reduced p90 startup latency on iOS by more than 36%.

We’ll give some context on how videos are streamed, what protocols are involved and what Pinterest did to optimize playback.

Introduction to Adaptive Bitrate Streaming

When you’re delivering video to users, one technique that’s used universally nowadays is Adaptive Bitrate Streaming.

This is where you take the video, encode it at multiple bitrates and resolutions, and store all of the resulting renditions on your server. When a user wants to play the video, their video player selects the optimal rendition based on factors like network bandwidth and device characteristics to minimize buffering.

With Adaptive Bitrate Streaming, the player can also switch dynamically between bitrates. If a user’s internet connection weakens in the middle of a video, the player automatically switches to a lower bitrate stream so playback stays smooth without buffering interruptions.

When the network improves, the player will automatically switch back to the higher bitrate stream to provide better video quality.

Basics of Adaptive Bitrate Streaming

There are different protocols you can use for Adaptive Bitrate Streaming, but they share some common fundamentals.

- Chunking - the video file is broken up into small chunks. Each chunk ranges from 2-10 seconds in length.
- Multiple Renditions - each chunk is encoded at multiple bitrates and resolutions.
- Manifest File - a manifest file contains metadata about the available renditions for every chunk, including their bitrates and resolutions.
- Dynamic Selection - the user’s video player uses the manifest file to decide which rendition of each chunk to download based on the current network conditions and device capabilities.
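
To make the dynamic selection step concrete, here is a minimal Python sketch of how a player might pick a rendition before each chunk download. The bitrate ladder, the throughput samples and the 0.8 safety factor are all made-up assumptions for illustration; real players use more sophisticated heuristics (buffer level, smoothing, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int  # average bitrate needed to stream this rendition


# Hypothetical bitrate ladder, sorted from lowest to highest bitrate.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]


def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> Rendition:
    """Return the highest rendition that fits within a safety margin of the throughput."""
    budget = measured_kbps * safety_factor
    best = LADDER[0]  # always fall back to the lowest rendition
    for rendition in LADDER:
        if rendition.bitrate_kbps <= budget:
            best = rendition
    return best


# Throughput measured before each chunk download (made-up numbers, in kbps).
throughput_samples = [6500, 3200, 900, 4000]
choices = [pick_rendition(kbps).name for kbps in throughput_samples]
print(choices)  # ['1080p', '720p', '240p', '720p']
```

The player drops to 240p when throughput dips to 900 kbps and climbs back to 720p once the network recovers, which is the switching behavior described above.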

The most widely adopted Adaptive Bitrate protocols are HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH).

As the names suggest, HLS and DASH are both built on HTTP, so video chunks can be served by ordinary web servers and CDNs.

HTTP Live Streaming (HLS)

HLS was developed by Apple in 2009 and it’s one of the earliest and most widely adopted ABR protocols. The video stream is broken into small, HTTP-based downloads. It supports both live and on-demand streaming.

Because Apple maintains the protocol, HLS is natively supported on iOS, macOS and Safari.

HLS uses .m3u8 manifest files to guide the player in selecting the most appropriate video chunks based on real-time network conditions.
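
As a rough illustration, here is a Python sketch that parses a small HLS master playlist and lists its renditions. The playlist contents (URIs, bitrates, codecs) are made up, and the parser only handles the `#EXT-X-STREAM-INF` tag; it is not a complete HLS implementation.

```python
import re

# Hypothetical master playlist; the URIs and bitrates are made up.
MASTER_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
720p/playlist.m3u8
"""

# Matches KEY=value pairs where the value is a quoted string or a bare token.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text: str) -> list[dict]:
    """Return one dict per rendition listed in an HLS master playlist."""
    variants = []
    pending = None  # attributes from the most recent #EXT-X-STREAM-INF tag
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = dict(ATTRIBUTE.findall(line.split(":", 1)[1]))
            pending = {
                "bandwidth": int(attrs["BANDWIDTH"]),
                "resolution": attrs.get("RESOLUTION"),
            }
        elif line and not line.startswith("#") and pending is not None:
            # The first non-tag line after the tag is that rendition's playlist URI.
            variants.append({**pending, "uri": line})
            pending = None
    return variants


for variant in parse_master_playlist(MASTER_PLAYLIST):
    print(variant)
```

Each URI in the master playlist points at a separate per-rendition playlist that lists the actual chunks, so an HLS player usually needs at least two manifest downloads before it can fetch any video.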

Dynamic Adaptive Streaming over HTTP (DASH)

DASH was developed under MPEG (Moving Picture Experts Group) as a vendor-neutral standard rather than by a single company. It was first published as an international standard (ISO/IEC 23009-1) in 2012 and it currently powers platforms like YouTube and Netflix.

DASH uses .mpd manifest files to provide metadata about the available renditions and chunk URLs.
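
An .mpd file is XML, so it can be inspected with a standard XML parser. The Python sketch below lists the renditions (called Representations in DASH) from a made-up, heavily simplified manifest; a real MPD would also describe segment URLs and timing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified MPD; the ids and bitrates are made up.
MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static" mediaPresentationDuration="PT30S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="360p" bandwidth="800000" width="640" height="360"/>
      <Representation id="720p" bandwidth="2500000" width="1280" height="720"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

# DASH elements live in this XML namespace.
NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}


def list_representations(mpd_text: str) -> list[dict]:
    """Return one dict per Representation element found in the MPD."""
    root = ET.fromstring(mpd_text)
    representations = []
    for rep in root.iterfind(".//dash:Representation", NS):
        representations.append({
            "id": rep.get("id"),
            "bandwidth": int(rep.get("bandwidth")),
            "resolution": f'{rep.get("width")}x{rep.get("height")}',
        })
    return representations


for representation in list_representations(MPD):
    print(representation)
```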

Video Streaming at Pinterest

Pinterest primarily uses HLS to deliver video on iOS and DASH to deliver video on Android.

- HLS: Utilized for video streaming on iOS devices through Apple’s AVPlayer, accounting for approximately 70% of video playback sessions on iOS apps.
- DASH: Employed for video streaming on Android devices using ExoPlayer, representing around 55% of video playback sessions on Android.

One of the key metrics Pinterest measures for video performance is startup latency - the time it takes for a video to begin playing after a user initiates playback.

As we stated above, both HLS and DASH require a manifest file before you can initiate video playback. With HLS, you might have to download additional manifest files (for the specific rendition) after downloading the main one.

Only after downloading the manifest files can the video player start downloading the first few chunks of the video. These extra round trips were the primary contributor to users’ perceived startup latency.

The Pinterest team decided to eliminate the latency from the round trips by embedding all the relevant manifest files in the original API response. When a user first requests metadata for a video (thumbnail, title, etc.), the API response to that request will also contain the manifest files of the video.

During playback, the player can swiftly access the manifest information locally and immediately start downloading video chunks.
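
Here is a minimal Python sketch of the client-side idea, not Pinterest's actual implementation. The `ManifestResolver` class, the `embedded_manifests` field and the example URLs are all hypothetical; the point is only that a manifest lookup is answered locally when the metadata response already carried the manifest, and falls back to the network otherwise.

```python
from typing import Callable


class ManifestResolver:
    """Resolves manifest URLs, preferring copies embedded in the metadata response."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # network fallback (hypothetical)
        self._embedded: dict[str, str] = {}
        self.network_requests = 0

    def ingest_metadata(self, response: dict) -> None:
        # "embedded_manifests" is an assumed field: manifest URL -> manifest text.
        self._embedded.update(response.get("embedded_manifests", {}))

    def resolve(self, url: str) -> str:
        if url in self._embedded:
            return self._embedded[url]  # no round trip needed
        self.network_requests += 1
        return self._fetch(url)


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request.
    return f"# fetched {url}"


metadata = {
    "title": "Weeknight pasta",
    "video_url": "https://example.com/v/123/master.m3u8",
    "embedded_manifests": {
        "https://example.com/v/123/master.m3u8": "#EXTM3U\n# master playlist",
        "https://example.com/v/123/720p.m3u8": "#EXTM3U\n# 720p playlist",
    },
}

resolver = ManifestResolver(fake_fetch)
resolver.ingest_metadata(metadata)
master = resolver.resolve(metadata["video_url"])
rendition = resolver.resolve("https://example.com/v/123/720p.m3u8")
print(resolver.network_requests)  # 0
```

Both manifest lookups are served from the metadata response, so the first network request the player makes for this video is for an actual video chunk.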

Reducing API Response Time

When Pinterest started including manifest files in the API responses, the primary issue they faced was increased latency for the API endpoint. The backend now had to retrieve manifest files before it could respond with video metadata.

They were able to solve this issue with caching. They added a MemCache layer into the manifest serving process to cache the most popular video manifest files.

Here’s the new process for retrieving manifest files.

- API Request - a client requests Pins metadata
- Manifest Embeddings - the backend retrieves manifest files from S3, serializes them and embeds the bytes within the API response
- MemCache - subsequent requests for popular video manifest files are served immediately from the MemCache caching layer
- Response Delivery - the API delivers the payload with the manifest data embedded
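
The steps above follow a standard cache-aside pattern, which can be sketched in Python as follows. This is an illustration under assumptions, not Pinterest's code: plain dictionaries stand in for MemCache and S3, the `ManifestService` class and the `manifest_b64` field are hypothetical, and base64 is just one possible way to serialize the bytes.

```python
import base64


class ManifestService:
    """Cache-aside lookup: check the cache first, fall back to blob storage."""

    def __init__(self, cache: dict, blob_store: dict):
        self.cache = cache            # stand-in for MemCache
        self.blob_store = blob_store  # stand-in for S3
        self.blob_reads = 0

    def get_manifest(self, video_id: str) -> bytes:
        cached = self.cache.get(video_id)
        if cached is not None:
            return cached  # cache hit: no trip to blob storage
        self.blob_reads += 1
        manifest = self.blob_store[video_id]
        self.cache[video_id] = manifest  # populate the cache for later requests
        return manifest

    def build_pin_response(self, pin: dict) -> dict:
        manifest = self.get_manifest(pin["video_id"])
        # Assumed serialization: base64 so the bytes fit in a JSON payload.
        encoded = base64.b64encode(manifest).decode("ascii")
        return {**pin, "manifest_b64": encoded}


service = ManifestService(cache={}, blob_store={"v123": b"#EXTM3U\n# made-up manifest"})
pin = {"id": "p1", "title": "Weeknight pasta", "video_id": "v123"}

first = service.build_pin_response(pin)   # cache miss, reads from blob storage
second = service.build_pin_response(pin)  # cache hit
print(service.blob_reads)  # 1
```

A production cache would also need an expiry and eviction policy, which this sketch leaves out.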

Results

With this new setup, Pinterest was able to see a 36.7% reduction in p90 startup latency on iOS. They also saw a 12.3% reduction in the number of users who had to wait longer than 1 second for a video to start.
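
For readers less familiar with percentile metrics, p90 startup latency is the value that 90% of playback sessions come in at or under. The Python sketch below computes it with the nearest-rank method on made-up samples; the numbers are purely illustrative and are not Pinterest's data.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Made-up startup latencies in milliseconds, before and after a change.
before_ms = [420, 510, 480, 950, 1300, 610, 700, 2200, 530, 890]
after_ms = [400, 450, 430, 700, 1040, 520, 600, 1800, 470, 760]

p90_before = percentile_nearest_rank(before_ms, 90)
p90_after = percentile_nearest_rank(after_ms, 90)
reduction = (p90_before - p90_after) / p90_before

print(p90_before, p90_after)  # 1300 1040
print(f"{reduction:.1%}")     # 20.0%
```

Tracking a high percentile rather than the average keeps the focus on the slowest sessions, which are the ones most likely to make users give up on a video.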

Tech Snippets

The Architecture of Open Source Applications

This is a terrific series of free books that teach you software architecture using practical examples from open source.

The chapters go through applications like Git, CMake, Audacity, Firefox and more and explain how they work.

aosabook.org/en

Digital signatures and how to avoid them

Neil Madden is the author of API Security in Action and has worked as a Security Architect and software engineer.

He wrote a really interesting blog post on digital signatures, how they work and when they should be used (and when they should be avoided). He talks about the fragility of current signature schemes and how they can lose important contextual details. For many use cases, Madden advocates using simpler methods like HMAC for authentication instead of digital signatures.

neilmadden.blog/2024/09/18/digital-signatures-and-how-to-avoid-them

Introducing Contextual Retrieval

Anthropic (creators of Claude) recently published an article on how to improve Retrieval-Augmented Generation systems using a method called “Contextual Retrieval”. This addresses a common issue in traditional RAG systems where context can be lost when splitting documents into smaller chunks.

Using this technique, Anthropic reported reducing failed retrievals in RAG by 49%, and by 67% when combined with reranking. With prompt caching, they estimate the one-time cost of generating the contextualized chunks at about $1.02 per million document tokens.