The Engineering behind Instagram's Recommendation Algorithm

Plus, how open source repos get hijacked, a curated list of software and architecture design patterns, and more

Hey Everyone!

Today we'll be talking about

  • The Engineering Behind Instagram’s Recommendation Algorithm

    • The Candidate Generation/Retrieval Phase

    • Two Tower Neural Network Models and Why They’re Highly Scalable

    • First Pass Ranking With a Two Tower Model

    • Second Pass Ranking With a Multi-Task Multi-Label Model

  • Tech Snippets

    • How Open Source Repos get Hijacked through Social Engineering

    • A curated list of Software and Architecture Design Patterns

    • Build a More Transparent Uptime Page

    • OKRs are not OK

The Engineering Behind Instagram’s Recommendation Algorithm

One of Instagram’s most popular features is their explore page, where they recommend photos and videos to you. The majority of these photos and videos are from people you don’t follow, so Instagram needs to search through millions of pieces of content to generate recommendations for you.

Instagram has over 500 million daily active users, which means billions of recommendations have to be generated every day.

Vladislav Vorotilov and Ilnur Shugaepov are two senior machine learning engineers at Meta and they wrote a fantastic blog post delving into how they built this recommendation system and how it was designed to be highly scalable.

We’ll cover a ton of concepts on recommendation systems in this post.

If you’d like Spaced Repetition Flashcards (Anki) on all the concepts discussed in Quastor, check out Quastor Pro.

When you join, you’ll also get an up-to-date PDF with our past articles.

Recommendation Systems at a High Level

All the recommendation systems you see at Twitter, Facebook, TikTok, YouTube, etc. have a similar high-level architecture.

They have a layered architecture that looks something like the following

  1. Retrieval - Narrow down the pool of content to show a user to thousands of potential items

  2. First Stage Ranking - Apply a lightweight ranking model to quickly score the thousands of potential photos/videos and narrow them down to the ~100 best candidates

  3. Second Stage Ranking - Apply a heavier ML model to rank the 100 items by how likely the user is to engage with the photo/video. Pass this final ranking to the next step

  4. Final Reranking - Filter out and downrank items based on business rules (for ex. don’t show content from the same author over and over)

The specific details will obviously differ, but most recommendation systems use this Candidate Generation, Ranking, Final Filtering type of architecture.
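Here’s a minimal, self-contained sketch of that funnel in Python. The scoring functions are toy stand-ins (random scores), not Instagram’s actual models; they’re only there to show how each stage narrows down the pool.

```python
# Toy sketch of the retrieval -> ranking -> reranking funnel described above.
import random

def retrieve_candidates(user_id, catalog, k=2000):
    # Stage 1: heuristics + lightweight ML narrow billions of items to ~thousands
    return random.sample(catalog, min(k, len(catalog)))

def first_stage_score(user_id, item):
    # Stage 2: cheap model (e.g. a Two Tower dot product) -- toy random score here
    return random.random()

def second_stage_score(user_id, item):
    # Stage 3: heavy model (e.g. an MTML expected value) -- toy random score here
    return random.random()

def apply_business_rules(ranked_items):
    # Stage 4: filter/downrank per business rules (author diversity, integrity, ...)
    return ranked_items

def recommend(user_id, catalog):
    candidates = retrieve_candidates(user_id, catalog)                  # thousands
    top_100 = sorted(candidates, key=lambda i: first_stage_score(user_id, i),
                     reverse=True)[:100]                                # ~100 best
    ranked = sorted(top_100, key=lambda i: second_stage_score(user_id, i),
                    reverse=True)
    return apply_business_rules(ranked)

print(recommend(user_id=42, catalog=list(range(1_000_000)))[:10])
```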

We’ll go through each of the layers in Instagram’s system and talk about how they work.

Retrieval

Ranking all of the billions of pieces of content uploaded to Instagram for every single user is obviously not feasible.

Therefore, the candidate generation stage uses a set of heuristics and ML models to narrow down the potential items to thousands of photos/videos.

In terms of heuristics, Instagram uses things like

  • Accounts you follow

  • Topics you’re interested in

  • Accounts you’ve previously engaged with

And metrics like that.

Some of these are calculated in real-time while others (for ex. topics you follow) can be pre-generated during off-peak hours and stored in a cache.
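As a rough illustration, here’s how those heuristic sources might be combined. The caches, source names, and data below are hypothetical; they just show the split between real-time and precomputed candidates.

```python
# Hypothetical sketch: merging candidate sources, some computed in real time,
# others read from caches populated during off-peak hours.
followed_authors_cache = {"user_42": ["post_1", "post_2"]}   # precomputed off-peak
topic_posts_cache = {"user_42": ["post_3", "post_4"]}        # precomputed off-peak

def recent_engagement_candidates(user_id):
    # Computed in real time from the user's latest likes/comments (toy data here)
    return ["post_5", "post_6"]

def heuristic_candidates(user_id):
    candidates = set()
    candidates.update(followed_authors_cache.get(user_id, []))
    candidates.update(topic_posts_cache.get(user_id, []))
    candidates.update(recent_engagement_candidates(user_id))
    return list(candidates)

print(heuristic_candidates("user_42"))
```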

In terms of ML models, Instagram makes heavy use of the Two Tower Neural Network model.

Two Tower Neural Networks

The Two Tower Neural Network is a very popular machine learning architecture for recommender systems, and Instagram uses it heavily.

With a Two Tower Model, you generate embedding vectors for the user and for all the content you need to retrieve/rank. An embedding vector is just a compact, machine-learning-friendly representation that captures an item’s attributes and relationships.

Once you have these embedding vectors, you can look at the similarity between a user’s embedding vector and the content’s embedding vector to predict the probability that the user will engage with the content.

One big benefit of the Two Tower approach is that both the user and item embeddings can be calculated during off-peak hours and then cached. This makes inference extremely efficient.
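Here’s a minimal Two Tower sketch in PyTorch, assuming simple dense feature vectors for users and items. The layer sizes and features are illustrative guesses, not Instagram’s actual architecture.

```python
# Toy Two Tower model: separate user and item towers, joined only by a dot product.
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    def __init__(self, user_dim, item_dim, embed_dim=64):
        super().__init__()
        # Each tower maps raw features into a shared embedding space
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim)
        )
        self.item_tower = nn.Sequential(
            nn.Linear(item_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim)
        )

    def forward(self, user_features, item_features):
        u = self.user_tower(user_features)      # (batch, embed_dim)
        v = self.item_tower(item_features)      # (batch, embed_dim)
        # Dot-product similarity -> predicted probability of engagement
        return torch.sigmoid((u * v).sum(dim=-1))

model = TwoTower(user_dim=32, item_dim=48)
users = torch.randn(4, 32)    # 4 users' feature vectors (toy data)
items = torch.randn(4, 48)    # 4 items' feature vectors (toy data)
print(model(users, items))    # predicted engagement probabilities
```

Because the two towers only interact at the final dot product, the item embeddings can be generated offline and reused across every request.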

If you’d like to read more, Uber Engineering published an extremely detailed blog post on how they use Two Towers in the UberEats app to convince you to buy burritos at 1:30 am. I’d highly recommend giving it a skim if you’d like to delve deeper.

First Stage Ranking

After candidates are retrieved, the system needs to rank them by value to the user. This “value” is determined by how likely a user is to engage with the photo/video. Engagement is measured by whether the user likes/comments on it, shares it, watches it fully, etc.

The first stage ranker takes in thousands of candidates from the Retrieval stage and filters them down to the top 100 potential items.

To do this, Instagram again uses the Two Tower NN model. The fact that the model lets you precompute and cache embedding vectors for the user and all the content you need to rank makes it very scalable and efficient.

However, this time the learning objective is different from that of the Two Tower NN in the Candidate Generation stage.

Here, the two embedding vectors are used to predict how the second stage ranking model will rank this piece of content. The model is used to quickly (and cheaply) gauge whether the second stage ranker will score this content highly.

Based on this, the top 100 posts are passed on to the second-stage ranking model.
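Here’s a small sketch of what that first-pass scoring might look like at serving time, assuming the user and item embeddings were precomputed by a Two Tower model and cached. The data below is random; only the top-k scoring pattern matters.

```python
# First-stage ranking with precomputed (cached) embeddings.
import numpy as np

rng = np.random.default_rng(0)
user_embedding = rng.standard_normal(64)              # cached user vector
item_embeddings = rng.standard_normal((5000, 64))     # cached vectors for ~5k candidates

# Score every candidate with a single matrix-vector product (cheap at serving time)
scores = item_embeddings @ user_embedding

# Keep the ~100 highest-scoring candidates for the second-stage model
top_100 = np.argsort(scores)[-100:][::-1]
print(top_100[:10], scores[top_100[:3]])
```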

Second Stage Ranking

Here, Instagram uses a Multi-Task Multi-Label (MTML) neural network model. As the name suggests, this is an ML model that is designed to handle multiple tasks (objectives) and predict multiple labels (outcomes) simultaneously.

For recommendation systems, this means predicting different types of user engagement with a piece of content (probability a user will like, share, comment, block, etc.).

The MTML model is much larger than the Two Tower model used for first-pass ranking. Predicting all the different types of engagement requires far more features and a deeper neural net.
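Here’s a toy MTML sketch: a shared trunk with one output head per engagement type. The layer sizes and the list of tasks are illustrative guesses, not Instagram’s actual model.

```python
# Toy Multi-Task Multi-Label model: shared layers plus one head per engagement label.
import torch
import torch.nn as nn

class MTMLRanker(nn.Module):
    def __init__(self, feature_dim, tasks=("like", "comment", "share", "see_less")):
        super().__init__()
        # Shared trunk learns a common representation of the (user, item) features
        self.trunk = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # One small head per predicted engagement label
        self.heads = nn.ModuleDict({t: nn.Linear(128, 1) for t in tasks})

    def forward(self, features):
        shared = self.trunk(features)
        # Each head outputs the probability of its engagement type
        return {t: torch.sigmoid(head(shared)).squeeze(-1)
                for t, head in self.heads.items()}

model = MTMLRanker(feature_dim=512)
features = torch.randn(100, 512)   # features for the ~100 surviving candidates
probs = model(features)            # dict of per-task engagement probabilities
print({t: p.shape for t, p in probs.items()})
```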

Once the model generates probabilities for all the different actions a user can take (liking, commenting, sharing, etc.), these are weighted and summed together to generate an Expected Value (EV) for the piece of content.

Expected Value = W_click * P(click) + W_like * P(like) - W_see_less * P(see less) + ...

The 100 pieces of content that made it to this stage of the system are ranked by their EV score.
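To make the weighted sum concrete, here’s a small sketch; the probabilities and weights below are made-up toy numbers, not values from Instagram’s system.

```python
# Expected Value = weighted sum of predicted engagement probabilities.
import torch

# Hypothetical per-action probabilities for 3 candidate posts (toy numbers)
probs = {
    "like":     torch.tensor([0.30, 0.10, 0.55]),
    "comment":  torch.tensor([0.05, 0.02, 0.20]),
    "share":    torch.tensor([0.02, 0.01, 0.10]),
    "see_less": torch.tensor([0.01, 0.30, 0.02]),
}
# Hypothetical weights; the negative weight on "see less" penalizes unwanted content
weights = {"like": 1.0, "comment": 1.5, "share": 2.0, "see_less": -3.0}

ev = sum(w * probs[action] for action, w in weights.items())
print(ev)                                    # expected value per post
print(torch.argsort(ev, descending=True))    # posts ranked by EV
```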

Final Reranking

Here, Instagram applies fine-grained business rules to filter out certain types of content.

For example

  • Avoid showing too much content from the same author

  • Downrank posts/content that could be considered harmful

And so on.
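Here’s a sketch of what this kind of rule-based reranking could look like, assuming each candidate is a dict with an author, an EV score, and an integrity flag. The rules and penalty values are illustrative, not Instagram’s actual policies.

```python
# Toy final reranking: cap posts per author, downrank possibly-harmful content.
def final_rerank(candidates, max_per_author=2, harmful_penalty=0.5):
    seen_authors = {}
    reranked = []
    for post in sorted(candidates, key=lambda p: p["ev"], reverse=True):
        count = seen_authors.get(post["author"], 0)
        if count >= max_per_author:
            continue                      # drop: too much content from this author
        seen_authors[post["author"]] = count + 1
        score = post["ev"]
        if post.get("possibly_harmful"):
            score *= harmful_penalty      # downrank rather than remove outright
        reranked.append({**post, "final_score": score})
    return sorted(reranked, key=lambda p: p["final_score"], reverse=True)

posts = [
    {"id": 1, "author": "a", "ev": 0.9},
    {"id": 2, "author": "a", "ev": 0.8},
    {"id": 3, "author": "a", "ev": 0.7},
    {"id": 4, "author": "b", "ev": 0.6, "possibly_harmful": True},
]
print([p["id"] for p in final_rerank(posts)])   # -> [1, 2, 4]
```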

Tech Snippets