The Engineering behind Instagram's Recommendation Algorithm

Plus, how open source repos get hijacked, a curated list of software and architecture design patterns and more

May 09, 2024

Hey Everyone!

Today we'll be talking about:

The Engineering Behind Instagram’s Recommendation Algorithm
- The Candidate Generation/Retrieval Phase
- Two Tower Neural Network Models and Why They’re Highly Scalable
- First Pass Ranking With a Two Tower Model
- Second Pass Ranking With a Multi-Task Multi-Label Model
Tech Snippets
- How Open Source Repos get Hijacked through Social Engineering
- A curated list of Software and Architecture Design Patterns
- Build a More Transparent Uptime Page
- OKRs are not OK

The Engineering Behind Instagram’s Recommendation Algorithm

One of Instagram’s most popular features is its Explore page, where it recommends photos and videos to you. Most of these photos and videos come from people you don’t follow, so Instagram needs to search through millions of pieces of content to generate recommendations for you.

Instagram has over 500 million daily active users, which means billions of recommendations have to be generated every day.

Vladislav Vorotilov and Ilnur Shugaepov are two senior machine learning engineers at Meta who wrote a fantastic blog post on how they built this recommendation system and how it was designed to be highly scalable.

We’ll cover a ton of concepts on recommendation systems in this post.


Recommendation Systems at a High Level

All the recommendation systems you see at Twitter, Facebook, TikTok, YouTube, etc. have a similar high-level architecture.

They use a layered architecture that looks something like this:

Retrieval - Narrow down the pool of content that could be shown to a user to thousands of potential items
First Stage Ranking - Apply a lightweight ranking model to quickly rank the thousands of potential photos/videos and narrow them down to the 100 best candidates
Second Stage Ranking - Apply a heavier ML model to rank the 100 items by how likely the user is to engage with each photo/video. Pass this ranking to the next step
Final Reranking - Filter out and downrank items based on business rules (for example, don’t show content from the same author again and again)
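The four layers above can be sketched as a simple funnel. This is a minimal illustration under stated assumptions, not Instagram's code: the retrieval, scoring, and reranking functions are hypothetical placeholders passed in as arguments, and `first_stage_k=100` just mirrors the 100-item cutoff described above.

```python
from typing import Callable, Dict, List

# Hypothetical item representation for illustration: {"id": ..., "author": ...}
Item = Dict[str, str]


def recommend(
    user_id: str,
    retrieve: Callable[[str], List[Item]],
    cheap_score: Callable[[str, Item], float],
    heavy_score: Callable[[str, Item], float],
    rerank: Callable[[List[Item]], List[Item]],
    first_stage_k: int = 100,
) -> List[Item]:
    # 1. Retrieval: heuristics + ML models produce thousands of candidates.
    candidates = retrieve(user_id)

    # 2. First-stage ranking: a cheap model scores everything; keep the top k.
    shortlist = sorted(
        candidates, key=lambda item: cheap_score(user_id, item), reverse=True
    )[:first_stage_k]

    # 3. Second-stage ranking: the expensive model only sees the shortlist.
    ranked = sorted(
        shortlist, key=lambda item: heavy_score(user_id, item), reverse=True
    )

    # 4. Final reranking: business rules (diversity, integrity, ...).
    return rerank(ranked)


# Toy usage: 1,000 fake candidates and placeholder scoring functions.
toy_items = [{"id": str(i), "author": f"author_{i % 3}"} for i in range(1000)]
feed = recommend(
    "user_1",
    retrieve=lambda user: toy_items,
    cheap_score=lambda user, item: float(item["id"]),
    heavy_score=lambda user, item: -float(item["id"]),
    rerank=lambda items: items,
)
```

The design choice is about cost: the expensive `heavy_score` runs on at most `first_stage_k` items instead of on every retrieved candidate.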

The specific details differ, but most recommendation systems follow this same candidate generation, ranking, and final filtering architecture.

We’ll go through each of the layers in Instagram’s system and talk about how they work.

Retrieval

Ranking all of the billions of pieces of content uploaded to Instagram for every single user is obviously not feasible.

Therefore, the candidate generation stage uses a set of heuristics and ML models to narrow down the potential items to thousands of photos/videos.

In terms of heuristics, Instagram uses signals like:

Accounts you follow
Topics you’re interested in
Accounts you’ve previously engaged with

Some of these are calculated in real time, while others (for example, topics you follow) can be pre-generated during off-peak hours and cached.
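A rough sketch of heuristic candidate generation, assuming each heuristic source simply yields item IDs. `TOPIC_CACHE` stands in for a hypothetical table precomputed during off-peak hours, while the other sources represent lists that would be computed at request time. The merge order, deduplication, and `limit` are illustrative choices, not Instagram's.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical cache filled by an off-peak batch job: topic -> popular item IDs.
TOPIC_CACHE: Dict[str, List[str]] = {
    "cooking": ["p1", "p2", "p3"],
    "hiking": ["p4", "p2"],
}


def generate_candidates(
    sources: Dict[str, Iterable[str]],
    already_seen: Set[str],
    limit: int,
) -> List[str]:
    """Merge item IDs from several heuristic sources, skipping duplicates
    and items the user has already seen, until `limit` is reached."""
    seen = set(already_seen)
    candidates: List[str] = []
    for items in sources.values():  # dicts keep insertion order (Python 3.7+)
        for item_id in items:
            if item_id in seen:
                continue
            seen.add(item_id)
            candidates.append(item_id)
            if len(candidates) >= limit:
                return candidates
    return candidates


user_topics = ["cooking", "hiking"]
sources = {
    "followed_accounts": ["p9", "p1"],  # computed in real time
    "topics": [p for t in user_topics for p in TOPIC_CACHE.get(t, [])],  # read from cache
    "engaged_accounts": ["p5"],  # computed in real time
}
candidates = generate_candidates(sources, already_seen={"p3"}, limit=1000)
```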

In terms of ML models, Instagram makes heavy use of the Two Tower Neural Network model.

Two Tower Neural Networks

The Two Tower Neural Network is a very popular machine learning architecture for recommender systems, and Instagram uses it heavily.

A Two Tower Model has two separate neural networks (the "towers"): one turns user features into a user embedding vector, and the other turns item features into an embedding vector for each piece of content you need to retrieve/rank. An embedding vector is just a compact representation that captures the attributes and relationships of an item in a machine-learning-friendly vector.

Once you have these embedding vectors, you can measure the similarity between the user’s embedding and a piece of content’s embedding (typically with a dot product or cosine similarity) to predict the probability that the user will engage with the content.
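A toy sketch of the two-tower idea in plain Python. Each tower here is a single random, untrained dense layer with a `tanh` activation (real towers are deeper and trained on engagement data), and the feature vectors and dimensions are made up. What it does show is the structure: user and item features go through separate networks and only meet in the final dot product, which a sigmoid turns into a probability.

```python
import math
import random
from typing import List, Sequence


class Tower:
    """One tower: a single dense layer + tanh that maps raw features to an
    embedding. Weights are random and untrained (toy example)."""

    def __init__(self, in_dim: int, emb_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.weights = [
            [rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(emb_dim)
        ]

    def embed(self, features: Sequence[float]) -> List[float]:
        return [
            math.tanh(sum(w * x for w, x in zip(row, features)))
            for row in self.weights
        ]


def engagement_probability(user_emb: Sequence[float], item_emb: Sequence[float]) -> float:
    # The only place the two towers interact: a dot product, squashed by a sigmoid.
    logit = sum(u * v for u, v in zip(user_emb, item_emb))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical feature vectors. User and item inputs can differ in size,
# but both towers must output embeddings of the same dimension.
user_tower = Tower(in_dim=4, emb_dim=8, seed=0)
item_tower = Tower(in_dim=6, emb_dim=8, seed=1)

user_emb = user_tower.embed([1.0, 0.0, 0.3, 0.7])
item_emb = item_tower.embed([0.2, 0.9, 0.0, 0.1, 0.5, 0.4])
p_engage = engagement_probability(user_emb, item_emb)
```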

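One big benefit of the Two Tower approach is that the user and item towers don’t interact until the final similarity calculation, so both the user and item embeddings can be calculated during off-peak hours and then cached. At request time, the system only needs to look up the embeddings and compute a cheap similarity score, which makes inference extremely efficient.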
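To illustrate why caching makes serving cheap, here is a minimal sketch assuming item embeddings have already been computed offline into a hypothetical in-memory `ITEM_EMBEDDING_CACHE`. At request time, each candidate costs one dot product. A production system would typically use an approximate nearest-neighbor index rather than scoring every item; the brute-force loop here is only for clarity.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Filled by a hypothetical off-peak batch job that runs the item tower over
# all content. Tiny 3-dim embeddings for illustration.
ITEM_EMBEDDING_CACHE: Dict[str, List[float]] = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.1, 0.8, 0.3],
    "photo_c": [0.7, 0.6, 0.1],
    "photo_d": [-0.5, 0.2, 0.9],
}


def top_k(user_emb: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """Request time: no item-tower forward passes, just one dot product
    per cached embedding, then keep the k highest scores."""
    scored = (
        (item_id, sum(u * v for u, v in zip(user_emb, emb)))
        for item_id, emb in ITEM_EMBEDDING_CACHE.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# The user embedding could itself be precomputed and cached the same way.
user_emb = [1.0, 0.5, 0.0]
best = top_k(user_emb, 2)
```

With this `user_emb`, `top_k(user_emb, 2)` returns `photo_c` followed by `video_a`.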

If you’d like to read more, Uber Engineering published an extremely detailed blog post on how they use Two Towers in the Uber Eats app to convince you to buy burritos at 1:30 am. I’d highly recommend giving it a skim if you’d like to delve deeper.

First Stage Ranking

After candidates are retrieved, the system needs to rank them by value to the user. This “value” is determined by how likely a user is to engage with the photo/video. Engagement is measured by whether the user likes/comments on it, shares it, watches it fully, etc.

The first stage ranker takes in thousands of candidates from the Retrieval stage and filters them down to the top 100 potential items.

To do this, Instagram again uses the Two Tower NN model. The fact that the model lets you precompute and cache embedding vectors for the user and all the content you need to rank makes it very scalable and efficient.

However, this time, the learning objective is different from that of the Two Tower model used in the Retrieval stage.

Here, the two embedding vectors are used to predict how the second stage ranking model will rank this piece of content. In other words, the first stage model learns to imitate the heavier second stage model, so it can quickly (and cheaply) gauge whether that model would rank the content highly.
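One way to read "predict how the second stage model will rank this content" is as knowledge distillation: the second stage model's scores become soft labels for the Two Tower model. The sketch below assumes that framing and uses binary cross-entropy with soft targets. To stay short, it trains only a user embedding against fixed, made-up item embeddings and teacher scores; it illustrates the objective, not Meta's training setup.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def distillation_loss(
    user_emb: Sequence[float],
    item_embs: Sequence[Sequence[float]],
    teacher_probs: Sequence[float],
) -> float:
    """Mean binary cross-entropy where the targets are the second stage
    model's scores (soft labels), not raw engagement events."""
    eps = 1e-12
    total = 0.0
    for v, t in zip(item_embs, teacher_probs):
        p = sigmoid(dot(user_emb, v))
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(teacher_probs)


def train_user_embedding(
    user_emb: Sequence[float],
    item_embs: Sequence[Sequence[float]],
    teacher_probs: Sequence[float],
    lr: float = 0.5,
    steps: int = 200,
) -> List[float]:
    """Plain gradient descent on the user embedding (item embeddings fixed).
    For BCE on a logit z = u . v, dLoss/dz = p - t, so dLoss/du = (p - t) * v."""
    u = list(user_emb)
    n = len(teacher_probs)
    for _ in range(steps):
        grad = [0.0] * len(u)
        for v, t in zip(item_embs, teacher_probs):
            p = sigmoid(dot(u, v))
            for i, v_i in enumerate(v):
                grad[i] += (p - t) * v_i / n
        u = [u_i - lr * g_i for u_i, g_i in zip(u, grad)]
    return u


# Made-up item embeddings and second stage ("teacher") scores.
item_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
teacher_probs = [0.9, 0.1, 0.6]

loss_before = distillation_loss([0.0, 0.0], item_embs, teacher_probs)
trained = train_user_embedding([0.0, 0.0], item_embs, teacher_probs)
loss_after = distillation_loss(trained, item_embs, teacher_probs)
```

After training, the cheap model's scores move toward the teacher's ordering, which is all the first stage needs in order to pick a good top 100.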

Based on this, the top 100 posts are passed on to the second-stage ranking model.

Second Stage Ranking

Here, Instagram uses a Multi-Task Multi-Label (MTML) neural network model. As the name suggests, this is an ML model that is designed to handle multiple tasks (objectives) and predict multiple labels (outcomes) simultaneously.

For recommendation systems, this means predicting different types of user engagement with a piece of content (probability a user will like, share, comment, block, etc.).

The MTML model is much larger than the Two Towers model of the first-pass ranking. Predicting all the different types of engagement requires far more features and a deeper neural net.
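A toy "shared-bottom" multi-task model, which is one common way to build an MTML network; the post doesn't describe Instagram's exact architecture, so treat this layout as an assumption. A shared hidden layer is computed once, and a separate sigmoid head per engagement type predicts its own probability. The task names, dimensions, and random untrained weights are all hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical set of engagement types the model predicts.
TASKS = ["click", "like", "comment", "share", "see_less"]


class SharedBottomMTML:
    """Toy multi-task, multi-label model: one shared hidden layer feeds a
    separate sigmoid head per task. Weights are random and untrained."""

    def __init__(self, in_dim: int, hidden_dim: int, tasks: Sequence[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.shared: List[List[float]] = [
            [rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(hidden_dim)
        ]
        self.heads: Dict[str, List[float]] = {
            task: [rng.gauss(0.0, 0.3) for _ in range(hidden_dim)] for task in tasks
        }

    def predict(self, features: Sequence[float]) -> Dict[str, float]:
        # Shared representation (ReLU layer), computed once and reused by every head.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in self.shared]
        # Independent sigmoid per task: the probabilities need not sum to 1,
        # because a user can both like and share the same post.
        return {
            task: 1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(head, hidden))))
            for task, head in self.heads.items()
        }


model = SharedBottomMTML(in_dim=6, hidden_dim=16, tasks=TASKS)
# Hypothetical input: user, item, and user-item interaction features concatenated.
probs = model.predict([0.5, 1.0, 0.0, 0.2, 0.8, 0.3])
```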

Once the model generates probabilities for all the different actions a user can take (liking, commenting, sharing, etc.), these probabilities are weighted and summed together to generate an Expected Value (EV) for the piece of content.

Expected Value = W_click * P(click) + W_like * P(like) - W_see_less * P(see_less) + ...

The 100 pieces of content that made it to this stage of the system are ranked by their EV score.
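A worked sketch of the Expected Value formula. The weights below are hypothetical (the real values aren't given in the post), and the minus sign in front of W_see_less is folded into a negative weight so the code can use a single weighted sum.

```python
from typing import Dict, List, Tuple

# Hypothetical weights. The formula's "- W_see_less * P(see_less)" term is
# expressed here as a negative weight so the score is a single weighted sum.
WEIGHTS: Dict[str, float] = {
    "click": 1.0,
    "like": 2.0,
    "share": 4.0,
    "see_less": -8.0,
}


def expected_value(probs: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    # Actions the model didn't score contribute 0.
    return sum(w * probs.get(action, 0.0) for action, w in weights.items())


def rank_by_ev(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    scored = [(item_id, expected_value(p)) for item_id, p in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up MTML outputs for two posts.
candidates = {
    "post_1": {"click": 0.30, "like": 0.10, "share": 0.01, "see_less": 0.02},
    "post_2": {"click": 0.50, "like": 0.05, "share": 0.00, "see_less": 0.10},
}
ranking = rank_by_ev(candidates)
```

For `post_1` this works out to 0.30 + 2(0.10) + 4(0.01) - 8(0.02) = 0.38, versus -0.20 for `post_2`, so `post_1` ranks first even though `post_2` has the higher click probability.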

Final Reranking

Here, Instagram applies fine-grained business rules to filter out or downrank certain types of content.

For example:

Avoid showing too much content from the same author
Downrank posts/content that could be considered harmful

And so on.
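The two example rules could be sketched like this. `flagged` stands in for a hypothetical signal from an integrity classifier, and the fixed score penalty and per-author cap are illustrative values, not Instagram's. A fixed subtraction is used rather than a multiplier so the penalty still pushes an item down when its EV is negative.

```python
from typing import Dict, List, NamedTuple


class RankedItem(NamedTuple):
    item_id: str
    author: str
    ev: float  # Expected Value from the second stage model
    flagged: bool  # hypothetical "possibly harmful" signal


def final_rerank(
    items: List[RankedItem],
    max_per_author: int = 2,
    flagged_penalty: float = 1.0,
) -> List[RankedItem]:
    # Rule 1: downrank flagged items by subtracting a fixed penalty from their EV.
    by_adjusted_score = sorted(
        items,
        key=lambda it: it.ev - (flagged_penalty if it.flagged else 0.0),
        reverse=True,
    )
    # Rule 2: keep at most `max_per_author` items from any single author.
    per_author: Dict[str, int] = {}
    result: List[RankedItem] = []
    for it in by_adjusted_score:
        if per_author.get(it.author, 0) >= max_per_author:
            continue
        per_author[it.author] = per_author.get(it.author, 0) + 1
        result.append(it)
    return result


ranked = [
    RankedItem("a1", "alice", 0.90, False),
    RankedItem("a2", "alice", 0.80, False),
    RankedItem("a3", "alice", 0.70, False),
    RankedItem("b1", "bob", 0.85, True),
    RankedItem("c1", "carol", 0.50, False),
]
feed = final_rerank(ranked)
```

Here `a3` is dropped by the author cap, and the flagged `b1` falls from second place to last.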

Tech Snippets

How Open Source Repos get Social Engineered

A few weeks ago, there was an extremely concerning hack in xz Utils where a hacker was able to social engineer his way into maintainer status. The hacker then tried to insert a backdoor into the library.

This is a really interesting blog post from the OpenJS and OpenSSF foundations on how these types of attacks are not isolated incidents. The OpenJS Foundation intercepted a similar takeover attempt targeting one of its popular JavaScript projects.

The post goes into detail on how these attacks happen and the patterns the foundations are seeing in attacker behavior. It also offers actionable recommendations for open source projects to improve their security against these threats.

openssf.org/blog/2024/04/15/open-source-security-openssf-and-openjs-foundations-issue-alert-for-social-engineering-takeovers-of-open-source-projects

OKRs are not OK

OKRs are important for fostering alignment and focus within a company/team. However, they can often stifle exploration, reinforce silos and lead to a narrow-minded approach.

This is a great blog post that delves into the cons of OKRs and what you can do instead.

hagakure.substack.com/p/twh48-okrs-are-not-ok

Build a More Transparent Uptime Page

Uptime pages are now mostly a joke. AWS is known for having an evergreen status page regardless of whether or not there’s an outage. Slack was criticized for reporting 100% uptime despite widespread DNS issues and blackouts.

This discrepancy exists because companies use status pages to signal whether customers are entitled to a refund under the SLA. Just because there’s an outage doesn’t mean you’re getting your money back.

This is a good blog post that delves deeper into this issue and why companies should be more transparent about their status.

blog.lawrencejones.dev/status-pages

A curated list of software and architecture-related design patterns

This is a fantastic "awesome list" repo on GitHub with a ton of resources on software and architecture-related design patterns.

github.com/DovAmir/awesome-design-patterns