How Clubhouse recommends Rooms
Hey Everyone!
Today we’ll be talking about
Rob Pike’s 5 rules for Software Engineering
Rob Pike was one of the designers of the Go programming language. He was also a core member of the Unix team at Bell Labs
He has 5 famous rules of programming that serve as useful heuristics to keep in mind
We’ll go through all 5 rules.
How Clubhouse recommends Rooms
Clubhouse is a social audio app that lets you join rooms to listen in on conversations
Their room recommendations are calculated using gradient-boosted decision trees (GBDTs)
Some challenges they faced when implementing their ML engine
Plus, some tech snippets on
Good hacks to use when giving an engineering estimate
A benchmarking of commonly used data algorithms
Cloudflare Workers are open source!
We also have a solution to our last interview question and a new question from Bloomberg.
Quastor is a free Software Engineering newsletter that sends out summaries of technical blog posts, deep dives on interesting tech and FAANG interview questions and solutions.
Rob Pike’s 5 Rules of Programming
Rob Pike is one of the designers of the Go programming Language and was also a core member of the Unix team at Bell Labs (he was the co-author of The Unix Programming Environment with Brian Kernighan).
He’s also known for his 5 Rules of Programming.
Here’s the five rules…
You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where a bottleneck exists.
Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)
Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
Tech Snippets
Hacks for giving engineering estimates - Shubhro Saha is an Engineering Manager at Facebook. He wrote a great blog post on how you can give better estimates on how long a project will take to complete.One great tip is to always clearly separate the estimate vs. the goal vs. the plan. The goal may be the most optimistic while the plan is the most pessimistic (it accounts for any hiccups that may be encountered).
Use Fast Data Algorithms - Joey Lynch is a senior software engineer at Netflix where he works on Cloud Data Engineering. He spends a significant amount of time moving data between databases and hashing/compressing the data.Based on all his experience, he wrote a great blog post going through some of the common tasks you’ll have to do with data and he talks about the best algorithms to use, the worst ones to use, and the expected performance difference.
Clouldflare is open sourcing the Workers Runtime - Cloudflare Workers provides a serverless execution environment that you can use to run your code with very little maintenance. It’s been very popular with developers as more than 450,000 devs have used them since they launched in 2018. They are now open sourcing the Workers runtime under the Apache 2.0 license.
Quastor is a free Software Engineering newsletter that sends out summaries of technical blog posts, deep dives on interesting tech and FAANG interview questions and solutions.
How Clubhouse Recommends Rooms
Clubhouse is a social audio application that allows you to create private audio rooms where you can speak to your friends in the app. You can also create public audio rooms where other users of the app can join in the audience (and listen in) or as a speaker.
The app experienced viral growth in 2020, peaking in February 2021 with Elon Musk interviewing Vlad Tenev (the CEO of Robinhood) on the whole GameStop saga. Since then, however, usage of the Clubhouse app has plummeted and downloads of the app have been stagnating.
One of the reasons why is because Clubhouse’s room recommendations (the audio rooms recommended when you open the app) were pretty poor.
Speaking from my personal usage, the room recommendations were not relevant to my interests so I’d frequently open the app, find nothing interesting and immediately leave.
Akshaan Kakar is a software engineer working on machine learning & discovery at Clubhouse, and he wrote a great blog post on how Clubhouse built a new recommendation engine to provide better room recommendations.
Here’s a summary
When you first open the Clubhouse app, you are in the “hallway”. In the hallway are a bunch of different audio rooms that you can join.
At any given time, there are thousands of possible rooms that Clubhouse can recommend to you in your hallway. Therefore they need to rank the potential rooms and present you with the top recommendations.
Early on, Clubhouse used simple heuristics to rank the rooms. Heuristics like how many of your friends were in the room or how closely the room matched with the topics that you’re following.
Now, Clubhouse uses machine learning with a ranking model that is based on Gradient Boosted Decision Trees (GBDTs).
If you’re unfamiliar with GBDTs, this is the best article I found that explains them. It starts from decision trees, goes into ensemble learning (a model that makes predictions by combining multiple simpler models) and boosting and then goes into GBDTs.
The GBDT model at Clubhouse is based on hundreds of different data points like whether you spend more time in smaller rooms vs. larger rooms, whether you prefer to speak or just listen in, how many participants are in the room, etc.
The model is trained as a classifier, where it will create a score for each room between 0 and 1, where 0 means the room is not relevant to you at all while 1 means it’s extremely relevant.
This classification score is then used to rank the rooms in your hallway.
Complexities
There are many complexities that arise with the machine learning model. Here are a couple.
Real Time Features
Many of the features (data points) used by the ranking model are slow-moving batch features that can be computed once every few hours.
However, the model is recommending live rooms that change on a second by second basis. A celebrity can randomly join a room and that completely changes how the room should be ranked.
Therefore, engineers also have to incorporate real time data into the recommendation model.
To do this, they have individual events that fire every time there is a change to a room. These events are then sent as streaming data to Clubhouse’s recommendation model so it can incorporate real time information into its recommendations.
These real time features are also logged at inference time, so engineers can use the values later to train future iterations of the model.
Making Fast Inferences
Clubhouse has to run this recommendation model on hundreds of features across hundreds of rooms. This can be very resource-intensive so they’ve taken steps to make sure user experience isn’t compromised.
They use a simple memory-backed feature storage mechanism, so fetching model features is done quickly.
They also spin up lightweight stateless microservices that are solely responsible for model inference. The server will fetch the feature data and then send it to the microservice responsible for machine-learning inferences. With this set up, model-inference is isolated from the core server and it can be scaled up/down independently.
For more details, read the full article here.
Quastor is a free Software Engineering newsletter that sends out summaries of technical blog posts, deep dives on interesting tech and FAANG interview questions and solutions.
Interview Question
You are walking up a staircase with n steps.
You can hop either 1 step, 2 steps or 3 steps in a single time (you have long legs).
Write a function to count how many possible ways you can walk up the stairs.
Previous Question
As a refresher, here’s the previous question
Design an algorithm to encode a list of strings to a single string.
This encoded string will be sent over a network to another computer, where it will be decoded back to the original list of strings.
Implement the encode and decode functions.
Here’s the question in LintCode
Solution
This is a pretty open-ended question and there’s an infinite number of ways to solve it.
A simple solution we’ll implement is to combine all the strings in the encode function, but add a “-” in between them.
So, [“hello”, “there”] will turn into “hello-there”.
Then, our decode function will split the string up by the “-”.
However, an issue we will face is if there are any dashes present in the list of strings.
For example, what if we get [“Obi-Wan”, “Kenobi”].
Our encode function will turn that into “Obi-Wan-Kenobi” and our decode function will return [“Obi”, “Wan”, “Kenobi”].
We can solve this issue by first replacing all the dashes in the list of strings with a keyword to identify them.
Then, we’ll encode our strings into a single string.
Our decode function will first split the string by dashes and then go through each string in the list and replace the keywords with the original dash.
Here’s the Python 3 code.