How Airbnb Built Their Feature Recommendation System

Plus, how to minimize correlated failures in a distributed system, building a basic RDBMS from scratch and more.

March 11, 2023

Hey Everyone!

Today we’ll be talking about

How Airbnb Built Their Feature Recommendation System
- Airbnb scans through customer reviews, conversations between hosts and travelers, support requests and other unstructured data.
- One way they use this data is to generate recommendations on how hosts can improve their property listings.
- They use TextCNN for named entity recognition and use word embeddings to map the text to key phrases.
Career Advice Nobody Gave Me: Never Ignore a Recruiter
- Developers on Reddit/Hacker News like to joke about the large amount of recruiter spam you get on LinkedIn/email.
- While most of the inbound messages you get are a waste of time, some of them can be meaningful opportunities.
- You can make the most of the messages by creating a system to quickly send the recruiter a templated reply to get information on tech stack, total compensation, etc. while minimizing the amount of time you spend.
Tech Snippets
- How to Minimize Correlated Failures in a Distributed System ~ AWS Builder’s Library
- Building a Basic RDBMS From Scratch
- How Asana Onboards Engineering Managers
- How Dropbox Manages Data Quality and Coverage

How Airbnb Built Their Feature Recommendation System

Airbnb is an online marketplace where people can rent out their homes or rooms to travelers who need a place to stay. The company has hundreds of millions of users worldwide and over 4 million hosts on the platform.

In order to increase revenue, hosts on Airbnb need to create the most attractive listing possible (which travelers will see when they’re searching for an apartment). They should provide clear information on the specific things travelers are looking for (fast internet, kitchen size, access to shopping, etc.) and also advertise the best features of the home/apartment.

Airbnb makes this easier by providing highly personalized recommendations to hosts on details that should be added to the listing.

They generate these recommendations by analyzing a huge amount of data, including

- In-app conversations between the host and travelers
- Customer reviews for the property
- Customer support requests that travelers made while they were staying on the property

Joy Jing is a senior software engineer at Airbnb and she wrote a great blog post on the machine learning Airbnb uses to generate these recommendations.

Here’s a summary

Airbnb has a huge amount of text data on each property. Things like conversations between the host and travelers, customer reviews, customer support requests, and more.

They use this unstructured data to generate home attributes around things like wifi speed, free parking, access to the beach, etc.

To do this, they built LATEX (Listing ATtribute EXtraction), a machine learning system to extract these attributes from the unstructured text.

It works in two steps

- Named Entity Recognition (NER) - they extract key phrases from the unstructured text data.
- Entity Mapping Module - they use word embeddings to map these phrases to home attributes.

For NER, Airbnb wants to scan through the unstructured text and extract any phrases that are related to home attributes. To do this, they use TextCNN (a convolutional neural network for text).

They fine-tuned the model on human-labeled text data from various sources within Airbnb. It extracts key phrases around things like amenities (“hot tub”), specific POI (“Empire State Building”), generic POI (“post office”) and more.

However, users might use different terms to refer to the same thing. Someone might refer to the hot tub as the jacuzzi or as the whirlpool. Airbnb needs to take all these different phrases and map them all to hot tub.

To do this, Airbnb uses word embeddings, where the key phrase is converted to a vector using an algorithm like Word2Vec (where the vector is chosen based on the meaning of the phrase). Then, Airbnb looks for the closest attribute label word vector using cosine distance.
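
As a rough illustration of this mapping step, here is a minimal Python sketch. It is not Airbnb’s code: the three-dimensional vectors are made up for the example (real Word2Vec embeddings are learned from text and have far more dimensions), and the names ATTRIBUTE_VECTORS, PHRASE_VECTORS and map_to_attribute are hypothetical.

```python
import math

# Toy embeddings invented for illustration only.
# Real Word2Vec vectors are learned from a corpus, not written by hand.
ATTRIBUTE_VECTORS = {
    "hot tub": [0.9, 0.1, 0.0],
    "wifi": [0.0, 0.9, 0.1],
    "free parking": [0.1, 0.0, 0.9],
}

PHRASE_VECTORS = {
    "jacuzzi": [0.8, 0.2, 0.1],
    "whirlpool": [0.7, 0.1, 0.2],
    "internet": [0.1, 0.8, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def map_to_attribute(phrase_vector):
    """Return the attribute label with the smallest cosine distance."""
    return min(
        ATTRIBUTE_VECTORS,
        key=lambda label: 1.0 - cosine_similarity(
            phrase_vector, ATTRIBUTE_VECTORS[label]
        ),
    )


for phrase, vector in PHRASE_VECTORS.items():
    print(phrase, "->", map_to_attribute(vector))
```

With these toy vectors, “jacuzzi” and “whirlpool” both land on the “hot tub” label, which is the behavior described above.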

To provide recommendations to the host, they calculate how frequently each attribute label is referenced across the different text sources (past reviews, customer support channels, etc.) and then aggregate them.

They use this as a factor to rank each attribute in terms of importance. They also use other factors like the characteristics of the property (property type, square footage, luxury level, etc.).
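
The frequency part of this ranking can be sketched in a few lines of Python. This is only an assumption about how the counting might look: the sample mentions and the function name rank_attributes are hypothetical, and the sketch leaves out the other factors (property type, square footage, luxury level) that Airbnb also uses.

```python
from collections import Counter

# Hypothetical attribute labels already extracted from each text source.
mentions_by_source = {
    "reviews": ["hot tub", "wifi", "hot tub", "free parking"],
    "messages": ["wifi", "wifi", "hot tub"],
    "support_requests": ["wifi"],
}


def rank_attributes(mentions):
    """Sum mentions of each label across all sources, most frequent first."""
    totals = Counter()
    for labels in mentions.values():
        totals.update(labels)
    return totals.most_common()


ranking = rank_attributes(mentions_by_source)
for label, count in ranking:
    print(label, count)
```

In a fuller version, the raw count would be one input to a score alongside the property characteristics, not the whole ranking.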

Airbnb then prompts the owner to include more details about certain attribute labels that are highly ranked to improve their listing.

For more details, you can read the full blog post here.

How did you like this summary?

Your feedback really helps me improve curation for future emails.

Tech Snippets

Minimizing correlated failures in distributed systems

The AWS Builder’s Library has a ton of great advice on designing large scale systems.
Fault correlation is a problem in distributed systems where a single issue can cause multiple things to break (and each of those can trigger more failures).

This article gives techniques on how to reduce correlated failures with things like availability zones, adding random jitter, shuffle-sharding and more.
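
To give a feel for what random jitter means in practice, here is a generic Python sketch of a retry delay with jitter. It is not taken from the AWS article: the function name backoff_with_jitter and the base and cap values are assumptions chosen for illustration. The idea is that clients that fail at the same moment pick different random delays, so they don’t all retry at the same moment too.

```python
import random


def backoff_with_jitter(attempt, base=0.1, cap=10.0):
    """Return a random delay in seconds for the given retry attempt.

    The upper bound doubles with each attempt (capped at `cap`), and the
    actual delay is drawn uniformly between 0 and that bound.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


for attempt in range(5):
    print(attempt, round(backoff_with_jitter(attempt), 3))
```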

aws.amazon.com/builders-library/minimizing-correlated-failures-in-distributed-systems

Building a basic RDBMS from Scratch

Akila is a software engineer at OpenAI and he wrote a great blog post on building a basic RDBMS from scratch based off MIT’s Database Systems course. He talks about the architecture, building the query parser, optimizer and adding features like transactions.

The database is open source and he gives instructions on how to run it.

www.awelm.com/posts/simple-db

How Asana onboards Engineering Managers

Asana onboards new engineering managers with a 30-60-90 day plan and they have specific tracks for things like learning team processes, community building, technical onboarding and more.

They published a great blog post delving into exactly how it works and the results they’ve achieved with it.

https://theworkback.com/revamping-engineering-manager-onboarding-at-asana/

How Dropbox manages Data Quality and Coverage

Dropbox uses a Hadoop-based data lake for storing data on analytics, billing, new features, and more. The data lake is over 55 petabytes in size and they need to ensure that the data quality is high.

To handle this, they added data validation logic to Apache Airflow. They published a great blog post on the checks they run.

dropbox.tech/infrastructure/balancing-quality-and-coverage-with-our-data-validation-framework

Career Advice Nobody Gave Me: Never Ignore a Recruiter

Many developers on Reddit/Hacker News like to joke about the “recruiter spam” you can get as a software engineer. Many of the inbound messages can be completely irrelevant and it’s usually a waste of your time to engage.

Most engineers just ignore the messages, but Alex Chesser wrote a great blog post on a better way to engage with the inbound.

Instead of ignoring them, he uses a templated script that he copy/pastes to respond to recruiter messages.

In the script he politely tells the recruiter he doesn’t have time for a call, but would like more information about the position. He enquires about the company name, job description and total compensation for the role.

When the recruiter responds, there are three possible scenarios

- The salary is at or below your current level: You’ve just collected a salary data point, and it looks like you’re being paid the right amount. You can reply to the recruiter with a message about how you’re only open to positions with a compensation of 1.5x your current salary.
- The salary is above your current level but less than 1.5x: Ask for more information about the technology stack and position type. Maybe there are growth opportunities in switching.
- The salary is more than 1.5x your current level: It probably makes sense to arrange a call.
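
The three scenarios boil down to comparing the offered salary against your current one. Here is a small Python sketch of that decision. The function name classify_offer is hypothetical, and treating an offer of exactly 1.5x as scenario 3 is an assumption, since the scenarios above don’t say which side that boundary falls on.

```python
def classify_offer(offered, current, target_multiple=1.5):
    """Return 1, 2 or 3 for the recruiter-reply scenario that applies."""
    if current <= 0:
        raise ValueError("current salary must be positive")
    ratio = offered / current
    if ratio <= 1.0:
        return 1  # at or below current pay: reply with your 1.5x bar
    if ratio < target_multiple:
        return 2  # some increase: ask about tech stack and position type
    return 3  # at or above the target multiple: arrange a call


print(classify_offer(95_000, 100_000))
print(classify_offer(120_000, 100_000))
print(classify_offer(160_000, 100_000))
```

Each return value can then be paired with one of your templated replies.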

Write templated replies for each of these scenarios, so it’s much faster to auto-respond (Alex gives examples of templates he uses in the full blog post).

The vast majority of your auto responses will probably result in scenario 1 (assuming you’re being paid a fair rate), but scenarios 2 and 3 are where you’ll find the biggest career growth.

Put a system in place so you don’t miss those opportunities while minimizing the amount of time/energy you need to invest.

You can read the full blog post here.
