Quastor

How Airbnb Built Their Feature Recommendation System

Plus, how to minimize correlated failures in a distributed system, how to build a basic RDBMS from scratch, and more.

Hey Everyone!

Today we’ll be talking about:

  • How Airbnb Built Their Feature Recommendation System

    • Airbnb scans through customer reviews, conversations between hosts and travelers, support requests and other unstructured data.

    • One way they use this data is to generate recommendations on how hosts can improve their property listings.

    • They use TextCNN for named entity recognition to extract key phrases from the text, and then use word embeddings to map those phrases to home attributes.

  • Career Advice Nobody Gave Me: Never Ignore a Recruiter

    • Developers on Reddit/Hacker News like to joke about the large amount of recruiter spam you get on LinkedIn and email.

    • While most of the inbound messages you get are a waste of time, some of them can be meaningful opportunities.

    • You can make the most of these messages by setting up a system of templated replies that quickly ask the recruiter about the tech stack, total compensation, etc., while minimizing the time you spend.

  • Tech Snippets

    • How to Minimize Correlated Failures in a Distributed System ~ AWS Builder’s Library

    • Building a Basic RDBMS From Scratch

    • How Asana Onboards Engineering Managers

    • How Dropbox Manages Data Quality and Coverage

How to Choose the Right Database for your Workload

There are countless database types available. Key-Value, Graph, Document, Wide-Column and Time-Series are just a few examples of the paradigms.

InfluxData hosted a great presentation where they delved into the ecosystem, use cases, and tools you can use to pick the right database for your application.

In the talk, they discuss:

  • The current database landscape

  • The underlying architecture that makes databases perform differently

  • Future trends in the database ecosystem

They analyze Relational, Key-Value, Time-Series, In-Memory, NewSQL, Columnar, Document, Graph databases and more. You’ll learn about the pros/cons of each variant and specific use cases for where they’re used.


How Airbnb Built Their Feature Recommendation System

Airbnb is an online marketplace where people can rent out their homes or rooms to travelers who need a place to stay. The company has hundreds of millions of users worldwide and over 4 million hosts on the platform.

To increase their revenue, hosts on Airbnb need to create the most attractive listing possible, because the listing is what travelers see when they search for a place to stay. They should provide clear information on the specific things travelers look for (fast internet, kitchen size, access to shopping, etc.) and also advertise the best features of the home or apartment.

Airbnb makes this easier by providing highly personalized recommendations to hosts on details that should be added to the listing.

They generate these recommendations by analyzing a huge amount of data, including:

  • In-app conversations between the host and travelers

  • Customer reviews for the property

  • Customer support requests that travelers made while they were staying on the property

Joy Jing is a senior software engineer at Airbnb, and she wrote a great blog post on the machine learning system Airbnb uses to generate these recommendations.

Here’s a summary:

Airbnb has a huge amount of text data on each property. Things like conversations between the host and travelers, customer reviews, customer support requests, and more.

They use this unstructured data to extract home attributes such as Wi-Fi speed, free parking, and access to the beach.

To do this, they built LATEX (Listing ATtribute EXtraction), a machine learning system to extract these attributes from the unstructured text.

It works in two steps:

  1. Named Entity Recognition (NER) - they extract key phrases from the unstructured text data

  2. Entity Mapping Module - they use word embeddings to map these phrases to home attributes.
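The two steps compose into a simple pipeline: text goes in, key phrases come out of step 1, and step 2 turns each phrase into an attribute label. Below is a minimal Python sketch of that structure under our own assumptions. Each stage is passed in as a function. The toy keyword extractor and synonym table (`toy_extract`, `toy_map`) are hypothetical stand-ins for Airbnb's NER model and embedding-based mapper, not their actual code.

```python
from typing import Callable, Iterable, Optional

# Stage 1 (NER): raw text -> key phrases. Airbnb uses a TextCNN model here.
Extractor = Callable[[str], list[str]]
# Stage 2 (entity mapping): key phrase -> attribute label, or None if no label fits.
Mapper = Callable[[str], Optional[str]]


def extract_attributes(texts: Iterable[str], extract: Extractor, map_phrase: Mapper) -> list[str]:
    """Run both stages over every text and collect the attribute labels found."""
    labels: list[str] = []
    for text in texts:
        for phrase in extract(text):
            label = map_phrase(phrase)
            if label is not None:
                labels.append(label)
    return labels


# Hypothetical stand-ins for the real models, for illustration only.
KNOWN_PHRASES = ("hot tub", "jacuzzi", "wifi")
SYNONYMS = {"hot tub": "hot tub", "jacuzzi": "hot tub", "wifi": "wifi"}


def toy_extract(text: str) -> list[str]:
    """Naive substring matching in place of a trained NER model."""
    lowered = text.lower()
    return [p for p in KNOWN_PHRASES if p in lowered]


def toy_map(phrase: str) -> Optional[str]:
    """Dictionary lookup in place of embedding similarity."""
    return SYNONYMS.get(phrase)


print(extract_attributes(["Loved the jacuzzi!", "Wifi was slow"], toy_extract, toy_map))
# -> ['hot tub', 'wifi']
```

Keeping the stages as swappable functions reflects the two-module split described above. Either model can be retrained or replaced without touching the other.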

For NER, Airbnb needs to scan the unstructured text and extract any phrases related to home attributes. To do this, they use TextCNN (a convolutional neural network for text).

They fine-tuned the model on human-labeled text data from various sources within Airbnb. It extracts key phrases in categories such as amenities (“hot tub”), specific points of interest, or POIs (“Empire State Building”), generic POIs (“post office”) and more.
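The summary doesn't say how the model's output becomes phrases. A common NER convention, assumed here rather than taken from Airbnb's post, is for the model to tag each token as B- (begins an entity), I- (inside an entity) or O (outside), together with an entity type. The minimal sketch below decodes such tags into typed key phrases. The tag types `AMENITY`, `SPECIFIC_POI` and `GENERIC_POI` are hypothetical names modeled on the categories above.

```python
def decode_bio(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Convert per-token BIO tags into (entity_type, phrase) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: list[tuple[str, str]] = []
    span_type = None           # entity type of the span being built, if any
    span_tokens: list[str] = []  # tokens of the span being built

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == span_type:
            span_tokens.append(token)  # continue the open span
            continue
        # Any other tag closes the open span (if there is one).
        if span_type is not None:
            spans.append((span_type, " ".join(span_tokens)))
            span_type, span_tokens = None, []
        if tag.startswith(("B-", "I-")):
            # B- starts a new span; a stray I- is leniently treated like B-.
            span_type, span_tokens = tag[2:], [token]
        # "O" tags fall through: the token is outside any entity.

    if span_type is not None:
        spans.append((span_type, " ".join(span_tokens)))
    return spans


tokens = "The hot tub near the Empire State Building was great".split()
tags = ["O", "B-AMENITY", "I-AMENITY", "O", "O",
        "B-SPECIFIC_POI", "I-SPECIFIC_POI", "I-SPECIFIC_POI", "O", "O"]
print(decode_bio(tokens, tags))
# -> [('AMENITY', 'hot tub'), ('SPECIFIC_POI', 'Empire State Building')]
```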

However, users might use different terms for the same thing. Someone might call the hot tub a jacuzzi or a whirlpool. Airbnb needs to take all of these phrases and map them to the single attribute label “hot tub”.

To do this, Airbnb uses word embeddings. Each key phrase is converted to a vector using an algorithm like Word2Vec, which places phrases with similar meanings close to one another in vector space. Airbnb then maps the phrase to the attribute label whose word vector is closest, as measured by cosine distance.
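The matching step is easy to illustrate. Cosine distance is 1 minus cosine similarity, so the closest label is the one with the highest similarity. The sketch below assumes a few things that are not from Airbnb's post. The 3-dimensional vectors are made up, while real Word2Vec embeddings have far more dimensions. The `min_similarity` cutoff is a hypothetical parameter that lets phrases matching no label be dropped.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; cosine distance is 1 minus this."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def map_to_label(phrase_vec: Sequence[float],
                 label_vecs: dict[str, Sequence[float]],
                 min_similarity: float = 0.8) -> Optional[str]:
    """Return the label with the smallest cosine distance, or None if none is close enough."""
    best_label, best_sim = None, -1.0
    for label, vec in label_vecs.items():
        sim = cosine_similarity(phrase_vec, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= min_similarity else None


# Made-up toy vectors; real embeddings would come from a trained model.
LABEL_VECTORS = {
    "hot tub": [0.9, 0.1, 0.0],
    "free parking": [0.0, 0.9, 0.2],
    "wifi": [0.1, 0.0, 0.95],
}
JACUZZI = [0.85, 0.15, 0.05]  # points in nearly the same direction as "hot tub"
BALCONY = [0.5, 0.5, 0.5]     # not close to any label

print(map_to_label(JACUZZI, LABEL_VECTORS))  # -> hot tub
print(map_to_label(BALCONY, LABEL_VECTORS))  # -> None
```

A brute-force loop is fine for a handful of labels. With a large label vocabulary, you would typically normalize the vectors once and use a nearest-neighbor index instead.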

To provide recommendations to a host, they count how frequently each attribute label is referenced in each text source (past reviews, customer support channels, etc.) and then aggregate these counts across sources.

They use this as a factor to rank each attribute in terms of importance. They also use other factors like the characteristics of the property (property type, square footage, luxury level, etc.).
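The post doesn't give the exact ranking formula. As a rough illustration only, the sketch below assumes a simple score: the aggregated mention count multiplied by a hypothetical per-property weight (for example, "hot tub" weighted higher for a luxury listing). Airbnb's real model combines more factors than this. The function name, weights and data are illustrative.

```python
from collections import Counter
from typing import Optional


def rank_attributes(mentions_by_source: dict[str, list[str]],
                    property_weights: Optional[dict[str, float]] = None,
                    top_k: int = 3) -> list[str]:
    """Rank attribute labels by total mentions across sources, scaled by optional weights."""
    property_weights = property_weights or {}
    totals: Counter[str] = Counter()
    for labels in mentions_by_source.values():
        totals.update(labels)  # aggregate counts across all text sources
    scores = {label: count * property_weights.get(label, 1.0)
              for label, count in totals.items()}
    # Highest score first; ties broken alphabetically so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:top_k]]


mentions = {
    "reviews": ["wifi", "hot tub", "wifi", "parking"],
    "support": ["wifi", "parking"],
    "messages": ["parking", "hot tub"],
}
print(rank_attributes(mentions))                      # -> ['parking', 'wifi', 'hot tub']
print(rank_attributes(mentions, {"hot tub": 2.0}))    # -> ['hot tub', 'parking', 'wifi']
```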

Airbnb then prompts the host to add more details about the highest-ranked attribute labels to improve their listing.

For more details, you can read the full blog post here.


It's About Time. Build on InfluxDB.

Working with large sets of time-stamped data has its challenges.

Fortunately, InfluxDB is a time series platform purpose-built to handle the unique workloads of time series data.

Using InfluxDB, developers can ingest billions of data points in real time with unbounded cardinality, and store, analyze, and act on that data – all in a single database.

No matter what kind of time series data you’re working with – metrics, events, traces, or logs – InfluxDB Cloud provides a performant, elastic, serverless time series platform with the tools and features developers need. Native SQL compatibility makes it easy to get started with InfluxDB and to scale your solutions.

Companies like IBM, Cisco, and Robinhood all rely heavily on InfluxDB to build and manage responsive backend applications, to power predictive intelligence, and to monitor their systems for insights that they would otherwise miss.

See for yourself by spinning up InfluxDB Cloud and testing it out for free.


Tech Snippets

Career Advice Nobody Gave Me: Never Ignore a Recruiter

Developers on Reddit/Hacker News like to joke about the “recruiter spam” you get as a software engineer. Many inbound messages are completely irrelevant, and engaging with them is usually a waste of your time.

Most engineers just ignore these messages, but Alex Chesser wrote a great blog post on a better way to handle them.

Instead of ignoring recruiters, he copies and pastes a templated script as an auto-response to their messages.

In the script, he politely tells the recruiter that he doesn’t have time for a call but would like more information about the position. He asks for the company name, the job description and the total compensation for the role.

When the recruiter responds, there are three possible scenarios:

  1. The salary is at or below your current level - You’ve just collected a salary data point, and it looks like you’re being paid the right amount. You can reply to the recruiter that you’re only open to positions with total compensation of at least 1.5x your current salary.

  2. The salary is above your current level but less than 1.5x your current salary - Ask for more information about the technology stack and position type. Maybe there are growth opportunities in switching.

  3. The salary is more than 1.5x your current salary - It probably makes sense to arrange a call.
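Because each scenario depends only on the offered and current compensation, the decision is simple to encode. The helper below is a minimal, hypothetical sketch for picking which templated reply to send; the names `Scenario` and `classify_offer` are our own. The post doesn't say what to do at exactly 1.5x, so this sketch assumes that figure also warrants a call.

```python
from enum import Enum


class Scenario(Enum):
    DATA_POINT = 1       # at or below current pay: record it, reply with your 1.5x bar
    ASK_FOR_DETAILS = 2  # above current pay but below 1.5x: ask about stack and role
    SCHEDULE_CALL = 3    # 1.5x or more (boundary is an assumption): worth a call


def classify_offer(offered: float, current: float, multiplier: float = 1.5) -> Scenario:
    """Sort a recruiter's compensation figure into one of the three scenarios."""
    if current <= 0:
        raise ValueError("current compensation must be positive")
    if offered <= current:
        return Scenario.DATA_POINT
    if offered < multiplier * current:
        return Scenario.ASK_FOR_DETAILS
    return Scenario.SCHEDULE_CALL


print(classify_offer(180_000, 150_000))  # -> Scenario.ASK_FOR_DETAILS
```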

Write templated replies for each of these scenarios, so it’s much faster to auto-respond (Alex gives examples of templates he uses in the full blog post).

The vast majority of your auto-responses will probably result in scenario 1 (assuming you’re being paid a fair rate), but scenarios 2 and 3 are where you’ll find the biggest career growth.

Put a system in place so you don’t miss those opportunities while minimizing the amount of time/energy you need to invest.

You can read the full blog post here.
