How DoorDash Labels Millions of Items with Large Language Models
How DoorDash uses GPT-4, RAG and LLM agents for labeling data. Plus, why you don't always need indexes, bloom filters explained visually and more.
Hey Everyone!
Today we’ll be talking about
How DoorDash uses LLMs for labeling their products
Issues DoorDash faces with Data Labeling
Solving the “Cold Start problem” with LLMs
Labeling Organic Products with LLM agents
Solving Entity Resolution with Retrieval Augmented Generation
Tech Snippets
Become a better communicator by focusing on the “kernel”
You don’t always need indexes
Why working quickly is more important than it seems
Bloom filters explained visually
How DoorDash uses LLMs for Data Labeling
There’s currently a ton of “FOMO” (fear of missing out) around Large Language Models, so it can be tempting to dismiss them as another hype train that’ll fizzle out.
Some aspects are definitely over-hyped, but LLMs have fundamentally changed how we build ML models, and they’re now being used in NLP tasks far beyond ChatGPT.
DoorDash is the largest food delivery service in the US and they let you order items from restaurants, convenience stores, grocery stores and more.
Their engineering team published a fantastic blog post about how they use GPT-4 to generate labels and attributes for all the different items they have to list.
We’ll talk about the architecture of their system and how they use OpenAI embeddings, GPT-4, LLM agents, retrieval augmented generation and more to power their data labeling. (we’ll define all these terms)
We’ll cover a ton of concepts on LLMs in this article.
If you’d like Spaced Repetition Flashcards (Anki) on all the concepts discussed in Quastor, check out Quastor Pro.
When you join, you’ll also get an up-to-date PDF with our past articles.
Data Labeling at DoorDash
As we mentioned earlier, DoorDash doesn’t just deliver food from restaurants. They also deliver groceries, medical items, beauty products, alcohol and much more.
For each of these items, the app needs to track specific attributes in order to properly identify the product.
For example, a can of Coke will have attributes like
Size: 12 fluid ounces
Flavor: Cherry
Type: Diet
On the other hand, a bottle of shampoo will have attributes like
Brand: Dove
Type: Shampoo
Keyword: Anti-Dandruff
Size: 500 ml
Every product has different attribute/value pairs based on what the item is. DoorDash needs to generate and maintain these for millions of items across their app.
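To make the attribute/value idea concrete, here’s a minimal sketch (in Python) of how items like these could be modeled. The schema and field names are purely illustrative and aren’t DoorDash’s actual data model.

```python
# Illustrative sketch of attribute/value pairs for catalog items (not DoorDash's schema).
from dataclasses import dataclass, field

@dataclass
class CatalogItem:
    name: str
    description: str
    attributes: dict[str, str] = field(default_factory=dict)  # attribute -> value

coke = CatalogItem(
    name="Diet Coke Cherry (12 fl oz)",
    description="Cherry-flavored diet cola, 12 fluid ounce can",
    attributes={"Size": "12 fluid ounces", "Flavor": "Cherry", "Type": "Diet"},
)

shampoo = CatalogItem(
    name="Dove Anti-Dandruff Shampoo 500 ml",
    description="Dove anti-dandruff shampoo, 500 ml bottle",
    attributes={"Brand": "Dove", "Type": "Shampoo", "Keyword": "Anti-Dandruff", "Size": "500 ml"},
)
```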
To do this, they need an intelligent, robust ML system that can handle creation and maintenance of these attribute/value pairs based on a product’s name and description.
We’ll talk about 3 specific problems DoorDash faced when building this system and how LLMs have helped them address the issues.
Cold Start Problem
Labeling Organic Products
Entity Resolution
Solving the Cold Start Problem with LLMs
One big issue DoorDash faced with building this attribute/value creation system was the cold start problem (a classic issue with ML systems).
This happens when DoorDash onboards a new merchant and there are a bunch of new items that they’ve never seen before.
For example, what would DoorDash do if Costco joined the platform?
Costco sells a bunch of their own products (under the Kirkland brand), so a traditional NLP system wouldn’t recognize any of the Kirkland-branded items (they weren’t in the training set).
However, Large Language Models are already trained on vast amounts of data. GPT-4 has knowledge of Costco’s products and even understands memes about Costco’s Rotisserie chicken.
With this base knowledge, LLMs can perform extremely well with no labeled examples at all (zero-shot prompting) or with just a few (few-shot prompting).
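Here’s a minimal sketch of the difference, using the OpenAI chat API for a brand-extraction task. The prompts, product names and model choice are illustrative assumptions; DoorDash hasn’t published its actual prompts.

```python
# Zero-shot vs. few-shot prompting for brand extraction (illustrative only).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Zero-shot: no labeled examples, just the task description.
zero_shot = (
    "Extract the brand from this product name: "
    "'Kirkland Signature Organic Extra Virgin Olive Oil, 2 L'"
)

# Few-shot: a couple of labeled examples steer the model toward the output we want.
few_shot = """Extract the brand from the product name.

Product: 'Dove Anti-Dandruff Shampoo 500 ml' -> Brand: Dove
Product: 'Corona Extra Mexican Lager (12 oz x 12 ct)' -> Brand: Corona
Product: 'Kirkland Signature Organic Extra Virgin Olive Oil, 2 L' -> Brand:"""

for prompt in (zero_shot, few_shot):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```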
Here’s the process DoorDash uses for dealing with the Cold Start problem (there’s a rough code sketch of the flow after the list):
Traditional Techniques - The product name and description are passed to DoorDash’s in-house classifier. This is built with traditional NLP techniques for Named Entity Recognition.
Use LLM for Brand Recognition - Items that cannot be tagged confidently are passed to an LLM. The LLM will take in the item’s name and description and is tasked with figuring out the brand of the item. For the shampoo bottle example, the LLM would return Dove.
RAG to find Overlapping Products - DoorDash takes the brand name and product name/description and queries an internal knowledge graph to find similar items. The brand name, product name/description and the results of this knowledge graph query are then given to an LLM (retrieval augmented generation). The LLM’s task is to determine whether the product being analyzed is a duplicate of any of the products found in the internal knowledge graph.
Adding to the Knowledge Base - If the LLM determines that the product is unique, it’s added to the DoorDash knowledge graph. Their in-house classifier (from step 1) is then re-trained with the new product attribute/value pairs.
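Putting the four steps together, here’s a rough sketch of what the control flow could look like. Every helper below is a placeholder stub standing in for systems the post only describes at a high level (DoorDash’s classifier, prompts and knowledge graph aren’t public), and the confidence threshold is made up.

```python
# Rough sketch of the cold-start flow described above. All helpers are stubs.

CONFIDENCE_THRESHOLD = 0.9  # illustrative value

def in_house_classifier(name: str, description: str) -> tuple[dict, float]:
    return {}, 0.0  # stub: traditional NER-based attribute extraction

def extract_brand_with_llm(name: str, description: str) -> str:
    return "Kirkland Signature"  # stub: e.g. a zero-shot prompt like the one above

def query_knowledge_graph(brand: str, name: str) -> list[str]:
    return []  # stub: retrieve similar items for this brand from the knowledge graph

def is_duplicate_rag(name: str, candidates: list[str]) -> str | None:
    return None  # stub: an LLM decides if `name` duplicates a candidate (None = unique)

def add_to_knowledge_graph(brand: str, name: str, description: str) -> None:
    pass  # stub: insert the new item; the classifier is retrained later

def label_new_item(name: str, description: str) -> dict:
    # Step 1: try the traditional classifier first
    attributes, confidence = in_house_classifier(name, description)
    if confidence >= CONFIDENCE_THRESHOLD:
        return attributes

    # Step 2: fall back to an LLM to identify the brand
    brand = extract_brand_with_llm(name, description)

    # Step 3: RAG - retrieve similar items from the knowledge graph and ask
    # an LLM whether this product duplicates any of them
    duplicate = is_duplicate_rag(name, query_knowledge_graph(brand, name))

    # Step 4: if it's unique, add it to the knowledge graph so the classifier
    # can be retrained with the new attribute/value pairs
    if duplicate is None:
        add_to_knowledge_graph(brand, name, description)
        return {"Brand": brand}
    return {"duplicate_of": duplicate}
```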
Organic Product Labeling with LLM agents
Another issue that DoorDash needs to solve is properly labeling organic products. One of their goals was to create a “Fresh & Organic” section in the app for customers who prefer those types of products.
Here’s how DoorDash figures out whether a product is organic:
String Matching - Look for the keyword “organic” in the product name/description. However, product names/descriptions aren’t perfect: “organic” could be misspelled, or the product could be described with a different term (“natural”, “non-GMO”, “hormone-free”, “unprocessed”, etc.). This is where LLMs come into play.
LLM Reasoning - DoorDash will use LLMs to read the available product information and determine whether it could be organic. This has massively improved coverage and addressed the challenges faced with only doing string matching.
LLM Agent - LLMs will also conduct online searches for product information and send the search results to another LLM for reasoning. This pattern of having LLMs use external tools (like web search) and make decisions based on the results is called an “agent-powered LLM”. I’d highly recommend checking out LlamaIndex to learn more about this. There’s a rough sketch of the pattern after this list.
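Here’s a rough sketch of that agent pattern. The prompts are made up, web_search is a hypothetical stub for whatever search tool an agent framework would provide, and none of this is DoorDash’s actual implementation.

```python
# Agent pattern sketch: string matching -> LLM reasoning -> web search tool + LLM.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

ORGANIC_KEYWORDS = ("organic", "natural", "non-gmo", "hormone-free", "unprocessed")

def llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4", messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def web_search(query: str) -> str:
    return ""  # stub: a real agent would call a search API here

def is_probably_organic(name: str, description: str) -> bool:
    text = f"{name} {description}".lower()

    # Step 1: cheap string matching on the catalog text
    if any(keyword in text for keyword in ORGANIC_KEYWORDS):
        return True

    # Step 2: LLM reasoning over the available product information
    answer = llm(f"Is this product organic? Answer yes or no.\n\n{name}\n{description}")
    if answer.strip().lower().startswith("yes"):
        return True

    # Step 3: agent step - search the web and let another LLM call reason
    # over the search results before making a final decision
    results = web_search(f"{name} organic")
    answer = llm(
        f"Web search results for '{name}':\n{results}\n\n"
        "Is this product organic? Answer yes or no."
    )
    return answer.strip().lower().startswith("yes")
```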
Solving the Entity Resolution Problem with Retrieval Augmented Generation
Entity Resolution is where you take the product name/description of two items and figure out whether they’re referring to the same thing.
For example, does “Corona Extra Mexican Lager (12 oz x 12 ct)” refer to the same product as “Corona Extra Mexican Lager Beer Bottles, 12 pk, 12 fl oz”?
In order to accomplish this, DoorDash uses LLMs and Retrieval Augmented Generation (RAG).
RAG is a popular technique for using language models like GPT-4 on your own data.
With RAG, you first take your input prompt and use it to query an external data source (a popular choice is a vector database) for relevant context/documents. You then add the retrieved context/documents to your input prompt and feed the augmented prompt to the LLM.
Adding this context from your own dataset helps personalize the LLM’s results to your own use case.
Here’s how DoorDash does this for Entity Resolution.
They’ll take a product name/description and run it through this process (there’s a rough code sketch after the list):
Generate Embeddings Vector - A common way to compare strings is to use an embeddings model to turn each string into a vector (a list of numbers). This vector encodes meaning and knowledge about the original text, so strings like “queen” and “beyonce” will map to vectors that are “similar” along certain dimensions. 3Blue1Brown has an amazing video delving into word embeddings in a visual way.
DoorDash uses OpenAI Embeddings to do this with the product’s name.
Query the Vector Database - Once they generate a vector from the product name, they’ll query a vector database that stores the embedding vectors of all the other product names in DoorDash’s app. Then, they use approximate nearest neighbors to retrieve the most similar products.
Pass Augmented Prompt to GPT-4 - They take the most similar product names and then feed that to GPT-4. GPT-4 is instructed to read the product names and figure out if they’re referring to the same underlying product.
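Here’s a rough end-to-end sketch of those three steps. The embedding model, prompt and top-k value are illustrative assumptions, and the brute-force cosine search stands in for the approximate nearest-neighbor lookup a real vector database would do.

```python
# RAG-style entity resolution sketch: embed -> retrieve neighbors -> ask GPT-4.
import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(text: str) -> list[float]:
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def resolve_entity(product_name: str, catalog: dict[str, list[float]]) -> str:
    # Step 1: turn the new product name into an embedding vector
    vector = embed(product_name)

    # Step 2: retrieve the most similar existing product names
    # (exact brute-force search here; a real system would use ANN in a vector DB)
    ranked = sorted(catalog, key=lambda name: cosine_similarity(vector, catalog[name]), reverse=True)
    candidates = ranked[:5]

    # Step 3: augment the prompt with the retrieved candidates and let GPT-4 decide
    prompt = (
        f"New product: {product_name}\n\n"
        "Existing products:\n" + "\n".join(f"- {c}" for c in candidates) + "\n\n"
        "Does the new product refer to the same item as any existing product? "
        "If yes, answer with that product's name; otherwise answer 'unique'."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example usage with a tiny in-memory "catalog":
# catalog = {name: embed(name) for name in ["Corona Extra Mexican Lager Beer Bottles, 12 pk, 12 fl oz"]}
# print(resolve_entity("Corona Extra Mexican Lager (12 oz x 12 ct)", catalog))
```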
With this approach, DoorDash has been able to generate annotations in less than ten percent of the time it previously took them.