How Stripe catches Fraud Rings

July 21, 2022

Hey Everyone!

Today we’ll be talking about

Why DoorDash migrated from Python to Kotlin
- DoorDash migrated from a Python 2 / Django monolith to a microservices architecture
- They considered the pros/cons of Kotlin, Java, Go, Rust and Python 3. They picked Kotlin.
- Some of the migration pains were educating engineers on the language, developing best practices for using coroutines, Java interoperability and dependency management.
How Stripe uses Similarity Clustering to catch Fraud Rings
- The merchant fraud problem at Stripe
- Picking Gradient Boosted Decision Trees and training the models
- Using the models to predict fraud.
Best Books on Managing Software Complexity - This is from a thread on Hacker News where someone asked for the best books on managing software complexity
- John K Ousterhout, A Philosophy of Software Design
- Robert L. Glass, Facts and Fallacies of Software Engineering
- Titus Winters (et al), Software Engineering at Google

Why DoorDash migrated from Python to Kotlin

DoorDash is the largest food delivery app in the United States with more than 450 thousand restaurants, 20 million customers and 1 million deliverers.

Matt Anger is a Senior Staff Engineer at DoorDash where he works on the Core Platform and Performance teams.

He published a great blog post (May 2021) on DoorDash’s migration from Python 2 to Kotlin. Here’s a summary.

Summary

DoorDash was quickly approaching the limits of what their Django-based monolithic codebase could support.

With their legacy system, the number of nodes that needed to be updated added significant time to releases. Debugging bad deploys with bisection got harder and longer due to the number of commits each deploy had. The monolith was built with Python 2 which was also entering end-of-life. You can read more about the scaling pains DoorDash experienced with their monolith here.

Engineers at DoorDash decided to transition from the monolith to a microservices architecture. They also looked for a new tech stack to replace Python 2 and Django.

One of their goals was to only use one language for the backend.

Having one language would let them

Promote Best Practices - Having one language makes it easier for teams to share development best practices across the entire company.
Build Common Libraries - All engineers can share common libraries and tooling.
Change Teams - Engineers can change teams with minimal friction, which encourages more collaboration.

Picking the Right Coding Language

First, DoorDash engineers looked at the parts of their tech stack that would not change.

They had a lot of experience with Postgres and Apache Cassandra, so they would continue to use those technologies as data stores.

They would use gRPC for synchronous service-to-service communication, with Apache Kafka as a message queue.

In terms of the programming language, the choices in contention were Kotlin, Java, Go, Rust and Python 3.

Here’s the comparison they did…

After doing the comparison, they went with Kotlin. They had already done some testing around the language and it worked well.

Kotlin mitigated some of the pain points around Java with Null Safety and Coroutines.

Some of the growing pains they faced with Kotlin were

Educating DoorDash engineers on the language - Much of the online community around Kotlin is specific to Android dev, and there isn’t as much content on backend engineering. To help engineers learn the language, they regularly held Lunch and Learn sessions and set up a Slack channel for questions.
Avoiding coroutine gotchas - DoorDash used gRPC for service-to-service communication, but gRPC Kotlin wasn’t available when they first made the switch. They used gRPC-Java, which lacked support for coroutines. gRPC Kotlin is now generally available, so they migrated to it. There are several other gotchas around coroutines that are discussed in the article.
Getting around Java interoperability pain points - There were some pain points with Java interop. Many libraries claiming to implement modern Java Non-blocking I/O standards did so in an unscalable manner. This caused issues when using coroutines. Check the article for full details.
Making dependency management easier - The build system and dependency management are a lot less intuitive than more recent solutions like Rust’s Cargo or Go’s modules. Some dependencies are particularly sensitive to version upgrades and can lead to issues where compilation succeeds but the app fails on boot up with odd, seemingly irrelevant back traces. DoorDash engineers learned which projects tend to cause these issues most often and have guidelines for how to catch and bypass them.

For more details, read the full article

How did you like this summary?

Your feedback really helps me improve curation for future emails.

Ask HN: Best books on managing software complexity?

Someone on Hacker News posted a thread asking for the best books on managing software complexity, both from an architectural as well as organizational perspective.

These three books were recommended quite a bit:

John K Ousterhout, A Philosophy of Software Design
Titus Winters (et al), Software Engineering at Google
Hanson and Sussman, Software Design for Flexibility

Other books that were recommended were:

Peter Naur, Programming as Theory Building
Scott Wlaschin, Domain Modeling Made Functional
Nick Tune, Patterns, Principles, and Practices of Domain-Driven Design
Robert L. Glass, Facts and Fallacies of Software Engineering
Donald Reinertsen, The Principles of Product Development Flow

How Stripe uses Similarity Clustering to catch Fraud Rings

Stripe is one of the world’s largest payment processors.

The company’s main product is the Stripe Payments API, which developers can use to easily embed payment functionality into their applications.

Due to Stripe’s scale, they’re a big target for payments fraud and cybercrime.

Andrew Tausz is part of the Risk Intelligence team at Stripe, and he wrote a great blog post on how Stripe uses similarity clustering to catch fraud rings. Note - the blog post was published 2 years ago, so it may be slightly out of date.

Merchant Fraud at Stripe

One of the most common types of fraud that Stripe faces is merchant fraud, where a scammer will create a website that advertises fraudulent products or services (and uses Stripe to process payments).

An example might be if a scammer creates a website that sells electronic goods at a highly discounted price. After a customer pays him for the good, he pockets the money and doesn’t send the customer the promised good.

The customer will end up issuing a chargeback through their credit card, which will have to get paid back by Stripe. Stripe will then attempt to debit the account of the scammer, but if they’re unable to (the scammer transferred out all his money) then Stripe will have to eat the losses.

After a fraudster gets caught by Stripe, his account will be disabled. But, it’s quite likely that he’ll try to continue the scam by creating a new Stripe account.

One way Stripe can reduce fraud is by catching these repeat fraudsters through similarity clustering.

Using Similarity Clustering to Reduce Merchant Fraud

When a scammer creates a new Stripe account (after getting caught on his previous account), he’ll probably reuse some information and attributes from his previous account.

Certain information is easy to fabricate, like your name or date of birth. But, other attributes are more difficult to fake. For example, it takes significant effort to obtain a new bank account.

Therefore, Stripe has found that linking accounts together via shared attributes is quite effective at catching obvious fraud attempts.

Switching from a Heuristics-based System to an ML Model

In order to link accounts together, Stripe relies on a similarity score.

They take two accounts and then assign them a similarity score based on the number of shared attributes the accounts have.

Some shared attributes are weighted more heavily than others. Two Stripe accounts that share a date of birth should have a lower similarity score than two accounts that share a bank account.
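To make the idea of a weighted similarity score concrete, here is a minimal sketch. The attribute names and weights are hypothetical (they are not Stripe's actual values); the only point is that a shared bank account counts for more than a shared date of birth.

```python
# Hypothetical hand-picked weights: attributes that are harder to fabricate count more.
WEIGHTS = {"bank_account": 0.7, "email_domain": 0.2, "date_of_birth": 0.1}


def heuristic_similarity(a: dict, b: dict) -> float:
    """Sum the weights of every attribute that both accounts share."""
    score = 0.0
    for attr, weight in WEIGHTS.items():
        value = a.get(attr)
        if value is not None and value == b.get(attr):
            score += weight
    return score


# Made-up accounts for illustration only.
old = {"bank_account": "111", "date_of_birth": "1990-01-01", "email_domain": "a.example"}
same_dob = {"bank_account": "222", "date_of_birth": "1990-01-01", "email_domain": "b.example"}
same_bank = {"bank_account": "111", "date_of_birth": "1985-05-05", "email_domain": "c.example"}

print(heuristic_similarity(old, same_dob))   # shares only the date of birth
print(heuristic_similarity(old, same_bank))  # shares only the bank account
```

The pair that shares a bank account scores higher than the pair that shares a date of birth, which is the behavior described above.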

Previously, Stripe relied on a heuristic-based system where the weightings were hand-constructed (based on guess and check). Stripe decided to switch to a machine learning model trained to handle this task.

Now, they can automatically retrain the model as they obtain more data, which lets it improve in accuracy, adapt to new fraud trends, and learn the signatures of particular adversarial groups.

Building the ML Model

To build the model, Stripe followed a supervised learning approach.

The approach Stripe took to build the model is Similarity Learning, where the objective is to learn a similarity function that can measure how similar two objects are.

Similarity learning is used extensively in ranking, recommendation systems, face/voice verification, and fraud detection.

They already had a massive dataset of fraud rings and clusters of fraudulent accounts based on prior work from their risk underwriting team. Stripe cleaned that into a dataset consisting of pairs of accounts along with a label for each pair indicating whether or not the two accounts belong to the same cluster.
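As a rough illustration of what such a dataset could look like, the sketch below turns cluster assignments into labeled pairs. The account IDs, ring names, and the `labeled_pairs` helper are all made up for this example; the source doesn't describe Stripe's actual pipeline.

```python
from itertools import combinations

# Hypothetical cluster assignments: account id -> fraud ring id.
account_to_cluster = {
    "acct_a": "ring_1",
    "acct_b": "ring_1",
    "acct_c": "ring_2",
    "acct_d": "ring_2",
    "acct_e": "ring_3",
}


def labeled_pairs(assignments: dict) -> list:
    """Return (account_1, account_2, label) tuples; label is 1 if both are in the same cluster."""
    pairs = []
    for left, right in combinations(sorted(assignments), 2):
        label = int(assignments[left] == assignments[right])
        pairs.append((left, right, label))
    return pairs


pairs = labeled_pairs(account_to_cluster)
positives = [p for p in pairs if p[2] == 1]
print(len(pairs), len(positives))
```

Note that enumerating every pair like this only works on a small labeled set. With real data you would likely need to sample the negative pairs, since they vastly outnumber the positives.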

Now that they had the dataset, Stripe had to generate features that the model could use to compare the pair of accounts. Creating a Stripe account requires quite a bit of data, so Stripe had a large feature set they could utilize.

Examples of features chosen include the account’s email domain, overlap in credit card numbers used for both accounts, measure of text similarity, and more.
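Here is a small sketch of how pairwise features like these might be computed. The field names (`email`, `cards`, `business_name`) and the specific measures (Jaccard overlap, `difflib` ratio) are assumptions chosen for illustration, not Stripe's feature definitions.

```python
from difflib import SequenceMatcher


def pair_features(a: dict, b: dict) -> dict:
    """Compute a few comparison features for a pair of accounts."""
    domain_a = a["email"].split("@")[-1].lower()
    domain_b = b["email"].split("@")[-1].lower()
    cards_a, cards_b = set(a["cards"]), set(b["cards"])
    union = cards_a | cards_b
    return {
        # 1.0 if both accounts signed up with the same email domain
        "same_email_domain": float(domain_a == domain_b),
        # Jaccard overlap of the card fingerprints seen on each account
        "card_overlap": len(cards_a & cards_b) / len(union) if union else 0.0,
        # Character-level similarity of the business names, between 0 and 1
        "name_similarity": SequenceMatcher(
            None, a["business_name"].lower(), b["business_name"].lower()
        ).ratio(),
    }


# Made-up accounts for illustration only.
first = {"email": "sales@cheap-tv.example", "cards": ["fp_1", "fp_2"], "business_name": "Cheap TV Outlet"}
second = {"email": "info@cheap-tv.example", "cards": ["fp_1"], "business_name": "Cheap TVs Outlet"}

features = pair_features(first, second)
print(features)
```

Each pair of accounts becomes one row of numeric features, which is the shape a tree-based model expects.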

Using gradient-boosted decision trees

Due to the huge range of features, Stripe decided to go with gradient-boosted decision trees (GBDTs) to represent their similarity model.

Stripe found that GBDTs strike the right balance between being easy to train, having strong predictive power, and being robust despite variations in the data. GBDTs are also straightforward to fine-tune and have well-understood properties.
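If you haven't seen gradient boosting before, the toy sketch below shows the core loop: each new tree is fit to the errors left over by the trees before it. This is a deliberately simplified version (depth-1 trees, squared-error loss, pure Python) with made-up data. It is not how XGBoost is implemented, and a production model would use far more features and deeper trees.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, t, lmean, rmean)
    if best is None:  # no valid split exists; fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosted(X, y, rounds=20, learning_rate=0.5):
    """Start from the mean label, then repeatedly fit a stump to the remaining errors."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Made-up pair features: [same_email_domain, card_overlap]; label 1 = same fraud ring.
X = [[1.0, 0.5], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.1], [1.0, 0.1]]
y = [1, 0, 0, 1, 0, 0]

model = fit_boosted(X, y)
print(round(predict(model, [1.0, 0.6]), 3))   # high card overlap
print(round(predict(model, [1.0, 0.05]), 3))  # low card overlap
```

In this toy data the label depends only on card overlap, so the boosted stumps learn to ignore the email-domain feature and score the high-overlap pair close to 1.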

The implementation of GBDTs that Stripe used was XGBoost. Stripe chose XGBoost models because of their great performance and also because there's already a ton of well-developed infrastructure to train and support them.

Stripe has an internal API called Railyard that handles training ML models in a scalable and maintainable way. You can read more about Railyard and its architecture here.

Prediction Use

After training, Stripe began to use their model to predict fraudulent activity.

Since this model operates on pairs of Stripe accounts, it’s not possible to feed it all pairs of accounts and compute similarity scores across all pairs (there are too many combinations).

Instead, Stripe uses some heuristics to identify suspicious accounts and prune the set of candidates to a reasonable number.
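To see why pruning matters, note that n accounts produce n(n-1)/2 pairs. The sketch below shows one possible pruning heuristic: only compare accounts that share at least one attribute value. The source doesn't say which heuristics Stripe actually uses, so the `candidate_pairs` helper and the attribute names here are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n: int) -> int:
    """Number of distinct pairs among n accounts."""
    return n * (n - 1) // 2


def candidate_pairs(accounts: dict, keys: tuple) -> set:
    """Only pair up accounts that share a value for at least one of the given keys."""
    buckets = defaultdict(list)
    for account_id, attrs in accounts.items():
        for key in keys:
            value = attrs.get(key)
            if value is not None:
                buckets[(key, value)].append(account_id)
    pairs = set()
    for members in buckets.values():
        for left, right in combinations(sorted(members), 2):
            pairs.add((left, right))
    return pairs


# Made-up accounts for illustration only.
accounts = {
    "a": {"bank_account": "111", "ip": "10.0.0.1"},
    "b": {"bank_account": "111", "ip": "10.0.0.2"},
    "c": {"bank_account": "333", "ip": "10.0.0.2"},
    "d": {"bank_account": "444", "ip": "10.0.0.9"},
}

candidates = candidate_pairs(accounts, ("bank_account", "ip"))
print(all_pairs_count(len(accounts)), len(candidates))
print(all_pairs_count(1_000_000))
```

With four accounts the saving is small, but a million accounts would produce roughly 500 billion pairs, which is why the model only scores a pruned candidate set.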

Then, they use their ML models to generate similarity scores between the accounts.

Afterwards, they compute the connected components of the resulting graph (accounts as nodes, linked when the model scores a pair as similar) to get a final output of high-fidelity account clusters that can be analyzed, processed or manually inspected.
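The clustering step can be sketched with a union-find structure. This example assumes that an edge is drawn between two accounts whenever their similarity score clears a cutoff. The 0.8 threshold and the scores are made up, and the source doesn't specify how Stripe decides which pairs are linked.

```python
def connected_components(nodes, scored_pairs, threshold=0.8):
    """Group accounts into clusters, linking pairs whose score is at or above the threshold."""
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right, score in scored_pairs:
        if score >= threshold:
            parent[find(left)] = find(right)  # merge the two clusters

    clusters = {}
    for node in nodes:
        clusters.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in clusters.values())


# Made-up similarity scores for illustration only.
nodes = ["a", "b", "c", "d", "e"]
scored_pairs = [("a", "b", 0.95), ("b", "c", 0.9), ("c", "d", 0.2), ("d", "e", 0.85)]

clusters = connected_components(nodes, scored_pairs)
print(clusters)
```

Accounts a and c end up in the same cluster even though they were never scored against each other directly, because both are linked to b. That transitive linking is what lets connected components surface a whole ring from pairwise scores.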

If a cluster contains a large number of known fraudulent accounts, then a risk analyst may want to further investigate the remaining accounts in that cluster.

You can read more details in the full article here.

Today's email got a bit long, so we'll give our previous solution and next interview question in next Tuesday's email!