How Stripe Does Rate Limiting
Plus, Leader Election in Distributed Systems, Use Naming that is Short but Clear, and More
Hey Everyone!
Today we’ll be talking about
How Stripe does Rate Limiting
Why Rate Limit
Rate Limiting vs Load Shedding
Common Rate Limiting Algorithms
Rate Limiting at Stripe
Load Shedding at Stripe
Tech Snippets
Leader Election in Distributed Systems
How to Become a Better Engineering Leader
Use Naming that is Short but Clear
Pull Request Creation Best Practices
How Stripe does Rate Limiting
When you’re building an API that’ll have many users, rate limiting is something you must think about.
Rate limiting is where you put a limit on the number of requests a user can send to your API in a specific amount of time. You might set a rate limit of 100 requests per minute.
If someone tries to send more, you reply with an HTTP 429 (Too Many Requests) status code. You could also include the Retry-After HTTP header with a value of 240 (seconds), telling them to try the request again after a 4 minute cooldown.
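Here's a minimal sketch of what that response might look like in a Flask app. The route, user ID, and the is_rate_limited placeholder are illustrative, not Stripe's actual API:

```python
from flask import Flask, jsonify

app = Flask(__name__)

def is_rate_limited(user_id: str) -> bool:
    # Placeholder: plug in any of the algorithms discussed below
    # (token bucket, fixed window, sliding window).
    return False

@app.route("/v1/charges")
def list_charges():
    user_id = "user_123"  # in practice, derived from the caller's API key
    if is_rate_limited(user_id):
        response = jsonify(error="Too Many Requests")
        response.status_code = 429
        response.headers["Retry-After"] = "240"  # retry after a 4 minute cooldown
        return response
    return jsonify(charges=[])
```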
Stripe is a payments processing service where you can use their API in your web app to collect payments from your users. Like every other API service, they have to put in strict rate limits to prevent a user from spamming too many requests.
You can read about Stripe’s specific rate limiting policies here, in their API docs.
Paul Tarjan was previously a Principal Software Engineer at Stripe and he wrote a great blog post on how Stripe does rate limiting. We’ll explain rate limiters, show how to implement them, and summarize the blog post.
Note - the blog post is from 2017 so some aspects of it are out of date. However, the core concepts behind rate limiters haven’t changed. If you work at Stripe, then please feel free to tell me what’s changed and I’ll update this post!
Why Rate Limit
If you expose an API endpoint to the public, then you’ll inevitably have someone start spamming your endpoint with requests. This person could be doing it intentionally or unintentionally.
It could be a hacker trying to bring down your API, but it could also be a well-intentioned user who has a bug in his code. Or, perhaps the user is facing a spike in traffic and he’s decided that he’s going to make his problem your problem.
Regardless, this is something you have to plan for. Failing to plan for it means
Poor User Experience - One user taking up too many resources from your backend will degrade the UX for all the other users of your service
Unnecessary Stress - Getting paged at 3 am because some dude is sending your API a thousand requests per minute (RPM) is not fun
Wasting Money - If a small percentage of users are sending you 1000 RPM while the majority of users are sending 100 RPM, then the cost of scaling the backend to meet the high-usage users might not be financially worth it. It could be better to just rate limit the API at 200 requests per minute and tell the high-usage customers to get lost.
A rule of thumb for configuring your rate limiter is to ask whether your users can reduce the frequency of their API requests without affecting the outcome of their service.
For example, let’s say you’re running a social media site and you have an endpoint that returns a list of the followers of a specified user.
This list is unlikely to change on a second-to-second basis, so you might set a rate-limit of 10 requests per minute for this endpoint. Allowing someone to send the endpoint 100 requests a minute makes no sense, as you’ll just be sending the same list back again and again.
Of course, this is just a rule of thumb. You might make your rate limits stricter depending on engineering/financial constraints.
Rate Limiting vs Load Shedding
Load Shedding is another technique we’ve talked about frequently in Quastor. This is where you intentionally drop incoming requests when the load exceeds a certain threshold. It’s another strategy to avoid overwhelming your backend, and it’s usually implemented in concert with rate limiting.
The difference is that rate limiting is applied to a specific API user (based on their IP address or API key), whereas load shedding is applied to all users. However, you could do some segmentation (ignore any lower-priority requests or de-prioritize requests from free-tier users).
In a previous article, we delved into how Netflix uses load shedding to avoid bringing the site down when they release a new season of Stranger Things.
Rate Limiting Algorithms
There are many different algorithms you can use to implement your rate limiter. Some common ones are…
Token Bucket
This is the strategy that Stripe uses.
Each user gets an allocation of “tokens”. On each request, they use up a certain number of tokens. If they’ve used all their tokens, then the request is rejected and they get a 429 (Too Many Requests) error code.
At some set interval (every second/minute), the user will get additional tokens.
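Here's a minimal token bucket sketch in Python (the class and parameter names are mine, not Stripe's). It uses the common lazy-refill idiom: instead of a background timer, tokens are topped up based on the time elapsed since the last request, which is equivalent to refilling at a set interval:

```python
import time

class TokenBucket:
    """Allow up to `capacity` tokens, refilled at `refill_rate` tokens/second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Top up tokens based on the time elapsed since the last call,
        # capped at the bucket's capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should respond with HTTP 429

# Example: bursts of up to 10 requests, refilling 2 tokens per second.
bucket = TokenBucket(capacity=10, refill_rate=2)
print(bucket.allow())  # True while tokens remain
```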
Fixed Window
You create a window with a fixed-time size (every 30 seconds, every minute, etc.). The user is allowed to send a certain number of requests during the window and this resets back to 0 once you enter the next window.
You might have a fixed window of 1 minute that resets at the end of every minute. From 1:00:00 to 1:00:59, the user can send you a maximum of 10 requests. At 1:01:00, the count resets.
The issue with this strategy is that a user might send you 10 requests at 1:00:59 and then immediately send another 10 requests at 1:01:00. That lets bursts through: 20 requests land within two seconds, double the per-minute limit.
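A fixed window counter can be as simple as bucketing requests by the minute they arrive in (again a sketch, with made-up names):

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 10

# Maps (user_id, window_start) -> number of requests in that window.
# A real implementation would also evict counters for old windows.
counters: dict[tuple[str, int], int] = defaultdict(int)

def allow_request(user_id: str) -> bool:
    # Requests in the same minute share one counter; the count implicitly
    # "resets" because the next minute maps to a brand new key.
    window_start = int(time.time() // WINDOW_SECONDS)
    key = (user_id, window_start)
    if counters[key] >= MAX_REQUESTS:
        return False  # respond with HTTP 429
    counters[key] += 1
    return True
```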
Sliding Window
Sliding Window is meant to address this burst problem with Fixed Window. In this approach, the window of time isn't fixed but "slides" with each incoming request. This means that the rate limiter takes into account not just the requests made within the current window, but also a portion of the requests from the previous window.
For instance, consider a sliding window of 1 minute where a user is allowed to make a maximum of 10 requests.
Let's say a user makes 10 requests starting from 1:00:00 to 1:00:30.
Request Rejected - If the user tries to make another request at 1:00:31, the system will look back one minute from that point (from 12:59:31 to 1:00:31). Since all 10 requests fall within this one-minute window, the new request will be rejected.
Request Accepted - If the user instead makes another request at 1:01:05, the system will look back one minute from that point (from 1:00:05 to 1:01:05). The requests made at the beginning of the burst (say, the first two at 1:00:00 and 1:00:01) now fall outside this one-minute window. So, those two requests are no longer counted, only 8 remain in the window, and the new request at 1:01:05 is allowed.
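One straightforward implementation is a sliding window log that stores each user's recent request timestamps (a sketch; the blog post doesn't prescribe a particular implementation):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 10

# Maps user_id -> timestamps of that user's requests in the last minute.
request_log: dict[str, deque] = defaultdict(deque)

def allow_request(user_id: str) -> bool:
    now = time.time()
    log = request_log[user_id]
    # Evict timestamps older than one window, e.g. the 1:00:00 and 1:00:01
    # requests stop counting once the clock passes 1:01:00 and 1:01:01.
    while log and now - log[0] >= WINDOW_SECONDS:
        log.popleft()
    if len(log) >= MAX_REQUESTS:
        return False  # respond with HTTP 429
    log.append(now)
    return True
```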
Rate Limiting and Load Shedding at Stripe
Now we’ll talk about how Stripe uses rate limiting and load shedding to reduce the number of sleepless nights their SREs have to go through.
Stripe uses 4 different types of limiters in production.
Request Rate Limiter
Concurrent Requests Limiter
Fleet Usage Load Shedder
Worker Utilization Load Shedder
Request Rate Limiter
This limits each user to N requests per second and it uses a token bucket model. Each user gets a certain number of tokens that refill every second. However, Stripe adds flexibility so users can briefly burst above the cap in case they have a sudden spike in traffic (a flash-sale, going viral on social media, etc.)
Concurrent Requests Limiter
This limits the number of concurrent requests a user has in flight. Sometimes, a user will have poorly configured timeouts and they’ll retry a request while the Stripe API is still processing the original one. These retries add more demand to an already overloaded endpoint, causing it to slow down even more. This limiter protects against that pattern.
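Conceptually, this is just a counting semaphore per user: take a slot before doing the work, give it back afterwards, and reject when no slots are free. A sketch (the limit and names are made up):

```python
import threading
from collections import defaultdict

MAX_CONCURRENT = 5  # illustrative per-user limit

# Maps user_id -> number of that user's requests currently in flight.
in_flight: dict[str, int] = defaultdict(int)
lock = threading.Lock()

def try_acquire(user_id: str) -> bool:
    with lock:
        if in_flight[user_id] >= MAX_CONCURRENT:
            return False  # too many requests already in flight
        in_flight[user_id] += 1
        return True

def release(user_id: str) -> None:
    with lock:
        in_flight[user_id] -= 1

def handle_request(user_id: str) -> str:
    if not try_acquire(user_id):
        return "429 Too Many Requests"
    try:
        return "200 OK"  # do the actual work here
    finally:
        release(user_id)  # always free the slot, even on errors
```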
Fleet Usage Load Shedder
This is a load shedder, not a rate limiter. It will not be targeted against any specific user, but will instead block certain types of traffic while the system is overloaded.
Stripe divides their traffic into critical and non-critical API methods. An example of a critical method would be charging a user; a non-critical method would be querying for a list of past charges.
Stripe always reserves a fraction of their infrastructure for critical requests. If the reservation number is 10%, then non-critical requests will start getting rejected once infrastructure usage crosses 90% (the remaining 10% is reserved for critical load).
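In code, the check boils down to a couple of lines (the 10% reservation is the post's example; how you measure fleet utilization is up to you):

```python
CRITICAL_RESERVATION = 0.10  # fraction of capacity reserved for critical requests

def should_shed(request_is_critical: bool, fleet_utilization: float) -> bool:
    # fleet_utilization is in [0, 1], e.g. busy workers / total workers.
    if request_is_critical:
        return False  # this shedder never drops critical requests
    # With a 10% reservation, non-critical traffic is shed above 90% usage.
    return fleet_utilization >= 1.0 - CRITICAL_RESERVATION
```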
Worker Utilization Load Shedder
The Fleet Usage Load Shedder operates at the level of the entire fleet of servers. Stripe has another load shedder that operates at the level of individual workers within a server.
This load shedder divides traffic into 4 categories:
Critical Methods
POST requests
GET requests
Test Mode Traffic
If a server is getting too much traffic, then it will start shedding load starting with the Test Mode Traffic and working its way up.
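Here's one way to sketch that priority-based shedding, assuming each worker can observe its own utilization. The category order comes from the post, but the thresholds below are invented for illustration:

```python
# Categories from highest to lowest priority, per the blog post.
PRIORITIES = ["critical", "post", "get", "test"]

# Invented thresholds: shed test mode traffic first at 80% utilization,
# then GETs at 90%, then POSTs at 95%; critical methods are never shed.
SHED_THRESHOLDS = {"test": 0.80, "get": 0.90, "post": 0.95, "critical": 1.01}

def should_shed(category: str, worker_utilization: float) -> bool:
    # Drop the request if this worker is busier than the threshold for the
    # request's category, so lower-priority traffic gets shed first.
    return worker_utilization >= SHED_THRESHOLDS[category]

# Example: at 92% utilization, GET and test mode traffic gets shed.
print([c for c in PRIORITIES if should_shed(c, 0.92)])  # ['get', 'test']
```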
For more details on Limiters at Stripe, you can read the full article here.
How did you like this summary? Your feedback really helps me improve curation for future emails.
A Developer Approach to Building Internal Tools
Developer time is expensive, and many developers spend countless hours on items such as
Managing cron jobs to send out daily TPS report reminders
Creating React dashboards from scratch to display metrics
Wrangling scripts to pull data from different apps and sending it to Redshift
Airplane is the developer platform that handles all of this for you. With Airplane, users can transform scripts, queries, APIs, and more into powerful internal UIs and workflows within minutes. Airplane is used at companies like Flatfile, Modern Treasury, Dover and more.
You can use Airplane to quickly create a script that connects to a REST/GraphQL API, schedule recurring jobs or even build complex dashboards in React. They have a large library of templates and components that you can use to get started.
Airplane is a code-first platform, so all the tools you create can easily be integrated within your codebase, version controlled, extended with third-party libraries, and more.
Using Airplane, companies have massively sped up the creation of internal tooling, allowing their developers to focus on the high-impact code that brings value to their customers.
sponsored