How to Implement Rate Limiting in 10 Lines of Code

In a world where APIs are the backbone of digital communication, managing the flow of requests is crucial. Rate limiting controls how many requests a client can make to an API within a given timeframe, preventing overuse and keeping access fair. Here's how you can implement basic rate limiting in Flask with a function of about 10 lines of Python, using a fixed-window counter, a simpler relative of the token bucket algorithm.
from time import time

from flask import Flask, request

app = Flask(__name__)
BUCKET = {}  # client_id -> (request count, time the current window started)
WINDOW_SIZE = 60  # seconds
MAX_REQUESTS = 5


@app.before_request
def rate_limiter():
    client_id = request.remote_addr
    request_time = time()
    # Unknown clients start with zero requests in a window that begins now.
    requests, last_time = BUCKET.get(client_id, (0, request_time))
    if request_time - last_time > WINDOW_SIZE:
        # The previous window has expired: start a new one with this request.
        BUCKET[client_id] = (1, request_time)
    elif requests < MAX_REQUESTS:
        BUCKET[client_id] = (requests + 1, last_time)
    else:
        # Returning a response here stops the request with HTTP 429.
        return "Rate limit exceeded", 429
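The limiter's decisions are easier to check when the same logic is separated from Flask. The sketch below is a framework-free restatement of the function above, not a replacement for it; allow_request, counters, and the sample IP address are illustrative names that do not appear in the original code, and the clock is passed in as a plain number so no waiting is needed.

```python
WINDOW_SIZE = 60  # seconds, same value as in the article
MAX_REQUESTS = 5


def allow_request(counters, client_id, now):
    """Return True if the request fits in the client's current window."""
    count, window_start = counters.get(client_id, (0, now))
    if now - window_start > WINDOW_SIZE:
        counters[client_id] = (1, now)  # window expired: start a new one
        return True
    if count < MAX_REQUESTS:
        counters[client_id] = (count + 1, window_start)
        return True
    return False


counters = {}
# Six requests one second apart: the first five pass, the sixth is rejected.
decisions = [allow_request(counters, "10.0.0.1", t) for t in range(6)]
# A request made after the 60-second window has expired is accepted again.
later = allow_request(counters, "10.0.0.1", 61)
```

Here decisions ends with a single rejection, and later is accepted because more than WINDOW_SIZE seconds have passed since the window started.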
Explanation of Key Concepts
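- Fixed-Window Counter vs. Token Bucket: The code allows each client up to MAX_REQUESTS requests per WINDOW_SIZE-second window and resets the count once the window expires. This is often loosely called a token bucket, but a true token bucket holds a number of tokens that refill continuously over time; each request consumes one token, and requests are denied only while the bucket is empty.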
- Flask Middleware: The @app.before_request decorator in Flask runs the rate_limiter function before each request, enabling the rate limiting check.
- Global Dictionary: BUCKET stores the request count and the time the current window started for each client_id, which in this case is the client's IP address (request.remote_addr).
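For comparison, a token bucket in the stricter sense can be sketched as follows. This is a minimal illustration under stated assumptions rather than the article's code: the TokenBucket class, its parameter names, and the injectable clock argument are all hypothetical choices made for the example.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Consume one token if available; return True when the request may proceed."""
        now = self.clock()
        # Add the tokens earned since the last call, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

To approximate the article's limit of 5 requests per 60 seconds, one bucket per client could be created as TokenBucket(capacity=5, refill_rate=5 / 60) and kept in a dictionary keyed by client_id. Unlike the fixed-window counter, capacity is regained gradually instead of all at once when a window expires.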
Quick Tip
Implementing rate limiting at the application level, as shown, is straightforward and suitable for small to medium-scale applications. Because the counts live in a single process's memory, they are lost on restart and are not shared between multiple workers or servers. For larger, more distributed systems, consider using dedicated middleware or services designed for rate limiting and traffic management.
This simple rate limiting solution demonstrates how a few lines of code can significantly enhance the robustness and fairness of your API's request handling.