Rate limiting is one of those features that seems simple until you implement it in production. The algorithm choice, threshold values, and response format each have concrete effects on user experience and API abuse resistance.
Four common algorithms:
Fixed window: count requests per time window (e.g., 100 requests per minute). Simple to implement and understand. Weakness: a client can send up to 2x the limit in a short burst that straddles a window boundary, for example 100 requests at 0:59 and then 100 more at 1:00.
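A minimal in-memory sketch of the fixed window approach, assuming a single process and a per-user string key. The class name, the `allow` method, and the injectable `clock` parameter are illustrative choices, not part of any particular library.

```python
import time


class FixedWindowLimiter:
    """Hypothetical fixed-window counter; all names here are illustrative."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the boundary case can be simulated
        self.counts = {}    # key -> (window_index, count)

    def allow(self, key):
        # Every timestamp inside the same window maps to the same index.
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self.counts.get(key, (window, 0))
        if stored_window != window:
            count = 0  # a new window has started: the counter resets
        if count >= self.limit:
            self.counts[key] = (window, count)
            return False
        self.counts[key] = (window, count + 1)
        return True
```

Driving this with a fake clock at second 59 and then second 60 reproduces the boundary burst: 100 requests pass in each window, so 200 pass within about one second.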
Sliding window log: store the timestamp of each request, count requests in the trailing window. Accurate and burst-resistant, but memory-intensive at high scale (you store one entry per request per user).
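A rough sketch of the log approach, again assuming a single process and in-memory storage. The names (`SlidingWindowLog`, `allow`, `clock`) are hypothetical; a production version would more likely keep the timestamps in a shared store.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLog:
    """Hypothetical sliding-window log; one stored timestamp per accepted request."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.logs = defaultdict(deque)  # key -> timestamps, oldest first

    def allow(self, key):
        now = self.clock()
        log = self.logs[key]
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the trailing window.
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

The memory cost is visible in `self.logs`: each key can hold up to `limit` timestamps at once.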
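Sliding window counter: approximate the sliding window by weighting the previous window's count by the fraction of it that still overlaps the trailing window, then adding the current window's count. Commonly cited as 99%+ accurate, at O(1) memory per user (two counters). The estimate assumes requests in the previous window were evenly spread. This is the approach used in Cloudflare's and Upstash's rate limiting.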
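One way to sketch the weighted estimate, assuming in-memory state and the formula estimate = previous_count * (1 - elapsed_fraction) + current_count. The class and method names are hypothetical, and this is not claimed to match any vendor's actual implementation.

```python
import time


class SlidingWindowCounter:
    """Hypothetical sliding-window counter using two counters per key."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.state = {}  # key -> (window_index, current_count, previous_count)

    def allow(self, key):
        now = self.clock()
        window = int(now // self.window_seconds)
        idx, curr, prev = self.state.get(key, (window, 0, 0))
        if window == idx + 1:
            prev, curr = curr, 0   # rolled into the next window
        elif window > idx + 1:
            prev, curr = 0, 0      # idle for at least one full window
        # Fraction of the current window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds
        # Weight the previous window by how much of it is still "in view".
        estimate = prev * (1.0 - elapsed) + curr
        if estimate + 1 > self.limit:
            self.state[key] = (window, curr, prev)
            return False
        self.state[key] = (window, curr + 1, prev)
        return True
```

Halfway through a window that follows a full one, the previous count contributes half its value, so only about half the limit is available.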
Token bucket: each user accumulates tokens at a fixed rate, up to a maximum bucket size, and each request consumes one token. This allows bursts up to the bucket size, then enforces the steady-state refill rate. It is the most flexible option for APIs where occasional bursts are acceptable. Stripe uses a token bucket.
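A small sketch of a single token bucket with lazy refill (tokens are topped up when a request arrives rather than by a background timer). The names are hypothetical; for per-user limits you would keep one bucket per key, for example in a dict.

```python
import time


class TokenBucket:
    """Hypothetical token bucket; refills lazily on each call to allow()."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second   # steady-state refill rate
        self.capacity = capacity      # maximum burst size
        self.clock = clock
        self.tokens = float(capacity) # start full, so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        earned = (now - self.updated) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.updated = now
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True
```

With `rate_per_second=1` and `capacity=5`, a client can fire 5 requests at once, then roughly one per second afterwards.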
Response headers matter for developer experience (DX). Return X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset on every response, and return Retry-After on 429 responses. Well-implemented rate limit responses let API clients back off gracefully without polling.
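A framework-agnostic sketch of building those headers. The function name and parameters are hypothetical. It assumes X-RateLimit-Reset carries a Unix timestamp in seconds (some APIs send seconds remaining instead, so document whichever you pick) and that Retry-After uses the delay-in-seconds form.

```python
import math


def build_rate_limit_response(allowed, limit, remaining, reset_epoch, now_epoch):
    """Return (status_code, headers) for a rate-limited endpoint.

    Hypothetical helper: reset_epoch is assumed to be the Unix time at which
    the client's quota resets.
    """
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if allowed:
        return 200, headers
    # Round up so the client never retries slightly too early.
    wait_seconds = max(1, math.ceil(reset_epoch - now_epoch))
    headers["Retry-After"] = str(wait_seconds)
    return 429, headers
```

A client that receives the 429 can sleep for the Retry-After value and retry once, instead of probing repeatedly.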
Threshold guidance for REST APIs: 100-1000 requests/minute per authenticated user is a reasonable default for most use cases. Set burst limits 3-5x higher than sustained limits. Use separate limits for read and write endpoints, since writes are usually more expensive to serve.
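These defaults could be captured in a small policy table. The sketch below is illustrative only: the specific numbers (600 reads and 100 writes per minute, 3x burst) are example values within the ranges above, not recommendations, and the names `LimitPolicy`, `POLICIES`, and `policy_for` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitPolicy:
    sustained_per_minute: int
    burst_multiplier: float = 3.0  # guidance above suggests 3-5x

    @property
    def burst_limit(self):
        # Largest short-term burst this policy tolerates.
        return int(self.sustained_per_minute * self.burst_multiplier)


# Example values only; tune against real traffic.
POLICIES = {
    "read": LimitPolicy(sustained_per_minute=600),
    "write": LimitPolicy(sustained_per_minute=100),
}

READ_METHODS = {"GET", "HEAD", "OPTIONS"}


def policy_for(method):
    """Pick the read or write policy from the HTTP method."""
    kind = "read" if method.upper() in READ_METHODS else "write"
    return POLICIES[kind]
```

With a token bucket, `sustained_per_minute / 60` would map naturally to the refill rate and `burst_limit` to the bucket capacity.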