The DDoS of Your Own Making
Rate limiting is the fundamental shield of any public API. Without it, a poorly written while(true) loop in a client's script can exhaust your database connections, saturating your infrastructure and causing a total outage for all other tenants.
However, implementing rate limiting poorly is often worse than not having it at all.
The Centralized Bottleneck: If your rate limiting strategy relies on querying a central relational database or a single Redis cluster on every incoming request, your rate limiter becomes the bottleneck. Under heavy load, that central store saturates, every rate limit check slows down or times out, and legitimate traffic is dropped alongside abusive traffic.
Edge-Native Rate Limiting
To protect the core infrastructure, abuse prevention must occur as close to the user as possible—at the network edge.
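When an API gateway spans hundreds of global data centers, keeping a single rate limit counter (e.g., "User A has made 95 out of 100 requests") strictly consistent across all of them on every request is impractical. The speed of light puts a floor of tens to hundreds of milliseconds on each cross-continent round trip, and that delay would be added to every API call.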
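A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes light travels through optical fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum) and uses an illustrative 10,000 km path between two data centers. The function name and the distance are assumptions for the example, and real routes add switching and queuing delay on top of this floor.

```javascript
// Approximate propagation speed of light in optical fiber (assumption: ~2/3 of c).
const FIBER_KM_PER_SECOND = 200000;

// Lower bound on the round-trip time between two sites, in milliseconds.
// Ignores routing detours, switching, and queuing, so real values are higher.
function minRoundTripMs(distanceKm) {
  return (2 * distanceKm * 1000) / FIBER_KM_PER_SECOND;
}

console.log(minRoundTripMs(10000)); // 100 (ms) for an illustrative 10,000 km path
```

Even this best-case figure would be paid on every request if each rate limit check had to consult a counter on another continent.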
The Local Token Bucket Algorithm
Instead of global synchronization, modern edge networks utilize localized instances of the Token Bucket algorithm.
- Buckets per PoP: Each Point of Presence (PoP) around the world maintains its own localized counter for the user's API key.
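- Asynchronous Syncing: The local counters periodically sync their state to a regional aggregator, achieving eventual consistency without blocking the request path. The trade-off is that a client spreading traffic across several PoPs can briefly exceed the global limit until the next sync.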
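The two ideas above can be sketched in a few lines. This is a minimal illustration, not a production limiter: the class and function names, the 100 requests per minute limit, and the shape of the sync payload are all assumptions made for the example. Each PoP would hold its own `buckets` map in memory, and a background task would periodically send the result of `drainUsage()` to the regional aggregator.

```javascript
// Minimal in-memory token bucket: one instance per API key, per PoP.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so a new client can burst
    this.lastRefill = now;  // timestamp in milliseconds
  }

  // Add the tokens earned since the last call, capped at capacity.
  refill(now) {
    const elapsedSeconds = Math.max(0, now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = now;
  }

  // Try to take one token; returns true if the request may proceed.
  tryConsume(now = Date.now()) {
    this.refill(now);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Hypothetical limit: 100 requests per minute per API key at this PoP.
const CAPACITY = 100;
const REFILL_PER_SECOND = CAPACITY / 60;

const buckets = new Map();      // apiKey -> TokenBucket (local to this PoP)
const pendingUsage = new Map(); // apiKey -> requests allowed since the last sync

function allowRequest(apiKey, now = Date.now()) {
  let bucket = buckets.get(apiKey);
  if (!bucket) {
    bucket = new TokenBucket(CAPACITY, REFILL_PER_SECOND, now);
    buckets.set(apiKey, bucket);
  }
  const allowed = bucket.tryConsume(now);
  if (allowed) {
    pendingUsage.set(apiKey, (pendingUsage.get(apiKey) ?? 0) + 1);
  }
  return allowed;
}

// Called off the request path (for example on a timer). Returns the usage
// recorded since the previous call and resets the local tally.
function drainUsage() {
  const snapshot = Object.fromEntries(pendingUsage);
  pendingUsage.clear();
  return snapshot;
}
```

Because the decision in `allowRequest` uses only local memory, it adds no network latency to the request. The cost is that the regional aggregator sees usage only as fresh as the last drain.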
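// Edge Rate Limiter Pattern
// getApiKey and env.RATE_LIMITER are provided by the application. This version
// assumes a Workers-style runtime that passes the execution context (ctx) as a
// separate argument rather than as a property of env.
export async function checkRateLimit(request, env, ctx) {
  const apiKey = getApiKey(request);

  // 1. Check the localized Token Bucket (in-memory at the edge)
  const bucket = await env.RATE_LIMITER.getBucket(apiKey);
  if (bucket.tokens <= 0) {
    return new Response(JSON.stringify({
      error: "Rate limit exceeded",
      retry_after: bucket.resetTime
    }), {
      status: 429,
      headers: {
        "Content-Type": "application/json",
        "X-RateLimit-Reset": String(bucket.resetTime)
      }
    });
  }

  // 2. Consume a token asynchronously and allow the request.
  //    waitUntil lets the write finish in the background without delaying the
  //    response, so concurrent requests can briefly overshoot the limit.
  ctx.waitUntil(env.RATE_LIMITER.consume(apiKey));
  return null; // null means "not limited"; the caller proceeds to the route handler
}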
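Clients handle a 429 most reliably when the response also carries the standard Retry-After header, which takes either a number of seconds or an HTTP date. The helper below is a small sketch of the seconds form. It assumes the bucket's reset time is a Unix timestamp in milliseconds, which the pattern above does not specify, and the function name is hypothetical.

```javascript
// Convert a reset timestamp into a Retry-After value in whole seconds.
// Assumption: resetEpochMs is a Unix timestamp in milliseconds.
function retryAfterSeconds(resetEpochMs, nowMs = Date.now()) {
  // Round up so the client never retries before the bucket has refilled,
  // and clamp at zero in case the reset time has already passed.
  return Math.max(0, Math.ceil((resetEpochMs - nowMs) / 1000));
}

// Example: the bucket resets 1.5 seconds from "now".
const headers = { "Retry-After": String(retryAfterSeconds(61500, 60000)) };
console.log(headers); // { 'Retry-After': '2' }
```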
Segmented Limits for Critical Infrastructure
Not all API routes are created equal. A robust rate limiting architecture segments limits based on the computational cost of the endpoint.
- High Limit (10,000/min): Lightweight read operations, such as checking the status of an ongoing export.
- Medium Limit (1,000/min): Standard mutations, such as updating user metadata.
- Critical Limit (10/min): Cryptographically expensive operations or high-abuse targets, such as initiating a password reset or sending bulk emails through services like MyEmailAPI.
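One way to express these tiers is a small route table that maps each endpoint to a tier and each tier to its per-minute limit. The sketch below uses the three limits listed above. The route paths, the function names, and the choice to treat unlisted routes as medium are assumptions for illustration only.

```javascript
// Requests per minute for each tier, taken from the list above.
const TIER_LIMITS = { high: 10000, medium: 1000, critical: 10 };

// Hypothetical route table; the first matching entry wins.
const ROUTE_TIERS = [
  { method: "POST", pattern: /^\/auth\/password-reset$/, tier: "critical" },
  { method: "POST", pattern: /^\/emails\/bulk$/, tier: "critical" },
  { method: "GET", pattern: /^\/exports\/[^/]+\/status$/, tier: "high" },
];

// Look up the tier and limit that apply to a request.
function limitFor(method, path) {
  const route = ROUTE_TIERS.find(
    (r) => r.method === method && r.pattern.test(path)
  );
  const tier = route ? route.tier : "medium"; // unlisted routes default to medium
  return { tier, perMinute: TIER_LIMITS[tier] };
}

// Key buckets by API key and tier so that cheap reads cannot use up
// the budget reserved for critical operations, and vice versa.
function bucketKey(apiKey, tier) {
  return `${apiKey}:${tier}`;
}

console.log(limitFor("GET", "/exports/42/status"));    // { tier: 'high', perMinute: 10000 }
console.log(limitFor("POST", "/auth/password-reset")); // { tier: 'critical', perMinute: 10 }
```

Keeping a separate bucket per tier means a client polling export status at full speed still has its full password reset allowance, and an attacker hammering the reset endpoint exhausts only the critical bucket.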