API Rate Limiting: Patterns and Best Practices

Master API rate limiting with token bucket, sliding window, and fixed counter patterns. Protect your backend with NestJS throttler, Redis, and real-world configuration examples.

ECOSIRE Research and Development Team
March 19, 2026 · 10 min read · 2.2k words


Every public API endpoint is a target — bots, scrapers, and bad actors will hammer your server the moment you go live. Without rate limiting, a single misbehaving client can exhaust your database connections, spike your cloud bill, and take down service for every legitimate user. Rate limiting is not optional; it is the first line of defense for any production API.

This guide walks through the four major rate-limiting algorithms, their trade-offs, and how to implement them correctly in NestJS with Redis. You will leave with copy-paste configurations for common scenarios and a mental model for choosing the right strategy per endpoint.

Key Takeaways

  • Token bucket allows controlled bursting while fixed window counters are the cheapest to implement
  • Sliding window log is the most accurate but memory-intensive; sliding window counter is the best balance
  • Redis is the only correct backing store when you run multiple API replicas
  • NestJS @nestjs/throttler supports custom storage adapters — swap in Redis with one config change
  • Always return Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers
  • Differentiate limits by endpoint sensitivity: authentication (5/min) vs read APIs (1000/min)
  • Use IP-based limits for anonymous traffic and user-based limits for authenticated requests
  • Never silently drop requests — always return 429 Too Many Requests with a helpful message

The Four Core Algorithms

Fixed Window Counter

The simplest approach: count requests in a fixed time window, reset at the boundary.

// Fixed window: 100 requests per minute
// Window resets at :00, :01, :02 ...
const windowKey = `ratelimit:${userId}:${Math.floor(Date.now() / 60000)}`;
const count = await redis.incr(windowKey);
if (count === 1) {
  await redis.expire(windowKey, 60); // set the TTL once, on the first hit
}

if (count > 100) {
  throw new TooManyRequestsException();
}

Weakness: The boundary exploit. A client can send 100 requests at 12:00:59 and another 100 at 12:01:00 — effectively 200 requests in two seconds. For most APIs this is acceptable. For authentication endpoints, it is not.
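The exploit is easy to reproduce with an in-memory sketch (the class and its injectable clock are hypothetical, for illustration only):

```typescript
// In-memory fixed window limiter. `now` is passed in so the boundary
// timing can be simulated deterministically.
class FixedWindowLimiter {
  private counts = new Map<number, number>();

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number): boolean {
    const window = Math.floor(now / this.windowMs);
    const count = (this.counts.get(window) ?? 0) + 1;
    this.counts.set(window, count);
    return count <= this.limit;
  }
}

const limiter = new FixedWindowLimiter(100, 60_000);
let allowed = 0;
// 100 requests at 12:00:59.999, then 100 more at 12:01:00.000
for (let i = 0; i < 100; i++) if (limiter.allow(59_999)) allowed++;
for (let i = 0; i < 100; i++) if (limiter.allow(60_000)) allowed++;
console.log(allowed); // 200 — double the nominal limit, within one millisecond
```

All 200 requests pass because each batch lands in a different window, which is exactly the spike the sliding-window variants below are designed to smooth out.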

Sliding Window Log

Store every request timestamp. On each request, count timestamps in the last window.

const now = Date.now();
const windowStart = now - 60_000; // 60 seconds ago
const key = `ratelimit:log:${userId}`;

// Remove entries older than the window, then record this request
await redis.zremrangebyscore(key, 0, windowStart);
await redis.zadd(key, now, `${now}:${Math.random()}`); // unique member even for same-millisecond requests
const count = await redis.zcard(key);
await redis.expire(key, 60);

if (count > 100) {
  throw new TooManyRequestsException();
}

Trade-off: Perfectly accurate but stores O(n) entries per user where n is the request count. At 1,000 RPS across 10,000 users, your Redis memory climbs fast. Use for low-volume, high-security endpoints like password reset.
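The sorted-set logic can be unit-tested without Redis by keeping the log in a plain array (a sketch; the class name is hypothetical):

```typescript
// In-memory sliding window log: one timestamp per request, pruned
// to the window before counting — mirrors ZREMRANGEBYSCORE + ZCARD.
class SlidingWindowLog {
  private timestamps: number[] = [];

  constructor(private limit: number, private windowMs: number) {}

  allow(now: number): boolean {
    const windowStart = now - this.windowMs;
    // Equivalent of ZREMRANGEBYSCORE key 0 windowStart
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now); // Equivalent of ZADD
    return true;
  }
}

const log = new SlidingWindowLog(3, 60_000);
console.log(log.allow(0), log.allow(1), log.allow(2)); // true true true
console.log(log.allow(3));      // false — 4th request inside the window
console.log(log.allow(60_005)); // true  — the first three have aged out
```

Note how the rejection at t=3 is exact, not an approximation: every request is accounted for individually, which is why the memory cost grows with traffic.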

Sliding Window Counter

Approximate sliding window using two fixed windows — no memory explosion.

const now = Date.now();
const currentWindow = Math.floor(now / 60000);
const previousWindow = currentWindow - 1;
const windowProgress = (now % 60000) / 60000; // 0.0 to 1.0

const [current, previous] = await redis.mget(
  `rl:${userId}:${currentWindow}`,
  `rl:${userId}:${previousWindow}`
);

// Weighted estimate, including the request being evaluated
const estimated =
  (parseInt(previous ?? '0', 10) * (1 - windowProgress)) +
  parseInt(current ?? '0', 10) + 1;

if (estimated > 100) {
  throw new TooManyRequestsException();
}

const count = await redis.incr(`rl:${userId}:${currentWindow}`);
if (count === 1) {
  // Keep the key long enough to serve as "previous" in the next window
  await redis.expire(`rl:${userId}:${currentWindow}`, 120);
}

This is the algorithm Cloudflare uses. It smooths the boundary spike with minimal overhead — two Redis keys per user per window.
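The weighted estimate is easiest to see with concrete numbers; here is the formula as a pure function (names are illustrative):

```typescript
// Weighted estimate: the previous window's count decays linearly as
// the current window progresses.
function estimateRate(
  previousCount: number,
  currentCount: number,
  windowProgress: number // 0.0 at window start, 1.0 at window end
): number {
  return previousCount * (1 - windowProgress) + currentCount;
}

// 30s into the current minute: half the previous window still "counts".
console.log(estimateRate(80, 40, 0.5)); // 80 (40 carried over + 40 current)
// At the very start of a window, the previous count applies in full,
// so the boundary burst from the fixed-window exploit is caught.
console.log(estimateRate(80, 0, 0)); // 80
```

The approximation assumes requests were evenly spread across the previous window, which is usually close enough in practice.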

Token Bucket

The gold standard for allowing bursts while maintaining a long-term rate. Each user has a bucket that fills at a constant rate. Requests consume tokens.

async function consumeToken(
  redis: Redis,
  userId: string,
  ratePerSec: number,
  capacity: number
): Promise<boolean> {
  const now = Date.now() / 1000;
  const key = `bucket:${userId}`;

  const values = await redis.hmget(key, 'tokens', 'lastRefill');
  const currentTokens = parseFloat(values[0] ?? String(capacity));
  const lastRefillTime = parseFloat(values[1] ?? String(now));

  // Refill tokens based on elapsed time (note: this read-modify-write
  // sequence is not atomic — see the Lua script section for a safe version)
  const elapsed = now - lastRefillTime;
  const refilled = Math.min(capacity, currentTokens + elapsed * ratePerSec);

  if (refilled < 1) {
    return false; // No tokens available
  }

  await redis.hset(key, 'tokens', String(refilled - 1), 'lastRefill', String(now));
  await redis.expire(key, Math.ceil(capacity / ratePerSec) + 60);

  return true;
}

Token bucket is ideal for APIs that need to allow short bursts (uploading 10 files at once) while preventing sustained abuse.
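The refill math is deterministic, so a clock-injected in-memory version makes the burst-then-refill behavior easy to verify (a sketch only; the Redis version above remains the multi-replica implementation):

```typescript
// In-memory token bucket with an injected clock (seconds).
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private ratePerSec: number,
    private capacity: number,
    nowSec: number
  ) {
    this.tokens = capacity; // start full, allowing an immediate burst
    this.lastRefill = nowSec;
  }

  consume(nowSec: number): boolean {
    const elapsed = nowSec - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerSec);
    this.lastRefill = nowSec;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Capacity 10, refill 1 token/sec: a burst of 10 passes, the 11th fails...
const bucket = new TokenBucket(1, 10, 0);
let burst = 0;
for (let i = 0; i < 11; i++) if (bucket.consume(0)) burst++;
console.log(burst); // 10
// ...and five seconds later, five tokens have refilled.
console.log(bucket.consume(5)); // true
```

This is the file-upload scenario from the paragraph above: the burst drains the bucket instantly, and sustained traffic is then capped at the refill rate.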


NestJS Throttler Configuration

@nestjs/throttler v5 supports pluggable storage adapters; the community package nestjs-throttler-storage-redis provides a Redis-backed one. Here is a production-ready setup:

pnpm add @nestjs/throttler nestjs-throttler-storage-redis ioredis

// app.module.ts
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler';
import { ThrottlerStorageRedisService } from 'nestjs-throttler-storage-redis';
import { APP_GUARD } from '@nestjs/core';

@Module({
  imports: [
    ThrottlerModule.forRootAsync({
      imports: [ConfigModule],
      inject: [ConfigService],
      useFactory: (config: ConfigService) => ({
        throttlers: [
          { name: 'short',  ttl: 1000,    limit: 5    }, // 5 req/sec burst
          { name: 'medium', ttl: 60000,   limit: 300  }, // 300 req/min
          { name: 'long',   ttl: 3600000, limit: 5000 }, // 5000 req/hr
        ],
        storage: new ThrottlerStorageRedisService({
          host: config.get('REDIS_HOST'),
          port: config.get('REDIS_PORT'),
        }),
      }),
    }),
  ],
  providers: [
    {
      provide: APP_GUARD,
      useClass: ThrottlerGuard,
    },
  ],
})
export class AppModule {}

Override limits per controller or route:

@Controller('auth')
export class AuthController {
  // Authentication: very strict — 5 attempts per minute
  @Post('login')
  @Throttle({ medium: { ttl: 60000, limit: 5 } })
  async login(@Body() dto: LoginDto) { /* ... */ }

  // Refresh: moderate — 30 per minute
  @Post('refresh')
  @Throttle({ medium: { ttl: 60000, limit: 30 } })
  async refresh(@Body() dto: RefreshDto) { /* ... */ }

  // Skip throttling on the exchange endpoint (protected by one-time code TTL)
  @Post('exchange')
  @SkipThrottle()
  async exchange(@Body() dto: ExchangeDto) { /* ... */ }
}

Custom Key Generators

By default, NestJS throttler uses the client IP. In production behind Nginx/Cloudflare, you need X-Real-IP or CF-Connecting-IP.

// throttler-behind-proxy.guard.ts
import { ThrottlerGuard, ThrottlerLimitDetail } from '@nestjs/throttler';
import { Injectable, ExecutionContext } from '@nestjs/common';

@Injectable()
export class ThrottlerBehindProxyGuard extends ThrottlerGuard {
  protected async getTracker(req: Record<string, any>): Promise<string> {
    // Authenticated user — use userId for accurate per-user limits
    if (req.user?.sub) {
      return `user:${req.user.sub}`;
    }
    // Anonymous — use real IP from Cloudflare header
    return (
      req.headers['cf-connecting-ip'] ||
      req.headers['x-real-ip'] ||
      (req.headers['x-forwarded-for'] as string)?.split(',')[0] ||
      req.ip
    );
  }

  protected async throwThrottlingException(
    context: ExecutionContext,
    throttlerLimitDetail: ThrottlerLimitDetail
  ): Promise<void> {
    const response = context.switchToHttp().getResponse();
    response.header(
      'Retry-After',
      String(Math.ceil(throttlerLimitDetail.ttl / 1000))
    );
    await super.throwThrottlingException(context, throttlerLimitDetail);
  }
}

Response Headers

RFC 6585 (which defines 429) and the IETF draft RateLimit header fields tell clients exactly when to retry:

// rate-limit.interceptor.ts
import { CallHandler, ExecutionContext, Injectable, NestInterceptor } from '@nestjs/common';
import { Observable, tap } from 'rxjs';

@Injectable()
export class RateLimitHeadersInterceptor implements NestInterceptor {
  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    return next.handle().pipe(
      tap(() => {
        const res = context.switchToHttp().getResponse();
        const req = context.switchToHttp().getRequest();

        // Values injected by ThrottlerGuard after evaluation
        if (req.rateLimit) {
          res.set({
            'X-RateLimit-Limit': req.rateLimit.limit,
            'X-RateLimit-Remaining': Math.max(
              0,
              req.rateLimit.limit - req.rateLimit.current
            ),
            'X-RateLimit-Reset': new Date(
              Date.now() + req.rateLimit.ttl
            ).toISOString(),
            'RateLimit-Policy': `${req.rateLimit.limit};w=${Math.ceil(
              req.rateLimit.ttl / 1000
            )}`,
          });
        }
      })
    );
  }
}

Endpoint-Specific Strategies

Different endpoints warrant different limits. Here is a reference table for common patterns:

| Endpoint Type                | Algorithm              | Limit                  | Window     |
|------------------------------|------------------------|------------------------|------------|
| Login / password reset       | Sliding window log     | 5                      | 15 minutes |
| OTP / 2FA verify             | Fixed window           | 3                      | 10 minutes |
| Public read API              | Token bucket           | 1000 burst, 100/s fill | n/a        |
| Mutation API (authenticated) | Sliding window counter | 300                    | 1 minute   |
| Webhook ingestion            | Fixed window           | 10,000                 | 1 minute   |
| File upload                  | Token bucket           | 10 burst, 1/s fill     | n/a        |
| AI / LLM endpoints           | Fixed window           | 20                     | 1 minute   |
| Search (anonymous)           | Fixed window           | 30                     | 1 minute   |
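In code, this table maps naturally onto a per-endpoint config object (the type and route keys below are hypothetical names for illustration):

```typescript
type Algorithm = 'fixed-window' | 'sliding-log' | 'sliding-counter' | 'token-bucket';

interface EndpointLimit {
  algorithm: Algorithm;
  limit: number;          // max requests, or bucket capacity
  windowMs?: number;      // window-based algorithms only
  refillPerSec?: number;  // token bucket only
}

const RATE_LIMITS: Record<string, EndpointLimit> = {
  'auth/login':      { algorithm: 'sliding-log',     limit: 5,      windowMs: 15 * 60_000 },
  'auth/otp':        { algorithm: 'fixed-window',    limit: 3,      windowMs: 10 * 60_000 },
  'public/read':     { algorithm: 'token-bucket',    limit: 1000,   refillPerSec: 100 },
  'api/mutation':    { algorithm: 'sliding-counter', limit: 300,    windowMs: 60_000 },
  'webhooks/ingest': { algorithm: 'fixed-window',    limit: 10_000, windowMs: 60_000 },
  'files/upload':    { algorithm: 'token-bucket',    limit: 10,     refillPerSec: 1 },
};

console.log(RATE_LIMITS['auth/login'].limit); // 5
```

Centralizing limits like this keeps policy out of individual controllers and makes the table auditable in one place.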

Atomic Lua Scripts for Distributed Safety

When you have multiple API replicas, race conditions in increment-check sequences can allow bursts above the limit. Use a Lua script loaded via redis.defineCommand to make the check-and-increment atomic:

// rate-limit.service.ts
import { Injectable } from '@nestjs/common';
import Redis from 'ioredis';

@Injectable()
export class RateLimitService {
  constructor(private readonly redis: Redis) {
    // Define atomic increment+check as a custom Redis command
    this.redis.defineCommand('rateLimitCheck', {
      numberOfKeys: 1,
      lua: `
        local key   = KEYS[1]
        local limit = tonumber(ARGV[1])
        local ttlMs = tonumber(ARGV[2])
        local count = redis.call('INCR', key)
        if count == 1 then
          redis.call('PEXPIRE', key, ttlMs)
        end
        if count > limit then
          return {0, redis.call('PTTL', key)}
        end
        return {1, -1}
      `,
    });
  }

  async isAllowed(
    key: string,
    limit: number,
    windowMs: number
  ): Promise<{ allowed: boolean; retryAfterMs: number }> {
    const result = await (this.redis as any).rateLimitCheck(
      key, limit, windowMs
    ) as [number, number];

    return {
      allowed: result[0] === 1,
      retryAfterMs: result[1] > 0 ? result[1] : 0,
    };
  }
}
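For unit tests that should not depend on a live Redis, the script's logic can be modeled in memory (a sketch only — it is deliberately not atomic across processes, which is the whole point of the Lua version):

```typescript
interface WindowState { count: number; expiresAt: number }

// Mirrors the Lua script: INCR, set the TTL on the first hit,
// reject above the limit and report the remaining window as PTTL would.
class InMemoryRateLimit {
  private windows = new Map<string, WindowState>();

  check(key: string, limit: number, windowMs: number, now: number) {
    let state = this.windows.get(key);
    if (!state || state.expiresAt <= now) {
      state = { count: 0, expiresAt: now + windowMs }; // first hit: PEXPIRE
      this.windows.set(key, state);
    }
    state.count += 1; // INCR
    if (state.count > limit) {
      return { allowed: false, retryAfterMs: state.expiresAt - now }; // PTTL
    }
    return { allowed: true, retryAfterMs: 0 };
  }
}

const rl = new InMemoryRateLimit();
for (let i = 0; i < 3; i++) rl.check('user:1', 3, 60_000, 0);
console.log(rl.check('user:1', 3, 60_000, 1_000));
// { allowed: false, retryAfterMs: 59000 }
```

Swapping this in behind the same interface as RateLimitService keeps unit tests fast while integration tests exercise the real Lua path.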

Graceful Degradation and Bypass Strategies

Rate limiting should not block internal health checks, monitoring agents, or trusted partners.

// trusted-bypass.guard.ts
@Injectable()
export class RateLimitWithBypassGuard extends ThrottlerBehindProxyGuard {
  private readonly trustedTokens = new Set([
    process.env.MONITORING_TOKEN,
    process.env.PARTNER_API_TOKEN,
  ].filter(Boolean));

  async canActivate(context: ExecutionContext): Promise<boolean> {
    const req = context.switchToHttp().getRequest();

    // Internal health checks bypass all rate limits
    if (req.path === '/health' || req.path === '/ready') {
      return true;
    }

    // Trusted API tokens bypass
    const token = req.headers['x-bypass-token'];
    if (token && this.trustedTokens.has(token)) {
      return true;
    }

    return super.canActivate(context);
  }
}

For progressive rate limiting (warn before a hard block), expose a warning header once a client crosses 90% of the limit, and return 429 with a Retry-After header only past the limit itself:

// In your custom guard, after counting requests:
if (count > limit * 0.9 && count <= limit) {
  response.set('X-RateLimit-Warning', 'approaching limit');
}
if (count > limit) {
  response.set('Retry-After', retryAfterSeconds.toString());
  throw new HttpException('Rate limit exceeded', 429);
}
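The threshold logic extracts cleanly into a pure function that is trivial to test (names are hypothetical):

```typescript
type LimitState = 'ok' | 'warn' | 'blocked';

// Warn once usage crosses 90% of the limit; block above the limit.
function classify(count: number, limit: number): LimitState {
  if (count > limit) return 'blocked';
  if (count > limit * 0.9) return 'warn';
  return 'ok';
}

console.log(classify(50, 100));  // 'ok'
console.log(classify(95, 100));  // 'warn'
console.log(classify(101, 100)); // 'blocked'
```

Well-behaved clients can watch for the warning state and back off before they ever see a 429.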

Testing Rate Limiting

// rate-limit.spec.ts
describe('Rate Limiting', () => {
  it('should block after limit exceeded', async () => {
    const app = moduleRef.createNestApplication();
    await app.init();

    // Hit the endpoint 5 times (limit for login)
    for (let i = 0; i < 5; i++) {
      await request(app.getHttpServer())
        .post('/auth/login')
        .send({ email: '[email protected]', password: 'wrongpassword' })
        .expect((res) => expect(res.status).toBeLessThan(429));
    }

    // 6th request should be blocked
    const response = await request(app.getHttpServer())
      .post('/auth/login')
      .send({ email: '[email protected]', password: 'wrongpassword' });

    expect(response.status).toBe(429);
    expect(response.headers['retry-after']).toBeDefined();
    expect(response.body.message).toMatch(/rate limit/i); // case-insensitive: the guard throws 'Rate limit exceeded'
  });
});

Monitoring and Alerting

Rate limit events are valuable signals. Log them to your observability platform:

@Injectable()
export class RateLimitMetricsService {
  constructor(
    private readonly metricsService: MetricsService,
    private readonly redis: Redis,
    private readonly alertService: AlertService,
  ) {}

  async recordRateLimitHit(userId: string, endpoint: string, ip: string) {
    await this.metricsService.increment('rate_limit.hits', {
      endpoint,
      user_type: userId ? 'authenticated' : 'anonymous',
    });

    // Alert on sustained attacks (>100 hits in 1 min from same IP)
    const alertKey = `rl_alert:${ip}`;
    const recentHits = await this.redis.incr(alertKey);
    if (recentHits === 1) {
      await this.redis.expire(alertKey, 60);
    }

    if (recentHits === 100) {
      await this.alertService.send({
        severity: 'high',
        message: `Rate limit attack detected from IP ${ip}`,
        endpoint,
      });
    }
  }
}

Create dashboards tracking:

  • Rate limit hits per endpoint (p95, p99)
  • Top IPs/users hitting limits
  • Percentage of requests blocked vs served
  • Retry-after durations to detect misconfigured clients

Frequently Asked Questions

Should I rate limit by IP or by user ID?

Use both. For unauthenticated endpoints, IP is the only identifier available. For authenticated endpoints, always prefer user ID — it is more accurate and prevents one shared IP (like a corporate NAT) from blocking all employees. Implement a two-tier check: IP limit at the Nginx level and user-ID limit at the application level.
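The application-level half of that decision can be captured in a small tracker function (a sketch; the header names match the proxy guard shown earlier, and the key format is illustrative):

```typescript
interface RequestLike {
  user?: { sub?: string };
  headers: Record<string, string | undefined>;
  ip?: string;
}

// Prefer the user ID; fall back to the client IP forwarded by the proxy.
function trackerKey(req: RequestLike): string {
  if (req.user?.sub) return `user:${req.user.sub}`;
  const ip =
    req.headers['cf-connecting-ip'] ??
    req.headers['x-forwarded-for']?.split(',')[0]?.trim() ??
    req.ip ??
    'unknown';
  return `ip:${ip}`;
}

console.log(trackerKey({ user: { sub: 'u42' }, headers: {} }));
// 'user:u42'
console.log(trackerKey({ headers: { 'x-forwarded-for': '203.0.113.7, 10.0.0.1' } }));
// 'ip:203.0.113.7'
```

Prefixing the key (`user:` vs `ip:`) keeps the two tiers from colliding in the same Redis namespace.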

What is the correct HTTP status code for rate limiting?

Always 429 Too Many Requests per RFC 6585. Never use 503 Service Unavailable (implies infrastructure failure) or 403 Forbidden (implies authorization failure). Include Retry-After as a header in seconds so clients know when to retry.

How do I handle rate limiting behind Cloudflare or a load balancer?

Configure your proxy to set X-Real-IP or CF-Connecting-IP and trust only your proxy's IP range. In Nginx: set_real_ip_from 103.21.244.0/22; real_ip_header CF-Connecting-IP;. In NestJS, set app.set('trust proxy', 1) and read req.ip which NestJS resolves from the trusted proxy header.
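For the Express adapter, the trust-proxy setting lives in the bootstrap (a configuration fragment, not runnable standalone):

```typescript
// main.ts — Express adapter
import { NestFactory } from '@nestjs/core';
import { NestExpressApplication } from '@nestjs/platform-express';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create<NestExpressApplication>(AppModule);
  // Trust exactly one proxy hop (e.g. Nginx) so req.ip reflects the client
  app.set('trust proxy', 1);
  await app.listen(3000);
}
bootstrap();
```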

What Redis data structure is best for rate limiting?

For fixed/sliding window counters, use INCR + EXPIRE on a string key — O(1) per request. For sliding window log, use a sorted set (ZADD, ZREMRANGEBYSCORE, ZCARD) — O(log n). For token bucket, use a hash (HSET, HGET) — O(1). Lua scripts make all operations atomic across all three patterns.

How should I handle rate limits for webhooks from trusted providers?

Stripe, GitHub, and similar providers send webhooks from known IP ranges. Maintain an allowlist of their CIDR ranges and bypass rate limiting for those IPs on your webhook ingestion endpoint. Verify the webhook signature first — signature verification is your actual security layer there, not rate limiting.

Can I implement rate limiting at the Nginx level instead?

Yes, and you should for basic DDoS protection. Use limit_req_zone in Nginx for coarse IP-based limits (1000 req/min). Layer application-level rate limiting on top for granular per-user, per-endpoint control. The two layers complement each other: Nginx handles volume attacks cheaply without hitting your application, and NestJS handles nuanced business logic limits.


Next Steps

Building a production API without robust rate limiting is like leaving your front door unlocked. The patterns in this guide — token bucket for bursty traffic, sliding window counter for smooth enforcement, Redis-backed distributed storage, and proper 429 responses — form the backbone of a secure, scalable API.

ECOSIRE builds enterprise-grade NestJS backends with rate limiting, Redis caching, and full observability baked in from day one. If you are launching a new API or hardening an existing one, explore our backend engineering services to see how we can accelerate your delivery.


Written by

ECOSIRE Research and Development Team

Building enterprise-grade digital products at ECOSIRE. Sharing insights on Odoo integrations, e-commerce automation, and AI-powered business solutions.
