Part of our Performance & Scalability series
API Performance: Rate Limiting, Pagination & Async Processing
Your API is only as fast as its slowest endpoint under peak load. A single unoptimized endpoint that holds database connections for 5 seconds can exhaust your connection pool and trigger cascading failures across your entire platform. API performance engineering focuses on three pillars: protecting your API from overload (rate limiting), handling large datasets efficiently (pagination), and moving expensive operations out of the request cycle (async processing).
Key Takeaways
- Token bucket and sliding window are the two rate limiting algorithms that cover 95% of use cases -- choose based on whether you want burst tolerance or strict enforcement
- Cursor-based pagination outperforms offset pagination for large datasets because it avoids scanning skipped rows
- Async processing with job queues reduces P95 response times by moving email sending, PDF generation, and webhook delivery out of the request path
- Response compression with Brotli reduces payload sizes by 70-85%, translating directly to faster client-side rendering
Rate Limiting Algorithms
Rate limiting protects your API from abuse, ensures fair resource allocation, and prevents cascading failures during traffic spikes. The algorithm you choose determines how bursts are handled and how predictable the limiting behavior is for consumers.
| Algorithm | Burst Handling | Memory Usage | Precision | Best For |
|---|---|---|---|---|
| Fixed window | Allows 2x burst at window boundary | Very low | Low | Simple use cases, internal APIs |
| Sliding window log | No bursts, precise | High (stores timestamps) | Very high | Financial APIs, strict compliance |
| Sliding window counter | Minimal boundary burst | Low | High | General-purpose public APIs |
| Token bucket | Allows controlled bursts | Low | Moderate | APIs with natural burst patterns |
| Leaky bucket | Smooths all traffic | Low | High | APIs requiring steady throughput |
Token Bucket
The token bucket algorithm is the most practical choice for most APIs. A bucket holds tokens up to a maximum capacity. Tokens are added at a fixed rate (the refill rate). Each request consumes one token. If the bucket is empty, the request is rejected or queued.
The key advantage of token bucket is burst tolerance. If a client has not made requests for a while, their bucket is full and they can make a burst of requests up to the bucket capacity. This matches natural usage patterns -- a client that loads a dashboard might make 20 requests in rapid succession, then nothing for 30 seconds.
Configuration example for an eCommerce API:
- Bucket size: 100 tokens
- Refill rate: 10 tokens per second
- This allows bursts of up to 100 requests while sustaining 10 requests per second long-term
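The refill-and-consume logic above can be sketched as a small in-memory class. This is an illustrative single-process sketch (class and method names are our own); a production limiter would keep per-client state in a shared store like Redis.

```typescript
// Hypothetical in-memory token bucket. Time is passed in explicitly so the
// behavior is deterministic and easy to test.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,   // bucket size, e.g. 100
    private readonly refillRate: number, // tokens per second, e.g. 10
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // start full: idle clients get burst headroom
    this.lastRefill = now;
  }

  /** Returns true if the request is allowed, false if rate limited. */
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    // Refill at the fixed rate, capped at bucket capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With the example configuration, a client with a full bucket can fire 100 requests at once; the 101st is rejected, and capacity recovers at 10 requests per second.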
Sliding Window Counter
The sliding window counter combines the precision of sliding window log with the memory efficiency of fixed window. It maintains counters for the current and previous window, then calculates a weighted count based on how far into the current window the request falls.
For a 60-second window evaluated 45 seconds in, the effective count is: (previous window count * 0.25) + (current window count). This eliminates the boundary burst problem of fixed windows without storing individual request timestamps.
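The weighted count above reduces to a one-line formula. A minimal sketch (function and parameter names are illustrative; in a distributed setup the two counts would come from Redis):

```typescript
// Sliding window counter: weight the previous window by how much of it still
// overlaps the sliding window, then add the current window's count.
function effectiveCount(
  prevWindowCount: number,
  currWindowCount: number,
  elapsedInCurrWindow: number, // seconds since the current window started
  windowLength: number,        // e.g. 60
): number {
  const prevWeight = (windowLength - elapsedInCurrWindow) / windowLength;
  return prevWindowCount * prevWeight + currWindowCount;
}
```

Forty-five seconds into a 60-second window, the previous window's weight is (60 - 45) / 60 = 0.25, matching the example above.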
Implementation with Redis
Redis is the standard backing store for distributed rate limiting because it provides atomic increment operations with TTL. Use INCR with EXPIRE for fixed windows, or sorted sets with ZADD and ZRANGEBYSCORE for sliding windows. For token bucket, Redis Lua scripts provide atomic check-and-decrement operations.
Rate limiting headers communicate limits to API consumers:
- X-RateLimit-Limit -- maximum requests allowed in the window
- X-RateLimit-Remaining -- requests remaining in the current window
- X-RateLimit-Reset -- Unix timestamp when the window resets
- Retry-After -- seconds until the client should retry (on 429 responses)
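Deriving these headers from limiter state is straightforward. A hedged sketch (the helper and its parameters are our own; only the header names come from the convention above):

```typescript
// Build standard rate limit headers from the limiter's current state.
function rateLimitHeaders(
  limit: number,
  used: number,
  windowResetUnix: number, // Unix timestamp when the window resets
  nowUnix: number,
): Record<string, string> {
  const remaining = Math.max(0, limit - used);
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(remaining),
    "X-RateLimit-Reset": String(windowResetUnix),
  };
  if (remaining === 0) {
    // Only meaningful alongside a 429 response.
    headers["Retry-After"] = String(Math.max(0, windowResetUnix - nowUnix));
  }
  return headers;
}
```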
Pagination Strategies
Every list endpoint must be paginated. Returning unbounded result sets wastes bandwidth, strains the database, and risks timeout errors as data grows.
Offset Pagination
Offset pagination uses the SQL LIMIT and OFFSET clauses. The client requests ?page=3&limit=20, and the server translates this to LIMIT 20 OFFSET 40.
Advantages:
- Simple to implement and understand
- Clients can jump to any page directly
- Total count enables "Page X of Y" UI
Disadvantages:
- Performance degrades with high offsets -- OFFSET 1000000 still scans 1,000,000 rows before returning results
- Inconsistent results when data changes between pages (rows shift as new data is inserted or deleted)
- Total count query (COUNT(*)) can be expensive on large tables
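The page-to-offset translation is a one-liner worth getting right (off-by-one errors here are common). A sketch; the table and columns in the query string are illustrative:

```typescript
// Translate 1-indexed ?page=3&limit=20 into LIMIT/OFFSET values.
function toOffset(page: number, limit: number): { limit: number; offset: number } {
  if (page < 1 || limit < 1) throw new Error("page and limit must be >= 1");
  return { limit, offset: (page - 1) * limit };
}

// Parameterized query sketch (never interpolate user input into SQL).
function offsetQuery(page: number, limit: number): { sql: string; params: number[] } {
  const { offset } = toOffset(page, limit);
  return {
    sql: "SELECT id, name FROM products ORDER BY id LIMIT $1 OFFSET $2",
    params: [limit, offset],
  };
}
```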
Cursor-Based Pagination
Cursor-based pagination uses an opaque cursor (typically an encoded primary key or timestamp) to mark the position in the result set. The client requests ?cursor=abc123&limit=20, and the server uses the cursor as a WHERE clause: WHERE id > decoded(abc123) LIMIT 20.
Advantages:
- Consistent performance regardless of position in the dataset -- no offset scanning
- Stable results even when data changes between pages
- Natural fit for infinite scroll and real-time feeds
Disadvantages:
- Cannot jump to arbitrary pages (no "Go to page 50")
- More complex to implement, especially with multi-column sort orders
- Total count must be provided separately if needed
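The opaque-cursor mechanics can be sketched end to end. This is an in-memory stand-in for the WHERE id > decoded(cursor) query described above, assuming a base64-encoded JSON cursor (the same shape as the example cursor shown later); all function names are our own:

```typescript
// Encode/decode an opaque cursor carrying the last-seen id.
function encodeCursor(lastId: number): string {
  return Buffer.from(JSON.stringify({ id: lastId })).toString("base64");
}

function decodeCursor(cursor: string): number {
  return JSON.parse(Buffer.from(cursor, "base64").toString("utf8")).id;
}

interface Row { id: number; name: string; }

// In-memory stand-in for: WHERE id > decoded(cursor) ORDER BY id LIMIT n
function pageAfter(
  rows: Row[],
  cursor: string | null,
  limit: number,
): { data: Row[]; nextCursor: string | null } {
  const afterId = cursor === null ? -Infinity : decodeCursor(cursor);
  const data = rows
    .filter((r) => r.id > afterId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
  // A full page may have more results behind it; hand back a cursor to continue.
  const nextCursor = data.length === limit ? encodeCursor(data[data.length - 1].id) : null;
  return { data, nextCursor };
}
```

Because the cursor anchors to an id rather than a row count, rows inserted or deleted ahead of the cursor never shift subsequent pages.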
Which Pagination to Use
| Scenario | Recommendation | Reason |
|---|---|---|
| Admin data tables with page numbers | Offset | Users expect page navigation |
| Mobile infinite scroll | Cursor | Performance at any depth |
| API consumed by integrations | Cursor | Stable pagination for batch processing |
| Small datasets (under 10,000 rows) | Either | Performance difference is negligible |
| Large datasets (over 100,000 rows) | Cursor | Offset becomes unusably slow |
| Real-time feeds (chat, notifications) | Cursor | Consistency as new data arrives |
Pagination Response Format
A well-designed pagination response includes metadata that clients need to navigate:
```json
{
  "data": [],
  "pagination": {
    "total": 15432,
    "limit": 20,
    "hasMore": true,
    "nextCursor": "eyJpZCI6MTAwfQ=="
  }
}
```
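In a typed codebase, this envelope is worth encoding once. A minimal sketch (interface and helper names are our own; total is optional since COUNT(*) may be too expensive to compute on every page):

```typescript
interface PaginatedResponse<T> {
  data: T[];
  pagination: {
    total?: number;          // omit when COUNT(*) is too expensive
    limit: number;
    hasMore: boolean;
    nextCursor: string | null;
  };
}

// Wrap a page of results in the envelope shown above.
function buildPage<T>(
  data: T[],
  limit: number,
  nextCursor: string | null,
  total?: number,
): PaginatedResponse<T> {
  return {
    data,
    pagination: { total, limit, hasMore: nextCursor !== null, nextCursor },
  };
}
```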
Async Processing with Job Queues
Synchronous API endpoints should return responses within 200ms. Any operation that takes longer -- sending emails, generating PDFs, processing images, calling external APIs, running reports -- should be moved to a background job queue.
The Job Queue Pattern
- The API endpoint validates the request and creates a job record
- The job is placed on a queue (Redis, RabbitMQ, SQS)
- The API returns immediately with a 202 Accepted response and a job ID
- A worker process picks up the job and executes it asynchronously
- The client polls for job status or receives a webhook callback on completion
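The five steps above can be sketched with an in-memory queue. This is deliberately simplified (the queue, job store, and all names are illustrative); in production the queue would be Redis, RabbitMQ, or SQS, and the worker a separate process:

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";
interface Job { id: string; payload: unknown; status: JobStatus; }

const jobs = new Map<string, Job>(); // job records (step 1)
const queue: string[] = [];          // the queue itself (step 2)
let nextId = 0;

// Steps 1-3: validate, enqueue, and return 202 Accepted with a job id.
function submitJob(payload: unknown): { statusCode: 202; jobId: string } {
  const id = String(++nextId);
  jobs.set(id, { id, payload, status: "queued" });
  queue.push(id);
  return { statusCode: 202, jobId: id };
}

// Step 4: a worker picks up one job and executes it asynchronously.
async function runWorkerOnce(handler: (payload: unknown) => Promise<void>): Promise<void> {
  const id = queue.shift();
  if (!id) return;
  const job = jobs.get(id)!;
  job.status = "running";
  try {
    await handler(job.payload);
    job.status = "completed";
  } catch {
    job.status = "failed"; // a real queue would schedule a retry here
  }
}

// Step 5: the client polls for status (a webhook callback is the alternative).
function getJobStatus(jobId: string): JobStatus | undefined {
  return jobs.get(jobId)?.status;
}
```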
Common Async Use Cases
Email sending -- SMTP operations take 500ms-3s depending on the provider. Queueing emails reduces API response time and allows retry logic for transient failures without blocking the user.
PDF generation -- Generating invoices, reports, or export files is CPU-intensive and memory-heavy. Running these in dedicated workers prevents resource contention with API request handling.
Webhook delivery -- Outgoing webhooks depend on third-party server availability. Queue webhook deliveries with exponential backoff retry (1s, 2s, 4s, 8s, up to 5 minutes) to handle temporary failures without blocking your system.
Data import and export -- Processing CSV uploads with 100,000 rows should never happen in a request cycle. Accept the upload, return a job ID, and process rows in batches.
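The exponential backoff schedule mentioned for webhook delivery reduces to a small formula. A sketch (function name and cap parameter are our own):

```typescript
// Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 5 minutes.
// attempt is 0-indexed: attempt 0 is the first retry.
function backoffDelaySeconds(attempt: number, capSeconds: number = 300): number {
  return Math.min(2 ** attempt, capSeconds);
}
```

Adding random jitter to each delay is a common refinement that prevents many failed deliveries from retrying in lockstep.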
Queue Selection
| Queue Technology | Best For | Considerations |
|---|---|---|
| BullMQ (Redis-backed) | Node.js applications, NestJS integration | Great developer experience, built-in dashboard |
| RabbitMQ | Multi-language systems, complex routing | Mature, supports message acknowledgment patterns |
| AWS SQS | Serverless, managed infrastructure | No server management, pay-per-message |
| Kafka | Event streaming, high throughput | Overkill for simple job queues, excellent for event sourcing |
Response Optimization
Beyond application logic, the response itself can be optimized for size and delivery speed.
Compression
Enable response compression to reduce payload sizes over the network. Modern compression algorithms significantly reduce text-based payloads (JSON, HTML, CSS, JavaScript).
| Algorithm | Compression Ratio | CPU Cost | Browser Support |
|---|---|---|---|
| gzip | 60-75% reduction | Low | Universal |
| Brotli | 70-85% reduction | Moderate | All modern browsers |
| zstd | 70-85% reduction | Low | Emerging (not yet universal) |
Use Brotli for static assets (pre-compressed at build time) and gzip as a fallback for dynamic responses. In NestJS, the Express compression middleware can gzip dynamic responses automatically, but in production, let Nginx handle compression to offload CPU from your application server.
Field Selection
Allow API consumers to request only the fields they need. GraphQL does this inherently, but REST APIs can support field selection with a ?fields=id,name,price query parameter. This reduces payload size and can optimize database queries by selecting only needed columns.
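Parsing the ?fields= parameter and projecting the response is a few lines. A sketch (the helper name is our own; unknown fields are silently ignored here, though returning a 400 is an equally valid design):

```typescript
// Project an object down to the fields named in ?fields=id,name,price.
// An absent parameter returns the full object.
function selectFields<T extends Record<string, unknown>>(
  obj: T,
  fieldsParam?: string,
): Partial<T> {
  if (!fieldsParam) return obj;
  const wanted = new Set(fieldsParam.split(",").map((f) => f.trim()));
  const out: Partial<T> = {};
  for (const key of Object.keys(obj)) {
    if (wanted.has(key)) (out as Record<string, unknown>)[key] = obj[key];
  }
  return out;
}
```

The same field list can be forwarded to the database layer as a column list, so narrow responses also mean narrow queries.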
Response Caching Headers
Set appropriate Cache-Control headers on API responses. Public list endpoints (products, categories) can use Cache-Control: public, max-age=300 to cache for 5 minutes. Authenticated endpoints should use Cache-Control: private, no-cache to prevent CDN caching while allowing browser caching with revalidation.
For more on caching strategies, see our detailed guide on Redis, CDN, and HTTP caching.
Connection Management
Database and HTTP connections are finite resources that must be managed carefully under load.
Database Connection Pooling
A connection pool maintains a set of reusable database connections. Without pooling, each API request opens a new database connection (50-100ms overhead) and closes it after the response. With pooling, requests borrow connections from the pool and return them when done.
Pool sizing formula: connections = (core_count * 2) + effective_spindle_count. For a 4-core server with SSD storage the formula gives roughly 9-10, so 10-20 connections per application instance is a good starting point. Monitor pool utilization -- if it regularly exceeds 80%, either increase the pool size or optimize query duration.
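The formula is trivial to encode as a starting-point calculator (the function is our own; treat effective_spindle_count as roughly 1 for SSD storage):

```typescript
// Starting-point pool size: (core_count * 2) + effective_spindle_count.
// Tune from here based on observed pool utilization, not the formula alone.
function startingPoolSize(coreCount: number, effectiveSpindleCount: number): number {
  return coreCount * 2 + effectiveSpindleCount;
}
```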
HTTP Keep-Alive
Enable HTTP keep-alive for connections to upstream services (databases, Redis, external APIs). This reuses TCP connections instead of establishing new ones per request, eliminating the TCP handshake and TLS negotiation overhead (50-200ms per new connection).
Frequently Asked Questions
What rate limits should I set for a public API?
Start with conservative limits and adjust based on legitimate usage patterns. A common starting point is 100 requests per minute for authenticated users and 20 requests per minute for anonymous users. Monitor 429 response rates -- if legitimate users frequently hit limits, increase them. Provide higher limits for premium API tiers.
How do I handle pagination when data changes between pages?
Cursor-based pagination handles this naturally because it anchors to a specific position in the sorted data. With offset pagination, document that results may shift between pages. For critical use cases (financial reports, data exports), snapshot the data at the beginning of pagination and paginate over the snapshot.
Should I use REST or GraphQL for performance?
REST with field selection and proper caching is faster for simple, well-defined endpoints. GraphQL eliminates over-fetching and under-fetching for complex data requirements but adds query parsing overhead and makes HTTP caching harder. Use REST for public APIs with caching needs and GraphQL for internal APIs serving complex frontend data requirements.
How do I monitor API performance in production?
Track P50, P95, and P99 response times per endpoint. Set alerts on P95 breaching your SLO (typically 200-500ms). Use distributed tracing to break down time spent in database, cache, external services, and application logic. See our guide on monitoring and observability for detailed setup.
What's Next
Start by auditing your API endpoints for missing pagination, public endpoints without rate limiting, and synchronous operations that should be async. These three changes typically reduce P95 response times by 50-70% and prevent the most common production incidents.
For the complete performance engineering perspective, see our pillar guide on scaling your business platform. For the database layer that powers your API, read our query optimization guide.
ECOSIRE builds high-performance APIs for business platforms on Odoo ERP and custom architectures. Contact us for an API performance review.
Published by ECOSIRE — helping businesses scale with AI-powered solutions across Odoo ERP, Shopify eCommerce, and OpenClaw AI.
Written by
ECOSIRE Team, Technical Writing
The ECOSIRE technical writing team covers Odoo ERP, Shopify eCommerce, AI agents, Power BI analytics, GoHighLevel automation, and enterprise software best practices. Our guides help businesses make informed technology decisions.