Part of our Performance & Scalability series
API Performance: Rate Limiting, Pagination & Async Processing
Your API is only as fast as its slowest endpoint under peak load. A single unoptimized endpoint that holds database connections for 5 seconds can exhaust your connection pool and trigger cascading failures across your entire platform. API performance engineering focuses on three pillars: protecting your API from overload (rate limiting), handling large datasets efficiently (pagination), and moving expensive operations out of the request cycle (async processing).
Key Takeaways
- Token bucket and sliding window are the two rate limiting algorithms that cover 95% of use cases -- choose based on whether you want burst tolerance or strict enforcement
- Cursor-based pagination outperforms offset pagination for large datasets because it avoids scanning skipped rows
- Async processing with job queues reduces P95 response times by moving email sending, PDF generation, and webhook delivery out of the request path
- Response compression with Brotli reduces payload sizes by 70-85%, translating directly to faster client-side rendering
Rate Limiting Algorithms
Rate limiting protects your API from abuse, ensures fair resource allocation, and prevents cascading failures during traffic spikes. The algorithm you choose determines how bursts are handled and how predictable the limiting behavior is for consumers.
| Algorithm | Burst Handling | Memory Usage | Precision | Best For |
|---|---|---|---|---|
| Fixed window | Allows 2x burst at window boundary | Very low | Low | Simple use cases, internal APIs |
| Sliding window log | No bursts, precise | High (stores timestamps) | Very high | Financial APIs, strict compliance |
| Sliding window counter | Minimal boundary burst | Low | High | General-purpose public APIs |
| Token bucket | Allows controlled bursts | Low | Moderate | APIs with natural burst patterns |
| Leaky bucket | Smooths all traffic | Low | High | APIs requiring steady throughput |
Token Bucket
The token bucket algorithm is the most practical choice for most APIs. A bucket holds tokens up to a maximum capacity. Tokens are added at a fixed rate (the refill rate). Each request consumes one token. If the bucket is empty, the request is rejected or queued.
The key advantage of token bucket is burst tolerance. If a client has not made requests for a while, their bucket is full and they can make a burst of requests up to the bucket capacity. This matches natural usage patterns -- a client that loads a dashboard might make 20 requests in rapid succession, then nothing for 30 seconds.
Configuration example for an eCommerce API:
- Bucket size: 100 tokens
- Refill rate: 10 tokens per second
- This allows bursts of up to 100 requests while sustaining 10 requests per second long-term
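The refill-and-consume logic above can be sketched as a small in-memory class. This is an illustrative single-process sketch (class and method names are our own); a production limiter would keep per-client state in a shared store like Redis.

```typescript
// Hypothetical in-memory token bucket. Time is passed in explicitly so the
// behavior is deterministic and easy to test.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,   // bucket size, e.g. 100
    private readonly refillRate: number, // tokens per second, e.g. 10
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // start full: idle clients get burst headroom
    this.lastRefill = now;
  }

  /** Returns true if the request is allowed, false if rate limited. */
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    // Refill at the fixed rate, capped at bucket capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With the example configuration, a client with a full bucket can fire 100 requests at once; the 101st is rejected, and capacity recovers at 10 requests per second.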
Sliding Window Counter
The sliding window counter combines the precision of sliding window log with the memory efficiency of fixed window. It maintains counters for the current and previous window, then calculates a weighted count based on how far into the current window the request falls.
For a 60-second window evaluated 45 seconds in, the effective count is: (previous window count * 0.25) + (current window count). This eliminates the boundary burst problem of fixed windows without storing individual request timestamps.
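The weighted count above reduces to a one-line formula. A minimal sketch (function and parameter names are illustrative; in a distributed setup the two counts would come from Redis):

```typescript
// Sliding window counter: weight the previous window by how much of it still
// overlaps the sliding window, then add the current window's count.
function effectiveCount(
  prevWindowCount: number,
  currWindowCount: number,
  elapsedInCurrWindow: number, // seconds since the current window started
  windowLength: number,        // e.g. 60
): number {
  const prevWeight = (windowLength - elapsedInCurrWindow) / windowLength;
  return prevWindowCount * prevWeight + currWindowCount;
}
```

Forty-five seconds into a 60-second window, the previous window's weight is (60 - 45) / 60 = 0.25, matching the example above.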
Implementation with Redis
Redis is the standard backing store for distributed rate limiting because it provides atomic increment operations with TTL. Use INCR with EXPIRE for fixed windows, or sorted sets with ZADD and ZRANGEBYSCORE for sliding windows. For token bucket, Redis Lua scripts provide atomic check-and-decrement operations.
Rate limiting headers communicate limits to API consumers:
- X-RateLimit-Limit -- maximum requests allowed in the window
- X-RateLimit-Remaining -- requests remaining in the current window
- X-RateLimit-Reset -- Unix timestamp when the window resets
- Retry-After -- seconds until the client should retry (on 429 responses)
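Deriving these headers from limiter state is straightforward. A hedged sketch (the helper and its parameters are our own; only the header names come from the convention above):

```typescript
// Build standard rate limit headers from the limiter's current state.
function rateLimitHeaders(
  limit: number,
  used: number,
  windowResetUnix: number, // Unix timestamp when the window resets
  nowUnix: number,
): Record<string, string> {
  const remaining = Math.max(0, limit - used);
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(limit),
    "X-RateLimit-Remaining": String(remaining),
    "X-RateLimit-Reset": String(windowResetUnix),
  };
  if (remaining === 0) {
    // Only meaningful alongside a 429 response.
    headers["Retry-After"] = String(Math.max(0, windowResetUnix - nowUnix));
  }
  return headers;
}
```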
Pagination Strategies
Every list endpoint must be paginated. Returning unbounded result sets wastes bandwidth, strains the database, and risks timeout errors as data grows.
Offset Pagination
Offset pagination uses the SQL LIMIT and OFFSET clauses. The client requests ?page=3&limit=20, and the server translates this to LIMIT 20 OFFSET 40.
Advantages:
- Simple to implement and understand
- Clients can jump to any page directly
- Total count enables "Page X of Y" UI
Disadvantages:
- Performance degrades with high offsets -- OFFSET 1000000 still scans 1,000,000 rows before returning results
- Inconsistent results when data changes between pages (rows shift as new data is inserted or deleted)
- Total count query (COUNT(*)) can be expensive on large tables
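The page-to-offset translation is a one-liner worth getting right (off-by-one errors here are common). A sketch; the table and columns in the query string are illustrative:

```typescript
// Translate 1-indexed ?page=3&limit=20 into LIMIT/OFFSET values.
function toOffset(page: number, limit: number): { limit: number; offset: number } {
  if (page < 1 || limit < 1) throw new Error("page and limit must be >= 1");
  return { limit, offset: (page - 1) * limit };
}

// Parameterized query sketch (never interpolate user input into SQL).
function offsetQuery(page: number, limit: number): { sql: string; params: number[] } {
  const { offset } = toOffset(page, limit);
  return {
    sql: "SELECT id, name FROM products ORDER BY id LIMIT $1 OFFSET $2",
    params: [limit, offset],
  };
}
```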
Cursor-Based Pagination
Cursor-based pagination uses an opaque cursor (typically an encoded primary key or timestamp) to mark the position in the result set. The client requests ?cursor=abc123&limit=20, and the server uses the cursor as a WHERE clause: WHERE id > decoded(abc123) LIMIT 20.
Advantages:
- Consistent performance regardless of position in the dataset -- no offset scanning
- Stable results even when data changes between pages
- Natural fit for infinite scroll and real-time feeds
Disadvantages:
- Cannot jump to arbitrary pages (no "Go to page 50")
- More complex to implement, especially with multi-column sort orders
- Total count must be provided separately if needed
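The opaque-cursor mechanics can be sketched end to end. This is an in-memory stand-in for the WHERE id > decoded(cursor) query described above, assuming a base64-encoded JSON cursor (the same shape as the example cursor shown later); all function names are our own:

```typescript
// Encode/decode an opaque cursor carrying the last-seen id.
function encodeCursor(lastId: number): string {
  return Buffer.from(JSON.stringify({ id: lastId })).toString("base64");
}

function decodeCursor(cursor: string): number {
  return JSON.parse(Buffer.from(cursor, "base64").toString("utf8")).id;
}

interface Row { id: number; name: string; }

// In-memory stand-in for: WHERE id > decoded(cursor) ORDER BY id LIMIT n
function pageAfter(
  rows: Row[],
  cursor: string | null,
  limit: number,
): { data: Row[]; nextCursor: string | null } {
  const afterId = cursor === null ? -Infinity : decodeCursor(cursor);
  const data = rows
    .filter((r) => r.id > afterId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
  // A full page may have more results behind it; hand back a cursor to continue.
  const nextCursor = data.length === limit ? encodeCursor(data[data.length - 1].id) : null;
  return { data, nextCursor };
}
```

Because the cursor anchors to an id rather than a row count, rows inserted or deleted ahead of the cursor never shift subsequent pages.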
Which Pagination to Use
| Scenario | Recommendation | Reason |
|---|---|---|
| Admin data tables with page numbers | Offset | Users expect page navigation |
| Mobile infinite scroll | Cursor | Performance at any depth |
| API consumed by integrations | Cursor | Stable pagination for batch processing |
| Small datasets (under 10,000 rows) | Either | Performance difference is negligible |
| Large datasets (over 100,000 rows) | Cursor | Offset becomes unusably slow |
| Real-time feeds (chat, notifications) | Cursor | Consistency as new data arrives |
Pagination Response Format
A well-designed pagination response includes metadata that clients need to navigate:
```json
{
  "data": [],
  "pagination": {
    "total": 15432,
    "limit": 20,
    "hasMore": true,
    "nextCursor": "eyJpZCI6MTAwfQ=="
  }
}
```
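In a typed codebase, this envelope is worth encoding once. A minimal sketch (interface and helper names are our own; total is optional since COUNT(*) may be too expensive to compute on every page):

```typescript
interface PaginatedResponse<T> {
  data: T[];
  pagination: {
    total?: number;          // omit when COUNT(*) is too expensive
    limit: number;
    hasMore: boolean;
    nextCursor: string | null;
  };
}

// Wrap a page of results in the envelope shown above.
function buildPage<T>(
  data: T[],
  limit: number,
  nextCursor: string | null,
  total?: number,
): PaginatedResponse<T> {
  return {
    data,
    pagination: { total, limit, hasMore: nextCursor !== null, nextCursor },
  };
}
```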
Async Processing with Job Queues
Synchronous API endpoints should return responses within 200ms. Any operation that takes longer -- sending emails, generating PDFs, processing images, calling external APIs, running reports -- should be moved to a background job queue.
The Job Queue Pattern
- The API endpoint validates the request and creates a job record
- The job is placed on a queue (Redis, RabbitMQ, SQS)
- The API returns immediately with a 202 Accepted response and a job ID
- A worker process picks up the job and executes it asynchronously
- The client polls for job status or receives a webhook callback on completion
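The five steps above can be sketched with an in-memory queue. This is deliberately simplified (the queue, job store, and all names are illustrative); in production the queue would be Redis, RabbitMQ, or SQS, and the worker a separate process:

```typescript
type JobStatus = "queued" | "running" | "completed" | "failed";
interface Job { id: string; payload: unknown; status: JobStatus; }

const jobs = new Map<string, Job>(); // job records (step 1)
const queue: string[] = [];          // the queue itself (step 2)
let nextId = 0;

// Steps 1-3: validate, enqueue, and return 202 Accepted with a job id.
function submitJob(payload: unknown): { statusCode: 202; jobId: string } {
  const id = String(++nextId);
  jobs.set(id, { id, payload, status: "queued" });
  queue.push(id);
  return { statusCode: 202, jobId: id };
}

// Step 4: a worker picks up one job and executes it asynchronously.
async function runWorkerOnce(handler: (payload: unknown) => Promise<void>): Promise<void> {
  const id = queue.shift();
  if (!id) return;
  const job = jobs.get(id)!;
  job.status = "running";
  try {
    await handler(job.payload);
    job.status = "completed";
  } catch {
    job.status = "failed"; // a real queue would schedule a retry here
  }
}

// Step 5: the client polls for status (a webhook callback is the alternative).
function getJobStatus(jobId: string): JobStatus | undefined {
  return jobs.get(jobId)?.status;
}
```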
Common Async Use Cases
Email sending -- SMTP operations take 500ms-3s depending on the provider. Queueing emails reduces API response time and allows retry logic for transient failures without blocking the user.
PDF generation -- Generating invoices, reports, or export files is CPU-intensive and memory-heavy. Running these in dedicated workers prevents resource contention with API request handling.
Webhook delivery -- Outgoing webhooks depend on third-party server availability. Queue webhook deliveries with exponential backoff retry (1s, 2s, 4s, 8s, up to 5 minutes) to handle temporary failures without blocking your system.
Data import and export -- Processing CSV uploads with 100,000 rows should never happen in a request cycle. Accept the upload, return a job ID, and process rows in batches.
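The exponential backoff schedule mentioned for webhook delivery reduces to a small formula. A sketch (function name and cap parameter are our own):

```typescript
// Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 5 minutes.
// attempt is 0-indexed: attempt 0 is the first retry.
function backoffDelaySeconds(attempt: number, capSeconds: number = 300): number {
  return Math.min(2 ** attempt, capSeconds);
}
```

Adding random jitter to each delay is a common refinement that prevents many failed deliveries from retrying in lockstep.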
Queue Selection
| Queue Technology | Best For | Considerations |
|---|---|---|
| BullMQ (Redis-backed) | Node.js applications, NestJS integration | Great developer experience, built-in dashboard |
| RabbitMQ | Multi-language systems, complex routing | Mature, supports message acknowledgment patterns |
| AWS SQS | Serverless, managed infrastructure | No server management, pay-per-message |
| Kafka | Event streaming, high throughput | Overkill for simple job queues, excellent for event sourcing |
Response Optimization
Beyond application logic, the response itself can be optimized for size and delivery speed.
Compression
Enable response compression to reduce payload sizes over the network. Modern compression algorithms significantly reduce text-based payloads (JSON, HTML, CSS, JavaScript).
| Algorithm | Compression Ratio | CPU Cost | Browser Support |
|---|---|---|---|
| gzip | 60-75% reduction | Low | Universal |
| Brotli | 70-85% reduction | Moderate | All modern browsers |
| zstd | 70-85% reduction | Low | Emerging (not yet universal) |
Use Brotli for static assets (pre-compressed at build time) and gzip as a fallback for dynamic responses. In NestJS, the Express compression middleware can gzip dynamic responses automatically, but in production, let Nginx handle compression to offload CPU from your application server.
Field Selection
Allow API consumers to request only the fields they need. GraphQL does this inherently, but REST APIs can support field selection with a ?fields=id,name,price query parameter. This reduces payload size and can optimize database queries by selecting only needed columns.
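Parsing the ?fields= parameter and projecting the response is a few lines. A sketch (the helper name is our own; unknown fields are silently ignored here, though returning a 400 is an equally valid design):

```typescript
// Project an object down to the fields named in ?fields=id,name,price.
// An absent parameter returns the full object.
function selectFields<T extends Record<string, unknown>>(
  obj: T,
  fieldsParam?: string,
): Partial<T> {
  if (!fieldsParam) return obj;
  const wanted = new Set(fieldsParam.split(",").map((f) => f.trim()));
  const out: Partial<T> = {};
  for (const key of Object.keys(obj)) {
    if (wanted.has(key)) (out as Record<string, unknown>)[key] = obj[key];
  }
  return out;
}
```

The same field list can be forwarded to the database layer as a column list, so narrow responses also mean narrow queries.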
Response Caching Headers
Set appropriate Cache-Control headers on API responses. Public list endpoints (products, categories) can use Cache-Control: public, max-age=300 to cache for 5 minutes. Authenticated endpoints should use Cache-Control: private, no-cache to prevent CDN caching while allowing browser caching with revalidation.
For more on caching strategies, see our detailed guide on Redis, CDN, and HTTP caching.
Connection Management
Database and HTTP connections are finite resources that must be managed carefully under load.
Database Connection Pooling
A connection pool maintains a set of reusable database connections. Without pooling, each API request opens a new database connection (50-100ms overhead) and closes it after the response. With pooling, requests borrow connections from the pool and return them when done.
Pool sizing formula: connections = (core_count * 2) + effective_spindle_count. For a 4-core server with SSD storage the formula gives roughly 9-10, so 10-20 connections per application instance is a good starting point. Monitor pool utilization -- if it regularly exceeds 80%, either increase the pool size or optimize query duration.
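The formula is trivial to encode as a starting-point calculator (the function is our own; treat effective_spindle_count as roughly 1 for SSD storage):

```typescript
// Starting-point pool size: (core_count * 2) + effective_spindle_count.
// Tune from here based on observed pool utilization, not the formula alone.
function startingPoolSize(coreCount: number, effectiveSpindleCount: number): number {
  return coreCount * 2 + effectiveSpindleCount;
}
```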
HTTP Keep-Alive
Enable HTTP keep-alive for connections to upstream services (databases, Redis, external APIs). This reuses TCP connections instead of establishing new ones per request, eliminating the TCP handshake and TLS negotiation overhead (50-200ms per new connection).
Frequently Asked Questions
What rate limits should I set for a public API?
Start with conservative limits and adjust based on legitimate usage patterns. A common starting point is 100 requests per minute for authenticated users and 20 requests per minute for anonymous users. Monitor 429 response rates -- if legitimate users frequently hit limits, increase them. Provide higher limits for premium API tiers.
How do I handle pagination when data changes between pages?
Cursor-based pagination handles this naturally because it anchors to a specific position in the sorted data. With offset pagination, document that results may shift between pages. For critical use cases (financial reports, data exports), snapshot the data at the beginning of pagination and paginate over the snapshot.
Should I use REST or GraphQL for performance?
REST with field selection and proper caching is faster for simple, well-defined endpoints. GraphQL eliminates over-fetching and under-fetching for complex data requirements but adds query parsing overhead and makes HTTP caching harder. Use REST for public APIs with caching needs and GraphQL for internal APIs serving complex frontend data requirements.
How do I monitor API performance in production?
Track P50, P95, and P99 response times per endpoint. Set alerts on P95 breaching your SLO (typically 200-500ms). Use distributed tracing to break down time spent in database, cache, external services, and application logic. See our guide on monitoring and observability for detailed setup.
What's Next
Start by auditing your API endpoints for missing pagination, public endpoints without rate limiting, and synchronous operations that should be async. These three changes typically reduce P95 response times by 50-70% and prevent the most common production incidents.
For the complete performance engineering perspective, see our pillar guide on scaling your business platform. For the database layer that powers your API, read our query optimization guide.
ECOSIRE builds high-performance APIs for business platforms on Odoo ERP and custom architectures. Contact us for an API performance review.
Published by ECOSIRE — helping businesses scale with AI-powered solutions across Odoo ERP, Shopify eCommerce, and OpenClaw AI.
Written by
ECOSIRE Team, Technical Writing
The ECOSIRE technical writing team covers Odoo ERP, Shopify eCommerce, AI agents, Power BI analytics, GoHighLevel automation, and enterprise software best practices. Our guides help businesses make informed technology decisions.