Part of our Performance & Scalability series

API Performance: Rate Limiting, Pagination & Async Processing
Your API is only as fast as its slowest endpoint under peak load. A single unoptimized endpoint that holds database connections for 5 seconds can exhaust your connection pool and cascade failures across your entire platform. API performance engineering focuses on three pillars: protecting your API from overload (rate limiting), handling large datasets efficiently (pagination), and moving expensive operations out of the request cycle (async processing).
Key Takeaways
- Token bucket and sliding window are the two rate limiting algorithms that cover 95% of use cases -- choose based on whether you want burst tolerance or strict enforcement
- Cursor-based pagination outperforms offset pagination for large datasets because it avoids scanning the skipped rows
- Async processing with job queues reduces P95 response times by moving email sending, PDF generation, and webhook delivery out of the request path
- Response compression with Brotli reduces payload sizes by 70-85%, translating directly to faster client-side rendering
Rate Limiting Algorithms
Rate limiting protects your API from abuse, ensures fair resource allocation, and prevents cascade failures during traffic spikes. The algorithm you choose determines how bursts are handled and how predictable the limiting behavior is for consumers.
| Algorithm | Burst Handling | Memory Usage | Precision | Best For |
|---|---|---|---|---|
| Fixed window | Allows 2x burst at window boundary | Very low | Low | Simple use cases, internal APIs |
| Sliding window log | No bursts, precise | High (stores timestamps) | Very high | Financial APIs, strict compliance |
| Sliding window counter | Minimal boundary burst | Low | High | General-purpose public APIs |
| Token bucket | Allows controlled bursts | Low | Moderate | APIs with natural burst patterns |
| Leaky bucket | Smooths all traffic | Low | High | APIs requiring steady throughput |
Token Bucket
The token bucket algorithm is the most practical choice for most APIs. A bucket holds tokens up to a maximum capacity. Tokens are added at a fixed rate (the refill rate). Each request consumes one token. If the bucket is empty, the request is rejected or queued.
The key advantage of token bucket is burst tolerance. If a client has not made requests for a while, their bucket is full and they can make a burst of requests up to the bucket capacity. This matches natural usage patterns -- a client that loads a dashboard might make 20 requests in rapid succession, then nothing for 30 seconds.
Configuration example for an eCommerce API:
- Bucket size: 100 tokens
- Refill rate: 10 tokens per second
- This allows bursts of up to 100 requests while sustaining 10 requests per second long-term
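As a sketch, the configuration above maps to a small in-memory token bucket (illustrative TypeScript; a production limiter would keep this state in Redis so every API instance shares one bucket per client):

```typescript
// Minimal in-memory token bucket: capacity 100, refill 10 tokens/second.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,
    private refillPerSecond: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // a new client starts with a full bucket
    this.lastRefill = now;
  }

  // Returns true if the request may proceed, false if it should be rejected (429).
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    // Refill lazily on each call instead of running a background timer.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// A full bucket absorbs a burst of 100 requests, then rejects until refill.
const bucket = new TokenBucket(100, 10, 0);
let allowed = 0;
for (let i = 0; i < 120; i++) {
  if (bucket.tryConsume(0)) allowed++;
}
```

Note the lazy refill: tokens are topped up from the elapsed time on each call, so no timer or cron is needed.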
Sliding Window Counter
The sliding window counter combines the precision of sliding window log with the memory efficiency of fixed window. It maintains counters for the current and previous window, then calculates a weighted count based on how far into the current window the request falls.
For a 60-second window evaluated 45 seconds in, the effective count is: (previous window count * 0.25) + (current window count). This eliminates the boundary burst problem of fixed windows without storing individual request timestamps.
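The weighted-count calculation can be expressed directly (a minimal sketch; `effectiveCount` and its parameter names are illustrative):

```typescript
// Weighted request count for a sliding window counter.
// windowSeconds: window length; elapsed: seconds into the current window.
function effectiveCount(
  previousCount: number,
  currentCount: number,
  windowSeconds: number,
  elapsed: number,
): number {
  // The previous window's weight is the fraction of it still inside
  // the sliding window: (window - elapsed) / window.
  const previousWeight = (windowSeconds - elapsed) / windowSeconds;
  return previousCount * previousWeight + currentCount;
}

// 60-second window, 45 seconds in: previous window contributes 25% of its count.
const count = effectiveCount(80, 30, 60, 45); // 80 * 0.25 + 30 = 50
```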
Implementation with Redis
Redis is the standard backing store for distributed rate limiting because it provides atomic increment operations with TTL. Use INCR with EXPIRE for fixed windows, or sorted sets with ZADD and ZRANGEBYSCORE for sliding windows. For token bucket, Redis Lua scripts provide atomic check-and-decrement operations.
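As a rough illustration of the INCR-with-EXPIRE pattern, here is the same logic against an in-memory Map standing in for Redis (in production the counter must live in Redis itself so all API nodes share it, and the INCR/EXPIRE pair should be made atomic, e.g. via a Lua script or `SET ... NX EX`):

```typescript
// Fixed-window limiter mirroring the Redis INCR + EXPIRE pattern.
// A Map stands in for Redis here so the sketch is self-contained.
const counters = new Map<string, { count: number; resetAt: number }>();

function isAllowed(
  clientId: string,
  limit: number,
  windowMs: number,
  now: number,
): boolean {
  const entry = counters.get(clientId);
  if (!entry || now >= entry.resetAt) {
    // Equivalent of INCR on a fresh key followed by EXPIRE.
    counters.set(clientId, { count: 1, resetAt: now + windowMs });
    return true;
  }
  entry.count += 1; // INCR on an existing key
  return entry.count <= limit;
}

// 10 requests against a limit of 5 per 60-second window.
let ok = 0;
for (let i = 0; i < 10; i++) {
  if (isAllowed("client-a", 5, 60_000, 0)) ok++;
}
```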
Rate limiting headers communicate limits to API consumers:
- X-RateLimit-Limit -- maximum requests allowed in the window
- X-RateLimit-Remaining -- requests remaining in the current window
- X-RateLimit-Reset -- Unix timestamp when the window resets
- Retry-After -- seconds until the client should retry (on 429 responses)
Pagination Strategies
Every list endpoint must be paginated. Returning unbounded result sets wastes bandwidth, strains the database, and risks timeout errors as data grows.
Offset Pagination
Offset pagination uses LIMIT and OFFSET SQL clauses. The client requests ?page=3&limit=20, and the server translates to OFFSET 40 LIMIT 20.
Advantages:
- Simple to implement and understand
- Clients can jump to any page directly
- Total count enables "Page X of Y" UI
Disadvantages:
- Performance degrades with high offsets -- OFFSET 1000000 still scans 1,000,000 rows before returning results
- Inconsistent results when data changes between pages (rows shift as new data is inserted or deleted)
- Total count query (COUNT(*)) can be expensive on large tables
Cursor-Based Pagination
Cursor-based pagination uses an opaque cursor (typically an encoded primary key or timestamp) to mark the position in the result set. The client requests ?cursor=abc123&limit=20, and the server uses the cursor as a WHERE clause: WHERE id > decoded(abc123) LIMIT 20.
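A minimal sketch of cursor encoding in TypeScript (the `{ id }` payload and helper names are illustrative; real cursors often encode the full sort key):

```typescript
// Opaque cursors: base64-encode the last row's sort key so clients
// cannot depend on its internal structure.
function encodeCursor(lastId: number): string {
  return Buffer.from(JSON.stringify({ id: lastId })).toString("base64");
}

function decodeCursor(cursor: string): number {
  const parsed = JSON.parse(Buffer.from(cursor, "base64").toString("utf8"));
  return parsed.id;
}

// The server then runs: WHERE id > :id ORDER BY id LIMIT 20
const cursor = encodeCursor(100);
const id = decodeCursor(cursor);
```

Because the cursor anchors to a concrete row value rather than a row count, the query cost is the same on page 1 and page 10,000.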
Advantages:
- Consistent performance regardless of position in the dataset -- no offset scanning
- Stable results even when data changes between pages
- Natural fit for infinite scroll and real-time feeds
Disadvantages:
- Cannot jump to arbitrary pages (no "Go to page 50")
- More complex to implement, especially with multi-column sort orders
- Total count must be provided separately if needed
Which Pagination to Use
| Scenario | Recommendation | Reason |
|---|---|---|
| Admin data tables with page numbers | Offset | Users expect page navigation |
| Mobile infinite scroll | Cursor | Performance at any depth |
| API consumed by integrations | Cursor | Stable pagination for batch processing |
| Small datasets (under 10,000 rows) | Either | Performance difference is negligible |
| Large datasets (over 100,000 rows) | Cursor | Offset becomes unusably slow |
| Real-time feeds (chat, notifications) | Cursor | Consistency as new data arrives |
Pagination Response Format
A well-designed pagination response includes metadata that clients need to navigate:
{
"data": [],
"pagination": {
"total": 15432,
"limit": 20,
"hasMore": true,
"nextCursor": "eyJpZCI6MTAwfQ=="
}
}
Async Processing with Job Queues
Synchronous API endpoints should return responses within 200ms. Any operation that takes longer -- sending emails, generating PDFs, processing images, calling external APIs, running reports -- should be moved to a background job queue.
The Job Queue Pattern
- The API endpoint validates the request and creates a job record
- The job is placed on a queue (Redis, RabbitMQ, SQS)
- The API returns immediately with a 202 Accepted response and a job ID
- A worker process picks up the job and executes it asynchronously
- The client polls for job status or receives a webhook callback on completion
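The five steps above can be sketched in-memory (illustrative only; in production the queue is Redis, RabbitMQ, or SQS and the worker runs as a separate process):

```typescript
// Minimal in-memory sketch of the job queue pattern.
type Job = { id: string; type: string; payload: unknown; status: "queued" | "done" };

const queue: Job[] = [];
const jobs = new Map<string, Job>();
let nextId = 1;

// Steps 1-3: the endpoint validates, enqueues, and returns 202 + job ID.
function submitJob(type: string, payload: unknown): { status: number; jobId: string } {
  const job: Job = { id: String(nextId++), type, payload, status: "queued" };
  jobs.set(job.id, job);
  queue.push(job);
  return { status: 202, jobId: job.id };
}

// Step 4: a worker drains the queue outside the request cycle.
function runWorker(handler: (job: Job) => void): void {
  let job: Job | undefined;
  while ((job = queue.shift())) {
    handler(job);
    job.status = "done";
  }
}

// Step 5: the client polls for status by job ID.
function getStatus(jobId: string): string | undefined {
  return jobs.get(jobId)?.status;
}

const { status, jobId } = submitJob("send-email", { to: "user@example.com" });
runWorker(() => { /* e.g. call the SMTP provider here */ });
```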
Common Async Use Cases
Email sending -- SMTP operations take 500ms-3s depending on the provider. Queueing emails reduces API response time and allows retry logic for transient failures without blocking the user.
PDF generation -- Generating invoices, reports, or export files is CPU-intensive and memory-heavy. Running these in dedicated workers prevents resource contention with API request handling.
Webhook delivery -- Outgoing webhooks depend on third-party server availability. Queue webhook deliveries with exponential backoff retry (1s, 2s, 4s, 8s, up to 5 minutes) to handle temporary failures without blocking your system.
Data import and export -- Processing CSV uploads with 100,000 rows should never happen in a request cycle. Accept the upload, return a job ID, and process rows in batches.
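The webhook retry schedule mentioned above can be computed with a small helper (a sketch; `backoffMs` is an illustrative name, and real implementations usually add random jitter to avoid thundering herds):

```typescript
// Exponential backoff delay for retry attempt n (0-indexed),
// doubling from 1 second and capped at 5 minutes.
function backoffMs(attempt: number, baseMs = 1_000, capMs = 300_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Attempts 0-3 follow the 1s, 2s, 4s, 8s schedule; attempt 10 hits the cap.
const schedule = [0, 1, 2, 3, 10].map((n) => backoffMs(n));
```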
Queue Selection
| Queue Technology | Best For | Considerations |
|---|---|---|
| BullMQ (Redis-backed) | Node.js applications, NestJS integration | Great developer experience, built-in dashboard |
| RabbitMQ | Multi-language systems, complex routing | Mature, supports message acknowledgment patterns |
| AWS SQS | Serverless, managed infrastructure | No server management, pay-per-message |
| Kafka | Event streaming, high throughput | Overkill for simple job queues, excellent for event sourcing |
Response Optimization
Beyond application logic, the response itself can be optimized for size and delivery speed.
Compression
Enable response compression to reduce payload sizes over the network. Modern compression algorithms significantly reduce text-based payloads (JSON, HTML, CSS, JavaScript).
| Algorithm | Compression Ratio | CPU Cost | Browser Support |
|---|---|---|---|
| gzip | 60-75% reduction | Low | Universal |
| Brotli | 70-85% reduction | Moderate | All modern browsers |
| zstd | 70-85% reduction | Low | Emerging (not yet universal) |
Use Brotli for static assets (pre-compressed at build time) and gzip as a fallback for dynamic responses. In NestJS, the compression middleware handles this automatically, but in production, let Nginx handle compression to offload CPU from your application server.
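A sketch of what that Nginx offloading might look like (assumes the third-party ngx_brotli module is installed; gzip support is built in, and the values are illustrative starting points):

```nginx
# Compress text-based API responses at the proxy, not the app server.
gzip on;
gzip_types application/json text/css application/javascript;
gzip_min_length 1024;   # skip tiny payloads where compression adds overhead

brotli on;
brotli_types application/json text/css application/javascript;
brotli_comp_level 5;    # balance compression ratio vs CPU for dynamic responses
```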
Field Selection
Allow API consumers to request only the fields they need. GraphQL does this inherently, but REST APIs can support field selection with a ?fields=id,name,price query parameter. This reduces payload size and can optimize database queries by selecting only needed columns.
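A minimal sketch of REST field selection (the `pickFields` helper is illustrative; in practice you would also whitelist allowed fields and push the projection down into the SQL SELECT):

```typescript
// Parse ?fields=id,name,price and project an object to those fields.
function pickFields<T extends Record<string, unknown>>(
  row: T,
  fieldsParam?: string,
): Partial<T> {
  if (!fieldsParam) return row; // no filter: return everything
  const wanted = new Set(fieldsParam.split(",").map((f) => f.trim()));
  const out: Partial<T> = {};
  for (const key of Object.keys(row)) {
    if (wanted.has(key)) out[key as keyof T] = row[key as keyof T];
  }
  return out;
}

const product = { id: 1, name: "Desk", price: 249, description: "Oak standing desk", stock: 12 };
const slim = pickFields(product, "id,name,price");
```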
Response Caching Headers
Set appropriate Cache-Control headers on API responses. Public list endpoints (products, categories) can use Cache-Control: public, max-age=300 to cache for 5 minutes. Authenticated endpoints should use Cache-Control: private, no-cache to prevent CDN caching while allowing browser caching with revalidation.
For more on caching strategies, see our detailed guide on Redis, CDN, and HTTP caching.
Connection Management
Database and HTTP connections are finite resources that must be managed carefully under load.
Database Connection Pooling
A connection pool maintains a set of reusable database connections. Without pooling, each API request opens a new database connection (50-100ms overhead) and closes it after the response. With pooling, requests borrow connections from the pool and return them when done.
Pool sizing formula: connections = (core_count * 2) + effective_spindle_count. For a 4-core server with SSD storage, 10-20 connections per application instance is a good starting point. Monitor pool utilization -- if it regularly exceeds 80%, either increase the pool size or optimize query duration.
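The formula can be checked with a one-liner (for SSDs, effective_spindle_count is commonly taken as 1; note the 4-core example yields 9, just under the suggested 10-20 starting range):

```typescript
// Pool size heuristic: connections = (core_count * 2) + effective_spindle_count.
function poolSize(coreCount: number, effectiveSpindles: number): number {
  return coreCount * 2 + effectiveSpindles;
}

const size = poolSize(4, 1); // 4-core server with SSD storage -> 9 connections
```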
HTTP Keep-Alive
Enable HTTP keep-alive for connections to upstream services (databases, Redis, external APIs). This reuses TCP connections instead of establishing new ones per request, eliminating the TCP handshake and TLS negotiation overhead (50-200ms per new connection).
Frequently Asked Questions
What rate limits should I set for a public API?
Start with conservative limits and adjust based on legitimate usage patterns. A common starting point is 100 requests per minute for authenticated users and 20 requests per minute for anonymous users. Monitor 429 response rates -- if legitimate users frequently hit limits, increase them. Provide higher limits for premium API tiers.
How do I handle pagination when data changes between pages?
Cursor-based pagination handles this naturally because it anchors to a specific position in the sorted data. With offset pagination, document that results may shift between pages. For critical use cases (financial reports, data exports), snapshot the data at the beginning of pagination and paginate over the snapshot.
Should I use REST or GraphQL for performance?
REST with field selection and proper caching is faster for simple, well-defined endpoints. GraphQL eliminates over-fetching and under-fetching for complex data requirements but adds query parsing overhead and makes HTTP caching harder. Use REST for public APIs with caching needs and GraphQL for internal APIs serving complex frontend data requirements.
How do I monitor API performance in production?
Track P50, P95, and P99 response times per endpoint. Set alerts on P95 breaching your SLO (typically 200-500ms). Use distributed tracing to break down time spent in database, cache, external services, and application logic. See our guide on monitoring and observability for detailed setup.
What's Next
Start by auditing your API endpoints for missing pagination, unprotected public endpoints without rate limiting, and synchronous operations that should be async. These three changes typically reduce P95 response times by 50-70% and prevent the most common production incidents.
For the complete performance engineering perspective, see our pillar guide on scaling your business platform. For the database layer that powers your API, read our query optimization guide.
ECOSIRE builds high-performance APIs for business platforms on Odoo ERP and custom architectures. Contact us for an API performance review.
Published by ECOSIRE — helping businesses scale with AI-powered solutions across Odoo ERP, Shopify eCommerce, and OpenClaw AI.
Written by
ECOSIRE Research and Development Team
Builds enterprise-grade digital products at ECOSIRE. Shares insights on Odoo integrations, eCommerce automation, and AI-powered business solutions.
Related Articles
Caching Strategies: Redis, CDN & HTTP Caching for Web Applications
Implement multi-layer caching with Redis, CDN edge caching, and HTTP cache headers to reduce latency by 90% and cut infrastructure costs.
Core Web Vitals Optimization: LCP, FID & CLS for eCommerce Sites
Optimize Core Web Vitals for eCommerce. Improve LCP, INP, and CLS scores to boost SEO rankings and reduce cart abandonment by 24%.
Data Mapping & Transformation: Handling Different APIs & Data Formats
Master field mapping, data normalization, unit conversion, currency handling, and category taxonomy mapping across eCommerce APIs and data formats.
More from the Performance & Scalability series
Database Query Optimization: Indexes, Execution Plans & Partitioning
Optimize PostgreSQL performance with proper indexing, EXPLAIN ANALYZE reading, N+1 detection, and partitioning strategies for growing datasets.
Integration Monitoring: Detecting Sync Failures Before They Cost Revenue
Build integration monitoring with health checks, error categorization, retry strategies, dead letter queues, and alerting for multi-channel eCommerce sync.
Load Testing Your eCommerce Platform: Preparing for Black Friday Traffic
Prepare your eCommerce site for Black Friday with load testing strategies using k6, Artillery, and Locust. Learn traffic modeling and bottleneck identification.
Monitoring & Observability: APM, Logging & Alerting Best Practices
Build production observability with the three pillars: metrics, logs, and traces. Compare APM tools and design alerts that reduce noise and catch real issues.