Part of our Performance & Scalability series
Database Query Optimization: Indexes, Execution Plans & Partitioning
A single missing index can turn a 2-millisecond query into a 20-second table scan. As your database grows from thousands to millions of rows, the difference between an optimized and unoptimized query is the difference between a responsive application and one that times out under load. Database optimization often delivers the highest return on engineering time of any performance work you can do.
Key Takeaways
- EXPLAIN ANALYZE is your most powerful diagnostic tool -- learn to read execution plans before optimizing anything
- Choose index types strategically: B-tree for equality and range, GIN for full-text and JSONB, partial indexes for filtered subsets
- N+1 queries are the most common performance killer in ORM-based applications -- detect them early with query logging
- Table partitioning becomes essential when tables exceed 10-50 million rows, reducing query planning time and enabling efficient data lifecycle management
Reading Execution Plans with EXPLAIN ANALYZE
Before optimizing any query, you must understand how PostgreSQL currently executes it. EXPLAIN ANALYZE runs the query and shows the actual execution plan with real timing data.
A basic EXPLAIN ANALYZE output shows you the planner's chosen strategy, the estimated versus actual row counts, and the time spent at each step. The key metrics to focus on are:
- Seq Scan -- the database reads every row in the table. Acceptable for small tables (under 10,000 rows) but a red flag for larger ones.
- Index Scan -- the database uses an index to find matching rows efficiently. This is what you want for filtered queries on large tables.
- Index Only Scan -- the database answers the query entirely from the index without touching the table. The fastest scan type.
- Nested Loop -- joins tables by scanning the inner table once per row in the outer table. Efficient when the inner scan uses an index.
- Hash Join -- builds a hash table from one side of the join, then probes it with the other. Efficient for larger result sets.
- Sort -- an explicit sort step, often for ORDER BY. Watch for sorts that spill to disk (indicated by "Sort Method: external merge").
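To make the scan types above concrete, here is a hedged sketch of running EXPLAIN ANALYZE against a hypothetical orders table (table, column, and index names are illustrative, and the plan output is representative rather than exact):

```sql
-- Hypothetical orders table with a composite index on (customer_id, created_at).
EXPLAIN ANALYZE
SELECT id, total
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- Representative output (abbreviated):
-- Limit  (cost=0.43..8.45 rows=10 width=14) (actual time=0.031..0.052 rows=10 loops=1)
--   ->  Index Scan Backward using orders_customer_created_idx on orders
--         Index Cond: (customer_id = 42)
-- Planning Time: 0.180 ms
-- Execution Time: 0.074 ms
```

The plan reads inside-out: the innermost node executes first, and each line reports both the planner's estimate (cost, rows) and what actually happened (actual time, rows, loops).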
What to Look For
The most important signal in an execution plan is the gap between estimated and actual rows. When PostgreSQL estimates 10 rows but finds 100,000, it chose the wrong plan. This happens when table statistics are stale -- run ANALYZE on the table to update them.
Watch for sequential scans on large tables, sorts without indexes, and nested loops with sequential scans on the inner table. Each of these patterns indicates a missing index or a query that needs rewriting.
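When estimates and actuals diverge, refreshing statistics is the first fix. A minimal sketch, assuming the same hypothetical orders table:

```sql
-- Refresh planner statistics for one table.
ANALYZE orders;

-- For a heavily skewed column, collect a larger sample, then re-analyze.
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
```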
Index Types and When to Use Them
PostgreSQL offers several index types, each optimized for different query patterns. Choosing the right type is critical -- a GIN index on a column that only needs equality checks wastes storage and slows writes without improving reads.
| Index Type | Best For | Example Use Case | Storage Overhead |
|---|---|---|---|
| B-tree (default) | Equality, range, sorting, LIKE prefix | WHERE status = 'active', WHERE created_at > '2026-01-01' | Low to moderate |
| Hash | Equality only (no range) | WHERE uuid = '...' (rare, B-tree usually sufficient) | Low |
| GIN (Generalized Inverted) | Full-text search, JSONB containment, arrays | JSONB containment queries, full-text search, array operations | High |
| GiST (Generalized Search Tree) | Geometric data, range types, nearest-neighbor | Nearest-neighbor spatial queries, range type overlap queries | Moderate |
| BRIN (Block Range Index) | Naturally ordered data (timestamps, sequences) | WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31' on append-only tables | Very low |
| Partial | Filtered subsets of data | WHERE status = 'pending' (index only pending rows) | Low |
B-tree Indexes
B-tree is the default and most versatile index type. It supports equality, range comparisons (less than, greater than, BETWEEN), sorting (ORDER BY), and prefix pattern matching (LIKE 'abc%'). For most columns in WHERE, JOIN, and ORDER BY clauses, a B-tree index is the correct choice.
Composite indexes combine multiple columns into a single B-tree. Column order matters: the index on (status, created_at) efficiently supports queries filtering on status alone or on both status and created_at, but not on created_at alone. Place the most selective column first and the column used for range filtering last.
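A composite index matching the example above can be sketched as follows (the index and table names are hypothetical):

```sql
-- Serves WHERE status = '...', and WHERE status = '...' AND created_at > '...',
-- but NOT queries filtering on created_at alone (status is the leading column).
CREATE INDEX orders_status_created_idx
    ON orders (status, created_at);
```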
GIN Indexes
GIN indexes excel at searching within composite values. They are essential for full-text search (tsvector columns), JSONB containment queries, and array overlap queries. GIN indexes are larger and slower to update than B-tree indexes, so use them only where B-tree cannot serve the query pattern.
For JSONB columns that store flexible attributes, a GIN index on the entire column supports any key-based query. For columns where you only query specific keys, a B-tree index on a generated column or expression is more efficient.
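Both approaches can be sketched side by side, assuming a hypothetical products table with a JSONB attributes column:

```sql
-- GIN over the whole column: serves containment (@>) queries on any key.
CREATE INDEX products_attributes_gin ON products USING GIN (attributes);

-- A query the GIN index accelerates:
SELECT id FROM products WHERE attributes @> '{"color": "red"}';

-- If only one key is ever queried, a B-tree expression index is cheaper:
CREATE INDEX products_color_idx ON products ((attributes->>'color'));
```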
Partial Indexes
Partial indexes only index rows matching a WHERE condition. They are powerful for tables where queries consistently filter for a small subset of data.
For example, if your orders table has 10 million rows but you almost exclusively query active orders (5% of the table), a partial index on (customer_id, created_at) WHERE status = 'active' is 20x smaller than a full index and just as fast for your actual queries.
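The partial index from that example can be written as (names illustrative):

```sql
-- Indexes only the ~5% of rows that are active; queries must repeat the
-- WHERE status = 'active' predicate for the planner to use it.
CREATE INDEX orders_active_customer_idx
    ON orders (customer_id, created_at)
    WHERE status = 'active';
```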
Detecting and Fixing N+1 Queries
The N+1 query problem is the most common performance issue in applications using ORMs. It occurs when code loads a list of N records, then executes one additional query per record to load related data, resulting in N+1 total queries instead of 1-2.
How N+1 Queries Happen
Consider loading a list of orders with their customer names. A naive implementation loads the orders list (1 query), then for each order, loads the customer (N queries). With 100 orders, this generates 101 database round trips. At 1ms per query, that is 101ms -- but under concurrent load with connection pool contention, it can easily become 500ms or more.
Detection Methods
- Query logging -- enable PostgreSQL query logging temporarily and look for repeated identical queries with different parameter values
- ORM-level logging -- Drizzle ORM, Prisma, and TypeORM all support query logging that shows every SQL statement executed
- APM tools -- Datadog, New Relic, and Sentry can group queries by endpoint and highlight N+1 patterns automatically
- pg_stat_statements -- this PostgreSQL extension tracks query execution statistics and reveals frequently executed identical query templates
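With pg_stat_statements enabled, a query like the following surfaces the templates executed most often, which is where N+1 patterns show up (column names per the extension's standard view in PostgreSQL 13+):

```sql
-- Most frequently executed query templates; an N+1 culprit typically
-- appears with a very high call count and a short mean execution time.
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
```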
Fixing N+1 Queries
The fix depends on your ORM and query pattern:
- Eager loading -- tell the ORM to load related data in the initial query using JOINs. In Drizzle, use the `with` option in query builders.
- Batch loading -- collect all foreign key IDs, then load related records in a single WHERE id IN (...) query. This is the DataLoader pattern.
- Denormalization -- for read-heavy use cases, store the related data directly on the parent record. Trade write complexity for read performance.
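In SQL terms, the first two fixes look like this (table and column names are hypothetical):

```sql
-- N+1 pattern being replaced:
--   SELECT * FROM orders LIMIT 100;
--   SELECT * FROM customers WHERE id = $1;   -- repeated once per order

-- Batch-loading fix: one query over the collected foreign keys.
SELECT * FROM customers WHERE id = ANY(ARRAY[1, 2, 3]);

-- Eager-loading fix: a single JOIN returns orders with customer names.
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
LIMIT 100;
```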
Query Rewriting Techniques
Sometimes the query itself needs restructuring, not just better indexes.
Subquery to JOIN Conversion
Correlated subqueries execute once per row in the outer query. Converting them to JOINs allows PostgreSQL to use more efficient join strategies.
Instead of selecting orders with a subquery that looks up the latest order date per customer, rewrite it as a JOIN with a derived table or a window function. The JOIN version allows PostgreSQL to choose between nested loop, hash join, and merge join based on data distribution.
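A sketch of that rewrite, assuming the hypothetical orders table (note the window version picks one row per customer when there are ties on created_at, whereas the subquery returns all tied rows):

```sql
-- Correlated subquery: the inner SELECT runs once per outer row.
SELECT o.*
FROM orders o
WHERE o.created_at = (SELECT MAX(o2.created_at)
                      FROM orders o2
                      WHERE o2.customer_id = o.customer_id);

-- Window-function rewrite: one pass, and the planner is free to choose
-- the scan and join strategy.
SELECT * FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders o
) ranked
WHERE rn = 1;
```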
Common Table Expressions (CTEs)
In PostgreSQL 12 and later, non-recursive CTEs that are referenced only once are inlined by default, meaning the optimizer can push predicates into them. Use CTEs for readability without worrying about performance fences. For cases where you explicitly want materialization (to prevent re-execution of an expensive subquery), add the MATERIALIZED keyword.
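The MATERIALIZED keyword looks like this in practice (a sketch with hypothetical names):

```sql
-- Force the CTE to be computed once and stored, rather than inlined.
WITH customer_totals AS MATERIALIZED (
    SELECT customer_id, SUM(total) AS lifetime_value
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM customer_totals WHERE lifetime_value > 1000;
```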
Window Functions vs GROUP BY
When you need both detail rows and aggregates, window functions avoid the need for a self-join or subquery. Calculating a running total, ranking within groups, or comparing each row to the group average are all more efficient with window functions than with correlated subqueries.
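For instance, a running total per customer alongside each detail row needs no self-join (names illustrative):

```sql
SELECT id, customer_id, total,
       SUM(total) OVER (PARTITION BY customer_id
                        ORDER BY created_at) AS running_total
FROM orders;
```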
Table Partitioning Strategies
When tables grow beyond 10-50 million rows, even well-indexed queries slow down due to index depth, vacuum overhead, and planner complexity. Partitioning divides a large table into smaller physical chunks while maintaining a single logical table interface.
Partition Types
| Strategy | Mechanism | Best For |
|---|---|---|
| Range partitioning | Partition by value ranges (date ranges, ID ranges) | Time-series data, logs, orders by date |
| List partitioning | Partition by discrete values | Multi-tenant data by organization_id, orders by region |
| Hash partitioning | Partition by hash of a column | Even distribution when no natural range or list key exists |
Range Partitioning by Date
The most common pattern is monthly partitioning by a timestamp column. Each month's data lives in its own partition. Queries that filter by date automatically scan only the relevant partitions (partition pruning).
Benefits of time-based partitioning:
- Query performance -- queries for recent data only scan recent partitions
- Maintenance -- VACUUM and ANALYZE run faster on smaller partitions
- Data lifecycle -- dropping old partitions is instant compared to deleting millions of rows
- Backup efficiency -- back up only recent partitions for point-in-time recovery
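Declarative monthly partitioning can be sketched as follows (schema is illustrative; note that the primary key must include the partition key):

```sql
CREATE TABLE orders (
    id         bigserial,
    created_at timestamptz NOT NULL,
    total      numeric,
    PRIMARY KEY (id, created_at)   -- unique constraints must include the partition key
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- Retiring a month of data is a metadata operation, not a mass DELETE:
-- DROP TABLE orders_2026_01;
```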
Partitioning Considerations
Partitioning adds complexity. Every query must include the partition key in its WHERE clause for partition pruning to work. Unique constraints must include the partition key. Foreign keys referencing partitioned tables have limitations. Start partitioning only when you have measured that table size is causing performance degradation.
PostgreSQL Configuration Tuning
Default PostgreSQL configuration is conservative, designed to run on minimal hardware. Production workloads benefit from tuning key parameters.
| Parameter | Default | Recommended (16GB RAM server) | Purpose |
|---|---|---|---|
| shared_buffers | 128MB | 4GB (25% of RAM) | In-memory cache for table and index data |
| effective_cache_size | 4GB | 12GB (75% of RAM) | Planner hint for OS file cache availability |
| work_mem | 4MB | 64MB | Memory per sort/hash operation (careful with concurrency) |
| maintenance_work_mem | 64MB | 1GB | Memory for VACUUM, CREATE INDEX, ALTER TABLE |
| random_page_cost | 4.0 | 1.1 (SSD storage) | Cost estimate for random I/O (lower for SSD) |
| effective_io_concurrency | 1 | 200 (SSD storage) | Concurrent I/O operations for bitmap heap scans |
| max_connections | 100 | 200 (with PgBouncer) | Use connection pooling to keep this reasonable |
These settings must be tuned for your specific hardware and workload. Monitor pg_stat_bgwriter, pg_stat_activity, and pg_stat_user_tables to validate that changes improve performance.
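Settings like those in the table can be applied with ALTER SYSTEM rather than by editing postgresql.conf by hand (values here are the table's 16GB-server suggestions, not universal recommendations):

```sql
ALTER SYSTEM SET shared_buffers = '4GB';          -- takes effect after a restart
ALTER SYSTEM SET effective_cache_size = '12GB';   -- a reload is sufficient
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();                          -- reload without restarting
```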
Frequently Asked Questions
How many indexes should a table have?
There is no fixed limit, but each index slows down INSERT, UPDATE, and DELETE operations because the index must be maintained. A good rule of thumb is to create indexes for columns that appear in WHERE, JOIN ON, and ORDER BY clauses of your most frequent queries. Use pg_stat_user_indexes to find unused indexes that can be dropped.
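A query along these lines finds indexes that have never been scanned (column names per the standard pg_stat_user_indexes view):

```sql
-- Never-scanned indexes, largest first: candidates for removal.
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```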
Should I use UUID or integer primary keys for performance?
Integer primary keys (BIGSERIAL) are faster for joins and indexing because they are smaller (8 bytes vs 16 bytes) and naturally ordered. UUIDs provide global uniqueness without coordination, which matters for distributed systems. For most applications, use UUIDs for external-facing identifiers and integers for internal joins.
When should I switch from a single database to read replicas?
When your read workload exceeds 70-80% of your database's capacity, or when reporting queries compete with transactional queries for resources. Read replicas handle the read load while the primary focuses on writes. For a typical web application, this threshold arrives around 5,000-10,000 concurrent users.
How do I handle slow queries in production without downtime?
Create indexes with the CONCURRENTLY option to avoid locking the table. Use pg_stat_statements to identify the slowest queries. Deploy query optimizations behind feature flags. For schema changes that rewrite tables, use tools like pg_repack to reorganize tables without locking.
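The concurrent index build looks like this (index name illustrative):

```sql
-- Builds the index without blocking writes on the table.
-- Note: cannot run inside a transaction block, and takes longer
-- than a plain CREATE INDEX.
CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);
```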
What's Next
Database optimization is the foundation of platform performance. Start by enabling pg_stat_statements, identify your slowest queries, and work through them systematically with EXPLAIN ANALYZE. Add missing indexes, fix N+1 patterns, and consider partitioning for your largest tables.
For the broader performance picture, see our pillar guide on scaling your business platform from startup to enterprise. To learn about the next layer of optimization, read our guide on caching strategies with Redis, CDN, and HTTP caching.
ECOSIRE provides expert database optimization for PostgreSQL-backed platforms including Odoo ERP and custom applications. Contact us for a database performance audit.
Published by ECOSIRE — helping businesses scale with AI-powered solutions across Odoo ERP, Shopify eCommerce, and OpenClaw AI.
Written by
ECOSIRE Research and Development Team
Building enterprise-grade digital products at ECOSIRE. Sharing insights on Odoo integrations, e-commerce automation, and AI-powered business solutions.