Database Query Optimization: Indexes, Execution Plans & Partitioning

Optimize PostgreSQL performance with proper indexing, EXPLAIN ANALYZE reading, N+1 detection, and partitioning strategies for growing datasets.

ECOSIRE Research and Development Team
March 15, 2026 · 10 min read · 2.2k words

Part of our Performance & Scalability series



A single missing index can turn a 2-millisecond query into a 20-second table scan. As your database grows from thousands to millions of rows, the difference between an optimized and unoptimized query is the difference between a responsive application and one that times out under load. Database optimization delivers the highest return on engineering time of any performance work you can do.

Key Takeaways

  • EXPLAIN ANALYZE is your most powerful diagnostic tool -- learn to read execution plans before optimizing anything
  • Choose index types strategically: B-tree for equality and range, GIN for full-text and JSONB, partial indexes for filtered subsets
  • N+1 queries are the most common performance killer in ORM-based applications -- detect them early with query logging
  • Table partitioning becomes essential when tables exceed 10-50 million rows, reducing query planning time and enabling efficient data lifecycle management

Reading Execution Plans with EXPLAIN ANALYZE

Before optimizing any query, you must understand how PostgreSQL currently executes it. EXPLAIN ANALYZE runs the query and shows the actual execution plan with real timing data.

A basic EXPLAIN ANALYZE output shows you the planner's chosen strategy, the estimated versus actual row counts, and the time spent at each step. The key metrics to focus on are:

  • Seq Scan -- the database reads every row in the table. Acceptable for small tables (under 10,000 rows) but a red flag for larger ones.
  • Index Scan -- the database uses an index to find matching rows efficiently. This is what you want for filtered queries on large tables.
  • Index Only Scan -- the database answers the query entirely from the index without touching the table. The fastest scan type.
  • Nested Loop -- joins tables by scanning the inner table once per row in the outer table. Efficient when the inner scan uses an index.
  • Hash Join -- builds a hash table from one side of the join, then probes it with the other. Efficient for larger result sets.
  • Sort -- an explicit sort step, often for ORDER BY. Watch for sorts that spill to disk (indicated by "Sort Method: external merge").
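To make these node types concrete, here is a minimal sketch of running EXPLAIN ANALYZE against a hypothetical orders table with an index on customer_id. The plan shown in comments is illustrative of the output shape only; actual costs, timings, and row counts depend entirely on your data and hardware.

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- Illustrative output shape:
--  Index Scan using idx_orders_customer_id on orders
--    (cost=0.43..8.45 rows=3 width=120)
--    (actual time=0.021..0.025 rows=3 loops=1)
--    Index Cond: (customer_id = 42)
--  Planning Time: 0.110 ms
--  Execution Time: 0.049 ms
```

Note the two row counts: "rows=3" in the cost line is the planner's estimate, while "rows=3" in the actual line is what really came back. The gap between them is the first thing to check, as discussed below.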

What to Look For

The most important signal in an execution plan is the gap between estimated and actual rows. When PostgreSQL estimates 10 rows but finds 100,000, it chose the wrong plan. This happens when table statistics are stale -- run ANALYZE on the table to update them.

Watch for sequential scans on large tables, sorts without indexes, and nested loops with sequential scans on the inner table. Each of these patterns indicates a missing index or a query that needs rewriting.


Index Types and When to Use Them

PostgreSQL offers several index types, each optimized for different query patterns. Choosing the right type is critical -- a GIN index on a column that only needs equality checks wastes storage and slows writes without improving reads.

| Index Type | Best For | Example Use Case | Storage Overhead |
|---|---|---|---|
| B-tree (default) | Equality, range, sorting, LIKE prefix | WHERE status = 'active', WHERE created_at > '2026-01-01' | Low to moderate |
| Hash | Equality only (no range) | WHERE uuid = '...' (rare; B-tree usually sufficient) | Low |
| GIN (Generalized Inverted) | Full-text search, JSONB containment, arrays | JSONB containment queries, full-text search, array operations | High |
| GiST (Generalized Search Tree) | Geometric data, range types, nearest-neighbor | Nearest-neighbor spatial queries, range type overlap queries | Moderate |
| BRIN (Block Range Index) | Naturally ordered data (timestamps, sequences) | WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31' on append-only tables | Very low |
| Partial | Filtered subsets of data | WHERE status = 'pending' (index only pending rows) | Low |

B-tree Indexes

B-tree is the default and most versatile index type. It supports equality, range comparisons (less than, greater than, BETWEEN), sorting (ORDER BY), and prefix pattern matching (LIKE 'abc%'). For most columns in WHERE, JOIN, and ORDER BY clauses, a B-tree index is the correct choice.

Composite indexes combine multiple columns into a single B-tree. Column order matters: the index on (status, created_at) efficiently supports queries filtering on status alone or on both status and created_at, but not on created_at alone. Place the most selective column first and the column used for range filtering last.
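A sketch of the column-order rule, using a hypothetical orders table:

```sql
-- Composite index: status first (equality filter), created_at last (range filter).
CREATE INDEX idx_orders_status_created_at
    ON orders (status, created_at);

-- Served by the index: leading column matched, range on the second.
SELECT * FROM orders
WHERE status = 'active' AND created_at > '2026-01-01';

-- Also served: leading column alone.
SELECT * FROM orders WHERE status = 'active';

-- NOT served efficiently: the leading column is not filtered.
SELECT * FROM orders WHERE created_at > '2026-01-01';
```

If the last query is also frequent, it needs its own index on (created_at).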

GIN Indexes

GIN indexes excel at searching within composite values. They are essential for full-text search (tsvector columns), JSONB containment queries, and array overlap queries. GIN indexes are larger and slower to update than B-tree indexes, so use them only where B-tree cannot serve the query pattern.

For JSONB columns that store flexible attributes, a GIN index on the entire column supports any key-based query. For columns where you only query specific keys, a B-tree index on a generated column or expression is more efficient.
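A sketch of both options against a hypothetical products table with a JSONB attributes column:

```sql
-- GIN index on the whole column: supports containment (@>) on any key.
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

SELECT * FROM products WHERE attributes @> '{"color": "red"}';

-- If only one key is ever queried, a B-tree expression index is
-- smaller and cheaper to maintain:
CREATE INDEX idx_products_color ON products ((attributes->>'color'));

SELECT * FROM products WHERE attributes->>'color' = 'red';
```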

Partial Indexes

Partial indexes only index rows matching a WHERE condition. They are powerful for tables where queries consistently filter for a small subset of data.

For example, if your orders table has 10 million rows but you almost exclusively query active orders (5% of the table), a partial index on (customer_id, created_at) WHERE status = 'active' is 20x smaller than a full index and just as fast for your actual queries.
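The example above can be sketched as:

```sql
-- Index only the ~5% of rows that are active.
CREATE INDEX idx_orders_active_customer
    ON orders (customer_id, created_at)
    WHERE status = 'active';

-- The query's WHERE clause must imply the index predicate for the
-- planner to use the partial index:
SELECT * FROM orders
WHERE status = 'active' AND customer_id = 42
ORDER BY created_at DESC;
```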


Detecting and Fixing N+1 Queries

The N+1 query problem is the most common performance issue in applications using ORMs. It occurs when code loads a list of N records, then executes one additional query per record to load related data, resulting in N+1 total queries instead of 1-2.

How N+1 Queries Happen

Consider loading a list of orders with their customer names. A naive implementation loads the orders list (1 query), then for each order, loads the customer (N queries). With 100 orders, this generates 101 database round trips. At 1ms per query, that is 101ms -- but under concurrent load with connection pool contention, it can easily become 500ms or more.

Detection Methods

  1. Query logging -- enable PostgreSQL query logging temporarily and look for repeated identical queries with different parameter values
  2. ORM-level logging -- Drizzle ORM, Prisma, and TypeORM all support query logging that shows every SQL statement executed
  3. APM tools -- Datadog, New Relic, and Sentry can group queries by endpoint and highlight N+1 patterns automatically
  4. pg_stat_statements -- this PostgreSQL extension tracks query execution statistics and reveals frequently executed identical query templates

Fixing N+1 Queries

The fix depends on your ORM and query pattern:

  • Eager loading -- tell the ORM to load related data in the initial query using JOINs. In Drizzle, use the with option in query builders.
  • Batch loading -- collect all foreign key IDs, then load related records in a single WHERE id IN (...) query. This is the DataLoader pattern.
  • Denormalization -- for read-heavy use cases, store the related data directly on the parent record. Trade write complexity for read performance.
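In SQL terms, the batch-loading and eager-loading fixes look like this (table and column names are hypothetical):

```sql
-- N+1 pattern: this runs once per order, with a different parameter each time.
-- SELECT * FROM customers WHERE id = $1;

-- Batch loading: collect the foreign keys, load them in one round trip.
SELECT * FROM customers WHERE id IN (1, 5, 9, 12);

-- Eager loading: fix it at the source with a JOIN in the initial query.
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
ORDER BY o.created_at DESC
LIMIT 100;
```

Either form replaces 101 round trips with one or two.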

Query Rewriting Techniques

Sometimes the query itself needs restructuring, not just better indexes.

Subquery to JOIN Conversion

Correlated subqueries execute once per row in the outer query. Converting them to JOINs allows PostgreSQL to use more efficient join strategies.

Instead of selecting orders with a subquery that looks up the latest order date per customer, rewrite it as a JOIN with a derived table or a window function. The JOIN version allows PostgreSQL to choose between nested loop, hash join, and merge join based on data distribution.
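A sketch of that rewrite, assuming a hypothetical orders table:

```sql
-- Correlated subquery: re-executed for every row of the outer query.
SELECT * FROM orders o
WHERE o.created_at = (
    SELECT MAX(o2.created_at)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);

-- JOIN against a derived table: the planner is free to choose
-- nested loop, hash join, or merge join.
SELECT o.*
FROM orders o
JOIN (
    SELECT customer_id, MAX(created_at) AS latest
    FROM orders
    GROUP BY customer_id
) last ON last.customer_id = o.customer_id
      AND last.latest = o.created_at;
```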

Common Table Expressions (CTEs)

In PostgreSQL 12 and later, CTEs are inlined by default, meaning the optimizer can push predicates into them. Use CTEs for readability without worrying about performance fences. For cases where you explicitly want materialization (to prevent re-execution of expensive subqueries), add the MATERIALIZED keyword.
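A minimal sketch of both behaviors:

```sql
-- Inlined by default (PostgreSQL 12+): the planner can push the
-- status filter down into the CTE's scan.
WITH recent AS (
    SELECT * FROM orders
    WHERE created_at > now() - interval '30 days'
)
SELECT * FROM recent WHERE status = 'pending';

-- Explicit materialization: the CTE is computed once and reused,
-- at the cost of losing predicate pushdown.
WITH recent AS MATERIALIZED (
    SELECT * FROM orders
    WHERE created_at > now() - interval '30 days'
)
SELECT * FROM recent WHERE status = 'pending';
```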

Window Functions vs GROUP BY

When you need both detail rows and aggregates, window functions avoid the need for a self-join or subquery. Calculating a running total, ranking within groups, or comparing each row to the group average are all more efficient with window functions than with correlated subqueries.
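All three patterns mentioned above fit in a single pass with window functions (hypothetical orders table):

```sql
SELECT
    id,
    customer_id,
    total,
    -- Running total per customer, ordered by time.
    SUM(total) OVER (PARTITION BY customer_id ORDER BY created_at) AS running_total,
    -- Each row compared to its group average.
    AVG(total)  OVER (PARTITION BY customer_id) AS customer_avg,
    -- Ranking within the group.
    RANK()      OVER (PARTITION BY customer_id ORDER BY total DESC) AS rank_in_customer
FROM orders;
```

No self-join or correlated subquery is needed; the table is scanned once.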


Table Partitioning Strategies

When tables grow beyond 10-50 million rows, even well-indexed queries slow down due to index depth, vacuum overhead, and planner complexity. Partitioning divides a large table into smaller physical chunks while maintaining a single logical table interface.

Partition Types

| Strategy | Mechanism | Best For |
|---|---|---|
| Range partitioning | Partition by value ranges (date ranges, ID ranges) | Time-series data, logs, orders by date |
| List partitioning | Partition by discrete values | Multi-tenant data by organization_id, orders by region |
| Hash partitioning | Partition by hash of a column | Even distribution when no natural range or list key exists |

Range Partitioning by Date

The most common pattern is monthly partitioning by a timestamp column. Each month's data lives in its own partition. Queries that filter by date automatically scan only the relevant partitions (partition pruning).
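A sketch of declarative monthly partitioning (PostgreSQL 10+), using a hypothetical events table:

```sql
CREATE TABLE events (
    id         BIGINT GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ NOT NULL,
    payload    JSONB,
    -- Unique constraints on a partitioned table must include the partition key.
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Partition pruning: only events_2026_01 is scanned.
SELECT count(*) FROM events
WHERE created_at >= '2026-01-15' AND created_at < '2026-01-20';

-- Data lifecycle: dropping a month is instant.
DROP TABLE events_2026_01;
```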

Benefits of time-based partitioning:

  • Query performance -- queries for recent data only scan recent partitions
  • Maintenance -- VACUUM and ANALYZE run faster on smaller partitions
  • Data lifecycle -- dropping old partitions is instant compared to deleting millions of rows
  • Backup efficiency -- back up only recent partitions for point-in-time recovery

Partitioning Considerations

Partitioning adds complexity. Every query must include the partition key in its WHERE clause for partition pruning to work. Unique constraints must include the partition key. Foreign keys referencing partitioned tables have limitations. Start partitioning only when you have measured that table size is causing performance degradation.


PostgreSQL Configuration Tuning

Default PostgreSQL configuration is conservative, designed to run on minimal hardware. Production workloads benefit from tuning key parameters.

| Parameter | Default | Recommended (16GB RAM server) | Purpose |
|---|---|---|---|
| shared_buffers | 128MB | 4GB (25% of RAM) | In-memory cache for table and index data |
| effective_cache_size | 4GB | 12GB (75% of RAM) | Planner hint for OS file cache availability |
| work_mem | 4MB | 64MB | Memory per sort/hash operation (careful with concurrency) |
| maintenance_work_mem | 64MB | 1GB | Memory for VACUUM, CREATE INDEX, ALTER TABLE |
| random_page_cost | 4.0 | 1.1 (SSD storage) | Cost estimate for random I/O (lower for SSD) |
| effective_io_concurrency | 1 | 200 (SSD storage) | Concurrent I/O operations for bitmap heap scans |
| max_connections | 100 | 200 (with PgBouncer) | Use connection pooling to keep this reasonable |

These settings must be tuned for your specific hardware and workload. Monitor pg_stat_bgwriter, pg_stat_activity, and pg_stat_user_tables to validate that changes improve performance.
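The table's values can be applied with ALTER SYSTEM, which writes them to postgresql.auto.conf. Most take effect on reload; shared_buffers requires a restart. Treat the numbers as a starting point for a dedicated 16GB SSD server, not a recommendation for your hardware.

```sql
ALTER SYSTEM SET shared_buffers = '4GB';           -- restart required
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;

SELECT pg_reload_conf();                           -- apply reloadable settings
```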


Frequently Asked Questions

How many indexes should a table have?

There is no fixed limit, but each index slows down INSERT, UPDATE, and DELETE operations because the index must be maintained. A good rule of thumb is to create indexes for columns that appear in WHERE, JOIN ON, and ORDER BY clauses of your most frequent queries. Use pg_stat_user_indexes to find unused indexes that can be dropped.
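A sketch of the unused-index check, sorted so the largest candidates for removal appear first:

```sql
-- Indexes never scanned since statistics were last reset.
SELECT schemaname, relname, indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```

Before dropping anything, confirm the index is not enforcing a unique constraint and that statistics cover a representative period of traffic.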

Should I use UUID or integer primary keys for performance?

Integer primary keys (BIGSERIAL) are faster for joins and indexing because they are smaller (8 bytes vs 16 bytes) and naturally ordered. UUIDs provide global uniqueness without coordination, which matters for distributed systems. For most applications, use UUIDs for external-facing identifiers and integers for internal joins.

When should I switch from a single database to read replicas?

When your read workload exceeds 70-80% of your database's capacity, or when reporting queries compete with transactional queries for resources. Read replicas handle the read load while the primary focuses on writes. This is typically needed at 5,000-10,000 concurrent users for a typical web application.

How do I handle slow queries in production without downtime?

Create indexes with the CONCURRENTLY option to avoid locking the table. Use pg_stat_statements to identify the slowest queries. Deploy query optimizations behind feature flags. For schema changes that rewrite tables, use tools like pg_repack to reorganize tables without locking.
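The first two steps can be sketched as (index and table names are hypothetical):

```sql
-- Build the index without blocking writes.
-- Note: cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

-- Top queries by cumulative execution time, from pg_stat_statements
-- (the extension must be in shared_preload_libraries and created in the DB).
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```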


What's Next

Database optimization is the foundation of platform performance. Start by enabling pg_stat_statements, identify your slowest queries, and work through them systematically with EXPLAIN ANALYZE. Add missing indexes, fix N+1 patterns, and consider partitioning for your largest tables.

For the broader performance picture, see our pillar guide on scaling your business platform from startup to enterprise. To learn about the next layer of optimization, read our guide on caching strategies with Redis, CDN, and HTTP caching.

ECOSIRE provides expert database optimization for PostgreSQL-backed platforms including Odoo ERP and custom applications. Contact us for a database performance audit.


Published by ECOSIRE — helping businesses scale with AI-powered solutions across Odoo ERP, Shopify eCommerce, and OpenClaw AI.


Written by

ECOSIRE Research and Development Team

Building enterprise-grade digital products at ECOSIRE. Sharing insights on Odoo integrations, e-commerce automation, and AI-powered business solutions.
