Data Warehouse Design: Star Schema for ERP & eCommerce Analytics

Learn dimensional modeling with star schema for ERP and eCommerce analytics covering fact tables, dimension tables, ETL patterns, and query optimization.


ECOSIRE Research and Development Team


March 15, 2026 · 11 min read · 2.4k words


Part of our Data Analytics & BI series

Read the full guide


Your ERP database is optimized for transactions --- inserting orders, updating inventory, processing payments. Your eCommerce platform is optimized for serving product pages and processing checkouts. Neither is optimized for answering the questions that drive business decisions: Which product categories are most profitable after returns? Which customer segments have growing lifetime value? Where are the bottlenecks in our supply chain?

That gap is what a data warehouse fills. And the star schema is the design pattern that makes analytical queries fast, intuitive, and maintainable.

Key Takeaways

  • Star schema separates business metrics (facts) from descriptive context (dimensions), making queries intuitive and fast
  • ERP and eCommerce analytics typically need four to six fact tables and eight to twelve dimension tables to cover core business questions
  • ETL pipelines should use incremental loading with slowly changing dimensions to handle historical analysis without reprocessing all data
  • A well-designed star schema reduces query complexity by 60 to 80 percent compared to querying normalized operational databases directly

Why Not Query the ERP Directly?

Before investing in a separate data warehouse, many companies try to run analytical queries against their operational database. This fails for four reasons.

Performance. Analytical queries scan millions of rows, compute aggregations, and join many tables. Running these against the production database slows down the ERP for every user. A report that scans six months of order data can lock tables and degrade checkout performance on your Shopify store.

Complexity. Operational databases are normalized --- designed to minimize data redundancy. A simple question like "total revenue by product category by month" might require joining eight tables in Odoo's PostgreSQL database. In a star schema, the same query joins two tables.

History. Operational systems overwrite data. When a customer changes their address, the old address is gone. When a product is recategorized, historical reports retroactively change. A data warehouse preserves history through slowly changing dimensions.

Multi-source. Mid-market companies typically run three to seven systems that contain business data. The data warehouse consolidates all of them. Our guide to ETL pipelines for ERP data covers extraction and loading in detail.


Star Schema Fundamentals

A star schema organizes data into two types of tables: fact tables and dimension tables. Fact tables sit at the center (the star's body), and dimension tables surround them (the star's points).

Fact Tables

Fact tables store measurable business events --- things that happened. Each row represents one event at the lowest meaningful grain.

Characteristics:

  • Contain numeric measures (quantity, amount, duration, count)
  • Contain foreign keys to dimension tables
  • Are typically the largest tables in the warehouse
  • Grow continuously as new events occur
  • Should be at the finest grain that supports business questions

Dimension Tables

Dimension tables store descriptive context --- the who, what, where, when, and how of business events.

Characteristics:

  • Contain textual attributes and hierarchies
  • Are relatively small (thousands to millions of rows, not billions)
  • Change slowly over time
  • Are denormalized for query simplicity
  • Provide the labels, filters, and groupings for reports

The Star Shape

                    Dim: Customer
                         |
    Dim: Product ---  Fact: Sales  --- Dim: Time
                         |
                    Dim: Location

A query like "total revenue by product category by quarter by region" joins the sales fact table to three dimension tables. No subqueries, no complex nested joins --- just straightforward star joins.
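That star join can be sketched end to end with Python's built-in sqlite3 module. The table and column names follow the schema described in this article, but the sample data is invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal dimension and fact tables, columns as sketched above
cur.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category_l1 TEXT);
CREATE TABLE dim_time     (date_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales   (date_key INTEGER, product_key INTEGER,
                           location_key INTEGER, net_amount REAL);
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Electronics"), (2, "Apparel")])
cur.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                [(20260114, "Q1", 2026), (20260420, "Q2", 2026)])
cur.executemany("INSERT INTO dim_location VALUES (?, ?)",
                [(1, "EMEA"), (2, "APAC")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(20260114, 1, 1, 500.0), (20260114, 2, 1, 120.0),
                 (20260420, 1, 2, 300.0)])

# The star join: one fact table, three dimensions, no subqueries
rows = cur.execute("""
    SELECT p.category_l1, t.quarter, l.region, SUM(f.net_amount) AS revenue
    FROM fact_sales f
    JOIN dim_product  p ON f.product_key  = p.product_key
    JOIN dim_time     t ON f.date_key     = t.date_key
    JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY p.category_l1, t.quarter, l.region
    ORDER BY revenue DESC
""").fetchall()
```

Every join follows the same pattern: fact foreign key to dimension primary key. Adding another grouping attribute means joining one more dimension, never restructuring the query.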


Designing Fact Tables for ERP and eCommerce

A typical mid-market company running Odoo ERP and Shopify eCommerce needs four to six fact tables to cover the core analytical use cases.

Fact: Sales

The sales fact table is the cornerstone. Each row represents one line item on a sales order.

| Column | Type | Description |
|--------|------|-------------|
| sale_key | BIGINT | Surrogate key |
| date_key | INT | FK to Dim: Time |
| customer_key | INT | FK to Dim: Customer |
| product_key | INT | FK to Dim: Product |
| location_key | INT | FK to Dim: Location |
| channel_key | INT | FK to Dim: Channel |
| salesperson_key | INT | FK to Dim: Employee |
| quantity | DECIMAL | Units sold |
| unit_price | DECIMAL | Price per unit |
| discount_amount | DECIMAL | Discount applied |
| tax_amount | DECIMAL | Tax charged |
| net_amount | DECIMAL | Revenue after discount, before tax |
| cost_amount | DECIMAL | Cost of goods sold |
| gross_margin | DECIMAL | net_amount minus cost_amount |

Grain: One row per order line item per day.

Fact: Inventory

Tracks inventory levels as periodic snapshots rather than events.

| Column | Type | Description |
|--------|------|-------------|
| inventory_key | BIGINT | Surrogate key |
| date_key | INT | FK to Dim: Time (snapshot date) |
| product_key | INT | FK to Dim: Product |
| warehouse_key | INT | FK to Dim: Warehouse |
| quantity_on_hand | DECIMAL | Current stock |
| quantity_reserved | DECIMAL | Allocated to orders |
| quantity_available | DECIMAL | On hand minus reserved |
| reorder_point | DECIMAL | Minimum before reorder |
| stock_value | DECIMAL | Quantity times unit cost |

Grain: One row per product per warehouse per day.

Fact: Production

For manufacturing companies, the production fact tracks work orders.

| Column | Type | Description |
|--------|------|-------------|
| production_key | BIGINT | Surrogate key |
| date_key | INT | FK to Dim: Time |
| product_key | INT | FK to Dim: Product |
| workcenter_key | INT | FK to Dim: Workcenter |
| planned_quantity | DECIMAL | Target output |
| actual_quantity | DECIMAL | Actual output |
| scrap_quantity | DECIMAL | Waste |
| planned_duration_hrs | DECIMAL | Expected time |
| actual_duration_hrs | DECIMAL | Actual time |
| yield_rate | DECIMAL | actual / planned quantity |

Grain: One row per work order per product per day.

Additional Fact Tables

  • Fact: Purchases --- procurement spend by vendor, product, and time.
  • Fact: Support Tickets --- ticket volume, response time, resolution time by agent, customer, and category.
  • Fact: Web Traffic --- page views, sessions, conversions by page, source, and campaign. Useful for marketing attribution analysis.

Designing Dimension Tables

Dimension tables provide the context that makes fact table numbers meaningful. The key principle is denormalization --- storing redundant data to simplify queries.

Dim: Time

The time dimension is present in every star schema. Pre-calculate calendar attributes to avoid complex date functions in queries.

| Column | Example | Purpose |
|--------|---------|---------|
| date_key | 20260315 | Integer key (YYYYMMDD) |
| full_date | 2026-03-15 | Date value |
| day_of_week | Sunday | Grouping |
| day_of_month | 15 | Grouping |
| week_of_year | 11 | Grouping |
| month_name | March | Grouping |
| month_number | 3 | Sorting |
| quarter | Q1 | Grouping |
| year | 2026 | Grouping |
| fiscal_quarter | FQ4 | Fiscal year alignment |
| fiscal_year | FY2026 | Fiscal year alignment |
| is_weekend | TRUE | Filtering |
| is_holiday | FALSE | Filtering |
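A time dimension like this can be generated once and extended yearly. Below is a minimal sketch in standard-library Python; it assumes an April fiscal year start (which matches the FQ4/FY2026 values shown for March 2026), and omits is_holiday, which requires a company-specific holiday calendar:

```python
from datetime import date, timedelta

def build_time_dimension(start: date, end: date, fiscal_start_month: int = 4):
    """One row per calendar day with pre-computed attributes.
    fiscal_start_month=4 (April) is an assumption; adjust to your calendar."""
    rows = []
    d = start
    while d <= end:
        fiscal_year = d.year + 1 if d.month >= fiscal_start_month else d.year
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),
            "full_date": d.isoformat(),
            "day_of_week": d.strftime("%A"),
            "day_of_month": d.day,
            "week_of_year": d.isocalendar()[1],  # ISO week number
            "month_name": d.strftime("%B"),
            "month_number": d.month,
            "quarter": f"Q{(d.month - 1) // 3 + 1}",
            "year": d.year,
            "fiscal_quarter": f"FQ{(d.month - fiscal_start_month) % 12 // 3 + 1}",
            "fiscal_year": f"FY{fiscal_year}",
            "is_weekend": d.weekday() >= 5,  # Saturday or Sunday
        })
        d += timedelta(days=1)
    return rows

march = build_time_dimension(date(2026, 3, 1), date(2026, 3, 31))
```

Because every attribute is pre-computed, reports group by month_name or fiscal_quarter with a plain join instead of date functions scattered through every query.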

Dim: Customer

Denormalize customer attributes from the CRM, accounting, and eCommerce systems into a single dimension.

| Column | Description |
|--------|-------------|
| customer_key | Surrogate key |
| customer_id | Natural key (Odoo ID) |
| customer_name | Full name |
| customer_email | Email address |
| customer_segment | Enterprise, SMB, Individual |
| industry | Manufacturing, Retail, Services |
| country | Country name |
| region | Geographic region |
| city | City |
| acquisition_source | Organic, Paid, Referral |
| acquisition_date | First purchase date |
| rfm_segment | Champion, Loyal, At-Risk |
| lifetime_value_tier | High, Medium, Low |

The rfm_segment and lifetime_value_tier columns are calculated fields derived from RFM analysis, updated periodically by the ETL pipeline.
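As a rough illustration of how such a calculated field might be derived, here is a toy bucketing function. The thresholds and the "Standard" fallback label are invented for this sketch, not production rules; real pipelines typically score recency, frequency, and monetary value into quintiles first and map score combinations to segments:

```python
from datetime import date

def rfm_segment(last_order: date, order_count: int, total_spend: float,
                as_of: date) -> str:
    """Toy RFM bucketing with illustrative, hard-coded thresholds."""
    recency_days = (as_of - last_order).days
    if recency_days <= 90 and order_count >= 10 and total_spend >= 5000:
        return "Champion"   # recent, frequent, high-value
    if recency_days <= 180 and order_count >= 4:
        return "Loyal"      # recent enough and repeat-buying
    if recency_days > 365:
        return "At-Risk"    # no purchase in over a year
    return "Standard"       # fallback label (assumption for this sketch)
```

The ETL pipeline would run this per customer on a schedule and overwrite the rfm_segment column, which is why the attribute is typically treated as Type 1 (see the next section).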

Dim: Product

| Column | Description |
|--------|-------------|
| product_key | Surrogate key |
| product_id | Natural key |
| product_name | Display name |
| sku | Stock keeping unit |
| category_l1 | Top-level category |
| category_l2 | Sub-category |
| category_l3 | Sub-sub-category |
| brand | Brand name |
| unit_cost | Current standard cost |
| list_price | Current list price |
| weight | Shipping weight |
| is_active | Currently for sale |


Slowly Changing Dimensions

When a customer moves from New York to London, what should the data warehouse do? The answer depends on the business question.

Type 1: Overwrite

Replace the old value with the new value. The customer's city becomes London, and all historical orders now show London. Use this when historical accuracy of the attribute does not matter.

Type 2: Add New Row

Create a new row for the customer with the new city, an effective date, and an expiration date. Historical orders still point to the old row (New York), and new orders point to the new row (London). This is the most common approach for attributes that affect analysis --- customer segment, employee department, product category.

| customer_key | customer_id | city | effective_date | expiration_date | is_current |
|--------------|-------------|------|----------------|-----------------|------------|
| 1001 | CUST-042 | New York | 2024-01-15 | 2026-02-28 | FALSE |
| 1002 | CUST-042 | London | 2026-03-01 | 9999-12-31 | TRUE |
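The expire-and-insert mechanics of a Type 2 change can be sketched in a few lines of Python. The list here is an in-memory stand-in for the dimension table; a real pipeline would run the same logic as an UPDATE plus INSERT in SQL:

```python
from datetime import date, timedelta

HIGH_DATE = "9999-12-31"  # sentinel expiration date for current rows

def apply_scd2_change(dim_rows, customer_id, changes, change_date, next_key):
    """Expire the current row for customer_id and append a new version."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            # Close out the old version the day before the change
            row["is_current"] = False
            row["expiration_date"] = (change_date - timedelta(days=1)).isoformat()
            # New version: copy old attributes, apply changes, new surrogate key
            new_row = {**row, **changes,
                       "customer_key": next_key,
                       "effective_date": change_date.isoformat(),
                       "expiration_date": HIGH_DATE,
                       "is_current": True}
            dim_rows.append(new_row)
            return new_row
    raise KeyError(f"no current row for {customer_id}")

dim_customer = [{"customer_key": 1001, "customer_id": "CUST-042",
                 "city": "New York", "effective_date": "2024-01-15",
                 "expiration_date": HIGH_DATE, "is_current": True}]
apply_scd2_change(dim_customer, "CUST-042", {"city": "London"},
                  date(2026, 3, 1), next_key=1002)
```

Facts loaded before the change keep pointing at key 1001 (New York); facts loaded afterward pick up key 1002 (London), which is exactly how the table above gets both rows.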

Type 3: Add New Column

Store both old and new values in separate columns. Useful when you need to compare before and after but do not need full history. Less common in practice.

For mid-market companies, use Type 2 for customer segment, employee department, product category, and geographic attributes. Use Type 1 for everything else to keep the warehouse simple.


ETL Design Patterns

The ETL (Extract, Transform, Load) process moves data from source systems into the warehouse. Design patterns that work well for ERP and eCommerce data include the following.

Incremental Loading

Rather than reloading all data on every run, track the last successfully loaded timestamp and only process records modified since then. Odoo's write_date field and Shopify's updated_at parameter make this straightforward.

1. Query source: SELECT * FROM sale_order_line WHERE write_date > last_load_timestamp
2. Transform: Map source fields to warehouse columns, look up dimension keys
3. Load: INSERT new rows, UPDATE changed rows (upsert)
4. Update: Set last_load_timestamp to current run start time
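The four steps above can be condensed into a small sketch. The in-memory lists and the simplified field mapping are assumptions for illustration; a real pipeline would query Odoo and upsert into warehouse tables, but the watermark logic is the same:

```python
from datetime import datetime

def incremental_load(source_rows, warehouse, last_load_ts, run_start):
    """One incremental run. source_rows mimic Odoo sale_order_line records
    with a write_date; warehouse is a dict keyed by the natural id."""
    # 1. Extract only rows modified since the last successful run
    changed = [r for r in source_rows if r["write_date"] > last_load_ts]
    for r in changed:
        # 2. Transform: map source fields to warehouse columns (simplified;
        #    real pipelines also look up dimension surrogate keys here)
        row = {"order_line_id": r["id"], "net_amount": r["price_subtotal"]}
        # 3. Load: upsert by natural key
        warehouse[r["id"]] = row
    # 4. Advance the watermark to the run's *start* time, so rows written
    #    while the run was executing are picked up by the next run
    return run_start

source = [
    {"id": 1, "write_date": datetime(2026, 3, 14, 9, 0), "price_subtotal": 100.0},
    {"id": 2, "write_date": datetime(2026, 3, 15, 5, 30), "price_subtotal": 50.0},
]
warehouse = {}
watermark = incremental_load(source, warehouse,
                             last_load_ts=datetime(2026, 3, 14, 12, 0),
                             run_start=datetime(2026, 3, 15, 6, 0))
```

Using the run's start time rather than its end time as the new watermark trades a few re-processed rows for a guarantee that no modification is ever skipped.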

Surrogate Key Management

Dimension tables use surrogate keys (auto-incrementing integers) instead of natural keys (Odoo IDs, Shopify IDs). This decouples the warehouse from source system key formats and handles multi-source consolidation where different systems have conflicting ID schemes.
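A minimal sketch of that key assignment, keeping the mapping in memory for illustration (a real pipeline persists it in a key-map table or lets the database assign keys on insert):

```python
class SurrogateKeyMap:
    """Assigns warehouse surrogate keys for (source_system, natural_id)
    pairs, so identical IDs from Odoo and Shopify never collide."""
    def __init__(self):
        self._keys = {}
        self._next = 1

    def key_for(self, source_system: str, natural_id: str) -> int:
        pair = (source_system, natural_id)
        if pair not in self._keys:         # first sighting: mint a new key
            self._keys[pair] = self._next
            self._next += 1
        return self._keys[pair]            # repeat lookups are stable

keymap = SurrogateKeyMap()
odoo_key = keymap.key_for("odoo", "42")
shopify_key = keymap.key_for("shopify", "42")  # same natural id, distinct key
```

Because the warehouse only ever stores the surrogate key, a source system migration or ID format change touches the key map, not every fact table.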

Late-Arriving Dimensions

Sometimes a fact record arrives before the corresponding dimension record --- an order references a new customer who has not been synced yet. Handle this with a placeholder dimension row that gets updated when the full dimension record arrives.
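A sketch of that placeholder pattern, again using an in-memory list as a stand-in for the dimension table; the "Unknown" attribute values and the is_placeholder flag are illustrative conventions:

```python
def customer_key_for_fact(dim_customer, natural_id, next_key):
    """Resolve a fact row's customer FK, inserting a placeholder dimension
    row when the fact arrives before the customer has been synced."""
    for row in dim_customer:
        if row["customer_id"] == natural_id:
            return row["customer_key"]
    # Placeholder keeps referential integrity; a later dimension load
    # overwrites the 'Unknown' attributes in place, so fact FKs stay valid
    dim_customer.append({
        "customer_key": next_key,
        "customer_id": natural_id,
        "customer_name": "Unknown",
        "is_placeholder": True,
    })
    return next_key

dim_customer = [{"customer_key": 1, "customer_id": "CUST-001",
                 "customer_name": "Acme GmbH", "is_placeholder": False}]
known_key = customer_key_for_fact(dim_customer, "CUST-001", next_key=2)
late_key = customer_key_for_fact(dim_customer, "CUST-099", next_key=2)
```

The important property is that the placeholder is updated in place when the real record arrives: fact rows never need to be re-keyed.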

Refresh Scheduling

| Data Type | Refresh Frequency | Rationale |
|-----------|-------------------|-----------|
| Sales transactions | Every 15-60 minutes | Near-real-time revenue tracking |
| Inventory snapshots | Every 4-6 hours | Balance accuracy vs. database load |
| Customer dimensions | Daily | Changes are infrequent |
| Product dimensions | Daily | Changes are infrequent |
| Financial data | Daily (after close) | Depends on accounting workflows |
| Marketing data | Every 1-4 hours | Campaign optimization needs fresher data |

For real-time requirements, see our guide to streaming analytics.


Query Performance Optimization

A well-designed star schema already performs well because of its simple join patterns. Additional optimizations include the following.

Indexes. Create indexes on all dimension foreign keys in fact tables and on commonly filtered dimension attributes (date ranges, customer segments, product categories).

Materialized views. Pre-aggregate common queries: daily revenue by product category, weekly inventory levels by warehouse, monthly customer acquisition by channel. Refresh materialized views after each ETL load.

Partitioning. Partition large fact tables by date (monthly or quarterly). Queries that filter by date range scan only the relevant partitions.

Column statistics. Keep PostgreSQL statistics up to date with ANALYZE after bulk loads so the query planner makes optimal decisions.

These optimizations support the self-service BI experience where business users run ad-hoc queries without performance concerns.


Frequently Asked Questions

How big does the company need to be to justify a data warehouse?

There is no minimum size, but the investment becomes worthwhile when you have multiple data sources that need to be combined for analysis, when operational database queries are slowing down production systems, or when you spend more than 10 hours per week on manual data gathering and report creation. Most companies with 30 or more employees and at least two systems (ERP plus eCommerce) benefit from a warehouse.

Should we use a cloud data warehouse like Snowflake or BigQuery?

For mid-market companies, PostgreSQL handles most analytical workloads well and costs significantly less. Cloud warehouses like Snowflake become attractive when your data exceeds 1 TB, when you need to separate compute from storage for cost optimization, or when you have complex data sharing requirements across organizations. Start with PostgreSQL and migrate when you outgrow it.

How long does it take to build a data warehouse?

A minimum viable warehouse with one fact table (sales), four dimension tables, and an ETL pipeline connecting Odoo and Shopify takes four to eight weeks for an experienced team. Adding fact tables, slowly changing dimensions, and data quality monitoring takes another four to eight weeks per fact table. Plan for three to six months for a comprehensive warehouse covering all major business areas.


What Is Next

A well-designed star schema is the foundation for every analytics capability --- from self-service dashboards to predictive models to embedded analytics. It is part of a broader BI strategy that transforms how your company makes decisions.

ECOSIRE builds data warehouses and analytics pipelines for companies running Odoo, Shopify, and GoHighLevel. Our Odoo consultancy team designs warehouse schemas tailored to your business model, and our OpenClaw AI services layer predictive analytics on top.

Contact us to discuss your data warehouse architecture.


Published by ECOSIRE --- helping businesses scale with AI-powered solutions across Odoo ERP, Shopify eCommerce, and OpenClaw AI.


Written by

ECOSIRE Research and Development Team

Building enterprise digital products at ECOSIRE. Insights into Odoo integrations, eCommerce automation, and AI-powered business solutions.
