Data Warehouse Design: Star Schema for ERP & eCommerce Analytics
Your ERP database is optimized for transactions --- inserting orders, updating inventory, processing payments. Your eCommerce platform is optimized for serving product pages and processing checkouts. Neither is optimized for answering the questions that drive business decisions: Which product categories are most profitable after returns? Which customer segments have growing lifetime value? Where are the bottlenecks in our supply chain?
That gap is what a data warehouse fills. And the star schema is the design pattern that makes analytical queries fast, intuitive, and maintainable.
Key Takeaways
- Star schema separates business metrics (facts) from descriptive context (dimensions), making queries intuitive and fast
- ERP and eCommerce analytics typically need four to six fact tables and eight to twelve dimension tables to cover core business questions
- ETL pipelines should use incremental loading with slowly changing dimensions to handle historical analysis without reprocessing all data
- A well-designed star schema reduces query complexity by 60 to 80 percent compared to querying normalized operational databases directly
Why Not Query the ERP Directly?
Before investing in a separate data warehouse, many companies try to run analytical queries against their operational database. This fails for three reasons.
Performance. Analytical queries scan millions of rows, compute aggregations, and join many tables. Running these against the production database slows down the ERP for every user. A report that scans six months of order data can lock tables and degrade checkout performance on your Shopify store.
Complexity. Operational databases are normalized --- designed to minimize data redundancy. A simple question like "total revenue by product category by month" might require joining eight tables in Odoo's PostgreSQL database. In a star schema, the same query joins two tables.
History. Operational systems overwrite data. When a customer changes their address, the old address is gone. When a product is recategorized, historical reports retroactively change. A data warehouse preserves history through slowly changing dimensions.
Multi-source. Mid-market companies typically run three to seven systems that contain business data. The data warehouse consolidates all of them. Our guide to ETL pipelines for ERP data covers extraction and loading in detail.
Star Schema Fundamentals
A star schema organizes data into two types of tables: fact tables and dimension tables. Fact tables sit at the center (the star's body), and dimension tables surround them (the star's points).
Fact Tables
Fact tables store measurable business events --- things that happened. Each row represents one event at the lowest meaningful grain.
Characteristics:
- Contain numeric measures (quantity, amount, duration, count)
- Contain foreign keys to dimension tables
- Are typically the largest tables in the warehouse
- Grow continuously as new events occur
- Should be at the finest grain that supports business questions
Dimension Tables
Dimension tables store descriptive context --- the who, what, where, when, and how of business events.
Characteristics:
- Contain textual attributes and hierarchies
- Are relatively small (thousands to millions of rows, not billions)
- Change slowly over time
- Are denormalized for query simplicity
- Provide the labels, filters, and groupings for reports
The Star Shape
```
              Dim: Customer
                    |
Dim: Product --- Fact: Sales --- Dim: Time
                    |
              Dim: Location
```
A query like "total revenue by product category by quarter by region" joins the sales fact table to three dimension tables. No subqueries, no complex nested joins --- just straightforward star joins.
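To make the join pattern concrete, here is a minimal runnable sketch of that exact query. It uses an in-memory SQLite database as a stand-in for the warehouse, and the table and column names are simplified versions of the ones defined later in this guide (only the columns the query needs).

```python
import sqlite3

# In-memory SQLite database with a minimal star schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category_l1 TEXT);
CREATE TABLE dim_time     (date_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product,
    date_key     INTEGER REFERENCES dim_time,
    location_key INTEGER REFERENCES dim_location,
    net_amount   REAL
);
""")
con.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Furniture"), (2, "Electronics")])
con.execute("INSERT INTO dim_time VALUES (20260315, 'Q1', 2026)")
con.execute("INSERT INTO dim_location VALUES (1, 'EMEA')")
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(1, 20260315, 1, 100.0), (2, 20260315, 1, 250.0)])

# "Total revenue by product category by quarter by region":
# one fact table, three star joins, one GROUP BY. No subqueries.
rows = con.execute("""
    SELECT p.category_l1, t.year, t.quarter, l.region,
           SUM(f.net_amount) AS revenue
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_time     t ON t.date_key     = f.date_key
    JOIN dim_location l ON l.location_key = f.location_key
    GROUP BY p.category_l1, t.year, t.quarter, l.region
    ORDER BY p.category_l1
""").fetchall()
```

The same shape applies to every question in this guide: pick the fact table at the right grain, join the dimensions you want to slice by, and aggregate.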
Designing Fact Tables for ERP and eCommerce
A typical mid-market company running Odoo ERP and Shopify eCommerce needs four to six fact tables to cover the core analytical use cases.
Fact: Sales
The sales fact table is the cornerstone. Each row represents one line item on a sales order.
| Column | Type | Description |
|--------|------|-------------|
| sale_key | BIGINT | Surrogate key |
| date_key | INT | FK to Dim: Time |
| customer_key | INT | FK to Dim: Customer |
| product_key | INT | FK to Dim: Product |
| location_key | INT | FK to Dim: Location |
| channel_key | INT | FK to Dim: Channel |
| salesperson_key | INT | FK to Dim: Employee |
| quantity | DECIMAL | Units sold |
| unit_price | DECIMAL | Price per unit |
| discount_amount | DECIMAL | Discount applied |
| tax_amount | DECIMAL | Tax charged |
| net_amount | DECIMAL | Revenue after discount, before tax |
| cost_amount | DECIMAL | Cost of goods sold |
| gross_margin | DECIMAL | net_amount minus cost_amount |
Grain: One row per order line item per day.
Fact: Inventory
Tracks inventory levels as periodic snapshots rather than events.
| Column | Type | Description |
|--------|------|-------------|
| inventory_key | BIGINT | Surrogate key |
| date_key | INT | FK to Dim: Time (snapshot date) |
| product_key | INT | FK to Dim: Product |
| warehouse_key | INT | FK to Dim: Warehouse |
| quantity_on_hand | DECIMAL | Current stock |
| quantity_reserved | DECIMAL | Allocated to orders |
| quantity_available | DECIMAL | On hand minus reserved |
| reorder_point | DECIMAL | Minimum before reorder |
| stock_value | DECIMAL | Quantity times unit cost |
Grain: One row per product per warehouse per day.
Fact: Production
For manufacturing companies, the production fact tracks work orders.
| Column | Type | Description |
|--------|------|-------------|
| production_key | BIGINT | Surrogate key |
| date_key | INT | FK to Dim: Time |
| product_key | INT | FK to Dim: Product |
| workcenter_key | INT | FK to Dim: Workcenter |
| planned_quantity | DECIMAL | Target output |
| actual_quantity | DECIMAL | Actual output |
| scrap_quantity | DECIMAL | Waste |
| planned_duration_hrs | DECIMAL | Expected time |
| actual_duration_hrs | DECIMAL | Actual time |
| yield_rate | DECIMAL | actual / planned quantity |
Grain: One row per work order per product per day.
Additional Fact Tables
- Fact: Purchases --- procurement spend by vendor, product, and time.
- Fact: Support Tickets --- ticket volume, response time, resolution time by agent, customer, and category.
- Fact: Web Traffic --- page views, sessions, conversions by page, source, and campaign. Useful for marketing attribution analysis.
Designing Dimension Tables
Dimension tables provide the context that makes fact table numbers meaningful. The key principle is denormalization --- storing redundant data to simplify queries.
Dim: Time
The time dimension is present in every star schema. Pre-calculate calendar attributes to avoid complex date functions in queries.
| Column | Example | Purpose |
|--------|---------|---------|
| date_key | 20260315 | Integer key (YYYYMMDD) |
| full_date | 2026-03-15 | Date value |
| day_of_week | Sunday | Grouping |
| day_of_month | 15 | Grouping |
| week_of_year | 11 | Grouping |
| month_name | March | Grouping |
| month_number | 3 | Sorting |
| quarter | Q1 | Grouping |
| year | 2026 | Grouping |
| fiscal_quarter | FQ4 | Fiscal year alignment |
| fiscal_year | FY2026 | Fiscal year alignment |
| is_weekend | TRUE | Filtering |
| is_holiday | FALSE | Filtering |
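Because every attribute is deterministic, the time dimension can be generated once in code rather than maintained by hand. The sketch below is one way to do it; the April fiscal-year start is an assumption chosen to reproduce the sample row (March falling in FQ4), and is_holiday is omitted because it requires a company-specific holiday calendar.

```python
from datetime import date, timedelta

def build_time_dimension(start: date, end: date, fiscal_start_month: int = 4):
    """One row per calendar day with pre-computed attributes.

    fiscal_start_month = 4 (April) is an assumption; adjust to your
    accounting calendar.
    """
    rows = []
    d = start
    while d <= end:
        # An April-start fiscal year is labeled by its ending calendar year.
        fy = d.year + 1 if d.month >= fiscal_start_month else d.year
        months_into_fy = (d.month - fiscal_start_month) % 12
        rows.append({
            "date_key": d.year * 10000 + d.month * 100 + d.day,  # YYYYMMDD
            "full_date": d.isoformat(),
            "day_of_week": d.strftime("%A"),
            "day_of_month": d.day,
            "week_of_year": d.isocalendar()[1],  # ISO week number
            "month_name": d.strftime("%B"),
            "month_number": d.month,
            "quarter": f"Q{(d.month - 1) // 3 + 1}",
            "year": d.year,
            "fiscal_quarter": f"FQ{months_into_fy // 3 + 1}",
            "fiscal_year": f"FY{fy}",
            "is_weekend": d.weekday() >= 5,  # Saturday=5, Sunday=6
        })
        d += timedelta(days=1)
    return rows
```

Generate ten or twenty years in one batch; at 365 rows per year the table stays trivially small.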
Dim: Customer
Denormalize customer attributes from the CRM, accounting, and eCommerce systems into a single dimension.
| Column | Description |
|--------|-------------|
| customer_key | Surrogate key |
| customer_id | Natural key (Odoo ID) |
| customer_name | Full name |
| customer_email | Email address |
| customer_segment | Enterprise, SMB, Individual |
| industry | Manufacturing, Retail, Services |
| country | Country name |
| region | Geographic region |
| city | City |
| acquisition_source | Organic, Paid, Referral |
| acquisition_date | First purchase date |
| rfm_segment | Champion, Loyal, At-Risk |
| lifetime_value_tier | High, Medium, Low |
The rfm_segment and lifetime_value_tier columns are calculated fields derived from RFM analysis, updated periodically by the ETL pipeline.
Dim: Product
| Column | Description |
|--------|-------------|
| product_key | Surrogate key |
| product_id | Natural key |
| product_name | Display name |
| sku | Stock keeping unit |
| category_l1 | Top-level category |
| category_l2 | Sub-category |
| category_l3 | Sub-sub-category |
| brand | Brand name |
| unit_cost | Current standard cost |
| list_price | Current list price |
| weight | Shipping weight |
| is_active | Currently for sale |
Slowly Changing Dimensions
When a customer moves from New York to London, what should the data warehouse do? The answer depends on the business question.
Type 1: Overwrite
Replace the old value with the new value. The customer's city becomes London, and all historical orders now show London. Use this when historical accuracy of the attribute does not matter.
Type 2: Add New Row
Create a new row for the customer with the new city, an effective date, and an expiration date. Historical orders still point to the old row (New York), and new orders point to the new row (London). This is the most common approach for attributes that affect analysis --- customer segment, employee department, product category.
| customer_key | customer_id | city | effective_date | expiration_date | is_current |
|--------------|-------------|------|----------------|-----------------|------------|
| 1001 | CUST-042 | New York | 2024-01-15 | 2026-02-28 | FALSE |
| 1002 | CUST-042 | London | 2026-03-01 | 9999-12-31 | TRUE |
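The expire-and-insert mechanics can be sketched in a few lines. This is a simplified in-memory version (the dimension is a list of row dicts and only city is tracked); a real ETL job would run the same logic as an UPDATE plus INSERT against the warehouse.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # open-ended expiration for the current row

def apply_scd2_change(dimension: list, incoming: dict,
                      effective: date, next_key: int) -> int:
    """Apply a Type 2 change for one customer.

    `dimension` is the dimension table as a list of row dicts, `incoming`
    carries the natural key plus the tracked attribute (just city here),
    and `next_key` is the next unused surrogate key. Returns the next
    unused key after the operation.
    """
    current = next((r for r in dimension
                    if r["customer_id"] == incoming["customer_id"]
                    and r["is_current"]), None)
    if current is not None and current["city"] == incoming["city"]:
        return next_key  # tracked attribute unchanged; nothing to do
    if current is not None:
        # Expire the old version the day before the change takes effect.
        current["expiration_date"] = effective - timedelta(days=1)
        current["is_current"] = False
    dimension.append({
        "customer_key": next_key,
        "customer_id": incoming["customer_id"],
        "city": incoming["city"],
        "effective_date": effective,
        "expiration_date": HIGH_DATE,
        "is_current": True,
    })
    return next_key + 1
```

Fact rows loaded before the change keep their old surrogate key, so historical reports still show New York; fact rows loaded afterwards pick up the new key and show London.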
Type 3: Add New Column
Store both old and new values in separate columns. Useful when you need to compare before and after but do not need full history. Less common in practice.
For mid-market companies, use Type 2 for customer segment, employee department, product category, and geographic attributes. Use Type 1 for everything else to keep the warehouse simple.
ETL Design Patterns
The ETL (Extract, Transform, Load) process moves data from source systems into the warehouse. Design patterns that work well for ERP and eCommerce data include the following.
Incremental Loading
Rather than reloading all data on every run, track the last successfully loaded timestamp and only process records modified since then. Odoo's write_date field and Shopify's updated_at parameter make this straightforward.
1. Query source: SELECT * FROM sale_order_line WHERE write_date > last_load_timestamp
2. Transform: Map source fields to warehouse columns, look up dimension keys
3. Load: INSERT new rows, UPDATE changed rows (upsert)
4. Update: Set last_load_timestamp to current run start time
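The four steps above can be sketched as one function. SQLite stands in for both the source and the warehouse so the example is self-contained; in practice the source would be Odoo's PostgreSQL and the filter column its write_date field. The column names and the direct use of product_id as a dimension key are simplifications for brevity (a real pipeline would look up surrogate keys during the transform step).

```python
import sqlite3
from datetime import datetime, timezone

def run_incremental_load(source: sqlite3.Connection,
                         warehouse: sqlite3.Connection,
                         last_load_timestamp: str) -> str:
    """One incremental ETL run; returns the new watermark."""
    run_start = datetime.now(timezone.utc).isoformat()

    # 1. Extract: only rows modified since the last successful run.
    changed = source.execute(
        "SELECT id, product_id, qty, amount, write_date "
        "FROM sale_order_line WHERE write_date > ?",
        (last_load_timestamp,)).fetchall()

    # 2 + 3. Transform and load: map columns, upsert on the natural key.
    for row_id, product_id, qty, amount, _ in changed:
        warehouse.execute(
            "INSERT INTO fact_sales (source_id, product_key, quantity, net_amount) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(source_id) DO UPDATE SET "
            "quantity = excluded.quantity, net_amount = excluded.net_amount",
            (row_id, product_id, qty, amount))
    warehouse.commit()

    # 4. Advance the watermark to this run's START time (not its end), so
    # rows modified while the run was executing are picked up next time.
    return run_start
```

Using the run's start time as the new watermark means a few rows may be processed twice, but the upsert makes reprocessing harmless; missing rows is the failure mode to avoid.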
Surrogate Key Management
Dimension tables use surrogate keys (auto-incrementing integers) instead of natural keys (Odoo IDs, Shopify IDs). This decouples the warehouse from source system key formats and handles multi-source consolidation where different systems have conflicting ID schemes.
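A minimal sketch of the idea, assuming a key generator that scopes natural IDs by source system so an Odoo ID 42 and a Shopify ID 42 never collide (in a real warehouse this mapping lives in the dimension table itself or a key-map table, not in memory):

```python
class SurrogateKeyMap:
    """Assign warehouse-wide surrogate keys to (source_system, natural_id)
    pairs. The same pair always gets the same key; different systems with
    overlapping ID ranges get distinct keys."""

    def __init__(self):
        self._keys = {}   # (source_system, natural_id) -> surrogate key
        self._next = 1

    def key_for(self, source_system: str, natural_id) -> int:
        pair = (source_system, str(natural_id))
        if pair not in self._keys:
            self._keys[pair] = self._next
            self._next += 1
        return self._keys[pair]
```

Fact tables then store only the surrogate key, so a source system migration (or a second eCommerce storefront) never forces a rewrite of historical facts.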
Late-Arriving Dimensions
Sometimes a fact record arrives before the corresponding dimension record --- an order references a new customer who has not been synced yet. Handle this with a placeholder dimension row that gets updated when the full dimension record arrives.
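One way to sketch the placeholder pattern, using an in-memory dict keyed by natural ID (the function and field names are illustrative, not from any particular library):

```python
import itertools

def customer_key_for(dimension: dict, natural_id: str, keygen) -> int:
    """Return the surrogate key for natural_id, creating a placeholder
    row first if the dimension record has not arrived from the source
    system yet. `dimension` maps natural IDs to row dicts; `keygen` is
    an iterator of unused surrogate keys (e.g. itertools.count)."""
    if natural_id not in dimension:
        dimension[natural_id] = {
            "customer_key": next(keygen),
            "customer_id": natural_id,
            "customer_name": "Unknown (pending sync)",
            "is_placeholder": True,
        }
    return dimension[natural_id]["customer_key"]

def upsert_customer(dimension: dict, record: dict, keygen) -> None:
    """When the full record finally arrives, fill in the placeholder in
    place so existing fact rows keep pointing at the same surrogate key."""
    key = customer_key_for(dimension, record["customer_id"], keygen)
    dimension[record["customer_id"]] = {
        **record, "customer_key": key, "is_placeholder": False,
    }
```

The essential property is that the surrogate key is stable: facts loaded against the placeholder never need to be revisited once the real dimension record lands.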
Refresh Scheduling
| Data Type | Refresh Frequency | Rationale |
|-----------|-------------------|-----------|
| Sales transactions | Every 15-60 minutes | Near-real-time revenue tracking |
| Inventory snapshots | Every 4-6 hours | Balance accuracy vs. database load |
| Customer dimensions | Daily | Changes are infrequent |
| Product dimensions | Daily | Changes are infrequent |
| Financial data | Daily (after close) | Depends on accounting workflows |
| Marketing data | Every 1-4 hours | Campaign optimization needs fresher data |
For real-time requirements, see our guide to streaming analytics.
Query Performance Optimization
A well-designed star schema already performs well because of its simple join patterns. Additional optimizations include the following.
Indexes. Create indexes on all dimension foreign keys in fact tables and on commonly filtered dimension attributes (date ranges, customer segments, product categories).
Materialized views. Pre-aggregate common queries: daily revenue by product category, weekly inventory levels by warehouse, monthly customer acquisition by channel. Refresh materialized views after each ETL load.
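As a sketch of the refresh-after-load pattern: SQLite (used here so the example runs anywhere) has no materialized views, so the summary is rebuilt as a plain table, which is itself a common and perfectly serviceable approach. On PostgreSQL you would instead use CREATE MATERIALIZED VIEW once and REFRESH MATERIALIZED VIEW after each load.

```python
import sqlite3

def refresh_daily_revenue_summary(wh: sqlite3.Connection) -> None:
    """Rebuild a pre-aggregated daily-revenue table after each ETL load.
    Dashboards query agg_daily_revenue instead of scanning fact_sales."""
    wh.executescript("""
        DROP TABLE IF EXISTS agg_daily_revenue;
        CREATE TABLE agg_daily_revenue AS
        SELECT date_key, product_key, SUM(net_amount) AS revenue
        FROM fact_sales
        GROUP BY date_key, product_key;
    """)
```
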
Partitioning. Partition large fact tables by date (monthly or quarterly). Queries that filter by date range scan only the relevant partitions.
Column statistics. Keep PostgreSQL statistics up to date with ANALYZE after bulk loads so the query planner makes optimal decisions.
These optimizations support the self-service BI experience where business users run ad-hoc queries without performance concerns.
Frequently Asked Questions
How big does the company need to be to justify a data warehouse?
There is no minimum size, but the investment becomes worthwhile when you have multiple data sources that need to be combined for analysis, when operational database queries are slowing down production systems, or when you spend more than 10 hours per week on manual data gathering and report creation. Most companies with 30 or more employees and at least two systems (ERP plus eCommerce) benefit from a warehouse.
Should we use a cloud data warehouse like Snowflake or BigQuery?
For mid-market companies, PostgreSQL handles most analytical workloads well and costs significantly less. Cloud warehouses like Snowflake become attractive when your data exceeds 1 TB, when you need to separate compute from storage for cost optimization, or when you have complex data sharing requirements across organizations. Start with PostgreSQL and migrate when you outgrow it.
How long does it take to build a data warehouse?
A minimum viable warehouse with one fact table (sales), four dimension tables, and an ETL pipeline connecting Odoo and Shopify takes four to eight weeks for an experienced team. Adding fact tables, slowly changing dimensions, and data quality monitoring takes another four to eight weeks per fact table. Plan for three to six months for a comprehensive warehouse covering all major business areas.
What Is Next
A well-designed star schema is the foundation for every analytics capability --- from self-service dashboards to predictive models to embedded analytics. It is part of a broader BI strategy that transforms how your company makes decisions.
ECOSIRE builds data warehouses and analytics pipelines for companies running Odoo, Shopify, and GoHighLevel. Our Odoo consultancy team designs warehouse schemas tailored to your business model, and our OpenClaw AI services layer predictive analytics on top.
Contact us to discuss your data warehouse architecture.
Published by ECOSIRE --- helping businesses scale with AI-powered solutions across Odoo ERP, Shopify eCommerce, and OpenClaw AI.
Author
ECOSIRE Research and Development Team
Building enterprise-grade digital products at ECOSIRE. Sharing insights on Odoo integration, eCommerce automation, and AI-driven business solutions.