Power BI Dataflows: Centralized Data Preparation
Every Power BI environment eventually develops the same problem: dozens of reports, each with slightly different versions of the "same" data preparation logic. Customer data cleaned and standardized one way in the sales dashboard, slightly differently in the marketing report, and differently again in the executive summary. When the source system changes — a column is renamed, a new region is added — updating every report individually is a maintenance nightmare.
Power BI dataflows solve this by moving data preparation from individual report files (Power BI Desktop .pbix) to a shared, centralized layer in the Power BI service. Logic written once in a dataflow is available to any report, by any developer, with a consistent result. This guide covers dataflow architecture, implementation patterns, and the advanced capabilities that make dataflows the foundation of a governed Power BI environment.
Key Takeaways
- Dataflows centralize Power Query ETL logic in the Power BI service, eliminating duplication across reports
- Dataflows produce standardized entities (tables) that multiple reports consume from a single source
- Linked entities allow dataflows to reference tables from other dataflows, enabling layered architecture
- Computed entities perform transformations on linked entities within the Premium dataflow engine
- Dataflow Gen2 in Microsoft Fabric extends dataflows with staging and output destinations
- AI insights (Premium) apply ML models to dataflow output — anomaly detection, sentiment analysis, key phrase extraction
- Incremental refresh on dataflows keeps large transformation outputs current without full reprocessing
- Dataflow governance controls who can create, edit, and consume dataflows through workspace permissions
Why Dataflows Exist
To understand dataflows, it's helpful to visualize the problem they solve.
Without dataflows (the common pattern):
- Developer A builds Report 1, connects to Salesforce, writes 40 Power Query steps to clean and transform the data
- Developer B builds Report 2, also connects to Salesforce, writes 38 similar Power Query steps (slightly different)
- Developer C builds Report 3, same source, 45 steps
- The Salesforce API credentials are stored in three different files
- The "customer segment" categorization logic is implemented three slightly different ways
- When the API changes, three files need updating
- All three reports run their own scheduled refreshes against the Salesforce API
With dataflows:
- Data engineer builds one Dataflow with the 40 Power Query steps
- Reports 1, 2, and 3 all connect to the dataflow entity as their data source
- One API credential, one transformation logic, one scheduled refresh
- When the API changes, one dataflow is updated
This is the fundamental value proposition: dataflows are the ETL layer between source systems and consuming reports.
Dataflow Architecture Patterns
Well-designed dataflow architectures follow a layered pattern analogous to a data warehouse medallion architecture:
Bronze layer (Staging dataflow): Extracts data from source systems with minimal transformation — rename columns, fix types, filter obviously invalid records. This layer captures raw data in a standardized format.
Silver layer (Core dataflow): Applies business logic — calculates derived fields, applies reference data lookups, deduplicates records, applies organization-specific business rules. This layer produces the canonical representation of each business entity.
Gold layer (Reporting dataflow or semantic model): Aggregates and structures data for specific analytical use cases — pre-calculated aggregations, report-specific measures, time period calculations.
In Power BI, linked entities connect these layers: the Silver dataflow references entities from the Bronze dataflow using linked entities. The Gold layer references Silver entities. Reports connect to Gold layer entities.
This architecture means: if a source system changes, only the Bronze dataflow needs updating. The business logic in Silver and the reporting structure in Gold remain stable.
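As an illustration, a Silver-layer query can be little more than business logic applied on top of a Bronze linked entity. A minimal sketch, assuming a linked entity named RawCustomers and columns AnnualRevenue and CustomerID (all names hypothetical):

```m
let
    // RawCustomers is a linked entity referencing the Bronze dataflow's output
    Source = RawCustomers,
    // Silver-layer business logic: derive a segment and deduplicate
    AddedSegment = Table.AddColumn(
        Source,
        "Segment",
        each if [AnnualRevenue] >= 1000000 then "Enterprise" else "SMB",
        type text
    ),
    Deduplicated = Table.Distinct(AddedSegment, {"CustomerID"})
in
    Deduplicated
```

If the Bronze extraction changes, this query is untouched as long as the RawCustomers entity keeps its shape.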
Creating Your First Dataflow
Dataflows are created in the Power BI service (not Power BI Desktop). Navigate to a workspace → New → Dataflow.
The dataflow editing environment is Power Query Online — essentially the same Power Query interface as Power BI Desktop, but running in the browser and executing in Microsoft's cloud infrastructure.
Step 1: Define a data source
Click "Add new entities" → Choose a connector. All Power BI Desktop connectors are available in dataflows, plus some cloud-native connectors (Azure Data Factory integration, etc.).
For a SQL Server source:
Server: your-server.database.windows.net
Database: YourDatabase
Authentication: Organizational account or service principal
Step 2: Write transformation queries
The Power Query interface will look familiar: Applied Steps, the formula bar, and the data preview. Build your transformation logic exactly as in Power BI Desktop — filter rows, rename columns, merge with reference tables, apply custom logic.
For a customer data standardization query:
let
    Source = Sql.Database("server", "db"),
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    FilteredActive = Table.SelectRows(Customers, each [Status] = "Active"),
    RenamedColumns = Table.RenameColumns(FilteredActive, {
        {"cust_id", "CustomerID"},
        {"cust_nm", "CustomerName"},
        {"seg_cd", "SegmentCode"}
    }),
    // SegmentDefinitions is a separate query defined in the same dataflow
    SegmentLookup = Table.Join(
        RenamedColumns, "SegmentCode",
        SegmentDefinitions, "Code",
        JoinKind.LeftOuter
    ),
    RemovedDuplicates = Table.Distinct(SegmentLookup, {"CustomerID"})
in
    RemovedDuplicates
Step 3: Configure refresh schedule
Set the dataflow refresh schedule (up to 48× per day in Premium, 8× per day in Pro). The dataflow refresh runs the transformation queries against the source and writes results to Azure Data Lake Gen2 storage managed by Power BI.
Step 4: Connect reports to the dataflow
In Power BI Desktop: Get Data → Power Platform → Power BI Dataflows → navigate to the workspace → select the entity. The report connects to the dataflow entity's stored output, not the source system.
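In M terms, Power BI Desktop generates a navigation query using the PowerPlatform.Dataflows connector when you pick an entity. A sketch of what that generated query looks like, with hypothetical workspace, dataflow, and entity names:

```m
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    // Names below are hypothetical; Desktop fills them in from the navigator
    Workspace = Workspaces{[workspaceName = "Sales Analytics"]}[Data],
    Dataflow = Workspace{[dataflowName = "Core Customers"]}[Data],
    Entity = Dataflow{[entity = "Customers", version = ""]}[Data]
in
    Entity
```

Because the query navigates by name, renaming the workspace, dataflow, or entity breaks downstream reports — another reason to treat these names as a stable contract.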
Linked and Computed Entities (Premium)
Linked entities allow one dataflow to reference entities from another dataflow. This is how the layered architecture described above is implemented.
Creating a linked entity: In the Silver dataflow → New Entity → Link entities from other dataflows → select the Bronze entity.
The linked entity appears in the Silver dataflow as a virtual table pointing to the Bronze dataflow's output. You can add additional transformation steps on top of the linked entity — these additional steps execute in the dataflow engine, not at the source.
Computed entities are linked entities with additional Power Query transformations applied. They execute in the Premium dataflow engine's in-memory processing rather than at the source, providing significant performance advantages for complex transformations on large datasets.
Key distinction:
- Without Premium: Linked entities reference the other dataflow's data but all processing happens at query time against the source
- With Premium (computed entities): Transformations on linked entities run in Power BI's analytical engine using the cached data, not the source — dramatically faster for complex transformations
This is particularly valuable for transformations that are expensive to run at the source (joins across large tables, aggregations, window functions) but that need to happen before data reaches reports.
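A computed entity is simply a query whose source is a linked entity. A sketch of a daily sales aggregation, assuming a hypothetical linked entity OrdersRaw with OrderDate, Region, and SalesAmount columns:

```m
let
    // OrdersRaw is a linked entity; with Premium, this aggregation runs
    // in the dataflow engine against cached data rather than at the source
    Source = OrdersRaw,
    Grouped = Table.Group(
        Source,
        {"OrderDate", "Region"},
        {
            {"TotalSales", each List.Sum([SalesAmount]), type number},
            {"OrderCount", each Table.RowCount(_), Int64.Type}
        }
    )
in
    Grouped
```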
Incremental Refresh for Dataflows
Like datasets, dataflows support incremental refresh to process only new and changed records rather than reloading all data on every cycle.
Requirements:
- Premium workspace
- Datetime column in the source query
- RangeStart and RangeEnd parameters defined in the dataflow query
The configuration is identical to dataset incremental refresh: define parameters, apply date filter in the query, configure the incremental refresh policy on the entity. The dataflow engine creates partitions covering the historical window and refreshes only the recent window on each cycle.
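In the query itself, the pattern looks like this, assuming RangeStart and RangeEnd have been defined as datetime parameters in the dataflow and the source has an OrderDate column (table and column names hypothetical):

```m
let
    Source = Sql.Database("your-server.database.windows.net", "YourDatabase"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // This filter folds back to SQL, so each refresh fetches only one window
    FilteredByRange = Table.SelectRows(
        Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    FilteredByRange
```

Note the asymmetric comparison (>= and <): it prevents rows landing in two partitions when a value falls exactly on a boundary.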
Incremental refresh for dataflows is most valuable when:
- Transformations are computationally expensive and you don't want to rerun them on unchanged historical data
- The source query is slow due to large table sizes, and limiting the query window dramatically reduces fetch time
- Storage cost matters — incremental partitions allow historical data to remain stored without re-querying
For most small-to-medium dataflows (under 10 million rows), full refresh is simpler and sufficient. Incremental refresh becomes important when refresh times exceed 30–60 minutes.
AI Insights in Dataflows (Premium)
Power BI Premium dataflows include AI Insights — pre-built machine learning functions available directly in Power Query Online.
Available AI functions:
| Function | Description | Use Case |
|---|---|---|
| Text Analytics: Sentiment Score | Returns positive/negative/neutral + score | Customer feedback, reviews |
| Text Analytics: Key Phrases | Extracts main topics from text | Support tickets, comments |
| Text Analytics: Language Detection | Identifies language of text | Multilingual content classification |
| Text Analytics: Named Entity Recognition | Identifies persons, places, organizations | Document processing |
| Vision: Tag Image | Labels objects in images | Product catalog classification |
| Vision: Describe Image | Generates image description | Content moderation |
| AutoML (custom models) | Apply trained Azure ML models | Any custom classification/regression |
These functions are invoked as custom column transformations in the Power Query editor. A sentiment scoring step on a CustomerComment column:
= Table.AddColumn(
    Source,
    "Sentiment",
    each TextAnalytics.SentimentScore([CustomerComment]),
    type number
)
The AI function calls Azure Cognitive Services behind the scenes; the result (a sentiment score from 0 to 1) appears as a new column. This enables enriched datasets without requiring a separate data science pipeline.
Dataflow Governance and Security
As the central data preparation layer, dataflows require governance controls to ensure quality and prevent unauthorized changes.
Workspace permissions control who can create and edit dataflows. Dataflow creation requires Contributor or Admin access to the workspace. Consumers (report developers who connect to the dataflow) need only Viewer access. This role separation ensures that business logic in the Bronze and Silver layers is maintained by authorized data engineers.
Certification marks dataflows as approved by a central authority. A certified dataflow is highlighted in the data source picker in Power BI Desktop, directing report developers to the authoritative, governed data source rather than building their own from scratch.
Sensitivity labels apply Microsoft Purview Information Protection labels to dataflows containing sensitive data. A dataflow containing PII receives a "Confidential" label, which cascades to any report consuming that dataflow.
Data lineage: the lineage view in the Power BI service shows the flow from source → dataflow → dataset → report. When a source system changes, the lineage view helps identify all downstream reports that may be affected.
Monitoring dataflow refresh: each dataflow's refresh history in the Power BI service shows run times, durations, and failures. Setting up alerts via Power Automate for failed dataflow refreshes ensures that data freshness problems are caught immediately rather than discovered when a user reports stale data.
Dataflows vs. Data Warehouse
Dataflows are not a replacement for a dedicated data warehouse — they're a complement. Understanding where each fits prevents architectural mistakes.
| Capability | Dataflows | Data Warehouse |
|---|---|---|
| Power Query transformations | Native | Not native |
| SQL transformations | Not supported | Native |
| Complex joins across large tables | Limited | Optimized |
| Storage cost | Managed, fixed pricing | Variable |
| Version control (dbt, GitHub) | Not supported | Excellent |
| Non-Power BI consumers (Tableau, Python) | Limited | Yes |
| Serving multiple BI tools | Power BI only | Any tool |
| Enterprise governance maturity | Moderate | High |
Organizations with mature data engineering practices should use a data warehouse as the primary transformation and storage layer, with dataflows as an optional lightweight transformation for Power BI-specific logic. Organizations without data engineering resources often find dataflows sufficient for their needs without requiring a separate warehouse.
Frequently Asked Questions
What is the difference between Power BI dataflows and Power BI datasets?
Dataflows are the ETL/data preparation layer — they extract, transform, and store data as tables (entities) in Azure Data Lake. Datasets (semantic models) are the analytical layer — they define measures, hierarchies, relationships, and security on top of stored data. A common pattern: dataflows prepare and store clean data → datasets import from dataflows and add analytical logic → reports connect to datasets. They serve different roles in the architecture.
Do I need Power BI Premium to use dataflows?
Basic dataflows are available with Power BI Pro workspaces. Premium (or Fabric) adds computed entities, AI insights, incremental refresh, and enhanced performance. For most small-to-medium organizations, Pro-tier dataflows are sufficient. Premium features become important when transformation volumes are large, AI enrichment is needed, or incremental refresh is required.
Can I connect non-Power BI tools to dataflow data?
Yes. Power BI dataflows store their output in Azure Data Lake Gen2 in CDM (Common Data Model) folder format. Organizations with Premium or Fabric can configure the dataflow to use their own Azure Data Lake account, making the underlying data files accessible to other tools (Azure Synapse Analytics, Azure Databricks, Python, Tableau). This "bring your own lake" configuration is available in Premium and Fabric workspaces.
How do dataflows handle data source credential management?
Data source credentials in dataflows are stored in the Power BI service and managed by workspace admins. This is an improvement over report-level credentials — instead of each report developer storing credentials in their .pbix file, credentials are centrally managed for the dataflow. Service principal (Azure AD application) authentication is recommended for automated, production dataflows rather than personal user credentials that expire when the user leaves the organization.
Can dataflows call REST APIs or non-standard data sources?
Yes. Dataflows use the same Power Query connector ecosystem as Power BI Desktop, including REST API connectors via the Web connector, custom connectors (.mez files), and Function connectors. Custom M functions can be defined within the dataflow to encapsulate API logic. Complex API pagination, authentication flows, and rate limiting can all be handled in Power Query within the dataflow environment.
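As a sketch of what that can look like, here is a page-number pagination loop against a hypothetical endpoint (the URL, query parameters, and response shape are all assumptions for illustration):

```m
let
    // Hypothetical API: returns {"data": [...]} and paginates via ?page=N
    GetPage = (page as number) as table =>
        let
            Response = Web.Contents(
                "https://api.example.com/v1/customers",
                [Query = [page = Number.ToText(page), per_page = "100"]]
            ),
            Json = Json.Document(Response)
        in
            Table.FromRecords(Json[data]),
    // Keep fetching pages until an empty one comes back
    Pages = List.Generate(
        () => [page = 1, data = GetPage(1)],
        each Table.RowCount([data]) > 0,
        each [page = [page] + 1, data = GetPage([page] + 1)]
    ),
    Combined = Table.Combine(List.Transform(Pages, each [data]))
in
    Combined
```

Encapsulating GetPage as a function keeps the pagination mechanics separate from the transformation steps that follow.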
Next Steps
Dataflows are the foundation of a scalable, governed Power BI analytics environment. Investing in the right dataflow architecture early prevents the technical debt of hundreds of disconnected reports with duplicated, inconsistent data preparation logic.
ECOSIRE's Power BI data modeling services include dataflow architecture design, implementation of layered Bronze-Silver-Gold data preparation patterns, and governance configuration. Contact us to assess your current environment and design a dataflow strategy that scales with your organization.
Written by
ECOSIRE Research and Development Team
Building enterprise-grade digital products at ECOSIRE. Sharing insights on Odoo integrations, e-commerce automation, and AI-powered business solutions.