Power BI Dataflows: Centralized Data Preparation

Master Power BI dataflows to centralize ETL logic, eliminate duplicate data preparation across reports, and build a governed, reusable data layer for your organization.

ECOSIRE Research and Development Team
March 19, 2026 · 11 min read · 2.3k words


Every Power BI environment eventually develops the same problem: dozens of reports, each with slightly different versions of the "same" data preparation logic. Customer data cleaned and standardized one way in the sales dashboard, slightly differently in the marketing report, and differently again in the executive summary. When the source system changes — a column is renamed, a new region is added — updating every report individually is a maintenance nightmare.

Power BI dataflows solve this by moving data preparation from individual report files (Power BI Desktop .pbix) to a shared, centralized layer in the Power BI service. Logic written once in a dataflow is available to any report, by any developer, with a consistent result. This guide covers dataflow architecture, implementation patterns, and the advanced capabilities that make dataflows the foundation of a governed Power BI environment.

Key Takeaways

  • Dataflows centralize Power Query ETL logic in the Power BI service, eliminating duplication across reports
  • Dataflows produce standardized entities (tables) that multiple reports consume from a single source
  • Linked entities allow dataflows to reference tables from other dataflows, enabling layered architecture
  • Computed entities perform transformations on linked entities within the Premium dataflow engine
  • Dataflow Gen2 in Microsoft Fabric extends dataflows with staging and output destinations
  • AI insights (Premium) apply ML models to dataflow output — anomaly detection, sentiment analysis, key phrase extraction
  • Incremental refresh on dataflows keeps large transformation outputs current without full reprocessing
  • Dataflow governance controls who can create, edit, and consume dataflows through workspace permissions

Why Dataflows Exist

To understand dataflows, it's helpful to visualize the problem they solve.

Without dataflows (the common pattern):

  • Developer A builds Report 1, connects to Salesforce, writes 40 Power Query steps to clean and transform the data
  • Developer B builds Report 2, also connects to Salesforce, writes 38 similar Power Query steps (slightly different)
  • Developer C builds Report 3, same source, 45 steps
  • The Salesforce API credentials are stored in three different files
  • The "customer segment" categorization logic is implemented three slightly different ways
  • When the API changes, three files need updating
  • All three reports run their own scheduled refreshes against the Salesforce API

With dataflows:

  • Data engineer builds one Dataflow with the 40 Power Query steps
  • Reports 1, 2, and 3 all connect to the dataflow entity as their data source
  • One API credential, one transformation logic, one scheduled refresh
  • When the API changes, one dataflow is updated

This is the fundamental value proposition: dataflows are the ETL layer between source systems and consuming reports.


Dataflow Architecture Patterns

Well-designed dataflow architectures follow a layered pattern analogous to a data warehouse medallion architecture:

Bronze layer (Staging dataflow): Extracts data from source systems with minimal transformation — rename columns, fix types, filter obviously invalid records. This layer captures raw data in a standardized format.

Silver layer (Core dataflow): Applies business logic — calculates derived fields, applies reference data lookups, deduplicates records, applies organization-specific business rules. This layer produces the canonical representation of each business entity.

Gold layer (Reporting dataflow or semantic model): Aggregates and structures data for specific analytical use cases — pre-calculated aggregations, report-specific measures, time period calculations.

In Power BI, linked entities connect these layers: the Silver dataflow references entities from the Bronze dataflow using linked entities. The Gold layer references Silver entities. Reports connect to Gold layer entities.

This architecture means: if a source system changes, only the Bronze dataflow needs updating. The business logic in Silver and the reporting structure in Gold remain stable.


Creating Your First Dataflow

Dataflows are created in the Power BI service (not Power BI Desktop). Navigate to a workspace → New → Dataflow.

The dataflow editing environment is Power Query Online — essentially the same Power Query interface as Power BI Desktop, but running in the browser and executing in Microsoft's cloud infrastructure.

Step 1: Define a data source

Click "Add new entities" → Choose a connector. Most Power BI Desktop connectors are available in dataflows, plus some cloud-native integration points (Azure Data Factory, etc.).

For a SQL Server source:

Server: your-server.database.windows.net
Database: YourDatabase
Authentication: Organizational account or service principal

Step 2: Write transformation queries

The Power Query interface will look familiar: Applied Steps, formula bar, and data preview. Build your transformation logic exactly as in Power BI Desktop — filter rows, rename columns, merge with reference tables, apply custom logic.

For a customer data standardization query:

let
    // Connect to the source database (placeholder server/database names)
    Source = Sql.Database("server", "db"),
    Customers = Source{[Schema="dbo", Item="Customers"]}[Data],
    // Keep only active customer records
    FilteredActive = Table.SelectRows(Customers, each [Status] = "Active"),
    // Standardize cryptic source column names
    RenamedColumns = Table.RenameColumns(FilteredActive, {
        {"cust_id", "CustomerID"},
        {"cust_nm", "CustomerName"},
        {"seg_cd", "SegmentCode"}
    }),
    // SegmentDefinitions is another query defined in the same dataflow
    SegmentLookup = Table.Join(
        RenamedColumns, "SegmentCode",
        SegmentDefinitions, "Code",
        JoinKind.LeftOuter
    ),
    // Guard against duplicate customer rows introduced by the join
    RemovedDuplicates = Table.Distinct(SegmentLookup, {"CustomerID"})
in
    RemovedDuplicates

Step 3: Configure refresh schedule

Set the dataflow refresh schedule (up to 48× per day in Premium, 8× per day in Pro). The dataflow refresh runs the transformation queries against the source and writes results to Azure Data Lake Gen2 storage managed by Power BI.

Step 4: Connect reports to the dataflow

In Power BI Desktop: Get Data → Power Platform → Power BI Dataflows → navigate to the workspace → select the entity. The report connects to the dataflow entity's stored output, not the source system.
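The generated query in Power BI Desktop uses the dataflows connector to navigate to the entity. A rough sketch of what that M looks like — the workspace, dataflow, and entity names here are hypothetical, and the navigation keys the connector generates can vary:

```m
let
    // Connect to the Power BI dataflows service
    Source = PowerPlatform.Dataflows(null),
    // Navigate workspace → dataflow → entity (placeholder names;
    // the auto-generated query may navigate by ID instead)
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceName = "Data Engineering"]}[Data],
    Dataflow = Workspace{[dataflowName = "Customer Staging"]}[Data],
    Customers = Dataflow{[entity = "Customers"]}[Data]
in
    Customers
```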


Linked and Computed Entities (Premium)

Linked entities allow one dataflow to reference entities from another dataflow. This is how the layered architecture described above is implemented.

Creating a linked entity: In the Silver dataflow → New Entity → Link entities from other dataflows → select the Bronze entity.

The linked entity appears in the Silver dataflow as a virtual table pointing to the Bronze dataflow's output. You can add additional transformation steps on top of the linked entity — these additional steps execute in the dataflow engine, not at the source.

Computed entities are linked entities with additional Power Query transformations applied. They execute in the Premium dataflow engine's in-memory processing rather than at the source, providing significant performance advantages for complex transformations on large datasets.

Key distinction:

  • Without Premium: Linked entities reference the other dataflow's data but all processing happens at query time against the source
  • With Premium (computed entities): Transformations on linked entities run in Power BI's analytical engine using the cached data, not the source — dramatically faster for complex transformations

This is particularly valuable for transformations that are expensive to run at the source (joins across large tables, aggregations, window functions) but that need to happen before data reaches reports.
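In practice, a computed entity is just a query whose source is another entity in the same dataflow. As a minimal sketch, assuming a linked entity named Customers (from the Bronze dataflow) with a SegmentName column:

```m
let
    // "Customers" is a linked entity from the Bronze dataflow; with
    // Premium, this aggregation runs in the dataflow engine against
    // the cached data rather than the source system
    Source = Customers,
    BySegment = Table.Group(
        Source,
        {"SegmentName"},
        {{"CustomerCount", each Table.RowCount(_), Int64.Type}}
    )
in
    BySegment
```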


Incremental Refresh for Dataflows

Like datasets, dataflows support incremental refresh to process only new and changed records rather than reloading all data on every cycle.

Requirements:

  • Premium workspace
  • Datetime column in the source query
  • RangeStart and RangeEnd parameters defined in the dataflow query

The configuration is identical to dataset incremental refresh: define parameters, apply date filter in the query, configure the incremental refresh policy on the entity. The dataflow engine creates partitions covering the historical window and refreshes only the recent window on each cycle.
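A minimal sketch of the required filter, assuming a hypothetical dbo.Orders table with an OrderDate column:

```m
let
    Source = Sql.Database("your-server.database.windows.net", "YourDatabase"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // RangeStart and RangeEnd are datetime parameters that the refresh
    // engine substitutes per partition; the filter should fold to the
    // source so the query window actually shrinks
    Filtered = Table.SelectRows(
        Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```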

Incremental refresh for dataflows is most valuable when:

  • Transformations are computationally expensive and you don't want to rerun them on unchanged historical data
  • The source query is slow due to large table sizes, and limiting the query window dramatically reduces fetch time
  • Storage cost matters — incremental partitions allow historical data to remain stored without re-querying

For most small-to-medium dataflows (under 10 million rows), full refresh is simpler and sufficient. Incremental refresh becomes important when refresh times exceed 30–60 minutes.


AI Insights in Dataflows (Premium)

Power BI Premium dataflows include AI Insights — pre-built machine learning functions available directly in Power Query Online.

Available AI functions:

| Function | Description | Use Case |
| --- | --- | --- |
| Text Analytics: Sentiment Score | Returns positive/negative/neutral + score | Customer feedback, reviews |
| Text Analytics: Key Phrases | Extracts main topics from text | Support tickets, comments |
| Text Analytics: Language Detection | Identifies language of text | Multilingual content classification |
| Text Analytics: Named Entity Recognition | Identifies persons, places, organizations | Document processing |
| Vision: Tag Image | Labels objects in images | Product catalog classification |
| Vision: Describe Image | Generates image description | Content moderation |
| AutoML (custom models) | Apply trained Azure ML models | Any custom classification/regression |

These functions are invoked as custom column transformations in the Power Query editor. A sentiment scoring step on a CustomerComment column:

= Table.AddColumn(Source, "Sentiment", each
    TextAnalytics.SentimentScore([CustomerComment]),
    type number
)

The AI function calls Azure Cognitive Services behind the scenes; the result (a sentiment score from 0 to 1) appears as a new column. This enables enriched datasets without requiring a separate data science pipeline.


Dataflow Governance and Security

As the central data preparation layer, dataflows require governance controls to ensure quality and prevent unauthorized changes.

Workspace permissions control who can create and edit dataflows. Dataflow creation requires Contributor or Admin access to the workspace. Consumers (report developers who connect to the dataflow) need only Viewer access. This role separation ensures that business logic in the Bronze and Silver layers is maintained by authorized data engineers.

Certification marks dataflows as approved by a central authority. A certified dataflow is highlighted in the data source picker in Power BI Desktop, directing report developers to the authoritative, governed data source rather than building their own from scratch.

Sensitivity labels apply Microsoft Purview Information Protection labels to dataflows containing sensitive data. A dataflow containing PII receives a "Confidential" label, which cascades to any report consuming that dataflow.

Data lineage in the Power BI admin portal shows the flow from source → dataflow → dataset → report. When a source system changes, data lineage helps identify all downstream reports that may be affected.

Monitoring dataflow refresh: Power BI's admin portal shows dataflow refresh history, duration, and failures. Setting up alerts via Power Automate for failed dataflow refreshes ensures that data freshness problems are caught immediately rather than discovered when a user reports stale data.


Dataflows vs. Data Warehouse

Dataflows are not a replacement for a dedicated data warehouse — they're a complement. Understanding where each fits prevents architectural mistakes.

| Capability | Dataflows | Data Warehouse |
| --- | --- | --- |
| Power Query transformations | Native | Not native |
| SQL transformations | Not supported | Native |
| Complex joins across large tables | Limited | Optimized |
| Storage cost | Managed, fixed pricing | Variable |
| Version control (dbt, GitHub) | Not supported | Excellent |
| Non-Power BI consumers (Tableau, Python) | Limited | Yes |
| Serving multiple BI tools | Power BI only | Any tool |
| Enterprise governance maturity | Moderate | High |

Organizations with mature data engineering practices should use a data warehouse as the primary transformation and storage layer, with dataflows as an optional lightweight transformation for Power BI-specific logic. Organizations without data engineering resources often find dataflows sufficient for their needs without requiring a separate warehouse.


Frequently Asked Questions

What is the difference between Power BI dataflows and Power BI datasets?

Dataflows are the ETL/data preparation layer — they extract, transform, and store data as tables (entities) in Azure Data Lake. Datasets (semantic models) are the analytical layer — they define measures, hierarchies, relationships, and security on top of stored data. A common pattern: dataflows prepare and store clean data → datasets import from dataflows and add analytical logic → reports connect to datasets. They serve different roles in the architecture.

Do I need Power BI Premium to use dataflows?

Basic dataflows are available with Power BI Pro workspaces. Premium (or Fabric) adds computed entities, AI insights, incremental refresh, and enhanced performance. For most small-to-medium organizations, Pro-tier dataflows are sufficient. Premium features become important when transformation volumes are large, AI enrichment is needed, or incremental refresh is required.

Can I connect non-Power BI tools to dataflow data?

Yes. Power BI dataflows store their output in Azure Data Lake Gen2 in CDM (Common Data Model) format. Organizations with Premium or Fabric can configure the dataflow to use their own Azure Data Lake account, making the parquet files accessible to other tools (Azure Synapse Analytics, Azure Databricks, Python, Tableau). This "bring your own lake" configuration is available in Premium and Fabric workspaces.

How do dataflows handle data source credential management?

Data source credentials in dataflows are stored in the Power BI service and managed by workspace admins. This is an improvement over report-level credentials — instead of each report developer storing credentials in their .pbix file, credentials are centrally managed for the dataflow. Service principal (Azure AD application) authentication is recommended for automated, production dataflows rather than personal user credentials that expire when the user leaves the organization.

Can dataflows call REST APIs or non-standard data sources?

Yes. Dataflows use the same Power Query connector ecosystem as Power BI Desktop, including REST API connectors via the Web connector, custom connectors (.mez files), and Function connectors. Custom M functions can be defined within the dataflow to encapsulate API logic. Complex API pagination, authentication flows, and rate limiting can all be handled in Power Query within the dataflow environment.
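A minimal pagination sketch — the endpoint, query parameter, and response shape are assumptions for illustration:

```m
let
    // Hypothetical paginated REST API; adjust URL, auth, and the
    // response field name ("results") to match your service
    GetPage = (page as number) as table =>
        let
            Response = Web.Contents(
                "https://api.example.com/v1/customers",
                [Query = [page = Number.ToText(page)]]
            ),
            Json = Json.Document(Response),
            AsTable = Table.FromRecords(Json[results])
        in
            AsTable,
    // Fetch a fixed number of pages and combine into one table
    Pages = List.Transform({1..5}, GetPage),
    Combined = Table.Combine(Pages)
in
    Combined
```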


Next Steps

Dataflows are the foundation of a scalable, governed Power BI analytics environment. Investing in the right dataflow architecture early prevents the technical debt of hundreds of disconnected reports with duplicated, inconsistent data preparation logic.

ECOSIRE's Power BI data modeling services include dataflow architecture design, implementation of layered Bronze-Silver-Gold data preparation patterns, and governance configuration. Contact us to assess your current environment and design a dataflow strategy that scales with your organization.


Written by

ECOSIRE Research and Development Team

Building enterprise-grade digital products at ECOSIRE. Sharing insights on Odoo integrations, e-commerce automation, and AI-powered business solutions.
