Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. The vision of Fabric is to be a one-stop shop for all the analytical needs of every enterprise and one platform for everyone, from the citizen developer to the data engineer. Fabric covers the complete spectrum of services, including data movement, data lake, data engineering, data integration, data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together services from multiple vendors. Instead, the customer enjoys an end-to-end, highly integrated, single offering that is easy to understand, onboard, create, and operate.
This is a hugely important new product from Microsoft, and this presentation and demo will simplify your understanding of it.
Agenda:
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
1. Microsoft Fabric
A unified analytics solution for the era of AI
James Serra
Industry Advisor
Microsoft, Federal Civilian
jamesserra3@gmail.com
6/16/23
2. About Me
Microsoft, Data & AI Solution Architect in Microsoft Federal Civilian
At Microsoft for most of the last nine years as a Data & AI Architect, with a brief stop at EY
In IT for 35 years, worked on many BI and DW projects
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
Have been a permanent employee, contractor, consultant, and business owner
Presenter at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference Europe, SQL Saturdays, Informatica World
Blog at JamesSerra.com
Former SQL Server MVP
Author of book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh”
3. My upcoming book
- Foundation
- Big data
- Types of data architectures
- Architecture Design Session
- Common data architecture concepts
- Relational Data Warehouse
- Data Lake
- Approaches to Data Stores
- Approaches to Design
- Approaches to Data Modeling
- Approaches to Data Ingestion
- Data Architectures
- Modern Data Warehouse (MDW)
- Data Fabric
- Data Lakehouse
- Data Mesh Foundation
- Data Mesh Adoption
- People, Process, and Technology
- People and process
- Technologies
- Data architectures on Microsoft Azure
First two chapters available now:
Deciphering Data Architectures (oreilly.com)
Table of contents
4. Agenda
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
Not covered:
Real-time analytics
Spark
Data science
Fabric capacities
Billing / Pricing
Reflex / Data Activator
Git integration
Admin monitoring
Purview integration
Data mesh
Copilot
5. Microsoft Fabric does it all—in a unified solution
An end-to-end analytics platform that brings together all the data and analytics tools that
organizations need to go from the data lake to the business user
Data Integration: Data Factory
Data Engineering: Synapse
Data Warehouse: Synapse
Data Science: Synapse
Real-Time Analytics: Synapse
Business Intelligence: Power BI
UNIFIED:
SaaS product experience
Unified data foundation: OneLake
Observability: Data Activator
Security and governance
Compute and storage
Business model
6. Single…
Onboarding and trials
Sign-on
Navigation model
UX model
Workspace organization
Collaboration experience
Data lake
Storage format
Data copy for all engines
Security model
CI/CD
Monitoring hub
Data Hub
Governance & compliance
The intelligent data foundation: AI-assisted, shared workspaces, universal compute capacities, OneSecurity, OneLake
Workloads: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, Power BI, Data Activator
Microsoft Fabric: the data platform for the era of AI
7. SaaS
Frictionless onboarding
Quick results w/ Intuitive UX
Minimal knobs
Auto optimized
Auto Integrated
Tenant-wide governance
Instant Provisioning
5x5: 5 seconds to signup, 5 minutes to wow
Centralized security management
Compliance built-in
Centralized administration
Success by default
9. Understanding Microsoft Fabric / FAQ
• Think of it as taking the PBI workspace and adding a SaaS version of Synapse to it
• You will wake up one day and PBI workspaces will be automatically migrated to Fabric workspaces: PBI capacities will become Fabric capacities. Your PBI tenant will have the Fabric workloads automatically built-in
• Aligned to backend Fabric capacity. Similar to Power BI capacity – a specific amount of compute assigned to it. A universal bucket of compute. No more Synapse DWUs, Spark clusters, etc.
• Serverless pool and dedicated pool combined into one – no more relational storage or dedicated resources. Everything is serverless. All about the data lakehouse
• No Azure portal, subscriptions, or creating storage. Users won’t even realize they are using Azure
• Fabric has a strong separation between the person who buys and pays the bill and the person who builds. In Azure, the person building the solution also has to have the power to buy
• This is not just for departmental use. It’s not PaaS services (i.e., Synapse) vs Fabric. Fabric is the future.
Fabric is going to run your entire data estate: departmental projects as well as the largest data warehouse,
data lakehouses and data science projects
• One platform for enterprise data professional and citizen developer (next slide)
10. One platform for the enterprise data professional and citizen developer

Data Scientists
• Quickly tune a custom model by integrating a model built and trained in Azure ML in a Spark notebook
• Work faster with the ability to use your preferred data science frameworks, languages, and tools
• Bypass engineering dependencies with the ability to use your preferred no-code ML Ops to deploy and operate models in production
• Tap into proven-at-scale models and services to accelerate your AI differentiation (AOAI, Cognitive Services, ONNX integration, etc.)
• Avoid slow, progress-stagnating data wrangling by seamlessly triggering a workflow that can unlock data engineering tools and capabilities quickly

Data Analysts
• Accelerate your work with visual and SQL-based tools for self-serve data transformations and modeling, as well as self-serve tools for reporting, dashboards, and data visualizations

Data Citizens
• Turn data into impact with industry-leading BI tools and integration with the apps your people use every day, like Microsoft 365
• Make more data-driven decisions with actionable insights and intelligence in your preferred applications
• Maintain access to all the data you need, without being overwhelmed by data ancillary to your role, thanks to fine-grained data access management controls

Data Engineers
• Execute faster with the ability to spin up a Spark VM cluster in seconds, or configure with familiar experiences like Git DevOps pipelines for data engineering artifacts
• Streamline your work with a single platform to build and operate real-time analytics pipelines, data lakes, lakehouses, warehouses, marts, and cubes using your preferred IDE, plug-ins, and tools
• Reduce costly data replication and movement with the ability to produce base datasets that can serve data analysts and data scientists without needing to build pipelines

Data Stewards
• Maintain visibility and control of costs with a unified consumption and cost model that provides evergreen spend optics on your end-to-end data estate
• Gain full visibility and governance over your entire analytics estate, from data sources and connections to your data lake, to users and their insights

Supporting experiences across roles: Data Factory, Real-time Analytics, Data Warehouse, Data Engineering, Data Science, Azure ML, Power BI, Microsoft 365
Handoffs between roles: serve data via warehouse or lakehouse; serve transformed data; serve insights via embedding
13. Create Fabric capacity
Capacity is a dedicated set of resources reserved for exclusive use. It offers dependable,
consistent performance for your content. Each capacity offers a selection of SKUs, and
each SKU provides different resource tiers for memory and computing power. You pay
for the provisioned capacity whether you use it or not.
A capacity is a quota-based system, and scaling up or down a capacity doesn't involve
provisioning compute or moving data, so it’s instant.
14. Create Fabric capacity
Once the capacity is created, we can see the capacity in the Admin portal, on the Capacity settings pane, under the "Fabric Capacity" tab
19. OneLake for all data
“The OneDrive for data”
A single unified logical SaaS data lake for
the whole organization (no silos)
Organize data into domains
Foundation for all Fabric data items
Provides full and open access through
industry standard APIs and formats to any
application (no lock-in)
OneLake: One Copy, One Security, OneLake Data Hub, intelligent data fabric
Workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, Data Activator
20. One Copy for all computes
Real separation of compute and storage
No matter which engine or item you use, everyone contributes to building the same lake.
Engines are being optimized to work with Delta Parquet as their native format
Compute powers the applications and experiences in Fabric. The compute is separate from the storage.
Multiple compute engines are available, and all engines can access the same data without needing to import or export it. You are able to choose the right engine for the right job.
Non-Fabric engines can also read/write to the same copy of data using the ADLS APIs, or data can be added through shortcuts
Unified management and governance
Workspace A: Warehouse (Finance), Lakehouse (Customer 360)
Workspace B: Lakehouse (Service telemetry), Warehouse (Business KPIs)
Workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, Data Activator
Engines: T-SQL, Spark, Analysis Services, KQL
21. Shortcuts virtualize data across domains and clouds
No data movement or duplication
A shortcut is a symbolic link which points from one data location to another
Create a shortcut to make data from a warehouse part of your lakehouse
Create a shortcut within Fabric to consolidate data across items or workspaces without changing the ownership of the data. Data can be reused multiple times without data duplication.
Existing ADLS Gen2 storage accounts and Amazon S3 buckets can be managed externally to Fabric and Microsoft while still being virtualized into OneLake with shortcuts
All data is mapped to a unified namespace and can be accessed using the same APIs, including the ADLS Gen2 DFS APIs
Unified management and governance
Workspace A: Warehouse (Finance), Lakehouse (Customer 360)
Workspace B: Lakehouse (Service telemetry), Warehouse (Business KPIs)
External shortcut sources: Amazon, Azure
Workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, Data Activator
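The symbolic-link analogy for shortcuts can be sketched with plain OS symlinks. This is an illustration of the concept only, not the Fabric shortcut API: the "warehouse" and "lakehouse" folders and the `finance.csv` file are made up for the example.

```python
import os
import tempfile

# Illustration only: a Fabric shortcut behaves like a filesystem symlink --
# the data appears in the shortcut location without being copied.
root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse")
lakehouse = os.path.join(root, "lakehouse")
os.makedirs(warehouse)
os.makedirs(lakehouse)

# The warehouse owns the data.
src = os.path.join(warehouse, "finance.csv")
with open(src, "w") as f:
    f.write("account,balance\n1001,250\n")

# A "shortcut" in the lakehouse points at the warehouse data.
link = os.path.join(lakehouse, "finance.csv")
os.symlink(src, link)

# Readers of the lakehouse see the warehouse data with no copy made...
print(open(link).read().splitlines()[1])   # -> 1001,250

# ...and updates to the source are immediately visible through the shortcut,
# just as warehouse updates surface in a lakehouse shortcut.
with open(src, "a") as f:
    f.write("1002,975\n")
print(open(link).read().splitlines()[2])   # -> 1002,975
```

Because the link holds no data of its own, deleting it leaves the warehouse file untouched, which mirrors how a shortcut does not change ownership of the underlying data.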
23. OneLake Data Hub
Discover, manage and use data in one place
Central location within Fabric to discover, manage, and reuse data
Data can be easily discovered by its domain (e.g., Finance) so users can see what matters to them
Explorer capability to easily browse and find data by its folder (workspace) hierarchy
Efficient data discovery using search, filter, and sort
26. Lakehouse – Lakehouse mode
Tables - This is a virtual view of the managed area in your lake. It is the main container for tables of all types (CSV, Parquet, Delta, managed tables, and external tables). All tables, whether automatically or explicitly created, will show up as a table under the managed area of the Lakehouse. This area can also include any type of files or folder/subfolder organization.
Files - This is a virtual view of the unmanaged area in your lake. It can contain any files and folder/subfolder structure. The main distinction between the managed area and the unmanaged area is the automatic delta table detection process, which runs over any folders created in the managed area. Any Delta-format files (Parquet + transaction log) will be automatically registered as a table and will also be available from the serving layer (T-SQL).
Automatic Table Discovery and Registration
Lakehouse automatic table discovery and registration is a feature of the lakehouse that provides a fully managed file-to-table experience for data engineers and data scientists. Users can drop a file into the managed area of the lakehouse, and the file will be automatically validated for supported structured formats (currently only Delta tables) and registered into the metastore with the necessary metadata such as column names, formats, compression, and more. Users can then reference the file as a table and use Spark SQL syntax to interact with the data.
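A Delta table on disk is a folder of Parquet files plus a `_delta_log` subdirectory of JSON commit files, so the detection step above can be approximated as a folder scan. The sketch below is my own illustration of the detection idea, not Fabric's implementation; the folder names are hypothetical.

```python
import json
import os
import tempfile

def discover_delta_tables(managed_area: str) -> list[str]:
    """Return folders under the managed area that look like Delta tables.

    A folder qualifies if it contains a _delta_log subdirectory with at
    least one JSON commit file -- the Delta format's defining marker.
    """
    tables = []
    for name in sorted(os.listdir(managed_area)):
        log_dir = os.path.join(managed_area, name, "_delta_log")
        if os.path.isdir(log_dir) and any(
            f.endswith(".json") for f in os.listdir(log_dir)
        ):
            tables.append(name)
    return tables

# Build a fake managed area: one Delta-like folder, one plain folder.
managed = tempfile.mkdtemp()
os.makedirs(os.path.join(managed, "sales", "_delta_log"))
commit = os.path.join(managed, "sales", "_delta_log", "00000000000000000000.json")
with open(commit, "w") as f:
    json.dump({"commitInfo": {"operation": "WRITE"}}, f)
os.makedirs(os.path.join(managed, "raw_csv_dump"))

print(discover_delta_tables(managed))  # -> ['sales']
```

Only the `sales` folder is registered; the plain `raw_csv_dump` folder is skipped, matching the slide's point that only Delta-format folders are auto-registered as tables.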
29. Workspaces and capacities accessing OneLake
Each tenant will have only one OneLake, and any tenant can access files in a OneLake from other tenants via shortcuts
Lakehouse: Sales
32. Data warehouse
Data Source (shortcut enabled; structured/unstructured) → Ingestion (mounts; pipelines & dataflows) → Store (Data Warehouse) → Transform (procedures) → Expose (PBI; Warehouse)
33. Synapse Data Warehouse
Infinitely scalable and open
Synapse Data Warehouse in Fabric: a relational engine with infinite serverless compute over an open storage format in a customer-owned data lake
1. Open standard format in an open data lake replaces proprietary formats as the native storage
• First transactional data warehouse natively embracing an open standard format
• Data is stored in Delta Parquet with no vendor lock-in
• Is auto-integrated and auto-optimized with minimal knobs
• Extends full SQL ecosystem benefits
34. Synapse Data Warehouse
Infinitely scalable and open
Synapse Data Warehouse in Fabric: a relational engine with infinite serverless compute over an open storage format in a customer-owned data lake
2. Dedicated clusters are replaced by serverless compute infrastructure
• Physical compute resources assigned within milliseconds to jobs
• Infinite scaling with dynamic resource allocation tailored to data volume and query complexity
• Instant scaling up/down with no physical provisioning involved
• Resource pooling providing significant efficiencies and pricing
36. Workspaces and capacities accessing OneLake
Each tenant will have only one OneLake, and any tenant can access files in a OneLake from other tenants via shortcuts
Warehouse: Sales
37. Data Warehouse
Use this to build a relational layer on top of the physical data in the Lakehouse and expose it to analysis and reporting tools using a T-SQL/TDS endpoint. This offers a transactional data warehouse with T-SQL DML support, stored procedures, tables, and views
How can I control “bad actor” queries? Fabric compute is designed to automatically classify queries to allocate resources and ensure high-priority queries (e.g., ETL, data preparation, and reporting) are not impacted by potentially poorly written ad hoc queries.
How is the classification for an incoming query determined? Queries are intelligently classified by a combination of the source (e.g., pipeline vs. Power BI) and the query type (e.g., INSERT vs. SELECT)
Where is the physical storage for the Data Warehouse? All data for Fabric is stored in OneLake in the open Delta format. A single copy of the data is therefore exposed to all the compute engines of Fabric without needing to move or duplicate data
41. Why two options?
Delta Lake shortcomings:
- No multi-table transactions
- Lack of full T-SQL support (no updates, limited reads)
- Performance problems for trickle transactions
46. ADF review
Mapping data flows – don’t exist in Fabric
Wrangling data flows – Dataflow Gen2 in Fabric (Dataflow Gen1 in Power BI)
Data pipelines
47. Data Factory in Fabric
What is Dataflows Gen2? This is the new generation of Dataflows, succeeding Gen1. Dataflows provide a low-code interface for ingesting data from hundreds of data sources, transforming your data using 300+ data transformations, and loading the resulting data into multiple destinations such as Azure SQL Database, Lakehouse, and more
We currently have multiple Dataflows experiences with Power BI Dataflows Gen1, Power Query Dataflows, and ADF data flows. What is the strategy for Fabric with these various experiences? Our goal is to evolve over time to a single Dataflow that combines the ease of use of PBI and Power Query with the scale of ADF
What are Fabric pipelines? Fabric pipelines enable powerful workflow capabilities at cloud scale. With data pipelines, you can build complex workflows that can refresh your dataflow, move PB-size data, and define sophisticated control flow pipelines. Use data pipelines to build complex ETL and Data Factory workflows that can perform a number of different tasks at scale. Control flow capabilities are built into pipelines that allow you to build workflow logic which provides loops and conditionals.
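The control-flow idea (loop over items, branch per item, record each activity's outcome) can be sketched as a toy orchestrator. This is not the Fabric pipelines API; the `Pipeline` class and activity names are hypothetical, invented for the illustration.

```python
# Toy sketch of pipeline control flow (ForEach-style loop + If-style branch).
# Hypothetical activity names -- not the Fabric pipelines API.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Pipeline:
    log: List[Tuple[str, str]] = field(default_factory=list)

    def run_activity(self, name: str, action: Callable[[], bool]) -> bool:
        """Run one activity and record its outcome, like a pipeline run log."""
        ok = action()
        self.log.append((name, "Succeeded" if ok else "Failed"))
        return ok

# Loop over source tables, with a conditional branch per item.
tables = ["orders", "customers", "events"]
pipe = Pipeline()
for table in tables:
    # Conditional: only refresh a dataflow for the non-streaming tables.
    if table != "events":
        pipe.run_activity(f"RefreshDataflow:{table}", lambda: True)
    pipe.run_activity(f"CopyData:{table}", lambda: True)

print(pipe.log[0])   # -> ('RefreshDataflow:orders', 'Succeeded')
print(len(pipe.log)) # -> 5
```

The point of the sketch is that the loop and the conditional live in the pipeline layer while each activity (refresh, copy) stays a self-contained unit, which is the division of labor the slide describes between pipelines and dataflows.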
50. For best performance, you should compress the data using the VORDER compression method (50%-70% more compression). Data is stored this way by ADF by default
51. Should I use Fabric now?
Yes, for prototyping
Yes, if you won’t be in production for several months
You have to be OK with bugs, missing features, and possible performance issues
Don’t use it if you have hundreds of terabytes of data
52. If building in Synapse, how do you make the transition to Fabric smooth?
Do not use dedicated pools, unless needed for serving and performance
Don’t use any stored procedures to modify data in dedicated pools
Use ADF for pipelines and for Power Query, and don’t use ADF mapping data flows. Don’t use Synapse pipelines or mapping data flows
Embrace the data lakehouse architecture
53. Resources
Microsoft Fabric webinar series: https://aka.ms/fabric-webinar-series
New documentation: https://aka.ms/fabric-docs. Check out the tutorials.
Data Mesh, Data Fabric, Data Lakehouse – (video from Toronto Data Professional Community on 2/15/23)
Build videos:
Build 2-day demos
Microsoft Fabric Synapse data warehouse, Q&A
My intro blog on Microsoft Fabric (with helpful links at the bottom)
Fabric notes
Advancing Analytics videos
Ask me Anything (AMA) about Microsoft Fabric!
54. Q & A ?
James Serra, Microsoft, Industry Advisor
Email me at: jamesserra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com
Editor's Notes
Abstract
Microsoft Fabric
This is a hugely important new product that I have spent a ton of hours understanding and simplifying into a deck and demo that I will present to you. This will shortcut your time to upskill on it so you are prepared to answer customer questions. My presentation comes from the angle that you are in the field and are familiar with Azure Synapse and want to know how this differs.
-----------------------
May public preview, Microsoft build
GA by end-of-year
MVP for GA, incremental updates, release features over time
TODO:
can you have multiple workspace per capacities. Show diagram with workspaces all pointing to the same warehouse
dataflow
how do we do snowflake cloning
How will CDC work with no data flows
how do we talk to customers about waiting for GA for Fabric when they need to do something now, go Lakehouse route in Synapse
slide on mounting a dedicated pool
Architecture diagrams on how things are done in Fabric / Use Cases
This is giving new power to pbi users
enterprise solution vs department wide solution slide https://www.jamesserra.com/archive/2022/06/power-bi-as-a-enterprise-solution/
Slide with Synapse missing items
Will copilot be added?
Highlight no more dedicated pools - use serverless
PBI capacities - how to delegate
get email about behind the scenes warehouse
PBI desktop vs Fabric – what features are not yet in Fabric as modeling is there now
Schema drift
Failover
TODO specialist:
all or nothing with users
who talks about it?
Segment out Synapse - I want to build something now, what product do I use? Don't use dedicated pools, but there will be a mounting option. Use ADF instead of Synapse pipelines
Missing features in Synapse and what is better
Snowflake compete slides
link to S3
Pricing
Position Synapse today to move seameless into Fabric - migration path
Purview not in here
What about synapse database templates
Demo built?
Ability to request a demo/presentation from PM/GBB who have been doing it
Fluff, but point is I bring real work experience to the session
Microsoft Fabric combines Data Factory, Synapse Analytics, Data Explorer, and Power BI into a single, unified experience in the cloud. The open and governed data lakehouse foundation is a cost-effective and performance-optimized fabric for business intelligence, machine learning, and AI workloads at any scale. It is the foundation for migrating and modernizing existing analytics solutions, whether from data appliances or traditional data warehouses.
Talk Track for Greenfield/Growing Analytics Customers – Microsoft Fabric’s SaaS environment makes it easier to deploy an entire end-to-end analytics engine from the ground up at an accelerated pace. With the solution’s built-in security and governance capabilities, you can rest assured your data and insights are protected.
Talk Track for Existing Synapse Customers – Microsoft Fabric is an evolution of Azure Synapse. You will still be able to enjoy the benefits and limitless scale of Synapse in an easier to use SaaS solution while adopting new capabilities that enhance your entire analytics approach. And with the addition of Power BI, you can help democratize the ability to uncover insights and create interactive reports across the organization, helping everyone make more data-driven decisions in their everyday work.
Talk Track for Existing Power BI Customers – With Microsoft Fabric, you’ll be able to access new and powerful data tools and services like Azure Synapse within the same user experience you already enjoy with Power BI. You can unify these tools with your disparate data sources in the same environment to establish a single source of truth for all data, driving the ability for everyone to uncover more accurate and consistent insights than before. And instead of having to worry about the security concerns of a patchwork analytics estate, you can rest assured your data is protected with the built-in security and governance capabilities.
It is not possible to share capacities across tenants
https://learn.microsoft.com/en-us/fabric/enterprise/licenses
PBI v-cores will evolve to Compute Units (group of 8 v-cores)
Public preview available March 23rd. Switch in admin console to run the functionality on/off completely. It will be off by default until July 1st, when it will be switched on unless they go into the admin console and say “No, I don’t want to have this switched on starting July 1st”. Can control at the tenant and capacity levels.
The Microsoft Fabric (Preview) trial includes access to the Fabric product experiences and the resources to create and host Fabric items. The Fabric (Preview) trial lasts until Fabric General Availability (GA), unless canceled. After GA, the Fabric (Preview) trial converts to the GA version and is extended for 60 days.
Starting with OneLake itself. OneLake provides you a single data lake for your entire organization.
Users cannot create OneLake storage. OneLake storage (ADLS Gen2) managed by OneLake API is attached to Fabric tenant. When a workspace is created, a folder is created in OneLake storage (ADLS Gen2 behind the scenes) on a customer tenant.
Everyone is able to contribute to the same lake no matter which engine you use.
We are doing a lot a work to optimize our engines to work directly with delta parquet as their native format for tabular data as you can see with T-SQL for Data warehousing and DirectLake mode in Analysis Service for BI.
Think of OneLake as an abstraction layer. You can mount existing ADLS Gen2 storage to it. Virtualization across many storage accounts. Maintains a single namespace.
A shortcut is nothing more than a symbolic link which points from one data location to another. Just like you can create shortcuts in Windows or Linux, the data will appear in the shortcut location as if it were physically there.
Today, if you have tables in a data warehouse that you want to make available alongside other tables or files in a lakehouse, you will need to copy that data out of the warehouse. With OneLake, you simply create a shortcut in the lakehouse pointing to the warehouse. The data will appear in your lakehouse as if you had physically copied it. Since you didn’t copy it, when data is updated in the warehouse, changes are automatically reflected in the lakehouse.
You can also use shortcuts to consolidate data across workspaces and domains without changing the ownership of the data. In this example, workspace B still owns the data. They still have ultimate control over who can access it and how it stays up to date.
Many of you already have existing data lakes stored in ADLS Gen2 or in Amazon S3 buckets. These lakes can continue to exist and be managed externally to Fabric.
We have extended shortcuts to include lakes outside of OneLake and even outside of Azure, so that you can virtualize your existing ADLS Gen2 accounts or Amazon S3 buckets into OneLake.
All data is mapped to the same unified namespace and can be accessed using the same ADLS Gen2 APIs, even when it is coming from S3.
The Microsoft Fabric Lakehouse analytics scenario makes it so that data can be ingested into OneLake with shortcuts to other clouds repositories, pipelines, and dataflows in order to allow end-users to leverage other data.
Once that data has been pulled into Microsoft Fabric, users can leverage notebooks to transform that data in OneLake and then store them in Lakehouses with medallion structure.
From there, users can begin to analyze and visualize that data with Power BI using the see-through mode or SQL endpoints.
If you don’t see a Lakehouse table in the warehouse (default), check the data format. Only the tables in Delta Lake format are available in the warehouse (default). Parquet, CSV, and other formats cannot be queried using the warehouse (default)
Warehouse mode in the Lakehouse allows a user to transition from the “Lake” view of the Lakehouse (which supports data engineering and Apache Spark) to the “SQL” experiences that a data warehouse would provide, supporting T-SQL. In warehouse mode, the user has a subset of SQL commands that can define and query data objects but not manipulate the data. You can perform the following actions in your warehouse (default):
• Query the tables that reference data in your Delta Lake folders in the lake.
• Create views, inline TVFs, and procedures to encapsulate your semantics and business logic in T-SQL.
• Manage permissions on the objects.
Warehouse mode is primarily oriented towards designing your warehouse and BI needs and serving data.
The Data Warehouse analytics scenario takes existing sources that are mounted, while pipelines and dataflows can bring in all other data that is needed.
IT teams can then define and store procedures to transform the data, which is stored as Parquet/Delta Lake files in OneLake.
From there, business users can analyze and visualize data with Power BI, again using the see-through mode or SQL endpoints.
Can more than one capacity be connected to a Data Warehouse, for instance, one to handle data writes and one to handle data reads? Currently, a capacity is assigned at the workspace level and a Data Warehouse is associated with a single workspace. This means all artifacts in the workspace will share the same capacity, and all read/write operations will use the same capacity.
Does Fabric Data Warehouse support fine grained access control like row-level security, column-level security, dynamic data masking? These security constructs are not available but are planned for the Fabric Data Warehouse and will integrate with Fabric’s universal security model.
We already support stats, it’s in the docs. Automatic stats on load and on metadata discovery should land soon. Query plans, indexes, SQL RLS etc also will land incrementally
Python, R, Scala
https://learn.microsoft.com/en-us/fabric/get-started/decision-guide-warehouse-lakehouse
Lakehouse: call it delta lake, owned and managed by Spark, customer can update files - it's user owned. Use if customer likes Spark and using files
Warehouse: well structured, SQL front door, transactional guarantees, multi-table transactions. Nobody except the SQL engine can update the files. Use if customer is comfortable with SQL, comes from a relational database world
LDF and MDF are still used behind the scenes. Can query the warehouse from the lakehouse, but can’t do opposite. Data is synced into onelake from LDF and MDF (only INSERT works for now)
Can't support everything SQL supports with the current open source format at Delta, (multi-table transactions, indexing), so have to use SQL engine
Want to get to a point where don't use LDF/MDF files but use delta underneath. Talk about using Iceberg way down the road
Other knobs to turn? What should we expose? Hide things (DMV, explain plan) because can't tune. Performance DBA's - getting rid of another part of your job
Fabric: Can land in bronze zone in warehouse and just use that for all layers to use T-SQL to write
Compared to Synapse: no longer having relational storage and dedicated compute - idea is it’s all done within lake
3 deployment models
How to organize workspaces – dev/test/prod, by orgs, by cost
High concurrency clusters, spark monitoring
Mapping data flows = ADF Data flows
Wrangling data flows = Power Query
Mapping data flows = ADF Data flows
Wrangling data flows = Power Query
Just in the Fabric context, I would say it this way: Fabric Dataflows are the PQ UI with the scale of ADF
the Gen1 vs. Gen2 is just the distinction between what is in PBI and Excel today vs. what is in Fabric
what will be the response to customers who ask how they move ADF Data flows to Fabric?
We are looking at ways to convert to Fabric data flows and work with Partners who can help with the conversions like 1
Since ADF PQ already had cloud scale, why not just move that into Fabric dataflow gen2?
so Fabric Dataflow Gen2 will eventually be an improvement over ADF PQ
DirectLake mode is a groundbreaking new engine capability to analyze very large datasets in Power BI. The technology is based on the idea to load parquet-formatted files directly from a data lake without having to query a Data Warehouse or Lakehouse endpoint and without having to import or duplicate data into a Power BI dataset. DirectLake is a fast path to load the data from the lake straight into the Power BI engine, ready for analysis. It loads the data directly from the files into memory at runtime. Because there is no explicit import process, it is possible to pick up any changes at the source as they occur, thus combining the advantages of DirectQuery and import mode while avoiding their disadvantages. DirectLake can read parquet-formatted delta files, but for best performance you should compress the data using the VORDER compression method (50%-70% more compression).