Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. The vision of Fabric is to be a one-stop shop for all the analytical needs of every enterprise and one platform for everyone, from the citizen developer to the data engineer. Fabric covers the complete spectrum of services, including data movement, data lake, data engineering, data integration, data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together services from multiple vendors. Instead, the customer enjoys an end-to-end, highly integrated, single offering that is easy to understand, onboard, create, and operate.
This is a hugely important new product from Microsoft, and this presentation and demo will simplify your understanding of it.
Agenda:
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
1. Microsoft Fabric
A unified analytics solution for the era of AI
James Serra
Industry Advisor
Microsoft, Federal Civilian
jamesserra3@gmail.com
6/16/23
2. About Me
Microsoft, Data & AI Solution Architect in Microsoft Federal Civilian
At Microsoft for most of the last nine years as a Data & AI Architect, with a brief stop at EY
In IT for 35 years, worked on many BI and DW projects
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
Have been a permanent employee, contractor, consultant, and business owner
Presenter at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference Europe, SQL Saturdays, Informatica World
Blog at JamesSerra.com
Former SQL Server MVP
Author of book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh”
3. My upcoming book
- Foundation
- Big data
- Types of data architectures
- Architecture Design Session
- Common data architecture concepts
- Relational Data Warehouse
- Data Lake
- Approaches to Data Stores
- Approaches to Design
- Approaches to Data Modeling
- Approaches to Data Ingestion
- Data Architectures
- Modern Data Warehouse (MDW)
- Data Fabric
- Data Lakehouse
- Data Mesh Foundation
- Data Mesh Adoption
- People, Process, and Technology
- People and process
- Technologies
- Data architectures on Microsoft Azure
First two chapters available now:
Deciphering Data Architectures (oreilly.com)
Table of contents
4. Agenda
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
Not covered:
Real-time analytics
Spark
Data science
Fabric capacities
Billing / Pricing
Reflex / Data Activator
Git integration
Admin monitoring
Purview integration
Data mesh
Copilot
5. Microsoft Fabric does it all—in a unified solution
An end-to-end analytics platform that brings together all the data and analytics tools that
organizations need to go from the data lake to the business user
Data Integration: Data Factory
Data Engineering: Synapse
Data Warehouse: Synapse
Data Science: Synapse
Real-Time Analytics: Synapse
Business Intelligence: Power BI
UNIFIED:
SaaS product experience
Unified data foundation: OneLake
Observability: Data Activator
Security and governance
Compute and storage
Business model
6. Single…
Onboarding and trials
Sign-on
Navigation model
UX model
Workspace organization
Collaboration experience
Data lake
Storage format
Data copy for all engines
Security model
CI/CD
Monitoring hub
Data Hub
Governance & compliance
The intelligent data foundation: AI-assisted, shared workspaces, universal compute capacities, OneSecurity, OneLake
Workloads: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, Power BI, Data Activator
Microsoft Fabric: the data platform for the era of AI
7. SaaS
Frictionless onboarding
Quick results w/ Intuitive UX
Minimal knobs
Auto optimized
Auto Integrated
Tenant-wide governance
Instant Provisioning
5x5: 5 seconds to signup, 5 minutes to wow
Centralized security management
Compliance built-in
Centralized administration
Success by default
9. Understanding Microsoft Fabric / FAQ
• Think of it as taking the PBI workspace and adding a SaaS version of Synapse to it
• You will wake up one day and PBI workspaces will be automatically migrated to Fabric workspaces: PBI capacities will become Fabric capacities. Your PBI tenant will have the Fabric workloads automatically built-in
• Aligned to backend Fabric capacity. Similar to Power BI capacity – a specific amount of compute assigned to it. A universal bucket of compute. No more Synapse DWUs, Spark clusters, etc.
• Serverless pool and dedicated pool combined into one – no more relational storage or dedicated resources. Everything is serverless. All about the data lakehouse
• No Azure portal, subscriptions, or creating storage. Users won’t even realize they are using Azure
• Fabric has a strong separation between the person who buys and pays the bill and the person who builds. In Azure, the person building the solution also has to have the power to buy
• This is not just for departmental use. It’s not PaaS services (i.e., Synapse) vs Fabric. Fabric is the future.
Fabric is going to run your entire data estate: departmental projects as well as the largest data warehouse,
data lakehouses and data science projects
• One platform for enterprise data professional and citizen developer (next slide)
10. One platform for the enterprise data professional and citizen developer

Data Scientists
• Quickly tune a custom model by integrating a model built and trained in Azure ML in a Spark notebook
• Work faster with the ability to use your preferred data science frameworks, languages, and tools
• Bypass engineering dependencies with the ability to use your preferred no-code ML Ops to deploy and operate models in production
• Tap into proven-at-scale models and services to accelerate your AI differentiation (AOAI, Cognitive Services, ONNX integration, etc.)
• Avoid slow, progress-stagnating data wrangling by seamlessly triggering a workflow that can unlock data engineering tools and capabilities quickly

Data Analysts
• Accelerate your work with visual and SQL-based tools for self-serve data transformations and modeling, as well as self-serve tools for reporting, dashboards, and data visualizations

Data Citizens
• Turn data into impact with industry-leading BI tools and integration with the apps your people use every day, like Microsoft 365
• Make more data-driven decisions with actionable insights and intelligence in your preferred applications
• Maintain access to all the data you need, without being overwhelmed by data ancillary to your role, thanks to fine-grained data access management controls

Data Engineers
• Execute faster with the ability to spin up a Spark VM cluster in seconds, or configure with familiar experiences like Git DevOps pipelines for data engineering artifacts
• Streamline your work with a single platform to build and operate real-time analytics pipelines, data lakes, lakehouses, warehouses, marts, and cubes using your preferred IDE, plug-ins, and tools
• Reduce costly data replication and movement with the ability to produce base datasets that can serve data analysts and data scientists without needing to build pipelines

Data Stewards
• Maintain visibility and control of costs with a unified consumption and cost model that provides evergreen spend optics on your end-to-end data estate
• Gain full visibility and governance over your entire analytics estate, from data sources and connections to your data lake, to users and their insights

Supporting experiences across roles: Data Factory, Real-time Analytics, Data Warehouse, Data Engineering, Data Science, Azure ML, Power BI, Microsoft 365
Handoffs between roles: serve data via warehouse or lakehouse; serve transformed data; serve insights via embedding
13. Create Fabric capacity
Capacity is a dedicated set of resources reserved for exclusive use. It offers dependable,
consistent performance for your content. Each capacity offers a selection of SKUs, and
each SKU provides different resource tiers for memory and computing power. You pay
for the provisioned capacity whether you use it or not.
A capacity is a quota-based system, and scaling up or down a capacity doesn't involve
provisioning compute or moving data, so it’s instant.
14. Create Fabric capacity
Once the capacity is created, we can see the capacity in the Admin portal, on the Capacity settings pane, under the "Fabric Capacity" tab
19. OneLake for all data
“The OneDrive for data”
A single unified logical SaaS data lake for
the whole organization (no silos)
Organize data into domains
Foundation for all Fabric data items
Provides full and open access through
industry standard APIs and formats to any
application (no lock-in)
OneLake: One Copy, One Security, OneLake Data Hub, intelligent data fabric
Workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, Data Activator
20. One Copy for all computes
Real separation of compute and storage
No matter which engine or item you use, everyone contributes to building the same lake.
Engines are being optimized to work with Delta Parquet as their native format
Compute powers the applications and experiences in Fabric. The compute is separate from the storage.
Multiple compute engines are available, and all engines can access the same data without needing to import or export it. You are able to choose the right engine for the right job.
Non-Fabric engines can also read/write to the same copy of data using the ADLS APIs, or data can be added through shortcuts
Unified management and governance
Workspace A: Warehouse (Finance), Lakehouse (Customer 360)
Workspace B: Lakehouse (Service telemetry), Warehouse (Business KPIs)
Workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, Data Activator
Engines: T-SQL, Spark, Analysis Services, KQL
21. Shortcuts virtualize data across domains and clouds
No data movement or duplication
A shortcut is a symbolic link which points from one data location to another
Create a shortcut to make data from a warehouse part of your lakehouse
Create a shortcut within Fabric to consolidate data across items or workspaces without changing the ownership of the data. Data can be reused multiple times without data duplication.
Existing ADLS Gen2 storage accounts and Amazon S3 buckets can be managed externally to Fabric and Microsoft while still being virtualized into OneLake with shortcuts
All data is mapped to a unified namespace and can be accessed using the same APIs, including the ADLS Gen2 DFS APIs
Unified management and governance
Workspace A: Warehouse (Finance), Lakehouse (Customer 360)
Workspace B: Lakehouse (Service telemetry), Warehouse (Business KPIs)
External shortcut sources: Amazon, Azure
Workloads: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, Data Activator
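The symbolic-link analogy for shortcuts can be sketched with plain OS symlinks. This is an illustration of the concept only, not the Fabric shortcut API: the "warehouse" and "lakehouse" folders and the `finance.csv` file are made up for the example.

```python
import os
import tempfile

# Illustration only: a Fabric shortcut behaves like a filesystem symlink --
# the data appears in the shortcut location without being copied.
root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse")
lakehouse = os.path.join(root, "lakehouse")
os.makedirs(warehouse)
os.makedirs(lakehouse)

# The warehouse owns the data.
src = os.path.join(warehouse, "finance.csv")
with open(src, "w") as f:
    f.write("account,balance\n1001,250\n")

# A "shortcut" in the lakehouse points at the warehouse data.
link = os.path.join(lakehouse, "finance.csv")
os.symlink(src, link)

# Readers of the lakehouse see the warehouse data with no copy made...
print(open(link).read().splitlines()[1])   # -> 1001,250

# ...and updates to the source are immediately visible through the shortcut,
# just as warehouse updates surface in a lakehouse shortcut.
with open(src, "a") as f:
    f.write("1002,975\n")
print(open(link).read().splitlines()[2])   # -> 1002,975
```

Because the link holds no data of its own, deleting it leaves the warehouse file untouched, which mirrors how a shortcut does not change ownership of the underlying data.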
23. OneLake Data Hub
Discover, manage and use data in one place
Central location within Fabric to discover, manage, and reuse data
Data can be easily discovered by its domain (e.g., Finance) so users can see what matters to them
Explorer capability to easily browse and find data by its folder (workspace) hierarchy
Efficient data discovery using search, filter, and sort
26. Lakehouse – Lakehouse mode
Tables - This is a virtual view of the managed area in your lake. It is the main container for tables of all types (CSV, Parquet, Delta, managed tables, and external tables). All tables, whether automatically or explicitly created, will show up as a table under the managed area of the Lakehouse. This area can also include any type of files or folder/subfolder organization.
Files - This is a virtual view of the unmanaged area in your lake. It can contain any files and folder/subfolder structure. The main distinction between the managed area and the unmanaged area is the automatic delta table detection process, which runs over any folders created in the managed area. Any Delta-format files (Parquet + transaction log) will be automatically registered as a table and will also be available from the serving layer (T-SQL).
Automatic Table Discovery and Registration
Lakehouse automatic table discovery and registration is a feature of the lakehouse that provides a fully managed file-to-table experience for data engineers and data scientists. Users can drop a file into the managed area of the lakehouse, and the file will be automatically validated for supported structured formats (currently only Delta tables) and registered into the metastore with the necessary metadata such as column names, formats, compression, and more. Users can then reference the file as a table and use Spark SQL syntax to interact with the data.
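A Delta table on disk is a folder of Parquet files plus a `_delta_log` subdirectory of JSON commit files, so the detection step above can be approximated as a folder scan. The sketch below is my own illustration of the detection idea, not Fabric's implementation; the folder names are hypothetical.

```python
import json
import os
import tempfile

def discover_delta_tables(managed_area: str) -> list[str]:
    """Return folders under the managed area that look like Delta tables.

    A folder qualifies if it contains a _delta_log subdirectory with at
    least one JSON commit file -- the Delta format's defining marker.
    """
    tables = []
    for name in sorted(os.listdir(managed_area)):
        log_dir = os.path.join(managed_area, name, "_delta_log")
        if os.path.isdir(log_dir) and any(
            f.endswith(".json") for f in os.listdir(log_dir)
        ):
            tables.append(name)
    return tables

# Build a fake managed area: one Delta-like folder, one plain folder.
managed = tempfile.mkdtemp()
os.makedirs(os.path.join(managed, "sales", "_delta_log"))
commit = os.path.join(managed, "sales", "_delta_log", "00000000000000000000.json")
with open(commit, "w") as f:
    json.dump({"commitInfo": {"operation": "WRITE"}}, f)
os.makedirs(os.path.join(managed, "raw_csv_dump"))

print(discover_delta_tables(managed))  # -> ['sales']
```

Only the `sales` folder is registered; the plain `raw_csv_dump` folder is skipped, matching the slide's point that only Delta-format folders are auto-registered as tables.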
29. Workspaces and capacities accessing OneLake
Each tenant will have only one OneLake, and any tenant can access files in a OneLake from other tenants via shortcuts
Lakehouse: Sales
32. Data warehouse
Data Source (shortcut enabled; structured/unstructured) → Ingestion (mounts; pipelines & dataflows) → Store (Data Warehouse) → Transform (procedures) → Expose (PBI; Warehouse)
33. Synapse Data Warehouse
Infinitely scalable and open
Synapse Data Warehouse in Fabric: a relational engine with infinite serverless compute over an open storage format in a customer-owned data lake
1. Open standard format in an open data lake replaces proprietary formats as the native storage
• First transactional data warehouse natively embracing an open standard format
• Data is stored in Delta Parquet with no vendor lock-in
• Is auto-integrated and auto-optimized with minimal knobs
• Extends full SQL ecosystem benefits
34. Synapse Data Warehouse
Infinitely scalable and open
Synapse Data Warehouse in Fabric: a relational engine with infinite serverless compute over an open storage format in a customer-owned data lake
2. Dedicated clusters are replaced by serverless compute infrastructure
• Physical compute resources assigned within milliseconds to jobs
• Infinite scaling with dynamic resource allocation tailored to data volume and query complexity
• Instant scaling up/down with no physical provisioning involved
• Resource pooling providing significant efficiencies and pricing
36. Workspaces and capacities accessing OneLake
Each tenant will have only one OneLake, and any tenant can access files in a OneLake from other tenants via shortcuts
Warehouse: Sales
37. Data Warehouse
Use this to build a relational layer on top of the physical data in the Lakehouse and expose it to analysis and reporting tools using a T-SQL/TDS endpoint. This offers a transactional data warehouse with T-SQL DML support, stored procedures, tables, and views
How can I control “bad actor” queries? Fabric compute is designed to automatically classify queries to allocate resources and ensure high-priority queries (e.g., ETL, data preparation, and reporting) are not impacted by potentially poorly written ad hoc queries.
How is the classification for an incoming query determined? Queries are intelligently classified by a combination of the source (e.g., pipeline vs. Power BI) and the query type (e.g., INSERT vs. SELECT)
Where is the physical storage for the Data Warehouse? All data for Fabric is stored in OneLake in the open Delta format. A single copy of the data is therefore exposed to all the compute engines of Fabric without needing to move or duplicate data
41. Why two options?
Delta Lake shortcomings:
- No multi-table transactions
- Lack of full T-SQL support (no updates, limited reads)
- Performance problems for trickle transactions
46. ADF review
Mapping data flows – don’t exist in Fabric
Wrangling data flows – Dataflow Gen2 in Fabric (Dataflow Gen1 in Power BI)
Data pipelines
47. Data Factory in Fabric
What is Dataflows Gen2? This is the new generation of Dataflows, succeeding Gen1. Dataflows provide a low-code interface for ingesting data from hundreds of data sources, transforming your data using 300+ data transformations, and loading the resulting data into multiple destinations such as Azure SQL Database, Lakehouse, and more
We currently have multiple Dataflows experiences with Power BI Dataflows Gen1, Power Query Dataflows, and ADF data flows. What is the strategy for Fabric with these various experiences? Our goal is to evolve over time to a single Dataflow that combines the ease of use of PBI and Power Query with the scale of ADF
What are Fabric pipelines? Fabric pipelines enable powerful workflow capabilities at cloud scale. With data pipelines, you can build complex workflows that can refresh your dataflow, move PB-size data, and define sophisticated control flow pipelines. Use data pipelines to build complex ETL and Data Factory workflows that can perform a number of different tasks at scale. Control flow capabilities are built into pipelines that allow you to build workflow logic which provides loops and conditionals.
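The control-flow idea (loop over items, branch per item, record each activity's outcome) can be sketched as a toy orchestrator. This is not the Fabric pipelines API; the `Pipeline` class and activity names are hypothetical, invented for the illustration.

```python
# Toy sketch of pipeline control flow (ForEach-style loop + If-style branch).
# Hypothetical activity names -- not the Fabric pipelines API.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Pipeline:
    log: List[Tuple[str, str]] = field(default_factory=list)

    def run_activity(self, name: str, action: Callable[[], bool]) -> bool:
        """Run one activity and record its outcome, like a pipeline run log."""
        ok = action()
        self.log.append((name, "Succeeded" if ok else "Failed"))
        return ok

# Loop over source tables, with a conditional branch per item.
tables = ["orders", "customers", "events"]
pipe = Pipeline()
for table in tables:
    # Conditional: only refresh a dataflow for the non-streaming tables.
    if table != "events":
        pipe.run_activity(f"RefreshDataflow:{table}", lambda: True)
    pipe.run_activity(f"CopyData:{table}", lambda: True)

print(pipe.log[0])   # -> ('RefreshDataflow:orders', 'Succeeded')
print(len(pipe.log)) # -> 5
```

The point of the sketch is that the loop and the conditional live in the pipeline layer while each activity (refresh, copy) stays a self-contained unit, which is the division of labor the slide describes between pipelines and dataflows.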
50. For best performance, you should compress the data using the VORDER compression method (50%-70% more compression). Data is stored this way by ADF by default
51. Should I use Fabric now?
Yes, for prototyping
Yes, if you won’t be in production for several months
You have to be OK with bugs, missing features, and possible performance issues
Don’t use it if you have hundreds of terabytes of data
52. If building in Synapse, how do you make the transition to Fabric smooth?
Do not use dedicated pools, unless needed for serving and performance
Don’t use any stored procedures to modify data in dedicated pools
Use ADF for pipelines and for Power Query, and don’t use ADF mapping data flows. Don’t use Synapse pipelines or mapping data flows
Embrace the data lakehouse architecture
53. Resources
Microsoft Fabric webinar series: https://aka.ms/fabric-webinar-series
New documentation: https://aka.ms/fabric-docs. Check out the tutorials.
Data Mesh, Data Fabric, Data Lakehouse – (video from Toronto Data Professional Community on 2/15/23)
Build videos:
Build 2-day demos
Microsoft Fabric Synapse data warehouse, Q&A
My intro blog on Microsoft Fabric (with helpful links at the bottom)
Fabric notes
Advancing Analytics videos
Ask me Anything (AMA) about Microsoft Fabric!
54. Q & A ?
James Serra, Microsoft, Industry Advisor
Email me at: jamesserra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com
Editor's Notes
Abstract
Microsoft Fabric
This is a hugely important new product that I have spent a ton of hours understanding and simplifying into a deck and demo that I will present to you. This will shortcut your time to upskill on it so you are prepared to answer customer questions. My presentation comes from the angle that you are in the field and are familiar with Azure Synapse and want to know how this differs.
-----------------------
May public preview, Microsoft build
GA by end-of-year
MVP for GA, incremental updates, release features over time
TODO:
can you have multiple workspace per capacities. Show diagram with workspaces all pointing to the same warehouse
dataflow
how do we do snowflake cloning
How will CDC work with no data flows
how do we talk to customers about waiting for GA for Fabric when they need to do something now, go Lakehouse route in Synapse
slide on mounting a dedicated pool
Architecture diagrams on how things are done in Fabric / Use Cases
This is giving new power to pbi users
enterprise solution vs department wide solution slide https://www.jamesserra.com/archive/2022/06/power-bi-as-a-enterprise-solution/
Slide with Synapse missing items
Will copilot be added?
Highlight no more dedicated pools - use serverless
PBI capacities - how to delegate
get email about behind the scenes warehouse
PBI desktop vs Fabric – what features are not yet in Fabric as modeling is there now
Schema drift
Failover
TODO specialist:
all or nothing with users
who talks about it?
Segment out Synapse - I want to build something now, what product do I use? Don't use dedicated pools, but there will be a mounting option. Use ADF instead of Synapse pipelines
Missing features in Synapse and what is better
Snowflake compete slides
link to S3
Pricing
Position Synapse today to move seameless into Fabric - migration path
Purview not in here
What about synapse database templates
Demo built?
Ability to request a demo/presentation from PM/GBB who have been doing it
Fluff, but point is I bring real work experience to the session
Microsoft Fabric combines Data Factory, Synapse Analytics, Data Explorer, and Power BI into a single, unified experience in the cloud. The open and governed data lakehouse foundation is a cost-effective and performance-optimized fabric for business intelligence, machine learning, and AI workloads at any scale. It is the foundation for migrating and modernizing existing analytics solutions, whether from data appliances or traditional data warehouses.
Talk Track for Greenfield/Growing Analytics Customers – Microsoft Fabric’s SaaS environment makes it easier to deploy an entire end-to-end analytics engine from the ground up at an accelerated pace. With the solution’s built-in security and governance capabilities, you can rest assured your data and insights are protected.
Talk Track for Existing Synapse Customers – Microsoft Fabric is an evolution of Azure Synapse. You will still be able to enjoy the benefits and limitless scale of Synapse in an easier to use SaaS solution while adopting new capabilities that enhance your entire analytics approach. And with the addition of Power BI, you can help democratize the ability to uncover insights and create interactive reports across the organization, helping everyone make more data-driven decisions in their everyday work.
Talk Track for Existing Power BI Customers – With Microsoft Fabric, you’ll be able to access new and powerful data tools and services like Azure Synapse within the same user experience you already enjoy with Power BI. You can unify these tools with your disparate data sources in the same environment to establish a single source of truth for all data, driving the ability for everyone to uncover more accurate and consistent insights than before. And instead of having to worry about the security concerns of a patchwork analytics estate, you can rest assured your data is protected with the built-in security and governance capabilities.
It is not possible to share capacities across tenants
https://learn.microsoft.com/en-us/fabric/enterprise/licenses
PBI v-cores will evolve to Compute Units (group of 8 v-cores)
Public preview available March 23rd. Switch in admin console to run the functionality on/off completely. It will be off by default until July 1st, when it will be switched on unless they go into the admin console and say “No, I don’t want to have this switched on starting July 1st”. Can control at the tenant and capacity levels.
The Microsoft Fabric (Preview) trial includes access to the Fabric product experiences and the resources to create and host Fabric items. The Fabric (Preview) trial lasts until Fabric General Availability (GA), unless canceled. After GA, the Fabric (Preview) trial converts to the GA version and is extended for 60 days.
Starting with OneLake itself. OneLake provides you a single data lake for your entire organization.
Users cannot create OneLake storage. OneLake storage (ADLS Gen2) managed by OneLake API is attached to Fabric tenant. When a workspace is created, a folder is created in OneLake storage (ADLS Gen2 behind the scenes) on a customer tenant.
Everyone is able to contribute to the same lake no matter which engine you use.
We are doing a lot a work to optimize our engines to work directly with delta parquet as their native format for tabular data as you can see with T-SQL for Data warehousing and DirectLake mode in Analysis Service for BI.
Think of OneLake as an abstraction layer. You can mount existing ADLS Gen2 storage to it. Virtualization across many storage accounts. Maintains a single namespace.
A shortcut is nothing more than a symbolic link which points from one data location to another. Just like you can create shortcuts in Windows or Linux, the data will appear in the shortcut location as if it were physically there.
Today, if you have tables in a data warehouse that you want to make available alongside other tables or files in a lakehouse, you will need to copy that data out of the warehouse. With OneLake, you simply create a shortcut in the lakehouse pointing to the warehouse. The data will appear in your lakehouse as if you had physically copied it. Since you didn’t copy it, when data is updated in the warehouse, changes are automatically reflected in the lakehouse.
You can also use shortcuts to consolidate data across workspaces and domains without changing the ownership of the data. In this example, workspace B still owns the data. They still have ultimate control over who can access it and how it stays up to date.
Many of you already have existing data lakes stored in ADLS Gen2 or in Amazon S3 buckets. These lakes can continue to exist and be managed externally to Fabric.
We have extended shortcuts to include lakes outside of OneLake and even outside of Azure, so that you can virtualize your existing ADLS Gen2 accounts or Amazon S3 buckets into OneLake.
All data is mapped to the same unified namespace and can be accessed using the same ADLS Gen2 APIs, even when it is coming from S3.
The Microsoft Fabric Lakehouse analytics scenario makes it so that data can be ingested into OneLake with shortcuts to other clouds repositories, pipelines, and dataflows in order to allow end-users to leverage other data.
Once that data has been pulled into Microsoft Fabric, users can leverage notebooks to transform that data in OneLake and then store them in Lakehouses with medallion structure.
From there, users can begin to analyze and visualize that data with Power BI using the see-through mode or SQL endpoints.
If you don’t see a Lakehouse table in the warehouse (default), check the data format. Only the tables in Delta Lake format are available in the warehouse (default). Parquet, CSV, and other formats cannot be queried using the warehouse (default)
Warehouse mode in the Lakehouse allows a user to transition from the “Lake” view of the Lakehouse (which supports data engineering and Apache Spark) to the “SQL” experiences that a data warehouse would provide, supporting T-SQL. In warehouse mode, the user has a subset of SQL commands that can define and query data objects but not manipulate the data. You can perform the following actions in your warehouse (default):
• Query the tables that reference data in your Delta Lake folders in the lake.
• Create views, inline TVFs, and procedures to encapsulate your semantics and business logic in T-SQL.
• Manage permissions on the objects.
Warehouse mode is primarily oriented towards designing your warehouse and BI needs and serving data.
The Data Warehouse analytics scenario takes existing sources that are mounted, while pipelines and dataflows can bring in all other data that is needed.
IT teams can then define and store procedures to transform the data, which is stored as Parquet/Delta Lake files in OneLake.
From there, business users can analyze and visualize data with Power BI, again using the see-through mode or SQL endpoints.
Can more than one capacity be connected to a Data Warehouse, for instance, one to handle data writes and one to handle data reads? Currently, a capacity is assigned at the workspace level and a Data Warehouse is associated with a single workspace. This means all artifacts in the workspace will share the same capacity, and all read/write operations will use the same capacity.
Does Fabric Data Warehouse support fine grained access control like row-level security, column-level security, dynamic data masking? These security constructs are not available but are planned for the Fabric Data Warehouse and will integrate with Fabric’s universal security model.
We already support stats, it’s in the docs. Automatic stats on load and on metadata discovery should land soon. Query plans, indexes, SQL RLS etc also will land incrementally
Python, R, Scala
https://learn.microsoft.com/en-us/fabric/get-started/decision-guide-warehouse-lakehouse
Lakehouse: call it delta lake, owned and managed by Spark, customer can update files - it's user owned. Use if customer likes Spark and using files
Warehouse: well structured, SQL front door, transactional guarantees, multi-table transactions. Nobody except the SQL engine can update the files. Use if customer is comfortable with SQL, comes from a relational database world
LDF and MDF are still used behind the scenes. Can query the warehouse from the lakehouse, but can’t do opposite. Data is synced into onelake from LDF and MDF (only INSERT works for now)
Can't support everything SQL supports with the current open source format at Delta, (multi-table transactions, indexing), so have to use SQL engine
Want to get to a point where don't use LDF/MDF files but use delta underneath. Talk about using Iceberg way down the road
Other knobs to turn? What should we expose? Hide things (DMV, explain plan) because can't tune. Performance DBA's - getting rid of another part of your job
Fabric: Can land in bronze zone in warehouse and just use that for all layers to use T-SQL to write
Compared to Synapse: no longer having relational storage and dedicated compute - idea is it’s all done within lake
3 deployment models
How to organize workspaces – dev/test/prod, by orgs, by cost
High concurrency clusters, spark monitoring
Mapping data flows = ADF Data flows
Wrangling data flows = Power Query
Mapping data flows = ADF Data flows
Wrangling data flows = Power Query
Just in the Fabric context, I would say it this way: Fabric Dataflows are the PQ UI with the scale of ADF
the Gen1 vs. Gen2 is just the distinction between what is in PBI and Excel today vs. what is in Fabric
what will be the response to customers who ask how they move ADF Data flows to Fabric?
We are looking at ways to convert to Fabric data flows and work with Partners who can help with the conversions like 1
Since ADF PQ already had cloud scale, why not just move that into Fabric dataflow gen2?
so Fabric Dataflow Gen2 will eventually be an improvement over ADF PQ
DirectLake mode is a groundbreaking new engine capability to analyze very large datasets in Power BI. The technology is based on the idea to load parquet-formatted files directly from a data lake without having to query a Data Warehouse or Lakehouse endpoint and without having to import or duplicate data into a Power BI dataset. DirectLake is a fast path to load the data from the lake straight into the Power BI engine, ready for analysis. It loads the data directly from the files into memory at runtime. Because there is no explicit import process, it is possible to pick up any changes at the source as they occur, thus combining the advantages of DirectQuery and import mode while avoiding their disadvantages. DirectLake can read parquet-formatted delta files, but for best performance you should compress the data using the VORDER compression method (50%-70% more compression).