Microsoft Azure Data Fundamentals
FOR USE ONLY AS PART OF MICROSOFT VIRTUAL TRAINING DAYS PROGRAM. THESE MATERIALS ARE NOT AUTHORIZED
FOR DISTRIBUTION, REPRODUCTION OR OTHER USE BY NON-MICROSOFT PARTIES.
Customer
ID   FirstName   LastName   Email                 Address
1    Joe         Jones      joe@litware.com       1 Main St.
2    Samir       Nadoy      samir@northwind.com   123 Elm Pl.

Product
ID    Name          Price
123   Hammer        2.99
162   Screwdriver   3.49
201   Wrench        4.25

{
  "firstName": "Joe",
  "lastName": "Jones",
  "address":
  {
    "streetAddress": "1 Main St.",
    "city": "New York",
    "state": "NY",
    "postalCode": "10099"
  },
  "contact":
  [
    {
      "type": "home",
      "number": "555 123-1234"
    },
    {
      "type": "email",
      "address": "joe@litware.com"
    }
  ]
}

{
  "firstName": "Samir",
  "lastName": "Nadoy",
  "address":
  {
    "streetAddress": "123 Elm Pl.",
    "unit": "500",
    "city": "Seattle",
    "state": "WA",
    "postalCode": "98999"
  },
  "contact":
  [
    {
      "type": "email",
      "address": "samir@northwind.com"
    }
  ]
}
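The two customer documents above share most fields, but only one has a "unit" and their "contact" lists differ in length. The following is a minimal sketch of reading such semi-structured data in Python; the helper name describe is illustrative and not part of the course material.

```python
import json

# Two customer documents with the same fields as the JSON above.
docs = json.loads("""
[
  {"firstName": "Joe", "lastName": "Jones",
   "address": {"streetAddress": "1 Main St.", "city": "New York",
               "state": "NY", "postalCode": "10099"},
   "contact": [{"type": "home", "number": "555 123-1234"},
               {"type": "email", "address": "joe@litware.com"}]},
  {"firstName": "Samir", "lastName": "Nadoy",
   "address": {"streetAddress": "123 Elm Pl.", "unit": "500",
               "city": "Seattle", "state": "WA", "postalCode": "98999"},
   "contact": [{"type": "email", "address": "samir@northwind.com"}]}
]
""")

def describe(doc):
    """Summarize one document, tolerating fields that may be absent."""
    unit = doc["address"].get("unit")  # optional field: None when missing
    emails = [c["address"] for c in doc["contact"] if c["type"] == "email"]
    return {"name": f'{doc["firstName"]} {doc["lastName"]}',
            "unit": unit, "emails": emails}

summaries = [describe(d) for d in docs]
for s in summaries:
    print(s)
```

Unlike a relational table, nothing forces each document to carry the same fields, so the reading code has to decide what a missing field means.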
How is data stored?
Files
• Delimited text, for example: FirstName,LastName,Email
• Optimized formats: Avro, ORC, Parquet
Databases
• Relational, for example: Customer and Product (ID, Name, Price) tables
• Key-value, for example: key 2, value { "Name": "Samir Nadoy" }
• Document
• Graph
Operational data workloads
Data is stored in a database that is optimized for online transactional processing
(OLTP) operations that support applications
A mix of read and write activity
For example:
Read the Product table to display a catalog
Write to the Order table to record a purchase
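The read and write pattern described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for an operational database, and the Order table's columns are assumed here because the slide does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, Price REAL);
    -- Assumed columns; "Order" is quoted because ORDER is a reserved word.
    CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, ProductID INTEGER, Quantity INTEGER);
    INSERT INTO Product VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49), (201, 'Wrench', 4.25);
""")

# Read activity: display the catalog.
catalog = conn.execute("SELECT ID, Name, Price FROM Product ORDER BY ID").fetchall()

# Write activity: record a purchase inside a transaction.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute('INSERT INTO "Order" (ProductID, Quantity) VALUES (?, ?)', (123, 1))

order_count = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
print(catalog)
print(order_count)
```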
Operational data is extracted, transformed, and loaded (ETL) into a data lake for analysis
Data is loaded into a schema of tables - typically in a Spark-based data lakehouse with tabular
abstractions over files in the data lake, or a data warehouse with a fully relational SQL engine
Data in tables may be aggregated and loaded into an online analytical processing (OLAP)
model, or cube
The files in the data lake, relational tables, and analytical model can be queried to produce
reports and dashboards
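As a rough illustration of the extract, transform, and load steps described above, the sketch below uses only the Python standard library. The input rows and column names are made up for the example, and a real pipeline would read from an operational store and write to a data lake or warehouse rather than an in-memory dictionary.

```python
import csv
import io
from collections import defaultdict

# Extract: operational data exported as delimited text (made-up rows).
raw = io.StringIO(
    "OrderNo,Product,Quantity,Price\n"
    "1000,Hammer,1,2.99\n"
    "1000,Wrench,2,4.25\n"
    "1001,Hammer,2,2.99\n"
)
rows = list(csv.DictReader(raw))

# Transform: cast types and compute revenue per line in whole cents.
for r in rows:
    r["Quantity"] = int(r["Quantity"])
    r["RevenueCents"] = r["Quantity"] * round(float(r["Price"]) * 100)

# Load: aggregate into an analytical table keyed by product.
revenue_by_product = defaultdict(int)
for r in rows:
    revenue_by_product[r["Product"]] += r["RevenueCents"]

print(dict(revenue_by_product))
```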
Learning Objective 2: Data roles and services
Data professional roles
All rows have the same columns
Each column is assigned a datatype

ID   FirstName   MiddleName   LastName   Email                 Address       City
1    Joe         David        Jones      joe@litware.com       1 Main St.    Seattle
2    Samir                    Nadoy      samir@northwind.com   123 Elm Pl.   New York

Customer
ID   FirstName   LastName   Address       City
1    Joe         Jones      1 Main St.    Seattle
2    Samir       Nadoy      123 Elm Pl.   New York

Order
OrderNo   OrderDate   Customer
1000      1/1/2022    1
1001      1/1/2022    2

LineItem
OrderNo   ItemNo   ProductID   Quantity
1000      1        123         1
1000      2        201         2
1001      1        123         2

Product
ID    Name          Price
123   Hammer        2.99
162   Screwdriver   3.49
201   Wrench        4.25
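The Customer, Order, LineItem, and Product tables above are linked by key columns. The sketch below loads the same sample rows into an in-memory SQLite database and follows those keys with joins. SQLite and its column types are a stand-in chosen for the example, not something the course prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
                           Address TEXT, City TEXT);
    CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT,
                          Customer INTEGER REFERENCES Customer(ID));
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
    CREATE TABLE LineItem (OrderNo INTEGER REFERENCES "Order"(OrderNo), ItemNo INTEGER,
                           ProductID INTEGER REFERENCES Product(ID), Quantity INTEGER,
                           PRIMARY KEY (OrderNo, ItemNo));
    INSERT INTO Customer VALUES (1,'Joe','Jones','1 Main St.','Seattle'),
                                (2,'Samir','Nadoy','123 Elm Pl.','New York');
    INSERT INTO "Order" VALUES (1000,'1/1/2022',1), (1001,'1/1/2022',2);
    INSERT INTO Product VALUES (123,'Hammer',2.99), (162,'Screwdriver',3.49),
                               (201,'Wrench',4.25);
    INSERT INTO LineItem VALUES (1000,1,123,1), (1000,2,201,2), (1001,1,123,2);
""")

# Follow the keys: line item -> order -> customer, and line item -> product.
rows = conn.execute("""
    SELECT c.FirstName, o.OrderNo, p.Name, li.Quantity
    FROM LineItem AS li
    JOIN "Order"  AS o ON li.OrderNo = o.OrderNo
    JOIN Customer AS c ON o.Customer = c.ID
    JOIN Product  AS p ON li.ProductID = p.ID
    ORDER BY o.OrderNo, li.ItemNo
""").fetchall()
for row in rows:
    print(row)
```

Because each customer and product is stored once and referenced by ID, a change such as a new address only has to be made in one row.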
Structured Query Language (SQL)
SQL is a standard language for use with relational databases
Standards are maintained by ANSI and ISO
Most RDBMS systems support proprietary extensions of standard SQL
Data Definition Language (DDL)
CREATE, ALTER, DROP, RENAME

    CREATE TABLE Product
    (
      ProductID INT PRIMARY KEY,
      Name VARCHAR(20) NOT NULL,
      Price DECIMAL NULL
    );

Product (empty table)
ID   Name   Price

Data Control Language (DCL)
GRANT, DENY, REVOKE

    GRANT SELECT, INSERT, UPDATE
    ON Product
    TO user1;

Data Manipulation Language (DML)
INSERT, UPDATE, DELETE, SELECT

    SELECT Name, Price
    FROM Product
    WHERE Price > 2.50
    ORDER BY Price;

Product
ID    Name          Price
123   Hammer        2.99
162   Screwdriver   3.49
201   Wrench        4.25

Results
Name          Price
Hammer        2.99
Screwdriver   3.49
Wrench        4.25
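To try the DDL and DML statements above without provisioning a server, the following sketch runs them against an in-memory SQLite database from Python. SQLite is an assumption made for convenience: it accepts the column types loosely and has no user accounts, so the DCL (GRANT) statement is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table (column types follow the slide; SQLite treats them loosely).
conn.execute("""
    CREATE TABLE Product
    (
        ProductID INT PRIMARY KEY,
        Name VARCHAR(20) NOT NULL,
        Price DECIMAL
    )
""")

# DML: insert the three sample rows, then query them.
conn.executemany(
    "INSERT INTO Product (ProductID, Name, Price) VALUES (?, ?, ?)",
    [(123, "Hammer", 2.99), (162, "Screwdriver", 3.49), (201, "Wrench", 4.25)],
)
results = conn.execute(
    "SELECT Name, Price FROM Product WHERE Price > ? ORDER BY Price", (2.50,)
).fetchall()
print(results)
```

Passing values as parameters (the ? placeholders) rather than building the SQL string by hand is the usual way to avoid SQL injection.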
Other common database objects
Views
Pre-defined SQL queries that behave as virtual tables

    CREATE VIEW Deliveries
    AS
    SELECT o.OrderNo, o.OrderDate,
           c.Address, c.City
    FROM [Order] AS o JOIN Customer AS c
    ON o.Customer = c.ID;

Deliveries
OrderNo   OrderDate   Address       City
1000      1/1/2022    1 Main St.    Seattle
1001      1/1/2022    123 Elm Pl.   New York

Stored Procedures
Pre-defined SQL statements that can include parameters

    CREATE PROCEDURE RenameProduct
      @ProductID INT,
      @NewName VARCHAR(20)
    AS
    UPDATE Product
    SET Name = @NewName
    WHERE ID = @ProductID;
    ...
    EXEC RenameProduct 201, 'Spanner';

Product (after the procedure runs, Wrench is renamed to Spanner)
ID    Name      Price
201   Spanner   4.25

Indexes
Tree-based structures that improve query performance

    CREATE INDEX idx_ProductName
    ON Product(Name);

(Diagram: an index tree with A-L and M-Z branches pointing into the Product table)
Product
ID    Name          Price
123   Hammer        2.99
162   Screwdriver   3.49
201   Wrench        4.25
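The view and index above can be tried in SQLite from Python, as sketched below. SQLite has no stored procedures, so the RenameProduct procedure is approximated by a hypothetical Python function rename_product that runs the same UPDATE with parameters. This is a stand-in for illustration, not an equivalent of the T-SQL shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Address TEXT, City TEXT);
    CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT, Customer INTEGER);
    CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
    INSERT INTO Customer VALUES (1,'1 Main St.','Seattle'), (2,'123 Elm Pl.','New York');
    INSERT INTO "Order" VALUES (1000,'1/1/2022',1), (1001,'1/1/2022',2);
    INSERT INTO Product VALUES (123,'Hammer',2.99), (162,'Screwdriver',3.49),
                               (201,'Wrench',4.25);

    -- View: a stored query that behaves like a virtual table.
    CREATE VIEW Deliveries AS
        SELECT o.OrderNo, o.OrderDate, c.Address, c.City
        FROM "Order" AS o JOIN Customer AS c ON o.Customer = c.ID;

    -- Index: speeds up lookups by product name.
    CREATE INDEX idx_ProductName ON Product(Name);
""")

deliveries = conn.execute("SELECT * FROM Deliveries ORDER BY OrderNo").fetchall()

def rename_product(product_id, new_name):
    """Stand-in for the RenameProduct stored procedure (SQLite has none)."""
    with conn:  # run the update in its own transaction
        conn.execute("UPDATE Product SET Name = ? WHERE ID = ?", (new_name, product_id))

rename_product(201, "Spanner")
renamed = conn.execute("SELECT Name FROM Product WHERE ID = 201").fetchone()[0]
print(deliveries)
print(renamed)
```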
Learning Objective 2: Explore Azure services
for relational data
Azure SQL
Family of SQL Server based cloud database services
SQL Server on Azure VMs (IaaS)
• Guaranteed compatibility to SQL Server on premises
• Customer manages everything: OS upgrades, software upgrades, backups, replication
• Pay for the server VM running costs and software licensing, not per database
• Great for hybrid cloud or migrating complex on-premises database configurations

Azure SQL Managed Instance (PaaS)
• Near 100% compatibility with SQL Server on-premises
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Use a single instance with multiple databases, or multiple instances in a pool with shared resources
• Great for migrating most on-premises databases to the cloud

Azure SQL Database (PaaS)
• Core database functionality compatibility with SQL Server
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Single database or elastic pool to dynamically share resources across multiple databases
• Great for new, cloud-based applications
Azure Database services for open-source
Azure managed solutions for common open-source RDBMSs
Azure Database for MySQL (PaaS)
• PaaS implementation of MySQL in the Azure cloud, based on the MySQL Community Edition
• Commonly used in Linux, Apache, MySQL, PHP (LAMP) application architectures

Azure Database for MariaDB (PaaS)
• An implementation of the MariaDB Community Edition database management system adapted to run in Azure
• Compatibility with Oracle Database

Azure Database for PostgreSQL (PaaS)
• Database service in the Microsoft cloud based on the PostgreSQL Community Edition database engine
• Hybrid relational and object storage
Demo • Provision Azure relational database services
Explore fundamentals of non-relational data in Azure
Learning Objectives
• Fundamentals of Azure Storage
• Fundamentals of Azure Cosmos DB
Learning Objective 1: Fundamentals
of Azure Storage
Azure Blob Storage
Storage for data as binary large objects (BLOBs)
Hierarchy: Azure Storage Account > Blob Container > blobs (for example: blob1, folder1/blob2)
• Block blobs
o Large, discrete, binary objects that change infrequently
o Blobs can be up to 4.7 TB, composed of blocks of up to 100 MB
➢ A blob can contain up to 50,000 blocks
• Page blobs
o Used as virtual disk storage for VMs
o Blobs can be up to 8 TB, composed of fixed-size 512-byte pages
• Append blobs
o Block blobs that are optimized for append operations
o Maximum size just over 195 GB; each block can be up to 4 MB
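The size limits quoted above follow from multiplying the block count by the block size. The short calculation below reproduces them, assuming the slide's MB, GB, and TB are binary units (MiB, GiB, TiB) and that append blobs share the 50,000-block limit stated for block blobs. Both assumptions are consistent with the quoted figures but are not spelled out on the slide.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # block limit stated above for block blobs

block_blob_max = MAX_BLOCKS * 100 * MIB   # 100 MB blocks
append_blob_max = MAX_BLOCKS * 4 * MIB    # 4 MB blocks (assumes the same block limit)
page_count_8tb = (8 * TIB) // 512         # number of 512-byte pages in an 8 TB page blob

print(f"block blob max:  {block_blob_max / TIB:.2f} TiB")
print(f"append blob max: {append_blob_max / GIB:.1f} GiB")
print(f"page blob pages: {page_count_8tb:,}")
```

The block blob result is a little under 4.8 TiB and the append blob result is a little over 195 GiB, matching the "4.7 TB" and "just over 195 GB" figures.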
Azure Cosmos DB for Table
Azure Cosmos DB for Apache Cassandra
Azure Cosmos DB for Apache Gremlin
• Extract, Transform, and Load (ETL) or Extract, Load, and Transform (ELT) orchestration
• Distributed processing to cleanse and restructure data at scale
• Batch and real-time data processing

• Flexible, scalable file storage in a data lake
• Relational tables in a data lakehouse or data warehouse

• Semantic models for analytical entities
• Often in the form of aggregated cubes that summarize numeric values across one or more dimensions

• Reports
• Charts
• Dashboards
Data processing in large-scale analytics
Data warehouse
• Data is stored in a relational database and queried using a SQL query engine
• Tables are denormalized for query optimization
• Typically as a star or snowflake schema of numeric facts that can be aggregated by dimensions

Data lakehouse
• Data files are stored in a distributed file system (a data lake) and typically processed using Apache Spark
• Metadata is used to define tables that provide a relational SQL interface to the file data
• Commonly, a delta lake format is used to provide transactional database functionality
PaaS data analytics services
• Azure Synapse Analytics: Use for a single, unified large-scale analytical solution on Azure
• Azure Databricks: Use to leverage Databricks skills and for cloud portability
• Azure HDInsight: Use when you need to support multiple open-source platforms
SaaS data analytics with Microsoft Fabric
Microsoft Fabric
Unified:
• SaaS product experience
• Security and governance
• Compute
• Storage
• Business model
Demo • Explore Microsoft Fabric
Explore fundamentals of real-time
analytics
Learning Objectives Streaming and real-time analytics
Learning Objective: Streaming and real-time
analytics
Batch vs stream processing
• Batch processing: Data is collected and processed at regular intervals
• Stream processing: Data is processed in (near) real-time as it arrives
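The difference between the two approaches can be sketched in a few lines of Python. The readings list and both function names are made up for the illustration: the batch function waits for a group of values before processing them, while the stream function updates its result as each value arrives.

```python
readings = [3, 5, 2, 8, 6, 1]  # made-up sensor values

def process_batch(values, batch_size):
    """Batch: collect values, then process each batch as a unit."""
    return [sum(values[i:i + batch_size]) for i in range(0, len(values), batch_size)]

def process_stream(values):
    """Stream: update a running total as each value arrives."""
    total = 0
    for v in values:  # `values` could be any iterator, such as a live feed
        total += v
        yield total

batch_totals = process_batch(readings, batch_size=3)
stream_totals = list(process_stream(readings))
print(batch_totals)
print(stream_totals)
```

The batch version produces one result per interval, while the stream version has an up-to-date result after every event.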
Real-time data processing with Azure Stream Analytics
Create an individual Azure Stream Analytics job or an Azure Stream Analytics cluster
(Diagram: Azure Stream Analytics Job: Input > Query (SELECT …) > Output)
• Ingest data from an input, such as:
o Azure Event Hubs
o Azure IoT Hub
o Azure Blob Storage
o …
• Process data with a perpetual query
• Send results to an output, such as:
o Azure Blob Storage
o Azure SQL Database
o Azure Synapse Analytics
o Azure Functions
o Azure Event Hubs
o Power BI
o …
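A perpetual query keeps running over incoming events and emits results as it goes, often by grouping events into time windows. The sketch below imitates that idea in plain Python with non-overlapping (tumbling) windows. It does not use any Azure API; the event data and the function name tumbling_counts are invented for the example.

```python
from collections import Counter

# (timestamp_seconds, device_id) events, assumed to arrive in time order.
events = [(1, "a"), (4, "b"), (9, "a"), (12, "a"), (18, "b"), (21, "a")]

def tumbling_counts(stream, window_seconds):
    """Yield (window_start, Counter of device ids) for each non-overlapping window."""
    current_start, counts = None, Counter()
    for ts, device in stream:
        start = (ts // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            yield current_start, counts  # window closed: emit its result
            counts = Counter()
        current_start = start
        counts[device] += 1
    if current_start is not None:
        yield current_start, counts      # flush the last open window

results = [(start, dict(c)) for start, c in tumbling_counts(events, 10)]
print(results)
```

This simplified version emits nothing for windows that contain no events and relies on events arriving in order; a real streaming engine has to deal with late and out-of-order data.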
Real-time log and telemetry analysis with Azure Data Explorer
Sales (fact)
Key   TimeKey    ProductKey   CustomerKey   Quantity   Revenue
1     01012022   1            1             1          2.99
2     01012022   2            1             2          6.98
3     02012022   1            2             2          5.98

Time (dimension)
Key        Year   Month   Day   WeekDay
01012022   2022   Jan     1     Sat
02012022   2022   Jan     2     Sun

(Diagram: a cube with Product, Customer (Alice, Samir, Joe), and Time (Jan, Feb, Mar, Apr, May, …) dimensions)

Model aggregates measures at each hierarchy level
Hierarchy (Year > Month > Day)   Measures (Revenue)
2022                             8221.48
  Jan                            574.86
    1                            9.97
    2                            5.98
…                                …
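The roll-up of revenue through the Year, Month, Day hierarchy can be reproduced from the three Sales fact rows and two Time dimension rows shown in the tables. The sketch below is a plain Python illustration, not how an analytical engine is implemented. Since only three fact rows are visible, the month and year totals it computes are smaller than the 574.86 and 8221.48 on the slide, which cover rows that are not shown.

```python
from collections import defaultdict
from decimal import Decimal

# Time dimension rows: key -> (Year, Month, Day)
time_dim = {"01012022": (2022, "Jan", 1), "02012022": (2022, "Jan", 2)}

# Sales fact rows: (TimeKey, ProductKey, CustomerKey, Quantity, Revenue)
sales = [
    ("01012022", 1, 1, 1, Decimal("2.99")),
    ("01012022", 2, 1, 2, Decimal("6.98")),
    ("02012022", 1, 2, 2, Decimal("5.98")),
]

# Aggregate revenue at each level of the Year > Month > Day hierarchy.
totals = defaultdict(Decimal)
for time_key, _product, _customer, _qty, revenue in sales:
    year, month, day = time_dim[time_key]
    for level in [(year,), (year, month), (year, month, day)]:
        totals[level] += revenue

# Print from the coarsest level (year) to the finest (day).
for level, revenue in sorted(totals.items(), key=lambda kv: len(kv[0])):
    print(level, revenue)
```

The day-level totals come out as 9.97 and 5.98, matching the Day rows in the hierarchy table.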
Common data visualizations in reports
• Tables and text
• Bar or column chart
• Line chart