Microsoft Azure Data Fundamentals

The document provides an overview of Microsoft Azure's Virtual Training Day focused on Data Fundamentals, covering core data concepts, data storage methods, and the roles and services related to data management. It details structured and unstructured data, operational and analytical workloads, and the various Azure services available for relational and non-relational data. Additionally, it highlights the importance of SQL and normalization in relational databases, as well as Azure's offerings for open-source database management.

Uploaded by

fi20

© Copyright Microsoft Corporation. All rights reserved.

FOR USE ONLY AS PART OF MICROSOFT VIRTUAL TRAINING DAYS PROGRAM. THESE MATERIALS ARE NOT AUTHORIZED
FOR DISTRIBUTION, REPRODUCTION OR OTHER USE BY NON-MICROSOFT PARTIES.

Classified as Microsoft Confidential


Microsoft Azure
Virtual Training Day:
Data Fundamentals
Explore fundamentals of data
Learning Objectives
• Core data concepts
• Data roles and services
Learning Objective 1: Core data concepts
What is data?
Values used to record information – often representing entities that have one or more attributes
Structured Semi-structured Unstructured

Structured (tables)

Customer
ID  FirstName  LastName  Email                Address
1   Joe        Jones     joe@litware.com      1 Main St.
2   Samir      Nadoy     samir@northwind.com  123 Elm Pl.

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Semi-structured (JSON)

Customer
{
  "firstName": "Joe",
  "lastName": "Jones",
  "address":
  {
    "streetAddress": "1 Main St.",
    "city": "New York",
    "state": "NY",
    "postalCode": "10099"
  },
  "contact":
  [
    {
      "type": "home",
      "number": "555 123-1234"
    },
    {
      "type": "email",
      "address": "joe@litware.com"
    }
  ]
}
{
  "firstName": "Samir",
  "lastName": "Nadoy",
  "address":
  {
    "streetAddress": "123 Elm Pl.",
    "unit": "500",
    "city": "Seattle",
    "state": "WA",
    "postalCode": "98999"
  },
  "contact":
  [
    {
      "type": "email",
      "address": "samir@northwind.com"
    }
  ]
}
How is data stored?

Files

Delimited Text
  FirstName,LastName,Email
  Joe,Jones,joe@litware.com
  Samir,Nadoy,samir@northwind.com

JavaScript Object Notation (JSON)
  {
    "customers":
    [
      { "firstName": "Joe", "lastName": "Jones"},
      { "firstName": "Samir", "lastName": "Nadoy"}
    ]
  }

Extensible Markup Language (XML)
  <Customer firstName="Joe" lastName="Jones"/>

Binary Large Object (BLOB)
  10110101101010110010...

Optimized formats:
  Avro, ORC, Parquet

Databases

Relational

  Customer
  ID  Email                Address
  1   joe@litware.com      1 Main St.
  2   samir@northwind.com  123 Elm Pl.

  Product
  ID   Name         Price
  123  Hammer       2.99
  162  Screwdriver  3.49
  201  Wrench       4.25

  Order
  OrderNo  OrderDate  Customer
  1000     1/1/2022   1
  1001     1/1/2022   2

  LineItem
  OrderNo  ItemNo  ProductID  Quantity
  1000     1       123        1
  1000     2       201        2
  1001     1       123        2

Non-relational

  Key-value
  Products
  Key  Value
  123  "Hammer ($2.99)"
  162  "Screwdriver ($3.49)"
  201  "Wrench ($4.25)"

  Document
  Customer
  Key  Document
  1    { "Name": "Joe Jones" }
  2    { "Name": "Samir Nadoy" }

  Column Family
  Orders
  Key   Customer (Name, Address)   Product (Name, Price)
  1000  Joe Jones, 1 Main St.      Hammer, 2.99
  1001  Samir Nadoy, 123 Elm Pl.   Wrench, 4.25

  Graph
  Nodes Sue, Ben, and Hardware, linked by "works in" and "reports to" relationships
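As a rough illustration of the two text-based file formats above, the following Python sketch parses the delimited text and the JSON document from the slide using the standard `csv` and `json` modules. The variable names are illustrative and not part of the course material.

```python
import csv
import io
import json

# Delimited text: the first line names the fields, each later line is one record.
delimited_text = (
    "FirstName,LastName,Email\n"
    "Joe,Jones,joe@litware.com\n"
    "Samir,Nadoy,samir@northwind.com\n"
)
csv_rows = list(csv.DictReader(io.StringIO(delimited_text)))

# JSON: a hierarchical document; here a "customers" array of objects.
json_text = """
{
  "customers":
  [
    { "firstName": "Joe", "lastName": "Jones"},
    { "firstName": "Samir", "lastName": "Nadoy"}
  ]
}
"""
document = json.loads(json_text)

# Both formats describe the same two customer entities.
names_from_csv = [(row["FirstName"], row["LastName"]) for row in csv_rows]
names_from_json = [(c["firstName"], c["lastName"]) for c in document["customers"]]
print(names_from_csv == names_from_json)
```

The delimited file fixes one set of fields for every record, while each JSON object carries its own field names, which is what lets semi-structured records differ from one another.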
Operational data workloads
Data is stored in a database that is optimized for online transactional processing
(OLTP) operations that support applications
A mix of read and write activity
For example:
• Read the Product table to display a catalog
• Write to the Order table to record a purchase

Data is stored using transactions

Transactions are "ACID" based:


Atomicity – each transaction is treated as a single unit of work, which succeeds completely or fails completely
Consistency – transactions can only take the data in the database from one valid state to another
Isolation – concurrent transactions cannot interfere with one another
Durability – when a transaction has succeeded, the data changes are persisted in the database
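Atomicity can be sketched with Python's built-in `sqlite3` module, used here only as a convenient stand-in for an OLTP database. The `Orders` and `LineItem` tables, the `CHECK` constraint, and the `record_purchase` helper are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderNo INTEGER PRIMARY KEY, Customer INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE LineItem (OrderNo INTEGER, ItemNo INTEGER, ProductID INTEGER, "
    "Quantity INTEGER NOT NULL CHECK (Quantity > 0))"  # hypothetical business rule
)

def record_purchase(connection, order_no, customer_id, product_id, quantity):
    """Insert an order and its line item as a single unit of work."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "INSERT INTO Orders (OrderNo, Customer) VALUES (?, ?)",
                (order_no, customer_id),
            )
            connection.execute(
                "INSERT INTO LineItem (OrderNo, ItemNo, ProductID, Quantity) "
                "VALUES (?, 1, ?, ?)",
                (order_no, product_id, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False

succeeded = record_purchase(conn, 1000, 1, 123, 1)
# Quantity 0 violates the CHECK constraint, so the Orders insert is rolled back too.
failed = record_purchase(conn, 1001, 2, 123, 0)
order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print(succeeded, failed, order_count)
```

The second purchase leaves no partial order behind: either both rows are written or neither is.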
Analytical data workloads



Operational data is extracted, transformed, and loaded (ETL) into a data lake for analysis
Data is loaded into a schema of tables - typically in a Spark-based data lakehouse with tabular
abstractions over files in the data lake, or a data warehouse with a fully relational SQL engine
Data in tables may be aggregated and loaded into an online analytical processing (OLAP)
model, or cube
The files in the data lake, relational tables, and analytical model can be queried to produce
reports and dashboards
Learning Objective 2: Data roles and services
Data professional roles

Database Administrator
• Database provisioning, configuration and management
• Database security and user access
• Database backups and resiliency
• Database performance monitoring and optimization

Data Engineer
• Data integration pipelines and ETL processes
• Data cleansing and transformation
• Analytical data store schemas and data loads

Data Analyst
• Analytical modeling
• Data reporting and summarization
• Data visualization
Microsoft cloud services for data
Operational Data Workloads

Azure SQL
• Family of SQL Server based relational database services

Azure Database for open-source
• MariaDB, MySQL, PostgreSQL

Azure Cosmos DB
• Highly scalable database system for relational and non-relational data

Azure Storage
• File, blob, and table storage
• Hierarchical namespace for data lake storage

Analytical Data Workloads

Software-as-a-Service (SaaS)

Microsoft Fabric
Unified, SaaS based analytics platform based on open and governed lakehouse:
• Data ingestion and ETL
• Data Lakehouse
• Data Warehouse
• Data Science and ML
• Realtime Analytics
• Data visualization
• Data governance and management

Platform-as-a-Service (PaaS)

Azure Synapse Analytics
• Integrated solution for data analytics in Azure: Pipelines, Apache Spark, SQL, Data Explorer

Azure Databricks
• Lakehouse analytics platform

Azure HDInsight
• Apache open-source platform

others…
Explore fundamentals of relational data in Azure
Learning Objectives
• Explore relational data concepts
• Explore Azure services for relational data
Learning Objective 1: Explore relational data concepts
Relational tables
• Data is stored in tables
• Tables consist of rows and columns
• All rows have the same columns
• Each column is assigned a datatype

Customer
ID  FirstName  Middle  LastName  Email                Address      City
1   Joe        David   Jones     joe@litware.com      1 Main St.   Seattle
2   Samir              Nadoy     samir@northwind.com  123 Elm Pl.  New York

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25

Order
OrderNo  OrderDate  Customer
1000     1/1/2022   1
1001     1/1/2022   2

LineItem
OrderNo  ItemNo  ProductID  Quantity
1000     1       123        1
1000     2       201        2
1001     1       123        2


Normalization
• Separate each entity into its own table
• Separate each discrete attribute into its own column
• Uniquely identify each entity instance (row) using a primary key
• Use foreign key columns to link related entities

Sales Data
OrderNo  OrderDate  Customer                            Product              Quantity
1000     1/1/2022   Joe Jones, 1 Main St, Seattle       Hammer ($2.99)       1
1000     1/1/2022   Joe Jones, 1 Main St, Seattle       Screwdriver ($3.49)  2
1001     1/1/2022   Samir Nadoy, 123 Elm Pl, New York   Hammer ($2.99)       2
…        …          …                                   …                    …

Customer
ID  FirstName  LastName  Address      City
1   Joe        Jones     1 Main St.   Seattle
2   Samir      Nadoy     123 Elm Pl.  New York

Order
OrderNo  OrderDate  Customer
1000     1/1/2022   1
1001     1/1/2022   2

LineItem
OrderNo  ItemNo  ProductID  Quantity
1000     1       123        1
1000     2       201        2
1001     1       123        2

Product
ID   Name         Price
123  Hammer       2.99
162  Screwdriver  3.49
201  Wrench       4.25
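The normalized tables above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not the course's own lab: SQLite stands in for a relational database, the column types are assumptions, and `"Order"` is quoted because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Customer (
    ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Address TEXT, City TEXT);
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE "Order" (
    OrderNo INTEGER PRIMARY KEY, OrderDate TEXT,
    Customer INTEGER REFERENCES Customer(ID));
CREATE TABLE LineItem (
    OrderNo INTEGER REFERENCES "Order"(OrderNo), ItemNo INTEGER,
    ProductID INTEGER REFERENCES Product(ID), Quantity INTEGER,
    PRIMARY KEY (OrderNo, ItemNo));
INSERT INTO Customer VALUES (1, 'Joe', 'Jones', '1 Main St.', 'Seattle'),
                            (2, 'Samir', 'Nadoy', '123 Elm Pl.', 'New York');
INSERT INTO Product VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49),
                           (201, 'Wrench', 4.25);
INSERT INTO "Order" VALUES (1000, '1/1/2022', 1), (1001, '1/1/2022', 2);
INSERT INTO LineItem VALUES (1000, 1, 123, 1), (1000, 2, 201, 2), (1001, 1, 123, 2);
""")

# Foreign keys let a join reassemble the flat "sales data" view on demand.
rows = conn.execute("""
    SELECT o.OrderNo, c.FirstName || ' ' || c.LastName, p.Name, li.Quantity
    FROM LineItem AS li
    JOIN "Order" AS o ON li.OrderNo = o.OrderNo
    JOIN Customer AS c ON o.Customer = c.ID
    JOIN Product AS p ON li.ProductID = p.ID
    ORDER BY li.OrderNo, li.ItemNo
""").fetchall()

# A line item that points at a non-existent product is rejected.
try:
    conn.execute("INSERT INTO LineItem VALUES (1001, 2, 999, 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(rows, fk_rejected)
```

Each customer and product is stored once, so a change to an address or a price touches a single row rather than every sales record that mentions it.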
Structured Query Language (SQL)
• SQL is a standard language for use with relational databases
• Standards are maintained by ANSI and ISO
• Most RDBMS systems support proprietary extensions of standard SQL

Data Definition Language (DDL)
CREATE, ALTER, DROP, RENAME

  CREATE TABLE Product
  (
    ProductID INT PRIMARY KEY,
    Name VARCHAR(20) NOT NULL,
    Price DECIMAL NULL
  );

  Product
  ID  Name  Price

Data Control Language (DCL)
GRANT, DENY, REVOKE

  GRANT SELECT, INSERT, UPDATE
  ON Product
  TO user1;

Data Manipulation Language (DML)
INSERT, UPDATE, DELETE, SELECT

  SELECT Name, Price
  FROM Product
  WHERE Price > 2.50
  ORDER BY Price;

  Product
  ID   Name         Price
  123  Hammer       2.99
  162  Screwdriver  3.49
  201  Wrench       4.25

  Results
  Name         Price
  Hammer       2.99
  Screwdriver  3.49
  Wrench       4.25
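The DDL and DML statements from the slide can be run as-is against SQLite through Python's `sqlite3` module, as a hedged sketch of how the pieces fit together. SQLite has no user accounts, so the DCL `GRANT` statement is not reproduced here; the second, parameterized query is an added illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table (statement taken from the slide).
conn.execute("""
    CREATE TABLE Product
    (
        ProductID INT PRIMARY KEY,
        Name VARCHAR(20) NOT NULL,
        Price DECIMAL NULL
    )
""")

# DML: INSERT the sample rows using parameter placeholders.
conn.executemany(
    "INSERT INTO Product (ProductID, Name, Price) VALUES (?, ?, ?)",
    [(123, "Hammer", 2.99), (162, "Screwdriver", 3.49), (201, "Wrench", 4.25)],
)
conn.commit()

# DML: the SELECT query from the slide.
results = conn.execute(
    "SELECT Name, Price FROM Product WHERE Price > 2.50 ORDER BY Price"
).fetchall()

# Same query shape with a different threshold, passed as a parameter.
over_three = conn.execute(
    "SELECT Name FROM Product WHERE Price > ? ORDER BY Price", (3.00,)
).fetchall()

print(results)
print(over_three)
```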
Other common database objects

Views
Pre-defined SQL queries that behave as virtual tables

  CREATE VIEW Deliveries
  AS
  SELECT o.OrderNo, o.OrderDate,
         c.Address, c.City
  FROM [Order] AS o JOIN Customer AS c
  ON o.Customer = c.ID;

  Deliveries
  OrderNo  OrderDate  Address      City
  1000     1/1/2022   1 Main St.   Seattle
  1001     1/1/2022   123 Elm Pl.  New York

Stored Procedures
Pre-defined SQL statements that can include parameters

  CREATE PROCEDURE RenameProduct
    @ProductID INT,
    @NewName VARCHAR(20)
  AS
  UPDATE Product
  SET Name = @NewName
  WHERE ID = @ProductID;
  ...
  EXEC RenameProduct 201, 'Spanner';

  Product
  ID   Name                  Price
  201  Spanner (was Wrench)  4.25

Indexes
Tree-based structures that improve query performance

  CREATE INDEX idx_ProductName
  ON Product(Name);

  Index on Name with branches A-L and M-Z, pointing to rows in the Product table:

  Product
  ID   Name         Price
  123  Hammer       2.99
  162  Screwdriver  3.49
  201  Wrench       4.25
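A view and an index can be sketched in SQLite via Python's `sqlite3` module. SQLite does not support stored procedures, so the hypothetical `rename_product` function below is only a stand-in for `RenameProduct`: a named, parameterized statement kept in application code rather than in the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Address TEXT, City TEXT);
CREATE TABLE "Order" (OrderNo INTEGER PRIMARY KEY, OrderDate TEXT, Customer INTEGER);
CREATE TABLE Product (ID INTEGER PRIMARY KEY, Name TEXT, Price REAL);
INSERT INTO Customer VALUES (1, '1 Main St.', 'Seattle'), (2, '123 Elm Pl.', 'New York');
INSERT INTO "Order" VALUES (1000, '1/1/2022', 1), (1001, '1/1/2022', 2);
INSERT INTO Product VALUES (123, 'Hammer', 2.99), (162, 'Screwdriver', 3.49),
                           (201, 'Wrench', 4.25);

-- View: a stored query that can be selected from like a table.
CREATE VIEW Deliveries AS
    SELECT o.OrderNo, o.OrderDate, c.Address, c.City
    FROM "Order" AS o JOIN Customer AS c ON o.Customer = c.ID;

-- Index: speeds up lookups and sorts on Product.Name.
CREATE INDEX idx_ProductName ON Product(Name);
""")

def rename_product(connection, product_id, new_name):
    """Stand-in for the RenameProduct stored procedure (assumption, not T-SQL)."""
    with connection:
        connection.execute(
            "UPDATE Product SET Name = ? WHERE ID = ?", (new_name, product_id)
        )

deliveries = conn.execute("SELECT * FROM Deliveries ORDER BY OrderNo").fetchall()
rename_product(conn, 201, "Spanner")
new_name = conn.execute("SELECT Name FROM Product WHERE ID = 201").fetchone()[0]
index_names = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
print(deliveries, new_name, index_names)
```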
Learning Objective 2: Explore Azure services for relational data
Azure SQL
Family of SQL Server based cloud database services

SQL Server on Azure VMs (IaaS)
• Guaranteed compatibility to SQL Server on premises
• Customer manages everything – OS upgrades, software upgrades, backups, replication
• Pay for the server VM running costs and software licensing, not per database
• Great for hybrid cloud or migrating complex on-premises database configurations

Azure SQL Managed Instance (PaaS)
• Near 100% compatibility with SQL Server on-premises
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Use a single instance with multiple databases, or multiple instances in a pool with shared resources
• Great for migrating most on-premises databases to the cloud

Azure SQL Database (PaaS)
• Core database functionality compatibility with SQL Server
• Automatic backups, software patching, database monitoring, and other maintenance tasks
• Single database or elastic pool to dynamically share resources across multiple databases
• Great for new, cloud-based applications
Azure Database services for open-source
Azure managed solutions for common open-source RDBMSs (PaaS)

Azure Database for MySQL
• PaaS implementation of MySQL in the Azure cloud, based on the MySQL Community Edition
• Commonly used in Linux, Apache, MySQL, PHP (LAMP) application architectures

Azure Database for MariaDB
• An implementation of the MariaDB Community Edition database management system adapted to run in Azure
• Compatibility with Oracle Database

Azure Database for PostgreSQL
• Database service in the Microsoft cloud based on the PostgreSQL Community Edition database engine
• Hybrid relational and object storage
Demo • Provision Azure relational database services
Explore fundamentals of non-relational data in Azure
Learning Objectives
• Fundamentals of Azure Storage
• Fundamentals of Azure Cosmos DB
Learning Objective 1: Fundamentals of Azure Storage
Azure Blob Storage
Storage for data as binary large objects (BLOBs)
• Block blobs
  o Large, discrete, binary objects that change infrequently
  o Blobs can be up to 4.7 TB, composed of blocks of up to 100 MB
    ➢ A blob can contain up to 50,000 blocks
• Page blobs
  o Used as virtual disk storage for VMs
  o Blobs can be up to 8 TB, composed of fixed-size 512-byte pages
• Append blobs
  o Block blobs that are used to optimize append operations
  o Maximum size just over 195 GB - each block can be up to 4 MB

Per-blob storage tiers
• Hot – Highest cost, lowest latency
• Cool – Lower cost, higher latency
• Archive – Lowest cost, highest latency

Azure Storage Account > Blob Container > blob1, folder1/blob2
Blobs can be organized in virtual directories, but each path is considered a single blob in a flat namespace – folder level operations are not supported
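The size limits quoted above follow from the block and page counts. The short calculation below is a sketch under two assumptions: sizes are binary units (MiB, GiB, TiB), and append blobs share the 50,000-block limit stated for block blobs.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_BLOCKS = 50_000  # stated above for block blobs; assumed here for append blobs too

# Block blob: 50,000 blocks of up to 100 MiB each.
block_blob_tib = MAX_BLOCKS * 100 * MIB / TIB

# Append blob: 50,000 blocks of up to 4 MiB each.
append_blob_gib = MAX_BLOCKS * 4 * MIB / GIB

# Page blob: 8 TiB divided into fixed-size 512-byte pages.
page_count = 8 * TIB // 512

print(f"block blob max  ~ {block_blob_tib:.2f} TiB")
print(f"append blob max ~ {append_blob_gib:.2f} GiB")
print(f"page blob pages = {page_count:,}")
```

Under those assumptions the results line up with the slide's figures of roughly 4.7 TB for block blobs and just over 195 GB for append blobs.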
Azure Data Lake Store Gen 2
Distributed file system built on Blob Storage
• Combines Azure Data Lake Store Gen 1 with Azure Blob Storage for large-scale file storage and analytics
• Enables file and directory level access control and management
• Compatible with common large scale analytical systems

Enabled in an Azure Storage account through the Hierarchical Namespace option
• Set during account creation
• Upgrade existing storage account
  o One-way upgrade process

Azure Storage Account > Blob Container (Hierarchical Namespace) > Directory > File1, File2
File system includes directories and files, and is compatible with large scale data analytics systems like Hadoop, Databricks, and Azure Synapse Analytics
Azure Files
File shares in the cloud that can be accessed from anywhere with an internet connection
• Support for common file sharing protocols:
  o Server Message Block (SMB)
  o Network File System (NFS) – requires premium tier
• Data is replicated for redundancy and encrypted at rest

Azure Storage Account > Azure Files share
Azure Table Storage
Key-Value storage for application data
• Tables consist of key and value columns
  o Partition and row keys
  o Custom property columns for data values
    ➢ A Timestamp column is added automatically to log data changes
• Rows are grouped into partitions to improve performance
• Property columns are assigned a data type, and can contain any value of that type
• Rows do not need to include the same property columns

Azure Storage Account > Tables

PartitionKey RowKey Timestamp Property1 Property2
1 123 2022-01-01 A value Another value
1 124 2022-01-01 This value
2 125 2022-01-01 That value
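The key-value table model described above can be imitated in a few lines of plain Python. This is an in-memory sketch only: `TableSketch` and its methods are hypothetical names, not the Azure Table Storage API, and the choice of which property each sample row carries is an assumption.

```python
from datetime import datetime, timezone

class TableSketch:
    """In-memory imitation of a partitioned key-value table (illustrative only)."""

    def __init__(self):
        self._partitions = {}  # PartitionKey -> {RowKey -> entity dict}

    def upsert(self, partition_key, row_key, **properties):
        entity = {
            "PartitionKey": partition_key,
            "RowKey": row_key,
            "Timestamp": datetime.now(timezone.utc),  # added automatically on write
            **properties,
        }
        self._partitions.setdefault(partition_key, {})[row_key] = entity
        return entity

    def get(self, partition_key, row_key):
        # A point lookup needs both keys.
        return self._partitions[partition_key][row_key]

    def query_partition(self, partition_key):
        # Rows in the same partition are stored together, so this touches one bucket.
        return list(self._partitions.get(partition_key, {}).values())

table = TableSketch()
table.upsert("1", "123", Property1="A value", Property2="Another value")
table.upsert("1", "124", Property1="This value")  # no Property2 on this row
table.upsert("2", "125", Property2="That value")  # no Property1 on this row

partition_one = table.query_partition("1")
row_124 = table.get("1", "124")
print(len(partition_one), sorted(row_124))
```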
Demo • Explore Azure Storage
Learning Objective 2: Fundamentals of Azure Cosmos DB
What is Azure Cosmos DB?
A multi-model, global-scale NoSQL database management system
• Support for multiple storage APIs
• Real time access with fast read and write performance
• Enable multi-region writes to replicate data globally; enabling users in specified regions to work with a local replica

Supported data models: Documents, Graphs, Key-Value Tables, Column Family Stores
Azure Cosmos DB APIs

Azure Cosmos DB for NoSQL
Native API for Cosmos DB

  SELECT *
  FROM customers c
  WHERE c.id = "joe@litware.com"

  {
    "id": "joe@litware.com",
    "name": "Joe Jones",
    "address": {
      "street": "1 Main St.",
      "city": "Seattle"
    }
  }

Azure Cosmos DB for MongoDB
Compatibility with MongoDB

  db.products.find({ id: 123})

  {
    "id": 123,
    "name": "Hammer",
    "price": 2.99}

Azure Cosmos DB for PostgreSQL
Compatibility with PostgreSQL

  id  name       dept      manager
  1   Sue Smith  Hardware  Joe Jones
  2   Ben Chan   Hardware  Sue Smith

Azure Cosmos DB for Table
Key-value storage API
Compatible with Azure Table Storage

  PartitionKey  RowKey  Name
  1             123     Joe Jones
  1             124     Samir Nadoy

Azure Cosmos DB for Apache Cassandra
Compatibility with Apache Cassandra

  id  name       dept      manager
  1   Sue Smith  Hardware
  2   Ben Chan   Hardware  Sue Smith

Azure Cosmos DB for Apache Gremlin
Used to work with graph data: vertices are connected via relationships (edges)

  Vertices: (1) Sue, (2) Ben, (h) Hardware
  Edges: "works in", "reports to"
Demo Explore Azure Cosmos DB
Explore fundamentals of data analytics
Learning Objectives
• Large-scale data analytics
Learning Objective 1: Large-scale data analytics
Elements of a large-scale data analytics solution

Data ingestion and processing
• Extract, Transform, and Load (ETL) or Extract, Load, and Transform (ELT) orchestration
• Distributed processing to cleanse and restructure data at scale
• Batch and real-time data processing

Analytical data store
• Flexible, scalable file storage in a data lake
• Relational tables in a data lakehouse or data warehouse

Analytical data model
• Semantic models for analytical entities
• Often in the form of aggregated cubes that summarize numeric values across one or more dimensions

Data visualization
• Reports
• Charts
• Dashboards
Data processing in large-scale analytics

Relational Database
• Well established model for relational data storage and processing
• Comprehensive SQL language support for querying and data manipulation

Apache Spark
• Open-Source platform for scalable, distributed data processing
• Multi-language data processing code (Python, Scala, Java, SQL, …)
Analytical data store architectures

Data Warehouse
• Data is stored in a relational database and queried using a SQL query engine
• Tables are denormalized for query optimization
• Typically as a star or snowflake schema of numeric facts that can be aggregated by dimensions

Data Lakehouse
• Data files are stored in a distributed file system (a data lake) and typically processed using Apache Spark
• Metadata is used to define tables that provide a relational SQL interface to the file data
• Commonly, a delta lake format is used to provide transactional database functionality
PaaS data analytics services

Azure Synapse Analytics
• Unified solution for relational data warehouse and data lake analytics
• Scalable processing and querying through multiple analytics runtimes
  o Synapse SQL
  o Apache Spark
  o Synapse Data Explorer
• Interactive experience in Azure Synapse Studio
• Built-in pipeline integration for data ingestion and processing
Use for a single, unified large-scale analytical solution on Azure

Azure Databricks
• Azure-based implementation of Databricks cloud analytics platform
• Scalable Spark and SQL querying for data lake analytics
• Interactive experience in Azure Databricks workspace
• Use Azure Data Factory to implement data ingestion and processing pipelines
Use to leverage Databricks skills and for cloud portability

Azure HDInsight
• Azure-based implementation of common Apache "big data" frameworks built on a data lake
  o Hadoop – Query data lake files using Hive tables
  o Spark – Use Spark APIs to query data, and abstract underlying file storage as tables
  o Kafka – Real-time event processing
  o Storm – Stream processing
  o HBase – NoSQL data store
Use when you need to support multiple open-source platforms
SaaS data analytics with Microsoft Fabric

Microsoft Fabric workloads
• Data Integration (Data Factory)
• Data Engineering (Synapse)
• Data Warehouse (Synapse)
• Data Science (Synapse)
• Real-Time Analytics (Synapse)
• Business Intelligence (Power BI)
• Applied Observability (Data Activator)

Unified data foundation: OneLake

Unified
• SaaS product experience
• Security and governance
• Compute
• Storage
• Business model
Demo • Explore Microsoft Fabric
Explore fundamentals of real-time analytics
Learning Objectives
• Streaming and real-time analytics
Learning Objective: Streaming and real-time analytics
Batch vs stream processing

Batch processing
Data is collected and processed at regular intervals

Stream processing
Data is processed in (near) real-time as it arrives
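The difference between the two models can be sketched in Python. The function names and the sample readings are illustrative assumptions: the batch function sees a whole collected interval at once, while the stream function updates its result as each value arrives.

```python
from typing import Iterable, Iterator, Sequence

def batch_total(readings: Sequence[float]) -> float:
    """Batch: wait until the interval's data is collected, then process it in one go."""
    return sum(readings)

def stream_totals(readings: Iterable[float]) -> Iterator[float]:
    """Stream: emit an updated running total as each reading arrives."""
    total = 0.0
    for value in readings:
        total += value
        yield total

events = [2.0, 3.5, 1.5]  # hypothetical readings

batch_result = batch_total(events)
stream_results = list(stream_totals(iter(events)))

print(batch_result)    # one answer, available after the interval ends
print(stream_results)  # an answer after every event
```

Both approaches arrive at the same final total; the stream version simply makes intermediate results available without waiting for the interval to close.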
Real-time data processing with Azure Stream Analytics
Create an individual Azure Stream Analytics job or an Azure Stream Analytics cluster
• Ingest data from an input, such as:
  o Azure Event Hubs
  o Azure IoT Hub
  o Azure Blob Storage
  o …
• Process data with a perpetual query
• Send results to an output, such as:
  o Azure Blob Storage
  o Azure SQL Database
  o Azure Synapse Analytics
  o Azure Function
  o Azure Event Hubs
  o Power BI
  o …

Azure Stream Analytics Job: Input → Query (SELECT …) → Output
Real-time log and telemetry analysis with Azure Data Explorer

High throughput, scalable service for batch and streaming data
• Azure Data Explorer dedicated service
• Azure Synapse Data Explorer runtime in Azure Synapse Analytics

• Data is ingested from streaming and batch sources into tables in a database
• Tables can be queried using Kusto Query Language (KQL):
  o Intuitive syntax for read-only queries
  o Optimized for raw telemetry and time-series data

  LogEvents
  | where StartTime > datetime(2021-12-31)
  | where EventType == 'Error'
  | project StartTime, EventType, Message

  StartTime   EventType  Message
  2022-01-01  Error      Invalid key
  2022-01-01  Error      Device failure
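For readers new to KQL, the pipeline above reads as filter, filter, then select columns. The Python sketch below mirrors those three steps over an in-memory list; the non-matching rows and the `Source` field are invented for illustration, and this is not how Azure Data Explorer executes queries.

```python
from datetime import datetime

# Hypothetical LogEvents rows; only the two matching rows come from the slide's results.
log_events = [
    {"StartTime": datetime(2021, 12, 30), "EventType": "Error",
     "Message": "Old failure", "Source": "dev1"},
    {"StartTime": datetime(2022, 1, 1), "EventType": "Info",
     "Message": "Started", "Source": "dev1"},
    {"StartTime": datetime(2022, 1, 1), "EventType": "Error",
     "Message": "Invalid key", "Source": "dev2"},
    {"StartTime": datetime(2022, 1, 1), "EventType": "Error",
     "Message": "Device failure", "Source": "dev3"},
]

cutoff = datetime(2021, 12, 31)
projected_columns = ("StartTime", "EventType", "Message")

results = [
    {column: event[column] for column in projected_columns}  # | project ...
    for event in log_events
    if event["StartTime"] > cutoff                            # | where StartTime > ...
    and event["EventType"] == "Error"                         # | where EventType == 'Error'
]

for row in results:
    print(row["StartTime"].date(), row["EventType"], row["Message"])
```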
Real-time analytics in Microsoft Fabric
• Support for continuous data ingestion from multiple sources
• Capture streaming data in an eventstream
• Write real-time data to a table in a Lakehouse or a KQL database
• Query real-time data using SQL or KQL
• Build real-time visualizations

Eventstream → Lakehouse table, KQL Database table
Demo • Explore real-time analytics in Microsoft Fabric
Explore fundamentals of data visualization
Learning Objectives
• Data visualization
Learning Objective: Data visualization
Introduction to data visualization with Power BI

Start with Power BI Desktop
• Import data from one or more sources
• Define a data model
• Create visualizations in a report

Publish to Power BI Service
• Schedule data refresh
• Create dashboards and apps
• Share with other users

Interact with published reports
• Web browser
• Power BI phone app

Power BI Desktop → Power BI Service → Web Browser, Power BI Phone App
Analytical data modeling

Customer (dimension)
Key  Name   Address      City
1    Joe    1 Main St.   Seattle
2    Samir  123 Elm Pl.  New York
3    Alice  2 High St.   Seattle

Product (dimension)
Key  Name         Category
1    Hammer       Tools
2    Screwdriver  Tools
3    Wrench       Tools
4    Bolts        Hardware

Time (dimension)
Key       Year  Month  Day  WeekDay
01012022  2022  Jan    1    Sat
02012022  2022  Jan    2    Sun

Sales (fact)
Key  TimeKey   ProductKey  CustomerKey  Quantity  Revenue
1    01012022  1           1            1         2.99
2    01012022  2           1            2         6.98
3    02012022  1           2            2         5.98

Cube
Dimensions: Product (Hammer, Wrench, Screwdriver), Customer (Joe, Samir, Alice), Time (Jan, Feb, Mar, Apr, May, …)
Example cell: Total revenue for wrenches sold to Samir in January

Measures
Model aggregates measures at each hierarchy level

Hierarchy
Year  Month  Day  Revenue
2022              8221.48
      Jan         574.86
             1    9.97
             2    5.98
…                 …
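Aggregating a measure at each level of the Time hierarchy can be sketched with plain Python dictionaries. Only the three Sales rows shown on the slide are used, so the month and year totals here are smaller than the slide's figures, which include rows elided by "…". The variable names are illustrative, and `Decimal` is used so currency sums stay exact.

```python
from collections import defaultdict
from decimal import Decimal

# Time dimension: key -> (Year, Month, Day)
time_dim = {
    "01012022": ("2022", "Jan", 1),
    "02012022": ("2022", "Jan", 2),
}

# Sales fact rows: (TimeKey, ProductKey, CustomerKey, Quantity, Revenue)
sales = [
    ("01012022", 1, 1, 1, Decimal("2.99")),
    ("01012022", 2, 1, 2, Decimal("6.98")),
    ("02012022", 1, 2, 2, Decimal("5.98")),
]

by_year = defaultdict(Decimal)
by_month = defaultdict(Decimal)
by_day = defaultdict(Decimal)

for time_key, _product, _customer, _quantity, revenue in sales:
    year, month, day = time_dim[time_key]  # look up the dimension attributes
    by_year[year] += revenue               # roll up to each hierarchy level
    by_month[(year, month)] += revenue
    by_day[(year, month, day)] += revenue

print(dict(by_day))
print(dict(by_month))
print(dict(by_year))
```

The day-level totals match the slide (9.97 and 5.98); a real analytical model would precompute or cache such roll-ups across every dimension rather than only Time.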
Common data visualizations in reports
Tables and text Bar or column chart Line chart

Pie chart Scatter plot Map


Demo • Visualize data with Power BI
Further learning
To review what you've learned and do additional labs:

Explore core data concepts https://aka.ms/ExploreDataConcepts


Explore relational data in Azure https://aka.ms/ExploreRelationalData
Explore non-relational data in Azure https://aka.ms/ExploreNonRelationalData
Explore data analytics in Azure https://aka.ms/ExploreDataAnalytics
