Slides from QSSUG Aug 2017 by David Alzamendi:
Now that on-premises data warehouses are no longer the only option, many questions arise surrounding Azure SQL Data Warehouse.
In this session, David will cover the fundamentals of using Azure SQL Data Warehouse from a beginner's perspective. He'll discuss the benefits, demystify the pricing measurements and explain the difference between Azure SQL Database and Big Data.
By the end of this session, you will know how to deploy this service in just a few minutes using some of the latest techniques like extracting data from Azure data lakes and accessing Azure blob storage through PolyBase.
Azure SQL Data Warehouse for beginners
1. Azure SQL Data Warehouse for Beginners
Overview of the Azure SQL Data Warehouse Service
Presented by: David Alzamendi
Wednesday, September 13, 2017
4. About Me
I have been working as a Business Intelligence consultant for the past few years. I have worked in all areas of Business Intelligence solutions, delivering to a high standard. I have also gained international experience by living and working in Argentina, Spain and Australia.
Hobbies:
• Travel
• Cooking
• Sports
Data Enthusiast
@david_alzamendi
www.linkedin.com/in/dalzamendi
david@davidalzamendi.com
6. Today’s Session
• Introduction to ADW
• Architecture (Change of rules)
• Demo 1 (Provisioning)
• Pros
• Cons
• Demo 2 (Polybase)
• Best Practices
• Security
7. Introduction to Azure Data Warehouse
Azure SQL Data Warehouse is a massively parallel
processing (MPP) cloud-based, scale-out, relational
database capable of processing large amounts of
data.
9. Timeline
On-Premises
• 2008: Microsoft acquired DATAllegro, a company that makes data warehouse appliances.
• 2010: Microsoft launched the Fast Track Data Warehouse reference architecture using SMP. DW best practices offered with leading hardware partners.
• 2011/2013: Parallel Data Warehouse v1 and v2. The DATAllegro product on Windows and SQL. First DW appliance by Microsoft, in partnership with Dell and HP.
• 2014: Analytics Platform System (APS). Introduction of a Hadoop region within the appliance and new naming to reflect broader Big Data capabilities.
Cloud
• 2015: Introduction of Azure SQL DW. Service based on APS’s MPP capabilities.
10. When to use Azure Data Warehouse
• You plan to move your DW to the cloud
• You have tight deadlines for delivering a DW solution
• You don’t want to invest millions of dollars
• You need a DW ASAP
• You lack the skills for infrastructure, security, networking, etc. It is PaaS!
11. When to use Azure Data Warehouse
• Store large amounts of data
• Integrate data from one or many sources in a single database
• Transform, aggregate and shape data
• Run complex queries or ad hoc reports against large amounts of data
12. When NOT to use Azure Data Warehouse
• OLTP systems
• Row-by-row processing
• High number of single transactions
• Frequent reads and writes
15. Architecture: Tables
• Data is stored in 60 distributions
• Round Robin: distributes rows evenly across the distributions, with no regard to their values
• Hash Distributed: distributes data based on the hash of the values in a single column
• Columnstore index by default (up to 10X compression and up to 100X query performance)
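The difference between the two distribution styles can be illustrated with a small Python sketch. It is only a model: the 60 distributions come from the slide, while the `md5` hash, the `customer_id` column and the helper names are assumptions made for illustration and are not what the service uses internally.

```python
import hashlib
from collections import Counter
from itertools import cycle

NUM_DISTRIBUTIONS = 60  # figure taken from the slide


def round_robin(rows):
    """Assign rows to distributions in turn, ignoring their values."""
    targets = cycle(range(NUM_DISTRIBUTIONS))
    return [(next(targets), row) for row in rows]


def hash_distributed(rows, key):
    """Assign each row by hashing one column, so equal keys land together."""
    placed = []
    for row in rows:
        # md5 is a stand-in; the service's real hash function is an unknown here.
        digest = hashlib.md5(str(row[key]).encode("utf-8")).digest()
        distribution = int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS
        placed.append((distribution, row))
    return placed


# Hypothetical sample: 600 rows, but only 10 distinct customer_id values.
rows = [{"customer_id": i % 10, "amount": i} for i in range(600)]
rr_counts = Counter(d for d, _ in round_robin(rows))
hash_counts = Counter(d for d, _ in hash_distributed(rows, "customer_id"))

print("round robin uses", len(rr_counts), "distributions")
print("hash on customer_id uses", len(hash_counts), "distributions")
```

In this model round robin fills all 60 distributions evenly, while hashing on a column with only ten distinct values can reach at most ten of them. That suggests why a hash column with many distinct values is usually the safer choice.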
16. Load Data
You can load data using:
• PolyBase
• bcp
• Azure Data Factory
• SSIS
17. But what is the difference?
• Blob Storage
• Table Storage
• File Storage
• Queue
• Cosmos DB
• Azure DB
• Azure Data Lakes
• Azure DW
18. Azure Data Warehouse vs Azure Data Lakes
| | Azure Data Warehouse | Azure Data Lakes |
| Data | Structured data | Structured, semi-structured and unstructured data |
| Processing | It has shape (schema-on-write) | No shape (schema-on-read) |
| Agility | Less agile, fixed configuration | Highly agile, configure and reconfigure as needed |
| Security | Mature | Maturing |
| Final Users | Business professionals | All users (data scientists) |
20. Azure Data Warehouse vs Azure SQL Database
| | Azure Data Warehouse | Azure SQL Database |
| Architecture | Relational DB using MPP | Relational DB using SMP |
| Concurrent Queries | Up to 32 (not counting DMV queries) | Up to 30 000 |
| Storage | Unlimited | Up to 4TB |
| PolyBase | Yes | No |
| System | Online Analytical Processing (OLAP) | Online Transaction Processing (OLTP) |
26. Decouples storage from compute
• Grow or shrink storage size independently of compute.
• Grow or shrink compute power without moving data.
• Pause compute capacity while leaving data intact, only paying for storage.
27. Scale Out
• Deploy it in a few minutes
• From DW100 to DW6000 (DW9000 and DW18000 in public preview)
DW100 = 1 compute node
DW6000 = 60 compute nodes
DW18000 = 180 compute nodes
• Automatically scale out using:
T-SQL
REST API
PowerShell
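As a rough illustration of what scaling changes, the Python sketch below relates a service level to compute nodes and to the 60 distributions mentioned earlier in the deck. The rule of one compute node per 100 DWU is an assumption inferred from the two data points on the slide (DW100 = 1 node, DW6000 = 60 nodes), the even split of distributions across nodes is likewise an assumption, and the function names are hypothetical. It covers DW100 to DW6000 only.

```python
NUM_DISTRIBUTIONS = 60  # figure from the architecture slide


def compute_nodes(dwu):
    """Assumed rule: one compute node per 100 DWU (DW100 = 1, DW6000 = 60)."""
    if dwu <= 0 or dwu % 100 != 0:
        raise ValueError("expected a positive multiple of 100 DWU")
    return dwu // 100


def distributions_per_node(dwu):
    """Assumed even split of the 60 distributions across the compute nodes."""
    nodes = compute_nodes(dwu)
    if NUM_DISTRIBUTIONS % nodes != 0:
        raise ValueError("this sketch only handles node counts that divide 60")
    return NUM_DISTRIBUTIONS // nodes


for dwu in (100, 1000, 6000):
    print(f"DW{dwu}: {compute_nodes(dwu)} node(s), "
          f"{distributions_per_node(dwu)} distribution(s) per node")
```

Under these assumptions, scaling from DW100 to DW6000 moves from one node handling all 60 distributions to 60 nodes handling one each, which fits the earlier point that compute can grow without moving data.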
28. Price
• DWU (Data Warehouse Unit): a measure of the underlying resources (CPU, memory and I/O bandwidth) allocated to your SQL Data Warehouse. Increasing the number of DWUs increases resources and performance.
• DWU Calculator to the rescue!
• Compute: 1 compute node (100 DWU) x $1.21/hour (USD)
• Storage: 1TB x $0.17/hour = $122/month (USD)
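The arithmetic behind these figures can be sketched in a few lines of Python. The two hourly rates are the 2017 prices quoted on the slide. The 720-hour month (30 days) is an assumption chosen because it reproduces the $122 figure, linear scaling of the compute price with DWU is also an assumption, and the function name is hypothetical.

```python
HOURS_PER_MONTH = 24 * 30  # assumed 30-day month; 0.17 * 720 = 122.4

COMPUTE_USD_PER_100_DWU_HOUR = 1.21  # from the slide (2017 price)
STORAGE_USD_PER_TB_HOUR = 0.17       # from the slide (2017 price)


def monthly_cost(dwu, storage_tb, compute_hours=HOURS_PER_MONTH):
    """Return (compute, storage) cost in USD for one month.

    compute_hours is the number of hours the warehouse is running;
    storage is billed for the whole month, running or paused.
    """
    compute = (dwu / 100) * COMPUTE_USD_PER_100_DWU_HOUR * compute_hours
    storage = storage_tb * STORAGE_USD_PER_TB_HOUR * HOURS_PER_MONTH
    return round(compute, 2), round(storage, 2)


always_on = monthly_cost(dwu=100, storage_tb=1)
office_hours = monthly_cost(dwu=100, storage_tb=1, compute_hours=8 * 22)

print("always on:   ", always_on)
print("office hours:", office_hours)
```

The second call models running the warehouse only 8 hours a day on 22 working days: the storage charge is unchanged, while the compute charge drops in proportion to the hours the service is running.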
29. Pause the Service
• Compute and memory resources are returned to the pool of available resources in the data center.
• DWU costs are zero for the duration of the pause.
• Data storage is not affected and your data stays intact.
• SQL Data Warehouse cancels all running or queued operations.
• Pause and resume using PowerShell, ARM and REST APIs.
30. Availability and Backups
• Available in 30 regions (SE AUS Melbourne, SE Asia Singapore)
• Locally Redundant Storage (LRS) for free
• Read-Access Geo-Redundant Storage (RA-GRS) available
• Snapshot every 4 to 8 hours
• Retention for 7 days
31. Azure Data Warehouse CONS
• Statistics have to be maintained
• Unsupported data types: geography, geometry, hierarchyid, image, text, ntext, sql_variant, timestamp, xml
• Unsupported table constraints: primary key, foreign key, unique and check
• Unsupported features: unique indexes, computed columns, user-defined types, global temporary tables, synonyms, indexed views, triggers, sequences
• In-Memory OLTP is not available
32. Demo 2
Load data using PolyBase:
• Create Credentials
• Create External Data Source
• Load Data using CTAS
PolyBase: 338 000 000 rows / 24 files / DW1000
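The demo steps can be sketched as an ordered list of T-SQL statements, built here in Python so the sequence is easy to inspect. This is a minimal sketch, not the script used in the demo: every object name (`BlobCredential`, `BlobSource`, `CsvFormat`, `dbo.SalesExternal`, `dbo.Sales`), the column list and the `/sales/` folder are hypothetical. The external file format and external table steps are not listed on the slide; they are added on the assumption that a typical PolyBase load from blob storage needs them before CTAS can read the files.

```python
def polybase_load_statements(storage_account, container, access_key):
    """Return illustrative T-SQL for a PolyBase load, in execution order."""
    location = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    return [
        # 1. Create credentials (a master key protects the stored secret).
        "CREATE MASTER KEY;",
        "CREATE DATABASE SCOPED CREDENTIAL BlobCredential "
        f"WITH IDENTITY = 'user', SECRET = '{access_key}';",
        # 2. Create the external data source pointing at blob storage.
        "CREATE EXTERNAL DATA SOURCE BlobSource "
        f"WITH (TYPE = HADOOP, LOCATION = '{location}', "
        "CREDENTIAL = BlobCredential);",
        # Assumed extra steps: describe the file layout and expose the files.
        "CREATE EXTERNAL FILE FORMAT CsvFormat "
        "WITH (FORMAT_TYPE = DELIMITEDTEXT, "
        "FORMAT_OPTIONS (FIELD_TERMINATOR = ','));",
        "CREATE EXTERNAL TABLE dbo.SalesExternal "
        "(SaleId INT, CustomerId INT, Amount DECIMAL(18, 2)) "
        "WITH (LOCATION = '/sales/', DATA_SOURCE = BlobSource, "
        "FILE_FORMAT = CsvFormat);",
        # 3. Load the data with CTAS into a hash-distributed table.
        "CREATE TABLE dbo.Sales "
        "WITH (DISTRIBUTION = HASH(CustomerId), CLUSTERED COLUMNSTORE INDEX) "
        "AS SELECT * FROM dbo.SalesExternal;",
    ]


statements = polybase_load_statements("mystorageaccount", "mycontainer", "<access-key>")
for statement in statements:
    print(statement)
```

The storage key is passed in as a placeholder; in practice it would come from a secure store and not from source code.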
33. Best Practices
• Pause the service
• Scale when necessary
• Use SSAS on top of the ADW to increase concurrency (better together)
• Maintain statistics
• Load the data using PolyBase
• Hash-distribute large tables
• Do not over-partition
• Use temporary heap tables to transform data
• Use a large resource class for memory-consuming transactions
• Use DMVs to monitor queries
34. Security
• TDE (Transparent Data Encryption) is supported
• Always Encrypted is not supported
• Grant permissions to different users through schemas and roles
35. Wrap Up
• Easy to use
• Go and create your first DW in a few minutes
• Don’t forget to pause the service
• Scale up and down as needed
• $200 free credit with your first Azure subscription
• $25/month for 12 months with Visual Studio Dev Essentials