Azure SQL Database & Azure SQL Data Warehouse

Microsoft Azure Data Services
Program
Azure SQL Database & Azure SQL Data
Warehouse
MohamedTawfik
Azure CoE EMEA

There’s big
opportunity
$15B+
Linux DB
market by 2019
Source: Cloud Market Intelligence, FY16 H1 LRF (Nov 2015)
Windows
Linux
Relational DB market
growth through 2019
New server shipments of Linux
expected to be 2.4xthat of
Windows by FY 2021
6.6%
per year
Microsoft is the only
Gartner RDBMS
Magic Quadrant
vendor without
support for Linux

Committed
to choice
Azure and Red Hat partnership
HDInsight for Linux
R Server on Linux
SQL Server on Linux
So for the first time
now, we have the
ability to go to an
enterprise and talk
about that entire data
estate across Windows
and Linux.

SQL Server
Everywhere Linux
1010
0101
0010
{ } T-SQL
Java
C/C++
C#/VB.NET
PHP
Node.js
Python
Ruby
Any data Any application
Any cloud Any platform
Windows Server

Windows Linux
Developer, Express, Web, Standard, Enterprise  
Database Engine, Integration Services  
R Services, Analysis Services, Reporting Services, MDS, DQS 
Maximum number of cores Unlimited Unlimited
Maximum memory utilized per instance 24 TB 12 TB
Maximum database size 524 PB 524 PB
Basic OLTP (Basic In-Memory OLTP, Basic operational analytics)  
Advanced OLTP (Advanced In-Memory OLTP, Advanced operational analytics)  
Basic high availability (2-node single database failover, non-readable secondary)  
Advanced HA (Always On - multi-node, multi-db failover, readable secondaries)  
Security
Basic security (Basic auditing, Row-level security, Data masking, Always Encrypted)  
Advanced security (Transparent Data Encryption)  
Data
warehousing
PolyBase2 
Basic data warehousing/data marts (Basic In-Memory ColumnStore, Partitioning, Compression)  
Advanced data warehousing (Advanced In-Memory ColumnStore)  
Advanced data integration (Fuzzy grouping and look ups) 
Tools
Windows ecosystem: Full-fidelity Management & Dev Tool (SSMS & SSDT), command line tools  
Linux/OSX/Windows ecosystem: Dev tools (VS Code), DB Admin GUI tool, command line tools  
Developer
Programmability (T-SQL, CLR, Data Types, JSON)  
Windows Filesystem Integration - FileTable 
Business
intelligence
Basic reporting, analytics & data integration 
Basic Corporate Business Intelligence (Multi-dimensional models, Basic tabular model) 
Advanced Corporate Business Intelligence (Advanced tabular model, DirectQuery, advanced data mining) 
Mobile BI (Datazen) 
Advanced analytics
Basic “R” integration (Connectivity to R Open, Limited parallelism for ScaleR) 
Advanced “R” integration (Full parallelism for ScaleR) 
Hybrid cloud Stretch Database 
What’s coming in
SQL Server on
Linux

12
Azure SQL Database (PaaS)
 Fully managed database-as-a-service that lets you focus on your business
 Database provisioning on-demand
 Scalable and elastic performance for all workloads
 99.99% availability, zero maintenance
 Intelligent: learns and adapts to optimize performance
 Secure and compliant to protect sensitive data
 Geo-replication and restore-from-backup for data protection
 Compatible with SQL Server 2014, 2016

Seamless and compatibleIntelligent DBaaS Competitive TCO
( 2 0 1 7 ) A Z U R E S Q L DATA B A S E
Privacy and Trust
OPERATIONAL ANALYTICS
Columnstore
Hekaton (in-memory
OLTP)
PREDICTABLE PERFORMANCE
Query Store
Index Optimization
AUTOMATIC TUNING
AUTO QUERY PLAN
CORRECTION
PERFORMANCE INSIGHT IN
OMS
ADAPTIVE QUERY
PROCESSING
SQL GRAPH
ADVANCED ANALYTICS
NATIVE PREDICT
R SERVICES
ACTIVITY MONITORING
Engine Audit
Threat Detection (NEW
SCENARIOS)
CENTRALIZED DASHBOARD
OMS INTEGRATION
ACCESS CONTROL
SQL Firewall
RLS, Dyn. Data Masking
AAD WITH MFA
DATA PROTECTION
Encrypt in motion (TLS)
TDE & BYK
Always Encrypted (S/W)
SERVICE ENDPOINT
ALWAYS ENCRYPTED (SECURE
H/W)
DISCOVERY & ASSESSMENT
VULNERABILITY ASSESSMENT
HA-DR BUILT-IN
99.99% SLA
Geo-restore
ACTIVE GEO REPLICAS (4)
MULTI-AZ
BACKUP AND RESTORE
Backup with health
check
35 days PITR
10 YEARS DATA RETENTION
DISTRIBUTED APPLICATION
Change Tracking
TRANSACTION REPLICATION
DATA SYNC
SSIS SERVICE
BIZ MODEL & SKUS
DTU/eDTU
<=1TB
BIGGER STD: S4-S12
SEPARATE COMPUTE AND
STORAGE
AZURE HYBRID BENEFIT
COST OPTIMIZATION
INTELLIGENT PAAS

16
Azure SQL Database (PaaS)
 You need to use a logical server prior to creating your first database.A logical server is the entry point
for the databases and controls logins, firewall rules, auditing rules, thread detection policies and
failover groups.You should not confuse an Azure SQL Database logical server with an on-premises SQL
Server.The logical server is a logical structure that doesn’t provide any way for connecting to instance
or feature level.
 Because of how Azure provides high availability to the databases, there is no need for the Logical server
to be on the same region as the databases it manages.Azure SQL Database does not guarantee that
the logical server and its related databases will be on the same region.
 This first account is a SQL login account.You can only use SQL login andAzure Active Directory login
accounts.Windows authentication is not supported with SQL logical server.

21
vCore-based model
 Each 100 DTU in Standard tier requires at least 1 vCore in General Purpose tier; each
125 DTU in Premium tier requires at least 1 vCore in Business Critical tier.
 In the vCore-based purchasing model, you can exchange your existing licenses for
discounted rates on SQL Database using the Azure Hybrid Use Benefit for SQL Server.
This Azure benefit allows you to use your on-premises SQL Server licenses to save
more than 40% on Azure SQL Database using your on-premises SQL Server licenses
with Software Assurance.
 If your database or elastic pool consumes more than 300 DTU conversion to vCore
may reduce your cost.

25
SQLQueryStress
https://github.com/ErikEJ/SqlQueryStress/wiki

26
DTU Calculator
https://dtucalculator.azurewebsites.net/

27
DTU Calculator
https://dtucalculator.azurewebsites.net/

28
Elastic pools
 You can configure resources for the pool based
either on the DTU-based purchasing model or the
vCore-based purchasing model.The resource
requirement for a pool is determined by the
aggregate utilization of its databases.The
amount of resources available to the pool is
controlled by the developer budget.
 The user adds databases to the pool, sets the
minimum and maximum eDTUS for each
database, and sets the eDTU limit of the pool
based on their budget.This means that within the
pool, each database is given the ability to auto-
scale in a set range.

Azure SQL Database Service Tiers

30
Managed Instance
• Are your customers
interested in moving to
cloud?
• Want to close your data center
• Current hosting solution is high
maintenance
• You’re asked to do more with less
• Want to expand your reach globally
Managed Instance brings
PaaS closer to you!
??
?
• Do your customer want to
avoid app rewrites but still
benefit from PaaS?

34
Backup
 Configuring and performing point in time recovery Azure SQL Database does a full backup every week, a differential
backup each day, and an incremental log backup every five minutes. If you want to extend the default retention period,
you need to configure long-term retention.This feature depends on Azure Recovery Services, and you can extend the
retention time up to 10 years.
 SQL Database automatically creates database backups and uses Azure read-access geo-redundant storage (RA-GRS) to
provide geo-redundancy.These backups are created automatically and at no additional charge.
 If you delete the Azure SQL server that hosts SQL databases, all elastic pools and databases that belong to the server
are also deleted and cannot be recovered.You cannot restore a deleted server. But if you configured long-term
retention, the backups for the databases with LTR will not be deleted and these databases can be restored.
 If your database is encrypted withTDE, the backups are automatically encrypted at rest, including LTR backups
 Backup storage up to 100% of the maximum database size is included, beyond which you will be billed in GB/month
consumed.

35
Backup
When you need to recover a database from an automatic backup you can
restore it to:
 A new database in the same logical server from a point-in-time within
the retention period.
 A database in the same logical server from a deleted database.
 A new database from the most recent daily backup to any logical server
in any region.

36
Backup
Recovery point objective (RPO)
RecoveryTime Objective (RTO)
Estimated recovery time (ERT)

37
Backup
*If you need faster recovery, use active geo-replication. If you need to be able to recover data
from a period older than 35 days, use Long-term retention.

43
Business Continuity
 Every Azure SQL Database subscription has built-in redundancy.Three copies of your
data are stored across fault domains in the datacenter to protect against server and
hardware failure.This is built in to the subscription price and is not configurable.
 Standard/general purpose model that provides 99.99% of availability but with some
potential performance degradation during maintenance activities.
 Premium/business critical model that provides also provides 99.99% availability with
minimal performance impact on your workload even during maintenance activities.
 Although high availability is a great feature, it does not protect against a catastrophic
failure of the entire Azure region. For those cases, you need to put in place a disaster
recovery plan. Azure SQL Database provides you with two features that makes it easier
to implement these type of plans: active geo-replication and auto-failover groups.

44
Failover groups and active geo-replication
Active geo-replication has the following benefits:
 Database-level disaster recovery goes quickly when you’ve replicated transactions to
databases on different SQL Database servers in the same or different regions.
 You can fail over to a different data center in the event of a natural disaster or other
intentionally malicious act.
 Online secondary databases are readable, and they can be used as load balancers for
read-only workloads such as reporting.
 With automatic asynchronous replication, after an online secondary database has
been seeded, updates to the primary database are automatically copied to the
secondary database.

45
 With active geo-replication you can configure up to four readable
secondary databases in the same or different regions. In case of a region
outage, your application needs to manually failover the database. If you
require that the failover happens automatically performance, then you
need to use auto-failover groups.
 Secondary active geo-replication databases are priced at 100 percent of
primary database prices.The cost of geo-replication traffic between the
primary and the online secondary is included in the cost of the online
secondary. Active geo-replication is available for all database tiers.

46
Before you create an online secondary, the following requirements must be
met:
 The secondary database must have the same name as the primary.
 They must be on separate servers.
 They both must be on the same subscription.
 The secondary server cannot be a lower performance tier than the
primary.

47

48

49

50

51
Elastic scalability
 If you reach 80% of your performance metrics, it’s time to consider
increasing your service tier or performance level. If you’re consistently
below 10 percent of the DTU, you might consider decreasing your service
tier or performance level.
 we can scale-up.This means that we will add CPU, memory, and better
disk i/o to handle the load. In Azure SQL Database, scaling up is very
simple: we just move the slider bar over to the right or choose a new
pricing tier.This will give us the ability to handle more DTUs.

52
Elastic scalability
In some cases, even the highest performance tiers and performance optimizations might
not handle your workload on successful and cost-effective way. we might even not be able
to scale-up much further. In that cases you have other options to scale your database:
 Read scale-out is a feature available in where you are getting one read-only replica of
your data where you can execute demanding read-only queries such as reports. Read-
only replica will handle your read-only workload without affecting resource usage on your
primary database.
 Database sharding is a set of techniques that enables you to split your data into several
databases and scale them independently.

53
Read scale-out
Each database in the Premium tier (DTU-based purchasing model)
or in the Business Critical tier (vCore-based purchasing model) is
automatically provisioned with severalAlwaysON replicas to
support the availability SLA.
These replicas are provisioned with the same performance level as
the read-write replica used by the regular database connections.
The Read Scale-Out feature allows you to load balance SQL
Database read-only workloads using the capacity of one of the
read-only replicas instead of sharing the read-write replica.

54
Sharding
 We may shard a database because:
It is too large to be stored in a single Azure SQL Database.
It is too much data to backup and restore in a reasonable amount of time.
Our customers require that their data is stored away from other customers
 Sharding involves rewriting a significant portion of our applications to
handle multiple databases.
 Sharding is easily implemented in AzureTable Storage and Azure Cosmos
DB, but is significantly more difficult in a relational database like Azure SQL
Database.The complexity comes from being transactionally consistent while
having data available and spread throughout several databases.

55
Sharding
 Microsoft has released a set of tools called Elastic DatabaseTools that
are compatible with Azure SQL Database.This client library can be used in
your application to create sharded databases.
 The main power of the Elastic DatabaseTools is the ability to fan-out
queries across multiple shards without a lot of code changes.

56
Sharding
When you use the Elastic client library, you deal with
shards, which is conceptually equivalent to a database.
This client library helps you with:
 Shard map management creates a shard map
database for storing metadata about the mapping of
each tenant with its database, allowing you to register
each database as a shard
 Data dependent routing allows you to select the
correct database based on the information that you
provide on the query for accessing the tenant’s data.
 Multi-shard queries (MSQ) executes the sameT-SQL
on all shards that participate with the query and returns
the resultant data as the result of a UNION ALL.

57
Azure SQL Data Sync
Synchronize data across multipleAzure SQL databases and
SQL Server instances, in uni-direction or bi-direction.
Keep data up-to-date across all SQL databases Distributed
Applications
Cloud
App
Cloud
App
Cloud
App
On-prem
App

58
Azure SQL Data Sync
 SQL Data Sync is a new service for Azure SQL Database. It allows you to bi-directionally
replicate data between two Azure SQL Databases or between an Azure SQL Database and
an on-premise SQL Server.
 A Sync Group is a group of databases that you want to synchronize using Azure SQL Data
Sync.
 A Sync Schema is the data you want to synchronize.
 Sync Direction allows you to synchronize data in either one direction or bi-directionally.
 Sync Interval controls how often synchronization occurs.
 Finally, a Conflict Resolution Policy determines who wins if data conflicts with one another.
 The hub database must always be an Azure SQL Database. A member database can either
be Azure SQL Database or an on-premise SQL Server.
 This can be used to populate a read-only version of the database for reporting, but only if
the schema will be 100% consistent.

59
Azure SQL Data Sync
• All SQL databases supported
(SQL Server, SQL IaaS & Azure SQL
Database)
• Zero code required to enable data
synchronization among SQL databases
• Hub-and-Spoke Synchronization
technology
• Both One-way or Bi-
directional synchronization
• Table-level synchronization with
Column Filter
• Minute-level latency

62
Azure SQL Data Sync
Data Sync Active Geo Replication
Pros • Active-active support
• Sync selected tables and
columns
• Sync between on-prem and
Azure SQL Database
• Seconds level latency
• Transactional consistency
• Auto failover with failover
group
• Designed for DR or read-only
scaling
Cons • 5 min or more latency
• No transactional consistency
• Higher performance impact
• Non-Writeable secondaries
• Replicates the entire database
• Secondary must use same
edition

63
Azure SQL Data Sync
Data Sync Transactional Replication
Pros • Active-active support
• Bi-directional between on-
prem and Azure SQL Database
• Lower latency
• Transactional consistency
• Designed for on-prem to
Azure DB replication or
migration
Cons • 5 min or more latency
• No transactional consistency
• Higher performance impact
• On-prem/Azure SQLVM to
Azure SQL Database only
• High maintenance cost

64
Azure SQL Data Sync
Data Sync SSIS
Pros • Easy configuration • Support transformation
• Support more types of
sources and destinations
• Designed for ETL
Cons • Transformation is not
supported
• Domain knowledge required
• Need extra hosted services
(VM or SSIS PaaS)
• Need additional change
tracking technologies

65
SQL Server Stretch Database
 SQL Server Stretch Database migrates your cool data securely and
transparently to Azure.
 The main advantage of this solution is that your data is always online, and
you not need to change any query or any configuration or code line in
your application to work with SQL Server Stretch Database.
 Since you are moving your cool data to the cloud, you reduce your need
for high performance storage for the on-premises database servers.
 You can migrate full tables or just parts of online tables by using a filtering
function.

66
Creates a secure connection between the
Source SQL Server andAzure
Provisions remote instance and begins
migration
Apps and Queries continue to run for both
the local database and remote endpoint
Security controls and maintenance remain
local
Available in all versions of SQL Server 2016
SQL
Stretch
Database
SQL
2016 Cold DataHot data
Cold data
On-premises network Azure PaaS

67
 Compute billed as DU, storage billed as Standard Disk rates.

68
Migration to Azure SQL Database

69
Link

71
Migration with downtime during the migration
*Rather than using DMA, you can also use a BACPAC file.
See Import a BACPAC file to a new Azure SQL Database.

72
UseTransactional Replication

73

74

75

76

77

S E A M L E S S C LO U D
I N T E G R AT I O N
Easy lift-and-shift, integrate and
distribute
Active Geo-replicas “data CDN” for your edge
deployments
SQL Azure Data Sync v2 synchronize data
across distributed and occasionally connected
applications
Azure SQL Database Managed Instance
facilitates lift and shift migration from on-
premises SQL Server to cloud
Azure Hybrid Benefit for SQL Server
maximizes current on-premises license
investments to facilitate migration
Database Migration Service (DMS)
provides seamless and reliable migration at scale
with minimal downtime
Most consistent data platform
Database Migration
Ser vice (DMS)
Azure SQL Database
Managed Instance
Azure Hybrid Benefit
(AHB) for SQL Ser ver
SQL Ser ver
Managed SSIS in Azure
Azure SQL Database

79
Graph Database
 SQL Server 2017 introduces a new graph database feature.
 Graph databases are yet another NoSQL solution.
 Graph database introduce two new vocabulary words: nodes and relationships.
 Nodes are entities in relational database terms. Each node is popularly a noun, like a person, an
event, an employee, a product, or a car. A relationship is similar to a relationship in SQL Server in
that it defines that a connection exists between nouns.
 A key difference between a relational storage engine and a graph database storage engine is
that as the number of nodes increase, the performance cost stays the same.
 Graph databases are popularly traversed through a domain specific language (DSL) called
Gremlin. In Azure SQL Database, graph-like capabilities are implemented throughT-SQL.
 DDL Extensions – create node/edge tables
 Query Language Extensions – New built-in: MATCH, to support pattern matching and
traversals

80
What is a Graph?
Attendee Session
attends
• A graph is collection of Nodes and Edges
– Nodes: Entities – for example
customer, supplier, product
– Edges: Relationships that various
entities share with each other
– Properties: Node or Edge attributes

81
Why Graph Databases?
Hierarchical or interconnected
data, entities with multiple
parents.
Analyze interconnected data,
materialize new information
from existing facts. Identify non-
obvious connections
Complex many-to-many
relationships. One relation
flexibly connecting multiple
entities.
A
John
Mary
Alice
Shaun
Jacob
Jerry
Natalie
Bob
leads
manages
leadsleads

82
Our approach – Embrace and Extend
Backed by Research
References
J. Fan, A. Gerald, S. Raj and J. M. Patel,
"The case against specialized graph
analytics engines," in CIDR, Asilomar,
CA, 2015.
A. Jindal, S. Madden, M. Castellanos
and M. Hsu, "Graph analytics using
vertica relational database," in IEEE
BigData, Santa Clara, CA, 2015
Matured Product
40+ years of academic and
industry research.
Highly evolved ecosystem,
including tooling and
community support
Build on-prem, cloud,
Hybrid Solutions
Best of both relational
and graph database on a
single platform
Trusted
Used and trusted by
millions of customers for
enterprise and mission
critical workloads.

83
DDL Extensions
CREATE NODE
CREATE TABLE [dbo].[Attendee](
[Attendee_Id] [uniqueidentifier] PRIMARY KEY,
[Attendee_FName] varchar(100),
[Attendee_LName] varchar(100)
) AS NODE
GO
SELECT TOP 5 * FROM Attendee;

84
DDL Extensions
CREATE TABLE attends (Rating integer) AS EDGE;
CREATE TABLE [from] AS EDGE;
CREATE EDGE
SELECT TOP 5 * FROM [from];

85
Query Language Extensions
• Multi-hop navigation and join-free pattern matching using MATCH
predicate
• ASCII-art syntax to facilitate graph traversal
SELECT
Attendee.Attendee_Name AS ‘AttendeeName’,
Session.Session_ID AS ‘SessionName’
FROM
attends a,
Attendee at,
Session s
WHERE
MATCH (Attendee-(attends)->Session)
AND Session.session_name = 'Graph extensions in Microsoft SQL
Server 2017 and Azure SQL Database'

86
Relational vs. Graph
 Graph and relational designs can answer the same questions
 But if traversal of relationships define the primary application requirements,
Graph can solve this more intuitively and with less code

87
Graph Database Scenarios
 Recommendation Systems
 Fraud Detection
 Content Management
 Bill of Materials, product hierarchy
 CRM

88
AutomaticTuning
• One-click to enable
• Prevent and mitigate
performance issues
• No app changes needed
• Tuning actions
Create missing indexes
Drop unused/duplicate indexes
Force last good plan

94
Intelligent Insights
• Continuous monitoring
• Disruptive event detection
• Root cause analysis
• Available as diagnostic log
Azure SQL Analytics solution
Stream to Event Hub
Archive to Storage
Root-cause: Hitting resource limits caused by new ad-hoc query 0X9001RTYU. Impacted query 0X9002FGJR started
timing out. Consider stopping the ad-hoc query or increasing your pricing tier.
Disruptive
event
Queries:
0X9003HA4J OK
0X9002FGJR Regressed query
0X901119GI OK
0X900044RJ OK

100
Query Performance Insight
Query Performance Insight allows you to spend less time troubleshooting database
performance by providing the following:
 Deeper insight into your databases resource (DTU) consumption.
 The top queries by CPU/Duration/Execution count, which can potentially be tuned
for improved performance.
 The ability to drill down into the details of a query, view its text and history of
resource utilization.
 Performance tuning annotations that show actions performed by SQL Azure
Database Advisor
*Query Performance Insight requires that Query Store is active
on your database. If Query Store is not running, the portal
prompts you to turn it on.

103
Most Secure Database
Secure Code
• Secure development lifecycle
• Least vulnerable last 7 years
• SQL Threat Detection
• SQL Server Auditing
• Row-level Security
• Dynamic Data Masking
• AlwaysEncrypted
• TransparentDataEncryption
• Encryption-in-flight
Identity
Management
• SQL Authentication
• Windows Authentication
• Azure Active Directory Auth.
Monitor activity
Control access
Protect data

104
Why SQL Security Intelligence?
Common threats
• SQL injection
• Password cracking
• Credential theft/leak
• Privilege abuse
Secure your database
1. Discover sensitive data
2. Identify & remediate SQL vulnerabilities
3. Detect & remediate suspicious database activities
4. Meet security regulations requirements
Common regulations
• GDPR (Personal)
• PCI (Payment)
• HIPPA(Health)
• FedRAMP(Government)
- No organization is immune to data breaches and security incidents
- 75% perpetrated by outsiders, while 25% involved internal actors
Verizon Data Breach Investigation Report 2017
SQL

106
SQLVulnerability Assessment

107
 Automated discovery and
classification of sensitive data
 Labeling (tagging) sensitive data on
column level with persistency
 Audit access to sensitive data
 Visibility through dashboards and
reports
 Hybrid cloud + on-premises

110
Track and improve database security state
Azure SQL
Database
Vulnerability
Assessment
Identifies, tracks,
resolves SQL security
vulnerabilities





SQL Server On-Prem
Azure SQL Database

112

113

115
Detects suspicious database activities
 Just turn it ON
 Detects potential
vulnerabilities and SQL
injection attacks
 Detects unusual behavior
activities
 Actionable alerts which
recommend how to
investigate & remediate
Azure SQL DatabaseApps
Audit
Log
Threat Detection
(1) Turn on Threat Detection
(3) Real-time actionable alerts
*It costs $15/server/month , first 60 days for free.
(2) Possible threat to
access / breach data

117

120
http://download.microsoft.com/downloa
d/4/9/4/4948194B-A613-49ED-90A5-
5144313549AB/microsoft-sql-and-the-
gdpr.pdf

121
Service Endpoint
Restrict Access to the DB
from VMs in a given
VNET/Subnet
Separation of duties between network
admin and DB admin
Simplify management of VIPs and
firewall rules;
Server-level configuration
available for SQL Database, SQL Data
Warehouse

126
Orchestration Key ManagementPrivate Connections Monitoring
AZURE EXPRESSROUTE AZURE DATA FACTORY AZURE KEY VAULT OPERATIONS MANAGEMENT SUITE
AZURE SQL DATA WAREHOUSE
DATA FACTORY
DATA FACTORY
AZURE MACHINE LEARNING & MACHINE LEARNING SERVER
AZURE DATA LAKE STORE AZURE DATA LAKE ANALYTICS COSMOS DB WEB & MOBILE APPS
AZURE STREAM ANALYTICS
Power BI
COGNITIVE SERVICESBOT SERVICE Logic App
AZURE ANALYSIS SERVICES

127
SMP vs. MPP Architecture
VS
Scale-up Scale-out
Symmetric Multi-Processing (SMP) vs. Massively Parallel Processing (MPP)

Compute
Dist_DB_1
Dist_DB_2
Dist_DB_12
Premium storage
Queries Control
Engine
DMS
SQL DB
DMS
SQL DB
…
Compute
Dist_DB_13
Dist_DB_14
Dist_DB_24
DMS
SQL DB
… Compute
Dist_DB_25
Dist_DB_26
Dist_DB_36
DMS
SQL DB
…
Compute
Dist_DB_37
Dist_DB_38
Dist_DB_48
DMS
SQL DB
…
Compute
Dist_DB_49
Dist_DB_50
Dist_DB_60
DMS
SQL DB
…

Compute
Dist_DB_1
Dist_DB_2
Dist_DB_6
Queries Control
Engine
DMS
SQL DB
DMS
SQL DB
…
Compute
Dist_DB_7
Dist_DB_8
Dist_DB_12
DMS
SQL DB
…
Compute
Dist_DB_13
Dist_DB_14
Dist_DB_18
DMS
SQL DB
… Compute
Dist_DB_19
Dist_DB_20
Dist_DB_24
DMS
SQL DB
…
Compute
Dist_DB_25
Dist_DB_26
Dist_DB_30
DMS
SQL DB
…
Compute
Dist_DB_31
Dist_DB32
Dist_DB_26
DMS
SQL DB
…
Compute
Dist_DB_37
Dist_DB_38
Dist_DB_42
DMS
SQL DB
…
Compute
Dist_DB_43
Dist_DB_44
Dist_DB_48
DMS
SQL DB
…
Compute
Dist_DB_49
Dist_DB_50
Dist_DB_54
DMS
SQL DB
…
Compute
Dist_DB_55
Dist_DB_56
Dist_DB_60
DMS
SQL DB
…
Premium storage

132
DataWarehouse Units
Normalized amount of compute
Converts to billing units i.e. what you pay
DWU
100
200
300
400
500
600
1000
1200
1500
2000
3000
6000

133
Azure SQL DataWarehouse
Azure SQL DataWarehouse offers two different performance tiers:
 Optimized for Elasticity On this performance tier, storage and compute are in
separate architectural layers.This tier is ideal for workloads of heavy peaks of
activity, allowing you to scale the compute and storage tiers separately
depending on your needs.
 Optimized for Compute Microsoft provides you with the latest hardware for
this performance tier, using NVMe Solid State Disk cache.This way, most
recently accessed data keeps as close as possible to the CPU.This tier provides
the highest level of scalability, by providing you up to 30,000 compute Data
Warehouse Units (cDWU).

139
How to choose your performance tier
Elasticity Compute
Current status Generally available Preview in fall
Regional availability 33 6 (growing over time)
Entry pricing $1.21 / hour $6.05 / hour (preview rate)
Starting scale point 100 DWUs 1000 cDWUs
Max compute scale 6,000 DWUs 30,000 cDWUs
Max storage 240TB (compressed) Unlimited (columnar)
Use of elasticity Dynamic “burst” scaling Incremental scaling
Min memory per query 6GB 15 GB
Language surface area Same Same

140
Hash-distributed tables
A hash distributed table can deliver the highest
query performance for joins and aggregations on
large tables.

141
Round-robin distributed tables
A round-robin table is the simplest table to create and delivers
fast performance when used as a staging table for loads.

142
ReplicatedTables
A replicated table provides the fastest query
performance for small tables.

143
Table Distribution Options

144
Data Migration Recommendations
Data FormatConversion
• Date Format, Field delimiters, escaping, field order, encoding
Compression
• Use Gzip, ORC, Parquet
• 7-Zip utility, .NET/JAVA libraries
Export
• BCP for fast export
• Multiple files per large table, one folder per table
Copy
• AZCopy
• Data Movement Library
Tips
• Incorrect format means migration
needs to be entirely repeated
• Exploit bcp options, hints, parallelism
• Multiple compressed files, Split files
• Parallel import, reliable transfer
• Don’t use multiple files in the same
gziped file
• EfficientCopy
• Parallel, Async, Resumable
• Limit concurrent copies if low
bandwidth
• Very Large Data transfer
• Express Route, Import/Export Service

145
Data Loading Recommendations
PolyBase and SSIS (with 2017 Azure feature pack) the fastest method
• Upload to BLOB viaAZCOPY or PowerShell library
• Historical load – use CTAS
• Incremental – use INSERT…SELECT
Use the highest resource class (without sacrificing concurrency)
Increase DWU during load, decrease when done
PolyBase now supports UTF-16 file types.ADLS as a source and target is also supported
Known Issues:
• Does not support extendedASCII
• Does not support custom multi-date format. E.g. 2000-1-6
• No reject files/reason for rejected rows.

146
PolyBase
WHAT does PolyBase do?
HOWdoesitdoit?

147
Target workload: Analytics (OLAP)
 Store large volumes of data
 Consolidate disparate data into a single location
 Shape, model, transform and aggregate data
 Perform query analysis across large datasets
 Ad-hoc reporting across large data volumes
 All using simple SQL constructs

148
Unsuitable workloads
Operational workloads (OLTP)
 High frequency reads & writes
 Large numbers of singleton selects
 High volume of single row inserts
Data Preparation
 Row by row processing needs
 Incompatible formats (JSON, XML)

ThankYou
MohamedTawfik
Azure CoE EMEA

Azure SQL Database & Azure SQL Data Warehouse

More Related Content

What's hot

What's hot (20)

Similar to Azure SQL Database & Azure SQL Data Warehouse

Similar to Azure SQL Database & Azure SQL Data Warehouse (20)

More from Mohamed Tawfik

More from Mohamed Tawfik (20)

Recently uploaded

Recently uploaded (20)

Azure SQL Database & Azure SQL Data Warehouse

Editor's Notes