Azure Cosmos DB
Technical Deep Dive
Andre Essing
Technology Solutions Professional
Microsoft Deutschland GmbH
Andre advises customers on topics all around the Microsoft Data Platform. Since version 7.0, Andre has been gathering experience with the SQL Server product family. Today Andre concentrates on working with data in the cloud, such as Modern Data Warehouse architectures, Artificial Intelligence, and new scalable database systems like Azure Cosmos DB.
andre.essing@microsoft.com · /aessing · @aessing · andreessing.de · aessing/Andre_Essing
WHAT IS NOSQL
NOSQL, BUILT FOR SIMPLE AND FAST APPLICATION DEVELOPMENT
NoSQL, most often read as “Non-SQL”, “Not Only SQL”, or “non-relational”, is a kind of database where data is modeled differently than in relational systems (a concrete example follows the list below).
• Different kinds available
• Document
• Key/Value
• Columnar
• Graph
• etc.
• Non-Relational
• Schema agnostic
• Built for scale and performance
• Different consistency model
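To make the schema-agnostic point concrete, two documents of different shape can live side by side in the same document store, and no schema change is needed when a new field appears. A minimal illustration; the fields are made up:

{ "id": "1", "type": "mug", "color": "Graphite", "capacity": "16oz" }
{ "id": "2", "type": "laptop", "color": "Gray", "cpu": "Core i7-6600U", "memory": "16GB" }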
DIFFERENT WAYS OF STORING DATA WITH YOUR MODERN APP
• Come as you are
• Data normalization
AZURE COSMOS DB
A globally distributed, massively scalable, multi-model database service
• Models: Document, Column-family, Key-value, Graph
• APIs: SQL, MongoDB, Table API
• Turnkey global distribution
• Elastic scale out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Comprehensive SLAs
• Five well-defined consistency models
Leveraging Azure Cosmos DB to automatically scale
your data across the globe
This module will reference partitioning in the context
of all Azure Cosmos DB modules and APIs.
RESOURCE MODEL
[Diagram: the resource hierarchy; an Account contains Databases, a Database contains Containers, a Container contains Items]
ACCOUNT URI AND CREDENTIALS
URI: ********.azure.com
Key: IGeAvVUp …
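As a minimal sketch of how account, database, container, and item fit together, assuming the JavaScript SDK (@azure/cosmos); the account URI, key, and all ids below are placeholders:

const { CosmosClient } = require("@azure/cosmos");

// The account URI and primary key come from the portal (placeholders here)
const client = new CosmosClient({
  endpoint: "https://<account>.documents.azure.com",
  key: "<primary-key>"
});

async function main() {
  // Account → Database → Container → Item, mirroring the resource model
  const { database } = await client.databases.createIfNotExists({ id: "demo-db" });
  const { container } = await database.containers.createIfNotExists({
    id: "cities",
    partitionKey: { paths: ["/city"] }
  });
  await container.items.create({ id: "1", city: "Berlin", country: "Germany" });
}

main().catch(console.error);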
DATABASE REPRESENTATIONS
[Diagram: the same Account → Database → Container → Item hierarchy, with Users and their Permissions attached at the database level]
CONTAINER REPRESENTATIONS
[Diagram: a Container is surfaced as a Collection, a Graph, or a Table, depending on the API]
CONTAINER-LEVEL RESOURCES
[Diagram: Items live inside a Container alongside Sprocs, Triggers, UDFs, and Conflicts]
DEMO
SYSTEM TOPOLOGY (BEHIND THE SCENES)
Physical hierarchy: Planet Earth → Azure regions → Datacenters → Stamps → Fault domains → Cluster → Machine → Replica → Database engine → Container
Each machine hosts the database engine, language runtime(s), a resource manager, and various agents.
Database engine internals: admission control, transport, query processor, RSM, index manager (Bw-tree++/LLAMA++), log manager, IO manager, resource governor, …
RESOURCE HIERARCHY
CONTAINERS
Logical resources “surfaced” to APIs as tables, collections or graphs, which are made up of one or more physical partitions or servers.
RESOURCE PARTITIONS
• Consistent, highly available, and resource-governed coordination primitives
• Consist of replica sets, with each replica hosting an instance of the database engine
[Diagram: tenants map containers (tables, collections, graphs) onto resource partitions; each resource partition is a replica set with a leader, followers, and a forwarder relaying changes to remote resource partition(s)]
REQUEST UNITS
• Request Units (RUs) are a rate-based currency
• They abstract the physical resources (% memory, % CPU, % IOPS) consumed by requests
• Key to multi-tenancy, SLAs, and COGS efficiency
• Cover both foreground and background activities
REQUEST UNITS
Normalized across various access methods: 1 RU = 1 read of a 1 KB document. Each request, whether a GET, POST, PUT, or query, consumes a fixed number of RUs. Applies to reads, writes, queries, and stored procedure execution.
REQUEST UNITS
• Normalized across various access methods: 1 RU = 1 read of a 1 KB document from a single partition
• Each request consumes a fixed number of RUs; applies to reads, writes, queries, and stored procedure execution
• Provisioned in terms of RU/sec and metered hourly
• Rate limiting is based on the amount of throughput provisioned, which can be increased or decreased instantaneously
• Background processes like TTL expiration and index transformations are scheduled when the replica is quiescent
[Diagram: incoming requests plotted against min and max RU/sec; requests above the provisioned rate are rate limited, below it there is no rate limiting]
(A sketch of metering RU charges follows below.)
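Every response reports the RUs the operation consumed, which is the practical way to size RU/sec. A short sketch assuming the JavaScript SDK and the container created earlier; requestCharge and retryAfterInMs are the property names exposed by @azure/cosmos:

async function showCharges(container) {
  // A point read of a ~1 KB document costs about 1 RU
  const { requestCharge } = await container.item("1", "Berlin").read();
  console.log(`Point read consumed ${requestCharge} RUs`);

  try {
    await container.items.create({ id: "2", city: "Hamburg" });
  } catch (err) {
    if (err.code === 429) {
      // Provisioned RU/sec exhausted: the service rate limits and
      // suggests a back-off before retrying
      console.log(`Rate limited, retry after ${err.retryAfterInMs} ms`);
    }
  }
}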
ELASTIC SCALE OUT OF STORAGE AND THROUGHPUT
SCALES AS YOUR APPS’ NEEDS CHANGE
Independently and elastically scale storage and
throughput across regions – even during unpredictable
traffic bursts – with a database that adapts to your
app’s needs.
• Elastically scale throughput from 10 to
100s of millions of requests/sec across
multiple regions
• Support for requests/sec for different
workloads
• Pay only for the throughput and
storage you need
Leveraging Azure Cosmos DB to automatically scale
your data across the globe
This module will reference partitioning in the context
of all Azure Cosmos DB modules and APIs.
PARTITIONING
PARTITIONS
Cosmos DB Container (e.g. Collection)
Partition Key: City
Logical partitioning abstraction; behind the scenes: physical partition sets
hash(City)
Pseudo-random distribution of data over the range of possible hashed values
PARTITIONS
hash(City): pseudo-random distribution of data over the range of possible hashed values
[Diagram: cities such as Cologne, Hamburg, Munich, Stuttgart, Berlin, Leipzig, Bremen, Frankfurt, and Dresden hash into Partition 1, Partition 2, …, Partition n]
A frugal number of partitions is chosen based on actual storage and throughput needs (yielding scalability with low total cost of ownership).
PARTITIONS
What happens when partitions need to grow?
[Diagram: the same hashed distribution of cities across Partition 1, Partition 2, …, Partition n, with one partition reaching its capacity]
PARTITIONS
Partition ranges can be dynamically sub-divided to seamlessly grow the database as the application grows, while simultaneously maintaining high availability.
Partition management is fully managed by Azure Cosmos DB, so you don't have to write code or manage your partitions.
[Diagram: Partition x splits into Partition x1 and Partition x2, redistributing its hashed values (e.g. Cologne and Hamburg to one partition; Stuttgart, Berlin, Leipzig, and Dresden to the other)]
PARTITIONS
Best Practices: Design Goals for Choosing a Good Partition Key
• Distribute the overall request + storage volume
• Avoid “hot” partition keys
Steps for Success
• Ballpark scale needs (size/throughput)
• Understand the workload
• # of reads/sec vs. writes/sec
• Use the Pareto principle (80/20 rule) to help optimize the bulk of the workload
• For reads: understand the top 3-5 queries (look for common filters)
• For writes: understand transactional needs
General Tips
• Build a POC to strengthen your understanding of the workload and iterate (avoid analysis paralysis)
• Don’t be afraid of having too many partition keys
• Partition keys are logical
• More partition keys = more scalability
• The partition key is the scope for multi-record transactions and routing queries
• Queries can be intelligently routed via the partition key
• Omitting the partition key on a query requires a fan-out (see the sketch after this list)
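A short sketch of the routing point above, again assuming the JavaScript SDK and the container created earlier: supplying the partition key lets the service route a query to a single partition, while omitting it forces a fan-out across all of them.

// Routed: only the partition holding "Berlin" is consulted
const { resources: routed } = await container.items
  .query("SELECT * FROM c WHERE c.city = 'Berlin'", { partitionKey: "Berlin" })
  .fetchAll();

// Fan-out: every physical partition must be queried
const { resources: fannedOut } = await container.items
  .query("SELECT * FROM c WHERE c.country = 'Germany'")
  .fetchAll();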
DEMO
High Availability
• Automatic and manual failover
• Multi-homing API removes the need for app redeployment
Low Latency (anywhere in the world)
• Packets cannot move faster than the speed of light
• Sending a packet across the world under ideal network conditions takes hundreds of milliseconds
• You can cheat the speed of light by using data locality
• CDNs solved this for static content
• Azure Cosmos DB solves this for dynamic content
TURNKEY GLOBAL DISTRIBUTION
• Automatic and transparent replication worldwide
• Each partition hosts a replica set per region
• Customers can test end-to-end application availability by programmatically simulating failovers
• All regions are hidden behind a single global URI with multi-homing capabilities
• Customers can dynamically add / remove additional regions at any time
[Diagram: a container with partition key "airport" (values such as "LAX", "AMS", "MEL") is distributed locally via horizontal partitioning and globally as resource partitions; West US serves writes and reads, West Europe serves reads, each at 30K transactions/sec]
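The single global URI and multi-homing behavior are driven from the client. A sketch assuming the JavaScript SDK; preferredLocations tells the SDK which replicated regions to try first, and the region names are examples:

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient({
  endpoint: "https://<account>.documents.azure.com", // single global URI
  key: "<primary-key>",
  connectionPolicy: {
    preferredLocations: ["West Europe", "West US"] // nearest region first
  }
});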
REPLICATING DATA GLOBALLY
DEMO
BREWER’S CAP THEOREM
It is impossible for a distributed data store to simultaneously provide more than 2 out of the following 3 guarantees:
• Consistency
• Availability
• Partition Tolerance
CONSISTENCY
[Diagram: replicas in West US, East US, and North Europe all hold Value = 5; an update 5 => 6 is applied and has reached only some replicas when a network partition is introduced]
What happens when a network partition is introduced?
Reader: What is the value?
• Should it see 5? (prioritize availability)
• Or does the system go offline until the network is restored? (prioritize consistency)
PACELC THEOREM
In the case of network
partitioning (P) in a
distributed computer system,
one has to choose between
availability (A) and
consistency (C) (as per the
CAP theorem), but else (E),
even when the system is
running normally in the
absence of partitions, one has
to choose between latency (L)
and consistency (C).
CONSISTENCY
[Diagram: three replicas holding Value = 5 while an update 5 => 6 propagates between regions]
Latency: a packet of information can travel only as fast as the speed of light, and replication between distant geographic regions can take hundreds of milliseconds.
CONSISTENCY
[Diagram: three replicas holding Value = 5 while an update 5 => 6 propagates; Reader A and Reader B query different replicas]
Reader A: What is the value?
Reader B: What is the value?
• Should Reader B see 5 immediately? (prioritize latency)
• Does it see the same result as Reader A? (quorum impacts throughput)
• Does it sit and wait for 5 => 6 to propagate? (prioritize consistency)
FIVE WELL-DEFINED CONSISTENCY MODELS
Strong · Bounded-staleness · Session · Consistent prefix · Eventual
CHOOSE THE BEST CONSISTENCY MODEL FOR YOUR APP
Five well-defined consistency models, overridable on a per-request basis, provide control over performance-consistency tradeoffs, backed by comprehensive SLAs. An intuitive programming model offering low latency and high availability for your planet-scale app. (A client-side configuration sketch follows the list below.)
CLEAR TRADEOFFS
• Latency
• Availability
• Throughput
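A sketch of picking a default consistency level on the client, assuming the JavaScript SDK; the option and level names follow @azure/cosmos, and per-request overrides can only relax, never strengthen, the account's default:

const { CosmosClient } = require("@azure/cosmos");

const client = new CosmosClient({
  endpoint: "https://<account>.documents.azure.com",
  key: "<primary-key>",
  // One of "Strong", "BoundedStaleness", "Session", "ConsistentPrefix", "Eventual"
  consistencyLevel: "Session"
});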
DEMYSTIFYING CONSISTENCY MODELS
Strong consistency
Guarantees linearizability. Once an operation is complete, it will be visible to all readers in a strongly consistent manner across replicas.
Eventual consistency
Replicas are eventually consistent with any operations. There is a potential for out-of-order reads. Lowest cost and highest read performance of all consistency levels.
DEMYSTIFYING CONSISTENCY MODELS
Bounded-staleness
Consistent prefix. Reads lag behind writes by at most k prefixes or a time interval t. Similar properties to strong consistency, except within the staleness window.
Session
Consistent prefix. Within a session, reads and writes are monotonic. This is referred to as “read-your-writes” and “write-follows-reads”. Predictable consistency for a session; high read throughput and low latency outside of a session.
Consistent prefix
Reads will never see out-of-order writes.
DEMO
HANDLE ANY DATA WITH NO SCHEMA OR INDEXING REQUIRED
Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing-fast queries.
Item            | Color    | Microwave safe | Liquid capacity | CPU                                 | Memory | Storage
Geek mug        | Graphite | Yes            | 16oz            | ???                                 | ???    | ???
Coffee Bean mug | Tan      | No             | 12oz            | ???                                 | ???    | ???
Surface Book    | Gray     | ???            | ???             | 3.4 GHz Intel Skylake Core i7-6600U | 16GB   | 1 TB SSD

• Automatic index management
• Synchronous auto-indexing
• No schemas or secondary indices needed
• Works across every data model
INDEXING JSON DOCUMENTS
{
"locations": [
{
"country": "Germany",
"city": "Berlin"
},
{
"country": "France",
"city": "Paris"
}
],
"headquarter": "Belgium",
"exports": [
{ "city": "Moscow" },
{ "city": "Athens" }
]
}
[Diagram: index tree for the document above: locations/0/country → Germany, locations/0/city → Berlin, locations/1/country → France, locations/1/city → Paris, headquarter → Belgium, exports/0/city → Moscow, exports/1/city → Athens]
INDEXING JSON DOCUMENTS
{
"locations": [
{
"country": "Germany",
"city": "Bonn",
"revenue": 200
}
],
"headquarter": "Italy",
"exports": [
{
"city": "Berlin",
"dealers": [
{ "name": "Hans" }
]
},
{ "city": "Athens" }
]
}
[Diagram: index tree for this document: locations/0/country → Germany, locations/0/city → Bonn, locations/0/revenue → 200, headquarter → Italy, exports/0/city → Berlin, exports/0/dealers/0/name → Hans, exports/1/city → Athens]
INDEXING JSON DOCUMENTS
[Diagram: the index trees of the two documents side by side, sharing the same top-level paths (locations, headquarter, exports)]
INVERTED INDEX
[Diagram: the two trees merged into a single inverted index; shared paths such as locations/0/country and exports/…/city now point to the values from both documents (Germany, Berlin/Bonn, France/Paris, Belgium/Italy, Moscow/Athens, revenue 200, dealer Hans)]
INDEX POLICIES
CUSTOM INDEXING POLICIES
Though all Azure Cosmos DB data is indexed by default, you
can specify a custom indexing policy for your collections.
Custom indexing policies allow you to design and customize
the shape of your index while maintaining schema flexibility.
• Define trade-offs between storage, write and query
performance, and query consistency
• Include or exclude documents and paths to and from the
index
• Configure various index types
{
"automatic": true,
"indexingMode": "Consistent",
"includedPaths": [{
"path": "/*",
"indexes": [{
"kind": "Hash",
"dataType": "String",
"precision": -1
}, {
"kind": "Range",
"dataType": "Number",
"precision": -1
}, {
"kind": "Spatial",
"dataType": "Point"
}]
}],
"excludedPaths": [{
"path": "/nonIndexedContent/*"
}]
}
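A policy like the one above is supplied when the collection is created. A sketch assuming the JavaScript SDK and the database handle from the earlier example; note that the current SDK's property shapes differ slightly from the older Hash/Range policy shown, and the ids and paths here are placeholders:

const { container } = await database.containers.createIfNotExists({
  id: "catalog",
  partitionKey: { paths: ["/city"] },
  indexingPolicy: {
    automatic: true,
    indexingMode: "consistent",
    includedPaths: [{ path: "/*" }],
    excludedPaths: [{ path: "/nonIndexedContent/*" }]
  }
});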
DEMO
SHORT-LIFETIME DATA
Some data produced by applications is only useful for a finite period of time:
• Machine-generated event data
• Application log data
• User session information
It is important that the database system systematically purges this data at pre-configured intervals.
TIME-TO-LIVE (TTL)
AUTOMATICALLY PURGE DATA
Azure Cosmos DB allows you to set the length of time for which documents live in the database before being automatically purged. A document's "time-to-live" (TTL) is measured in seconds from the last modification and can be set at the collection level, with an override on a per-document basis.
Expiry is computed relative to the _ts field, which exists on every document.
• The _ts field is a Unix-style epoch timestamp representing the date and time of the last modification; it is updated every time a document is modified.
Once TTL is set, Azure Cosmos DB will automatically remove documents once the configured period has elapsed since their last modification.
EXPIRING RECORDS USING TIME-TO-LIVE
TTL BEHAVIOR
The TTL feature is controlled by TTL properties at two levels: the collection level and the document level.
• DefaultTTL for the collection
• If missing (or set to null), documents are not deleted automatically
• If present and the value is "-1" (infinite), documents don’t expire by default
• If present and the value is some number "n", documents expire "n" seconds after their last modification
• TTL for the documents
• Applicable only if DefaultTTL is present on the parent collection
• Overrides the DefaultTTL value of the parent collection
The values are set in seconds and are treated as a delta from the _ts at which the document was last modified.
[Diagram: a document's own TTL takes precedence over the collection's Default TTL; a sketch follows below]
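A sketch of both levels, assuming the JavaScript SDK: the collection-level default is the defaultTtl property of the container, and the per-document override is a plain ttl field on the item:

// Collection level: documents expire one day after last modification
const { container } = await database.containers.createIfNotExists({
  id: "sessions",
  partitionKey: { paths: ["/userId"] },
  defaultTtl: 86400 // seconds
});

// Per-document override: this item expires after one hour instead
await container.items.create({ id: "s1", userId: "u1", ttl: 3600 });

// Per-document override: this item never expires despite the default
await container.items.create({ id: "s2", userId: "u2", ttl: -1 });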
MODERN REACTIVE APPLICATIONS
IoT, gaming, retail, and operational logging applications need to track and respond to tremendous amounts of data being ingested, modified, or removed from a globally-scaled database.
COMMON SCENARIOS
• Trigger notifications for new items
• Perform real-time analytics on streamed data
• Synchronize data with a cache, search engine, or data warehouse
CHANGE FEED
A persistent log of records within an Azure Cosmos DB container, presented in the order in which they were modified.
CHANGE FEED SCENARIOS
CHANGE FEED WITH PARTITIONS
Consumer parallelization: the change feed listens for changes in an Azure Cosmos DB collection and outputs the sorted list of documents that were changed, in the order in which they were modified.
The changes are persisted and can be processed asynchronously and incrementally. The change feed is available for each partition key range within the document collection, so its output can be distributed across one or more consumers, such as an event/stream processing app tier, for parallel processing.
[Diagram: partition key ranges fan out to Consumer 1, Consumer 2, and Consumer 3]
CHANGE FEED PROCESSOR LIBRARY
https://www.nuget.org/packages/Microsoft.Azure.DocumentDB.ChangeFeedProcessor/
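The trigger-notification scenario is also commonly wired up with an Azure Functions Cosmos DB trigger instead of hosting the processor library yourself. A minimal sketch of the binding (function.json) plus a JavaScript handler; the database, collection, and connection-setting names are placeholders:

{
  "bindings": [{
    "type": "cosmosDBTrigger",
    "name": "documents",
    "direction": "in",
    "connectionStringSetting": "CosmosConnection",
    "databaseName": "demo-db",
    "collectionName": "cities",
    "leaseCollectionName": "leases",
    "createLeaseCollectionIfNotExists": true
  }]
}

// index.js: invoked with batches of changed documents, in modification order
module.exports = async function (context, documents) {
  for (const doc of documents) {
    context.log(`Changed document: ${doc.id}`);
  }
};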
PROGRAMMING
Run native JavaScript server-side programming logic to perform atomic multi-record transactions.
This module will reference programming in the context of the SQL API.
CONTROL CONCURRENCY USING ETAGS
OPTIMISTIC CONCURRENCY
• The SQL API supports optimistic concurrency control (OCC) through HTTP entity tags, or ETags
• Every SQL API resource has an ETag system property, and the ETag value is generated on the server every time a document is updated
• If the ETag value stays constant, no other process has updated the document; if the ETag value unexpectedly mutates, another concurrent process has updated the document
• ETags can be used with the If-Match HTTP request header to allow the server to decide whether a resource should be updated; on a mismatch the server rejects the write with HTTP 412 (Precondition Failed)
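A sketch of the read-modify-write pattern, assuming the JavaScript SDK and the container from earlier; accessCondition maps to the If-Match header, and _etag is the system property mentioned above:

const { resource: doc } = await container.item("1", "Berlin").read();
doc.population = 3700000;

try {
  await container.item("1", "Berlin").replace(doc, {
    accessCondition: { type: "IfMatch", condition: doc._etag }
  });
} catch (err) {
  if (err.code === 412) {
    // Precondition failed: another writer updated the document first.
    // Re-read to get the fresh ETag, re-apply the change, and retry.
  }
}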
STORED PROCEDURES
BENEFITS
• Familiar programming language
• Atomic transactions
• Built-in optimizations
• Business logic encapsulation
MULTI-DOCUMENT TRANSACTIONS
DATABASE TRANSACTIONS
In a typical database, a transaction can be defined as a sequence of operations performed as a single logical unit of work. Each transaction provides ACID guarantees.
In Azure Cosmos DB, JavaScript is hosted in the same memory space as the database. Hence, requests made within stored procedures and triggers execute in the same scope as a database session: a single stored procedure can create new documents, query the collection, and update or delete existing documents, all within one transaction.
Stored procedures utilize snapshot isolation to guarantee that all reads within the transaction will see a consistent snapshot of the data.
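A minimal server-side sketch of such a transaction, using the documented stored-procedure API (getContext, createDocument); the document shapes are illustrative. Throwing anywhere aborts the stored procedure and rolls back every write:

function createPair(cityA, cityB) {
  var collection = getContext().getCollection();
  var link = collection.getSelfLink();

  // Both creates commit together or not at all
  var accepted = collection.createDocument(link, cityA, function (err) {
    if (err) throw err; // abort => whole transaction rolled back
    var acceptedInner = collection.createDocument(link, cityB, function (err2) {
      if (err2) throw err2;
      getContext().getResponse().setBody("both documents created");
    });
    if (!acceptedInner) throw new Error("request not accepted");
  });
  if (!accepted) throw new Error("request not accepted");
}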
TRANSACTION CONTINUATION MODEL
CONTINUING LONG-RUNNING TRANSACTIONS
• JavaScript functions can implement a continuation-based model to batch/resume execution
• The continuation value can be any value of your own choosing. This value can then be used by your applications to resume a transaction from a new “starting point”
[Diagram: bulk-create documents by trying to create each document in turn; when execution cannot continue, return a “pointer” to resume later; the client observes the return value and re-invokes until done]
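A sketch of the continuation pattern in the server-side API; the continuation value here is simply the array index to resume from, chosen for illustration:

function bulkCreate(docs, startIndex) {
  var collection = getContext().getCollection();
  var link = collection.getSelfLink();
  var i = startIndex || 0;

  function createNext() {
    if (i >= docs.length) {
      getContext().getResponse().setBody({ resumeAt: -1 }); // done
      return;
    }
    var accepted = collection.createDocument(link, docs[i], function (err) {
      if (err) throw err;
      i++;
      createNext();
    });
    if (!accepted) {
      // Execution budget exhausted: return a "pointer" to resume later
      getContext().getResponse().setBody({ resumeAt: i });
    }
  }
  createNext();
}

The client re-executes the stored procedure with the returned resumeAt until it comes back as -1.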