Azure Cosmos DB Workshop

Azure Cosmos DB is a globally distributed, massively scalable, multi-model database service that offers guaranteed low latency at scale. It provides elastic scaling of storage and throughput, supports five consistency models, and has turnkey global distribution. Azure Cosmos DB also offers comprehensive SLAs and supports multiple data models and APIs.

Uploaded by

springlee


Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service

Multi-model: Key-value, Column-family, Document, Graph
Multi-API: MongoDB, Table API
Guaranteed low latency at the 99th percentile
Elastic scale out of storage & throughput
Five well-defined consistency models
Turnkey global distribution
Comprehensive SLAs

What sets Azure Cosmos DB apart
Turnkey Global Distribution
Worldwide presence as a Foundational Azure service

Automatic multi-region replication

Multi-homing APIs

Manual and automatic failovers

Designed for High Availability

Guaranteed low latency at P99 (99th percentile)
Requests are served from local region
Single-digit millisecond latency worldwide
Write optimized, latch-free database engine designed for SSD
Synchronous automatic indexing at sustained ingestion rates

        Reads (1KB)    Indexed writes (1KB)
P50     <2ms           <6ms
P99     <10ms          <15ms
Multiple, well-defined consistency choices
Global distribution forces us to navigate the CAP theorem

Writing correct distributed applications is hard

Five well-defined consistency levels

Intuitive and practical with clear PACELC tradeoffs

Programmatically change at any time

Can be overridden on a per-request basis

Elastically scalable storage and throughput
Single machine is never a bottleneck
Transparent server-side partition management
Elastically scale storage (GB to PB) and throughput (100 to 100M req/sec) across many machines and multiple regions
Automatic expiration via policy based TTL
Pay by the hour, change throughput at any time for only what you need
[Chart: Provisioned request / sec (hourly throughput) over time, Nov 2016 to Dec 2016, y-axis 2,000,000 to 12,000,000, with a peak labeled "Black Friday"]
Schema-agnostic, automatic indexing
At global scale, schema/index management is painful
Automatic and synchronous indexing
Hash, range, and geospatial
Works across every data model
Highly write-optimized database engine
[Diagram labels: Schema, Physical index]
Multi-model, multi-API
Database engine operates on Atom-Record-Sequence type system

All data models can be efficiently translated to ARS

Multi-model: Key-value, Document, Column, and Graph

Multi-API: SQL (DocumentDB), MongoDB, Table, Cassandra and Gremlin

More data-models and APIs to be added

Industry-leading, enterprise-grade SLAs
99.99% availability – even with a single region

Made possible with highly-redundant storage architecture

Guaranteed durability – writes are majority quorum committed

First and only service to offer SLAs on:

• Low-latency
• Consistency
• Throughput
Security & Compliance
Always encrypted at rest and in transit
• Encryption @ Rest – AES256
• Encryption @ Transit – SSL / TLS

Fine grained “row level” authorization

• User/Permissions with Resource Tokens

Network security with IP firewall rules and VNET

Comprehensive Azure compliance certification:

• ISO 27001, ISO 27018, EUMC, HIPAA
• PCI, SOC1 and SOC2
• FEDRAMP, HITRUST
Common Use Cases and Scenarios
Content Management Systems
[Diagram: Azure Traffic Manager in front of Azure regions A, B, and C; Azure Cosmos DB (app + session state), globally distributed across regions]
Internet of Things – Telemetry & Sensor Data
[Diagram components: Azure IoT Hub; Azure Databricks Spark (Structured Streaming); Azure Cosmos DB (Hot) (TTL = 90 days); Azure API App (user facing app); Azure Function (triggered via Cosmos DB change feed); Azure Storage (Cold)]
Retail Product Catalogs
[Diagram components: Azure Web App (e-commerce app); Azure Cosmos DB (product catalog); Azure Search (full-text index); Azure Storage (logs, static catalog content); Azure Cosmos DB (session state)]
Retail Order Processing Pipelines
[Diagram components: Azure Functions (E-Commerce Checkout API); Azure Cosmos DB (Order Event Store); Azure Functions (Microservice 1: Tax); Azure Functions (Microservice 2: Payment); ...; Azure Functions (Microservice N: Fulfillment)]
Real-time Recommendations
[Diagram components: Shoppers; E-commerce Store; Online Recommendations Service: Azure Container Service (Recommendations API) and Azure Cosmos DB (Product + User Vectors); Apache Spark on Azure Databricks; Order Transactions: Azure Container Service (Order Transaction API) and Azure Cosmos DB (Customer Orders)]
Multiplayer Gaming
[Diagram components: Azure Traffic Manager; Azure CDN; Azure Storage (game files); Azure API Apps (game backend); Azure Cosmos DB (game database); Azure Databricks (game analytics); Azure Functions; Azure Notification Hubs (push notifications)]
Scale-out Computation
Apache Spark on Databricks: Spark SQL, Spark Streaming, MLlib (machine learning), GraphX (graph)

Spark Connector using SQL API

Scale-out Database
Azure Cosmos DB
Let’s zoom in on Azure Cosmos DB
Resource Model
Account (********.azure.com)
    Database
        User
            Permission
        Container (= Collection, Graph, or Table)
            Item
            Sproc
            Trigger
            UDF
            Conflict
Note: Throughput can also be shared across a set of collections

System design (logical)
[Diagram: Tenants own Containers (Tables, Collections, Graphs). Each container maps to Resource Partitions. A resource partition is a replica set (a Leader and Followers holding K/V data) with a Forwarder to remote resource partition(s).]

Resource Partition
• A consistent, highly available, and resource governed coordination primitive
• Consists of a replica set with each replica hosting an instance of database engine
• Uniquely belongs to a tenant
• Owns a set of keys
Request Units
Request Units (RU) is a rate-based currency
Abstracts physical resources (% CPU, % Memory, % IOPS) for performing requests
Key to providing isolation in a multi-tenant environment
Enables SLAs for predictable performance
Foreground and background activities

Request Units
Normalized across various access methods (GET, POST, PUT, Query, …)
1 RU = 1 read of 1 KB record
Each request consumes fixed RUs
Applies to reads, writes, queries, and stored procedure execution

Request Units
Provisioned in terms of RU/sec
Rate limiting based on amount of throughput provisioned
Can be increased or decreased instantaneously
Billing is metered hourly
Background processes like TTL expiration, index transformations scheduled when quiescent
[Diagram: incoming requests to a replica plotted against Min RU/sec and Max RU/sec; labels: Rate limit, No throttling, Quiescent]
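
The provisioning model above can be pictured as a per-second RU budget. The following is a minimal sketch under stated assumptions, not the service's actual limiter: the class name `RuBudget`, the fixed one-second window, and the 1 RU charge per request are all hypothetical choices made for illustration.

```python
class RuBudget:
    """Toy per-second RU budget; hypothetical, not the service's real limiter."""

    def __init__(self, provisioned_ru_per_sec):
        self.provisioned = provisioned_ru_per_sec
        self.window = None      # the one-second window currently being metered
        self.consumed = 0.0     # RUs charged so far in that window

    def try_consume(self, now_seconds, charge):
        """Return True if the request fits in this second's budget, else False."""
        window = int(now_seconds)
        if window != self.window:       # a new second starts with a fresh budget
            self.window = window
            self.consumed = 0.0
        if self.consumed + charge > self.provisioned:
            return False                # the caller would see a rate-limited response
        self.consumed += charge
        return True


budget = RuBudget(provisioned_ru_per_sec=10)

# Twelve 1 KB reads (assumed 1 RU each) arriving within the same second.
results = [budget.try_consume(now_seconds=0.5, charge=1) for _ in range(12)]
accepted = results.count(True)

# A request arriving in the following second draws on a fresh budget.
next_second_ok = budget.try_consume(now_seconds=1.1, charge=1)
```

Raising `provisioned_ru_per_sec` in this sketch corresponds to increasing the provisioned throughput: more requests fit in each second before any are rejected.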
Pricing Example

Storage Cost
Avg Record Size (KB)                    1
Number of Records                       100,000,000
Total Storage (GB)                      100
Monthly Cost per GB                     $0.25
Expected Monthly Cost for Storage       $25.00

Throughput Cost
Operation Type    Requests per Second    Avg RUs per Request    RUs Needed
Create            100                    5                      500
Read              400                    1                      400
Total RU/sec                                                    900

Monthly Cost per 100 RU/sec             $6.00
Expected Monthly Cost for Throughput    $54.00

Total Monthly Cost
[Total Monthly Cost] = [Monthly Cost for Storage] + [Monthly Cost for Throughput]
                     = $25 + $54
                     = $79 per month

* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-db/
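
The arithmetic in the pricing example can be reproduced with a short calculation. This is only a sketch: the function name `monthly_cost` is hypothetical, the unit prices are the ones quoted on the slide (which may be out of date), and 1 GB is taken as 1,000,000 KB to match the slide's rounding.

```python
def monthly_cost(avg_record_kb, num_records, ops,
                 price_per_gb=0.25, price_per_100_ru=6.00):
    """Estimate monthly storage and throughput cost.

    ops is a list of (requests_per_second, avg_ru_per_request) pairs.
    Prices default to the figures quoted in the example above.
    """
    storage_gb = avg_record_kb * num_records / 1_000_000   # 1 GB taken as 1,000,000 KB
    storage_cost = storage_gb * price_per_gb
    total_ru_per_sec = sum(rps * ru for rps, ru in ops)
    throughput_cost = total_ru_per_sec / 100 * price_per_100_ru
    return storage_cost, throughput_cost, storage_cost + throughput_cost


# 100 creates/sec at 5 RU each plus 400 reads/sec at 1 RU each.
storage, throughput, total = monthly_cost(1, 100_000_000, [(100, 5), (400, 1)])
```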

Partitioning
Cosmos DB Container (e.g. Collection)

Partitioning Scheme: top-most design decision in Cosmos DB

Partition Key: User Id

Logical Partitioning Abstraction
Behind the Scenes: Physical Partition Sets
hash(User Id): Pseudo-random distribution of data over range of possible hashed values
[Diagram: user IDs (Dharma, Andrew, Shireesh, Karthik, Rimma, Mike, Bob, Alice, Carol, …) hashed into Partition 1, Partition 2, …, Partition n]
Frugal # of Partitions based on actual storage and throughput needs (yielding scalability with low total cost of ownership)

What happens when partitions need to grow?

Behind the Scenes: Physical Partition Sets
hash(User Id)
Partition Ranges can be dynamically sub-divided
To seamlessly grow database as the application grows
While sedulously maintaining high availability
[Diagram: Partition X (Dharma, Shireesh, Karthik, Rimma, Alice, Carol, …) is sub-divided into Partition X1 and Partition X2]

Best of All:
Partition management is completely taken care of by the system
You don’t have to lift a finger… the database takes care of you.
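
The hash-range idea described above can be sketched in a few lines. This is an illustration under assumptions, not the service's real scheme: MD5 as the hash, a 32-bit hash space, and the `PartitionMap` class are all hypothetical stand-ins. It shows that sub-dividing one range leaves keys in every other range where they were.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2 ** 32


def key_hash(partition_key):
    """Stable hash into [0, HASH_SPACE); MD5 is an arbitrary stand-in."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class PartitionMap:
    def __init__(self, partition_count):
        # Lower bound of each physical partition's hash range, in sorted order.
        step = HASH_SPACE // partition_count
        self.starts = [i * step for i in range(partition_count)]

    def route(self, partition_key):
        """Return the lower bound of the range that owns this key."""
        return self.starts[bisect_right(self.starts, key_hash(partition_key)) - 1]

    def split(self, start):
        """Sub-divide one range at its midpoint; other ranges are untouched."""
        i = self.starts.index(start)
        end = self.starts[i + 1] if i + 1 < len(self.starts) else HASH_SPACE
        self.starts.insert(i + 1, (start + end) // 2)


users = ["Dharma", "Andrew", "Shireesh", "Karthik", "Rimma",
         "Mike", "Bob", "Alice", "Carol"]
pmap = PartitionMap(partition_count=3)
before = {u: pmap.route(u) for u in users}

pmap.split(pmap.starts[0])      # grow: sub-divide the first range
after = {u: pmap.route(u) for u in users}

# Only keys that lived in the split range can have a new owner.
moved = [u for u in users if before[u] != after[u]]
```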

Cosmos DB Container (e.g. Collection) Best Practices: Design Goals for Choosing a Good Partition Key

1) Distribute the overall request + storage volume
• Avoid “hot” partition keys
2) Partition Key is scope for [efficient] queries and transactions
• Queries can be intelligently routed via partition key
• Omitting partition key on query requires fan-out

Steps for Success

1. Ballpark scale needs (size/throughput)
2. Understand the workload
• # of reads/sec vs writes per sec
• Use 80/20 rule to help optimize bulk of workload
• For reads – understand top X queries (look for common filters)
• For writes – understand transactional needs; understand ratio of inserts vs updates

General Tips
• Don’t be afraid of having too many partition keys
• Partition keys are logical
• More partition keys => more scalability
Object Model Design
A few notes about containers

Containers do NOT enforce schema

There are benefits to co-locating multiple types in a container

Annotate records with a "type" property

Co-locating types in the same container

Ability to query across multiple entity types with a single network request.

For example, we have two types of documents: cat and person.

{
   "id": "Andrew",
   "type": "Person",
   "familyId": "Liu",
   "worksOn": "Azure Cosmos DB"
}

{
   "id": "Ralph",
   "type": "Cat",
   "familyId": "Liu",
   "fur": {
         "length": "short",
         "color": "brown"
   }
}

We can query both types of documents without needing a JOIN simply by running a query without a filter on type:

SELECT * FROM c WHERE c.familyId = "Liu"

If we want to filter on type = “Person”, we can simply add a filter on type to our query:

SELECT * FROM c WHERE c.familyId = "Liu" AND c.type = "Person"
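
The effect of the "type" property can be mimicked on an in-memory list. This sketch is not the SQL API: the `query` helper is hypothetical and only compares top-level properties for equality, which is enough to mirror the two queries above.

```python
documents = [
    {"id": "Andrew", "type": "Person", "familyId": "Liu",
     "worksOn": "Azure Cosmos DB"},
    {"id": "Ralph", "type": "Cat", "familyId": "Liu",
     "fur": {"length": "short", "color": "brown"}},
]


def query(docs, **filters):
    """Return documents whose top-level properties equal every given filter."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]


# Mirrors: SELECT * FROM c WHERE c.familyId = "Liu"
family = query(documents, familyId="Liu")

# Mirrors: SELECT * FROM c WHERE c.familyId = "Liu" AND c.type = "Person"
people = query(documents, familyId="Liu", type="Person")
```

Because both entity types share a container and a `familyId`, the first call returns the person and the cat together without any join.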

Ability to perform transactions across multiple types

Global Distribution
Why Global Distribution
High Availability
• Automatic and Manual Failover
• Multi-homing API removes need for app redeployment

Low Latency (anywhere in the world)

• Packets cannot move faster than the speed of light
• Sending a packet across the world under ideal network conditions takes 100’s of milliseconds
• You can cheat the speed of light – using data locality
• CDNs solved this for static content
• Azure Cosmos DB solves this for dynamic content
Note: For multi-master enabled accounts –
The priority list indicates which is the designated
“hub” region for resolving write conflicts.
Consistency
ACID != CAP

Consistency w.r.t. Transactions is NOT the same thing as Consistency w.r.t. Replication.

• Transactions: this is about moving from one valid state to another for a single given tx
• Replication: this is about getting a consistent view across replicated copies of data
[Diagram: three replicas in West US, East US, and North Europe, each holding Value = 5. An update 5 => 6 is applied at one replica and propagates to a second, while the third still holds Value = 5.]

What happens when a network partition is introduced?
Reader: What is the value?
Should it see 5? (prioritize availability)
Or does the system go offline until network is restored? (prioritize consistency)

Brewer’s CAP Theorem: it is impossible for a distributed data store to simultaneously provide more than 2 out of the following 3 guarantees: Consistency, Availability, Partition Tolerance
Latency: a packet of information can travel only as fast as the speed of light.
Replication between distant geographic regions can take 100’s of milliseconds

[Diagram: the same three replicas after Update 5 => 6. Two hold Value = 6 and one still holds Value = 5. Reader A and Reader B each ask: What is the value?]

Reader B: What is the value?
Should it see 5 immediately? (prioritize latency)
Does it see the same result as reader A? (quorum impacts throughput)
Or does it sit and wait for 5 => 6 to propagate? (prioritize consistency)

PACELC Theorem: In the case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).
Programmable Data Consistency
[Diagram: a spectrum from Strong consistency (High latency) to Eventual consistency (Low latency), annotated "Choice for most distributed apps"]
Well-defined consistency models
• Intuitive programming model
• 5 well-defined consistency models
• Overridable on a per-request basis

• Clear tradeoffs
• Latency
• Availability
• Throughput
Consistency Level     Guarantees

Strong                Linearizability (once operation is complete, it will be visible to all)

Bounded Staleness     Consistent Prefix. Reads lag behind writes by at most k prefixes or t interval. Similar properties to strong consistency (except within staleness window), while preserving 99.99% availability and low latency.

Session               Consistent Prefix. Within a session: monotonic reads, monotonic writes, read-your-writes, write-follows-reads. Predictable consistency for a session, high read throughput + low latency.

Consistent Prefix     Reads will never see out of order writes (no gaps).

Eventual              Potential for out of order reads. Lowest cost for reads of all consistency levels.

Bounded-Staleness: Bounds are set server-side via the Azure Portal
Session Consistency: Session is controlled using a “session token”.
• Session tokens are automatically cached by the Client SDK
• Can be pulled out and used to override other requests (to preserve session between multiple clients)
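// endpointUrl, authKey, collectionLink, and documentLink are placeholders for your own values.
// Run inside an async method.
string sessionToken;

using (DocumentClient client = new DocumentClient(new Uri(endpointUrl), authKey))
{
    ResourceResponse<Document> response = await client.CreateDocumentAsync(
        collectionLink,
        new { id = "an id", value = "some value" });

    // Capture the session token returned with the write.
    sessionToken = response.SessionToken;
}

using (DocumentClient client = new DocumentClient(new Uri(endpointUrl), authKey))
{
    // Pass the captured token so this second client reads within the same session.
    ResourceResponse<Document> read = await client.ReadDocumentAsync(
        documentLink,
        new RequestOptions { SessionToken = sessionToken });
}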
Consistency can be relaxed on a per-request basis

await client.ReadDocumentAsync(
    documentLink,
    new RequestOptions { ConsistencyLevel = ConsistencyLevel.Eventual }
);
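
The read-your-writes guarantee carried by a session token can be illustrated with a toy model. Everything here is an assumption made for the sketch: real session tokens are opaque strings, whereas this model uses a plain write sequence number (`lsn`), and the `Replica` and `Session` classes are hypothetical.

```python
class Replica:
    def __init__(self):
        self.lsn = 0          # highest write sequence number applied here
        self.value = None


class Session:
    """Client-side session; the token is the sequence number of its last write."""

    def __init__(self):
        self.token = 0


def write(primary, session, value):
    primary.lsn += 1
    primary.value = value
    session.token = primary.lsn     # analogous to caching response.SessionToken


def replicate(source, target):
    target.lsn, target.value = source.lsn, source.value


def session_read(replicas, session):
    """Serve the read from the first replica that has caught up to the token."""
    for replica in replicas:
        if replica.lsn >= session.token:
            return replica.value
    raise RuntimeError("no replica has caught up to the session token yet")


primary, secondary = Replica(), Replica()
session = Session()

write(primary, session, 5)
replicate(primary, secondary)
write(primary, session, 6)          # the secondary still holds 5

# The lagging secondary is skipped, so the session reads its own write.
own_write = session_read([secondary, primary], session)

# A client without the token has no such guarantee and may read the older value.
stale = session_read([secondary, primary], Session())
```

Handing `session.token` to a second client, as the C# sample does with `RequestOptions.SessionToken`, extends the same guarantee to that client.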
Indexing
Schema-agnostic, automatic indexing
Automatically index every property of every record without having to
define schemas and indices upfront.

No need for schema and index management

Works across every data model

Schema

Latch free data structure for highly write-optimized database engine

Multiple index types: Hash, range, and geospatial

Physical index
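
One way to picture schema-agnostic indexing is to flatten each record into (path, value) terms and index every term. This is a simplified sketch, not the engine's actual index structure; the path notation and the `index_terms` and `build_index` helpers are hypothetical.

```python
from collections import defaultdict


def index_terms(value, path=""):
    """Yield (path, scalar) pairs for every property in a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from index_terms(child, f"{path}/{key}")
    elif isinstance(value, list):
        for child in value:
            yield from index_terms(child, f"{path}/[]")
    else:
        yield path, value


def build_index(docs):
    """Map each (path, value) term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in index_terms(doc):
            index[term].add(doc["id"])
    return index


docs = [
    {"id": "Andrew", "type": "Person", "familyId": "Liu"},
    {"id": "Ralph", "type": "Cat", "familyId": "Liu",
     "fur": {"length": "short", "color": "brown"}},
]
index = build_index(docs)

# No schema was declared, yet nested properties are searchable by path.
brown_furred = index[("/fur/color", "brown")]
```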
SQL API
Query Demo:
https://www.documentdb.com/sql/demo
SQL API
Example: SQL Parameterization

Example: LINQ
SQL API
Query Results are paginated:

ToList() automatically iterates through all pages:

SQL API

Cross-Partition Queries
Concurrency
Write Optimized Database Engine
Designed for sustained large write volume without any term locality
Lock free; threads never block
Log structured with large elastic writes
Blind incremental updates
Low write, read, and space amplification
Index updates must operate within frugal resource budgets
[Diagram labels: B-Tree, In memory Cache, Log Structured Store]

Optimistic Concurrency Control via etag property

Optimistic Concurrency Control

{
"id": "2c9cddbb-a011-4947-94c2-6f8ccf421d2e",
"_rid": "o8ExAJlS4xRIAAAAAAAAAA==",
"_self": "dbs/o8ExAA==/colls/o8ExAJlS4xQ=/docs/o8ExAJlS4xRIAAAAAAAAAA==/",
"_etag": "\"2e004542-0000-0000-0000-5af31e8c0000\"",
"_attachments": "attachments/",
"_ts": 1525882508
}
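
The `_etag` property shown above supports a compare-before-write pattern. The sketch below simulates it with an in-memory store. The `Store` class, its method names, and the use of random UUIDs as etags are assumptions for illustration and are not the SDK's API.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the caller's etag no longer matches the stored one."""


class Store:
    def __init__(self):
        self.docs = {}

    def upsert(self, doc):
        stored = dict(doc, _etag=str(uuid.uuid4()))   # every write gets a fresh etag
        self.docs[doc["id"]] = stored
        return dict(stored)

    def read(self, doc_id):
        return dict(self.docs[doc_id])

    def replace(self, doc, if_match):
        """Replace only if the stored etag still equals if_match."""
        if self.docs[doc["id"]]["_etag"] != if_match:
            raise PreconditionFailed(doc["id"])
        return self.upsert(doc)


store = Store()
store.upsert({"id": "player1", "score": 0})

a = store.read("player1")       # two clients read the same version
b = store.read("player1")

a["score"] = 10
store.replace(a, if_match=a["_etag"])       # first writer succeeds

b["score"] = 99
try:
    store.replace(b, if_match=b["_etag"])   # stale etag: rejected
    conflict = False
except PreconditionFailed:
    conflict = True
```

On a conflict the losing client would typically re-read the document, reapply its change, and try again.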
Transactions
JavaScript Language Integrated Transactions
ACID transactions over multiple records scoped to a partition key
Rich programming model via Stored Procedures
Exposed via JavaScript as a modern day T-SQL
Snapshot isolation at beginning of script invocation
Writes get atomically committed upon successful script invocation
System checks for e-tag violations at commit time to avoid conflict
Exceptions (via “throw” keyword) roll back the transaction
[Diagram: scripts run as compiled JavaScript from a context pool, issuing QUERY and REPLACE operations against the store inside one transaction, which is then sent to other replicas]

function(playerId1, playerId2) {
    // Find the two player documents within the partition.
    var playersToSwap = __.filter (function (document) {
        return (document.id == playerId1 || document.id == playerId2);
    });
    var player1 = playersToSwap[0], player2 = playersToSwap[1];

    // Swap the items held by the two players.
    var player1ItemTemp = player1.item;
    player1.item = player2.item;
    player2.item = player1ItemTemp;

    // Both replaces commit together; throwing aborts and rolls back.
    __.replace(player1)
        .then(function() { return __.replace (player2); })
        .fail(function(error){ throw 'Unable to update players, abort'; });
}

Stored Procedures Best Practices + Caveats
ACID transactions across multiple records in a distributed system involve tradeoffs:
• These are laws of physics – cannot be avoided
• Transactions across multiple machines require expensive coordination – common approach is 2 phase commit
• Providing isolation against read/write skew against other concurrent transactions also involves concurrency tradeoffs

Guidelines & Best Practices:

• Sprocs must be scoped to a partition key value
• Sprocs have bounded execution (5 second rule)
• CRUD methods expose an isAccepted API to help script detect when it is nearing execution boundary
• Long running transactions should be broken up into “chunks” with a continuation model
• Ex: return a Boolean indicating whether transaction is done, and include metadata (_ts watermark or pointer) to help resume business logic
• Sprocs are implemented via JS – use callback convention to serialize control flow and avoid queueing up too many async requests
• Tip: avoid deserializing string => object input unless required – this uses unnecessary CPU / RUs
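
The continuation model from the guidelines above can be sketched as a chunked loop with a time budget. This is a plain simulation rather than server-side script: `process_chunk`, the integer continuation pointer, and the injected fake clock are all hypothetical.

```python
import time


def process_chunk(records, continuation, budget_seconds, clock=time.monotonic):
    """Process records from `continuation` until the time budget runs out.

    Returns {"done": bool, "continuation": int} so the caller can resume.
    """
    deadline = clock() + budget_seconds
    i = continuation
    while i < len(records):
        if clock() >= deadline:     # nearing the execution boundary: stop and hand back a pointer
            return {"done": False, "continuation": i}
        records[i]["processed"] = True
        i += 1
    return {"done": True, "continuation": i}


def fake_clock():
    """Deterministic clock for the demo: advances one 'second' per call."""
    fake_clock.now += 1
    return fake_clock.now


fake_clock.now = 0

records = [{"id": n} for n in range(10)]
calls, state = 0, {"done": False, "continuation": 0}

# The caller keeps invoking the chunk until it reports completion.
while not state["done"]:
    state = process_chunk(records, state["continuation"],
                          budget_seconds=4, clock=fake_clock)
    calls += 1
```

Each invocation does a bounded amount of work, so no single call runs past its budget, and the returned pointer plays the role of the watermark mentioned above.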
Change Feed
Azure Cosmos DB Change Feed

Persistent log of records within an Azure Cosmos DB container in the order in which they were modified
Common Scenarios
Event Sourcing for Microservices
[Diagram: New Event; Persistent Event Store; Trigger Action From Change Feed; Microservice #1, Microservice #2, …, Microservice #N]
Materializing Views

Application writes to Cosmos DB:

Subscription    User    Create Date    …
123abc          Ben6    6/17/17
456efg          Ben6    3/14/17
789hij          Jen4    8/1/16
012klm          Joe3    3/4/17

Materialized View:

User    Total Subscriptions
Ben6    2
Jen4    1
Joe3    1
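
The materialized view above can be derived by folding changed records into per-user counts. This sketch uses a plain list in place of a real change feed; the field names and the `apply_changes` helper are hypothetical. It treats a repeated subscription id as an update rather than a new row.

```python
from collections import Counter

# Rows from the source table above, as they might arrive on the change feed.
changes = [
    {"subscription": "123abc", "user": "Ben6", "created": "6/17/17"},
    {"subscription": "456efg", "user": "Ben6", "created": "3/14/17"},
    {"subscription": "789hij", "user": "Jen4", "created": "8/1/16"},
    {"subscription": "012klm", "user": "Joe3", "created": "3/4/17"},
]


def apply_changes(view, batch, seen):
    """Fold a batch of changed records into the per-user subscription counts."""
    for record in batch:
        if record["subscription"] in seen:   # an update to a known record: count unchanged
            continue
        seen.add(record["subscription"])
        view[record["user"]] += 1
    return view


view, seen = Counter(), set()
apply_changes(view, changes[:2], seen)   # batches can arrive incrementally
apply_changes(view, changes[2:], seen)
apply_changes(view, changes[:1], seen)   # a re-delivered record is ignored
```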
Replicating Data
[Diagram: CRUD Data; Replicate Records; Secondary Datastore (e.g. archive)]
Working with Change Feed
Step 1: Retrieve a list of the partition key ranges
Step 2: Consume the Change Feed on each PartitionKeyRange
Change Feed Processor Library
Behind the Scenes

Working with Change Feed Processor Library
Step 1: Implement ProcessChangesAsync() on IChangeFeedObserver
Step 2: Register the IChangeFeedObserver with a ChangeFeedEventHost

A few more tips & tricks
Bulk Executor Library

Supports bulk import and update

Auto handles congestion control + transient errors

10x client-side performance improvement

Available for .NET and Java

Azure Cosmos DB + Apache Spark
1. Spark master node connects to the Cosmos DB gateway node
2. Metadata is returned to Spark master node
3. Query is executed from Spark worker nodes in parallel to the Cosmos DB data nodes
4. Query results are returned from the Cosmos DB data nodes to the Spark worker nodes.
[Diagram: Spark master node and worker nodes connected through the Spark-DocumentDB Connector (Java) to the Cosmos DB gateway node and data nodes, with arrows numbered 1 to 4]
Components of a Lambda Architecture
1. All data pushed into both batch and speed layer for processing
2. The batch layer has a master dataset (immutable, append-only set of raw data) and pre-computes the batch views
3. The serving layer has batch views so data is available for fast queries.
4. The speed layer compensates for processing time (to serving layer) and deals with recent data only.
5. All queries can be answered by merging results from batch views and real-time views.
[Diagram: new data (1) flows to the batch layer (master dataset, pre-compute, 2) and the speed layer (real-time views, 4); the serving layer holds batch views (3); a query (5) reads batch views and real-time views]

Source: http://lambda-architecture.net/
Lambda Architecture Simplified
1. All data pushed into both batch and speed layer for processing
2. The batch layer has a master dataset (immutable, append-only set of raw data) and pre-computes the batch views
3. The serving layer has batch views so data is available for fast queries.
4. The speed layer compensates for processing time (to serving layer) and deals with recent data only.
5. All queries can be answered by merging results from batch views and real-time views.
[Diagram labels: new data, CosmosDB collections, change feed, master dataset, pre-compute batch, computed batch, computed RT, query; arrows numbered 1 to 5]
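
Step 5 above, answering a query by merging batch and real-time views, can be sketched as a simple combination of two count tables. The hashtag keys and counts are invented sample data and `merge_views` is a hypothetical helper. The sketch assumes the real-time view covers only data that arrived after the last batch run.

```python
def merge_views(batch_view, realtime_view):
    """Combine pre-computed batch counts with counts from recent data."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged


batch_view = {"#azure": 120, "#cosmosdb": 45}     # hypothetical pre-computed counts
realtime_view = {"#cosmosdb": 3, "#spark": 1}     # hypothetical counts since the last batch run

answer = merge_views(batch_view, realtime_view)
```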
Troubleshooting Guide
Troubleshooting 429s – Rate Limited Calls

Master Resource Throttling?
• Repeated unnecessary calls to “Master” resources?
  - Read Collection
  - Read Offer
  - Read Database
  - Multiple Client instances
  → Re-use singleton instance of client and refactor redundant calls

High RU charges for Operations in the time window of rate limited calls?
• Queries?
  → Navigate to Troubleshooting Query Performance slide
• Ingestion Path? (Create/Replace/Upsert)
  → Navigate to Troubleshooting Ingestion Performance slide

High volume of requests (even though RU charges are low)?
• Skew in the data?
  - Is there an uneven distribution of data and/or requests across partition keys?
  - Is the cardinality of the partition key too low?
  → Refactor Data Model Design
  → Follow this guide for best practices on partitioning: https://docs.microsoft.com/en-us/azure/cosmos-db/partition-data
• Read-only Stored Procedures?
  - Sprocs are optimized for atomically and transactionally writing multiple records
  - Not ideal for read-only procedural logic
  → Re-write read-only logic as query/read operations
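
When a call is rate limited, a common client-side response is to wait for the interval the server suggests and then retry. The sketch below simulates that loop without any SDK: the `RateLimited` exception, its `retry_after_ms` field, and `call_with_retry` are hypothetical stand-ins for a 429 response and its retry hint.

```python
import time


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response; retry_after_ms mimics the server's hint."""

    def __init__(self, retry_after_ms):
        super().__init__("429 Too Many Requests")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation, max_retries=5, sleep=time.sleep):
    """Run operation(); on RateLimited, wait the suggested interval and try again."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except RateLimited as error:
            if attempt == max_retries:
                raise               # give up and surface the error to the caller
            sleep(error.retry_after_ms / 1000.0)


attempts = []


def flaky_read():
    """Simulated read that is rate limited twice and then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited(retry_after_ms=10)
    return {"id": "an id"}


waits = []
result = call_with_retry(flaky_read, sleep=waits.append)   # record waits instead of sleeping
```

Retrying only hides occasional spikes. Sustained 429s still call for the checks listed above.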
Troubleshooting Query Performance

High RU Charges for Query Operations?
• Queries on id?
  → Use read document over query
  → Contact support to preview upcoming indexing improvements
• Are these aggregate queries?
  → Materialize a view to offload some computation from read path to document write path
  → Run aggregations client side
  → Store results of aggregations in a second collection
  → Use ChangeFeed library to reduce latency for generating real-time views: https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed
• Sorting on a field with a large number of distinct values? e.g. timestamp?
  → Use read document over query
  → Contact support to preview upcoming indexing improvements

Ask for QueryMetrics: https://docs.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query-metrics#query-execution-metrics

High Query latencies?
• Are there a large number of cross partition queries? Are there a large number of physical partitions for the collection?
  → Control the degree of parallelism instead of setting MaxDegreeOfParallelism to -1
  → This will ensure x number of document partitions are executed against in parallel
• Mongo API?
  - Cross partition queries in Mongo are serial. These queries are expected to have high latencies.
  → Contact support for help re-tuning queries
Troubleshooting Ingestion Performance

High RU Charges for Ingestion Operations?
• Large documents?
  - Can the documents be split into multiple documents?
  - This can reduce the scope and RU cost of updates. While total RUs consumed for other operations will be approximately the same, records can partially succeed instead of rate limiting large chunks
• Default Indexing policy with a large number of fields?
  - Default indexing indexes on all fields by default. This might be unnecessary
  - If only a known subset of fields will be used as filters on queries, the indexing policy can be modified to only include those fields – this has empirically proven to show large improvements in throughput utilization
  - https://docs.microsoft.com/en-us/azure/cosmos-db/indexing-policies#index-paths
• Is the BulkExecutor being used?
  → Try BulkExecutor if not already using
General Performance Troubleshooting

General/High Level Troubleshooting
• What is the memory usage on the VM?
• Linux VMs?
  - Are you bottlenecked by max file size (i.e. number of open connections)?
  - i.e. nofiles
• What is the CPU utilization on the VM?
• How many threads are being executed in parallel?
  - Does the VM have sufficient cores?
• How many client instances are being created per process? Ideally, the number of instances should be limited to 1 per process.
Thank you and Q&A

Follow @AzureCosmosDB
cosmosdb.com #azure-cosmosdb
Use #CosmosDB

Data Fundamentals
No ratings yet
Data Fundamentals
37 pages
Azure Data Factory
100% (4)
Azure Data Factory
16 pages
Teradata
100% (2)
Teradata
971 pages
Azure Databricks Monitoring
100% (1)
Azure Databricks Monitoring
22 pages
Mastering Azure Synapse Analytics: Learn how to develop end-to-end analytics solutions with Azure Synapse Analytics (English Edition)
From Everand
Mastering Azure Synapse Analytics: Learn how to develop end-to-end analytics solutions with Azure Synapse Analytics (English Edition)
Debananda Ghosh
No ratings yet
PySpark+Slides v1
No ratings yet
PySpark+Slides v1
458 pages
Practice Questions for Snowflake Snowpro Core Certification Concept Based - Latest Edition 2023
From Everand
Practice Questions for Snowflake Snowpro Core Certification Concept Based - Latest Edition 2023
Exam OG
5/5 (1)
DP-203 StudyGuide ENU FY23Q2a Vnext
No ratings yet
DP-203 StudyGuide ENU FY23Q2a Vnext
13 pages
HP Elitebook 8460P INVENTEC 6050A2398501-MB-A02 CLASH DISCRETE Schematics
No ratings yet
HP Elitebook 8460P INVENTEC 6050A2398501-MB-A02 CLASH DISCRETE Schematics
69 pages
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
From Everand
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
Wei Liu
No ratings yet
Azure Total Cost of Ownership (TCO) Summary: Sample Report For Data Center Migration (Windows and Linux Servers)
No ratings yet
Azure Total Cost of Ownership (TCO) Summary: Sample Report For Data Center Migration (Windows and Linux Servers)
10 pages