Denys Chamberland
Senior BI Developer
© PAAS SQL Saturday 2018 – Denys Chamberland
Azure Cosmos DB + Gremlin API
Thanks to our sponsors
Agenda
• Brief Azure Cosmos DB Overview
• Welcome to the world of Graph API
• Introduction to Apache TinkerPop 3: the Gremlin Traversal Language
• T-SQL vs Gremlin
• Basic Gremlin topology
• How does it work in real context
• Demos – Visual Studio Code + Visual Studio 2017 + Graph Explorer (Core 2.0)
• Questions
What is Azure Cosmos DB?
A globally distributed, massively scalable, multi-model database service
Global Distribution
o Transparent and automatic multi-region replication
• Associate any number of regions with your database account, at any time
• Policy based geo-fencing
o Multi-homing APIs
• All endpoints are logical, by default
• Apps don’t need to be redeployed during regional failover
• Apps can also access physical endpoints if needed
o Support for both manual and automatic failover
o Designed for high availability
• Simulate regional disasters via API
• Allows for dynamically setting priorities to regions
• Test the end-to-end availability for the entire app (beyond just the database)
Elastic scale-out
• Partition management is automatically taken care of for you
• Independently scale storage and throughput across regions
• Scale storage from Gigabytes to Petabytes
• Scale throughput from 100s to 100,000,000s of requests/second
• Dial down throughput and provision only what is needed
What are Azure Cosmos DB RUs (Request Units)?
o Normalized number representing the amount of CPU/Memory/IO operations
o Reserved compute for processing operations
o Request Units are provisioned per second
o RUs (Request Units) are a rate-based currency
o You reserve in increments of 100
o Minimum of 400 for Fixed DB of 10 GB
o Minimum of 1000 for Unlimited databases
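The provisioning rules above can be sketched as a small helper. This is only an illustration in plain Python, not an Azure API: the function name `provisioned_rus` and the `unlimited` flag are hypothetical, and the increment and minimums are simply the figures quoted on this slide.

```python
import math

RU_INCREMENT = 100        # RUs are reserved in increments of 100 (from the slide)
MIN_RUS_FIXED = 400       # minimum for a Fixed 10 GB database (from the slide)
MIN_RUS_UNLIMITED = 1000  # minimum for Unlimited databases (from the slide)


def provisioned_rus(estimated_rus_per_second: float, unlimited: bool = False) -> int:
    """Round an estimated RU/s need up to a value that could be reserved.

    Hypothetical helper: it only encodes the increment and minimums quoted above.
    """
    if estimated_rus_per_second < 0:
        raise ValueError("estimated throughput must be non-negative")
    minimum = MIN_RUS_UNLIMITED if unlimited else MIN_RUS_FIXED
    # Round up to the next multiple of the reservation increment.
    rounded = math.ceil(estimated_rus_per_second / RU_INCREMENT) * RU_INCREMENT
    return max(minimum, rounded)
```

For instance, an estimate of 250 RU/s would be raised to the 400 minimum, and an estimate of 401 RU/s would be rounded up to 500.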
Calculating Cosmos DB Request Units (RU) for CRUD and Queries
https://www.documentdb.com/capacityplanner
https://blog.maximerouiller.com/post/calculating-cosmos-db-request-units-ru-for-crud-and-queries/
Guaranteed single-digit millisecond latency
o Reads and writes served from local regions
o Guaranteed millisecond latency worldwide
o Write optimized, latch-free database engine
o Automatically indexed SSD storage
• Synchronous and automatic indexing at sustained ingestion rates
Choice of 5 consistency levels
STRONG: Getting perfect data every time no matter how long it takes
The strong model favors data consistency above all else and preserves the order in which data is written.
It guarantees your app users will see all previous writes.
When you choose the strong model, you ask your app users to wait until all data writes have been fully written in the master
and made durably available. Your app users get an error message if their request comes before the data is ready.
The strong model is great if you need your app users to read the absolute truth every time.
o Banking accounts need to reflect the order of transactions and provide an accurate balance, so team members in different offices don’t pay the same bill twice.
o Payment processing for online orders needs to occur in the correct order, especially to avoid charging for the same order more than once.
o Reservation systems must show accurate availability when customers finalize their booking.
BOUNDED STALENESS: Fetching data that’s not “too old” to boost performance.
The bounded staleness model ensures relatively accurate data in a more reasonable time frame than the strong model.
When you choose the bounded staleness model, you are saying it’s okay for apps to fetch old data from local replicas provided it’s not more than x versions older than a primary or peer.
The bounded staleness model is great for apps that can afford a little lag time in favor of data consistency.
o Flight status apps provide flight arrival time estimations using GPS data collected from planes as they fly.
The GPS data doesn’t have to be the most up to date to provide a reasonable estimation.
It’s more important that the user gets information when they need it.
o Package tracking apps for a shipping company need to provide chronologically ordered checkpoints that show where and when a package is received.
SESSION: Putting the individual app user’s experience front and center
The session model prioritizes the user’s interaction by guaranteeing highly available and consistent data throughout
that particular session.
Session consistency provides predictable read-your-own-write consistency for a given session with maximum read throughput
while preserving low latency writes and reads. Consistency within a given session is strong, while consistency outside the given
session is eventual.
The session model is great for apps that require logical and real-time experiences for the user.
o Profile updates your user writes to her account must be immediately available for her to read,
whereas it’s less important for her to read profile updates other users are writing simultaneously.
o Social music apps such as Spotify need to be consistent with users’ playlist preferences as they are building them,
but the preferences don’t have to show up right away for everyone else who is “following.”
CONSISTENT PREFIX: Preserving the order of data writes without too much concern for how old it is
The consistent prefix model favors performance and availability without sacrificing the sequence of events, by fetching old data fast.
When you choose the consistent prefix model, you’re saying it’s okay to give your app users old data as long as the data read observes the actual sequence of writes. This differs from the eventual model in that it reflects the order of writes as they occurred.
As a result, the cost of read operations (in terms of system resources) is lower than with Session, Bounded Staleness and Strong.
o Baseball score updates running at the bottom of ESPN must appear in the order that they occurred during the game, at the expense of being up-to-the-minute accurate.
o Social media comments must be ordered to preserve the back-and-forth nature of dialogue and make sense to people reading them, but the reads do not need to be fully up-to-date.
EVENTUAL: Getting whatever you can, whenever you can, as fast as you can
The eventual model favors app performance above data consistency or write order.
When you choose the eventual model, you’re saying it doesn’t matter in what order data is read as long as something is available.
Data fetched under the eventual model has the lowest latency for both reads and writes, but it also gets the weakest consistency.
The eventual model is great for apps that live and die according to their availability.
o Product reviews have to be available for customers to read when they want them, but it’s not crucial that the reviews always include the latest ratings or preserve the order of the ratings.
o Social media wall posts (not the comments to a post, but the initial post itself) just need to show up eventually.
Users care more about seeing activity when they’re on the site than they care about seeing the order of the activity.
It’s okay if, later on, the posts reorder or repopulate in their feed as long as there’s something new to see now.
o Transaction receipts don’t necessarily need to be available immediately after purchase, as long as they show up within a reasonable window of time.
Enterprise level SLAs
o Only service with financially backed SLAs for millisecond latency at the 99th percentile, 99.99% HA (High Availability) and guaranteed throughput and consistency
Multi-model + multi API
o Database engine operates on atom-record-sequence (ARS) based type system
o All data models are efficiently translated to ARS
o API and wire protocols are supported via extensible modules
o Instance of a given data model can be materialized as trees
o Graph, documents, key-value, column-family, … more to come
What is a Graph Database?
A graph database is a database that models data using nodes, edges and properties to efficiently model the relationships between objects
• Vertices (nodes) represent entities/domain objects
• Edges define a directional relationship between 2 vertices
• Both vertices and edges have labels and properties
Example from the slide diagram:
Vertices
o label: person, name: Scott
o label: company, name: Microsoft
o label: person, name: Satya
Edges
o label: employs, since: 2005
o label: employs, since: 2001
o label: follows
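As a rough sketch of this model, the example vertices and edges can be written down as plain Python objects. The class and function names here are hypothetical, and the transcript does not say which vertices each edge connects, so the endpoints below (and which "since" value belongs to whom) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: str  # id of the vertex the edge leaves
    in_v: str   # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


vertices = {
    "scott": Vertex("scott", "person", {"name": "Scott"}),
    "satya": Vertex("satya", "person", {"name": "Satya"}),
    "microsoft": Vertex("microsoft", "company", {"name": "Microsoft"}),
}

# ASSUMPTION: the edge endpoints are guessed; the slide only gives labels and properties.
edges = [
    Edge("employs", "microsoft", "scott", {"since": 2005}),
    Edge("employs", "microsoft", "satya", {"since": 2001}),
    Edge("follows", "scott", "satya"),
]


def out_neighbours(vertex_id: str, edge_label: str) -> list:
    """Names of the vertices reached by following outgoing edges with the given label."""
    return [
        vertices[e.in_v].properties["name"]
        for e in edges
        if e.out_v == vertex_id and e.label == edge_label
    ]
```

Labels group elements by kind, while properties carry the per-element data. With the assumed endpoints, `out_neighbours("microsoft", "employs")` returns `["Scott", "Satya"]`.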
Common uses for a graph database
• Social Networks
• Recommender Systems
• Logistics, e.g. Flights
• IoT (Internet of Things)
• Fraud Network Detection
• many other scenarios…
How do I know if I need a graph?
If you can easily answer Yes to one or more questions about your data,
you might be a candidate for using a graph:
o Is your domain a natural fit for a graph (e.g. storing IT dependencies, network management, relationships between entities, etc.)?
o Are you dealing with scenarios where the connections/relationships between entities are more prominent than the entities themselves?
o Is the structure of your data continuously evolving?
o Are you dealing with an overload of multiple UNIONs of data or a variable number of JOINs in your SQL queries?
o Do you have recursive CTEs?
Take some tables…
…and add a few relations…
Which blogs did friends of my friends like the most?
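To make the question concrete, here is a minimal sketch of the same question answered by walking relationships directly instead of joining tables. The data, the names and the function `blogs_liked_by_friends_of_friends` are all hypothetical, and this is plain Python, not Gremlin.

```python
from collections import Counter

# Hypothetical sample data: who is friends with whom, and who liked which blog.
friends = {
    "me": {"ann", "bob"},
    "ann": {"me", "carl"},
    "bob": {"me", "dina"},
    "carl": {"ann"},
    "dina": {"bob"},
}
likes = {
    "carl": ["graph-blog", "sql-blog"],
    "dina": ["graph-blog"],
    "ann": ["cooking-blog"],
}


def blogs_liked_by_friends_of_friends(person: str) -> list:
    """Return (blog, like count) pairs for friends of friends, most liked first."""
    direct = friends.get(person, set())
    # Hop 1 -> hop 2: collect the friends of every direct friend.
    fof = set()
    for friend in direct:
        fof |= friends.get(friend, set())
    # Exclude the person and their direct friends.
    fof -= direct | {person}
    counts = Counter(blog for f in sorted(fof) for blog in likes.get(f, []))
    return counts.most_common()
```

With this sample data, `blogs_liked_by_friends_of_friends("me")` returns `[("graph-blog", 2), ("sql-blog", 1)]`. Each hop is a direct lookup of a vertex's neighbours, which is the access pattern a graph database is built around.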
Azure Cosmos DB - Graph API
Similar to computing in general, graph computing makes a distinction between
structure (graph) and process (traversal)
o The structure of the graph is the data model defined by a vertex/edge/property topology
o The process of the graph is the means by which the structure is analyzed
o The typical form of graph processing is called a traversal, written here in the Gremlin language
Structure (Graph) + Process (Traversal)
Apache TinkerPop is a graph computing framework and top-level project hosted by the Apache Software Foundation.
http://tinkerpop.apache.org
The project includes the following components:
o Gremlin
A graph traversal language
o Gremlin Console
An interactive shell for working with local or remote graphs
http://tinkerpop.apache.org/docs/console
o Gremlin Server
Allows hosting of graphs remotely via an HTTP/WebSocket connection.
http://tinkerpop.apache.org/docs/server
Steps
Gremlin allows users to write complex queries to traverse their graphs by using a composed sequence of steps, with each step performing an operation on the data stream.
step: a generic, general purpose computational step
o transform: take an object and emit a transformation of it.
o filter: decide whether to allow an object to pass or not.
o sideEffect: pass the object, but yield some side effect.
o branch: decide which step to take.
http://tinkerpop.apache.org/docs/current/tutorials/getting-started/
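The four kinds of step can be imitated with plain Python generators. This is only an analogy for how a stream of objects flows through composed steps; it is not the TinkerPop API, and every function name below is hypothetical.

```python
def transform(stream, fn):
    """transform: take each object and emit a transformation of it."""
    for item in stream:
        yield fn(item)


def filter_step(stream, predicate):
    """filter: decide whether to allow each object to pass or not."""
    for item in stream:
        if predicate(item):
            yield item


def side_effect(stream, action):
    """sideEffect: pass each object through unchanged, but run an action on it."""
    for item in stream:
        action(item)
        yield item


def branch(stream, chooser, options):
    """branch: choose, per object, which follow-up function to apply."""
    for item in stream:
        yield options[chooser(item)](item)


ages = [55, 44, 39, 28, 25]  # sample input values
seen = []                    # filled by the side-effect step

pipeline = branch(
    side_effect(
        filter_step(transform(ages, lambda a: a + 1), lambda a: a > 30),
        seen.append,
    ),
    lambda a: "senior" if a >= 50 else "other",
    {"senior": lambda a: f"{a}: senior", "other": lambda a: f"{a}: other"},
)
result = list(pipeline)
```

Here `result` is `["56: senior", "45: other", "40: other"]` and `seen` is `[56, 45, 40]`. Nothing runs until the final `list(...)` pulls objects through the chain, which is similar in spirit to how a traversal is evaluated lazily.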
Coming from the relational table world?
No problem…
This tool may help you get started:
http://sql2gremlin.com/
Demo 1 – Gremlin topology 101
Case to reproduce:
o Building a simple basic social network
o Create Vertex and Edge data from a basic template skeleton
o View our data using basic Gremlin traversal query topology
Azure Cosmos DB + Gremlin API in Action
Getting started with Azure Cosmos DB: Graph API
https://github.com/Azure-Samples/azure-cosmos-db-graph-gremlindotnet-getting-started
[Slide animation: the demo graph is built up one element at a time]
Vertices: Pete, Anna, Paul, Windie, Justin, Laptop, Mobile, Windows, Android, Martial Arts, Photography, Yoga, Pilates
Edge labels shown: uses, runsOn
Vertex-Centric Index Traversal
Global Traversal
:> g.V().values('firstName','lastName','age')
Pete, Zaria, 55
Paul, Hemmick, 44
Anna, Stazik, 39
Windie, Weather, 28
Justin, Case, 25

:> g.V().hasLabel('person')
     .has('age',gt(30))
     .values('firstName', 'lastName', 'age')
Pete, Zaria, 55
Paul, Hemmick, 44
Anna, Stazik, 39

:> g.V().hasLabel('person')
     .has('age',lt(30))
     .values('firstName', 'lastName', 'age')
Windie, Weather, 28
Justin, Case, 25

:> g.V('martialarts').inE()
     .outV()
     .values('firstName','lastName')
Pete, Zaria
Anna, Stazik

:> g.V('photography').inE()
     .outV()
     .values('firstName','lastName')
Pete, Zaria
Paul, Hemmick
Justin, Case

:> g.V().where(out('knows')
     .has('id', 'pete'))
     .where(out('skills')
     .has('id', 'photography'))
     .where(out('uses').has('id', 'mobile'))
     .values('firstName','lastName')
Justin, Case

:> g.V().where(out('knows')
     .has('id', 'pete'))
     .where(out('skills')
     .has('id', 'photography'))
     .where(out('uses').has('id', 'laptop'))
     .values('firstName','lastName')
Paul, Hemmick
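To show what the chained where(out(...)) filters are doing, here is a small in-memory imitation in plain Python. It is not Gremlin and not the demo's code: the helper names are hypothetical, and the edge list is an assumption, namely the smallest set of edges consistent with the query results shown above (the real demo graph presumably contains more).

```python
# People and their properties, as printed by the first query above.
people = {
    "pete": ("Pete", "Zaria", 55),
    "paul": ("Paul", "Hemmick", 44),
    "anna": ("Anna", "Stazik", 39),
    "windie": ("Windie", "Weather", 28),
    "justin": ("Justin", "Case", 25),
}

# ASSUMPTION: (out vertex id, edge label, in vertex id), inferred from the results above.
edges = [
    ("paul", "knows", "pete"),
    ("justin", "knows", "pete"),
    ("pete", "skills", "martialarts"),
    ("anna", "skills", "martialarts"),
    ("pete", "skills", "photography"),
    ("paul", "skills", "photography"),
    ("justin", "skills", "photography"),
    ("paul", "uses", "laptop"),
    ("justin", "uses", "mobile"),
]


def out(vertex_id: str, label: str) -> set:
    """Ids reached from vertex_id over outgoing edges with the given label."""
    return {dst for src, lbl, dst in edges if src == vertex_id and lbl == label}


def older_than(age: int) -> list:
    """Rough analogue of has('age', gt(age)) over the person vertices."""
    return [p[:2] for p in people.values() if p[2] > age]


def who(knows: str, skill: str, uses: str) -> list:
    """Rough analogue of the three chained where(out(...).has('id', ...)) filters."""
    return [
        people[p][:2]
        for p in people
        if knows in out(p, "knows")
        and skill in out(p, "skills")
        and uses in out(p, "uses")
    ]
```

Each `where(...)` keeps a vertex only if the nested traversal finds at least one match, so the three conditions combine as a logical AND. With the assumed edges, `who("pete", "photography", "mobile")` returns `[("Justin", "Case")]` and `who("pete", "photography", "laptop")` returns `[("Paul", "Hemmick")]`.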
Demo 2: cosmosdb-gremlin-flights-core2
Case to reproduce:
This is a repro of the original Cosmos-gremlin-flights by Anthony Chu.
The original version was built using .NET Framework 4.5.2.
o The challenge was to see whether a similar application could be rebuilt from scratch using the .NET Core 2.0 dotnet CLI in Visual Studio Code.
o Formatting issues were found, notably with Microsoft.Azure.Graphs, and reported as a bug. The issue was worked around by rebuilding the application using the Gremlin.Net library.
https://github.com/Tameshiwari/cosmosdb-gremlin-flights-core2
1. CosmosDBGremlinFlights.Console: dotnet CLI console app that generates Vertex (Airports) and Edge (Flights) data through CSV streaming.
2. CosmosDBGremlinFlights.Web: dotnet CLI MVC Core 2.0 client app with Bing Map chart for tracing flight routes.
[Azure portal screenshots] Names shown: dcvizflights, dcvizflightsrg
This URI refers to Microsoft.Azure.Graphs!
No released version ever got beyond the -preview stage.
The Microsoft Azure Cosmos DB team does NOT intend to invest energy in further Microsoft.Azure.Graphs versions, and strongly recommends using Gremlin.Net instead from now on.
With the Gremlin.Net library, use a hostname of this form instead:
private static string hostname = "dcvizflights.gremlin.cosmosdb.azure.com";
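As a hedged sketch of how that hostname is typically used, the helpers below assemble the connection values a Gremlin client generally needs for Cosmos DB: a secure WebSocket endpoint on port 443 and a username of the form /dbs/<database>/colls/<graph>. The function names are hypothetical, the database and graph names used in the note afterwards are the ones that appear in this deck, and this is plain string handling in Python rather than the demo's C# code.

```python
def gremlin_hostname(account: str) -> str:
    """Hostname pattern shown on the slide for the Gremlin.Net library."""
    return f"{account}.gremlin.cosmosdb.azure.com"


def gremlin_endpoint(account: str, port: int = 443) -> str:
    """Secure WebSocket endpoint a Gremlin client would connect to."""
    return f"wss://{gremlin_hostname(account)}:{port}/"


def gremlin_username(database: str, graph: str) -> str:
    """Resource path used as the username when authenticating with the account key."""
    return f"/dbs/{database}/colls/{graph}"
```

For the demo account, `gremlin_endpoint("dcvizflights")` gives `wss://dcvizflights.gremlin.cosmosdb.azure.com:443/` and `gremlin_username("flightsdb", "flights")` gives `/dbs/flightsdb/colls/flights`. The account key would be supplied as the password and should never be hard-coded.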
[Azure portal screenshots] Values shown: flightsdb, flights, 400
dotnet new console
dotnet add package CsvHelper --version 7.1.0
dotnet add package GeoCoordinate.NetCore --version 1.0.0.1
dotnet add package Polly --version 6.0.1
dotnet add package Gremlin.Net --version 3.3.3
IATA Code (e.g. “YUL”)
Official Airport Name (e.g. “Montreal Pierre Elliott Trudeau International Airport”)
Coordinate (e.g. latitude, longitude)
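A minimal sketch of how such airport records could be streamed from CSV and turned into Gremlin addV statements follows. It is written in Python for brevity and is not the demo's C# code: the column names, the property names, the vertex label 'airport' and the helper functions are assumptions, and the latitude in the sample row is an illustrative value. Backslash escaping of quotes is also assumed to be accepted by the target Gremlin endpoint.

```python
import csv
import io

# Hypothetical sample input; only the IATA code and the longitude come from the deck.
SAMPLE_CSV = """iata,name,latitude,longitude
YUL,Montreal Pierre Elliott Trudeau International Airport,45.4706001282,-73.7407989502
"""


def escape(value: str) -> str:
    """Escape backslashes and single quotes for use inside a quoted Gremlin string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def airport_to_gremlin(row: dict) -> str:
    """Build one addV statement for an airport row (assumed label and property names)."""
    lat = float(row["latitude"])
    lon = float(row["longitude"])
    # Reject out-of-range coordinates early rather than storing bad data.
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError(f"coordinate out of range for {row['iata']}")
    return (
        "g.addV('airport')"
        f".property('id', '{escape(row['iata'])}')"
        f".property('name', '{escape(row['name'])}')"
        f".property('latitude', {lat!r})"
        f".property('longitude', {lon!r})"
    )


def statements(csv_text: str) -> list:
    """Read CSV text row by row and return one Gremlin statement per airport."""
    return [airport_to_gremlin(row) for row in csv.DictReader(io.StringIO(csv_text))]
```

Formatting the coordinates with `repr` keeps the sign and full precision of the parsed value, which matters for western longitudes such as the one for YUL.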
hostname is for Gremlin.Net NOT Microsoft.Azure.Graphs!
dotnet run
Gremlin.Net: -73.7407989502
Microsoft.Azure.Graphs 0.3.1-preview: 73.7407989502
The minus sign (-) is missing with Microsoft.Azure.Graphs!
ASP.NET Core 2 MVC client application with Bing Map
> dotnet new mvc
[Map screenshots] Airport pairs shown: YUL / CDG, LAX / YUL
Thank you all for attending
Questions?
Reference links
https://github.com/Tameshiwari/cosmosdb-gremlin-flights-core2
https://docs.microsoft.com/en-us/azure/cosmos-db/introduction
https://docs.microsoft.com/en-gb/blog/a-technical-overview-of-azure-comos-db
https://docs.microsoft.com/azure/en-us/azure/cosmos-db/request-units
https://docs.microsoft.com/capacityplanner
http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html
https://tinkerpop.apache.org/gremlin.html
https://github.com/Azure-Samples/azure-cosmos-db-graph-gremlindotnet-getting-started
http://sql2gremlin.com/