Introduction To Big Data and NoSQL
Meet Don
Advisory Solutions Architect EMC Consulting
Application Architecture, Development & Design
Monoculture
Limited CPU cycles
Limited disk space
Limited memory
Limited OS Development
Limited Software Programmers
Mono-lingual
Mono-persistence
[Diagrams: per-customer stacks of Browser, Web Tier, and B/L Tier for Customer #2 and Customer #3, with Database tiers]
Polyglot Persistence
Polyglot Programmer
2009: Eric Evans and Johan Oskarsson (Last.fm) wanted to organize an event to discuss open-source distributed databases
Atlanta 2009
No:sql(east) conference
select fun, profit from real_world where relational=false
Document
Key Value
Graph
Column Family
Document Store
Apache Jackrabbit
CouchDB
MongoDB
SimpleDB
XML Databases
MarkLogic Server
eXist
Document?
Okay think of a web page...
Relational model requires a column per tag
Lots of empty columns
Wasted space
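To make the wasted-space point concrete, here is a minimal sketch in plain JavaScript. The two page objects and their field names are invented for illustration and do not come from any particular product.

```javascript
// Two hypothetical "web page" documents; each carries only the fields it needs.
const pages = [
  { title: "Home", body: "Welcome", keywords: ["intro"] },
  { title: "Contact", body: "Call us", phone: "555-0100", mapUrl: "/map" },
];

// A relational table needs one column for every field any page might use.
const columns = [...new Set(pages.flatMap((p) => Object.keys(p)))];

// Flatten each document into a fixed-width row; absent fields become null.
const rows = pages.map((p) => columns.map((c) => (c in p ? p[c] : null)));

// Count the empty cells the relational layout has to carry.
const emptyCells = rows.flat().filter((v) => v === null).length;

console.log(columns);
console.log(emptyCells);
```

The documents stay as small as their content, while the table grows a column for every field that appears anywhere.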
Graph Storage
AllegroGraph
Core Data
Neo4j
DEX
FlockDB
Microsoft Trinity (research project)
http://research.microsoft.com/en-us/projects/trinity/
What's a graph?
A graph consists of
Nodes (stations of the graph)
Edges (lines between them)
FlockDB
Created by the Twitter folks
Nodes = Users
Edges = Nature of the relationship between nodes
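The node and edge vocabulary above can be sketched in a few lines of JavaScript. This is an illustration only, not FlockDB's API; the user names and the "follows" and "blocks" edge types are assumptions.

```javascript
// Hypothetical users (nodes) and typed relationships (edges).
const nodes = new Set(["ann", "bob", "cat"]);
const edges = [
  { from: "bob", to: "ann", type: "follows" },
  { from: "cat", to: "ann", type: "follows" },
  { from: "ann", to: "bob", type: "blocks" },
];

// Sanity check: every edge must connect two known nodes.
const wellFormed = edges.every((e) => nodes.has(e.from) && nodes.has(e.to));

// Return every node that points at `user` with an edge of the given type.
function sourcesOf(user, type) {
  return edges
    .filter((e) => e.to === user && e.type === type)
    .map((e) => e.from);
}

const followersOfAnn = sourcesOf("ann", "follows");
console.log(wellFormed, followersOfAnn);
```

The edge type carries the "nature of the relationship", so one edge list can answer several different questions about the same users.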
Key/Value Stores
On disk
Cache in RAM
Eventually Consistent
Weak Definition
If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent
Strong Definition
For a given update and a given replica, eventually either the update reaches the replica or the replica retires
Ordered
Distributed Hash Table allows lexicographical processing
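The "Ordered" point can be illustrated with a toy store. This is a sketch under simplifying assumptions: the class name and the key naming scheme are hypothetical, and the store sorts on read, whereas a real ordered store would keep its keys sorted on disk.

```javascript
// A toy key/value store that supports lexicographic range scans.
// For simplicity it sorts on every scan instead of maintaining a sorted index.
class OrderedStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  // Return [key, value] pairs with start <= key < end, in key order.
  range(start, end) {
    return [...this.data.entries()]
      .filter(([k]) => k >= start && k < end)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  }
}

const store = new OrderedStore();
store.put("user:bob", 2);
store.put("order:17", 99);
store.put("user:ann", 1);

// ";" is the character right after ":", so this range covers every "user:" key.
const hits = store.range("user:", "user;");
console.log(hits);
```

A plain hash table can only answer "give me this exact key"; ordering is what makes prefix and range queries possible.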
Key/Value Examples
Azure AppFabric Cache
Memcached
VMware vFabric GemFire
Object Databases
db4o
GemStone/S
InterSystems Caché
Objectivity/DB
ZODB
Tabular
Bigtable
Mnesia
HBase
Hypertable
Big Data
EMC Atmos
Amazon S3
SQL Azure (with Federations support)
Delivering a tweet requires rapid paging of followers
Heavy write load as followers are added and removed
Set arithmetic for @mentions (intersection of users)
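The set arithmetic for @mentions might look like the sketch below. The follower data is made up, and the delivery rule (show the tweet only to users who follow both the author and the mentioned user) is an assumption made for this example.

```javascript
// Hypothetical follower sets, keyed by the user being followed.
const followers = {
  ann: new Set(["cat", "dan", "eve"]),
  bob: new Set(["dan", "eve", "fay"]),
};

// Intersection of two sets: members present in both.
function intersect(a, b) {
  return new Set([...a].filter((x) => b.has(x)));
}

// Assumed rule: a tweet from ann that mentions bob is shown
// only to users who follow both of them.
const audience = intersect(followers.ann, followers.bob);
console.log([...audience]);
```

At Twitter's scale the sets are far too large to intersect in application memory like this, which is part of why the storage layer has to support the operation.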
And commutative
Changing the order of operands doesn't change the result.
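A short sketch of why commutativity matters for replicated updates. The surrounding slides are not preserved, so the update type used here (adding followers to a set) is an assumption.

```javascript
// Apply a list of "add follower" updates to a starting set, in the order given.
function applyAll(start, updates) {
  const state = new Set(start);
  for (const follower of updates) state.add(follower);
  return state;
}

// Compare two sets by membership, ignoring insertion order.
function sameMembers(a, b) {
  return a.size === b.size && [...a].every((x) => b.has(x));
}

const forward = applyAll(["ann"], ["bob", "cat"]);
const reversed = applyAll(["ann"], ["cat", "bob"]);
const orderIrrelevant = sameMembers(forward, reversed);

// Contrast: subtraction is not commutative, so operand order matters.
const subtractionCommutes = 5 - 3 === 3 - 5;

console.log(orderIrrelevant, subtractionCommutes); // true false
```

If updates commute, replicas can receive them in different orders and still end up in the same state.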
ACID
Atomicity
All or Nothing
Consistency
Valid according to all defined rules
Isolation
No transaction should be able to interfere with another transaction
Durability
Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors
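The first two properties can be sketched with an in-memory transfer. This only illustrates atomicity and a consistency rule; isolation and durability need a real database engine. The account names, the balances, and the "no negative balance" rule are assumptions.

```javascript
// In-memory accounts; names and balances are made up for this sketch.
const accounts = { ann: 100, bob: 50 };

// Move `amount` between accounts as one all-or-nothing unit.
// Work happens on a copy that is only returned if every rule holds.
function transfer(db, from, to, amount) {
  const draft = { ...db };
  draft[from] -= amount;
  draft[to] += amount;
  // Consistency rule assumed here: no balance may go negative.
  if (Object.values(draft).some((balance) => balance < 0)) {
    return { committed: false, db }; // roll back: original state untouched
  }
  return { committed: true, db: draft };
}

const ok = transfer(accounts, "ann", "bob", 30);
const rejected = transfer(ok.db, "bob", "ann", 500);
console.log(ok, rejected);
```

The rejected transfer leaves both balances exactly as they were: there is no state in which only the debit or only the credit happened.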
BASE
Basically Available
High availability but not always consistent
Soft state
Background cleanup mechanism
Eventual consistency
Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent.
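Eventual consistency can be simulated with three in-memory replicas. This is a sketch, not any product's replication protocol; the "highest version wins" rule and the single background pass are assumptions.

```javascript
// Three replicas of one key; each stores { value, version }.
const replicas = [
  { value: "v1", version: 1 },
  { value: "v1", version: 1 },
  { value: "v1", version: 1 },
];

// A write lands on a single replica first (soft state: the others are stale).
function write(replicaIndex, value) {
  const r = replicas[replicaIndex];
  replicas[replicaIndex] = { value, version: r.version + 1 };
}

// One background pass: every replica adopts the highest version present.
function antiEntropy() {
  const newest = replicas.reduce((a, b) => (b.version > a.version ? b : a));
  for (let i = 0; i < replicas.length; i++) replicas[i] = { ...newest };
}

const allEqual = () => replicas.every((r) => r.value === replicas[0].value);

write(0, "v2");
const consistentBefore = allEqual(); // replicas disagree right after the write
antiEntropy();
const consistentAfter = allEqual(); // no new writes arrived, so they converge
console.log(consistentBefore, consistentAfter);
```

Between the write and the background pass, a reader may see either value depending on which replica it asks. That window is the price of staying available.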
Extract
Transform
Load
Data Warehouse
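The three ETL steps can be sketched as three functions. The source rows, field names, and validation rule are invented for illustration, and an array stands in for the data warehouse.

```javascript
// Extract: raw rows as they might arrive from a source system (made-up data).
function extract() {
  return [
    { id: "1", amount: "19.99", country: "us" },
    { id: "2", amount: "n/a", country: "de" },
    { id: "3", amount: "5.00", country: "US" },
  ];
}

// Transform: parse types, normalise country codes, drop rows that fail validation.
function transform(rows) {
  return rows
    .map((r) => ({
      id: Number(r.id),
      amount: Number.parseFloat(r.amount),
      country: r.country.toUpperCase(),
    }))
    .filter((r) => Number.isFinite(r.amount));
}

// Load: append the cleaned rows to the warehouse (an array stands in for it).
function load(warehouse, rows) {
  warehouse.push(...rows);
  return warehouse;
}

const warehouse = load([], transform(extract()));
console.log(warehouse);
```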
MongoDB Example
// map function: emit one { count: 1 } per tag on each document
m = function () {
    this.tags.forEach(function (z) {
        emit(z, { count: 1 });
    });
};

// reduce function: sum the counts emitted for one tag
r = function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return { count: total };
};
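To see what the map and reduce functions above produce, they can be exercised outside MongoDB with a small stand-in for emit. This is a sketch: the sample documents are made up, and the grouping code only imitates what the server does. In the mongo shell the two functions would instead be passed to the collection's mapReduce helper.

```javascript
// Sample documents with a `tags` array (made-up data).
const docs = [
  { title: "a", tags: ["nosql", "mongodb"] },
  { title: "b", tags: ["nosql"] },
];

// Stand-in for MongoDB's emit(): group emitted values by key.
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

// The slide's map function: one { count: 1 } per tag.
const m = function () {
  this.tags.forEach(function (z) {
    emit(z, { count: 1 });
  });
};

// The slide's reduce function: sum the counts for one key.
const r = function (key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) total += values[i].count;
  return { count: total };
};

// Map phase: call m with `this` bound to each document.
docs.forEach((doc) => m.call(doc));

// Reduce phase: collapse each key's values to a single result.
const results = {};
for (const [key, values] of groups) results[key] = r(key, values);

console.log(results);
```

Note that reduce returns the same shape that map emits. That lets the server call reduce repeatedly on partial results.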
MongoDB Demo
Hadoop on Azure
https://www.hadooponazure.com/
[Diagram: a Client surrounded by four Data nodes]
[Diagram: Web Roles, a Queue carrying requests (Req), and Worker Roles]
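A minimal sketch of the web role, queue, and worker role arrangement. It assumes the usual direction of flow (web roles enqueue requests, worker roles dequeue them); it does not use the Azure SDK, and the role names and request strings are hypothetical.

```javascript
// A plain array stands in for the queue between the tiers.
const queue = [];
const processed = [];

// Web role: accept a request and enqueue it instead of doing the work inline.
function webRole(name, request) {
  queue.push({ acceptedBy: name, request });
}

// Worker role: take the oldest message, if any, and process it.
function workerRole(name) {
  const message = queue.shift();
  if (message === undefined) return false; // nothing to do
  processed.push({ ...message, handledBy: name });
  return true;
}

webRole("web-1", "resize image 7");
webRole("web-2", "send email 9");
webRole("web-1", "resize image 8");

// Two workers take turns until the queue is empty.
const workers = ["worker-1", "worker-2"];
let turn = 0;
while (workerRole(workers[turn % workers.length])) turn++;

console.log(processed);
```

Because the tiers only share the queue, web roles and worker roles can be scaled out independently.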
Aggregate Stores
Visualizing Aggregates
[Diagram: order aggregate (ID: 1001, Customer: Ann, Line Items, Payment Details) shown alongside the tables Orders, Customers, Order Lines, and Credit Cards]
Visualizing Aggregates
[Diagram: order aggregate (ID: 1001, Customer: Ann, Line Items, Payment Details)]
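As a sketch, the order aggregate might be held as one nested object. Only the ID and customer name come from the slide; the products, quantities, prices, and payment fields are invented for illustration.

```javascript
// The whole order as a single aggregate: line items and payment details
// live inside it rather than in separate tables.
const order = {
  id: 1001,
  customer: "Ann",
  lineItems: [
    { product: "widget", quantity: 2, price: 10 },
    { product: "gadget", quantity: 1, price: 25 },
    { product: "gizmo", quantity: 1, price: 5 },
  ],
  paymentDetails: { cardType: "example-card", lastFour: "0000" },
};

// Everything needed to total the order is inside the aggregate: no joins.
const total = order.lineItems.reduce(
  (sum, item) => sum + item.quantity * item.price,
  0
);

// An aggregate store would save and fetch this whole object under one key;
// a Map of JSON strings stands in for it here.
const store = new Map();
store.set(order.id, JSON.stringify(order));
const loaded = JSON.parse(store.get(1001));

console.log(total, loaded.customer);
```

The aggregate is the unit of storage and retrieval, which is what lets these stores spread data across machines without cross-node joins.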
Next Steps
Learn a NoSQL product
Great places to start: AppFabric Cache, Azure Table Storage, MongoDB
THANK YOU