
NoSQL

NoSQL Databases CouchDB

By Tom Sausner

Agenda
Introduction
Review of NoSQL storage options
CAP Theorem
Review categories of storage options

CouchDB
Overview
Interacting with data
Examples

Technologies applying Couch DB

What does it mean?


Not Only SQL or NO! SQL
A more general definition: a datastore
that does not follow the relational model,
including the use of SQL to interact with
the data.
Why?
One size does not fit all
Relational Model has scaling issues
Freedom from the tyranny of the DBA?

CAP Theorem
Conjectured by Eric Brewer of U.C. Berkeley;
proved by Seth Gilbert and Nancy Lynch of MIT
Relates to distributed systems
Consistency, Availability, Partition
Tolerance: pick 2
A distributed system is built of
nodes (computers), which can
(attempt to) send messages to each
other over a network.

Consistency
is equivalent to requiring requests
of the distributed shared memory to
act as if they were executing on a
single node, responding to operations
one at a time.
Not the same as ACID

Linearizability ~ operations behave
as if there were no concurrency.
Does not mention transactions

Available
Every request received by a non-failing node in the system must
result in a response.
Says nothing about the content of the
response. It could be anything; it
need not be successful or
correct.

Partition Tolerant
any guarantee of consistency or
availability is still guaranteed even if
there is a partition.
if a system is not partition-tolerant,
that means that if the network can
lose messages or any nodes can fail,
then any guarantee of atomicity or
consistency is voided.

Implications of CAP
How to best scale your application? The world
falls broadly into two ideological camps: the
database crowd and the non-database crowd.
The database crowd, unsurprisingly, like
database technology and will tend to address
scale by talking of things like optimistic
locking and sharding
The non-database crowd will tend to address
scale by managing data outside of the
database environment (avoiding the
relational world) for as long as possible.

Types of NoSQL datastores

Key-value stores
Column stores
Document stores
Graph / object stores

Key Value stores


Memcached (the memcached-compatible Membase just merged
with CouchOne, the company behind CouchDB)
Redis
Riak

Column Stores

Bigtable (Google)
Dynamo
Cassandra
Hadoop/HBase

Document Stores
CouchDB
MongoDB

Graph, Object Stores


Neo4J
db4o

CouchDB - relax (taken from website)
An Apache project created by Damien Katz
A document database server, accessible via a
RESTful JSON API.
Ad-hoc and schema-free with a flat address
space.
Distributed, featuring robust, incremental
replication with bi-directional conflict detection
and management.
CouchOne, the company behind CouchDB, recently merged with Membase

More on CouchDB
The CouchDB file layout and commitment
system features all Atomic Consistent Isolated
Durable (ACID) properties.
Document updates (add, edit, delete) are
serialized, except for binary blobs which are
written concurrently.
CouchDB read operations use a Multi-Version
Concurrency Control (MVCC) model where each
client sees a consistent snapshot of the
database from the beginning to the end of the
read operation.
Eventually Consistent

CouchDB Access via curl

curl http://127.0.0.1:5984/

curl -X GET http://127.0.0.1:5984/_all_dbs

curl -X PUT http://127.0.0.1:5984/baseball

curl -X PUT http://127.0.0.1:5984/baseball
// error: the database already exists

curl -X DELETE http://127.0.0.1:5984/baseball

Adding Docs via CURL


curl -X PUT http://127.0.0.1:5984/albums

curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"title":"Abbey Road","artist":"The Beatles"}'
UUIDs: curl -X GET http://127.0.0.1:5984/_uuids
curl -X GET http://127.0.0.1:5984/albums/1000
_rev - If you want to update or delete a document,
CouchDB expects you to include the _rev field of
the revision you wish to change
curl -X PUT http://127.0.0.1:5984/albums/1000 -d '{"_rev":"1-42c7396a84eaf1728cdbf08415a09a41","title":"Abbey Road","artist":"The Beatles","year":"1969"}'
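The `_rev` rule can be modelled in a few lines. What follows is a minimal in-memory sketch, not CouchDB's implementation: the class `TinyStore`, its `put` method, and the simplified revision strings (`1-x`, `2-x`) are hypothetical. Real revisions carry a hash after the dash, and a real server answers a stale update with an HTTP 409 conflict.

```javascript
// Hypothetical in-memory model of the revision check (illustration only).
class TinyStore {
  constructor() {
    this.docs = new Map(); // id -> { rev, body }
  }

  // Store a document. An existing document may only be replaced when the
  // caller supplies the revision that is currently stored.
  put(id, doc) {
    const current = this.docs.get(id);
    const { _rev, ...fields } = doc;
    if (current && current.rev !== _rev) {
      return { error: "conflict" }; // a real server would answer 409
    }
    const generation = current ? parseInt(current.rev, 10) + 1 : 1;
    const rev = generation + "-x"; // real revisions end in a hash, not "x"
    this.docs.set(id, { rev, body: fields });
    return { ok: true, id, rev };
  }
}

const store = new TinyStore();
const created = store.put("1000", { title: "Abbey Road", artist: "The Beatles" });

// Update without _rev: rejected.
const stale = store.put("1000", { title: "Abbey Road", year: "1969" });

// Update with the current _rev: accepted, revision moves forward.
const updated = store.put("1000", {
  _rev: created.rev,
  title: "Abbey Road",
  artist: "The Beatles",
  year: "1969",
});

console.log(created.rev, stale.error, updated.rev);
```

The point of the check is optimistic concurrency: a writer that has not seen the latest revision cannot silently overwrite it.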

Futon: CouchDB
Maintenance
http://127.0.0.1:5984/_utils/index.html
Albums database review
Add another document

Tools
Database, Document, View Creation
Security, Compact & Cleanup
Create and Delete

Demo Setup
Examples implemented in Groovy
Use HttpBuilder to interact with the
database
Groovy RESTClient
Use Google Gson to move objects
between JSON and Java/Groovy
Use Federal Contribution database for
our dataset.
Eclipse

Data Loading Review


Limited input to NY candidates, and
only year 2010
contributions.fec.2010.csv
Groovy bean for input data
Readfile.groovy
contribDB.put(path: "fed_contrib_test/${contrib.transactionId}", contentType: JSON,
    requestContentType: JSON, body: json)

CouchDB Design
Documents
CouchDB is designed to work best
when there is a one-to-one
correspondence between
applications and design documents.
_design/design_doc_name
Design documents are applications
I.e., a CouchDB database can be an application.

Design Documents
contents
Update Handler
updates: {"hello": function(doc, req) {}}
Views (more on this later)
Validation
Shows
Lists
Filters
libs
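Since a design document is itself a JSON document, the functions in the sections listed above are stored as strings. The sketch below shows one possible shape; every name in it (`contributions`, `by_recipient`, `errors_only`, `hello`) and every function body is a placeholder invented for illustration, not the talk's code.

```javascript
// Hypothetical design document. Names and function bodies are placeholders.
const designDoc = {
  _id: "_design/contributions",
  views: {
    by_recipient: {
      map: "function (doc) { emit(doc.recipientName, doc.amount); }",
      reduce: "_sum", // one of CouchDB's built-in reduce functions
    },
  },
  validate_doc_update:
    "function (newDoc, oldDoc, userCtx) { if (!newDoc.amount) { throw({ forbidden: 'amount is required' }); } }",
  filters: {
    errors_only: "function (doc, req) { return doc.level === 'error'; }",
  },
  updates: {
    hello: "function (doc, req) { return [doc, 'hello']; }",
  },
};

// JSON has no function type, which is why the sections above hold source text.
// This string is what would be sent as the body of a PUT to
// /fed_contrib_test/_design/contributions (database name from the demo slides).
const designDocJson = JSON.stringify(designDoc, null, 2);
console.log(designDocJson);
```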

Updates
If you have multiple design
documents, each with a
validate_doc_update function, all of
those functions are called upon each
incoming write request
If any of the validate functions fails (throws),
the write is rejected and the document is not
saved to the database

Validation
Validation functions are a powerful tool to
ensure that only documents you expect
end up in your databases.
validate_doc_update section of the design
document
function(newDoc, oldDoc, userCtx) {}
throw({forbidden : message});
throw({unauthorized : message});
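A validation function following the signature above might look like the sketch below. The required fields (`recipientName`, `amount`) are assumptions about the demo data, and `check` is a hypothetical local harness standing in for the server, which would normally call the function on every write.

```javascript
// Sketch of a validate_doc_update function (field rules are assumptions).
function validate(newDoc, oldDoc, userCtx) {
  if (newDoc._deleted) {
    return; // let deletions through without field checks
  }
  if (!userCtx.name) {
    throw { unauthorized: "Please log in before writing." };
  }
  if (typeof newDoc.recipientName !== "string") {
    throw { forbidden: "recipientName is required." };
  }
  if (isNaN(parseFloat(newDoc.amount))) {
    throw { forbidden: "amount must be numeric." };
  }
}

// Hypothetical harness: reports how the write would be treated.
function check(doc, userName) {
  try {
    validate(doc, null, { name: userName, roles: [] });
    return "ok";
  } catch (e) {
    return e.forbidden ? "forbidden" : "unauthorized";
  }
}

console.log(check({ recipientName: "Smith", amount: "250" }, "tom"));
console.log(check({ amount: "250" }, "tom"));
console.log(check({ recipientName: "Smith", amount: "250" }, null));
```

Throwing `forbidden` signals a bad document, while `unauthorized` signals that the user needs to authenticate first.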

Ok, how can I see my data?


CouchDB design documents can
contain a views section
Views contain Map/Reduce functions
Map/Reduce functions are
implemented in JavaScript
However, there are different Query Servers
available using different languages

Views
Filtering the documents in your database to
find those relevant to a particular process.
Building efficient indexes to find documents
by any value or structure that resides in
them
Extracting data from your documents and
presenting it in a specific order.
Use these indexes to represent
relationships among documents.

Map/Reduce dialog
Bob: So, how do I query the database?
IT guy: It's not a database. It's a key-value
store.
Bob: OK, it's not a database. How do I
query it?
IT guy: You write a distributed map-reduce function in Erlang.
Bob: Did you just tell me to go screw
myself?
IT guy: I believe I did, Bob.

Map/Reduce in CouchDB
Map functions have a single parameter, a
document, and emit a list of key/value pairs
of JSON values
CouchDB allows arbitrary JSON structures to be
used as keys

Map is called for every document in the
database
Efficiency?

emit() function can be called multiple times
in the map function
View results are stored in B-Trees
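To make the map step concrete, here is a small local stand-in for the view engine. It is a sketch under assumptions: `buildView` is a hypothetical helper, `emit` is passed in as a second argument (in CouchDB it is a global), the sample documents are invented, and plain string comparison replaces CouchDB's own key collation.

```javascript
// Hypothetical stand-in for view building: run the map function over every
// document, collect each emitted row, then sort the rows by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Invented sample data using field names from the demo slides.
const sampleDocs = [
  { _id: "a", recipientName: "Smith", contributorZipCode: "14604", amount: "250" },
  { _id: "b", recipientName: "Jones", contributorZipCode: "14620", amount: "100" },
];

// emit() is called twice per document, so each document yields two rows.
const viewRows = buildView(sampleDocs, (doc, emit) => {
  emit(doc.recipientName, doc.amount);
  emit(doc.contributorZipCode, doc.amount);
});

console.log(viewRows.map((row) => row.key));
```

Because rows are kept sorted by key, lookups and range scans over the view do not need to touch the documents again.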

Reduce/Rereduce
The reduce function is optional
Used to produce aggregate results for that view
Reduce functions must accept, as input, results
emitted by the corresponding map function as
well as results returned by the reduce function
itself (rereduce).
On rereduce, keys = null
On a large database objects to be reduced will
be sent to your reduce function in batches.
These batches will be broken up on B-tree
boundaries, which may occur in arbitrary places.
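A count is a convenient way to see why the two phases differ. In this sketch the batches are invented, and `countReduce` is called by hand to imitate what the server would do; the third argument is the `rereduce` flag that CouchDB passes to reduce functions.

```javascript
// Count rows. On the first pass the inputs are mapped values, so count them.
// On rereduce the inputs are earlier counts, so add them up.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((total, partial) => total + partial, 0);
  }
  return values.length;
}

// First pass: two invented batches of [key, docid] pairs and mapped values.
const batchOne = countReduce([["Smith", "a"], ["Smith", "b"]], ["250", "100"], false);
const batchTwo = countReduce([["Smith", "c"]], ["75"], false);

// Rereduce: keys is null and values holds the partial results.
const totalCount = countReduce(null, [batchOne, batchTwo], true);

console.log(batchOne, batchTwo, totalCount);
```

Returning `values.length` in both phases would report 2 here instead of 3, which is the typical mistake when the rereduce case is ignored.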

More on Map/Reduce
Linked Documents - If you emit an object
value which has {'_id': XXX} then
include_docs=true will fetch the document
with id XXX rather than the document which
was processed to emit the key/value pair.
Complex Keys
emit([lastName, firstName, zipcode], doc)

Grouping
Grouping Levels
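Grouping levels are easiest to see with array keys. The sketch below imitates the effect of a summing reduce at a given group level by truncating each key to its first N elements; `groupRows` and the sample rows are hypothetical, and a real view would be queried with the `group_level` parameter rather than computed client-side.

```javascript
// Hypothetical model of group_level with a summing reduce: rows whose keys
// share the same first `groupLevel` elements are combined into one row.
function groupRows(rows, groupLevel) {
  const totals = new Map();
  for (const { key, value } of rows) {
    const groupKey = JSON.stringify(key.slice(0, groupLevel));
    totals.set(groupKey, (totals.get(groupKey) || 0) + value);
  }
  return [...totals].map(([groupKey, value]) => ({ key: JSON.parse(groupKey), value }));
}

// Invented rows, as if emitted with emit([recipient, zip], amount).
const complexRows = [
  { key: ["Jones", "14604"], value: 75 },
  { key: ["Smith", "14604"], value: 250 },
  { key: ["Smith", "14604"], value: 50 },
  { key: ["Smith", "14620"], value: 100 },
];

const byRecipient = groupRows(complexRows, 1); // totals per recipient
const byRecipientAndZip = groupRows(complexRows, 2); // totals per recipient and zip

console.log(byRecipient);
console.log(byRecipientAndZip);
```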

Restrictions on
Map/Reduce
Map functions must be referentially
transparent: given the same doc, they
always emit the same key/value pairs
Allows for incremental update

Reduce functions must be able to reduce
their own output
This requirement allows
CouchDB to store intermediate reductions
directly in inner nodes of the B-tree indexes, so
view index updates and retrievals have
logarithmic cost

List Donors
Map:
function(doc) {
  if (doc.recipientName) {
    emit(doc.recipientName, doc);
  } else if (doc.recipientType) {
    emit(doc.recipientType, doc);
  }
}
No reduce function

List of Query Parameters

key
startkey, endkey
startkey_docid , endkey_docid
limit, skip, stale, descending
group, group_level
reduce
include_docs, inclusive_end
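When these parameters are placed on a view URL, key-like values (`key`, `startkey`, `endkey`) have to be JSON-encoded, so a string key keeps its quotes. A small sketch of a URL builder follows; `viewUrl` is a hypothetical helper, and the design document and view names (`contributions`, `by_recipient`) are invented for the example.

```javascript
// Hypothetical helper that builds the path and query string for a view.
// key, startkey and endkey are JSON-encoded; other parameters are sent as-is.
function viewUrl(db, designDoc, view, params = {}) {
  const jsonParams = new Set(["key", "startkey", "endkey"]);
  const query = Object.entries(params)
    .map(([name, value]) => {
      const text = jsonParams.has(name) ? JSON.stringify(value) : String(value);
      return name + "=" + encodeURIComponent(text);
    })
    .join("&");
  const path = "/" + db + "/_design/" + designDoc + "/_view/" + view;
  return query ? path + "?" + query : path;
}

// Exact-match lookup on a string key, limited to ten rows.
const lookupUrl = viewUrl("fed_contrib_test", "contributions", "by_recipient", {
  key: "Smith",
  limit: 10,
});

// Grouped query on the reduce output.
const groupedUrl = viewUrl("fed_contrib_test", "contributions", "by_recipient", {
  group: true,
});

console.log(lookupUrl);
console.log(groupedUrl);
```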

List all NY candidates


Want a list of all of the unique
candidates in the database
Map:
function(doc) { emit(doc.recipientType, null); }

Reduce:
function(keys, values) { return true; }

Must set group=true

Total Candidate Donations


List the total campaign contributions for each
candidate
Map:
function(doc) { emit(doc.recipientType, doc.amount); }

Reduce:
function(keys, values) {
  var sum = 0;
  // Loop over values, not keys: keys is null on rereduce
  for (var i = 0; i < values.length; i++) {
    sum = sum + parseFloat(values[i]);
  }
  return sum;
}
Must set group=true

Donation Totals by Zip


Complex Keys
In the map function:
emit([doc.recipientType, doc.contributorZipCode], doc.amount);

Reduce:
function(keys, values) {
  var sum = 0;
  // Loop over values, not keys: keys is null on rereduce
  for (var i = 0; i < values.length; i++) {
    sum = sum + parseFloat(values[i]);
  }
  return sum;
}

Referencing other
documents

Conflict Management
Multi-Version Concurrency Control (MVCC)
CouchDB does not attempt to merge the
conflicting revisions; this is an application
responsibility
If there is a conflict in revisions between
nodes
App is ultimately responsible for resolving the
conflict
All revisions are saved
One revision is selected as the most recent (the winning revision)
_conflicts property set
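Every node has to select the same revision without coordinating, so the choice must be deterministic. The sketch below is a simplified model of the rule as described in the CouchDB guide (the longer revision history wins, and ties are broken by comparing the revision strings). `pickWinner` is a hypothetical function, the revision ids are invented, and deleted revisions are ignored here.

```javascript
// Simplified, hypothetical model of deterministic winner selection.
// A revision id looks like "<generation>-<hash>".
function pickWinner(revisions) {
  return revisions.slice().sort((a, b) => {
    const [generationA, hashA] = a.split("-");
    const [generationB, hashB] = b.split("-");
    if (generationA !== generationB) {
      return Number(generationB) - Number(generationA); // longer history first
    }
    return hashA < hashB ? 1 : hashA > hashB ? -1 : 0; // then higher hash first
  })[0];
}

const sameLength = pickWinner(["2-aaa", "2-bbb"]);
const longerHistory = pickWinner(["2-zzz", "3-aaa"]);

console.log(sameLength, longerHistory);
```

The losing revisions are not discarded, which is what lets the application inspect them later and merge or delete them.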

Database Replication
CouchDB has built-in conflict detection
and management and the replication
process is incremental and fast, copying
only documents and individual fields
changed since the previous replication.
Replication is a unidirectional process.
Databases in CouchDB have a sequence
number that gets incremented every time
the database is changed.

Replication Continued
"continuous": true
automatically replicates any new docs from
the source to the target as they come in
there's a complex algorithm determining the
ideal moment to replicate for maximum
performance.
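Replication is started by sending a JSON body naming a `source` and a `target` to the server's `_replicate` endpoint with a POST. The sketch below only builds such a request and does not send it; the helper name `replicationRequest` and the choice to return a plain object are assumptions made for this example.

```javascript
// Hypothetical helper: describe a replication request without sending it.
function replicationRequest(source, target, options = {}) {
  return {
    method: "POST",
    url: "http://127.0.0.1:5984/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ source, target, ...options }),
  };
}

// One-off replication from albums to albums_backup.
const oneOff = replicationRequest("albums", "albums_backup");

// Continuous replication: new changes keep flowing to the target.
const continuous = replicationRequest("albums", "albums_backup", { continuous: true });

// Replication is one-way, so keeping both sides in sync takes a second
// request with source and target swapped.
const reverse = replicationRequest("albums_backup", "albums", { continuous: true });

console.log(oneOff.body);
console.log(continuous.body);
console.log(reverse.body);
```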

Create albums_backup using Futon replicator

curl -X PUT http://127.0.0.1:5984/albums/1010 -d '{"title":"Let It Be","artist":"The Beatles"}'

Replication & Conflict


Replicate albums db via Futon
curl -X PUT http://127.0.0.1:5984/albums/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2010"}'
Replicate again
curl -X PUT http://127.0.0.1:5984/albums_backup/1050 -d '{"title":"RJUG Roundup","artist":"Rob","year":"2011"}'
Replicate, review
Notifications
Polling, long polling
_changes

If not executing from a browser, can
request continuous changes
Filters can be applied to changes
Ex: only notify when level = error

filterName: function(doc, req)
req contains query parameters
Also contains userCtx
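A filter for the "level = error" case could look like the sketch below. The `level` field and the optional `level` query parameter are assumptions for illustration, and the array `filter` call stands in for the server applying the function to each change.

```javascript
// Sketch of a _changes filter: return true to pass a change on to the listener.
// The wanted level can be overridden with a query parameter (an assumption here).
function errorsOnly(doc, req) {
  const wanted = (req.query && req.query.level) || "error";
  return doc.level === wanted;
}

// Invented changed documents.
const changedDocs = [
  { _id: "1", level: "info" },
  { _id: "2", level: "error" },
  { _id: "3", level: "warn" },
];

const notified = changedDocs.filter((doc) => errorsOnly(doc, { query: {} }));
const warnings = changedDocs.filter((doc) => errorsOnly(doc, { query: { level: "warn" } }));

console.log(notified.map((doc) => doc._id), warnings.map((doc) => doc._id));
```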

Security
Ships with OAuth and cookie auth
handlers; default is standard HTTP (basic) auth
Authorizations
Reader - read/write document
Database Admin - compact, add/edit
views
Server Admin - create and remove
databases

CouchDB Applied
CouchOne
Hosting Services
CouchDB on Android

CouchApp
HTML5 applications

jCouchDB
Java layer for CouchDB access

CouchDB Lounge
Clustering support

Links
http://couchdb.apache.org/
http://wiki.apache.org/couchdb/FrontPage
http://guide.couchdb.org/editions/1/en/index.html

Questions?
Thanks!
