NoSQL
By Tom Sausner
Agenda
Introduction
Review of NoSQL storage options
CAP Theorem
Review categories of storage options
CouchDB
Overview
Interacting with data
Examples
CAP Theorem
Conjectured by Eric Brewer of U.C. Berkeley;
proved by Seth Gilbert and Nancy Lynch of MIT
Relates to distributed systems
Consistency, Availability, Partition
Tolerance: pick 2
A distributed system is built of
nodes (computers), which can
(attempt to) send messages to each
other over a network.
Consistency
is equivalent to requiring requests
of the distributed shared memory to
act as if they were executing on a
single node, responding to operations
one at a time.
Not the same as ACID
Available
every request received by a non-failing node in the system must
result in a response.
says nothing about the content of the
response. It could be anything; it
need not be successful or
correct.
Partition Tolerant
any guarantee of consistency or
availability is still guaranteed even if
there is a partition.
if a system is not partition-tolerant,
that means that if the network can
lose messages or any nodes can fail,
then any guarantee of atomicity or
consistency is voided.
Implications of CAP
How to best scale your application? The world
falls broadly into two ideological camps: the
database crowd and the non-database crowd.
The database crowd, unsurprisingly, like
database technology and will tend to address
scale by talking of things like optimistic
locking and sharding
The non-database crowd will tend to address
scale by managing data outside of the
database environment (avoiding the
relational world) for as long as possible.
Column Stores
Document Stores
CouchDB
MongoDB
More on CouchDB
The CouchDB file layout and commitment
system features all Atomic, Consistent, Isolated,
Durable (ACID) properties.
Document updates (add, edit, delete) are
serialized, except for binary blobs which are
written concurrently.
CouchDB read operations use a Multi-Version
Concurrency Control (MVCC) model where each
client sees a consistent snapshot of the
database from the beginning to the end of the
read operation.
Eventually Consistent
curl http://127.0.0.1:5984/
Futon: CouchDB
Maintenance
http://127.0.0.1:5984/_utils/index.html
Albums database review
Add another document
Tools
Database, Document, View Creation
Security, Compact & Cleanup
Create and Delete
Demo Setup
Examples implemented in Groovy
Use HttpBuilder to interact with the
database
Groovy RESTClient
Use Google Gson to move objects
between JSON and Java/Groovy
Use Federal Contribution database for
our dataset.
Eclipse
CouchDB Design
Documents
CouchDB is designed to work best
when there is a one-to-one
correspondence between
applications and design documents.
_design/design_doc_name
Design Documents are applications
I.e., a CouchDB database can be an application.
Design Documents
contents
Update Handler
updates: {"hello": function(doc, req) {}}
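The body of an update handler such as "hello" might look like the sketch below. It assumes the usual CouchDB convention that the handler returns a two-element array: the document to save (or null to save nothing) and the response. The function is named here only so it can be run locally; the `greeting` field and the response strings are hypothetical.

```javascript
// Hypothetical body for the "hello" update handler in a design document.
// CouchDB calls it with the existing document (or null) and the request.
function hello(doc, req) {
  if (!doc) {
    if (req.id) {
      // No document with this id yet: create one.
      return [{ _id: req.id, greeting: "hello" }, "created"];
    }
    // No id supplied: save nothing.
    return [null, "nothing to update"];
  }
  // Existing document: modify it and hand it back to be saved.
  doc.greeting = "hello";
  return [doc, "updated"];
}
```

Returning the document rather than writing it directly lets CouchDB apply the normal revision and validation checks to the result.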
Updates
If you have multiple design
documents, each with a
validate_doc_update function, all of
those functions are called upon each
incoming write request
If any of the validate functions fails,
the document is not added to
the database
Validation
Validation functions are a powerful tool to
ensure that only documents you expect
end up in your databases.
validate_doc_update section of the design
document
function(newDoc, oldDoc, userCtx) {}
throw({forbidden : message});
throw({unauthorized : message});
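Putting the signature and the two throw forms together, a validation function could be sketched as follows. The `amount` field and the error messages are hypothetical, and the function is named only so it can be exercised outside CouchDB.

```javascript
// Sketch of a validate_doc_update function; "amount" is a hypothetical field.
function validate(newDoc, oldDoc, userCtx) {
  // Deletions carry no body to check.
  if (newDoc._deleted) {
    return;
  }
  // Reject writers who are not logged in.
  if (!userCtx.name) {
    throw({ unauthorized: "You must be logged in to write documents." });
  }
  // Reject documents that do not have the expected shape.
  if (typeof newDoc.amount !== "number") {
    throw({ forbidden: "Document must have a numeric amount." });
  }
}
```

Returning without throwing accepts the write.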
Views
Filtering the documents in your database to
find those relevant to a particular process.
Building efficient indexes to find documents
by any value or structure that resides in
them
Extracting data from your documents and
presenting it in a specific order.
Use these indexes to represent
relationships among documents.
Map/Reduce dialog
Bob: So, how do I query the database?
IT guy: It's not a database. It's a key-value
store.
Bob: OK, it's not a database. How do I
query it?
IT guy: You write a distributed map-reduce function in Erlang.
Bob: Did you just tell me to go screw
myself?
IT guy: I believe I did, Bob.
Map/Reduce in CouchDB
Map functions take a single parameter, a
document, and emit a list of key/value pairs
of JSON values
CouchDB allows arbitrary JSON structures to be
used as keys
Reduce/Rereduce
The reduce function is optional
used to produce aggregate results for that view
Reduce functions must accept, as input, results
emitted by their corresponding map function as
well as results returned by the reduce function
itself (rereduce).
On rereduce, keys is null
On a large database objects to be reduced will
be sent to your reduce function in batches.
These batches will be broken up on B-tree
boundaries, which may occur in arbitrary places.
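A row counter shows why the rereduce case has to be handled separately. This is a local simulation, not a call into CouchDB: the function name, batch contents, and variables are made up for illustration.

```javascript
// Counts rows per key. On the first pass, values holds map output;
// on rereduce, keys is null and values holds counts from earlier passes.
function countRows(keys, values, rereduce) {
  if (rereduce) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  return values.length;
}

// Simulate two batches followed by a rereduce over their results.
var batch1 = countRows([["a", "id1"], ["a", "id2"]], [10, 20], false);
var batch2 = countRows([["a", "id3"]], [30], false);
var combined = countRows(null, [batch1, batch2], true);
console.log(combined); // 3
```

Without the rereduce branch, the final pass would return 2 (the number of batches) instead of 3 (the number of rows).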
More on Map/Reduce
Linked Documents - If you emit an object
value which has {'_id': XXX} then
include_docs=true will fetch the document
with id XXX rather than the document which
was processed to emit the key/value pair.
Complex Keys
emit([lastName, firstName, zipcode], doc)
Grouping
Grouping Levels
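With an array key such as the one above, the grouping level decides how many leading key elements rows must share to be reduced together. The sketch below imitates that behaviour locally with a sum. It is not CouchDB code, and the sample rows are invented.

```javascript
// Local simulation of grouping: truncate each array key to `level`
// elements and sum the values that share the truncated key.
function groupByLevel(rows, level) {
  var totals = {};
  var order = [];
  rows.forEach(function (row) {
    var groupKey = JSON.stringify(row.key.slice(0, level));
    if (!(groupKey in totals)) {
      totals[groupKey] = 0;
      order.push(groupKey);
    }
    totals[groupKey] += row.value;
  });
  return order.map(function (k) {
    return { key: JSON.parse(k), value: totals[k] };
  });
}

var sampleRows = [
  { key: ["Smith", "Ann", "12345"], value: 100 },
  { key: ["Smith", "Ann", "54321"], value: 50 },
  { key: ["Smith", "Bob", "12345"], value: 25 }
];
console.log(JSON.stringify(groupByLevel(sampleRows, 1)));
// [{"key":["Smith"],"value":175}]
console.log(JSON.stringify(groupByLevel(sampleRows, 2)));
// [{"key":["Smith","Ann"],"value":150},{"key":["Smith","Bob"],"value":25}]
```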
Restrictions on
Map/Reduce
Map functions must be referentially
transparent: given the same doc, they will
always emit the same key/value pairs
Allows for incremental update
List Donors
Map:
function(doc) {
  if (doc.recipientName) {
    emit(doc.recipientName, doc);
  } else if (doc.recipientType) {
    emit(doc.recipientType, doc);
  }
}
No reduce function
key
startkey, endkey
startkey_docid , endkey_docid
limit, skip, stale, descending
group, group_level
reduce
include_docs, inclusive_end
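These parameters are passed on the view URL, and key-type parameters have to be JSON-encoded. A small helper for building such URLs might look like this. The helper itself and the database, design document, and view names are hypothetical.

```javascript
// Builds a view query URL. Key-type parameters must be JSON-encoded,
// so strings keep their quotes and arrays keep their brackets.
var JSON_PARAMS = ["key", "startkey", "endkey"];

function viewUrl(base, db, designDoc, view, params) {
  var query = Object.keys(params).map(function (name) {
    var value = params[name];
    var text = JSON_PARAMS.indexOf(name) >= 0
      ? JSON.stringify(value)
      : String(value);
    return name + "=" + encodeURIComponent(text);
  }).join("&");
  return base + "/" + db + "/_design/" + designDoc + "/_view/" + view +
    (query ? "?" + query : "");
}

console.log(viewUrl("http://127.0.0.1:5984", "contributions", "donors",
  "by_recipient", { startkey: "A", endkey: "B", limit: 10 }));
// http://127.0.0.1:5984/contributions/_design/donors/_view/by_recipient?startkey=%22A%22&endkey=%22B%22&limit=10
```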
Reduce:
return true
Reduce:
function(keys, values) {
  var sum = 0;
  // Iterate over values: keys is null on rereduce.
  for (var idx = 0; idx < values.length; idx++) {
    sum = sum + parseFloat(values[idx]);
  }
  return sum;
}
Must set group=true
Reduce:
function(keys, values) {
  var sum = 0;
  // Iterate over values: keys is null on rereduce.
  for (var idx = 0; idx < values.length; idx++) {
    sum = sum + parseFloat(values[idx]);
  }
  return sum;
}
Referencing other
documents
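One way to reference another document from a view is the linked-documents convention: emit a value of the form {'_id': XXX} and query with include_docs=true. The sketch below assumes each contribution document stores the id of a separate donor document in a hypothetical `donorId` field. The `emit` defined here is only a local stand-in for the one CouchDB's view server provides.

```javascript
// Local stand-in for CouchDB's emit(), which the view server provides.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Hypothetical map function: point each row at the donor document
// instead of the contribution document that produced it.
function map(doc) {
  if (doc.donorId) {
    emit(doc.recipientName, { _id: doc.donorId });
  }
}

map({ _id: "c1", recipientName: "Smith", donorId: "d42" });
console.log(JSON.stringify(emitted));
// [{"key":"Smith","value":{"_id":"d42"}}]
```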
Conflict Management
Multi-Version Concurrency Control (MVCC)
CouchDB does not attempt to merge the
conflicting revisions; this is left to the application
If there is a conflict in revisions between
nodes
App is ultimately responsible for resolving the
conflict
All revisions are saved
One revision is selected as the most recent
_conflicts property set
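Since resolution is the application's job, one possible approach is to write a merged body over one revision and delete the others in a single batch. The sketch below only builds that batch in the shape accepted by a bulk document update. The merge policy and the `amount` field are hypothetical, and nothing is sent to a server.

```javascript
// Sketch of application-side conflict resolution. Given all conflicting
// revisions of one document, build the batch of writes that keeps a merged
// body on the first revision and deletes the rest.
function resolve(revisions, merge) {
  var merged = merge(revisions);
  merged._id = revisions[0]._id;
  merged._rev = revisions[0]._rev;
  var deletions = revisions.slice(1).map(function (doc) {
    return { _id: doc._id, _rev: doc._rev, _deleted: true };
  });
  return { docs: [merged].concat(deletions) };
}

// Hypothetical merge policy: keep the larger "amount".
function keepLargest(revisions) {
  var best = revisions.reduce(function (a, b) {
    return b.amount > a.amount ? b : a;
  });
  return { amount: best.amount };
}
```

Other policies (latest timestamp, field-by-field merge, asking the user) can be plugged in through the `merge` argument.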
Database Replication
CouchDB has built-in conflict detection
and management and the replication
process is incremental and fast, copying
only documents and individual fields
changed since the previous replication.
Replication is a unidirectional process.
Databases in CouchDB have a sequence
number that gets incremented every time
the database is changed.
Replication Continued
"continuous": true
automatically replicate over any new docs as
they come into the source to the target
there's a complex algorithm determining the
ideal moment to replicate for maximum
performance.
curl -X PUT http://127.0.0.1:5984/albums/1010 -d '{"title":"Let It Be","artist":"The Beatles"}'
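Replication itself is started with a request to the server's _replicate endpoint. The helper below only builds that request, as a sketch. The helper name and the target database name are hypothetical, and no request is sent.

```javascript
// Builds the request for CouchDB's _replicate endpoint.
// Replication is one-way: from source to target.
function replicationRequest(source, target, continuous) {
  var body = { source: source, target: target };
  if (continuous) {
    // Keep replicating new changes as they arrive on the source.
    body.continuous = true;
  }
  return {
    method: "POST",
    path: "/_replicate",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  };
}

console.log(replicationRequest("albums", "albums-replica", true).body);
// {"source":"albums","target":"albums-replica","continuous":true}
```

For two-way synchronisation, a second request with source and target swapped would be needed.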
Notifications
Polling, long polling
_changes
filterName:function(doc, req)
req contains query parameters
Also contains userCtx
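A filter function for the _changes feed could be sketched as follows. The function name, the `type` query parameter, and the `recipientType` field are assumptions chosen for illustration.

```javascript
// Hypothetical filter for the _changes feed: pass only documents whose
// recipientType matches the "type" query parameter.
function byType(doc, req) {
  // Skip deletion notices.
  if (doc._deleted) {
    return false;
  }
  return doc.recipientType === req.query.type;
}
```

Returning true includes the change in the feed and returning false drops it, so each client can subscribe to only the changes it cares about.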
Security
Ships with OAuth and cookie auth
handlers; default is standard HTTP auth
Authorizations
Reader - read/write document
Database Admin - compact, add/edit
views
Server Admin - create and remove
databases
CouchDB Applied
CouchOne
Hosting Services
CouchDB on Android
CouchApp
HTML5 applications
jCouchDB
Java layer for CouchDB access
CouchDB Lounge
Clustering support
Links
http://couchdb.apache.org/
http://wiki.apache.org/couchdb/FrontPage
http://guide.couchdb.org/editions/1/en/index.html
Questions?
Thanks!