Java Web Development With MongoDB (Presented at Devoxx 2010)
alvin@10gen.com
Topics
Overview
Data modeling
Replication & Sharding
Developing with Java
Deployment
Drinking from the fire hose
Part One
MongoDB Overview
Strong adoption of MongoDB
90,000 database downloads per month
Over 1,000 production deployments
3 Reasons
- Performance
- Large number of readers / writers
- Large data volume
- Agility (ease of development)
NoSQL Really Means:
non-relational, next-generation operational datastores and databases
past: one-size-fits-all
RDBMS (Oracle, MySQL)
future
RDBMS (Oracle, MySQL)
New Gen. OLAP (Vertica, Aster, Greenplum)
we claim nosql segment will be:
* large
* not fragmented
* ‘platformitize-able’
Philosophy: maximize features - up to the “knee” in the curve, then stop
• memcached
• RDBMS
Horizontally Scalable Architectures
no joins + no complex transactions
ease of development a surprisingly big benefit : faster to code, faster to change, avoid upgrades and scheduled downtime
more predictable performance
fast single server performance -> developer spends less time manually coding around the database
bottom line: usually, developers like it much better after trying
Platform and Language support
MongoDB is implemented in C++ for best performance
Part Two
Data Modeling in MongoDB
So why model data?
A brief history of normalization
• 1970 E.F.Codd introduces 1st Normal Form (1NF)
• 1971 E.F.Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* source : wikipedia
The real benefit of relational
• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
Considerations
• No Joins
• Document writes are atomic
So today’s example will use...
Design Session
Design documents that simply map to
your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}
>db.posts.save(post)
Find the document
>db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "Destination Moon",
tags : [ "comic", "adventure" ] }
Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not
supplied
Add an index, find via index
Secondary index for “author”
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'Hergé'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
... }
Verifying indexes exist
>db.system.indexes.find()
// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }
// Index on author
{ _id : ObjectId("4c4ba6c5672c685e5e8aabf4"),
ns : "test.posts",
key : { "author" : 1 },
name : "author_1" }
Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte
Regular expressions:
// posts where author starts with h
>db.posts.find({author: /^h/i })
Counting:
// posts written by Hergé
>db.posts.find({author: "Hergé"}).count()
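A prefix regex is easy to get subtly wrong: `^h*` accepts zero occurrences of "h" and so matches every author, while `^h` requires a leading "h". The sketch below checks this with plain `java.util.regex` rather than the MongoDB driver; the class and method names are illustrative only, and it assumes the server's regex engine treats these two simple patterns the same way Java does.

```java
import java.util.regex.Pattern;

final class PrefixRegexDemo {
    // Equivalent of /^h/i: the string must begin with "h" or "H".
    static final Pattern STARTS_WITH_H =
            Pattern.compile("^h", Pattern.CASE_INSENSITIVE);

    // Equivalent of /^h*/i: "h*" also accepts zero occurrences,
    // so this pattern finds a match in every string.
    static final Pattern ZERO_OR_MORE_H =
            Pattern.compile("^h*", Pattern.CASE_INSENSITIVE);

    // find() succeeds if the pattern matches anywhere; the ^ anchor
    // pins the match to the start of the string.
    static boolean matches(Pattern p, String author) {
        return p.matcher(author).find();
    }

    public static void main(String[] args) {
        for (String author : new String[] {"Hergé", "Kyle"}) {
            System.out.println(author
                    + "  ^h: " + matches(STARTS_WITH_H, author)
                    + "  ^h*: " + matches(ZERO_OR_MORE_H, author));
        }
    }
}
```

Only the anchored `^h` form filters out "Kyle"; the `^h*` form would return every post.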
Extending the Schema
new_comment = {author: "Kyle",
               date: new Date(),
               text: "great book"}
>db.posts.find({"comments.author": "Kyle"})
// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({"comments.author": "Kyle"})
>db[res.result].find()
{ _id : "comic", value : { count : 1 } }
{ _id : "adventure", value : { count : 1 } }
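The result documents above have the shape of a per-tag count. As a rough illustration of the map-then-reduce idea behind such a count, here is a plain-Java sketch that visits each tag of each post and sums one per occurrence. It does not use the MongoDB driver, and `TagCount` and `countTags` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TagCount {
    // "Map" step: visit every tag of every post.
    // "Reduce" step: sum 1 per occurrence of each tag.
    static Map<String, Integer> countTags(List<List<String>> tagsPerPost) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tags : tagsPerPost) {
            for (String tag : tags) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One post, tagged like the example document in this deck.
        List<List<String>> posts =
                Arrays.asList(Arrays.asList("comic", "adventure"));
        System.out.println(countTags(posts)); // {adventure=1, comic=1}
    }
}
```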
Group
[ { "author" : "Hergé", "count" : 1 },
  { "author" : "Kyle", "count" : 3 } ]
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
Single Table Inheritance
>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}
// create index
>db.shapes.ensureIndex({radius: 1})
One to Many
- Embedded Array / Array Keys
  - slice operator to return subset of array
  - some queries hard,
    e.g. find latest comments across all documents
- Embedded tree
  - Single document
  - Natural
  - Hard to query
- Normalized (2 collections)
  - most flexible
  - more queries
Many - Many
Example:
Products
- product_id
Product_Categories
- product_id
- category_id
Category
- category_id
Many - Many
products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ] }
categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Adventure",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                 ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a") ] }
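With ids stored as arrays on the documents, each direction of the relationship becomes a membership test on an array field. The plain-Java sketch below models one direction (products in a category) with in-memory objects. The `Product` class, the short string ids, and the second product are assumptions standing in for the ObjectIds above; no driver calls are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ManyToMany {
    static final class Product {
        final String id;
        final String name;
        final List<String> categoryIds; // plays the role of category_ids

        Product(String id, String name, String... categoryIds) {
            this.id = id;
            this.name = name;
            this.categoryIds = Arrays.asList(categoryIds);
        }
    }

    // In-memory analogue of matching products whose category_ids
    // array contains the given category id.
    static List<String> productNamesIn(List<Product> products, String categoryId) {
        List<String> names = new ArrayList<>();
        for (Product p : products) {
            if (p.categoryIds.contains(categoryId)) {
                names.add(p.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Product> products = Arrays.asList(
                new Product("p1", "Destination Moon", "adventure", "comic"),
                new Product("p2", "Hypothetical Title", "comic"));
        System.out.println(productNamesIn(products, "adventure")); // [Destination Moon]
    }
}
```

Keeping ids on both sides makes either lookup cheap, at the cost of updating two documents whenever a link changes.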
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [] }
      ] }
  ] }
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
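Storing the ancestor list turns "all descendants of X" into a single membership test on the `ancestors` array. Below is a minimal in-memory Java sketch of that lookup over the seven example nodes. The class and method names are illustrative, and a real deployment would presumably run the equivalent query against an indexed `ancestors` field instead of scanning a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AncestorTree {
    // node id -> ancestors array, copied from the example documents
    static Map<String, List<String>> sample() {
        Map<String, List<String>> nodes = new LinkedHashMap<>();
        nodes.put("a", new ArrayList<String>());
        nodes.put("b", Arrays.asList("a"));
        nodes.put("c", Arrays.asList("a", "b"));
        nodes.put("d", Arrays.asList("a", "b"));
        nodes.put("e", Arrays.asList("a"));
        nodes.put("f", Arrays.asList("a", "e"));
        nodes.put("g", Arrays.asList("a", "b", "d"));
        return nodes;
    }

    // All descendants of id: every node whose ancestors array contains it.
    static List<String> descendants(Map<String, List<String>> nodes, String id) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : nodes.entrySet()) {
            if (e.getValue().contains(id)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(descendants(sample(), "b")); // [c, d, g]
        System.out.println(sample().get("g"));          // [a, b, d]
    }
}
```

The ancestors of a node need no query at all: they are already stored on the node itself.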
job = db.jobs.findAndModify({
query: {inprogress: false},
sort: {priority: -1},
update: {$set: {inprogress: true,
started: new Date()}},
new: true})
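The `findAndModify` call above picks the highest-priority idle job and marks it in progress as one atomic step, so two workers cannot claim the same job. The plain-Java sketch below imitates that contract in memory with a synchronized method. `JobQueue`, `Job`, and `claimNext` are hypothetical names, and this is an analogy for the semantics, not driver code.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

final class JobQueue {
    static final class Job {
        final String name;
        final int priority;
        boolean inprogress = false;
        Date started;

        Job(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    private final List<Job> jobs = new ArrayList<>();

    synchronized void add(Job job) {
        jobs.add(job);
    }

    // query {inprogress: false}, sort {priority: -1}, then the $set update,
    // all performed while holding one lock so the claim is atomic.
    synchronized Job claimNext() {
        Job best = null;
        for (Job j : jobs) {
            if (!j.inprogress && (best == null || j.priority > best.priority)) {
                best = j;
            }
        }
        if (best != null) {
            best.inprogress = true;
            best.started = new Date();
        }
        // Like new: true, the caller sees the updated job; null if none is idle.
        return best;
    }

    public static void main(String[] args) {
        JobQueue q = new JobQueue();
        q.add(new Job("resize-images", 1));
        q.add(new Job("send-email", 5));
        System.out.println(q.claimNext().name); // send-email
        System.out.println(q.claimNext().name); // resize-images
        System.out.println(q.claimNext());      // null
    }
}
```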
Part Three
Replication & Sharding
Scaling
What is scaling?
Well - hopefully for everyone here.
Traditional Horizontal Scaling
[Diagram: ReplicaSet 1 with one Primary and two Secondaries; writes go to the Primary]
Basics
• MongoDB replication is a bit like MySQL replication
Asynchronous master/slave at its core
• Variations:
Master / slave
Replica Pairs (deprecated – use replica sets)
Replica Sets
Replica Sets
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
Replica Sets – Design Concepts
[Diagram: Member 1, Member 2, Member 3]
Replica Set: Electing primary
[Diagram: Member 1; Member 2 (PRIMARY); Member 3]
Replica Set: Failure of master
[Diagram: Member 2 (DOWN); Member 1 and Member 3 negotiate new master; Member 3 (PRIMARY)]
Replica Set: Reconfiguring
[Diagram: Member 1; Member 3 (PRIMARY); Member 2 (DOWN)]
Replica Set: Member recovers
[Diagram: Member 1; Member 3 (PRIMARY); Member 2 (RECOVERING)]
Replica Set: Active
[Diagram: Member 1; Member 3 (PRIMARY); Member 2]
Set Member Types
Normal (priority == 1)
Passive (priority == 0)
Arbiter (no data, but can vote)
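The member types matter most during an election: a primary can only be chosen while a majority of the set is reachable, arbiters contribute a vote but hold no data, and passive members vote without standing as candidates. The plain-Java sketch below captures just that rule. It assumes one vote per member and ignores everything else a real election considers (such as data freshness); `ElectionSketch` and its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

final class ElectionSketch {
    enum Type { NORMAL, PASSIVE, ARBITER }

    static final class Member {
        final String name;
        final Type type;
        final boolean up;

        Member(String name, Type type, boolean up) {
            this.name = name;
            this.type = type;
            this.up = up;
        }
    }

    // Assumption: every member has exactly one vote. An election needs
    // a strict majority of all members (up or down) to be reachable.
    static boolean hasMajority(List<Member> set) {
        int reachable = 0;
        for (Member m : set) {
            if (m.up) {
                reachable++;
            }
        }
        return reachable * 2 > set.size();
    }

    // Only reachable NORMAL members are candidates; passive members and
    // arbiters vote but never become primary.
    static boolean canElectPrimary(List<Member> set) {
        if (!hasMajority(set)) {
            return false;
        }
        for (Member m : set) {
            if (m.up && m.type == Type.NORMAL) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Member> set = Arrays.asList(
                new Member("Member 1", Type.NORMAL, true),
                new Member("Member 2", Type.NORMAL, false),
                new Member("Member 3", Type.ARBITER, true));
        System.out.println(canElectPrimary(set)); // true
    }
}
```

This is why an arbiter is useful in a two-data-node set: it lets the surviving data node still reach a majority after its peer fails.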
Write Scalability: Sharding
[Diagram: reads and writes are routed by key range across three shards: 0..30, 31..60, 61..100]
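Range-based routing of this kind can be sketched with a sorted map from each range's lower bound to a shard. The plain-Java example below uses the three key ranges from the slide; the shard names and the `RangeRouter` class are made up for illustration, and real routing is handled by MongoDB itself rather than by application code.

```java
import java.util.TreeMap;

final class RangeRouter {
    // lower bound of each key range -> shard name (names are hypothetical)
    private final TreeMap<Integer, String> lowerBounds = new TreeMap<>();
    private final int maxKey = 100;

    RangeRouter() {
        lowerBounds.put(0, "shard1");  // keys 0..30
        lowerBounds.put(31, "shard2"); // keys 31..60
        lowerBounds.put(61, "shard3"); // keys 61..100
    }

    // Reads and writes for a key go to the shard owning the enclosing range:
    // floorEntry finds the greatest lower bound that is <= key.
    String shardFor(int key) {
        if (key < 0 || key > maxKey) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
        return lowerBounds.floorEntry(key).getValue();
    }

    public static void main(String[] args) {
        RangeRouter router = new RangeRouter();
        System.out.println(router.shardFor(12)); // shard1
        System.out.println(router.shardFor(45)); // shard2
        System.out.println(router.shardFor(99)); // shard3
    }
}
```

Because each key maps to exactly one shard, writes spread across servers instead of all landing on a single primary.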
Sharding
[Diagram: client, mongod processes, Config Servers]
// tags is a list of strings built elsewhere
coll.save(
    BasicDBObjectBuilder.start("author", "Hergé")
        .append("text", "Destination Moon")
        .append("date", new Date())
        .append("tags", tags)
        .get());
Simple Example, Again
DBCollection coll = new Mongo().getDB("blogdb")
                               .getCollection("posts"); // collection name assumed
coll.insert(new BasicDBObject(fields));
DBObject <-> (B/J)SON
{author: "kyle",
 text: "Destination Moon",
 date: }
EntityListeners
EntityInterceptor
Basic POJO
@Entity
class Person {
@Id
String author;
@Indexed
Date date;
String text;
}
Datastore Basics
get(class, id)
find(class, […])
save(entity, […])
delete(query)
getCount(query)
update/First(query, upOps)
findAndModify/Delete(query, upOps)
Add, Get, Delete
Blog entry = new Blog("Hergé", new Date(), "Destination Moon");
ds.save(entry);
ds.delete(entry);
Queries
Datastore ds = …
Query<Blog> q = ds.createQuery(Blog.class);
q.field("author").equal("Hergé").limit(5);
for (Blog e : q.fetch())
    print(e);
inc(field, [val])
dec(field)
add(field, val)
addAll(field, vals)
removeFirst/Last(field)
removeAll(field, vals)
Relationships
[@Embedded]
- Loaded/Saved with Entity
- Update
@Reference
- Stored as DBRef(s)
- Loaded with Entity
- Not automatically saved
Key<T> (DBRef)
- Stored as DBRef(s)
- Just a link, but resolvable by Datastore/Query
MongoDB features in Java
• Durability
• Replication
• Sharding
• Connection options
Durability
• Write acknowledged when in memory on master only
Durability - Master + Slaves
• Write acknowledged when in memory on master + slave
slaveOk()
- driver to send read requests to Secondaries
- driver will always send writes to Primary
Can be set on
- DB.slaveOk()
- Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
Using sharding with Java
Before sharding
coll.save(
    BasicDBObjectBuilder.start("author", "Hergé")
        .append("text", "Destination Moon")
        .append("date", new Date())
        .get());
After sharding
• Performance tuning
• Sizing
• O/S Tuning / File System layout
• Backup
Backup
• Typically backups are driven from a slave
• Eliminates impact to client / application traffic to master
Backup
• Two strategies
• mongodump / mongorestore
• fsync + lock
mongodump
• RAM - lots of it
• Filesystem
• EXT4 / XFS
• Better file allocation & performance
• I/O
• The more disks the better
• Consider RAID10 or other RAID configs
Monitoring
Primary function:
• Measure stats over time
• Tells you what is going on with
your system
• Alerts when threshold reached
Remember me?
Summary