Full Metal Mongo: Scale
Humongous: slang. Extraordinarily large; expressive coinage, perhaps reflecting huge and monstrous, with the stress pattern of tremendous.
Open source NoSQL database written in C++: https://github.com/mongodb/mongo
Production Deployments
Outline
Terminology and basics
Terminology
NoSQL is almost everything
"Schemaless" is nonsense: MongoDB does have a schema
Scaling out
Diagram: scale and speed vs. features of NoSQL stores
Format
BSON: binary-encoded serialization of JSON documents
JSON
{
  _id : ObjectId("xxxxx"),   //default _id: 24 hex chars
  name : 'Full Metal Mongo',
  date : new Date(),
  presenter : 'isra',
  attendants : [ {name:'ana', age:23}, {name:'luis', age:32} ]
}
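The first 4 bytes (8 hex chars) of an ObjectId are a big-endian count of seconds since the Unix epoch, so a document's creation time can be read from its _id alone. The mongo shell exposes this as ObjectId(...).getTimestamp(); below is a minimal plain-JavaScript sketch of the same idea, where the helper name objectIdTimestamp and the sample id are hypothetical.

```javascript
// Hypothetical helper: recover the creation time encoded in an ObjectId.
// An ObjectId is 12 bytes shown as 24 hex chars; the first 4 bytes are a
// big-endian number of seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("not a 24-hex-char ObjectId: " + hex);
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Made-up id, for illustration only
console.log(objectIdTimestamp("507f1f77bcf86cd799439011").toISOString());
```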
Data schema
Database
Collection
Document { user: 1, name: [] }
Collection
Flexible: no fixed structure (ALTER TABLE is implicit)
Created on the first insertion (same for dbs)
Document
JSON document
_id (ObjectId): unique within the collection; it can be a document itself
Fields: numeric, string, date
Arrays and subdocuments
MongoDB basics
Default port: 27017
Optional authentication
Data location: /data/db/
Modes: automatic replication, automatic fail-over
Drivers
Officially supported: C, C++, Erlang, Haskell, Java, JavaScript, .NET, Perl, PHP, Python, Ruby, Scala
Connection
mongodb://username:password@host:port/database?options
username and password are optional
port: 27017 by default
database: the admin database by default
options: name=value pairs, separated by &
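To make the defaults concrete, here is a simplified, hypothetical parser for a single-host URI that applies the port and database defaults listed above. It is a sketch only: real drivers also accept comma-separated host lists and many more rules, which this regex deliberately rejects or ignores. The credentials and database name in the example are made up.

```javascript
// Hypothetical, simplified parser for a single-host MongoDB URI:
//   mongodb://[username:password@]host[:port][/database][?options]
// Comma-separated host lists (replica sets) are rejected by this sketch.
function parseMongoUri(uri) {
  const re = /^mongodb:\/\/(?:([^:@\/]+)(?::([^@\/]*))?@)?([^:\/?,]+)(?::(\d+))?(?:\/([^?]*))?(?:\?(.*))?$/;
  const m = re.exec(uri);
  if (!m) throw new Error("unsupported URI: " + uri);
  return {
    username: m[1],                         // optional
    password: m[2],                         // optional
    host: m[3],
    port: m[4] ? Number(m[4]) : 27017,      // default port
    database: m[5] || "admin",              // default database
    // name=value pairs separated by &
    options: Object.fromEntries(new URLSearchParams(m[6] || "")),
  };
}

console.log(parseMongoUri("mongodb://isra:secret@localhost:27017/gul?w=1&journal=true"));
console.log(parseMongoUri("mongodb://localhost"));
```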
Insertion
Suppose a collection of GUL courses.
db.gul.insert({
  name : 'Full Metal Mongo',
  date : new Date(),
  presenter : 'isra',
  attendants : [ {name:'ana', age:23}, {name:'luis', age:32} ]
})
Querying
//Full Metal Mongo course
db.gul.find({name:'Full Metal Mongo'})
//Courses attended by ana (dotted keys must be quoted)
db.gul.find({'attendants.name':'ana'})
//Course names given by isra
db.gul.find({presenter:'isra'}, {name:1})
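A dotted key such as 'attendants.name' reaches into every element of the attendants array, and an equality condition matches if any reachable value equals the target. The sketch below mimics that path resolution on in-memory documents; the helpers valuesAtPath and matchesEquality are hypothetical and ignore many server rules (for example, numeric array positions inside paths).

```javascript
// Hypothetical helper: collect every value reachable by a dotted path,
// descending into each element whenever a step lands on an array.
function valuesAtPath(value, path) {
  if (path.length === 0) return [value];
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, path));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[path[0]], path.slice(1));
}

// Equality match: true if any reachable value, or any element of a
// reachable array, equals the target.
function matchesEquality(doc, dottedPath, target) {
  return valuesAtPath(doc, dottedPath.split(".")).some(
    (v) => v === target || (Array.isArray(v) && v.includes(target))
  );
}

const sampleCourse = {
  name: "Full Metal Mongo",
  presenter: "isra",
  tags: ["nosql", "linux"],
  attendants: [{ name: "ana", age: 23 }, { name: "luis", age: 32 }],
};
console.log(matchesEquality(sampleCourse, "attendants.name", "ana"));   // true
console.log(matchesEquality(sampleCourse, "attendants.name", "pedro")); // false
```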
Querying II
//Courses ordered by name
db.gul.find().sort({name:1});
//The first 5 courses
db.gul.find().limit(5);
//Next five courses
db.gul.find().skip(5).limit(5);
//First course (natural order)
db.gul.findOne()
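Whatever order the cursor methods are chained in, the server applies sort first, then skip, then limit. A small in-memory sketch of that pagination; the page helper and the course names are hypothetical.

```javascript
// Hypothetical helper mimicking find().sort({field: 1}).skip(n).limit(m)
// on an in-memory array: sort first, then skip, then limit.
function page(docs, field, skip, limit) {
  return [...docs]
    .sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0))
    .slice(skip, skip + limit);
}

// Made-up course names
const courseNames = ["Vim", "Git", "Python", "LaTeX", "Android", "Bash",
  "Crafty", "Emacs", "Django", "Full Metal Mongo", "Haskell", "Inkscape"]
  .map((name) => ({ name }));

// Second page of five, ordered by name
console.log(page(courseNames, "name", 5, 5).map((c) => c.name));
```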
Querying III
//Courses attended by anyone under age
db.gul.find({'attendants.age':{$lt:18}});
//Last year courses between two dates (JS months are 0-based: 3 is April)
db.gul.find({date:{ $gt:new Date(2012,3,8), $lt:new Date(2012,3,11)} });
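Two JavaScript details matter in the date query: the month passed to new Date(year, month, day) is 0-based (3 is April), and leading-zero literals such as 08 are rejected in strict mode. The sketch below applies the same exclusive $gt/$lt range to in-memory documents, using Date.UTC to avoid time-zone surprises; the sample courses are made up.

```javascript
"use strict";
// Month argument is 0-based: 3 means April.
const from = new Date(Date.UTC(2012, 3, 8));  // 2012-04-08T00:00Z
const to = new Date(Date.UTC(2012, 3, 11));   // 2012-04-11T00:00Z

// Made-up courses
const courses = [
  { name: "Vim", date: new Date(Date.UTC(2012, 3, 9, 18)) },
  { name: "Git", date: new Date(Date.UTC(2012, 3, 11, 18)) },
  { name: "Bash", date: new Date(Date.UTC(2012, 2, 9, 18)) },
];

// Equivalent of {date: {$gt: from, $lt: to}}: both bounds exclusive.
const inRange = courses.filter((c) => c.date > from && c.date < to);
console.log(inRange.map((c) => c.name)); // [ 'Vim' ]
```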
Querying IV
//Courses attended by pedro or ana
db.gul.find({'attendants.name': {$in:['pedro', 'ana']} });
//Courses attended by exactly 10 people
db.gul.find({attendants: {$size:10} });
$ operators
$in / $nin
$all (default is any)
$gt(e) / $lt(e)
$ne
$elemMatch (conditions in the same subdoc)
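Without $elemMatch, conditions on different dotted fields of the same array can be satisfied by different elements; $elemMatch requires a single element to satisfy all of them. A plain-JavaScript sketch of the difference for "an attendant named ana who is under 18"; the two functions are hypothetical and hard-code just these conditions.

```javascript
// Made-up course: ana is an adult, luis is under age.
const course = {
  name: "Full Metal Mongo",
  attendants: [{ name: "ana", age: 23 }, { name: "luis", age: 15 }],
};

// {'attendants.name': 'ana', 'attendants.age': {$lt: 18}}
// Each condition is checked independently against ANY element.
function dottedMatch(doc) {
  return doc.attendants.some((a) => a.name === "ana") &&
         doc.attendants.some((a) => a.age < 18);
}

// {attendants: {$elemMatch: {name: 'ana', age: {$lt: 18}}}}
// Both conditions must hold for the SAME element.
function elemMatch(doc) {
  return doc.attendants.some((a) => a.name === "ana" && a.age < 18);
}

console.log(dottedMatch(course)); // true  (ana matches the name, luis the age)
console.log(elemMatch(course));   // false (no single under-age ana)
```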
More $ expressions
$sum, $avg, $min, $max
$push (insert), $addToSet (insert, no duplicates)
$first, $last (depend on the sort order)
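These accumulators are evaluated inside a $group stage, once per group. The sketch below computes each of them over in-memory documents grouped by presenter. The groupBy helper, the field names and the data are hypothetical. $first/$last simply take the first/last document in input order, which is why they only make sense after a $sort. This sketch keeps $addToSet values in insertion order, although the server does not guarantee any order.

```javascript
// Hypothetical in-memory version of:
// {$group: {_id: "$presenter", courses: {$sum: 1}, avgSize: {$avg: "$size"},
//           minSize: {$min: "$size"}, maxSize: {$max: "$size"},
//           names: {$push: "$name"}, tags: {$addToSet: "$tag"},
//           first: {$first: "$name"}, last: {$last: "$name"}}}
function groupBy(docs, keyField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc);
  }
  return [...groups].map(([key, ds]) => {
    const sizes = ds.map((d) => d.size);
    return {
      _id: key,
      courses: ds.length,                                        // $sum: 1
      avgSize: sizes.reduce((s, x) => s + x, 0) / sizes.length,  // $avg
      minSize: Math.min(...sizes),                               // $min
      maxSize: Math.max(...sizes),                               // $max
      names: ds.map((d) => d.name),                              // $push (keeps duplicates)
      tags: [...new Set(ds.map((d) => d.tag))],                  // $addToSet (no duplicates)
      first: ds[0].name,                                         // $first
      last: ds[ds.length - 1].name,                              // $last
    };
  });
}

// Made-up data
const docs = [
  { presenter: "isra", name: "Full Metal Mongo", size: 20, tag: "db" },
  { presenter: "isra", name: "Crafty", size: 10, tag: "web" },
  { presenter: "isra", name: "Redis", size: 12, tag: "db" },
  { presenter: "javi", name: "Crafty", size: 10, tag: "web" },
];
console.log(groupBy(docs, "presenter"));
```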
Update
//updates if it exists; inserts if new
db.gul.save(x)
//update speakers in the crafty course
db.gul.update( {name:'Crafty'}, {$set:{presenter:['javi','isra']}} );
//new attendant to a course (not multi: only the first match is updated)
db.gul.update( {name:'mongoDB'}, {$push:{attendants:{name:'pepe', age:19}}} );
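An update document made only of $ operators modifies the named fields in place. An update document without operators replaces the whole matched document, keeping only its _id. So the operator must come first, {$push: {attendants: ...}}, and the field name goes inside it. Below is a simplified in-memory sketch that supports only $set and $push; the applyUpdate helper and the data are hypothetical.

```javascript
// Hypothetical helper: apply an update document to one in-memory document.
function applyUpdate(doc, update) {
  const keys = Object.keys(update);
  const isOperatorUpdate = keys.every((k) => k.startsWith("$"));
  if (!isOperatorUpdate) {
    if (keys.some((k) => k.startsWith("$"))) {
      throw new Error("cannot mix operators and plain fields");
    }
    // Replacement: every field except _id is replaced.
    return { _id: doc._id, ...update };
  }
  const result = { ...doc };
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, value] of Object.entries(fields)) {
      if (op === "$set") {
        result[field] = value;
      } else if (op === "$push") {
        result[field] = [...(result[field] || []), value]; // append, no mutation
      } else {
        throw new Error("operator not supported in this sketch: " + op);
      }
    }
  }
  return result;
}

const mongoCourse = { _id: 1, name: "mongoDB", attendants: [{ name: "ana", age: 23 }] };
console.log(applyUpdate(mongoCourse, { $push: { attendants: { name: "pepe", age: 19 } } }));
console.log(applyUpdate(mongoCourse, { attendants: [] })); // { _id: 1, attendants: [] }: name is gone
```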
Remove
//removes all documents
db.gul.remove({})
//search and remove
db.gul.remove({presenter:'isra'})
Update your course to add attendants
Query courses with speaker Jesús Espino
Query courses on Friday
Query courses tagged as android
Aggregation
db.gul.aggregate([ pipeline ])
Pipeline stages (7): $project, $match, $limit, $skip, $unwind, $group, $sort
Aggregation I
//Number of courses
db.gul.count();
//Number of courses given by isra
db.gul.count({presenter:'isra'});
//Distinct attendants to all courses
db.gul.distinct('attendants.name');
Aggregation II
db.grades.aggregate([
  {$unwind:"$scores"},
  {$match:{"scores.type":{$ne:"quiz"}}},
  {$group:{
    _id:{class_id:"$class_id", student_id:"$student_id"},
    score:{$avg:"$scores.score"}
  }},
  {$group:{
    _id:{class_id:"$_id.class_id"},
    score:{$avg:"$score"}
  }},
  {$sort: {score:-1}}
])
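Stage by stage, the grades pipeline unwinds each student's scores, drops quizzes, averages per (class, student), averages those student averages per class, and sorts by the class average. The sketch below mirrors those five stages in plain JavaScript. It assumes documents shaped like {class_id, student_id, scores: [{type, score}]}; the helpers and data are hypothetical.

```javascript
// Hypothetical helpers
const avg = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;
function groupBy(items, keyOf) {
  const m = new Map();
  for (const it of items) {
    const k = keyOf(it);
    if (!m.has(k)) m.set(k, []);
    m.get(k).push(it);
  }
  return m;
}

function classAverages(grades) {
  // $unwind: one row per element of scores
  const unwound = grades.flatMap((g) => g.scores.map((s) => ({ ...g, scores: s })));
  // $match: drop quiz scores
  const matched = unwound.filter((r) => r.scores.type !== "quiz");
  // $group by (class_id, student_id): average of that student's scores
  const perStudent = [...groupBy(matched, (r) => r.class_id + "|" + r.student_id).values()]
    .map((rows) => ({ class_id: rows[0].class_id, score: avg(rows.map((r) => r.scores.score)) }));
  // $group by class_id: average of the student averages
  const perClass = [...groupBy(perStudent, (r) => String(r.class_id)).values()]
    .map((rows) => ({ _id: { class_id: rows[0].class_id }, score: avg(rows.map((r) => r.score)) }));
  // $sort: {score: -1}
  return perClass.sort((a, b) => b.score - a.score);
}

// Made-up grades
const grades = [
  { class_id: 1, student_id: 1, scores: [{ type: "exam", score: 80 }, { type: "quiz", score: 10 }] },
  { class_id: 1, student_id: 2, scores: [{ type: "exam", score: 60 }, { type: "homework", score: 100 }] },
  { class_id: 2, student_id: 1, scores: [{ type: "exam", score: 90 }] },
];
console.log(classAverages(grades));
```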
Map/Reduce
Batch processing of data and aggregation operations
Where GROUP BY was used in SQL
Input from a collection and output going to a collection
Map/reduce (II)
Courses attended per individual
var map = function(){
  for (var i = 0; i < this.attendants.length; i++){
    emit(this.attendants[i].name, 1);
  }
}
Map/reduce (III)
Courses attended per individual
var reduce = function(key, values){
  var sum = 0;
  for (var i = 0; i < values.length; i++){
    sum += values[i];
  }
  return sum;
}
Map/reduce (IV)
Courses attended per individual
db.gul.mapReduce(map, reduce, {out: {inline:1}, query: {/* initial query */}});
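To show the contract between map and reduce, here is a tiny in-memory runner for the same "courses attended per individual" job. It is a sketch, not how the server schedules work. In it, emit groups values by key, and reduce is only called for keys that have more than one value, as the server also does. Because the server may call reduce again on its own earlier output, reduce must return the same kind of value it receives (here, a count) and be associative. The runner, the emit parameter and the data are hypothetical.

```javascript
// Hypothetical runner: calls map with `this` bound to each document and a
// local emit(), then reduces every key that received more than one value.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  const results = [];
  for (const [key, values] of emitted) {
    results.push({ _id: key, value: values.length > 1 ? reduce(key, values) : values[0] });
  }
  return results;
}

// Same logic as the slides; emit is passed in instead of being a global.
const map = function (emit) {
  this.attendants.forEach((a) => emit(a.name, 1));
};
const reduce = function (key, values) {
  return values.reduce((sum, v) => sum + v, 0);
};

// Made-up courses
const courses = [
  { name: "Full Metal Mongo", attendants: [{ name: "ana" }, { name: "luis" }] },
  { name: "Crafty", attendants: [{ name: "ana" }] },
];
console.log(runMapReduce(courses, map, reduce)); // ana attended 2 courses, luis 1
```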
Get all the courses attended by each individual
Distinct tags and count
Schema design
Schema Design
A function of the data and the use case
Decisions: # of collections, embedding or linking, indexes, sharding
Relationships
Types: 1:1 (person:resume), 1:n (city:person, post:comments), m:n (teacher:student)
Doc limit: 16MB
Examples: school, blog
Transactions
No transactions (only single-document writes are atomic)
Options: redesign the schema, implement them in software, or tolerate no transactions
Indexes
Indexes
Objective: query optimization
Used in the query itself and/or the ordering
B-Tree indexes
_id index is automatic (unique)
db.gul.ensureIndex({ name:1 })
db.gul.getIndexes()
db.gul.stats() //Size of the index
Indexes (II)
For arrays, the index is multikey (one index entry per array element)
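In a multikey index, each array element gets its own index entry, and all of them point back to the same document. The sketch below builds such an index on attendants.name, with a Map from key to document _ids standing in for the server's B-tree. The helper and data are hypothetical, and the sketch skips the server's handling of empty arrays and missing fields.

```javascript
// Hypothetical: build a multikey index on "attendants.name".
// Each array element contributes its own entry; repeated values inside
// one document collapse to a single entry for that document.
function buildMultikeyIndex(docs) {
  const index = new Map(); // key -> list of _id
  for (const doc of docs) {
    const keys = new Set((doc.attendants || []).map((a) => a.name));
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// Made-up documents
const gul = [
  { _id: 1, attendants: [{ name: "ana" }, { name: "luis" }] },
  { _id: 2, attendants: [{ name: "ana" }] },
  { _id: 3, attendants: [] },
];
const idx = buildMultikeyIndex(gul);
console.log(idx.get("ana")); // [ 1, 2 ]
console.log(idx.size);       // 2 (keys: ana, luis)
```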
Indexes types
default
unique
db.gul.ensureIndex({name:1}, {unique:1})
sparse
db.gul.ensureIndex({name:1}, {sparse:1})
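The two options interact when a field is missing. A unique index that is not sparse indexes a missing field as null, so a second document without name violates uniqueness. A unique sparse index skips documents that lack the field. Below is a sketch of that check on in-memory documents; the violatesUnique helper and the data are hypothetical.

```javascript
// Hypothetical: would these documents break a unique index on `field`?
function violatesUnique(docs, field, { sparse = false } = {}) {
  const seen = new Set();
  for (const doc of docs) {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    if (!present && sparse) continue;         // sparse: not indexed at all
    const key = present ? doc[field] : null;  // non-sparse: missing -> null
    if (seen.has(key)) return true;
    seen.add(key);
  }
  return false;
}

// Made-up documents: two of them have no name
const docs = [{ name: "Crafty" }, { presenter: "isra" }, { presenter: "javi" }];
console.log(violatesUnique(docs, "name"));                   // true  (two nulls)
console.log(violatesUnique(docs, "name", { sparse: true })); // false
```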
Indexes options
dropDups: drop documents with duplicate keys when creating the index (used with unique)
background: built in the background on the primary of the replica set, in the foreground on secondaries
Hints
db.gul.find().hint({name:1})
Geospatial indexes
2d only; compound indexes may be used
db.places.ensureIndex({'loc':'2d'})
db.places.find({loc:{ $near:[20,40], $maxDistance:2} }).limit(50)
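A 2d index works on a flat plane. $near returns documents ordered by Euclidean distance from the given point, $maxDistance uses the same units as the coordinates, and limit caps the number of results. Below is an in-memory sketch of that query; the near helper and the places are hypothetical, and the sketch treats the maxDistance boundary as inclusive, which is an assumption.

```javascript
// Hypothetical in-memory version of
// db.places.find({loc: {$near: [x, y], $maxDistance: d}}).limit(n)
// for a flat (2d) index: Euclidean distance, nearest first.
function near(places, [x, y], maxDistance, limit) {
  return places
    .map((p) => ({ ...p, dist: Math.hypot(p.loc[0] - x, p.loc[1] - y) }))
    .filter((p) => p.dist <= maxDistance) // boundary inclusive (assumption)
    .sort((a, b) => a.dist - b.dist)
    .slice(0, limit);
}

// Made-up places
const places = [
  { name: "A", loc: [21, 40] },   // distance 1
  { name: "B", loc: [20, 41.5] }, // distance 1.5
  { name: "C", loc: [23, 44] },   // distance 5
  { name: "D", loc: [20, 40.5] }, // distance 0.5
];
console.log(near(places, [20, 40], 2, 50).map((p) => p.name)); // [ 'D', 'A', 'B' ]
```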
DBAs stuff
Backups
mongodump / mongorestore
Copy the data files using your own software (requires journaling enabled)
Commands
db.gul.runCommand('compact')
db.runCommand({compact:'gul'})
//Run a script from the command line
mongo < path/to/script.js
Profiler
Log queries / commands
mongod --profile 0/1/2 --slowms 100
//0: off
//1: slow queries only
//2: all queries
//slowms: threshold for level 1
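The three levels combine with slowms as follows. The tiny sketch below decides whether an operation taking a given number of milliseconds would be recorded in system.profile; the isProfiled helper is hypothetical.

```javascript
// Hypothetical: would an operation taking `millis` ms be written to the
// profiler (system.profile) at the given level?
function isProfiled(level, slowms, millis) {
  switch (level) {
    case 0: return false;           // profiler off
    case 1: return millis > slowms; // only operations slower than slowms
    case 2: return true;            // every operation
    default: throw new Error("level must be 0, 1 or 2");
  }
}

console.log(isProfiled(1, 100, 250)); // true
console.log(isProfiled(1, 100, 20));  // false
```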
Profiler (II)
From the mongo shell
db.getProfilingLevel() // 0-1-2
db.getProfilingStatus() // { "was" : 0, "slowms" : 100 }
db.setProfilingLevel(1,1000) // level 1, slowms 1000
Kill operations
db.currentOp(): operations in progress
db.killOp(op_id)
Don't kill: write ops on secondaries, compact, internal ops
Security tips
Security
mongod/mongos --auth //not from localhost
Add user: use admin; db.addUser(user, passwd, [readOnly])
Auth: use admin; db.auth(user, passwd)
Types of users
admin: created in the admin db; access to all dbs
regular: access to a specific db, read/write or readOnly
Intra-cluster security
For replica sets: non-authenticated (faster) communication among the nodes
Replica sets
Write concern
Journal: list of operations (inserts, updates) done, saved to disk (permanent)
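Conceptually, the journal is a write-ahead log. An operation is appended to the on-disk log before it is applied to the data, so after a crash the data can be rebuilt by replaying the log. The minimal in-memory sketch below shows that idea. It is not MongoDB's journal format, all names are hypothetical, and it replays from the start rather than from the last checkpoint to keep it short.

```javascript
// Hypothetical write-ahead journal: every write is appended to `journal`
// (standing in for the on-disk log) before it is applied in memory.
class JournaledStore {
  constructor() {
    this.journal = [];     // durable log of operations
    this.data = new Map(); // in-memory data, lost on a crash
  }
  insert(id, doc) { this.write({ op: "insert", id, doc }); }
  update(id, fields) { this.write({ op: "update", id, fields }); }
  write(entry) {
    this.journal.push(entry);       // 1. log first
    applyEntry(this.data, entry);   // 2. then apply
  }
}

function applyEntry(data, entry) {
  if (entry.op === "insert") data.set(entry.id, { ...entry.doc });
  else if (entry.op === "update") data.set(entry.id, { ...data.get(entry.id), ...entry.fields });
}

// Recovery after a crash: rebuild the data by replaying the log in order.
function recover(journal) {
  const data = new Map();
  for (const entry of journal) applyEntry(data, entry);
  return data;
}

const store = new JournaledStore();
store.insert(1, { name: "Full Metal Mongo" });
store.update(1, { presenter: "isra" });
const rebuilt = recover(store.journal); // only the journal survives the crash
console.log(rebuilt.get(1)); // { name: 'Full Metal Mongo', presenter: 'isra' }
```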
Sharding
What is sharding?
Scalability
Horizontal partitioning of a database
A BSON document is stored in ONE shard
Shard key: not unique
No unique fields in the collection
Mongo offers auto-sharding
What is sharding?
Auto balancing
Easy addition of new machines
Up to 1k nodes
No single point of failure
Automatic failover
Select a convenient shard key
Sharding config
Config servers are needed: they store metadata about chunks (mongod --configsvr)
Routers are needed: mongos (accessed by the apps)
Sharding operations
chunk: range of the shard key stored in one shard
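Each chunk covers a contiguous range of shard-key values, [min, max), so a router can find the shard for a key by searching the sorted chunk boundaries. The sketch below does that lookup with a binary search over a numeric shard key. The chunk table and shard names are made up; in a real cluster this metadata lives on the config servers.

```javascript
// Hypothetical chunk table, sorted by min; each chunk owns [min, max).
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 500, shard: "shard1" },
  { min: 500, max: Infinity, shard: "shard2" },
];

// Binary search for the chunk whose range contains `key`.
function shardFor(key, table) {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = table[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c.shard;
  }
  throw new Error("no chunk covers key " + key);
}

console.log(shardFor(42, chunks));   // shard0
console.log(shardFor(100, chunks));  // shard1 (min is inclusive)
console.log(shardFor(9999, chunks)); // shard2
```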
Sharding diagram
via: http://www.cloudifysource.org/2012/03/25/petclinic_deepdive.html