A Crash Course in MongoDB
PyCon US 2013
Hi. I'm Andy Dirnberger
github.com/dirn @dirnonline Engineering @ CBS Local
dirn@dirnonline.com
So what is MongoDB?
http://mongodb.org
MongoDB is...
Document-oriented
JSON-like (BSON)
Dynamic schema*
Scalable
Open Source (GNU AGPL v3.0)**

*not the same thing as schemaless
**drivers use the Apache license
Metrics Logging* Messaging Queues Blog Content Management Anything you want
*Capped collections behave as fixed-size FIFO queues
*TTL collections have a special index that will automatically remove old data
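As a rough analogy only, the fixed-size FIFO behavior of a capped collection resembles a bounded deque in Python. The snippet below does not talk to MongoDB; the size of 3 and the document shape are made up for illustration.

```python
from collections import deque

# Hypothetical stand-in for a capped collection limited to 3 documents.
capped = deque(maxlen=3)

for n in range(5):
    capped.append({'event': n})  # once full, the oldest entry is dropped

remaining = [doc['event'] for doc in capped]
print(remaining)
```

Only the three most recent entries survive, which is why capped collections suit logging.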
To run MongoDB...
http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/
Using MongoDB with Python: PyMongo
https://github.com/mongodb/mongo-python-driver
The driver...
Install it:
$ pip install pymongo
Packages:
pymongo bson gridfs
http://api.mongodb.org/python/current/
BSON supports...
http://bsonspec.org/
50d4dce70ea5fae6fb84e44b
4-byte timestamp (50d4dce7)
3-byte machine identifier (0ea5fa)
2-byte process ID (e6fb)
3-byte counter (84e44b)
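The breakdown above can be reproduced with plain string slicing. This is a minimal sketch that assumes the 4/3/2/3-byte layout shown on the slide; `split_object_id` is a hypothetical helper, not part of the bson package.

```python
from datetime import datetime, timezone

def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its four parts."""
    if len(hex_id) != 24:
        raise ValueError('an ObjectId is 12 bytes, i.e. 24 hex characters')
    return {
        # The first 4 bytes are seconds since the Unix epoch.
        'timestamp': datetime.fromtimestamp(int(hex_id[0:8], 16), tz=timezone.utc),
        'machine': hex_id[8:14],
        'pid': int(hex_id[14:18], 16),
        'counter': int(hex_id[18:24], 16),
    }

parts = split_object_id('50d4dce70ea5fae6fb84e44b')
print(parts['timestamp'].isoformat())
print(parts['machine'], parts['pid'], parts['counter'])
```

Because the timestamp comes first, ObjectIds sort roughly by creation time.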
Connect with MongoClient
>>> from pymongo import MongoClient
>>>
>>> MongoClient(host='localhost', port=27017)
MongoClient('localhost', 27017)
>>>
>>> MongoClient(host='mongodb://localhost:27017')
MongoClient('localhost', 27017)
>>>
>>> MongoClient('mongodb://localhost:27017').pycon
Database(MongoClient('localhost', 27017), u'pycon')
Querying
Documents can be retrieved with...
>>> coll = db.talks
>>> coll.find_one({
...     'name': 'A Crash Course in MongoDB'})
{
    u'track': 2,
    u'_id': ObjectId('5145e5380ea5fa321fa97064'),
    u'speaker': u'Andy Dirnberger',
    u'name': u'A Crash Course in MongoDB',
    u'language': u'python',
    u'time': datetime.datetime(2013, 3, 17, 14, 30)
}
>>> coll.find({
...     'track': 2,
...     'time': {'$gte': datetime(2013, 3, 17),
...              '$lt': datetime(2013, 3, 18)}},
...     {'name': 1})
<pymongo.cursor.Cursor object at 0x10da4ed90>
http://docs.mongodb.org/manual/reference/operators/#query-selectors
>>> for doc in cursor:
...     print doc
...
{u'_id': ObjectId('5145e4f00ea5fa321fa97062'), u'name': u'Elasticsearch (Part 2)'}
{u'_id': ObjectId('5145e5200ea5fa321fa97063'), u'name': u'Going beyond the Django ORM'}
{u'_id': ObjectId('5145e5380ea5fa321fa97064'), u'name': u'A Crash Course in MongoDB'}
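To make the range selector used in the find() call concrete, here is an in-memory sketch of what `$gte` and `$lt` mean. It is not how the server evaluates queries; `in_range` is a hypothetical helper that handles only these two operators.

```python
from datetime import datetime

# Hypothetical helper: evaluates only the $gte and $lt selectors, in memory.
def in_range(value, selector):
    checks = {
        '$gte': lambda v, bound: v >= bound,
        '$lt': lambda v, bound: v < bound,
    }
    return all(checks[op](value, bound) for op, bound in selector.items())

selector = {'$gte': datetime(2013, 3, 17), '$lt': datetime(2013, 3, 18)}

on_the_day = in_range(datetime(2013, 3, 17, 14, 30), selector)
next_day = in_range(datetime(2013, 3, 18), selector)
print(on_the_day, next_day)
```

The lower bound is inclusive and the upper bound exclusive, so the selector covers exactly one calendar day.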
http://api.mongodb.org/python/current/api/pymongo/cursor.html
Updating
>>> coll.remove({
...     'language': {'$in': ['php', 'node.js']}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}
>>> coll.remove({'language': {'$ne': 'python'}})
{
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 0
}
Documents can be inserted with...
>>> db.sessions.update(
...     {'track': 2},
...     {'track': 2,
...      'date': datetime(2013, 3, 17),
...      'order': 1,
...      'chair': 'Megan Speir',
...      'runner': 'Erik Bray'},
...     upsert=True)
{
    ...
    u'upserted': ObjectId('5145ecfd3f69a773554253e8'),
    u'n': 1,
    u'updatedExisting': False
}
save()
Works like update(..., upsert=True) if _id is specified, or like insert() if it's not
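The save() rule above can be sketched on top of update() and insert(). The code below runs against a hypothetical in-memory `FakeCollection` rather than a real PyMongo collection, and the `save` function here is an illustration of the described behavior, not the driver's implementation.

```python
class FakeCollection(object):
    """Hypothetical in-memory stand-in exposing only the two calls used below."""

    def __init__(self):
        self.docs = {}
        self._next_id = 1

    def insert(self, doc):
        doc.setdefault('_id', self._next_id)
        self._next_id += 1
        self.docs[doc['_id']] = dict(doc)
        return doc['_id']

    def update(self, spec, doc, upsert=False):
        if spec['_id'] in self.docs or upsert:
            self.docs[spec['_id']] = dict(doc)


def save(coll, doc):
    """Mimic the described save() behavior on top of update() and insert()."""
    if '_id' in doc:
        coll.update({'_id': doc['_id']}, doc, upsert=True)
        return doc['_id']
    return coll.insert(doc)


coll = FakeCollection()
new_id = save(coll, {'track': 2})                     # no _id: inserted
save(coll, {'_id': new_id, 'track': 2, 'order': 1})   # _id given: replaced
print(coll.docs[new_id])
```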
find_and_modify()
Modifies the document in the database and returns the original by default, or the updated document with new=True
A note about update()
>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'num_talks': 3})
{...}
>>>
>>> # The document has been replaced
>>> db.sessions.find_one({
...     '_id': ObjectId('5145ecfd3f69a773554253e8')})
{
    u'_id': ObjectId('5145ecfd3f69a773554253e8'),
    u'num_talks': 3
}
Using update operators to target specific fields...
>>> db.sessions.update(
...     {'_id': ObjectId('5145ecfd3f69a773554253e8')},
...     {'$set': {'num_talks': 3}})
{
    u'updatedExisting': True,
    u'connectionId': 8,
    u'ok': 1.0,
    u'err': None,
    u'n': 1
}
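The difference between the two update() calls can be modeled on plain dictionaries. This is an in-memory sketch only; `apply_update` is a hypothetical helper that covers just full replacement and `$set`, and the sample document is made up.

```python
def apply_update(doc, update):
    """Hypothetical in-memory model covering only replacement and $set."""
    if '$set' in update:
        changed = dict(doc)
        changed.update(update['$set'])  # only the named fields change
        return changed
    # No operators: everything except _id is replaced.
    replaced = {'_id': doc['_id']}
    replaced.update(update)
    return replaced

session = {'_id': 1, 'track': 2, 'chair': 'Megan Speir'}
replaced = apply_update(session, {'num_talks': 3})
patched = apply_update(session, {'$set': {'num_talks': 3}})
print(replaced)
print(patched)
```

The first result loses `track` and `chair`, while the second keeps them and adds `num_talks`.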
http://docs.mongodb.org/manual/reference/operators/#update
Write concern...
w
The number of servers that must acknowledge the write, including the primary
wtimeout
The timeout for the write; without it, the write could block forever
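One way to set both options is through the connection string, where they appear as `w` and `wtimeoutMS`. The sketch below only builds the string; `uri_with_write_concern` is a hypothetical helper, and the values 2 and 1000 are arbitrary examples. The result could then be passed to MongoClient.

```python
from urllib.parse import urlencode

def uri_with_write_concern(host, port, w, wtimeout_ms):
    """Hypothetical helper: append w and wtimeoutMS to a MongoDB URI."""
    options = urlencode({'w': w, 'wtimeoutMS': wtimeout_ms})
    return 'mongodb://{}:{}/?{}'.format(host, port, options)

# Require acknowledgement from 2 servers, giving up after 1000 ms.
uri = uri_with_write_concern('localhost', 27017, w=2, wtimeout_ms=1000)
print(uri)
```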
http://docs.mongodb.org/manual/core/write-operations/#write-concern
Indexes
create_index()
Unconditionally creates an index on one or more fields
ensure_index()
Works like create_index() except the driver will remember that the index was already made
Indexes...
Are directional
>>> db.sessions.ensure_index([
...     ('date', pymongo.ASCENDING),
...     ('order', pymongo.DESCENDING)])
u'date_1_order_-1'
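The returned name encodes each field and its direction. The sketch below reproduces that naming pattern without a database; `default_index_name` is a hypothetical helper, and the constants are redefined locally with the values the pymongo constants hold.

```python
ASCENDING, DESCENDING, GEO2D = 1, -1, '2d'  # values of the pymongo constants

def default_index_name(keys):
    """Hypothetical helper: join each field and direction with underscores."""
    return '_'.join('{}_{}'.format(field, direction) for field, direction in keys)

compound = default_index_name([('date', ASCENDING), ('order', DESCENDING)])
geo = default_index_name([('loc', GEO2D)])
print(compound)
print(geo)
```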
Can be sparse
Only documents containing all fields in the index will be included in the index
Explain plans...
{
    'cursor' : '<Cursor Type and Index>',
    'n' : <num (documents matching query)>,
    'nscanned': <num (documents scanned)>,
    'scanAndOrder': <boolean>,
}
You want n and nscanned to be as close together as possible
If scanAndOrder is True, the index can't be used for sorting
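The two rules of thumb above can be turned into a small check over an explain result. This is a sketch on a hand-written dictionary: `review_plan` is a hypothetical helper, and the sample plan values are made up rather than taken from a real query.

```python
def review_plan(plan):
    """Hypothetical helper: flag the two warning signs described above."""
    warnings = []
    if plan['nscanned'] > plan['n']:
        warnings.append('scanned {} documents to return {}'.format(
            plan['nscanned'], plan['n']))
    if plan.get('scanAndOrder'):
        warnings.append('sort could not use the index')
    return warnings

# Made-up explain output for illustration only.
plan = {'cursor': 'BasicCursor', 'n': 3, 'nscanned': 1200, 'scanAndOrder': True}
warnings = review_plan(plan)
for warning in warnings:
    print(warning)
```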
http://docs.mongodb.org/manual/reference/explain/
GridFS
http://docs.mongodb.org/manual/applications/gridfs/
To use GridFS...
>>> import gridfs
>>> fs = gridfs.GridFS(db)
>>> file_id = fs.put('PyCon 2013', city='Santa Clara', state='CA')
>>> file = fs.get(file_id)
>>> file.read()
'PyCon 2013'
>>> file.upload_date
datetime.datetime(2013, 3, 17, 21, 30, 0, 0)
>>> file.city, file.state
(u'Santa Clara', u'CA')
GridFS is versioned...
get_last_version()
Gets the most recent file matching the query
get_version()
Works like get_last_version() except it can request specific versions of a file
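The version numbering can be pictured as indexing into a list of uploads ordered by upload date: 0 is the oldest and -1 the most recent. The sketch below is an in-memory analogy with made-up contents; these two functions merely share names with the GridFS methods and do not call them.

```python
# Hypothetical uploads of the same filename, oldest first.
uploads = ['draft', 'revised', 'final']

def get_version(version=-1):
    """Index into the uploads: 0 is the oldest, -1 the most recent."""
    return uploads[version]

def get_last_version():
    """Equivalent to asking for version -1."""
    return get_version(-1)

latest = get_last_version()
first = get_version(0)
print(latest, first)
```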
Geospatial
Create an index...
>>> db.tracks.update(
...     {'_id': ObjectId('5145eb4e0ea5fa321fa97065')},
...     {'loc': [37.3542, 121.9542]})
{...}
>>> db.tracks.ensure_index([
...     ('loc', pymongo.GEO2D)])
u'loc_2d'
http://docs.mongodb.org/manual/applications/geospatial-indexes/
>>> db.tracks.find({'loc': [37.3542, 121.9542]})
<pymongo.cursor.Cursor object at 0x10e14eb90>
>>> db.tracks.find({
...     'loc': {'$near': [37.3542, 121.9542]}})
<pymongo.cursor.Cursor object at 0x10e14edd0>
{'$center': [center, radius]}
{'$box': [[x1, y1], [x2, y2]]}
{'$polygon': [[x1, y1], [x2, y2], [x3, y3]]}
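To illustrate what the circle and box shapes select, here is a flat 2D membership test in plain Python. It assumes simple planar geometry and is not the server's implementation; `in_center` and `in_box` are hypothetical helpers, and the shape coordinates are arbitrary.

```python
import math

def in_center(point, center, radius):
    """True if point lies within radius of center (flat 2D distance)."""
    return math.hypot(point[0] - center[0], point[1] - center[1]) <= radius

def in_box(point, corner_a, corner_b):
    """True if point lies inside the rectangle spanned by two opposite corners."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    inside_x = min(x1, x2) <= point[0] <= max(x1, x2)
    inside_y = min(y1, y2) <= point[1] <= max(y1, y2)
    return inside_x and inside_y

loc = [37.3542, 121.9542]  # the point stored in the example above
near_circle = in_center(loc, [37.0, 122.0], 1)
inside = in_box(loc, [37, 121], [38, 122])
outside = in_box(loc, [0, 0], [1, 1])
print(near_circle, inside, outside)
```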
Anything else...
Aggregation Framework
Helps with simple map reduce queries, but its results are subject to the same 16MB size limit as documents
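A pipeline is a list of stages expressed as dictionaries. The sketch below defines one that would count Python talks per track and, since no server is assumed here, computes the same result in memory for comparison. The sample documents are made up; with a live collection the pipeline would be passed to coll.aggregate(pipeline).

```python
from collections import Counter

# A pipeline that would count talks per track; field names follow the slides.
pipeline = [
    {'$match': {'language': 'python'}},
    {'$group': {'_id': '$track', 'count': {'$sum': 1}}},
]

# Made-up documents, and the same computation done in memory for comparison.
talks = [
    {'track': 2, 'language': 'python'},
    {'track': 2, 'language': 'python'},
    {'track': 1, 'language': 'python'},
    {'track': 1, 'language': 'php'},
]
counts = Counter(t['track'] for t in talks if t['language'] == 'python')
print(dict(counts))
```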
Libraries
http://api.mongodb.org/python/current/tools.html
Thank you!
dirn.it/PyCon2013
Questions?