PostgreSQL As A NoSQL Database
Welcome!
I'm Christophe. PostgreSQL person since 1997. Consultant with PostgreSQL Experts, Inc. cpettus@pgexperts.com thebuild.com @xof on Twitter.
A note on NoSQL.
Worst. Term. Ever. It's true that all modern schemaless
databases do not use SQL, but
Schematic.
A schema is a fixed (although mutable
over time) definition of the data.
Database to schema (unfortunate term) to table to field/column/attribute.
Individual fields can be optional (NULL). Adding new columns requires a schema
change.
Rock-n-Roll!
Schemaless databases store documents
rather than rows.
They have internal structure, but that structure is per document. No fields! No schemas! Make up whatever
you like!
Grew out of the need for persistent object storage and impatience with the (perceived)
limitations of relational databases and object-relational managers.
Let us never speak of this again. There's a lot to talk about in schemaless vs
traditional relational databases.
But let's not. Today's topic: If you want to store schemaless data in PostgreSQL, how do you do it? And what can you expect?
The application knows what to expect in a document and how to react if it doesn't get it.
PostgreSQL has you covered. Not one, not two, but three different
document types:
XML
It seemed like a good idea at the time.
Been around since the mid-1990s. Hierarchical structured data based on
SGML.
XML, your dad's document language. Can specify XML schemas using DTDs. No one does this. Can do automatic transformations of XML
into other markups using XSLT.
XML Support in PostgreSQL. Built-in type. Can handle documents up to 2 gigabytes. A healthy selection of XML operators. xpath in particular. Very convenient XML export functions. Great for external XML requirements.
XML Indexing.
There isn't any. Unless you build it yourself with an
expression index.
hstore
The hidden gem of contrib/
A hierarchical storage type specific to
PostgreSQL.
Maps string keys to string values, or to other hstore values. Contrib module; not part of the
PostgreSQL core.
hstore functions
Lots and lots and lots of hstore functions. h -> 'a' (get value for key a). h ? 'a' (does h contain key a?). h @> 'a=>2' (does h contain key a with value 2?). Many others.
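As a rough mental model only, the three operators above behave like the following operations on a Python dict of strings. This is an analogy, not how hstore is implemented, and the helper names (fetch, has_key, contains) are made up for illustration.

```python
# A plain-dict analogy for three hstore operators (illustration only).

def fetch(h, key):
    """Like h -> 'key': the value for key, or None (NULL) if the key is absent."""
    return h.get(key)

def has_key(h, key):
    """Like h ? 'key': does h contain the key?"""
    return key in h

def contains(h, other):
    """Like h @> other: is every key/value pair of other also present in h?"""
    return all(k in h and h[k] == v for k, v in other.items())

h = {"a": "2", "b": "x"}  # values are strings, as in hstore
print(fetch(h, "a"), has_key(h, "c"), contains(h, {"a": "2"}))
```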
hstore indexing.
Can create GiST and GIN indexes over
hstore values.
Indexes the whole hierarchy, not just one key. Accelerates @>, ?, ?& and ?| operators. Can also build expression indexes.
JSON
All the cool kids are doing it.
JavaScript Object Notation. JavaScript's data structure declaration
format, turned into a protocol.
Very comfortable for Python and Ruby. MongoDB's native data storage type.
JSON Indexing.
Expression indexing. Can also treat as a text string for strict
comparison.
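Strict comparison here means comparing the stored text, so two documents with the same content but different key order or whitespace are not equal as strings. A minimal Python sketch of that effect, using only the standard json module. Normalizing the text before storing it, as the canonical helper does, is one possible workaround and an assumption of this sketch, not something the talk prescribes.

```python
import json

a = '{"id": 1, "name": "Ann"}'
b = '{"name":"Ann","id":1}'

# Same document once parsed, but different as raw text.
same_document = json.loads(a) == json.loads(b)
same_text = a == b

def canonical(text):
    """Re-serialize with sorted keys and fixed separators so equal documents yield equal text."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))

print(same_document, same_text, canonical(a) == canonical(b))
```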
PL/V8!
The V8 JavaScript engine from Google is
available as an embedded language.
JavaScript deals with JSON very well, as you'd expect.
Not part of core or contrib; needs to be
built and installed separately.
PL/V8 ProTips
Use the static V8 engine that comes with
PL/V8.
Function is compiled by V8 on first use. Now that we got rid of SQL injection
attacks, we now have JSON injection attacks.
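One way to read the JSON injection warning: building a document by pasting user input into a JSON string lets the input add or override keys, much as string-built SQL does. A hedged sketch in Python follows; the field names (name, is_admin) are hypothetical. Serializing with json.dumps keeps the input as data.

```python
import json

# Hypothetical hostile input that closes the string and smuggles in another key.
user_input = 'x", "is_admin": true, "y": "z'

# Unsafe: string concatenation lets the input change the document's structure.
unsafe_text = '{"is_admin": false, "name": "' + user_input + '"}'
unsafe_doc = json.loads(unsafe_text)  # duplicate key: the later value wins

# Safer: build a data structure and let the serializer do the escaping.
safe_text = json.dumps({"is_admin": False, "name": user_input})
safe_doc = json.loads(safe_text)

print(unsafe_doc["is_admin"], safe_doc["is_admin"])
```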
Schemaless Strategies
Create single-field tables with only a
hierarchical type.
Wrap up the (very simple) SQL to provide an object API. Create indexes to taste. Maybe extract fields if you need to JOIN. Profit!
CREATE OR REPLACE FUNCTION get_json_key(structure JSON, key TEXT)
RETURNS TEXT AS $get_json_key$
    // Return the JSON text of one top-level key, or null if the input is not an object.
    var js_object = structure;
    if (typeof js_object != 'object')
        return null;
    return JSON.stringify(js_object[key]);
$get_json_key$ IMMUTABLE STRICT LANGUAGE plv8;
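CREATE TABLE blog (post json);
CREATE INDEX post_pk_idx ON blog ((get_json_key(post, 'post_id')::BIGINT));
-- Caution: the text-to-timestamptz cast is not IMMUTABLE, so PostgreSQL may reject
-- this index as written; an immutable wrapper function is one way around that.
CREATE INDEX post_date_idx ON blog ((get_json_key(post, 'post_date')::TIMESTAMPTZ));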
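The "wrap up the SQL to provide an object API" step could look something like the sketch below. Everything here is an assumption made for illustration: the DocumentStore class and its method names are hypothetical, and the execute argument is modelled on a psycopg2 cursor's .execute(sql, params) method. Only the blog table and the get_json_key function come from the slides.

```python
import json

class DocumentStore:
    """Tiny object API over a one-column table such as: CREATE TABLE blog (post json)."""

    def __init__(self, execute, table="blog", column="post"):
        # execute(sql, params) is any callable, e.g. a psycopg2 cursor's .execute method.
        self.execute = execute
        self.table = table
        self.column = column

    def save(self, document):
        """Serialize the document and insert it as one row."""
        sql = "INSERT INTO {0} ({1}) VALUES (%s)".format(self.table, self.column)
        self.execute(sql, (json.dumps(document),))

    def find_by_key(self, key, value):
        """Look up documents by one top-level key via the get_json_key accessor."""
        # get_json_key returns JSON text, so the value is compared in serialized form.
        sql = "SELECT {1} FROM {0} WHERE get_json_key({1}, %s) = %s".format(
            self.table, self.column)
        self.execute(sql, (key, json.dumps(value)))

# Stand-in for a database cursor so the sketch runs without a server.
calls = []
store = DocumentStore(lambda sql, params: calls.append((sql, params)))
store.save({"post_id": 1, "title": "Hello"})
store.find_by_key("post_id", 1)
print(calls)
```

Values always travel as query parameters, never pasted into the SQL text; only the table and column names, which the application controls, are formatted into the statement.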
Some Numbers.
When all else fails, measure.
Schemaless Shootout!
A very basic document structure: id, name, company, address1, address2,
city, state, postal code.
The Competitors!
Traditional relational schema. hstore (GiST and GIN indexes). XML. JSON. One column per table for these. MongoDB.
Timing Harness.
Scripts written in Python. psycopg2 2.4.6 for PostgreSQL interface. pymongo 2.4.2 for MongoDB interface.
Indexing Philosophy
For relational, index on primary key. For hstore, index using GiST and GIN (and
none).
For MongoDB, index on primary key. Indexes created before records loaded.
The Sophisticated Database Tuning Philosophy. None. Stock PostgreSQL 9.2.2, from source. No changes to postgresql.conf. Stock MongoDB 2.2, from MacPorts. Fire it up, let it go.
We measure total load time. (COPY will be much, much, much faster. mongoimport too, most likely.)
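A load measurement of this kind can be sketched in a few lines of Python. This is not the talk's actual harness: measure_load and insert_one are hypothetical names, and the sketch assumes records are inserted one at a time by whatever callable is passed in (for a real run, something that issues an INSERT or a MongoDB insert).

```python
import time

def measure_load(records, insert_one):
    """Insert every record with insert_one; return (elapsed_seconds, records_per_second)."""
    start = time.perf_counter()
    count = 0
    for record in records:
        insert_one(record)
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate

# Stand-in loader: appends to a list instead of talking to a database.
loaded = []
elapsed, rate = measure_load(({"id": i} for i in range(1000)), loaded.append)
print("%d records in %.4fs (%.0f records/second)" % (len(loaded), elapsed, rate))
```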
Chart: load rate in records/second for relational, hstore, XML, JSON, and MongoDB.
Observations.
No attempt made to speed up PostgreSQL. Synchronous commit, checkpoint tuning,
etc.
GIN indexes are really slow to build. The XML xpath function is probably the
culprit for its load time.
Chart: on-disk size, split into data and index, for relational, hstore, XML, JSON, and MongoDB.
Observations.
GIN indexes are really big on disk. PostgreSQL's relational data storage is very
efficient.
MongoDB certainly likes its disk space. The padding factor was 1, so it wasn't that.
Next Test: Query on Primary Key
For a sample of 100 documents, query a single document by primary key.
Results not fetched. For PostgreSQL, time of .execute() method from Python.
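A hedged sketch of how such a query test might be timed in Python: pick a random sample of ids, then time only the call that issues each query. The function time_queries and its parameters are hypothetical; in a real run, run_query would wrap something like a cursor's .execute() call and would not fetch the rows.

```python
import random
import time

def time_queries(all_ids, run_query, sample_size=100, seed=0):
    """Time run_query(doc_id) for a random sample of ids; return per-query seconds."""
    rng = random.Random(seed)  # fixed seed so every backend sees the same sample
    sample = rng.sample(list(all_ids), sample_size)
    timings = []
    for doc_id in sample:
        start = time.perf_counter()
        run_query(doc_id)  # issue the query only; results are not fetched
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in query function so the sketch runs without a database.
seen = []
timings = time_queries(range(10000), seen.append)
print("mean seconds per query: %.9f" % (sum(timings) / len(timings)))
```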
Chart: primary-key query results for relational, hstore, XML, JSON, and MongoDB.
Chart: the same results for relational, XML, JSON, and MongoDB only.
Chart: the same results for hstore, hstore (GiST), and hstore (GIN).
Observations.
B-tree indexes kick ass. GiST and GIN not even in same league
for simple key retrieval.
Chart: results for relational, hstore, XML, JSON, and MongoDB.
Chart: results for relational, hstore, hstore (GiST), hstore (GIN), and MongoDB.
Chart: results for XML and JSON.
Observations.
GiST and GIN accelerate every field, not
just the primary key.
MongoDB's grotesquely bloated disk Not that there's anything wrong with that.
Some Conclusions.
PostgreSQL does pretty well as a
schemaless database.
or use GiST and hstore if you want GIN might well be worth it for other cases.
Some Conclusions, 2.
Avoid doing full-table scans if you need to
use an accessor function.
Although hstores are not bad compared to xpath or a PL. Seriously consider hstore if you have the
flexibility.
Flame Bait!
MongoDB doesn't seem to be more
performant than PostgreSQL.
And you still get all of PostgreSQL's goodies. Larger documents will probably continue to
favor PostgreSQL.
Fire Extinguisher.
You can find workloads that prove any data storage technology is the right answer. dBase II included.
Be very realistic about your workload and data model, now and in the future.
Test, and test fairly with real-world data in real-world volumes.
Thank you!
thebuild.com @xof