PostgreSQL As A NoSQL Database

PostgreSQL can be used as a schemaless database through three different data types: XML, hstore, and JSON. XML, a format dating from the mid-1990s, has a built-in type with rich functionality but no native indexing, so it relies on expression indexes. hstore stores key/value pairs and has good indexing support through GiST and GIN indexes. JSON was added to the core in PostgreSQL 9.2 with basic functionality, with more coming in 9.3. In benchmarks on 1.78 million records, primary-key lookups on the relational, XML and JSON versions (B-tree and expression indexes) were not significantly different from each other, MongoDB was no faster, and hstore with GiST/GIN indexes was far slower. For queries on a non-key field, GiST/GIN indexes on hstore helped, while full scans calling the XML and JSON accessor functions were very slow.

Uploaded by

Ivan Voras


PostgreSQL as a Schemaless Database.

Christophe Pettus
PostgreSQL Experts, Inc.
PgDay FOSDEM 2013

Welcome!
I'm Christophe.
PostgreSQL person since 1997.
Consultant with PostgreSQL Experts, Inc.
cpettus@pgexperts.com
thebuild.com
@xof on Twitter.

What's on the menu?

What is a schemaless database?
How can you use PostgreSQL to store schemaless data?
How do the various schemaless options perform?

A note on NoSQL.
Worst. Term. Ever.
It's true that the modern schemaless databases don't use SQL, but neither did Postgres before it became PostgreSQL. (Remember QUEL?)
The defining characteristic is the lack of a fixed schema.

Schematic.
A schema is a fixed (although mutable over time) definition of the data.
Database to schema (unfortunate term) to table to field/column/attribute.
Individual fields can be optional (NULL).
Adding new columns requires a schema change.

Rock-n-Roll!
Schemaless databases store documents rather than rows.
They have internal structure, but that structure is per document.
No fields! No schemas! Make up whatever you like!

We are not amused.

Culturally, very different from the "glass house" data warehouse model.
Grew out of the need for persistent object storage and impatience with the (perceived) limitations of relational databases and object-relational mappers.

Let us never speak of this again.
There's a lot to talk about in schemaless vs. traditional relational databases.
But let's not.
Today's topic: If you want to store schemaless data in PostgreSQL, how can you? And what can you expect?

What is schemaless data?

Schemaless does not mean unstructured.
Each document (=record/row) is a hierarchical structure of arrays and key/value pairs.
The application knows what to expect in one of these, and how to react if it doesn't get it.
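To make "the application knows what to expect" concrete, here is a minimal Python sketch. The document shape and the `describe_contact` helper are hypothetical: the document is just parsed JSON (nested dicts and lists), and it is the application code, not a database schema, that decides which keys are required and what defaults to use when optional ones are missing.

```python
import json

# A schemaless "document": nested key/value pairs and arrays, no fixed schema.
doc_text = '{"id": 42, "name": "Ada", "address": {"lines": ["1 Main St"], "city": "Portland"}}'


def describe_contact(doc: dict) -> str:
    """Hypothetical accessor: the application enforces its own expectations."""
    name = doc["name"]                            # required: KeyError if missing
    company = doc.get("company", "(no company)")  # optional: fall back to a default
    address = doc.get("address", {})              # optional nested document
    lines = address.get("lines", [])
    city = address.get("city", "")
    return f"{name}, {company}: {', '.join(lines + [city])}"


print(describe_contact(json.loads(doc_text)))
# -> Ada, (no company): 1 Main St, Portland
```

Another document in the same collection could carry different keys entirely; only the reading code has to cope with that.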

PostgreSQL has you covered.
Not one, not two, but three different document types: XML, hstore, JSON.
Let's see what they've got.

XML
It seemed like a good idea at the time.

XML
Been around since the mid-1990s.
Hierarchical structured data based on SGML.
Underlying technology for SOAP and a lot of other stuff that was really popular for a while.
Still super-popular in the Java world.

XML, your dad's document language.
Can specify XML schemas using DTDs. No one does this.
Can do automatic transformations of XML into other markups using XSLT. Only the masochistic do this.
Let's not forget the most important use of XML!
Tomcat Configuration Files.

XML Support in PostgreSQL.
Built-in type.
Can handle documents up to 1 gigabyte (PostgreSQL's limit for a single field value).
A healthy selection of XML functions, xpath in particular.
Very convenient XML export functions.
Great for external XML requirements.

XML Indexing.
There isn't any.
Unless you build it yourself with an expression index.
Functionality is great. Performance is... well, we'll talk about this later.
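Why XML expression indexes (and full scans over XML) are costly: getting even one field out of a document means parsing all of it. The Python sketch below uses the standard library's ElementTree, which supports only a small subset of XPath, as a stand-in for PostgreSQL's xpath(). The element names and the `xml_primary_key` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# One benchmark-style record encoded as XML (element names are assumptions).
record_xml = """<person>
  <id>1001</id>
  <name>Ada Lovelace</name>
  <city>London</city>
</person>"""


def xml_primary_key(doc: str) -> int:
    """Parse the whole document, then pull out <id> as an integer.

    An expression index over an XML column has to do comparable work for
    every row it indexes, and a full-table scan has to do it for every row read.
    """
    root = ET.fromstring(doc)          # full parse, even for a single field
    return int(root.findtext("id"))    # simple path lookup relative to <person>


print(xml_primary_key(record_xml))  # -> 1001
```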

hstore
The hidden gem of contrib/

hstore
A key/value storage type specific to PostgreSQL.
Maps string keys to string values (or NULL).
Contrib module; not part of the PostgreSQL core.
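A common stumbling block when loading hstore from application code is the text input format. This Python sketch (the `hstore_literal` helper is hypothetical) builds an hstore literal from a dict: every key and value is double-quoted with backslashes and double quotes escaped, and a Python None becomes an unquoted NULL, which hstore stores as a NULL value.

```python
from typing import Dict, Optional


def hstore_literal(pairs: Dict[str, Optional[str]]) -> str:
    """Build an hstore input literal such as '"a"=>"1", "b"=>NULL'."""

    def quote(s: str) -> str:
        # Escape backslashes first, then double quotes, then wrap in quotes,
        # so commas, spaces and '=>' inside keys or values are preserved.
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    parts = []
    for key, value in pairs.items():
        rendered = "NULL" if value is None else quote(value)
        parts.append(f"{quote(key)}=>{rendered}")
    return ", ".join(parts)


print(hstore_literal({"name": "Ada", "company": None, "note": 'say "hi"'}))
# -> "name"=>"Ada", "company"=>NULL, "note"=>"say \"hi\""
```

In practice, psycopg2's `psycopg2.extras.register_hstore()` can adapt Python dicts to hstore directly; the sketch only shows what the format looks like.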

hstore functions
Lots and lots and lots of hstore functions.
h->'a' (get the value for key a).
h?'a' (does h contain key a?).
h@>'a=>2' (does h contain key a with value 2?).
Many others.
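Because an hstore value is a flat map from text keys to text values (or NULL), a Python dict with string values is a fair mental model of what these operators test. The function names below are hypothetical and this is not how PostgreSQL implements them; it only mirrors their semantics, including the fact that values are compared as text.

```python
from typing import Dict, Iterable, Optional

HStore = Dict[str, Optional[str]]  # text keys -> text values, or None for NULL


def fetch(h: HStore, key: str) -> Optional[str]:
    """h -> 'key': the value, or NULL (None) if the key is absent."""
    return h.get(key)


def exists(h: HStore, key: str) -> bool:
    """h ? 'key': is the key present (even with a NULL value)?"""
    return key in h


def exists_all(h: HStore, keys: Iterable[str]) -> bool:
    """h ?& ARRAY[...]: are all of the keys present?"""
    return all(k in h for k in keys)


def exists_any(h: HStore, keys: Iterable[str]) -> bool:
    """h ?| ARRAY[...]: is at least one of the keys present?"""
    return any(k in h for k in keys)


def contains(h: HStore, pattern: HStore) -> bool:
    """h @> pattern: does h contain every key/value pair in pattern?"""
    return all(k in h and h[k] == v for k, v in pattern.items())


h = {"id": "1001", "name": "Ada", "company": None}
print(fetch(h, "name"))             # -> Ada
print(exists(h, "company"))         # -> True (key present, value NULL)
print(contains(h, {"id": "1001"}))  # -> True
print(contains(h, {"id": 1001}))    # -> False: hstore values are text
```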

hstore indexing.
Can create GiST and GIN indexes over hstore values.
Indexes every key in the value, not just one key.
Accelerates the @>, ?, ?& and ?| operators.
Can also build expression indexes.
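GIN is an inverted index, which is why a single GIN index on an hstore column can serve lookups on any key, not only the primary key. The toy Python model below illustrates the idea only; the class is hypothetical, it says nothing about GiST (a different, lossy signature-based structure), and PostgreSQL's real hstore GIN support indexes keys and values as separate entries and rechecks candidate rows, which this sketch skips.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

HStore = Dict[str, Optional[str]]


class ToyInvertedIndex:
    """Every key, and every key/value pair, points at the set of row ids
    containing it, so a lookup on ANY key is a dictionary hit, not a scan."""

    def __init__(self) -> None:
        self.by_key: Dict[str, Set[int]] = defaultdict(set)
        self.by_pair: Dict[Tuple[str, Optional[str]], Set[int]] = defaultdict(set)
        self.all_rows: Set[int] = set()

    def add(self, row_id: int, h: HStore) -> None:
        self.all_rows.add(row_id)
        for key, value in h.items():
            self.by_key[key].add(row_id)
            self.by_pair[(key, value)].add(row_id)

    def exists(self, key: str) -> Set[int]:
        """Rows matching h ? key."""
        return set(self.by_key.get(key, set()))

    def contains(self, pattern: HStore) -> Set[int]:
        """Rows matching h @> pattern: intersect the row sets for each pair."""
        result = set(self.all_rows)
        for pair in pattern.items():
            result &= self.by_pair.get(pair, set())
        return result


idx = ToyInvertedIndex()
idx.add(1, {"name": "Ada", "city": "London"})
idx.add(2, {"name": "Bob", "city": "London"})
idx.add(3, {"name": "Ada"})
print(sorted(idx.contains({"name": "Ada"})))  # -> [1, 3]
print(sorted(idx.exists("city")))             # -> [1, 2]
```

The cost side shows up too: every key and pair gets its own entry, which is consistent with the large GIN indexes and slow GIN builds reported in the benchmarks.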

JSON
All the cool kids are doing it.

JSON
JavaScript Object Notation.
JavaScript's data structure declaration format, turned into a protocol.
Dictionaries, arrays, primitive types.
Originally designed to just be passed into eval() in JavaScript.
Please don't do this.

JSON, the new hotness

The de facto standard API data format for REST web services.
Very comfortable for Python and Ruby programmers.
MongoDB's native data storage type (in its binary form, BSON).

JSON? Yeah, we got that.

JSON type in core as of 9.2.
Validates JSON going in.
And not much else right now: array_to_json, row_to_json.
Lots more coming in 9.3 (offer subject to committer approval).

JSON Indexing.
Expression indexing.
Can also treat it as a text string for strict comparison... which is kind of a weird idea, and I'm not sure why you'd do that.
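One reason strict text comparison is a weird idea: the 9.2 json type validates its input but stores the text exactly as given, so two semantically identical documents can differ as strings. A quick Python illustration (not PostgreSQL code; the `canonical` helper is hypothetical) of text versus parsed comparison, plus one way to canonicalize if text comparison is unavoidable:

```python
import json

a = '{"id": 1, "name": "Ada"}'
b = '{"name":"Ada","id":1}'

print(a == b)                          # -> False: whitespace and key order differ
print(json.loads(a) == json.loads(b))  # -> True: same keys, same values


def canonical(text: str) -> str:
    """Re-serialize with sorted keys and no extra whitespace."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))


print(canonical(a) == canonical(b))    # -> True
```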

But the coolest part of JSON in core is...

PL/V8!
The V8 JavaScript engine from Google is available as an embedded language.
JavaScript deals with JSON very well, as you'd expect.
Not part of core or contrib; needs to be built and installed separately.

PL/V8 ProTips
Use the static V8 engine that comes with PL/V8.
Function is compiled by V8 on first use.
Now that we've gotten rid of SQL injection attacks, we have JSON injection attacks.
PL invocation overhead is non-trivial.
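JSON injection is the same mistake as SQL injection: building document text by string concatenation. In this Python sketch (both functions are hypothetical), a crafted "name" smuggles in an extra key; because Python's json module, like JavaScript's JSON.parse, keeps the last of duplicate keys, the injected value wins. Serializing with json.dumps escapes the input instead.

```python
import json


def naive_json(name: str) -> str:
    # UNSAFE: splices untrusted text directly into JSON source.
    return '{"admin": false, "name": "' + name + '"}'


def safe_json(name: str) -> str:
    # json.dumps escapes quotes and backslashes inside the value.
    return json.dumps({"admin": False, "name": name})


attack = 'x", "admin": true, "y": "'

print(json.loads(naive_json(attack))["admin"])  # -> True: injected key won
print(json.loads(safe_json(attack))["admin"])   # -> False
```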

Schemaless Strategies
Create single-field tables with only a hierarchical type.
Wrap up the (very simple) SQL to provide an object API.
Create indexes to taste.
Maybe extract fields if you need to JOIN.
Profit!

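CREATE OR REPLACE FUNCTION get_json_key(structure JSON, key TEXT)
RETURNS TEXT AS $get_json_key$
  // Accept either a parsed object or raw JSON text.
  var js_object = (typeof structure == 'string') ? JSON.parse(structure) : structure;
  if (typeof js_object != 'object' || js_object === null) return null;
  var value = js_object[key];
  if (value === undefined || value === null) return null;
  // Return strings unquoted so the result can be cast (e.g. to BIGINT).
  return (typeof value == 'string') ? value : JSON.stringify(value);
$get_json_key$ IMMUTABLE STRICT LANGUAGE plv8;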

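CREATE TABLE blog (post json);

CREATE INDEX post_pk_idx ON blog ((get_json_key(post, 'post_id')::BIGINT));

-- A text-to-TIMESTAMPTZ cast is not IMMUTABLE, so it is rejected in an
-- index expression; index the ISO-8601 text instead.
CREATE INDEX post_date_idx ON blog ((get_json_key(post, 'post_date')));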

But but but...

PostgreSQL was not designed to be a schemaless database.

Wouldn't it be better to use a bespoke database designed for this kind of data?

Well, let's find out!

Some Numbers.
When all else fails, measure.

Schemaless Shootout!
A very basic document structure: id, name, company, address1, address2,
city, state, postal code.

address2 and company are optional (NULL in relational version).

1,780,000 records, average 63 bytes each.

id 64-bit integer, all others text.

The Competitors!
Traditional relational schema.
hstore (GiST and GIN indexes), XML, JSON: one column per table for these.
MongoDB.

Timing Harness.
Scripts written in Python. psycopg2 2.4.6 for PostgreSQL interface. pymongo 2.4.2 for MongoDB interface.
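The harness scripts themselves aren't in the slides, so the following is only a sketch, under stated assumptions, of timing a sample of 100 queries the way the query tests describe (time the call, don't fetch results). `time_queries` is a hypothetical name, and `run_query` is any callable that issues one query.

```python
import random
import time
from typing import Callable, Sequence


def time_queries(run_query: Callable[[object], object],
                 keys: Sequence[object],
                 sample_size: int = 100,
                 seed: int = 0) -> float:
    """Call run_query once per key in a random sample of keys and return
    the total wall-clock time in milliseconds. Results are not fetched;
    only the call itself is timed."""
    rng = random.Random(seed)                # fixed seed: same sample each run
    sample = rng.sample(keys, sample_size)   # keys must be a sequence
    start = time.perf_counter()
    for key in sample:
        run_query(key)
    return (time.perf_counter() - start) * 1000.0


# Stand-in "query" so the sketch runs without a database.
seen = []
elapsed_ms = time_queries(seen.append, range(1_780_000))
print(len(seen), elapsed_ms >= 0)  # -> 100 True
```

With psycopg2, `run_query` might be `lambda k: cur.execute("SELECT * FROM people WHERE id = %s", (k,))`, where `cur` is an open cursor and the table name is hypothetical. Divide the total by the sample size for a per-query figure.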

The Test Track.

This laptop.
OS X 10.7.5.
2.8GHz Intel Core i7.
7200 RPM disk.
8GB RAM (never comes close to using a fraction of it).

Indexing Philosophy
For relational, index on primary key.
For hstore, index using GiST and GIN (and none).
For JSON and XML, expression index on primary key.
For MongoDB, index on primary key.
Indexes created before records loaded.

Your Methodology Sucks.

Documents are not particularly large. No deep hierarchies. Hot cache. Only one index. No joins. No updates.

The Sophisticated Database Tuning Philosophy.
None.
Stock PostgreSQL 9.2.2, from source.
No changes to postgresql.conf.
Stock MongoDB 2.2, from MacPorts.
Fire it up, let it go.

First Test: Bulk Load

Scripts read a CSV file, parse it into the appropriate format, and INSERT it into the database.
We measure total load time, including parsing time.
(COPY will be much much much faster. mongoimport too, most likely.)
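A sketch, under stated assumptions, of the JSON variant of such a load script: read CSV rows, turn each into a document, hand it to an insert callable, and report records/second with parsing time included. The column order, the `postal_code` key, and the function names are hypothetical, and omitting empty optional fields (rather than storing JSON null) is a design choice the slides don't specify.

```python
import csv
import io
import json
import time
from typing import Callable, List

# Hypothetical column order for the benchmark CSV (no header row assumed).
FIELDS = ["id", "name", "company", "address1", "address2",
          "city", "state", "postal_code"]
OPTIONAL = ("company", "address2")


def row_to_document(row: List[str]) -> dict:
    """Turn one CSV row into a JSON-ready dict; empty optional fields are omitted."""
    doc = dict(zip(FIELDS, row))
    doc["id"] = int(doc["id"])  # 64-bit integer key; everything else stays text
    for field in OPTIONAL:
        if not doc.get(field):
            doc.pop(field, None)
    return doc


def bulk_load(csv_text: str, insert: Callable[[str], None]) -> float:
    """Parse every row, pass its JSON text to insert(), return records/second."""
    start = time.perf_counter()
    count = 0
    for row in csv.reader(io.StringIO(csv_text)):
        insert(json.dumps(row_to_document(row)))
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


sample = ("1,Ada,,1 Main St,,Portland,OR,97201\n"
          "2,Bob,Acme,2 Oak Ave,Suite 5,Salem,OR,97301\n")
loaded = []
rate = bulk_load(sample, loaded.append)
print(loaded[0])
# -> {"id": 1, "name": "Ada", "address1": "1 Main St", "city": "Portland", "state": "OR", "postal_code": "97201"}
```

Against a real database, `insert` would wrap one INSERT per row, which is exactly why COPY is so much faster.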

[Chart: Records/Second (bulk load) for Relational, hstore, hstore (GiST), hstore (GIN), XML, JSON and MongoDB]

Observations.
No attempt made to speed up PostgreSQL (synchronous commit, checkpoint tuning, etc.).
GIN indexes are really slow to build.
The XML xpath function is probably the culprit for its load time.

Next Test: Disk Footprint.

Final disk footprint once data is loaded. For PostgreSQL, reported database sizes
from the pg_*_size functions.

For MongoDB, reported by db.stats().

[Chart: Disk Footprint in Megabytes, split into Data and Index, for Relational, hstore, hstore (GiST), hstore (GIN), XML, JSON and MongoDB]

Observations.
GIN indexes are really big on disk.
PostgreSQL's relational data storage is very efficient.
MongoDB certainly likes its disk space. The padding factor was 1, so it wasn't that.
None of these records are TOAST-able.
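The arithmetic behind "not TOAST-able", and a baseline for reading the footprint chart: the raw payload is roughly 107 MiB, and an average 63-byte record is far below the width at which TOAST kicks in. The 2032-byte threshold assumes a default build with 8 kB blocks (the PostgreSQL docs describe it as "normally 2 kB").

```python
RECORDS = 1_780_000
AVG_RECORD_BYTES = 63
# Default TOAST_TUPLE_THRESHOLD with 8 kB blocks (assumption: stock build).
# Only rows wider than this are considered for compression/out-of-line storage.
TOAST_TUPLE_THRESHOLD = 2032

raw_bytes = RECORDS * AVG_RECORD_BYTES
print(raw_bytes)                                 # -> 112140000
print(f"{raw_bytes / 1024**2:.1f} MiB")          # -> 106.9 MiB
print(AVG_RECORD_BYTES < TOAST_TUPLE_THRESHOLD)  # -> True
```

Anything in the footprint chart beyond about 107 MiB is per-row overhead, index size, or storage-engine padding rather than payload.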

Next Test: Query on Primary Key

For a sample of 100 documents, query a single document based on the primary key.
Results not fetched.
For PostgreSQL, time of .execute() method from Python.
For MongoDB, time of .fetch() method.

[Chart: Fetch Time in Milliseconds (query on primary key) for Relational, hstore, hstore (GiST), hstore (GIN), XML, JSON and MongoDB]

[Chart: Fetch Time in Milliseconds (<100ms): Relational, XML, JSON, MongoDB]

[Chart: Fetch Time in Milliseconds (>100ms): hstore, hstore (GiST), hstore (GIN)]

Observations.
B-tree indexes kick ass.
GiST and GIN are not even in the same league for simple key retrieval.
The difference between relational, XML and JSON is not statistically significant.
Wait, I thought MongoDB was supposed to be super-performant. Huh.

Next Test: Query on Name

For a sample of 100 names, query all documents with that name.
Results not fetched.
Required a full-table scan (except for hstore with GiST and GIN indexes).

Same timing methodology.

[Chart: Fetch Time in Milliseconds (query on name) for Relational, hstore, hstore (GiST), hstore (GIN), XML, JSON and MongoDB]

[Chart: Fetch Time in Milliseconds (<500ms): Relational, hstore, hstore (GiST), hstore (GIN), MongoDB]

[Chart: Fetch Time in Milliseconds (>500ms): XML, JSON]

Observations.
GiST and GIN accelerate every field, not just the primary key.
Wow, executing the accessor function on each XML and JSON field is slow.
MongoDB's grotesquely bloated disk footprint hurts it here.
Not that there's anything wrong with that.

Now that we know this, what do we know?

Some Conclusions.
PostgreSQL does pretty well as a schemaless database.
Build indexes using expressions on commonly-queried fields...
...or use GiST and hstore if you want full flexibility.
GIN might well be worth it for other cases.

Some Conclusions, 2.
Avoid doing full-table scans if you need to use an accessor function.
Although hstores are not bad compared to xpath or a PL.
Seriously consider hstore if you have the flexibility.
It's really fast.

Flame Bait!
MongoDB doesn't seem to be more performant than PostgreSQL.
And you still get all of PostgreSQL's goodies.
Larger documents will probably continue to favor PostgreSQL.
As will larger tables.

Fire Extinguisher.
You can find workloads that prove any data storage technology is the right answer. dBase II included.
Be very realistic about your workload and data model, now and in the future.
Test, and test fairly with real-world data in real-world volumes.

Thank you!

thebuild.com @xof
