Building a Scalable Time-Series Database Using Postgres
Mike Freedman
Co-founder / CTO, Timescale
mike@timescale.com
https://github.com/timescale/timescaledb
Time-series data is everywhere,
in greater volumes than ever before
What DB for time-series data?
Relational 32%
NoSQL 68%
https://www.percona.com/blog/2017/02/10/percona-blog-poll-database-engine-using-store-time-series-data/
Why so much NoSQL?
1. Schemas are a pain
2. Scalability!
Postgres, MySQL:
• JSON/JSONB data types
• Constraint validation!
Not applicable:
1. Don’t need it for time-series
2. NoSQL doesn’t solve it anyway
Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage)
Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics)
Challenge in Scaling Up
• As table grows large:
– Data and indexes no longer fit in memory
– Reads/writes to random locations in B-tree
– Separate B-tree for each secondary index
Adaptive time/space partitioning
(for both scaling up & out)
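A minimal sketch of the time/space idea, not TimescaleDB's actual algorithm: each row is routed to a chunk identified by a time slot and a space slot. The function name `chunk_for`, the fixed one-day interval, the four space partitions, and the CRC32 hash are all assumptions made for illustration; the "adaptive" part (choosing and changing intervals over time) is not modeled.

```python
from datetime import datetime, timedelta, timezone
import zlib

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_for(ts, host_id, interval=timedelta(days=1), space_partitions=4):
    """Return a (time_slot, space_slot) pair for one row.

    ts must be a timezone-aware datetime. The interval and partition
    count are illustrative defaults, not values from the talk.
    """
    # Time dimension: which fixed-width interval the timestamp falls into.
    time_slot = (ts - EPOCH) // interval
    # Space dimension: a stable hash of the partitioning key (here, host ID).
    space_slot = zlib.crc32(host_id.encode("utf-8")) % space_partitions
    return time_slot, space_slot


morning = datetime(2017, 3, 1, 0, 5, tzinfo=timezone.utc)
evening = datetime(2017, 3, 1, 23, 55, tzinfo=timezone.utc)
next_day = datetime(2017, 3, 2, 0, 5, tzinfo=timezone.utc)

# Same host, same day: both rows land in the same chunk.
print(chunk_for(morning, "host-1") == chunk_for(evening, "host-1"))
# Same host, next day: the time slot advances, the space slot does not.
print(chunk_for(next_day, "host-1"))
```

Because recent rows share a time slot, inserts keep touching a small, recent set of chunks rather than one ever-growing table.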
How EXACTLY do we partition by time?
Manage it like Postgres
Familiar management
Looks/feels/speaks PostgreSQL
Administration Connectors!
ODBC, JDBC, Postgres
• Query improvements
– Better constraint exclusion avoids querying child tables
– New time/partition-aware query optimizations
– New time-oriented features
• Insert improvements
– Adaptive auto-creation/closing of partitions
– More efficient insert path (both single row and batch)
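The exclusion idea in the query bullets can be sketched as follows: given each child partition's time range, a query only needs to visit the partitions whose range overlaps its own. This is an illustration under assumptions, not the planner's real logic; the `Chunk` class, the integer timestamps, and `chunks_to_scan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    name: str
    start: int  # inclusive lower bound of the chunk's time range
    end: int    # exclusive upper bound of the chunk's time range


def chunks_to_scan(chunks, query_start, query_end):
    """Keep only chunks whose [start, end) overlaps [query_start, query_end)."""
    return [c for c in chunks if c.start < query_end and query_start < c.end]


chunks = [Chunk("c0", 0, 100), Chunk("c1", 100, 200), Chunk("c2", 200, 300)]

# A query over [150, 250) skips c0 entirely.
print([c.name for c in chunks_to_scan(chunks, 150, 250)])  # ['c1', 'c2']
```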
Familiar management
Creating/migrating is easy
$ psql
psql (9.6.2)
Type "help" for help.
Single-node INSERT scalability
144K metrics/s (14.4K inserts/s)
1.3M metrics/s (130K inserts/s)
15x
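The two rates on each slide are related by the row layout: every inserted row carries 10 metrics, so metrics/s is inserts/s times 10. A small check of that arithmetic, with `metrics_per_second` as a helper name introduced here for illustration:

```python
METRICS_PER_ROW = 10  # from the benchmark setup: 10 metric columns per row


def metrics_per_second(inserts_per_second, metrics_per_row=METRICS_PER_ROW):
    """Convert a row insert rate into a metric ingest rate."""
    return inserts_per_second * metrics_per_row


for inserts in (14_400, 130_000):
    print(f"{inserts:,} inserts/s -> {metrics_per_second(inserts):,} metrics/s")
# 14,400 inserts/s -> 144,000 metrics/s
# 130,000 inserts/s -> 1,300,000 metrics/s
```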
Single-node QUERY performance
21,991%
e.g., query “max per minute for all hosts with limit” is SQL:
Mean results for 2,500 queries, with randomly chosen IDs and times for each query
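The SQL from that slide is not reproduced here, and the following is not the benchmark query. It is only a plain-Python illustration of what a "max per minute for all hosts with limit" aggregate computes, assuming it means: bucket rows by minute across all hosts, take the maximum value per bucket, and return the most recent `limit` buckets. The row shape `(epoch_seconds, host, value)` and the function name are hypothetical.

```python
def max_per_minute(rows, limit):
    """Return up to `limit` (minute_start, max_value) pairs, newest first.

    rows: iterable of (epoch_seconds, host, value) tuples.
    """
    buckets = {}
    for ts, _host, value in rows:
        minute = ts - ts % 60  # truncate the timestamp to the start of its minute
        if minute not in buckets or value > buckets[minute]:
            buckets[minute] = value
    newest_first = sorted(buckets.items(), key=lambda kv: kv[0], reverse=True)
    return newest_first[:limit]


rows = [
    (0, "a", 1.0),
    (30, "b", 5.0),
    (60, "a", 2.0),
    (119, "b", 3.0),
    (120, "a", 9.0),
]
print(max_per_minute(rows, limit=2))  # [(120, 9.0), (60, 3.0)]
```

A query of this shape benefits from time partitioning because the newest buckets live in the newest partitions, so older partitions need not be read once the limit is satisfied.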
Should NOT use if:
✗ Simple read requirements: KV lookups, single-column rollup
Should use if:
✓ Full SQL: Complex predicates or aggregates, JOINs
https://github.com/timescale/timescaledb
Apache 2.0 license