
Reaching 1 billion rows / second

Hans-Jürgen Schönig
www.postgresql-support.de

Reaching a milestone

Goal

- Processing 1 billion rows / second
- Show a path to even more scalability
- Silence the “scalability” discussion at some point
- See where the limitations are
- Do it WITHOUT commercial tools, warehousing tools, etc.

Traditional PostgreSQL limitations

- Traditionally:
  - We could only use 1 CPU core per query
  - Scaling was possible by running more than one query at a time
  - Usually hard to do

PL/Proxy: The traditional way to do it

- PL/Proxy is a stored procedure language to scale out to shards
- Worked nicely for OLTP workloads
- Somewhat usable for analytics
- A LOT of manual work

On the app level

- Doing scaling on the app level
- A lot of manual work
- Not cool enough
- Needs a lot of development
- Why use a database if work is still manual?
- Solving things on the app level is certainly not an option

The 1 billion row challenge

Coming up with a data structure

- We tried to keep it simple:

node=# \d t_demo
                 Table "public.t_demo"
 Column |  Type   | Collation | Nullable |
--------+---------+-----------+----------+
 id     | serial  |           | not null |
 grp    | integer |           |          |
 data   | real    |           |          |
Indexes:
    "idx_id" btree (id)

The query

SELECT grp, count(data)
FROM t_demo
GROUP BY 1;

Single server performance

Tweaking a simple server

- The main questions are:
  - How much can we expect from a single server?
  - How well does it scale with many CPUs?
  - How far can we get?

PostgreSQL parallelism

- Parallel queries were added in PostgreSQL 9.6
- It can do a lot
- It is far from feature complete yet
- The number of workers is determined by the PostgreSQL optimizer
- We do not want that
- We want ALL cores to be at work

Adjusting CPU core usage

- Usually the number of worker processes per scan is derived from the size of the table

test=# SHOW min_parallel_relation_size ;
 min_parallel_relation_size
----------------------------
 8MB
(1 row)

- One additional worker is added every time the table size triples
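
As an illustration of that rule (assuming the 8MB default shown above): 8 MB of data yields one worker, 24 MB two, 72 MB three, and so on. Reaching 16 workers this way would take a table of well over 100 TB, which is why the mechanism is overruled on the next slide.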

Overruling the planner

- We could never have enough data to make PostgreSQL go for 16 or 32 cores
- Not even if the value is set to a couple of kilobytes
- The default mechanism can be overruled:

test=# ALTER TABLE t_demo SET (parallel_workers = 32);
ALTER TABLE
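
Note that the per-table setting only raises the planner's estimate; the session and instance limits still cap how many workers are actually launched. A sketch of the knobs involved (the concrete values are illustrative assumptions):

-- per-query cap on worker processes (the default is only 2):
SET max_parallel_workers_per_gather = 32;

-- the instance-wide worker pool must be big enough as well
-- (postgresql.conf, requires a restart; the default is 8):
-- max_worker_processes = 36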

Making full use of cores

- How well does PostgreSQL scale on a single box?
- For the next test we assume that I/O is not an issue
- If I/O does not keep up, CPU does not make a difference
- Make sure that data can be read fast enough
- Observation: 1 SSD might not be enough to feed a modern Intel chip
Single node scalability (1)

Single node scalability (2)

- We used a 16 core box here
- As you can see, the query scales up nicely
- Beyond 16 cores hyperthreading kicks in
- We managed to gain around 18%

Single node scalability (3)

- On a single Google VM we could reach close to 40 million rows / second
- For many workloads this is already more than enough
- Rows / sec will of course depend on the type of query

Moving on to many nodes

The basic system architecture (1)

- We want to shard data to as many nodes as needed
- For the demo: place 100 million rows on each node
- We do so to eliminate the I/O bottleneck
- In case I/O happens we can always compensate by using more servers
- Use parallel queries on each shard (see the setup sketch below)
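
A minimal sketch of one way to wire the shards together with postgres_fdw; the server names, connection options and the inheritance-based layout are assumptions, not the exact scripts used for the benchmark:

CREATE EXTENSION postgres_fdw;

-- one foreign server plus user mapping per shard (repeat for shard2 .. shard32)
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'node');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'postgres');

-- empty parent table on the coordinator, one foreign child table per shard
CREATE TABLE t_demo (id serial, grp integer, data real);
CREATE FOREIGN TABLE t_demo_shard1 ()
    INHERITS (t_demo) SERVER shard1 OPTIONS (table_name 't_demo');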

Testing with two nodes (1)

explain SELECT grp, COUNT(data) FROM t_demo GROUP BY 1;

Finalize HashAggregate
  Group Key: t_demo.grp
  ->  Append
        ->  Foreign Scan (partial aggregate)
        ->  Foreign Scan (partial aggregate)
        ->  Partial HashAggregate
              Group Key: t_demo.grp
              ->  Seq Scan on t_demo

Testing with two nodes (2)

- Throughput doubles as long as partial results are small
- The planner pushes the aggregation down to the shards nicely
- Linear scaling is necessary to reach 1 billion rows / second

Preconditions to make it work (1)

- postgres_fdw uses cursors on the remote side
- cursor_tuple_fraction has to be set to 1 to improve the planning process
- Set fetch_size to a large value
- That is the easy part (see the sketch below)
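
A sketch of those two settings; shard1 refers to the foreign server from the architecture sketch above and the fetch_size value is an assumption, any sufficiently large number will do:

-- on every shard: plan cursor queries for total runtime, not for the first rows
ALTER SYSTEM SET cursor_tuple_fraction = 1.0;
SELECT pg_reload_conf();

-- on the coordinator: fetch large batches through each remote cursor
ALTER SERVER shard1 OPTIONS (ADD fetch_size '100000');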

Preconditions to make it work (2)

- We have to make sure that all remote database servers work at the same time
- This requires “parallel append and async fetching”
- All queries are sent to the many nodes in parallel
- Data can be fetched in parallel
- We cannot afford to wait for each node to complete in turn if we want to scale in a linear way

Preconditions to make it work (3)

- This would not have been possible without substantial work done on PostgreSQL recently
- Traditionally, joins had to be done BEFORE aggregation
- This is a showstopper for distributed aggregation, because all the data has to be fetched from the remote host before aggregating
- Without this change the test would not be possible

Preconditions to make it work (4)

- Easy tasks:
  - Aggregates have to be implemented to handle partial results coming from shards
  - The code is simple and available as an extension
  - For the test we implemented a handful of aggregates (see the sketch below)
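
To give an idea of how simple this can be, a sketch of a combine aggregate for count(): each shard ships its partial count as a bigint and the coordinator merely adds the partials up (count_partial is a hypothetical name, the actual extension may differ):

CREATE AGGREGATE count_partial(bigint) (
    SFUNC    = int8pl,   -- add each shard's partial count to the running total
    STYPE    = bigint,
    INITCOND = '0'
);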

Parallel execution on shards is now possible

- Dissect the aggregation
- Send partial queries to the shards in parallel
- Perform parallel execution on the shards
- Add up the data on the main node (sketched below)
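
Conceptually the rewrite looks roughly like this; shard_results is a hypothetical stand-in for the union of the per-shard replies:

-- partial query sent to every shard (executed there with parallel workers):
SELECT grp, count(data) AS partial_count FROM t_demo GROUP BY 1;

-- on the main node: add up the partial results returned by the shards
SELECT grp, sum(partial_count) AS count
FROM shard_results
GROUP BY 1;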

Final results

node=# SELECT grp, count(data) FROM t_demo GROUP BY 1;
 grp |   count
-----+-----------
   0 | 320000000
   1 | 320000000
 ...
   9 | 320000000
(10 rows)

Planning time: 0.955 ms
Execution time: 2910.367 ms
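
That is 10 groups of 320 million rows each, i.e. 3.2 billion rows aggregated in roughly 2.9 seconds of execution time, or about 1.1 billion rows per second.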

Hardware used

- We used 32 boxes (16 cores each) on Google
- Data was in memory
- Adding more servers is EASY
- Price tag: the staggering amount of EUR 28.14 (for development, testing and running the test)

A look at PostgreSQL 10.0

- A lot more parallelism will be available
- Many executor nodes will enjoy parallel execution
- PostgreSQL 10.0 will be a giant leap forward

More complex plans

- ROLLUP / CUBE / GROUPING SETS have to wait for 10.0
- A patch for that has been seen on the mailing list
- Be careful with complex intermediate results
- Avoid sorting large amounts of data
- Some things are just harder on large data sets

Future ideas: JIT compilation

- JIT will allow us to do the same thing with fewer CPUs
- It will significantly improve throughput
- Some project teams are working on that

Future ideas: “Deeper execution”

- So far only one “stage” of execution is used
- Nothing stops us from building “trees” of servers
- More complex operations can be done
- Infrastructure is in place

Future things: Column stores

- Column stores will bring a real boost
- Vectorization can speed things up drastically
- Many commercial vendors already do that
- GPUs may also be useful

Finally

- Any questions?

Contact us

Cybertec Schönig & Schönig GmbH


Hans-Jürgen Schönig
Gröhrmühlgasse 26
A-2700 Wiener Neustadt

www.postgresql-support.de

Follow us on Twitter: @PostgresSupport
