
Commit 40e2e5e

Introduce framework for parallelizing various pg_upgrade tasks.
A number of pg_upgrade steps require connecting to every database in the cluster and running the same query in each one. When there are many databases, these steps are particularly time-consuming, especially since they are performed sequentially, i.e., we connect to a database, run the query, and process the results before moving on to the next database.

This commit introduces a new framework that makes it easy to parallelize most of these once-in-each-database tasks by processing multiple databases concurrently. This framework manages a set of slots that follow a simple state machine, and it uses libpq's asynchronous APIs to establish the connections and run the queries. The --jobs option is used to determine the number of slots to use.

To use this new task framework, callers simply need to provide the query and a callback function to process its results, and the framework takes care of the rest. A more complete description is provided at the top of the new task.c file.

None of the eligible once-in-each-database tasks are converted to use this new framework in this commit. That will be done via several follow-up commits.

Reviewed-by: Jeff Davis, Robert Haas, Daniel Gustafsson, Ilya Gladyshev, Corey Huinker
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
1 parent d891c49 commit 40e2e5e
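
The slot-based state machine mentioned in the commit message can be pictured with a toy simulation. This is only a sketch under stated assumptions: the state names, the `Slot` struct, `simulate_task`, and the one-transition-per-round timing are all hypothetical and are not taken from task.c, and no libpq calls are made. It only illustrates why more slots should mean fewer rounds of waiting when there are many databases.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot states; the real task.c may define different ones. */
typedef enum
{
	SLOT_FREE,
	SLOT_CONNECTING,
	SLOT_RUNNING_QUERY,
	SLOT_PROCESSING_RESULT
} SlotState;

typedef struct
{
	SlotState	state;
	int			db_idx;		/* database assigned to this slot, -1 if none */
} Slot;

#define MAX_SLOTS 64

/*
 * Simulate running one query in each of ndbs databases using njobs slots.
 * Every round advances each slot by exactly one state transition (an
 * assumption made for illustration).  Returns the number of rounds needed
 * and stores the number of databases completed in *processed.
 */
int
simulate_task(int ndbs, int njobs, int *processed)
{
	Slot		slots[MAX_SLOTS];
	int			next_db = 0;
	int			done = 0;
	int			rounds = 0;

	assert(processed != NULL);
	if (njobs < 1)
		njobs = 1;
	if (njobs > MAX_SLOTS)
		njobs = MAX_SLOTS;

	for (int i = 0; i < njobs; i++)
	{
		slots[i].state = SLOT_FREE;
		slots[i].db_idx = -1;
	}

	while (done < ndbs)
	{
		for (int i = 0; i < njobs; i++)
		{
			Slot	   *s = &slots[i];

			switch (s->state)
			{
				case SLOT_FREE:
					/* pick up the next unprocessed database, if any */
					if (next_db < ndbs)
					{
						s->db_idx = next_db++;
						s->state = SLOT_CONNECTING;
					}
					break;
				case SLOT_CONNECTING:
					s->state = SLOT_RUNNING_QUERY;
					break;
				case SLOT_RUNNING_QUERY:
					s->state = SLOT_PROCESSING_RESULT;
					break;
				case SLOT_PROCESSING_RESULT:
					/* result handled; the slot becomes reusable */
					done++;
					s->db_idx = -1;
					s->state = SLOT_FREE;
					break;
			}
		}
		rounds++;
	}

	*processed = done;
	return rounds;
}
```

In this model, each database needs four rounds in a slot, so four databases take 16 rounds with one slot and 4 rounds with four slots. Real timings depend on connection and query latency, which the sketch does not model.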

6 files changed: +474 -3 lines changed

doc/src/sgml/ref/pgupgrade.sgml

Lines changed: 3 additions & 3 deletions
@@ -118,7 +118,7 @@ PostgreSQL documentation
      <varlistentry>
       <term><option>-j <replaceable class="parameter">njobs</replaceable></option></term>
       <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
-      <listitem><para>number of simultaneous processes or threads to use
+      <listitem><para>number of simultaneous connections and processes/threads to use
       </para></listitem>
      </varlistentry>
 
@@ -587,8 +587,8 @@ NET STOP postgresql-&majorversion;
 
     <para>
      The <option>--jobs</option> option allows multiple CPU cores to be used
-     for copying/linking of files and to dump and restore database schemas
-     in parallel; a good place to start is the maximum of the number of
+     for copying/linking of files, dumping and restoring database schemas
+     in parallel, etc.; a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
      machine.

src/bin/pg_upgrade/Makefile

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ OBJS = \
 	relfilenumber.o \
 	server.o \
 	tablespace.o \
+	task.o \
 	util.o \
 	version.o
 
src/bin/pg_upgrade/meson.build

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ pg_upgrade_sources = files(
   'relfilenumber.c',
   'server.c',
   'tablespace.c',
+  'task.c',
   'util.c',
   'version.c',
 )

src/bin/pg_upgrade/pg_upgrade.h

Lines changed: 21 additions & 0 deletions
@@ -494,3 +494,24 @@ void parallel_transfer_all_new_dbs(DbInfoArr *old_db_arr, DbInfoArr *new_db_arr
 				char *old_pgdata, char *new_pgdata,
 				char *old_tablespace);
 bool reap_child(bool wait_for_child);
+
+/* task.c */
+
+typedef void (*UpgradeTaskProcessCB) (DbInfo *dbinfo, PGresult *res, void *arg);
+
+/* struct definition is private to task.c */
+typedef struct UpgradeTask UpgradeTask;
+
+UpgradeTask *upgrade_task_create(void);
+void upgrade_task_add_step(UpgradeTask *task, const char *query,
+			UpgradeTaskProcessCB process_cb, bool free_result,
+			void *arg);
+void upgrade_task_run(const UpgradeTask *task, const ClusterInfo *cluster);
+void upgrade_task_free(UpgradeTask *task);
+
+/* convenient type for common private data needed by several tasks */
+typedef struct
+{
+	FILE *file;
+	char path[MAXPGPATH];
+} UpgradeTaskReport;
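
As a rough illustration of how a caller might use the declarations added to pg_upgrade.h, the sketch below wires a callback into a one-step task. The function prototypes and the callback typedef are copied from the header. Everything else is an assumption made so the example compiles without PostgreSQL: the `DbInfo`, `PGresult`, and `ClusterInfo` stand-ins, the sequential toy bodies of the `upgrade_task_*` functions, the `MAX_STEPS` limit, the query text, and the helper names `count_nonempty_cb` and `count_dbs_with_rows`. The real implementation lives in task.c and, per the commit message, uses libpq's asynchronous APIs rather than this loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real pg_upgrade and libpq types differ. */
typedef struct { const char *db_name; int ntuples; } DbInfo;
typedef struct { int ntuples; } PGresult;
typedef struct { DbInfo *dbs; int ndbs; } ClusterInfo;

/* Declarations as added to pg_upgrade.h by this commit. */
typedef void (*UpgradeTaskProcessCB) (DbInfo *dbinfo, PGresult *res, void *arg);
typedef struct UpgradeTask UpgradeTask;

/* Toy, sequential implementation (assumption, not the code in task.c). */
#define MAX_STEPS 8
struct UpgradeTask
{
	int			nsteps;
	const char *queries[MAX_STEPS];
	UpgradeTaskProcessCB cbs[MAX_STEPS];
	bool		free_results[MAX_STEPS];
	void	   *args[MAX_STEPS];
};

UpgradeTask *
upgrade_task_create(void)
{
	UpgradeTask *task = calloc(1, sizeof(UpgradeTask));

	assert(task != NULL);
	return task;
}

void
upgrade_task_add_step(UpgradeTask *task, const char *query,
					  UpgradeTaskProcessCB process_cb, bool free_result,
					  void *arg)
{
	assert(task->nsteps < MAX_STEPS);
	task->queries[task->nsteps] = query;
	task->cbs[task->nsteps] = process_cb;
	task->free_results[task->nsteps] = free_result;
	task->args[task->nsteps] = arg;
	task->nsteps++;
}

void
upgrade_task_run(const UpgradeTask *task, const ClusterInfo *cluster)
{
	for (int d = 0; d < cluster->ndbs; d++)
	{
		for (int s = 0; s < task->nsteps; s++)
		{
			/* Pretend the query ran and produced a canned result. */
			PGresult   *res = malloc(sizeof(PGresult));

			assert(res != NULL);
			res->ntuples = cluster->dbs[d].ntuples;
			if (task->cbs[s] != NULL)
				task->cbs[s] (&cluster->dbs[d], res, task->args[s]);
			/* Otherwise the callback is assumed to own the result. */
			if (task->free_results[s])
				free(res);
		}
	}
}

void
upgrade_task_free(UpgradeTask *task)
{
	free(task);
}

/* Hypothetical callback: count databases whose query returned rows. */
static void
count_nonempty_cb(DbInfo *dbinfo, PGresult *res, void *arg)
{
	(void) dbinfo;
	if (res->ntuples > 0)
		(*(int *) arg)++;
}

/* Hypothetical caller showing the create / add_step / run / free pattern. */
int
count_dbs_with_rows(const ClusterInfo *cluster)
{
	int			count = 0;
	UpgradeTask *task = upgrade_task_create();

	/* The query text is illustrative only. */
	upgrade_task_add_step(task, "SELECT 1 FROM pg_catalog.pg_class LIMIT 1",
						  count_nonempty_cb, true, &count);
	upgrade_task_run(task, cluster);
	upgrade_task_free(task);
	return count;
}
```

The `void *arg` parameter is how per-task state (here a counter, or something like `UpgradeTaskReport` in a real check) reaches the callback without globals.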
