
Commit bc8cd50

Fix pg_dump for hash partitioning on enum columns.
Hash partitioning on an enum is problematic because the hash codes are derived from the OIDs assigned to the enum values, which will almost certainly be different after a dump-and-reload than they were before. This means that some rows probably end up in different partitions than before, causing restore to fail because of partition constraint violations. (pg_upgrade dodges this problem by using hacks to force the enum values to keep the same OIDs, but that's not possible nor desirable for pg_dump.)

Users can work around that by specifying --load-via-partition-root, but since that's a dump-time not restore-time decision, one might find out the need for it far too late. Instead, teach pg_dump to apply that option automatically when dealing with a partitioned table that has hash-on-enum partitioning.

Also deal with a pre-existing issue for --load-via-partition-root mode: in a parallel restore, we try to TRUNCATE target tables just before loading them, in order to enable some backend optimizations. This is bad when using --load-via-partition-root because (a) we're likely to suffer deadlocks from restore jobs trying to restore rows into other partitions than they came from, and (b) if we miss getting a deadlock we might still lose data due to a TRUNCATE removing rows from some already-completed restore job.

The fix for this is conceptually simple: just don't TRUNCATE if we're dealing with a --load-via-partition-root case. The tricky bit is for pg_restore to identify those cases. In dumps using COPY commands we can inspect each COPY command to see if it targets the nominal target table or some ancestor. However, in dumps using INSERT commands it's pretty impractical to examine the INSERTs in advance. To provide a solution for that going forward, modify pg_dump to mark TABLE DATA items that are using --load-via-partition-root with a comment. (This change also responds to a complaint from Robert Haas that the dump output for --load-via-partition-root is pretty confusing.) pg_restore checks for the special comment as well as checking the COPY command if present. This will fail to identify the combination of --load-via-partition-root and --inserts in pre-existing dump files, but that should be a pretty rare case in the field. If it does happen you will probably get a deadlock failure that you can work around by not using parallel restore, which is the same as before this bug fix.

Having done this, there seems no remaining reason for the alarmism in the pg_dump man page about combining --load-via-partition-root with parallel restore, so remove that warning.

Patch by me; thanks to Julien Rouhaud for review. Back-patch to v11 where hash partitioning was introduced.

Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
1 parent de4d456 commit bc8cd50
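
For illustration only (this sketch is not PostgreSQL source; the hash function, OID values, and modulus below are invented stand-ins): hash partition routing for an enum column depends on the OID assigned to the enum label, so the same label can land in a different partition once a reload assigns new OIDs.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy stand-in for hashing an enum value: the hash input is the OID
     * assigned to the label, not the label text itself. */
    static uint32_t hash_enum_oid(uint32_t oid)
    {
        return oid * 2654435761u;   /* simple multiplicative hash */
    }

    int main(void)
    {
        const uint32_t modulus = 4;          /* four hash partitions */
        const uint32_t oid_before = 16490;   /* hypothetical OID of label 'red' in the old cluster */
        const uint32_t oid_after = 24576;    /* hypothetical OID assigned to 'red' after reload */

        printf("partition before dump:  %u\n", hash_enum_oid(oid_before) % modulus);
        printf("partition after reload: %u\n", hash_enum_oid(oid_after) % modulus);
        /* Whenever the two differ, a row restored into its original leaf
         * partition violates that partition's constraint; loading through the
         * partition root re-routes it instead. */
        return 0;
    }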

File tree

8 files changed: +288 -52 lines


doc/src/sgml/ref/pg_dump.sgml

-10 lines

@@ -903,16 +903,6 @@ PostgreSQL documentation
      and the two systems have different definitions of the collation used
      to sort the partitioning column.
     </para>
-
-    <para>
-     It is best not to use parallelism when restoring from an archive made
-     with this option, because <application>pg_restore</application> will
-     not know exactly which partition(s) a given archive data item will
-     load data into.  This could result in inefficiency due to lock
-     conflicts between parallel jobs, or perhaps even restore failures due
-     to foreign key constraints being set up before all the relevant data
-     is loaded.
-    </para>
    </listitem>
   </varlistentry>
 

doc/src/sgml/ref/pg_dumpall.sgml

-4 lines

@@ -361,10 +361,6 @@ PostgreSQL documentation
      and the two systems have different definitions of the collation used
      to sort the partitioning column.
     </para>
-
-    <!-- Currently, we don't need pg_dump's warning about parallelism here,
-    since parallel restore from a pg_dumpall script is impossible.
-    -->
    </listitem>
   </varlistentry>
 

src/bin/pg_dump/common.c

+10 -8 lines

@@ -230,6 +230,9 @@ getSchemaData(Archive *fout, int *numTablesPtr)
 	pg_log_info("flagging inherited columns in subtables");
 	flagInhAttrs(fout->dopt, tblinfo, numTables);
 
+	pg_log_info("reading partitioning data");
+	getPartitioningInfo(fout);
+
 	pg_log_info("reading indexes");
 	getIndexes(fout, tblinfo, numTables);
 

@@ -285,7 +288,6 @@ static void
 flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 			  InhInfo *inhinfo, int numInherits)
 {
-	DumpOptions *dopt = fout->dopt;
 	int			i,
 				j;
 

@@ -301,18 +303,18 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 			continue;
 
 		/*
-		 * Normally, we don't bother computing anything for non-target tables,
-		 * but if load-via-partition-root is specified, we gather information
-		 * on every partition in the system so that getRootTableInfo can trace
-		 * from any given leaf partition all the way up to the root.  (We
-		 * don't need to mark them as interesting for getTableAttrs, though.)
+		 * Normally, we don't bother computing anything for non-target tables.
+		 * However, we must find the parents of non-root partitioned tables in
+		 * any case, so that we can trace from leaf partitions up to the root
+		 * (in case a leaf is to be dumped but its parents are not).  We need
+		 * not mark such parents interesting for getTableAttrs, though.
		 */
 		if (!tblinfo[i].dobj.dump)
 		{
 			mark_parents = false;
 
-			if (!dopt->load_via_partition_root ||
-				!tblinfo[i].ispartition)
+			if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
+				  tblinfo[i].ispartition))
 				find_parents = false;
 		}
 
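
A standalone sketch of the leaf-to-root walk that this bookkeeping enables (the types and names here are invented, not pg_dump's): each partition records its parent, so the root target for load-via-partition-root can be found even when intermediate parents are not themselves selected for dumping.

    #include <stdio.h>

    struct table
    {
        const char   *name;
        struct table *parent;       /* NULL for the partition root */
    };

    /* Follow parent links from any leaf up to the partition root. */
    static const struct table *find_root(const struct table *t)
    {
        while (t->parent != NULL)
            t = t->parent;
        return t;
    }

    int main(void)
    {
        struct table root = {"measurements", NULL};
        struct table mid  = {"measurements_2023", &root};    /* not dumped itself */
        struct table leaf = {"measurements_2023_q1", &mid};

        printf("load %s via %s\n", leaf.name, find_root(&leaf)->name);
        return 0;
    }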

src/bin/pg_dump/meson.build

+1 line

@@ -96,6 +96,7 @@ tests += {
       't/001_basic.pl',
       't/002_pg_dump.pl',
       't/003_pg_dump_with_server.pl',
+      't/004_pg_dump_parallel.pl',
       't/010_dump_connstr.pl',
     ],
   },

src/bin/pg_dump/pg_backup_archiver.c

+64 -9 lines

@@ -86,6 +86,7 @@ static RestorePass _tocEntryRestorePass(TocEntry *te);
 static bool _tocEntryIsACL(TocEntry *te);
 static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
 static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
+static bool is_load_via_partition_root(TocEntry *te);
 static void buildTocEntryArrays(ArchiveHandle *AH);
 static void _moveBefore(TocEntry *pos, TocEntry *te);
 static int	_discoverArchiveFormat(ArchiveHandle *AH);

@@ -887,6 +888,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 			}
 			else
 			{
+				bool		use_truncate;
+
 				_disableTriggersIfNecessary(AH, te);
 
 				/* Select owner and schema as necessary */

@@ -898,13 +901,24 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 
 				/*
 				 * In parallel restore, if we created the table earlier in
-				 * the run then we wrap the COPY in a transaction and
-				 * precede it with a TRUNCATE.  If archiving is not on
-				 * this prevents WAL-logging the COPY.  This obtains a
-				 * speedup similar to that from using single_txn mode in
-				 * non-parallel restores.
+				 * this run (so that we know it is empty) and we are not
+				 * restoring a load-via-partition-root data item then we
+				 * wrap the COPY in a transaction and precede it with a
+				 * TRUNCATE.  If wal_level is set to minimal this prevents
+				 * WAL-logging the COPY.  This obtains a speedup similar
+				 * to that from using single_txn mode in non-parallel
+				 * restores.
+				 *
+				 * We mustn't do this for load-via-partition-root cases
+				 * because some data might get moved across partition
+				 * boundaries, risking deadlock and/or loss of previously
+				 * loaded data.  (We assume that all partitions of a
+				 * partitioned table will be treated the same way.)
 				 */
-				if (is_parallel && te->created)
+				use_truncate = is_parallel && te->created &&
+					!is_load_via_partition_root(te);
+
+				if (use_truncate)
 				{
 					/*
 					 * Parallel restore is always talking directly to a

@@ -942,7 +956,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
 				AH->outputKind = OUTPUT_SQLCMDS;
 
 				/* close out the transaction started above */
-				if (is_parallel && te->created)
+				if (use_truncate)
 					CommitTransaction(&AH->public);
 
 				_enableTriggersIfNecessary(AH, te);
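
As a rough standalone sketch (not the archiver source; the helper and table names are invented), this is what the use_truncate branch above amounts to on the wire for a parallel worker:

    #include <stdbool.h>
    #include <stdio.h>

    /* Print the statement sequence a restore worker would effectively issue
     * around one data item, following the use_truncate decision above. */
    static void restore_data_item(const char *qualified_name, bool use_truncate)
    {
        if (use_truncate)
        {
            /* One wrapping transaction plus a TRUNCATE lets a server running
             * with wal_level=minimal skip WAL-logging the COPY. */
            puts("BEGIN;");
            printf("TRUNCATE TABLE ONLY %s;\n", qualified_name);
        }
        printf("COPY %s FROM stdin;\n", qualified_name);   /* data rows follow */
        if (use_truncate)
            puts("COMMIT;");
    }

    int main(void)
    {
        restore_data_item("public.orders", true);         /* table created in this run */
        restore_data_item("public.measurements", false);  /* load-via-partition-root item: no TRUNCATE */
        return 0;
    }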
@@ -1036,6 +1050,43 @@ _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te)
 					 fmtQualifiedId(te->namespace, te->tag));
 }
 
+/*
+ * Detect whether a TABLE DATA TOC item is performing "load via partition
+ * root", that is the target table is an ancestor partition rather than the
+ * table the TOC item is nominally for.
+ *
+ * In newer archive files this can be detected by checking for a special
+ * comment placed in te->defn.  In older files we have to fall back to seeing
+ * if the COPY statement targets the named table or some other one.  This
+ * will not work for data dumped as INSERT commands, so we could give a false
+ * negative in that case; fortunately, that's a rarely-used option.
+ */
+static bool
+is_load_via_partition_root(TocEntry *te)
+{
+	if (te->defn &&
+		strncmp(te->defn, "-- load via partition root ", 27) == 0)
+		return true;
+	if (te->copyStmt && *te->copyStmt)
+	{
+		PQExpBuffer copyStmt = createPQExpBuffer();
+		bool		result;
+
+		/*
+		 * Build the initial part of the COPY as it would appear if the
+		 * nominal target table is the actual target.  If we see anything
+		 * else, it must be a load-via-partition-root case.
+		 */
+		appendPQExpBuffer(copyStmt, "COPY %s ",
+						  fmtQualifiedId(te->namespace, te->tag));
+		result = strncmp(te->copyStmt, copyStmt->data, copyStmt->len) != 0;
+		destroyPQExpBuffer(copyStmt);
+		return result;
+	}
+	/* Assume it's not load-via-partition-root */
+	return false;
+}
+
 /*
  * This is a routine that is part of the dumper interface, hence the 'Archive*' parameter.
  */
@@ -2955,8 +3006,12 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 		res = res & ~REQ_DATA;
 	}
 
-	/* If there's no definition command, there's no schema component */
-	if (!te->defn || !te->defn[0])
+	/*
+	 * If there's no definition command, there's no schema component.  Treat
+	 * "load via partition root" comments as not schema.
+	 */
+	if (!te->defn || !te->defn[0] ||
+		strncmp(te->defn, "-- load via partition root ", 27) == 0)
 		res = res & ~REQ_SCHEMA;
 
 	/*
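
To make the detection logic above easier to follow, here is a standalone sketch (not PostgreSQL source; the function name, table names, and fixed-size buffer are invented) of the two paths is_load_via_partition_root() takes: the defn-comment test for newer archives and the COPY-target comparison for older ones.

    #include <stdio.h>
    #include <string.h>

    static int looks_like_partition_root_load(const char *defn,
                                              const char *copy_stmt,
                                              const char *nominal_target)
    {
        char expected[256];

        /* Newer archives: pg_dump tags the TABLE DATA item's definition. */
        if (defn && strncmp(defn, "-- load via partition root ", 27) == 0)
            return 1;

        /* Older archives: check whether the COPY targets the nominal table. */
        if (copy_stmt && *copy_stmt)
        {
            snprintf(expected, sizeof(expected), "COPY %s ", nominal_target);
            return strncmp(copy_stmt, expected, strlen(expected)) != 0;
        }

        /* INSERT-style data with no marker: assume not load-via-partition-root. */
        return 0;
    }

    int main(void)
    {
        /* Data for leaf "public.part_1", dumped so that it loads via its root. */
        printf("%d\n", looks_like_partition_root_load(
                   NULL, "COPY public.parent (id, val) FROM stdin;\n", "public.part_1"));
        /* Ordinary data item that targets its own table. */
        printf("%d\n", looks_like_partition_root_load(
                   "", "COPY public.plain (id, val) FROM stdin;\n", "public.plain"));
        return 0;
    }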
