Commit 50e3ed8

Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on.

This patch adds a mechanism similar to invalid-page tracking, to keep a tally of missing directories during crash recovery. If all the missing directory references are matched with corresponding drop records at the end of crash recovery, the standby can safely continue following the primary.

Backpatch to 13, at least for now. The bug is older, but fixing it in older branches requires more careful study of the interactions with commit e6d8069, which appeared in 13.

A new TAP test file is added to verify the condition. However, because it depends on commit d6d317d, it can only be added to branch master. I (Álvaro) manually verified that the code behaves as expected in branch 14. It's a bit nervous-making to leave the code uncovered by tests in older branches, but leaving the bug unfixed is even worse. Also, the main reason this fix took so long is precisely that we couldn't agree on a good strategy to approach testing for the bug, so perhaps this is the best we can do.

Diagnosed-by: Paul Guo <paulguo@gmail.com>
Author: Paul Guo <paulguo@gmail.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
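
The tally described in the commit message can be sketched in isolation. The following is a minimal, standalone illustration and not the patch's code: it swaps the dynahash table for a fixed-size array, and the names `remember_missing_dir`, `forget_missing_dir`, `count_unresolved`, `INVALID_OID` and `MAX_MISSING` are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's Oid and InvalidOid. */
typedef unsigned int Oid;
#define INVALID_OID ((Oid) 0)

/* Fixed-size table for illustration only; the patch uses a hash table. */
#define MAX_MISSING 16

typedef struct
{
    Oid  spc;   /* tablespace OID, must be valid */
    Oid  db;    /* database OID, or INVALID_OID for a tablespace directory */
    bool used;
} MissingDir;

static MissingDir missing[MAX_MISSING];

/* Note a directory that was absent while replaying a create record. */
static bool
remember_missing_dir(Oid spc, Oid db)
{
    int free_slot = -1;

    assert(spc != INVALID_OID);
    for (int i = 0; i < MAX_MISSING; i++)
    {
        if (missing[i].used && missing[i].spc == spc && missing[i].db == db)
            return true;        /* already tracked */
        if (!missing[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;           /* table full (a limit of this sketch only) */
    missing[free_slot].spc = spc;
    missing[free_slot].db = db;
    missing[free_slot].used = true;
    return true;
}

/* Drop the entry once the matching drop record has been replayed. */
static void
forget_missing_dir(Oid spc, Oid db)
{
    for (int i = 0; i < MAX_MISSING; i++)
        if (missing[i].used && missing[i].spc == spc && missing[i].db == db)
            missing[i].used = false;
}

/* Count entries that no drop record has matched so far. */
static size_t
count_unresolved(void)
{
    size_t n = 0;

    for (int i = 0; i < MAX_MISSING; i++)
        if (missing[i].used)
            n++;
    return n;
}
```

In the CREATE DATABASE, DROP DATABASE, DROP TABLESPACE sequence, a re-replayed create record would call `remember_missing_dir`, each later drop record would call `forget_missing_dir`, and a zero result from `count_unresolved` corresponds to the case where the standby may continue. A nonzero count corresponds to the case the patch treats as fatal.
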
1 parent 1ce14b6 commit 50e3ed8

6 files changed: 243 additions, 1 deletion

src/backend/access/transam/xlog.c

Lines changed: 6 additions & 0 deletions
@@ -8107,6 +8107,12 @@ CheckRecoveryConsistency(void)
          */
         XLogCheckInvalidPages();
 
+        /*
+         * Check if the XLOG sequence contained any unresolved references to
+         * missing directories.
+         */
+        XLogCheckMissingDirs();
+
         reachedConsistency = true;
         ereport(LOG,
                 (errmsg("consistent recovery state reached at %X/%X",

src/backend/access/transam/xlogutils.c

Lines changed: 158 additions & 1 deletion
@@ -34,6 +34,164 @@
 /* GUC variable */
 bool ignore_invalid_pages = false;
 
+
+/*
+ * If a create database WAL record is being replayed more than once during
+ * crash recovery on a standby, it is possible that either the tablespace
+ * directory or the template database directory is missing. This happens when
+ * the directories are removed by replay of subsequent drop records. Note
+ * that this problem happens only on standby and not on master. On master, a
+ * checkpoint is created at the end of create database operation. On standby,
+ * however, such a strategy (creating restart points during replay) is not
+ * viable because it will slow down WAL replay.
+ *
+ * The alternative is to track references to each missing directory
+ * encountered when performing crash recovery in the following hash table.
+ * Similar to invalid page table above, the expectation is that each missing
+ * directory entry should be matched with a drop database or drop tablespace
+ * WAL record by the end of crash recovery.
+ */
+typedef struct xl_missing_dir_key
+{
+    Oid spcNode;
+    Oid dbNode;
+} xl_missing_dir_key;
+
+typedef struct xl_missing_dir
+{
+    xl_missing_dir_key key;
+    char path[MAXPGPATH];
+} xl_missing_dir;
+
+static HTAB *missing_dir_tab = NULL;
+
+
+/*
+ * Keep track of a directory that wasn't found while replaying database
+ * creation records. These should match up with tablespace removal records
+ * later in the WAL stream; we verify that before reaching consistency.
+ */
+void
+XLogRememberMissingDir(Oid spcNode, Oid dbNode, char *path)
+{
+    xl_missing_dir_key key;
+    bool found;
+    xl_missing_dir *entry;
+
+    /*
+     * Database OID may be invalid but tablespace OID must be valid. If
+     * dbNode is InvalidOid, we are logging a missing tablespace directory,
+     * otherwise we are logging a missing database directory.
+     */
+    Assert(OidIsValid(spcNode));
+
+    if (missing_dir_tab == NULL)
+    {
+        /* create hash table when first needed */
+        HASHCTL ctl;
+
+        memset(&ctl, 0, sizeof(ctl));
+        ctl.keysize = sizeof(xl_missing_dir_key);
+        ctl.entrysize = sizeof(xl_missing_dir);
+
+        missing_dir_tab = hash_create("XLOG missing directory table",
+                                      100,
+                                      &ctl,
+                                      HASH_ELEM | HASH_BLOBS);
+    }
+
+    key.spcNode = spcNode;
+    key.dbNode = dbNode;
+
+    entry = hash_search(missing_dir_tab, &key, HASH_ENTER, &found);
+
+    if (found)
+    {
+        if (dbNode == InvalidOid)
+            elog(DEBUG1, "missing directory %s (tablespace %u) already exists: %s",
+                 path, spcNode, entry->path);
+        else
+            elog(DEBUG1, "missing directory %s (tablespace %u database %u) already exists: %s",
+                 path, spcNode, dbNode, entry->path);
+    }
+    else
+    {
+        strlcpy(entry->path, path, sizeof(entry->path));
+        if (dbNode == InvalidOid)
+            elog(DEBUG1, "logged missing dir %s (tablespace %u)",
+                 path, spcNode);
+        else
+            elog(DEBUG1, "logged missing dir %s (tablespace %u database %u)",
+                 path, spcNode, dbNode);
+    }
+}
+
+/*
+ * Remove an entry from the list of directories not found. This is to be done
+ * when the matching tablespace removal WAL record is found.
+ */
+void
+XLogForgetMissingDir(Oid spcNode, Oid dbNode)
+{
+    xl_missing_dir_key key;
+
+    key.spcNode = spcNode;
+    key.dbNode = dbNode;
+
+    /* Database OID may be invalid but tablespace OID must be valid. */
+    Assert(OidIsValid(spcNode));
+
+    if (missing_dir_tab == NULL)
+        return;
+
+    if (hash_search(missing_dir_tab, &key, HASH_REMOVE, NULL) != NULL)
+    {
+        if (dbNode == InvalidOid)
+        {
+            elog(DEBUG2, "forgot missing dir (tablespace %u)", spcNode);
+        }
+        else
+        {
+            char *path = GetDatabasePath(dbNode, spcNode);
+
+            elog(DEBUG2, "forgot missing dir %s (tablespace %u database %u)",
+                 path, spcNode, dbNode);
+            pfree(path);
+        }
+    }
+}
+
+/*
+ * This is called at the end of crash recovery, before entering archive
+ * recovery on a standby. PANIC if the hash table is not empty.
+ */
+void
+XLogCheckMissingDirs(void)
+{
+    HASH_SEQ_STATUS status;
+    xl_missing_dir *hentry;
+    bool foundone = false;
+
+    if (missing_dir_tab == NULL)
+        return;                 /* nothing to do */
+
+    hash_seq_init(&status, missing_dir_tab);
+
+    while ((hentry = (xl_missing_dir *) hash_seq_search(&status)) != NULL)
+    {
+        elog(WARNING, "missing directory \"%s\" tablespace %u database %u",
+             hentry->path, hentry->key.spcNode, hentry->key.dbNode);
+        foundone = true;
+    }
+
+    if (foundone)
+        elog(PANIC, "WAL contains references to missing directories");
+
+    hash_destroy(missing_dir_tab);
+    missing_dir_tab = NULL;
+}
+
+
 /*
  * During XLOG replay, we may see XLOG records for incremental updates of
  * pages that no longer exist, because their relation was later dropped or
@@ -59,7 +217,6 @@ typedef struct xl_invalid_page
 
 static HTAB *invalid_page_tab = NULL;
 
-
 /* Report a reference to an invalid page */
 static void
 report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno,

src/backend/commands/dbcommands.c

Lines changed: 56 additions & 0 deletions
@@ -2185,7 +2185,9 @@ dbase_redo(XLogReaderState *record)
         xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record);
         char *src_path;
         char *dst_path;
+        char *parent_path;
         struct stat st;
+        bool skip = false;
 
         src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
         dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
@@ -2203,6 +2205,56 @@ dbase_redo(XLogReaderState *record)
                         (errmsg("some useless files may be left behind in old database directory \"%s\"",
                                 dst_path)));
         }
+        else if (!reachedConsistency)
+        {
+            /*
+             * It is possible that a drop tablespace record appearing later in
+             * WAL has already been replayed -- in other words, that we are
+             * replaying the database creation record a second time with no
+             * intervening checkpoint. In that case, the tablespace directory
+             * has already been removed and the create database operation
+             * cannot be replayed. Skip the replay itself, but remember the
+             * fact that the tablespace directory is missing, to be matched
+             * with the expected tablespace drop record later.
+             */
+            parent_path = pstrdup(dst_path);
+            get_parent_directory(parent_path);
+            if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
+            {
+                XLogRememberMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
+                skip = true;
+                ereport(WARNING,
+                        (errmsg("skipping replay of database creation WAL record"),
+                         errdetail("The target tablespace \"%s\" directory was not found.",
+                                   parent_path),
+                         errhint("A future WAL record that removes the directory before reaching consistent mode is expected.")));
+            }
+            pfree(parent_path);
+        }
+
+        /*
+         * If the source directory is missing, skip the copy and make a note of
+         * it for later.
+         *
+         * One possible reason for this is that the template database used for
+         * creating this database may have been dropped, as noted above.
+         * Moving a database from one tablespace may also be a partner in the
+         * crime.
+         */
+        if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)) &&
+            !reachedConsistency)
+        {
+            XLogRememberMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
+            skip = true;
+            ereport(WARNING,
+                    (errmsg("skipping replay of database creation WAL record"),
+                     errdetail("The source database directory \"%s\" was not found.",
+                               src_path),
+                     errhint("A future WAL record that removes the directory before reaching consistent mode is expected.")));
+        }
+
+        if (skip)
+            return;
 
         /*
          * Force dirty buffers out to disk, to ensure source database is
@@ -2260,6 +2312,10 @@ dbase_redo(XLogReaderState *record)
                 ereport(WARNING,
                         (errmsg("some useless files may be left behind in old database directory \"%s\"",
                                 dst_path)));
+
+            if (!reachedConsistency)
+                XLogForgetMissingDir(xlrec->tablespace_ids[i], xlrec->db_id);
+
             pfree(dst_path);
         }

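The `dbase_redo` changes above hinge on one test: whether the parent of the destination path, or the source path itself, exists as a directory. A minimal standalone sketch of that test follows, assuming a POSIX system. The helpers `is_directory` and `parent_is_directory` are hypothetical names for this example, and the parent lookup is a simplified stand-in for `get_parent_directory()` that does not handle trailing slashes or other path normalization.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* True only if path exists and is a directory (same test as in the diff). */
static bool
is_directory(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Strip the last path component and test what remains. Simplified on
 * purpose: no trailing-slash or ".." handling.
 */
static bool
parent_is_directory(const char *path)
{
    char buf[1024];
    char *slash;

    if (strlen(path) >= sizeof(buf))
        return false;           /* too long for this sketch's buffer */
    strcpy(buf, path);

    slash = strrchr(buf, '/');
    if (slash == NULL)
        return is_directory(".");   /* bare name: parent is the cwd */
    if (slash == buf)
        return is_directory("/");   /* parent is the root directory */
    *slash = '\0';
    return is_directory(buf);
}
```

In terms of the patch, a false result for the destination's parent corresponds to the "target tablespace directory was not found" branch, and a false result from `is_directory` on the source path corresponds to the "source database directory was not found" branch. Either one would lead to the directory being remembered and the record being skipped.
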
src/backend/commands/tablespace.c

Lines changed: 17 additions & 0 deletions
@@ -58,6 +58,7 @@
 #include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
@@ -1529,6 +1530,22 @@ tblspc_redo(XLogReaderState *record)
     {
         xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
 
+        if (!reachedConsistency)
+            XLogForgetMissingDir(xlrec->ts_id, InvalidOid);
+
+        /*
+         * Before we remove the tablespace directory, update minimum recovery
+         * point to cover this WAL record. Once the tablespace is removed,
+         * there's no going back. This manually enforces the WAL-first rule.
+         * Doing this before the removal means that if the removal fails for
+         * some reason, the directory is left alone and needs to be manually
+         * removed. Alternatively we could update the minimum recovery point
+         * after removal, but that would leave a small window where the
+         * WAL-first rule could be violated.
+         */
+        if (!reachedConsistency)
+            XLogFlush(record->EndRecPtr);
+
         /*
          * If we issued a WAL record for a drop tablespace it implies that
          * there were no files in it at all when the DROP was done. That means

src/include/access/xlogutils.h

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,10 @@ extern void XLogDropDatabase(Oid dbid);
 extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
                                  BlockNumber nblocks);
 
+extern void XLogRememberMissingDir(Oid spcNode, Oid dbNode, char *path);
+extern void XLogForgetMissingDir(Oid spcNode, Oid dbNode);
+extern void XLogCheckMissingDirs(void);
+
 /* Result codes for XLogReadBufferForRedo[Extended] */
 typedef enum
 {

src/tools/pgindent/typedefs.list

Lines changed: 2 additions & 0 deletions
@@ -3516,6 +3516,8 @@ xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
 xl_logical_message
+xl_missing_dir_key
+xl_missing_dir
 xl_multi_insert_tuple
 xl_multixact_create
 xl_multixact_truncate
