Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 8dc3c97

Browse files
committed
Treat directory open failures as hard errors in ResetUnloggedRelations().
Previously, this code just reported such problems at LOG level and kept going. The problem with this approach is that transient failures (e.g., ENFILE) could prevent us from resetting unlogged relations to empty, yet allow recovery to appear to complete successfully. That seems like a data corruption hazard large enough to treat such problems as reasons to fail startup. For the same reason, treat unlink failures for unlogged files as hard errors not just LOG messages. It's a little odd that we did it like that when file-level errors in other steps (copy_file, fsync_fname) are ERRORs. The sole case that I left alone is that ENOENT failure on a tablespace (not database) directory is not an error, though it will now be logged rather than just silently ignored. This is to cover the scenario where a previous DROP TABLESPACE removed the tablespace directory but failed before removing the pg_tblspc symlink. I'm not sure that that's very likely in practice, but that seems like the only real excuse for the old behavior here, so let's allow for it. (As coded, this will also allow ENOENT on $PGDATA/base/. But since we'll fail soon enough if that's gone, I don't think we need to complicate this code by distinguishing that from a true tablespace case.) Discussion: https://postgr.es/m/21040.1512418508@sss.pgh.pa.us
1 parent e7cfb26 commit 8dc3c97

File tree

1 file changed

+39
-70
lines changed

1 file changed

+39
-70
lines changed

src/backend/storage/file/reinit.c

+39-70
Original file line numberDiff line numberDiff line change
@@ -98,7 +98,9 @@ ResetUnloggedRelations(int op)
9898
MemoryContextDelete(tmpctx);
9999
}
100100

101-
/* Process one tablespace directory for ResetUnloggedRelations */
101+
/*
102+
* Process one tablespace directory for ResetUnloggedRelations
103+
*/
102104
static void
103105
ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
104106
{
@@ -107,29 +109,32 @@ ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
107109
char dbspace_path[MAXPGPATH * 2];
108110

109111
ts_dir = AllocateDir(tsdirname);
110-
if (ts_dir == NULL)
112+
113+
/*
114+
* If we get ENOENT on a tablespace directory, log it and return. This
115+
* can happen if a previous DROP TABLESPACE crashed between removing the
116+
* tablespace directory and removing the symlink in pg_tblspc. We don't
117+
* really want to prevent database startup in that scenario, so let it
118+
* pass instead. Any other type of error will be reported by ReadDir
119+
* (causing a startup failure).
120+
*/
121+
if (ts_dir == NULL && errno == ENOENT)
111122
{
112-
/* anything except ENOENT is fishy */
113-
if (errno != ENOENT)
114-
ereport(LOG,
115-
(errcode_for_file_access(),
116-
errmsg("could not open directory \"%s\": %m",
117-
tsdirname)));
123+
ereport(LOG,
124+
(errcode_for_file_access(),
125+
errmsg("could not open directory \"%s\": %m",
126+
tsdirname)));
118127
return;
119128
}
120129

121130
while ((de = ReadDir(ts_dir, tsdirname)) != NULL)
122131
{
123-
int i = 0;
124-
125132
/*
126133
* We're only interested in the per-database directories, which have
127134
* numeric names. Note that this code will also (properly) ignore "."
128135
* and "..".
129136
*/
130-
while (isdigit((unsigned char) de->d_name[i]))
131-
++i;
132-
if (de->d_name[i] != '\0' || i == 0)
137+
if (strspn(de->d_name, "0123456789") != strlen(de->d_name))
133138
continue;
134139

135140
snprintf(dbspace_path, sizeof(dbspace_path), "%s/%s",
@@ -140,7 +145,9 @@ ResetUnloggedRelationsInTablespaceDir(const char *tsdirname, int op)
140145
FreeDir(ts_dir);
141146
}
142147

143-
/* Process one per-dbspace directory for ResetUnloggedRelations */
148+
/*
149+
* Process one per-dbspace directory for ResetUnloggedRelations
150+
*/
144151
static void
145152
ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
146153
{
@@ -158,32 +165,23 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
158165
*/
159166
if ((op & UNLOGGED_RELATION_CLEANUP) != 0)
160167
{
161-
HTAB *hash = NULL;
168+
HTAB *hash;
162169
HASHCTL ctl;
163170

164-
/* Open the directory. */
165-
dbspace_dir = AllocateDir(dbspacedirname);
166-
if (dbspace_dir == NULL)
167-
{
168-
ereport(LOG,
169-
(errcode_for_file_access(),
170-
errmsg("could not open directory \"%s\": %m",
171-
dbspacedirname)));
172-
return;
173-
}
174-
175171
/*
176172
* It's possible that someone could create a ton of unlogged relations
177173
* in the same database & tablespace, so we'd better use a hash table
178174
* rather than an array or linked list to keep track of which files
179175
* need to be reset. Otherwise, this cleanup operation would be
180176
* O(n^2).
181177
*/
178+
memset(&ctl, 0, sizeof(ctl));
182179
ctl.keysize = sizeof(unlogged_relation_entry);
183180
ctl.entrysize = sizeof(unlogged_relation_entry);
184181
hash = hash_create("unlogged hash", 32, &ctl, HASH_ELEM);
185182

186183
/* Scan the directory. */
184+
dbspace_dir = AllocateDir(dbspacedirname);
187185
while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
188186
{
189187
ForkNumber forkNum;
@@ -222,21 +220,9 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
222220
}
223221

224222
/*
225-
* Now, make a second pass and remove anything that matches. First,
226-
* reopen the directory.
223+
* Now, make a second pass and remove anything that matches.
227224
*/
228225
dbspace_dir = AllocateDir(dbspacedirname);
229-
if (dbspace_dir == NULL)
230-
{
231-
ereport(LOG,
232-
(errcode_for_file_access(),
233-
errmsg("could not open directory \"%s\": %m",
234-
dbspacedirname)));
235-
hash_destroy(hash);
236-
return;
237-
}
238-
239-
/* Scan the directory. */
240226
while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
241227
{
242228
ForkNumber forkNum;
@@ -266,15 +252,11 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
266252
{
267253
snprintf(rm_path, sizeof(rm_path), "%s/%s",
268254
dbspacedirname, de->d_name);
269-
270-
/*
271-
* It's tempting to actually throw an error here, but since
272-
* this code gets run during database startup, that could
273-
* result in the database failing to start. (XXX Should we do
274-
* it anyway?)
275-
*/
276-
if (unlink(rm_path))
277-
elog(LOG, "could not unlink file \"%s\": %m", rm_path);
255+
if (unlink(rm_path) < 0)
256+
ereport(ERROR,
257+
(errcode_for_file_access(),
258+
errmsg("could not remove file \"%s\": %m",
259+
rm_path)));
278260
else
279261
elog(DEBUG2, "unlinked file \"%s\"", rm_path);
280262
}
@@ -294,19 +276,8 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
294276
*/
295277
if ((op & UNLOGGED_RELATION_INIT) != 0)
296278
{
297-
/* Open the directory. */
298-
dbspace_dir = AllocateDir(dbspacedirname);
299-
if (dbspace_dir == NULL)
300-
{
301-
/* we just saw this directory, so it really ought to be there */
302-
ereport(LOG,
303-
(errcode_for_file_access(),
304-
errmsg("could not open directory \"%s\": %m",
305-
dbspacedirname)));
306-
return;
307-
}
308-
309279
/* Scan the directory. */
280+
dbspace_dir = AllocateDir(dbspacedirname);
310281
while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
311282
{
312283
ForkNumber forkNum;
@@ -350,16 +321,6 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
350321
* (especially the metadata ones) at once.
351322
*/
352323
dbspace_dir = AllocateDir(dbspacedirname);
353-
if (dbspace_dir == NULL)
354-
{
355-
/* we just saw this directory, so it really ought to be there */
356-
ereport(LOG,
357-
(errcode_for_file_access(),
358-
errmsg("could not open directory \"%s\": %m",
359-
dbspacedirname)));
360-
return;
361-
}
362-
363324
while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
364325
{
365326
ForkNumber forkNum;
@@ -388,6 +349,14 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
388349

389350
FreeDir(dbspace_dir);
390351

352+
/*
353+
* Lastly, fsync the database directory itself, ensuring the
354+
* filesystem remembers the file creations and deletions we've done.
355+
* We don't bother with this during a call that does only
356+
* UNLOGGED_RELATION_CLEANUP, because if recovery crashes before we
357+
* get to doing UNLOGGED_RELATION_INIT, we'll redo the cleanup step
358+
* too at the next startup attempt.
359+
*/
391360
fsync_fname(dbspacedirname, true);
392361
}
393362
}

0 commit comments

Comments
 (0)