Commit 566372b

Prevent concurrent SimpleLruTruncate() for any given SLRU.

The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit. If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database. Back-patch to 9.5 (all supported versions).

Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
1 parent d4d443b commit 566372b

11 files changed: +117 -13 lines changed

doc/src/sgml/catalogs.sgml

+3 -1
@@ -10226,7 +10226,8 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 and general database objects (identified by class OID and object OID,
 in the same way as in <structname>pg_description</structname> or
 <structname>pg_depend</structname>). Also, the right to extend a
-relation is represented as a separate lockable object.
+relation is represented as a separate lockable object, as is the right to
+update <structname>pg_database</structname>.<structfield>datfrozenxid</structfield>.
 Also, <quote>advisory</quote> locks can be taken on numbers that have
 user-defined meanings.
 </para>
@@ -10254,6 +10255,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
 Type of the lockable object:
 <literal>relation</literal>,
 <literal>extend</literal>,
+<literal>frozenid</literal>,
 <literal>page</literal>,
 <literal>tuple</literal>,
 <literal>transactionid</literal>,

doc/src/sgml/monitoring.sgml

+16
@@ -1742,6 +1742,12 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 <entry><literal>extend</literal></entry>
 <entry>Waiting to extend a relation.</entry>
 </row>
+<row>
+<entry><literal>frozenid</literal></entry>
+<entry>Waiting to
+update <structname>pg_database</structname>.<structfield>datfrozenxid</structfield>
+and <structname>pg_database</structname>.<structfield>datminmxid</structfield>.</entry>
+</row>
 <row>
 <entry><literal>object</literal></entry>
 <entry>Waiting to acquire a lock on a non-relation database object.</entry>
@@ -1910,6 +1916,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 <entry><literal>NotifyQueue</literal></entry>
 <entry>Waiting to read or update <command>NOTIFY</command> messages.</entry>
 </row>
+<row>
+<entry><literal>NotifyQueueTail</literal></entry>
+<entry>Waiting to update limit on <command>NOTIFY</command> message
+storage.</entry>
+</row>
 <row>
 <entry><literal>NotifySLRU</literal></entry>
 <entry>Waiting to access the <command>NOTIFY</command> message SLRU
@@ -2086,6 +2097,11 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 <entry><literal>WALWrite</literal></entry>
 <entry>Waiting for WAL buffers to be written to disk.</entry>
 </row>
+<row>
+<entry><literal>WrapLimitsVacuum</literal></entry>
+<entry>Waiting to update limits on transaction id and multixact
+consumption.</entry>
+</row>
 <row>
 <entry><literal>XactBuffer</literal></entry>
 <entry>Waiting for I/O on a transaction status SLRU buffer.</entry>

src/backend/access/transam/slru.c

+8
@@ -1191,6 +1191,14 @@ SimpleLruFlush(SlruCtl ctl, bool allow_redirtied)

 /*
  * Remove all segments before the one holding the passed page number
+ *
+ * All SLRUs prevent concurrent calls to this function, either with an LWLock
+ * or by calling it only as part of a checkpoint. Mutual exclusion must begin
+ * before computing cutoffPage. Mutual exclusion must end after any limit
+ * update that would permit other backends to write fresh data into the
+ * segment immediately preceding the one containing cutoffPage. Otherwise,
+ * when the SLRU is quite full, SimpleLruTruncate() might delete that segment
+ * after it has accrued freshly-written data.
  */
 void
 SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
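
The ordering rule in the new header comment can be modelled in a few lines. The following is a minimal single-threaded sketch, not PostgreSQL code: `ToySlru`, `toy_extend()`, and `toy_truncate()` are hypothetical names, the SLRU is reduced to a ring of eight segments, and a boolean flag stands in for the LWLock. It only illustrates the required sequence: take the lock, compute the cutoff, delete old segments, advertise the new limit, release the lock.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NSEGMENTS 8         /* hypothetical: toy SLRU is a ring of 8 segments */

typedef struct ToySlru
{
    bool        lock_held;      /* stands in for an LWLock */
    int         oldest;         /* advertised limit: oldest live segment */
    int         newest;         /* segment currently receiving writes */
    int         oldest_needed;  /* oldest segment any reader still needs */
    bool        has_data[TOY_NSEGMENTS];
} ToySlru;

/* Writers may move into the next segment unless it is the advertised oldest. */
static bool
toy_extend(ToySlru *s)
{
    int         next = (s->newest + 1) % TOY_NSEGMENTS;

    if (next == s->oldest)
        return false;           /* ring is full */
    s->newest = next;
    s->has_data[next] = true;
    return true;
}

static void
toy_truncate(ToySlru *s)
{
    int         cutoff;

    assert(!s->lock_held);
    s->lock_held = true;        /* exclusion begins before computing cutoff */

    cutoff = s->oldest_needed;
    assert(cutoff >= 0 && cutoff < TOY_NSEGMENTS);

    /* Delete every segment from the old limit up to, not including, cutoff. */
    for (int seg = s->oldest; seg != cutoff; seg = (seg + 1) % TOY_NSEGMENTS)
        s->has_data[seg] = false;

    s->oldest = cutoff;         /* only now may writers reuse the freed space */
    s->lock_held = false;       /* exclusion ends after the limit update */
}
```

If the limit were advertised before the deletion loop, a writer on a full ring could place fresh data in the segment just before `cutoff`, and the loop would then remove it. That is the hazard the comment describes.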

src/backend/access/transam/subtrans.c

+2 -2
@@ -349,8 +349,8 @@ ExtendSUBTRANS(TransactionId newestXact)
 /*
  * Remove all SUBTRANS segments before the one holding the passed transaction ID
  *
- * This is normally called during checkpoint, with oldestXact being the
- * oldest TransactionXmin of any running transaction.
+ * oldestXact is the oldest TransactionXmin of any running transaction. This
+ * is called only during checkpoint.
  */
 void
 TruncateSUBTRANS(TransactionId oldestXact)

src/backend/commands/async.c

+27 -10
@@ -244,19 +244,22 @@ typedef struct QueueBackendStatus
 /*
  * Shared memory state for LISTEN/NOTIFY (excluding its SLRU stuff)
  *
- * The AsyncQueueControl structure is protected by the NotifyQueueLock.
+ * The AsyncQueueControl structure is protected by the NotifyQueueLock and
+ * NotifyQueueTailLock.
  *
- * When holding the lock in SHARED mode, backends may only inspect their own
- * entries as well as the head and tail pointers. Consequently we can allow a
- * backend to update its own record while holding only SHARED lock (since no
- * other backend will inspect it).
+ * When holding NotifyQueueLock in SHARED mode, backends may only inspect
+ * their own entries as well as the head and tail pointers. Consequently we
+ * can allow a backend to update its own record while holding only SHARED lock
+ * (since no other backend will inspect it).
  *
- * When holding the lock in EXCLUSIVE mode, backends can inspect the entries
- * of other backends and also change the head and tail pointers.
+ * When holding NotifyQueueLock in EXCLUSIVE mode, backends can inspect the
+ * entries of other backends and also change the head pointer. When holding
+ * both NotifyQueueLock and NotifyQueueTailLock in EXCLUSIVE mode, backends
+ * can change the tail pointer.
  *
  * NotifySLRULock is used as the control lock for the pg_notify SLRU buffers.
- * In order to avoid deadlocks, whenever we need both locks, we always first
- * get NotifyQueueLock and then NotifySLRULock.
+ * In order to avoid deadlocks, whenever we need multiple locks, we first get
+ * NotifyQueueTailLock, then NotifyQueueLock, and lastly NotifySLRULock.
  *
  * Each backend uses the backend[] array entry with index equal to its
  * BackendId (which can range from 1 to MaxBackends). We rely on this to make
@@ -2177,6 +2180,10 @@ asyncQueueAdvanceTail(void)
     int newtailpage;
     int boundary;

+    /* Restrict task to one backend per cluster; see SimpleLruTruncate(). */
+    LWLockAcquire(NotifyQueueTailLock, LW_EXCLUSIVE);
+
+    /* Compute the new tail. */
     LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
     min = QUEUE_HEAD;
     for (BackendId i = QUEUE_FIRST_LISTENER; i > 0; i = QUEUE_NEXT_LISTENER(i))
@@ -2185,7 +2192,6 @@ asyncQueueAdvanceTail(void)
         min = QUEUE_POS_MIN(min, QUEUE_BACKEND_POS(i));
     }
     oldtailpage = QUEUE_POS_PAGE(QUEUE_TAIL);
-    QUEUE_TAIL = min;
     LWLockRelease(NotifyQueueLock);

     /*
@@ -2205,6 +2211,17 @@ asyncQueueAdvanceTail(void)
          */
         SimpleLruTruncate(NotifyCtl, newtailpage);
     }
+
+    /*
+     * Advertise the new tail. This changes asyncQueueIsFull()'s verdict for
+     * the segment immediately prior to the new tail, allowing fresh data into
+     * that segment.
+     */
+    LWLockAcquire(NotifyQueueLock, LW_EXCLUSIVE);
+    QUEUE_TAIL = min;
+    LWLockRelease(NotifyQueueLock);
+
+    LWLockRelease(NotifyQueueTailLock);
 }

 /*
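
The lock ordering stated in the revised comment (NotifyQueueTailLock, then NotifyQueueLock, then NotifySLRULock) and the new shape of asyncQueueAdvanceTail() can be sketched as below. This is an illustrative toy under stated assumptions, not the real function: the `toy_` names are hypothetical, queue positions are plain integers rather than (page, offset) pairs with wraparound, the head is passed in instead of being read under the lock, and the "locks" are flags whose only job is to assert the ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Declared in the order the locks must be acquired. */
enum ToyLock
{
    TOY_TAIL_LOCK,              /* models NotifyQueueTailLock */
    TOY_QUEUE_LOCK,             /* models NotifyQueueLock */
    TOY_SLRU_LOCK,              /* models NotifySLRULock */
    TOY_NLOCKS
};

static bool toy_held[TOY_NLOCKS];
static int  toy_queue_tail;         /* stands in for QUEUE_TAIL */
static int  toy_truncated_before;   /* stands in for the SLRU truncation point */

static void
toy_acquire(enum ToyLock lock)
{
    /* Ordering rule: never take a lock while holding it or a later one. */
    for (int l = lock; l < TOY_NLOCKS; l++)
        assert(!toy_held[l]);
    toy_held[lock] = true;
}

static void
toy_release(enum ToyLock lock)
{
    assert(toy_held[lock]);
    toy_held[lock] = false;
}

static void
toy_advance_tail(const int *listener_pos, int nlisteners, int head)
{
    int         min = head;

    toy_acquire(TOY_TAIL_LOCK);     /* one tail-advancer at a time */

    toy_acquire(TOY_QUEUE_LOCK);    /* compute the new tail */
    for (int i = 0; i < nlisteners; i++)
        if (listener_pos[i] < min)
            min = listener_pos[i];
    toy_release(TOY_QUEUE_LOCK);

    toy_acquire(TOY_SLRU_LOCK);     /* truncate while the old tail is still advertised */
    toy_truncated_before = min;
    toy_release(TOY_SLRU_LOCK);

    toy_acquire(TOY_QUEUE_LOCK);    /* advertise the new tail last */
    toy_queue_tail = min;
    toy_release(TOY_QUEUE_LOCK);

    toy_release(TOY_TAIL_LOCK);
}
```

The point of the sketch is the sequence: the tail assignment moves from the first critical section to a second one after truncation, and the outer lock spans both.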

src/backend/commands/vacuum.c

+13
@@ -1361,6 +1361,14 @@ vac_update_datfrozenxid(void)
     bool bogus = false;
     bool dirty = false;

+    /*
+     * Restrict this task to one backend per database. This avoids race
+     * conditions that would move datfrozenxid or datminmxid backward. It
+     * avoids calling vac_truncate_clog() with a datfrozenxid preceding a
+     * datfrozenxid passed to an earlier vac_truncate_clog() call.
+     */
+    LockDatabaseFrozenIds(ExclusiveLock);
+
     /*
      * Initialize the "min" calculation with
      * GetOldestNonRemovableTransactionId(), which is a reasonable
@@ -1551,6 +1559,9 @@ vac_truncate_clog(TransactionId frozenXID,
     bool bogus = false;
     bool frozenAlreadyWrapped = false;

+    /* Restrict task to one backend per cluster; see SimpleLruTruncate(). */
+    LWLockAcquire(WrapLimitsVacuumLock, LW_EXCLUSIVE);
+
     /* init oldest datoids to sync with my frozenXID/minMulti values */
     oldestxid_datoid = MyDatabaseId;
     minmulti_datoid = MyDatabaseId;
@@ -1660,6 +1671,8 @@ vac_truncate_clog(TransactionId frozenXID,
      */
     SetTransactionIdLimit(frozenXID, oldestxid_datoid);
     SetMultiXactIdLimit(minMulti, minmulti_datoid, false);
+
+    LWLockRelease(WrapLimitsVacuumLock);
 }
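
The comment added to vac_update_datfrozenxid() mentions races that would move datfrozenxid backward. One hypothetical interleaving of that kind is replayed deterministically below. This is a sketch under assumptions, not a trace of real backends: `ToyDatabase` and both functions are invented for illustration, the XID values are arbitrary, and plain integer comparison is used, which ignores transaction ID wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for one pg_database row plus what a catalog scan would see. */
typedef struct ToyDatabase
{
    uint32_t    datfrozenxid;           /* value stored in the row */
    uint32_t    oldest_relfrozenxid;    /* what a scan would compute right now */
} ToyDatabase;

/* Without serialization: backend A scans early and stores late. */
static uint32_t
toy_unserialized(ToyDatabase *db)
{
    uint32_t    a_result = db->oldest_relfrozenxid;     /* A scans */
    uint32_t    b_result;

    db->oldest_relfrozenxid = 120;      /* more freezing completes */
    b_result = db->oldest_relfrozenxid; /* B scans and sees the newer value */
    db->datfrozenxid = b_result;        /* B stores 120 */
    db->datfrozenxid = a_result;        /* A stores its stale value: backward */
    return db->datfrozenxid;
}

/* With one backend per database: scan and store happen as a unit. */
static uint32_t
toy_serialized(ToyDatabase *db)
{
    uint32_t    a_stored;

    /* backend A holds the lock: scan, then store */
    db->datfrozenxid = db->oldest_relfrozenxid;
    a_stored = db->datfrozenxid;

    db->oldest_relfrozenxid = 120;      /* more freezing completes */

    /* backend B holds the lock: scan, then store */
    db->datfrozenxid = db->oldest_relfrozenxid;
    assert(db->datfrozenxid >= a_stored);   /* never moves backward */
    return db->datfrozenxid;
}
```

Starting both variants from the same state, the unserialized replay ends with the older value in place while the serialized one ends with the newer value.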

src/backend/storage/lmgr/lmgr.c

+20
@@ -460,6 +460,21 @@ UnlockRelationForExtension(Relation relation, LOCKMODE lockmode)
     LockRelease(&tag, lockmode, false);
 }

+/*
+ * LockDatabaseFrozenIds
+ *
+ * This allows one backend per database to execute vac_update_datfrozenxid().
+ */
+void
+LockDatabaseFrozenIds(LOCKMODE lockmode)
+{
+    LOCKTAG tag;
+
+    SET_LOCKTAG_DATABASE_FROZEN_IDS(tag, MyDatabaseId);
+
+    (void) LockAcquire(&tag, lockmode, false, false);
+}
+
 /*
  * LockPage
  *
@@ -1098,6 +1113,11 @@ DescribeLockTag(StringInfo buf, const LOCKTAG *tag)
                              tag->locktag_field2,
                              tag->locktag_field1);
             break;
+        case LOCKTAG_DATABASE_FROZEN_IDS:
+            appendStringInfo(buf,
+                             _("pg_database.datfrozenxid of database %u"),
+                             tag->locktag_field1);
+            break;
         case LOCKTAG_PAGE:
             appendStringInfo(buf,
                              _("page %u of relation %u of database %u"),

src/backend/storage/lmgr/lwlocknames.txt

+3
@@ -50,3 +50,6 @@ MultiXactTruncationLock 41
 OldSnapshotTimeMapLock 42
 LogicalRepWorkerLock 43
 XactTruncationLock 44
+# 45 was XactTruncationLock until removal of BackendRandomLock
+WrapLimitsVacuumLock 46
+NotifyQueueTailLock 47

src/backend/utils/adt/lockfuncs.c

+12
@@ -29,6 +29,7 @@
 const char *const LockTagTypeNames[] = {
     "relation",
     "extend",
+    "frozenid",
     "page",
     "tuple",
     "transactionid",
@@ -254,6 +255,17 @@ pg_lock_status(PG_FUNCTION_ARGS)
                 nulls[8] = true;
                 nulls[9] = true;
                 break;
+            case LOCKTAG_DATABASE_FROZEN_IDS:
+                values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
+                nulls[2] = true;
+                nulls[3] = true;
+                nulls[4] = true;
+                nulls[5] = true;
+                nulls[6] = true;
+                nulls[7] = true;
+                nulls[8] = true;
+                nulls[9] = true;
+                break;
             case LOCKTAG_PAGE:
                 values[1] = ObjectIdGetDatum(instance->locktag.locktag_field1);
                 values[2] = ObjectIdGetDatum(instance->locktag.locktag_field2);
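
The name array is indexed by lock tag type, which is why "frozenid" is inserted at the same position here as LOCKTAG_DATABASE_FROZEN_IDS is in the enum. A small standalone sketch of that pairing follows. It covers only the first six tag types shown in this diff, uses hypothetical `TOY_`-prefixed names, and adds a C11 `_Static_assert` as one possible way to catch a mismatch at compile time; whether the real tree guards the array this way is not shown in the commit.

```c
#include <assert.h>

/* Hypothetical mirror of the first six LockTagType values after this commit. */
typedef enum ToyLockTagType
{
    TOY_LOCKTAG_RELATION,
    TOY_LOCKTAG_RELATION_EXTEND,
    TOY_LOCKTAG_DATABASE_FROZEN_IDS,
    TOY_LOCKTAG_PAGE,
    TOY_LOCKTAG_TUPLE,
    TOY_LOCKTAG_TRANSACTION,
    TOY_LOCKTAG_COUNT           /* not a real tag type; counts the entries */
} ToyLockTagType;

/* Must list names in exactly the enum's order. */
static const char *const toy_locktag_names[] = {
    "relation",
    "extend",
    "frozenid",
    "page",
    "tuple",
    "transactionid",
};

_Static_assert(sizeof(toy_locktag_names) / sizeof(toy_locktag_names[0]) ==
               TOY_LOCKTAG_COUNT,
               "name array and enum must have the same number of entries");

static const char *
toy_locktag_name(ToyLockTagType type)
{
    assert((int) type >= 0 && (int) type < TOY_LOCKTAG_COUNT);
    return toy_locktag_names[type];
}
```

The size check catches a missing entry but not a misordered one, so the two lists still have to be edited together, as this commit does.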

src/include/storage/lmgr.h

+3
@@ -59,6 +59,9 @@ extern bool ConditionalLockRelationForExtension(Relation relation,
                                                 LOCKMODE lockmode);
 extern int RelationExtensionLockWaiterCount(Relation relation);

+/* Lock to recompute pg_database.datfrozenxid in the current database */
+extern void LockDatabaseFrozenIds(LOCKMODE lockmode);
+
 /* Lock a page (currently only used within indexes) */
 extern void LockPage(Relation relation, BlockNumber blkno, LOCKMODE lockmode);
 extern bool ConditionalLockPage(Relation relation, BlockNumber blkno, LOCKMODE lockmode);

src/include/storage/lock.h

+10
@@ -138,6 +138,7 @@ typedef enum LockTagType
 {
     LOCKTAG_RELATION, /* whole relation */
     LOCKTAG_RELATION_EXTEND, /* the right to extend a relation */
+    LOCKTAG_DATABASE_FROZEN_IDS, /* pg_database.datfrozenxid */
     LOCKTAG_PAGE, /* one page of a relation */
     LOCKTAG_TUPLE, /* one physical tuple */
     LOCKTAG_TRANSACTION, /* transaction (for waiting for xact done) */
@@ -194,6 +195,15 @@ typedef struct LOCKTAG
      (locktag).locktag_type = LOCKTAG_RELATION_EXTEND, \
      (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)

+/* ID info for frozen IDs is DB OID */
+#define SET_LOCKTAG_DATABASE_FROZEN_IDS(locktag,dboid) \
+    ((locktag).locktag_field1 = (dboid), \
+     (locktag).locktag_field2 = 0, \
+     (locktag).locktag_field3 = 0, \
+     (locktag).locktag_field4 = 0, \
+     (locktag).locktag_type = LOCKTAG_DATABASE_FROZEN_IDS, \
+     (locktag).locktag_lockmethodid = DEFAULT_LOCKMETHOD)
+
 /* ID info for a page is RELATION info + BlockNumber */
 #define SET_LOCKTAG_PAGE(locktag,dboid,reloid,blocknum) \
     ((locktag).locktag_field1 = (dboid), \
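
Because the new macro fills only field1 with the database OID and zeroes the rest, two backends in the same database build identical tags and therefore contend for the same lock, while backends in different databases do not. The standalone sketch below illustrates that. It assumes a simplified copy of the tag layout (`ToyLockTag`) and hypothetical constant values chosen to mirror this diff; none of the `toy_`/`TOY_` names exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for LOCKTAG; field names follow the diff above. */
typedef struct ToyLockTag
{
    uint32_t    locktag_field1;
    uint32_t    locktag_field2;
    uint32_t    locktag_field3;
    uint16_t    locktag_field4;
    uint8_t     locktag_type;
    uint8_t     locktag_lockmethodid;
} ToyLockTag;

/* Assumed values for illustration only. */
#define TOY_LOCKTAG_DATABASE_FROZEN_IDS 2
#define TOY_DEFAULT_LOCKMETHOD 1

/* Same shape as SET_LOCKTAG_DATABASE_FROZEN_IDS in the diff. */
#define TOY_SET_LOCKTAG_DATABASE_FROZEN_IDS(locktag,dboid) \
    ((locktag).locktag_field1 = (dboid), \
     (locktag).locktag_field2 = 0, \
     (locktag).locktag_field3 = 0, \
     (locktag).locktag_field4 = 0, \
     (locktag).locktag_type = TOY_LOCKTAG_DATABASE_FROZEN_IDS, \
     (locktag).locktag_lockmethodid = TOY_DEFAULT_LOCKMETHOD)

static ToyLockTag
toy_frozenid_tag(uint32_t dboid)
{
    ToyLockTag  tag;

    TOY_SET_LOCKTAG_DATABASE_FROZEN_IDS(tag, dboid);
    assert(tag.locktag_type == TOY_LOCKTAG_DATABASE_FROZEN_IDS);
    return tag;
}

/* Two requests conflict only if every identifying field matches. */
static bool
toy_same_lock(const ToyLockTag *a, const ToyLockTag *b)
{
    return a->locktag_field1 == b->locktag_field1 &&
        a->locktag_field2 == b->locktag_field2 &&
        a->locktag_field3 == b->locktag_field3 &&
        a->locktag_field4 == b->locktag_field4 &&
        a->locktag_type == b->locktag_type &&
        a->locktag_lockmethodid == b->locktag_lockmethodid;
}
```

This is what makes the lock per-database: LockDatabaseFrozenIds() passes MyDatabaseId, so the tag differs between databases and matches within one.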
