Commit c780a7a

Add CheckBuffer() to check on-disk pages without shared buffer loading
CheckBuffer() is designed to be a concurrent-safe function able to run sanity checks on a relation page without loading it into the shared buffers. The operation is done using a lock on the partition involved in the shared buffer mapping hashtable and an I/O lock for the buffer itself, preventing the risk of false positives due to any concurrent activity.

The primary use of this function is the detection of on-disk corruptions for relation pages. If a page is found in shared buffers, the on-disk page is checked if not dirty (a follow-up checkpoint would flush a valid version of the page if dirty anyway), as it could be possible that a page was present for a long time in shared buffers with its on-disk version corrupted. Such a scenario could lead to a corrupted cluster if a host is plugged off for example.

If the page is not found in shared buffers, its on-disk state is checked. PageIsVerifiedExtended() is used to apply the same sanity checks as when a page gets loaded into shared buffers.

This function will be used by an upcoming patch able to check the state of on-disk relation pages using a SQL function.

Author: Julien Rouhaud, Michael Paquier
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAOBaU_aVvMjQn=ge5qPiJOPMmOj5=ii3st5Q0Y+WuLML5sR17w@mail.gmail.com
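
The cases described in the commit message can be summarized as a small decision function. The sketch below is a standalone toy model, not PostgreSQL code: `BlockState`, `CheckAction`, and `check_action()` are hypothetical names introduced only for illustration. The "invalid tag" case is taken from the committed function body, which skips such buffers together with dirty ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is known about a block at lookup time. */
typedef struct BlockState
{
	bool		in_shared_buffers;	/* found in the buffer mapping table? */
	bool		dirty;				/* cached copy newer than the disk copy? */
	bool		tag_valid;			/* buffer descriptor holds a valid tag? */
} BlockState;

/* Hypothetical outcome of the lookup. */
typedef enum CheckAction
{
	CHECK_SKIP,					/* dirty or invalid: report the block as fine */
	CHECK_DISK_WITH_IO_LOCK,	/* cached and clean: read disk under I/O lock */
	CHECK_DISK					/* not cached: read disk under partition lock */
} CheckAction;

/* Decide what to do with a block, mirroring the cases in the message. */
CheckAction
check_action(const BlockState *st)
{
	assert(st != NULL);

	if (!st->in_shared_buffers)
		return CHECK_DISK;
	if (st->dirty || !st->tag_valid)
		return CHECK_SKIP;
	return CHECK_DISK_WITH_IO_LOCK;
}
```

In both "check" outcomes the on-disk copy is what gets verified. The only difference in this model is which locks are held while the block is read.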
1 parent 9e0f87a commit c780a7a

2 files changed, +95 -0 lines changed

src/backend/storage/buffer/bufmgr.c (+92)
@@ -4585,3 +4585,95 @@ TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
 				 errmsg("snapshot too old")));
 }
+
+
+/*
+ * CheckBuffer
+ *
+ * Check the state of a buffer without loading it into the shared buffers. To
+ * avoid torn pages and possible false positives when reading data, a shared
+ * LWLock is taken on the target buffer pool partition mapping, and we check
+ * if the page is in shared buffers or not. An I/O lock is taken on the block
+ * to prevent any concurrent activity from happening.
+ *
+ * If the page is found as dirty in the shared buffers, it is ignored as
+ * it will be flushed to disk either before the end of the next checkpoint
+ * or during recovery in the event of an unsafe shutdown.
+ *
+ * If the page is found in the shared buffers but is not dirty, we still
+ * check the state of its data on disk, as it could be possible that the
+ * page stayed in shared buffers for a rather long time while the on-disk
+ * data got corrupted.
+ *
+ * If the page is not found in shared buffers, the block is read from disk
+ * while holding the buffer pool partition mapping LWLock.
+ *
+ * The page data is stored in a private memory area local to this function
+ * while running the checks.
+ */
+bool
+CheckBuffer(SMgrRelation smgr, ForkNumber forknum, BlockNumber blkno)
+{
+	char buffer[BLCKSZ];
+	BufferTag buf_tag; /* identity of requested block */
+	uint32 buf_hash; /* hash value for buf_tag */
+	LWLock *partLock; /* buffer partition lock for the buffer */
+	BufferDesc *bufdesc;
+	int buf_id;
+
+	Assert(smgrexists(smgr, forknum));
+
+	/* create a tag so we can look after the buffer */
+	INIT_BUFFERTAG(buf_tag, smgr->smgr_rnode.node, forknum, blkno);
+
+	/* determine its hash code and partition lock ID */
+	buf_hash = BufTableHashCode(&buf_tag);
+	partLock = BufMappingPartitionLock(buf_hash);
+
+	/* see if the block is in the buffer pool or not */
+	LWLockAcquire(partLock, LW_SHARED);
+	buf_id = BufTableLookup(&buf_tag, buf_hash);
+	if (buf_id >= 0)
+	{
+		uint32 buf_state;
+
+		/*
+		 * Found it. Now, retrieve its state to know what to do with it, and
+		 * release the pin immediately. We do so to limit overhead as much as
+		 * possible. We keep the shared LWLock on the target buffer mapping
+		 * partition for now, so this buffer cannot be evicted, and we acquire
+		 * an I/O Lock on the buffer as we may need to read its contents from
+		 * disk.
+		 */
+		bufdesc = GetBufferDescriptor(buf_id);
+
+		LWLockAcquire(BufferDescriptorGetIOLock(bufdesc), LW_SHARED);
+		buf_state = LockBufHdr(bufdesc);
+		UnlockBufHdr(bufdesc, buf_state);
+
+		/* If the page is dirty or invalid, skip it */
+		if ((buf_state & BM_DIRTY) != 0 || (buf_state & BM_TAG_VALID) == 0)
+		{
+			LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+			LWLockRelease(partLock);
+			return true;
+		}
+
+		/* Read the buffer from disk, with the I/O lock still held */
+		smgrread(smgr, forknum, blkno, buffer);
+		LWLockRelease(BufferDescriptorGetIOLock(bufdesc));
+	}
+	else
+	{
+		/*
+		 * Simply read the buffer. There's no risk of modification on it as
+		 * we are holding the buffer pool partition mapping lock.
+		 */
+		smgrread(smgr, forknum, blkno, buffer);
+	}
+
+	/* buffer lookup done, so now do its check */
+	LWLockRelease(partLock);
+
+	return PageIsVerifiedExtended(buffer, blkno, PIV_REPORT_STAT);
+}
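
CheckBuffer() verifies one block per call, so a caller that wants to cover a whole relation fork would presumably loop over its block numbers. The sketch below shows that loop in isolation, as a standalone toy rather than backend code. `block_check_fn`, `count_failed_blocks()`, and `fail_one_block()` are hypothetical names, and the callback stands in for a real per-block check such as CheckBuffer(); nothing here reflects how the upcoming SQL function is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* 32-bit block number, as in PostgreSQL */

/* Hypothetical stand-in for a per-block check such as CheckBuffer(). */
typedef bool (*block_check_fn) (BlockNumber blkno, void *arg);

/* Run the check on blocks [0, nblocks) and count how many fail. */
BlockNumber
count_failed_blocks(BlockNumber nblocks, block_check_fn check, void *arg)
{
	BlockNumber failed = 0;

	assert(check != NULL);

	for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
	{
		if (!check(blkno, arg))
			failed++;
	}
	return failed;
}

/* Toy check: reports exactly one block number (passed via arg) as bad. */
bool
fail_one_block(BlockNumber blkno, void *arg)
{
	return blkno != *(const BlockNumber *) arg;
}
```

Because each call takes and releases its own locks, a loop of this shape never holds a partition lock across more than one block read.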

src/include/storage/bufmgr.h (+3)
@@ -240,6 +240,9 @@ extern void AtProcExit_LocalBuffers(void);
 
 extern void TestForOldSnapshot_impl(Snapshot snapshot, Relation relation);
 
+extern bool CheckBuffer(struct SMgrRelationData *smgr, ForkNumber forknum,
+						BlockNumber blkno);
+
 /* in freelist.c */
 extern BufferAccessStrategy GetAccessStrategy(BufferAccessStrategyType btype);
 extern void FreeAccessStrategy(BufferAccessStrategy strategy);
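
The header prototype spells the first parameter as `struct SMgrRelationData *`, while the definition in bufmgr.c uses `SMgrRelation`. The two match provided `SMgrRelation` is a typedef for a pointer to that struct, which is assumed here. Naming only the struct tag lets a header declare the function without including the header that defines the struct. The following standalone sketch shows the same idiom with hypothetical `Demo*` names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "Header" side: only the struct tag is declared, no definition needed. */
struct DemoRelationData;
extern bool demo_check(struct DemoRelationData *rel, unsigned blkno);

/* "Implementation" side: the full struct and a pointer typedef. */
typedef struct DemoRelationData
{
	unsigned	nblocks;		/* number of blocks in the toy relation */
} DemoRelationData;

typedef DemoRelationData *DemoRelation;

/* Same parameter type as the prototype, spelled through the typedef. */
bool
demo_check(DemoRelation rel, unsigned blkno)
{
	assert(rel != NULL);
	return blkno < rel->nblocks;
}
```

The compiler accepts the definition because `DemoRelation` and `struct DemoRelationData *` are the same type. Code that includes only the "header" part can pass the pointer around but cannot dereference it.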
