Commit 7e2561e
Cope with catcache entries becoming stale during detoasting.
We've long had a policy that any toasted fields in a catalog tuple should be pulled in-line before entering the tuple in a catalog cache.  However, that requires access to the catalog's toast table, and we'll typically do AcceptInvalidationMessages while opening the toast table.  So it's possible that the catalog tuple is outdated by the time we finish detoasting it.  Since no cache entry exists yet, we can't mark the entry stale during AcceptInvalidationMessages, and instead we'll press forward and build an apparently-valid cache entry.  The upshot is that we have a race condition whereby an out-of-date entry could be made in a backend's catalog cache, and persist there indefinitely, causing indeterminate misbehavior.

To fix, use the existing systable_recheck_tuple code to recheck whether the catalog tuple is still up-to-date after we finish detoasting it.  If not, loop around and restart the process of searching the catalog and constructing cache entries from the top.  The case is rare enough that this shouldn't create any meaningful performance penalty, even in the SearchCatCacheList case where we need to tear down and reconstruct the whole list.

Indeed, the case is so rare that AFAICT it doesn't occur during our regression tests, and there doesn't seem to be any easy way to build a test that would exercise it reliably.  To allow testing of the retry code paths, add logic (in USE_ASSERT_CHECKING builds only) that randomly pretends that the recheck failed about one time out of a thousand.  This is enough to ensure that we'll pass through the retry paths during most regression test runs.

By adding an extra level of looping, this commit creates a need to reindent most of SearchCatCacheMiss and SearchCatCacheList.  I'll do that separately, to allow putting those changes in .git-blame-ignore-revs.

Patch by me; thanks to Alexander Lakhin for having built a test case to prove the bug is real, and to Xiaoran Wang for review.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/1393953.1698353013@sss.pgh.pa.us
Discussion: https://postgr.es/m/CAGjhLkOoBEC9mLsnB42d3CO1vcMx71MLSEuigeABbQ8oRdA6gw@mail.gmail.com
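In outline, the fix gives SearchCatCacheMiss the retry shape below; SearchCatCacheList gets the same loop plus extra cleanup. This is a condensed sketch of the code in the diff that follows, with declarations and the negative-entry path omitted:

    do
    {
        /* Rebuild the scankey each time: systable_beginscan scribbles on it */
        memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
        cur_skey[0].sk_argument = v1;
        /* ... remaining key arguments ... */

        scandesc = systable_beginscan(relation, cache->cc_indexoid,
                                      IndexScanOK(cache, cur_skey),
                                      NULL, nkeys, cur_skey);
        stale = false;
        while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
        {
            /* NULL result means the tuple went stale during detoasting */
            ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
                                         hashValue, hashIndex);
            if (ct == NULL)
            {
                stale = true;       /* restart the scan from the top */
                break;
            }
            /* ... enter ct into the cache as before ... */
        }
        systable_endscan(scandesc);
    } while (stale);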
1 parent 41820e6 commit 7e2561e

File tree

1 file changed: +124 -40 lines changed

src/backend/utils/cache/catcache.c

Lines changed: 124 additions & 40 deletions
@@ -24,6 +24,7 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
+#include "common/pg_prng.h"
 #include "miscadmin.h"
 #include "port/pg_bitutils.h"
 #ifdef CATCACHE_STATS
@@ -89,10 +90,10 @@ static void CatCachePrintStats(int code, Datum arg);
 static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct);
 static void CatCacheRemoveCList(CatCache *cache, CatCList *cl);
 static void CatalogCacheInitializeCache(CatCache *cache);
-static CatCTup *CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp,
+static CatCTup *CatalogCacheCreateEntry(CatCache *cache,
+                                        HeapTuple ntp, SysScanDesc scandesc,
                                         Datum *arguments,
-                                        uint32 hashValue, Index hashIndex,
-                                        bool negative);
+                                        uint32 hashValue, Index hashIndex);

 static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
                              Datum *keys);
@@ -1316,6 +1317,7 @@ SearchCatCacheMiss(CatCache *cache,
     SysScanDesc scandesc;
     HeapTuple   ntp;
     CatCTup    *ct;
+    bool        stale;
     Datum       arguments[CATCACHE_MAXKEYS];

     /* Initialize local parameter array */
@@ -1324,16 +1326,6 @@ SearchCatCacheMiss(CatCache *cache,
     arguments[2] = v3;
     arguments[3] = v4;

-    /*
-     * Ok, need to make a lookup in the relation, copy the scankey and fill
-     * out any per-call fields.
-     */
-    memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
-    cur_skey[0].sk_argument = v1;
-    cur_skey[1].sk_argument = v2;
-    cur_skey[2].sk_argument = v3;
-    cur_skey[3].sk_argument = v4;
-
     /*
      * Tuple was not found in cache, so we have to try to retrieve it directly
      * from the relation.  If found, we will add it to the cache; if not
@@ -1348,9 +1340,28 @@ SearchCatCacheMiss(CatCache *cache,
      * will eventually age out of the cache, so there's no functional problem.
      * This case is rare enough that it's not worth expending extra cycles to
      * detect.
+     *
+     * Another case, which we *must* handle, is that the tuple could become
+     * outdated during CatalogCacheCreateEntry's attempt to detoast it (since
+     * AcceptInvalidationMessages can run during TOAST table access).  We do
+     * not want to return already-stale catcache entries, so we loop around
+     * and do the table scan again if that happens.
      */
     relation = table_open(cache->cc_reloid, AccessShareLock);

+    do
+    {
+        /*
+         * Ok, need to make a lookup in the relation, copy the scankey and
+         * fill out any per-call fields.  (We must re-do this when retrying,
+         * because systable_beginscan scribbles on the scankey.)
+         */
+        memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * nkeys);
+        cur_skey[0].sk_argument = v1;
+        cur_skey[1].sk_argument = v2;
+        cur_skey[2].sk_argument = v3;
+        cur_skey[3].sk_argument = v4;
+
     scandesc = systable_beginscan(relation,
                                   cache->cc_indexoid,
                                   IndexScanOK(cache, cur_skey),
@@ -1359,12 +1370,18 @@ SearchCatCacheMiss(CatCache *cache,
                                   cur_skey);

     ct = NULL;
+    stale = false;

     while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
     {
-        ct = CatalogCacheCreateEntry(cache, ntp, arguments,
-                                     hashValue, hashIndex,
-                                     false);
+        ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
+                                     hashValue, hashIndex);
+        /* upon failure, we must start the scan over */
+        if (ct == NULL)
+        {
+            stale = true;
+            break;
+        }
         /* immediately set the refcount to 1 */
         ResourceOwnerEnlargeCatCacheRefs(CurrentResourceOwner);
         ct->refcount++;
@@ -1373,6 +1390,7 @@ SearchCatCacheMiss(CatCache *cache,
     }

     systable_endscan(scandesc);
+    } while (stale);

     table_close(relation, AccessShareLock);

@@ -1391,9 +1409,11 @@ SearchCatCacheMiss(CatCache *cache,
     if (IsBootstrapProcessingMode())
         return NULL;

-    ct = CatalogCacheCreateEntry(cache, NULL, arguments,
-                                 hashValue, hashIndex,
-                                 true);
+    ct = CatalogCacheCreateEntry(cache, NULL, NULL, arguments,
+                                 hashValue, hashIndex);
+
+    /* Creating a negative cache entry shouldn't fail */
+    Assert(ct != NULL);

     CACHE_elog(DEBUG2, "SearchCatCache(%s): Contains %d/%d tuples",
                cache->cc_relname, cache->cc_ntup, CacheHdr->ch_ntup);
@@ -1600,7 +1620,8 @@ SearchCatCacheList(CatCache *cache,
      * We have to bump the member refcounts temporarily to ensure they won't
      * get dropped from the cache while loading other members.  We use a PG_TRY
      * block to ensure we can undo those refcounts if we get an error before
-     * we finish constructing the CatCList.
+     * we finish constructing the CatCList.  ctlist must be valid throughout
+     * the PG_TRY block.
      */
     ResourceOwnerEnlargeCatCacheListRefs(CurrentResourceOwner);

@@ -1611,19 +1632,23 @@ SearchCatCacheList(CatCache *cache,
         ScanKeyData cur_skey[CATCACHE_MAXKEYS];
         Relation    relation;
         SysScanDesc scandesc;
-
-        /*
-         * Ok, need to make a lookup in the relation, copy the scankey and
-         * fill out any per-call fields.
-         */
-        memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
-        cur_skey[0].sk_argument = v1;
-        cur_skey[1].sk_argument = v2;
-        cur_skey[2].sk_argument = v3;
-        cur_skey[3].sk_argument = v4;
+        bool        stale;

         relation = table_open(cache->cc_reloid, AccessShareLock);

+        do
+        {
+            /*
+             * Ok, need to make a lookup in the relation, copy the scankey and
+             * fill out any per-call fields.  (We must re-do this when
+             * retrying, because systable_beginscan scribbles on the scankey.)
+             */
+            memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
+            cur_skey[0].sk_argument = v1;
+            cur_skey[1].sk_argument = v2;
+            cur_skey[2].sk_argument = v3;
+            cur_skey[3].sk_argument = v4;
+
         scandesc = systable_beginscan(relation,
                                       cache->cc_indexoid,
                                       IndexScanOK(cache, cur_skey),
@@ -1634,6 +1659,8 @@ SearchCatCacheList(CatCache *cache,
         /* The list will be ordered iff we are doing an index scan */
         ordered = (scandesc->irel != NULL);

+        stale = false;
+
         while (HeapTupleIsValid(ntp = systable_getnext(scandesc)))
         {
             uint32      hashValue;
@@ -1676,9 +1703,32 @@ SearchCatCacheList(CatCache *cache,
             if (!found)
             {
                 /* We didn't find a usable entry, so make a new one */
-                ct = CatalogCacheCreateEntry(cache, ntp, arguments,
-                                             hashValue, hashIndex,
-                                             false);
+                ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
+                                             hashValue, hashIndex);
+                /* upon failure, we must start the scan over */
+                if (ct == NULL)
+                {
+                    /*
+                     * Release refcounts on any items we already had.  We dare
+                     * not try to free them if they're now unreferenced, since
+                     * an error while doing that would result in the PG_CATCH
+                     * below doing extra refcount decrements.  Besides, we'll
+                     * likely re-adopt those items in the next iteration, so
+                     * it's not worth complicating matters to try to get rid
+                     * of them.
+                     */
+                    foreach(ctlist_item, ctlist)
+                    {
+                        ct = (CatCTup *) lfirst(ctlist_item);
+                        Assert(ct->c_list == NULL);
+                        Assert(ct->refcount > 0);
+                        ct->refcount--;
+                    }
+                    /* Reset ctlist in preparation for new try */
+                    ctlist = NIL;
+                    stale = true;
+                    break;
+                }
             }

             /* Careful here: add entry to ctlist, then bump its refcount */
@@ -1688,6 +1738,7 @@ SearchCatCacheList(CatCache *cache,
         }

         systable_endscan(scandesc);
+        } while (stale);

         table_close(relation, AccessShareLock);

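The loop above mirrors SearchCatCacheMiss, with one extra obligation before retrying: refcounts taken on list members collected so far must be released. The entries are deliberately not freed even if now unreferenced, since an error while freeing would make the PG_CATCH cleanup decrement the refcounts a second time; they will likely be re-adopted on the next pass anyway. Condensed from the hunks above:

    if (ct == NULL)             /* member went stale during detoasting */
    {
        foreach(ctlist_item, ctlist)
        {
            ct = (CatCTup *) lfirst(ctlist_item);
            Assert(ct->c_list == NULL);
            Assert(ct->refcount > 0);
            ct->refcount--;     /* release, but don't try to free */
        }
        ctlist = NIL;           /* start collecting afresh */
        stale = true;
        break;                  /* falls out to the new "} while (stale)" */
    }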
@@ -1794,22 +1845,42 @@ ReleaseCatCacheList(CatList *list)
  * CatalogCacheCreateEntry
  *      Create a new CatCTup entry, copying the given HeapTuple and other
  *      supplied data into it.  The new entry initially has refcount 0.
+ *
+ * To create a normal cache entry, ntp must be the HeapTuple just fetched
+ * from scandesc, and "arguments" is not used.  To create a negative cache
+ * entry, pass NULL for ntp and scandesc; then "arguments" is the cache
+ * keys to use.  In either case, hashValue/hashIndex are the hash values
+ * computed from the cache keys.
+ *
+ * Returns NULL if we attempt to detoast the tuple and observe that it
+ * became stale.  (This cannot happen for a negative entry.)  Caller must
+ * retry the tuple lookup in that case.
  */
 static CatCTup *
-CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
-                        uint32 hashValue, Index hashIndex,
-                        bool negative)
+CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, SysScanDesc scandesc,
+                        Datum *arguments,
+                        uint32 hashValue, Index hashIndex)
 {
     CatCTup    *ct;
     HeapTuple   dtp;
     MemoryContext oldcxt;

-    /* negative entries have no tuple associated */
     if (ntp)
     {
         int         i;

-        Assert(!negative);
+        /*
+         * The visibility recheck below essentially never fails during our
+         * regression tests, and there's no easy way to force it to fail for
+         * testing purposes.  To ensure we have test coverage for the retry
+         * paths in our callers, make debug builds randomly fail about 0.1% of
+         * the times through this code path, even when there's no toasted
+         * fields.
+         */
+#ifdef USE_ASSERT_CHECKING
+        if (pg_prng_uint32(&pg_global_prng_state) <= (PG_UINT32_MAX / 1000))
+            return NULL;
+#endif

         /*
          * If there are any out-of-line toasted fields in the tuple, expand
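On the probability arithmetic in the new USE_ASSERT_CHECKING block: pg_prng_uint32 returns a uniformly distributed uint32, so the result falls at or below PG_UINT32_MAX / 1000 roughly one time in a thousand. The same idiom, isolated into a hypothetical helper (fail_one_in is not part of the patch; it assumes PostgreSQL's common/pg_prng.h):

    #include "common/pg_prng.h"

    /*
     * Return true about once per "n" calls, by checking whether a uniform
     * random uint32 falls in the lowest 1/n-th of its range.  Useful for
     * forcing rarely-taken retry paths to be exercised in assert-enabled
     * test runs.
     */
    static bool
    fail_one_in(uint32 n)
    {
        return pg_prng_uint32(&pg_global_prng_state) <= (PG_UINT32_MAX / n);
    }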
@@ -1819,7 +1890,20 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
          * something using a slightly stale catcache entry.
          */
         if (HeapTupleHasExternal(ntp))
+        {
             dtp = toast_flatten_tuple(ntp, cache->cc_tupdesc);
+
+            /*
+             * The tuple could become stale while we are doing toast table
+             * access (since AcceptInvalidationMessages can run then), so we
+             * must recheck its visibility afterwards.
+             */
+            if (!systable_recheck_tuple(scandesc, ntp))
+            {
+                heap_freetuple(dtp);
+                return NULL;
+            }
+        }
         else
             dtp = ntp;

@@ -1858,7 +1942,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
     }
     else
     {
-        Assert(negative);
+        /* Set up keys for a negative cache entry */
         oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
         ct = (CatCTup *) palloc(sizeof(CatCTup));

@@ -1880,7 +1964,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
     ct->c_list = NULL;
     ct->refcount = 0;           /* for the moment */
     ct->dead = false;
-    ct->negative = negative;
+    ct->negative = (ntp == NULL);
     ct->hash_value = hashValue;

     dlist_push_head(&cache->cc_bucket[hashIndex], &ct->cache_elem);
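For reference, the resulting call-site contract, as both callers in this patch use it (condensed from the diff; see the header comment on CatalogCacheCreateEntry above):

    /* Normal entry: pass the tuple and its scan; a NULL result means the
     * tuple went stale during detoasting and the scan must be restarted. */
    ct = CatalogCacheCreateEntry(cache, ntp, scandesc, NULL,
                                 hashValue, hashIndex);

    /* Negative entry: no tuple or scan; the cache keys are passed
     * explicitly, and the call cannot fail. */
    ct = CatalogCacheCreateEntry(cache, NULL, NULL, arguments,
                                 hashValue, hashIndex);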
