Fix incorrect order of operations during sinval reset processing.

tglsfdc · tglsfdc · commit f4d7f1adbae8 · 2011-08-16T14:38:20.000-04:00
We have to be sure that we have revalidated each nailed-in-cache relcache
entry before we try to use it to load data for some other relcache entry.
The introduction of "mapped relations" in 9.0 broke this, because although
we updated the state kept in relmapper.c early enough, we failed to
propagate that information into relcache entries soon enough; in
particular, we could try to fetch pg_class rows out of pg_class before
we'd updated its relcache entry's rd_node.relNode value from the map.

This bug accounts for Dave Gould's report of failures after "vacuum full
pg_class", and I believe that there is risk for other system catalogs
as well.

The core part of the fix is to copy relmapper data into the relcache
entries during "phase 1" in RelationCacheInvalidate(), before they'll be
used in "phase 2".  To try to future-proof the code against other similar
bugs, I also rearranged the order in which nailed relations are visited
during phase 2: now it's pg_class first, then pg_class_oid_index, then
other nailed relations.  This should ensure that RelationClearRelation can
apply RelationReloadIndexInfo to all nailed indexes without risking use
of not-yet-revalidated relcache entries.

Back-patch to 9.0 where the relation mapper was introduced.
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
@@ -2104,11 +2104,11 @@ RelationCacheInvalidateEntry(Oid relationId)
  *	 so hash_seq_search will complete safely; (b) during the second pass we
  *	 only hold onto pointers to nondeletable entries.
  *
- *	 The two-phase approach also makes it easy to ensure that we process
- *	 nailed-in-cache indexes before other nondeletable items, and that we
- *	 process pg_class_oid_index first of all.  In scenarios where a nailed
- *	 index has been given a new relfilenode, we have to detect that update
- *	 before the nailed index is used in reloading any other relcache entry.
+ *	 The two-phase approach also makes it easy to update relfilenodes for
+ *	 mapped relations before we do anything else, and to ensure that the
+ *	 second pass processes nailed-in-cache items before other nondeletable
+ *	 items.  This should ensure that system catalogs are up to date before
+ *	 we attempt to use them to reload information about other open relations.
  */
 void
 RelationCacheInvalidate(void)
@@ -2120,6 +2120,11 @@ RelationCacheInvalidate(void)
 	List	   *rebuildList = NIL;
 	ListCell   *l;
 
+	/*
+	 * Reload relation mapping data before starting to reconstruct cache.
+	 */
+	RelationMapInvalidateAll();
+
 	/* Phase 1 */
 	hash_seq_init(&status, RelationIdCache);
 
@@ -2130,7 +2135,7 @@ RelationCacheInvalidate(void)
 		/* Must close all smgr references to avoid leaving dangling ptrs */
 		RelationCloseSmgr(relation);
 
-		/* Ignore new relations, since they are never SI targets */
+		/* Ignore new relations, since they are never cross-backend targets */
 		if (relation->rd_createSubid != InvalidSubTransactionId)
 			continue;
 
@@ -2144,22 +2149,33 @@ RelationCacheInvalidate(void)
 		}
 		else
 		{
+			/*
+			 * If it's a mapped relation, immediately update its rd_node in
+			 * case its relfilenode changed.  We must do this during phase 1
+			 * in case the relation is consulted during rebuild of other
+			 * relcache entries in phase 2.  It's safe since consulting the
+			 * map doesn't involve any access to relcache entries.
+			 */
+			if (RelationIsMapped(relation))
+				RelationInitPhysicalAddr(relation);
+
 			/*
 			 * Add this entry to list of stuff to rebuild in second pass.
-			 * pg_class_oid_index goes on the front of rebuildFirstList, other
-			 * nailed indexes on the back, and everything else into
-			 * rebuildList (in no particular order).
+			 * pg_class goes to the front of rebuildFirstList while
+			 * pg_class_oid_index goes to the back of rebuildFirstList, so
+			 * they are done first and second respectively.  Other nailed
+			 * relations go to the front of rebuildList, so they'll be done
+			 * next in no particular order; and everything else goes to the
+			 * back of rebuildList.
 			 */
-			if (relation->rd_isnailed &&
-				relation->rd_rel->relkind == RELKIND_INDEX)
-			{
-				if (RelationGetRelid(relation) == ClassOidIndexId)
-					rebuildFirstList = lcons(relation, rebuildFirstList);
-				else
-					rebuildFirstList = lappend(rebuildFirstList, relation);
-			}
-			else
+			if (RelationGetRelid(relation) == RelationRelationId)
+				rebuildFirstList = lcons(relation, rebuildFirstList);
+			else if (RelationGetRelid(relation) == ClassOidIndexId)
+				rebuildFirstList = lappend(rebuildFirstList, relation);
+			else if (relation->rd_isnailed)
 				rebuildList = lcons(relation, rebuildList);
+			else
+				rebuildList = lappend(rebuildList, relation);
 		}
 	}
 
@@ -2170,11 +2186,6 @@ RelationCacheInvalidate(void)
 	 */
 	smgrcloseall();
 
-	/*
-	 * Reload relation mapping data before starting to reconstruct cache.
-	 */
-	RelationMapInvalidateAll();
-
 	/* Phase 2: rebuild the items found to need rebuild in phase 1 */
 	foreach(l, rebuildFirstList)
 	{