author    Amit Langote  2025-02-20 08:09:48 +0000
committer Amit Langote  2025-02-20 08:09:48 +0000
commit    525392d5727f469e9a5882e1d728917a4be56147 (patch)
tree      d11c0f04fefc3c6d4bf91eb969a3d091d0d67979 /src/backend/utils/cache
parent    4aa6fa3cd0a2422f1bb47db837b82f2b53d4c065 (diff)
Don't lock partitions pruned by initial pruning
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.

This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.

* Changes to locking when executing generic plans:

AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily. The remaining locks are taken by ExecDoInitialPruning(),
which acquires them only for partitions that survive pruning.

This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.

* Plan invalidation handling:

Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan.
To ensure that all affected code paths handle invalidation properly,
all callers of ExecutorStart that may execute a PlannedStmt from a
CachedPlan have been updated to use ExecutorStartCachedPlan() instead.

UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.

ExecutorStart() and ExecutorStart_hook implementations now return a
boolean indicating whether plan initialization succeeded: true with a
valid PlanState tree in QueryDesc.planstate, or false, in which case
QueryDesc.planstate is NULL. Hook implementations are required to call
standard_ExecutorStart() at the beginning, and if it returns false,
they should likewise return false without proceeding.

* Testing:

To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to changes in prunable relations
after locking them has been deferred.

* Note to extension authors:

ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;

Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.

The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
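The hook protocol required by this commit can be sketched as a
self-contained simulation. QueryDesc, the hook type, and
standard_ExecutorStart() below are simplified stand-ins for the
PostgreSQL originals, and plan invalidation is reduced to a flag; only
the control flow mirrors the commit's requirements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Simplified stand-ins for PostgreSQL's types; not real server code. */
typedef struct QueryDesc
{
    void       *planstate;      /* stands in for QueryDesc.planstate */
} QueryDesc;

typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

/* Simulated: set to true to mimic invalidation by concurrent DDL. */
static bool plan_invalidated = false;

static bool
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    (void) eflags;
    if (plan_invalidated)
    {
        /* Initialization failed: no valid PlanState tree. */
        queryDesc->planstate = NULL;
        return false;
    }
    queryDesc->planstate = queryDesc;   /* any non-NULL value will do here */
    return true;
}

/*
 * A hook following the new protocol: call the previous hook (or
 * standard_ExecutorStart) first, and return false immediately if it
 * does, skipping the extension's own startup work.
 */
static bool
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
    bool        plan_valid;

    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    /* extension-specific startup work would go here */
    return true;
}
```

The key design point is that the boolean propagates up the hook chain
unchanged, so ExecutorStartCachedPlan() sees the failure no matter how
many hooks are stacked.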
Diffstat (limited to 'src/backend/utils/cache')
-rw-r--r--  src/backend/utils/cache/plancache.c  197
1 file changed, 172 insertions, 25 deletions
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 55db8f53705..6c2979d5c82 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,7 +101,8 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ bool release_generic);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -578,10 +579,17 @@ ReleaseGenericPlan(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
+ *
+ * This also releases and drops the generic plan (plansource->gplan), if any,
+ * as most callers will typically build a new CachedPlan for the plansource
+ * right after this. However, when called from UpdateCachedPlan(), the
+ * function does not release the generic plan, as UpdateCachedPlan() updates
+ * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv)
+ QueryEnvironment *queryEnv,
+ bool release_generic)
{
bool snapshot_set;
RawStmt *rawtree;
@@ -678,8 +686,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference if any */
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference, if any, and if requested */
+ if (release_generic)
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -815,8 +824,10 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, we have acquired locks on the "unprunableRelids" set
+ * for all plans in plansource->stmt_list. However, the plans are not fully
+ * race-condition-free until the executor acquires locks on the prunable
+ * relations that survive initial runtime pruning during InitPlan().
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -901,6 +912,8 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -911,6 +924,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
+ MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -928,7 +942,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/*
* If we don't already have a copy of the querytree list that can be
@@ -967,10 +981,19 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally we make a dedicated memory context for the CachedPlan and its
- * subsidiary data. (It's probably not going to be large, but just in
- * case, allow it to grow large. It's transient for the moment.) But for
- * a one-shot plan, we just leave it in the caller's memory context.
+ * Normally, we create a dedicated memory context for the CachedPlan and
+ * its subsidiary data. Although it's usually not very large, the context
+ * is designed to allow growth if necessary.
+ *
+ * The PlannedStmts are stored in a separate child context (stmt_context)
+ * of the CachedPlan's memory context. This separation allows
+ * UpdateCachedPlan() to free and replace the PlannedStmts without
+ * affecting the CachedPlan structure or its stmt_list List.
+ *
+ * For one-shot plans, we instead use the caller's memory context, as the
+ * CachedPlan will not persist. stmt_context will be set to NULL in this
+ * case, because UpdateCachedPlan() should never get called on a one-shot
+ * plan.
*/
if (!plansource->is_oneshot)
{
@@ -979,12 +1002,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- /*
- * Copy plan into the new context.
- */
- MemoryContextSwitchTo(plan_context);
+ stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan PlannedStmts",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
+ MemoryContextSetParent(stmt_context, plan_context);
+ MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
+
+ MemoryContextSwitchTo(plan_context);
+ plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1025,8 +1053,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
+ plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
+ plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1038,6 +1068,113 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
}
/*
+ * UpdateCachedPlan
+ * Create fresh plans for all queries in the CachedPlanSource, replacing
+ * those in the generic plan's stmt_list, and return the plan for the
+ * query_index'th query.
+ *
+ * This function is primarily used by ExecutorStartCachedPlan() to handle
+ * cases where the original generic CachedPlan becomes invalid. Such
+ * invalidation may occur when prunable relations in the old plan for the
+ * query_index'th query are locked in preparation for execution.
+ *
+ * Note that invalidations received during the execution of the query_index'th
+ * query can affect both the queries that have already finished execution
+ * (e.g., due to concurrent modifications on prunable relations that were not
+ * locked during their execution) and also the queries that have not yet been
+ * executed. As a result, this function updates all plans to ensure
+ * CachedPlan.is_valid is safely set to true.
+ *
+ * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
+ * the caller and any of its callers must not rely on them remaining accessible
+ * after this function is called.
+ */
+PlannedStmt *
+UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
+ QueryEnvironment *queryEnv)
+{
+ List *query_list = plansource->query_list,
+ *plan_list;
+ ListCell *l1,
+ *l2;
+ CachedPlan *plan = plansource->gplan;
+ MemoryContext oldcxt;
+
+ Assert(ActiveSnapshotSet());
+
+ /* Sanity checks (XXX can be Asserts?) */
+ if (plan == NULL)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
+ else if (plan->is_valid)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
+ else if (plan->is_oneshot)
+ elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
+
+ /*
+ * The plansource might have become invalid since GetCachedPlan() returned
+ * the CachedPlan. See the comment in BuildCachedPlan() for details on why
+ * this might happen. Although invalidation is likely a false positive as
+ * stated there, we make the plan valid to ensure the query list used for
+ * planning is up to date.
+ *
+ * The risk of catching an invalidation is higher here than when
+ * BuildCachedPlan() is called from GetCachedPlan(), because this function
+ * is normally called long after GetCachedPlan() returns the CachedPlan,
+ * so much more processing could have occurred including things that mark
+ * the CachedPlanSource invalid.
+ *
+ * Note: Do not release plansource->gplan, because the upstream callers
+ * (such as the callers of ExecutorStartCachedPlan()) would still be
+ * referencing it.
+ */
+ if (!plansource->is_valid)
+ query_list = RevalidateCachedQuery(plansource, queryEnv, false);
+ Assert(query_list != NIL);
+
+ /*
+ * Build a new generic plan for all the queries after making a copy to be
+ * scribbled on by the planner.
+ */
+ query_list = copyObject(query_list);
+
+ /*
+ * Planning work is done in the caller's memory context. The resulting
+ * PlannedStmt is then copied into plan->stmt_context after throwing away
+ * the old ones.
+ */
+ plan_list = pg_plan_queries(query_list, plansource->query_string,
+ plansource->cursor_options, NULL);
+ Assert(list_length(plan_list) == list_length(plan->stmt_list));
+
+ MemoryContextReset(plan->stmt_context);
+ oldcxt = MemoryContextSwitchTo(plan->stmt_context);
+ forboth(l1, plan_list, l2, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst(l1);
+
+ lfirst(l2) = copyObject(plannedstmt);
+ }
+ MemoryContextSwitchTo(oldcxt);
+
+ /*
+ * XXX Should this also (re)set the properties of the CachedPlan that are
+ * set in BuildCachedPlan() after creating the fresh plans such as
+ * planRoleId, dependsOnRole, and save_xmin?
+ */
+
+ /*
+ * We've updated all the plans that might have been invalidated, so mark
+ * the CachedPlan as valid.
+ */
+ plan->is_valid = true;
+
+ /* Also update generic_cost because we just created a new generic plan. */
+ plansource->generic_cost = cached_plan_cost(plan, false);
+
+ return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
+}
+
+/*
* choose_custom_plan: choose whether to use custom or generic plan
*
* This defines the policy followed by GetCachedPlan.
@@ -1153,8 +1290,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid, but if it is a reused generic plan, not all
+ * locks are acquired. In such cases, CheckCachedPlan() does not take locks
+ * on relations subject to initial runtime pruning; instead, these locks are
+ * deferred until execution startup, when ExecDoInitialPruning() performs
+ * initial pruning. The plan's "is_reused" flag is set to indicate that
+ * CachedPlanRequiresLocking() should return true when called by
+ * ExecDoInitialPruning().
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1180,7 +1322,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv);
+ qlist = RevalidateCachedQuery(plansource, queryEnv, true);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1192,6 +1334,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
+ /* Reusing the existing plan, so not all locks may be acquired. */
+ plan->is_reused = true;
}
else
{
@@ -1654,7 +1798,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv);
+ RevalidateCachedQuery(plansource, queryEnv, true);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -1776,7 +1920,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ int rtindex;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1794,13 +1938,16 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ rtindex = -1;
+ while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
+ rtindex)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry,
+ plannedstmt->rtable,
+ rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- continue;
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
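The rewritten loop in AcquireExecutorLocks() above walks only the
1-based RT indexes recorded in unprunableRelids and converts each to a
0-based position in the rtable list (hence the `rtindex - 1`). A
minimal self-contained sketch of that pattern follows; a uint64
bitmask stands in for PostgreSQL's Bitmapset, and next_member() only
mimics the calling shape of bms_next_member() (the real function
differs, e.g. in its sentinel return values).

```c
#include <stdint.h>
#include <assert.h>

/*
 * Return the next set bit strictly after prevbit (pass -1 to start),
 * or -1 when there are no more members. A toy stand-in for
 * bms_next_member() over a 64-bit universe.
 */
static int
next_member(uint64_t set, int prevbit)
{
    for (int bit = prevbit + 1; bit < 64; bit++)
        if (set & (UINT64_C(1) << bit))
            return bit;
    return -1;
}

/*
 * Visit each unprunable RT index, recording its 0-based offset into
 * the range table (RT indexes are 1-based). Returns the member count.
 */
static int
walk_unprunable(uint64_t unprunable_relids, int rtable_len, int *offsets)
{
    int         n = 0;
    int         rtindex = -1;

    while ((rtindex = next_member(unprunable_relids, rtindex)) >= 0)
    {
        int         offset = rtindex - 1;   /* 1-based -> 0-based */

        assert(offset >= 0 && offset < rtable_len);
        offsets[n++] = offset;
    }
    return n;
}
```

With members {1, 3, 6} and a six-entry range table, this visits list
offsets 0, 2, and 5, matching how the new code fetches entries with
list_nth_node(RangeTblEntry, plannedstmt->rtable, rtindex - 1).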