Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 7170268

Browse files
committed
Improve ANALYZE's handling of concurrent-update scenarios.
This patch changes the rule for whether or not a tuple seen by ANALYZE should be included in its sample. When we last touched this logic, in commit 51e1445, we weren't thinking very hard about tuples being UPDATEd by a long-running concurrent transaction. In such a case, we might see the pre-image as either LIVE or DELETE_IN_PROGRESS depending on timing; and we might see the post-image not at all, or as INSERT_IN_PROGRESS. Since the existing code will not sample either DELETE_IN_PROGRESS or INSERT_IN_PROGRESS tuples, this leads to concurrently-updated rows being omitted from the sample entirely. That's not very helpful, and it's especially the wrong thing if the concurrent transaction ends up rolling back. The right thing seems to be to sample DELETE_IN_PROGRESS rows just as if they were live. This makes the "sample it" and "count it" decisions the same, which seems good for consistency. It's clearly the right thing if the concurrent transaction ends up rolling back; in effect, we are sampling as though IN_PROGRESS transactions haven't happened yet. Also, this combination of choices ensures maximum robustness against the different combinations of whether and in which state we might see the pre- and post-images of an update. It's slightly annoying that we end up recording immediately-out-of-date stats in the case where the transaction does commit, but on the other hand the stats are fine for columns that didn't change in the update. And the alternative of sampling INSERT_IN_PROGRESS rows instead seems like a bad idea, because then the sampling would be inconsistent with the way rows are counted for the stats report. Per report from Mark Chambers; thanks to Jeff Janes for diagnosing what was happening. Back-patch to all supported versions. Discussion: https://postgr.es/m/CAFh58O_Myr6G3tcH3gcGrF-=OExB08PJdWZcSBcEcovaiPsrHA@mail.gmail.com
1 parent 68a13f2 commit 7170268

File tree

1 file changed

+18
-3
lines changed

1 file changed

+18
-3
lines changed

src/backend/commands/analyze.c

Lines changed: 18 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1124,20 +1124,35 @@ acquire_sample_rows(Relation onerel, int elevel,
11241124
case HEAPTUPLE_DELETE_IN_PROGRESS:
11251125

11261126
/*
1127-
* We count delete-in-progress rows as still live, using
1128-
* the same reasoning given above; but we don't bother to
1129-
* include them in the sample.
1127+
* We count and sample delete-in-progress rows the same as
1128+
* live ones, so that the stats counters come out right if
1129+
* the deleting transaction commits after us, per the same
1130+
* reasoning given above.
11301131
*
11311132
* If the delete was done by our own transaction, however,
11321133
* we must count the row as dead to make
11331134
* pgstat_report_analyze's stats adjustments come out
11341135
* right. (Note: this works out properly when the row was
11351136
* both inserted and deleted in our xact.)
1137+
*
1138+
* The net effect of these choices is that we act as
1139+
* though an IN_PROGRESS transaction hasn't happened yet,
1140+
* except if it is our own transaction, which we assume
1141+
* has happened.
1142+
*
1143+
* This approach ensures that we behave sanely if we see
1144+
* both the pre-image and post-image rows for a row being
1145+
* updated by a concurrent transaction: we will sample the
1146+
* pre-image but not the post-image. We also get sane
1147+
* results if the concurrent transaction never commits.
11361148
*/
11371149
if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
11381150
deadrows += 1;
11391151
else
1152+
{
1153+
sample_it = true;
11401154
liverows += 1;
1155+
}
11411156
break;
11421157

11431158
default:

0 commit comments

Comments
 (0)