@@ -214,6 +214,34 @@ page). Since we hold a lock on the lower page (per L&Y) until we have
re-found the parent item that links to it, we can be assured that the
parent item does still exist and can't have been deleted.

+ VACUUM's linear scan, concurrent page splits
+ --------------------------------------------
+
+ VACUUM accesses the index by doing a linear scan to search for deletable
+ TIDs, while considering the possibility of deleting empty pages in
+ passing. This is in physical/block order, not logical/keyspace order.
+ The tricky part of this is avoiding missing any deletable tuples in the
+ presence of concurrent page splits: a page split could easily move some
+ tuples from a page not yet passed over by the sequential scan to a
+ lower-numbered page already passed over.
+
+ To implement this, we provide a "vacuum cycle ID" mechanism that makes it
+ possible to determine whether a page has been split since the current
+ btbulkdelete cycle started. If btbulkdelete finds a page that has been
+ split since it started, and has a right-link pointing to a lower page
+ number, then it temporarily suspends its sequential scan and visits that
+ page instead. It must continue to follow right-links and vacuum dead
+ tuples until reaching a page that either hasn't been split since
+ btbulkdelete started, or is above the location of the outer sequential
+ scan. Then it can resume the sequential scan. This ensures that all
+ tuples are visited. It may be that some tuples are visited twice, but
+ that has no worse effect than an inaccurate index tuple count (and we
+ can't guarantee an accurate count anyway in the face of concurrent
+ activity). Note that this still works if the has-been-recently-split test
+ has a small probability of false positives, so long as it never gives a
+ false negative. This makes it possible to implement the test with a small
+ counter value stored on each index page.
+
Deleting entire pages during VACUUM
-----------------------------------

@@ -371,33 +399,6 @@ as part of the atomic update for the delete (either way, the metapage has
to be the last page locked in the update to avoid deadlock risks). This
avoids race conditions if two such operations are executing concurrently.

- VACUUM needs to do a linear scan of an index to search for deleted pages
- that can be reclaimed because they are older than all open transactions.
- For efficiency's sake, we'd like to use the same linear scan to search for
- deletable tuples. Before Postgres 8.2, btbulkdelete scanned the leaf pages
- in index order, but it is possible to visit them in physical order instead.
- The tricky part of this is to avoid missing any deletable tuples in the
- presence of concurrent page splits: a page split could easily move some
- tuples from a page not yet passed over by the sequential scan to a
- lower-numbered page already passed over. (This wasn't a concern for the
- index-order scan, because splits always split right.) To implement this,
- we provide a "vacuum cycle ID" mechanism that makes it possible to
- determine whether a page has been split since the current btbulkdelete
- cycle started. If btbulkdelete finds a page that has been split since
- it started, and has a right-link pointing to a lower page number, then
- it temporarily suspends its sequential scan and visits that page instead.
- It must continue to follow right-links and vacuum dead tuples until
- reaching a page that either hasn't been split since btbulkdelete started,
- or is above the location of the outer sequential scan. Then it can resume
- the sequential scan. This ensures that all tuples are visited. It may be
- that some tuples are visited twice, but that has no worse effect than an
- inaccurate index tuple count (and we can't guarantee an accurate count
- anyway in the face of concurrent activity). Note that this still works
- if the has-been-recently-split test has a small probability of false
- positives, so long as it never gives a false negative. This makes it
- possible to implement the test with a small counter value stored on each
- index page.
-
Fastpath For Index Insertion
----------------------------

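The backtracking rule described in the new "VACUUM's linear scan, concurrent page splits" section can be illustrated with a small, self-contained C model. This is a sketch under simplifying assumptions, not PostgreSQL's nbtree code: the Page and Index structs and the functions vacuum_page(), split_page(), vacuum_block() and run_demo() are hypothetical names; pages live in an in-memory array with block 0 standing in for the metapage; a "concurrent" split is simulated by calling split_page() between two steps of the physical-order scan; and the split is assumed to stamp both halves with the current vacuum cycle ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model of nbtree leaf pages (not PostgreSQL's structs). */
#define MAX_PAGES  6
#define MAX_TUPLES 8
#define P_NONE     0            /* block 0 stands in for the metapage */

typedef uint16_t CycleId;       /* assumed: a small per-page counter */

typedef struct {
    bool     in_use;            /* false: free (recyclable) page */
    int      ntuples;
    int      keys[MAX_TUPLES];
    bool     dead[MAX_TUPLES];  /* true: deletable TID */
    uint32_t rightlink;         /* P_NONE if rightmost */
    CycleId  cycleid;           /* stamped by splits during a VACUUM cycle */
} Page;

typedef struct {
    Page     pages[MAX_PAGES];
    CycleId  vacuum_cycleid;    /* 0 means no btbulkdelete in progress */
} Index;

/* Remove a page's dead tuples in place; returns how many were removed. */
static int vacuum_page(Page *p)
{
    int kept = 0, removed = 0;
    for (int i = 0; i < p->ntuples; i++) {
        if (p->dead[i]) {
            removed++;
        } else {
            p->keys[kept] = p->keys[i];
            p->dead[kept] = false;
            kept++;
        }
    }
    p->ntuples = kept;
    return removed;
}

/* Simulated concurrent split: move the upper half of blk's tuples to the
 * free page newblk (which may have a LOWER block number), link it in to the
 * right of blk, and stamp both halves with the current vacuum cycle ID. */
static void split_page(Index *idx, uint32_t blk, uint32_t newblk)
{
    Page *left = &idx->pages[blk];
    Page *right = &idx->pages[newblk];
    assert(left->in_use && !right->in_use);
    int half = left->ntuples / 2;
    right->in_use = true;
    right->ntuples = left->ntuples - half;
    memcpy(right->keys, left->keys + half, sizeof(int) * (size_t)right->ntuples);
    memcpy(right->dead, left->dead + half, sizeof(bool) * (size_t)right->ntuples);
    left->ntuples = half;
    right->rightlink = left->rightlink;
    left->rightlink = newblk;
    left->cycleid = right->cycleid = idx->vacuum_cycleid;
}

/* Vacuum block scanblk of the physical-order scan. While the current page
 * was split in this cycle and its right-link points below scanblk, suspend
 * the sequential scan and follow the right-link. */
static int vacuum_block(Index *idx, uint32_t scanblk)
{
    int removed = 0;
    uint32_t blk = scanblk;
    for (;;) {
        Page *p = &idx->pages[blk];
        if (!p->in_use)
            break;                      /* free page: nothing to vacuum */
        removed += vacuum_page(p);
        bool split_this_cycle = idx->vacuum_cycleid != 0 &&
                                p->cycleid == idx->vacuum_cycleid;
        if (split_this_cycle && p->rightlink != P_NONE && p->rightlink < scanblk)
            blk = p->rightlink;         /* backtrack to an already-passed page */
        else
            break;                      /* resume the sequential scan */
    }
    return removed;
}

/* Chain 1 -> 3 -> 5 with blocks 2 and 4 free. If concurrent_split is true,
 * block 3 splits into block 2 after the scan has already passed block 2.
 * Returns the number of dead tuples removed. */
int run_demo(bool concurrent_split)
{
    Index idx;
    memset(&idx, 0, sizeof idx);
    idx.vacuum_cycleid = 7;             /* arbitrary nonzero cycle ID */
    idx.pages[1] = (Page){ true, 2, {10, 20}, {false, true}, 3, 0 };
    idx.pages[3] = (Page){ true, 4, {30, 40, 50, 60}, {true, false, true, true}, 5, 0 };
    idx.pages[5] = (Page){ true, 2, {70, 80}, {true, false}, P_NONE, 0 };

    int removed = 0;
    for (uint32_t blk = 1; blk < MAX_PAGES; blk++) {
        if (concurrent_split && blk == 3)
            split_page(&idx, 3, 2);     /* dead tuples 50 and 60 move to block 2 */
        removed += vacuum_block(&idx, blk);
    }
    return removed;
}
```

Calling run_demo(true) from a main function returns 5, the same as run_demo(false): the split moves tuples 50 and 60 from block 3 into block 2 after the scan has already passed block 2, and block 3's right-link (now pointing down to block 2, with a matching cycle ID) sends the scan back to collect them. With the backtracking branch removed, run_demo(true) would return 3. A false positive in the cycle-ID comparison would only cause an extra visit to an already-vacuumed page, whereas a false negative would skip the backtrack and miss tuples, which is why the test only needs to avoid false negatives.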