
Commit 18ae680

Fix and clarify comments on replacement selection.
These were modified by the patch to only use replacement selection for the first run in an external sort.
1 parent 5e1431f commit 18ae680

1 file changed, 16 additions and 16 deletions

1 file changed

+16
-16
lines changed

src/backend/utils/sort/tuplesort.c

@@ -13,26 +13,26 @@
  * See Knuth, volume 3, for more than you want to know about the external
  * sorting algorithm. Historically, we divided the input into sorted runs
  * using replacement selection, in the form of a priority tree implemented
- * as a heap (essentially his Algorithm 5.2.3H -- although that strategy is
- * often avoided altogether), but that can now only happen first the first
- * run. We merge the runs using polyphase merge, Knuth's Algorithm
+ * as a heap (essentially his Algorithm 5.2.3H), but now we only do that
+ * for the first run, and only if the run would otherwise end up being very
+ * short. We merge the runs using polyphase merge, Knuth's Algorithm
  * 5.4.2D. The logical "tapes" used by Algorithm D are implemented by
  * logtape.c, which avoids space wastage by recycling disk space as soon
  * as each block is read from its "tape".
  *
- * We never form the initial runs using Knuth's recommended replacement
- * selection data structure (Algorithm 5.4.1R), because it uses a fixed
- * number of records in memory at all times. Since we are dealing with
- * tuples that may vary considerably in size, we want to be able to vary
- * the number of records kept in memory to ensure full utilization of the
- * allowed sort memory space. So, we keep the tuples in a variable-size
- * heap, with the next record to go out at the top of the heap. Like
- * Algorithm 5.4.1R, each record is stored with the run number that it
- * must go into, and we use (run number, key) as the ordering key for the
- * heap. When the run number at the top of the heap changes, we know that
- * no more records of the prior run are left in the heap. Note that there
- * are in practice only ever two distinct run numbers, due to the greatly
- * reduced use of replacement selection in PostgreSQL 9.6.
+ * We do not use Knuth's recommended data structure (Algorithm 5.4.1R) for
+ * the replacement selection, because it uses a fixed number of records
+ * in memory at all times. Since we are dealing with tuples that may vary
+ * considerably in size, we want to be able to vary the number of records
+ * kept in memory to ensure full utilization of the allowed sort memory
+ * space. So, we keep the tuples in a variable-size heap, with the next
+ * record to go out at the top of the heap. Like Algorithm 5.4.1R, each
+ * record is stored with the run number that it must go into, and we use
+ * (run number, key) as the ordering key for the heap. When the run number
+ * at the top of the heap changes, we know that no more records of the prior
+ * run are left in the heap. Note that there are in practice only ever two
+ * distinct run numbers, because since PostgreSQL 9.6, we only use
+ * replacement selection to form the first run.
  *
  * In PostgreSQL 9.6, a heap (based on Knuth's Algorithm H, with some small
  * customizations) is only used with the aim of producing just one run,
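
The heap ordering that the revised comment describes can be illustrated with a small standalone sketch. This is not the tuplesort.c code. It is the textbook form of replacement selection over plain int keys, with a fixed-capacity heap rather than one sized by memory use, and it keeps forming runs this way instead of stopping after the first one as PostgreSQL 9.6 does. The names HeapItem, item_less and replacement_selection are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap entry: the run the record must go into, plus its key. */
typedef struct { int run; int key; } HeapItem;

/* Heap order is (run number, key): earlier runs first, then smaller keys. */
static int item_less(HeapItem a, HeapItem b)
{
    if (a.run != b.run)
        return a.run < b.run;
    return a.key < b.key;
}

static void swap_items(HeapItem *a, HeapItem *b)
{
    HeapItem tmp = *a; *a = *b; *b = tmp;
}

static void sift_up(HeapItem *heap, size_t i)
{
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (!item_less(heap[i], heap[parent]))
            break;
        swap_items(&heap[i], &heap[parent]);
        i = parent;
    }
}

static void sift_down(HeapItem *heap, size_t n, size_t i)
{
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, min = i;
        if (left < n && item_less(heap[left], heap[min]))
            min = left;
        if (right < n && item_less(heap[right], heap[min]))
            min = right;
        if (min == i)
            break;
        swap_items(&heap[i], &heap[min]);
        i = min;
    }
}

/*
 * Read input[0..n), emit keys to out[] and their run numbers to out_run[]
 * (both must hold n entries). heap[] must hold cap entries.
 * Returns the number of runs produced.
 */
static int replacement_selection(const int *input, size_t n,
                                 HeapItem *heap, size_t cap,
                                 int *out, int *out_run)
{
    size_t heap_n = 0, next = 0, written = 0;
    int last_run = 0;

    assert(cap > 0);
    if (n == 0)
        return 0;

    /* Fill the heap; everything read at this point belongs to run 0. */
    while (next < n && heap_n < cap) {
        heap[heap_n].run = 0;
        heap[heap_n].key = input[next++];
        sift_up(heap, heap_n);
        heap_n++;
    }

    while (heap_n > 0) {
        HeapItem top = heap[0];

        /* If top.run differs from last_run, the prior run is exhausted. */
        last_run = top.run;
        out[written] = top.key;
        out_run[written] = top.run;
        written++;

        if (next < n) {
            /* A key smaller than the one just emitted must wait for the next run. */
            int key = input[next++];
            heap[0].key = key;
            heap[0].run = (key >= top.key) ? top.run : top.run + 1;
        } else {
            heap[0] = heap[--heap_n];
        }
        sift_down(heap, heap_n, 0);
    }
    return last_run + 1;
}
```

With the input 5, 3, 8, 1, 9, 2, 7 and a heap of three entries, this sketch emits 3, 5, 8, 9 as run 0 and 1, 2, 7 as run 1. The first run is longer than the heap, which is the property that made replacement selection attractive historically.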
