Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit df700e6

Browse files
committed
Improve tuplesort.c to support variable merge order. The original coding
with fixed merge order (fixed number of "tapes") was based on obsolete assumptions, namely that tape drives are expensive. Since our "tapes" are really just a couple of buffers, we can have a lot of them given adequate workspace. This allows reduction of the number of merge passes with consequent savings of I/O during large sorts. Simon Riggs with some rework by Tom Lane
1 parent 85c0eac commit df700e6

File tree

3 files changed

+169
-63
lines changed

3 files changed

+169
-63
lines changed

src/backend/optimizer/path/costsize.c

Lines changed: 12 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -49,7 +49,7 @@
4949
* Portions Copyright (c) 1994, Regents of the University of California
5050
*
5151
* IDENTIFICATION
52-
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.153 2006/02/05 02:59:16 tgl Exp $
52+
* $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.154 2006/02/19 05:54:06 tgl Exp $
5353
*
5454
*-------------------------------------------------------------------------
5555
*/
@@ -70,10 +70,10 @@
7070
#include "utils/selfuncs.h"
7171
#include "utils/lsyscache.h"
7272
#include "utils/syscache.h"
73+
#include "utils/tuplesort.h"
7374

7475

7576
#define LOG2(x) (log(x) / 0.693147180559945)
76-
#define LOG6(x) (log(x) / 1.79175946922805)
7777

7878
/*
7979
* Some Paths return less than the nominal number of rows of their parent
@@ -767,11 +767,10 @@ cost_functionscan(Path *path, PlannerInfo *root, RelOptInfo *baserel)
767767
* If the total volume exceeds work_mem, we switch to a tape-style merge
768768
* algorithm. There will still be about t*log2(t) tuple comparisons in
769769
* total, but we will also need to write and read each tuple once per
770-
* merge pass. We expect about ceil(log6(r)) merge passes where r is the
771-
* number of initial runs formed (log6 because tuplesort.c uses six-tape
772-
* merging). Since the average initial run should be about twice work_mem,
773-
* we have
774-
* disk traffic = 2 * relsize * ceil(log6(p / (2*work_mem)))
770+
* merge pass. We expect about ceil(logM(r)) merge passes where r is the
771+
* number of initial runs formed and M is the merge order used by tuplesort.c.
772+
* Since the average initial run should be about twice work_mem, we have
773+
* disk traffic = 2 * relsize * ceil(logM(p / (2*work_mem)))
775774
* cpu = comparison_cost * t * log2(t)
776775
*
777776
* The disk traffic is assumed to be half sequential and half random
@@ -824,10 +823,14 @@ cost_sort(Path *path, PlannerInfo *root,
824823
{
825824
double npages = ceil(nbytes / BLCKSZ);
826825
double nruns = (nbytes / work_mem_bytes) * 0.5;
827-
double log_runs = ceil(LOG6(nruns));
826+
double mergeorder = tuplesort_merge_order(work_mem_bytes);
827+
double log_runs;
828828
double npageaccesses;
829829

830-
if (log_runs < 1.0)
830+
/* Compute logM(r) as log(r) / log(M) */
831+
if (nruns > mergeorder)
832+
log_runs = ceil(log(nruns) / log(mergeorder));
833+
else
831834
log_runs = 1.0;
832835
npageaccesses = 2.0 * npages * log_runs;
833836
/* Assume half are sequential (cost 1), half are not */

0 commit comments

Comments
 (0)