Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 43ceb3d

Browse files
committed
Further examination of ltsReleaseBlock usage shows that it's got a
performance issue during regular merge passes not only the 'final merge' case. The original design contemplated that there would never be more than about one free block per 'tape', hence no need for an efficient method of keeping the free blocks sorted. But given the later addition of merge preread behavior in tuplesort.c, there is likely to be about work_mem worth of free blocks, which is not so small ... and for that matter the number of tapes isn't necessarily small anymore either. So we'd better get rid of the assumption entirely. Instead, I'm assuming that the usage pattern will involve alternation between merge preread and writing of a new run. This makes it reasonable to just add blocks to the list without sorting during successive ltsReleaseBlock calls, and then do a qsort() when we start getting ltsGetFreeBlock() calls. Experimentation seems to confirm that there aren't many qsort calls relative to the number of ltsReleaseBlock/ltsGetFreeBlock calls.
1 parent 8db05ba commit 43ceb3d

File tree

1 file changed

+49
-20
lines changed

1 file changed

+49
-20
lines changed

src/backend/utils/sort/logtape.c

+49-20
Original file line numberDiff line numberDiff line change
@@ -30,8 +30,7 @@
3030
* For simplicity, we allocate and release space in the underlying file
3131
* in BLCKSZ-size blocks. Space allocation boils down to keeping track
3232
* of which blocks in the underlying file belong to which logical tape,
33-
* plus any blocks that are free (recycled and not yet reused). Normally
34-
* there are not very many free blocks, so we just keep those in a list.
33+
* plus any blocks that are free (recycled and not yet reused).
3534
* The blocks in each logical tape are remembered using a method borrowed
3635
* from the Unix HFS filesystem: we store data block numbers in an
3736
* "indirect block". If an indirect block fills up, we write it out to
@@ -53,6 +52,13 @@
5352
* not clear this helps much, but it can't hurt. (XXX perhaps a LIFO
5453
* policy for free blocks would be better?)
5554
*
55+
* To support the above policy of writing to the lowest free block,
56+
* ltsGetFreeBlock sorts the list of free block numbers into decreasing
57+
* order each time it is asked for a block and the list isn't currently
58+
* sorted. This is an efficient way to handle it because we expect cycles
59+
* of releasing many blocks followed by re-using many blocks, due to
60+
* tuplesort.c's "preread" behavior.
61+
*
5662
* Since all the bookkeeping and buffer memory is allocated with palloc(),
5763
* and the underlying file(s) are made with OpenTemporaryFile, all resources
5864
* for a logical tape set are certain to be cleaned up even if processing
@@ -64,7 +70,7 @@
6470
* Portions Copyright (c) 1994, Regents of the University of California
6571
*
6672
* IDENTIFICATION
67-
* $PostgreSQL: pgsql/src/backend/utils/sort/logtape.c,v 1.20 2006/03/07 19:06:49 tgl Exp $
73+
* $PostgreSQL: pgsql/src/backend/utils/sort/logtape.c,v 1.21 2006/03/07 23:46:24 tgl Exp $
6874
*
6975
*-------------------------------------------------------------------------
7076
*/
@@ -143,15 +149,19 @@ struct LogicalTapeSet
143149

144150
/*
145151
* We store the numbers of recycled-and-available blocks in freeBlocks[].
146-
* When there are no such blocks, we extend the underlying file. Note
147-
* that the block numbers in freeBlocks are always in *decreasing* order,
148-
* so that removing the last entry gives us the lowest free block.
152+
* When there are no such blocks, we extend the underlying file.
149153
*
150154
* If forgetFreeSpace is true then any freed blocks are simply forgotten
151155
* rather than being remembered in freeBlocks[]. See notes for
152156
* LogicalTapeSetForgetFreeSpace().
157+
*
158+
* If blocksSorted is true then the block numbers in freeBlocks are in
159+
* *decreasing* order, so that removing the last entry gives us the lowest
160+
* free block. We re-sort the blocks whenever a block is demanded; this
161+
* should be reasonably efficient given the expected usage pattern.
153162
*/
154163
bool forgetFreeSpace; /* are we remembering free blocks? */
164+
bool blocksSorted; /* is freeBlocks[] currently in order? */
155165
long *freeBlocks; /* resizable array */
156166
int nFreeBlocks; /* # of currently free blocks */
157167
int freeBlocksLen; /* current allocated length of freeBlocks[] */
@@ -223,6 +233,23 @@ ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer)
223233
blocknum)));
224234
}
225235

236+
/*
237+
* qsort comparator for sorting freeBlocks[] into decreasing order.
238+
*/
239+
static int
240+
freeBlocks_cmp(const void *a, const void *b)
241+
{
242+
long ablk = *((const long *) a);
243+
long bblk = *((const long *) b);
244+
245+
/* can't just subtract because long might be wider than int */
246+
if (ablk < bblk)
247+
return 1;
248+
if (ablk > bblk)
249+
return -1;
250+
return 0;
251+
}
252+
226253
/*
227254
* Select a currently unused block for writing to.
228255
*
@@ -234,11 +261,19 @@ ltsGetFreeBlock(LogicalTapeSet *lts)
234261
{
235262
/*
236263
* If there are multiple free blocks, we select the one appearing last in
237-
* freeBlocks[]. If there are none, assign the next block at the end of
238-
* the file.
264+
* freeBlocks[] (after sorting the array if needed). If there are none,
265+
* assign the next block at the end of the file.
239266
*/
240267
if (lts->nFreeBlocks > 0)
268+
{
269+
if (!lts->blocksSorted)
270+
{
271+
qsort((void *) lts->freeBlocks, lts->nFreeBlocks,
272+
sizeof(long), freeBlocks_cmp);
273+
lts->blocksSorted = true;
274+
}
241275
return lts->freeBlocks[--lts->nFreeBlocks];
276+
}
242277
else
243278
return lts->nFileBlocks++;
244279
}
@@ -250,7 +285,6 @@ static void
250285
ltsReleaseBlock(LogicalTapeSet *lts, long blocknum)
251286
{
252287
int ndx;
253-
long *ptr;
254288

255289
/*
256290
* Do nothing if we're no longer interested in remembering free space.
@@ -269,19 +303,13 @@ ltsReleaseBlock(LogicalTapeSet *lts, long blocknum)
269303
}
270304

271305
/*
272-
* Insert blocknum into array, preserving decreasing order (so that
273-
* ltsGetFreeBlock returns the lowest available block number). This could
274-
* get fairly slow if there were many free blocks, but we don't expect
275-
* there to be very many at one time.
306+
* Add blocknum to array, and mark the array unsorted if it's no longer
307+
* in decreasing order.
276308
*/
277309
ndx = lts->nFreeBlocks++;
278-
ptr = lts->freeBlocks + ndx;
279-
while (ndx > 0 && ptr[-1] < blocknum)
280-
{
281-
ptr[0] = ptr[-1];
282-
ndx--, ptr--;
283-
}
284-
ptr[0] = blocknum;
310+
lts->freeBlocks[ndx] = blocknum;
311+
if (ndx > 0 && lts->freeBlocks[ndx-1] < blocknum)
312+
lts->blocksSorted = false;
285313
}
286314

287315
/*
@@ -503,6 +531,7 @@ LogicalTapeSetCreate(int ntapes)
503531
lts->pfile = BufFileCreateTemp(false);
504532
lts->nFileBlocks = 0L;
505533
lts->forgetFreeSpace = false;
534+
lts->blocksSorted = true; /* a zero-length array is sorted ... */
506535
lts->freeBlocksLen = 32; /* reasonable initial guess */
507536
lts->freeBlocks = (long *) palloc(lts->freeBlocksLen * sizeof(long));
508537
lts->nFreeBlocks = 0;

0 commit comments

Comments
 (0)