Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 31f38f2

Browse files
committed
Redesign the planner's handling of index-descent cost estimation.
Historically we've used a couple of very ad-hoc fudge factors to try to get the right results when indexes of different sizes would satisfy a query with the same number of index leaf tuples being visited. In commit 21a39de I tweaked one of these fudge factors, with results that proved disastrous for larger indexes. Commit bf01e34 fudged it some more, but still with not a lot of principle behind it. What seems like a better way to address these issues is to explicitly model index-descent costs, since that's what's really at stake when considering diferent indexes with similar leaf-page-level costs. We tried that once long ago, and found that charging random_page_cost per page descended through was way too much, because upper btree levels tend to stay in cache in real-world workloads. However, there's still CPU costs to think about, and the previous fudge factors can be seen as a crude attempt to account for those costs. So this patch replaces those fudge factors with explicit charges for the number of tuple comparisons needed to descend the index tree, plus a small charge per page touched in the descent. The cost multipliers are chosen so that the resulting charges are in the vicinity of the historical (pre-9.2) fudge factors for indexes of up to about a million tuples, while not ballooning unreasonably beyond that, as the old fudge factor did (even more so in 9.2). To make this work accurately for btree indexes, add some code that allows extraction of the known root-page height from a btree. There's no equivalent number readily available for other index types, but we can use the log of the number of index pages as an approximate substitute. This seems like too much of a behavioral change to risk back-patching, but it should improve matters going forward. In 9.2 I'll just revert the fudge-factor change.
1 parent e1b735a commit 31f38f2

File tree

6 files changed

+372
-118
lines changed

6 files changed

+372
-118
lines changed

src/backend/access/nbtree/nbtpage.c

+76
Original file line numberDiff line numberDiff line change
@@ -411,6 +411,82 @@ _bt_gettrueroot(Relation rel)
411411
return rootbuf;
412412
}
413413

414+
/*
415+
* _bt_getrootheight() -- Get the height of the btree search tree.
416+
*
417+
* We return the level (counting from zero) of the current fast root.
418+
* This represents the number of tree levels we'd have to descend through
419+
* to start any btree index search.
420+
*
421+
* This is used by the planner for cost-estimation purposes. Since it's
422+
* only an estimate, slightly-stale data is fine, hence we don't worry
423+
* about updating previously cached data.
424+
*/
425+
int
426+
_bt_getrootheight(Relation rel)
427+
{
428+
BTMetaPageData *metad;
429+
430+
/*
431+
* We can get what we need from the cached metapage data. If it's not
432+
* cached yet, load it. Sanity checks here must match _bt_getroot().
433+
*/
434+
if (rel->rd_amcache == NULL)
435+
{
436+
Buffer metabuf;
437+
Page metapg;
438+
BTPageOpaque metaopaque;
439+
440+
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
441+
metapg = BufferGetPage(metabuf);
442+
metaopaque = (BTPageOpaque) PageGetSpecialPointer(metapg);
443+
metad = BTPageGetMeta(metapg);
444+
445+
/* sanity-check the metapage */
446+
if (!(metaopaque->btpo_flags & BTP_META) ||
447+
metad->btm_magic != BTREE_MAGIC)
448+
ereport(ERROR,
449+
(errcode(ERRCODE_INDEX_CORRUPTED),
450+
errmsg("index \"%s\" is not a btree",
451+
RelationGetRelationName(rel))));
452+
453+
if (metad->btm_version != BTREE_VERSION)
454+
ereport(ERROR,
455+
(errcode(ERRCODE_INDEX_CORRUPTED),
456+
errmsg("version mismatch in index \"%s\": file version %d, code version %d",
457+
RelationGetRelationName(rel),
458+
metad->btm_version, BTREE_VERSION)));
459+
460+
/*
461+
* If there's no root page yet, _bt_getroot() doesn't expect a cache
462+
* to be made, so just stop here and report the index height is zero.
463+
* (XXX perhaps _bt_getroot() should be changed to allow this case.)
464+
*/
465+
if (metad->btm_root == P_NONE)
466+
{
467+
_bt_relbuf(rel, metabuf);
468+
return 0;
469+
}
470+
471+
/*
472+
* Cache the metapage data for next time
473+
*/
474+
rel->rd_amcache = MemoryContextAlloc(rel->rd_indexcxt,
475+
sizeof(BTMetaPageData));
476+
memcpy(rel->rd_amcache, metad, sizeof(BTMetaPageData));
477+
478+
_bt_relbuf(rel, metabuf);
479+
}
480+
481+
metad = (BTMetaPageData *) rel->rd_amcache;
482+
/* We shouldn't have cached it if any of these fail */
483+
Assert(metad->btm_magic == BTREE_MAGIC);
484+
Assert(metad->btm_version == BTREE_VERSION);
485+
Assert(metad->btm_fastroot != P_NONE);
486+
487+
return metad->btm_fastlevel;
488+
}
489+
414490
/*
415491
* _bt_checkpage() -- Verify that a freshly-read page looks sane.
416492
*/

src/backend/nodes/outfuncs.c

+3
Original file line numberDiff line numberDiff line change
@@ -1772,7 +1772,9 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
17721772
/* Do NOT print rel field, else infinite recursion */
17731773
WRITE_UINT_FIELD(pages);
17741774
WRITE_FLOAT_FIELD(tuples, "%.0f");
1775+
WRITE_INT_FIELD(tree_height);
17751776
WRITE_INT_FIELD(ncolumns);
1777+
/* array fields aren't really worth the trouble to print */
17761778
WRITE_OID_FIELD(relam);
17771779
/* indexprs is redundant since we print indextlist */
17781780
WRITE_NODE_FIELD(indpred);
@@ -1781,6 +1783,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
17811783
WRITE_BOOL_FIELD(unique);
17821784
WRITE_BOOL_FIELD(immediate);
17831785
WRITE_BOOL_FIELD(hypothetical);
1786+
/* we don't bother with fields copied from the pg_am entry */
17841787
}
17851788

17861789
static void

src/backend/optimizer/util/plancat.c

+12
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@
2020
#include "access/genam.h"
2121
#include "access/heapam.h"
2222
#include "access/htup_details.h"
23+
#include "access/nbtree.h"
2324
#include "access/sysattr.h"
2425
#include "access/transam.h"
2526
#include "access/xlog.h"
@@ -352,6 +353,17 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
352353
info->tuples = rel->tuples;
353354
}
354355

356+
if (info->relam == BTREE_AM_OID)
357+
{
358+
/* For btrees, get tree height while we have the index open */
359+
info->tree_height = _bt_getrootheight(indexRelation);
360+
}
361+
else
362+
{
363+
/* For other index types, just set it to "unknown" for now */
364+
info->tree_height = -1;
365+
}
366+
355367
index_close(indexRelation, NoLock);
356368

357369
indexinfos = lcons(info, indexinfos);

0 commit comments

Comments
 (0)