Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit fc4b3de

Browse files
committed
User narrower representative tuples in the hash-agg hashtable.
So far the hashtable stored representative tuples in the form of its input slot, with all columns in the hashtable that are not needed (i.e. not grouped upon or functionally dependent) set to NULL. Thats good for saving memory, but it turns out that having tuples full of NULL isn't free. slot_deform_tuple is faster if there's no NULL bitmap even if no NULLs are encountered, and skipping over leading NULLs isn't free. So compute a separate tuple descriptor that only contains the needed columns. As columns have already been moved in/out the slot for the hashtable that does not imply additional per-row overhead. Author: Andres Freund Reviewed-By: Heikki Linnakangas Discussion: https://postgr.es/m/20161103110721.h5i5t5saxfk5eeik@alap3.anarazel.de
1 parent 8ed3f11 commit fc4b3de

File tree

2 files changed

+112
-57
lines changed

2 files changed

+112
-57
lines changed

src/backend/executor/nodeAgg.c

+108-56
Original file line numberDiff line numberDiff line change
@@ -1717,7 +1717,7 @@ build_hash_table(AggState *aggstate)
17171717
additionalsize = aggstate->numaggs * sizeof(AggStatePerGroupData);
17181718

17191719
aggstate->hashtable = BuildTupleHashTable(node->numCols,
1720-
node->grpColIdx,
1720+
aggstate->hashGrpColIdxHash,
17211721
aggstate->phase->eqfunctions,
17221722
aggstate->hashfunctions,
17231723
node->numGroups,
@@ -1727,47 +1727,88 @@ build_hash_table(AggState *aggstate)
17271727
}
17281728

17291729
/*
1730-
* Create a list of the tuple columns that actually need to be stored in
1731-
* hashtable entries. The incoming tuples from the child plan node will
1732-
* contain grouping columns, other columns referenced in our targetlist and
1733-
* qual, columns used to compute the aggregate functions, and perhaps just
1734-
* junk columns we don't use at all. Only columns of the first two types
1735-
* need to be stored in the hashtable, and getting rid of the others can
1736-
* make the table entries significantly smaller. To avoid messing up Var
1737-
* numbering, we keep the same tuple descriptor for hashtable entries as the
1738-
* incoming tuples have, but set unwanted columns to NULL in the tuples that
1739-
* go into the table.
1740-
*
1741-
* To eliminate duplicates, we build a bitmapset of the needed columns, then
1742-
* convert it to an integer list (cheaper to scan at runtime). The list is
1743-
* in decreasing order so that the first entry is the largest;
1744-
* lookup_hash_entry depends on this to use slot_getsomeattrs correctly.
1745-
* Note that the list is preserved over ExecReScanAgg, so we allocate it in
1746-
* the per-query context (unlike the hash table itself).
1747-
*
1748-
* Note: at present, searching the tlist/qual is not really necessary since
1749-
* the parser should disallow any unaggregated references to ungrouped
1750-
* columns. However, the search will be needed when we add support for
1751-
* SQL99 semantics that allow use of "functionally dependent" columns that
1752-
* haven't been explicitly grouped by.
1730+
* Compute columns that actually need to be stored in hashtable entries. The
1731+
* incoming tuples from the child plan node will contain grouping columns,
1732+
* other columns referenced in our targetlist and qual, columns used to
1733+
* compute the aggregate functions, and perhaps just junk columns we don't use
1734+
* at all. Only columns of the first two types need to be stored in the
1735+
* hashtable, and getting rid of the others can make the table entries
1736+
* significantly smaller. The hashtable only contains the relevant columns,
1737+
* and is packed/unpacked in lookup_hash_entry() / agg_retrieve_hash_table()
1738+
* into the format of the normal input descriptor.
1739+
*
1740+
* Additional columns, in addition to the columns grouped by, come from two
1741+
* sources: Firstly functionally dependent columns that we don't need to group
1742+
* by themselves, and secondly ctids for row-marks.
1743+
*
1744+
* To eliminate duplicates, we build a bitmapset of the needed columns, and
1745+
* then build an array of the columns included in the hashtable. Note that
1746+
* the array is preserved over ExecReScanAgg, so we allocate it in the
1747+
* per-query context (unlike the hash table itself).
17531748
*/
17541749
static List *
17551750
find_hash_columns(AggState *aggstate)
17561751
{
17571752
Agg *node = (Agg *) aggstate->ss.ps.plan;
17581753
Bitmapset *colnos;
17591754
List *collist;
1755+
TupleDesc hashDesc;
1756+
List *outerTlist = outerPlanState(aggstate)->plan->targetlist;
1757+
List *hashTlist = NIL;
17601758
int i;
17611759

1760+
aggstate->largestGrpColIdx = 0;
1761+
17621762
/* Find Vars that will be needed in tlist and qual */
17631763
colnos = find_unaggregated_cols(aggstate);
17641764
/* Add in all the grouping columns */
17651765
for (i = 0; i < node->numCols; i++)
17661766
colnos = bms_add_member(colnos, node->grpColIdx[i]);
17671767
/* Convert to list, using lcons so largest element ends up first */
17681768
collist = NIL;
1769+
1770+
aggstate->hashGrpColIdxInput =
1771+
palloc(bms_num_members(colnos) * sizeof(AttrNumber));
1772+
aggstate->hashGrpColIdxHash =
1773+
palloc(node->numCols * sizeof(AttrNumber));
1774+
1775+
/*
1776+
* First build mapping for columns directly hashed. These are the first,
1777+
* because they'll be accessed when computing hash values and comparing
1778+
* tuples for exact matches. We also build simple mapping for
1779+
* execGrouping, so it knows where to find the to-be-hashed / compared
1780+
* columns in the input.
1781+
*/
1782+
for (i = 0; i < node->numCols; i++)
1783+
{
1784+
aggstate->hashGrpColIdxInput[i] = node->grpColIdx[i];
1785+
aggstate->hashGrpColIdxHash[i] = i + 1;
1786+
aggstate->numhashGrpCols++;
1787+
/* delete already mapped columns */
1788+
bms_del_member(colnos, node->grpColIdx[i]);
1789+
}
1790+
1791+
/* and add the remaining columns */
17691792
while ((i = bms_first_member(colnos)) >= 0)
1770-
collist = lcons_int(i, collist);
1793+
{
1794+
aggstate->hashGrpColIdxInput[aggstate->numhashGrpCols] = i;
1795+
aggstate->numhashGrpCols++;
1796+
}
1797+
1798+
/* and build a tuple descriptor for the hashtable */
1799+
for (i = 0; i < aggstate->numhashGrpCols; i++)
1800+
{
1801+
int varNumber = aggstate->hashGrpColIdxInput[i] - 1;
1802+
1803+
hashTlist = lappend(hashTlist, list_nth(outerTlist, varNumber));
1804+
aggstate->largestGrpColIdx =
1805+
Max(varNumber + 1, aggstate->largestGrpColIdx);
1806+
}
1807+
1808+
hashDesc = ExecTypeFromTL(hashTlist, false);
1809+
ExecSetSlotDescriptor(aggstate->hashslot, hashDesc);
1810+
1811+
list_free(hashTlist);
17711812
bms_free(colnos);
17721813

17731814
return collist;
@@ -1804,27 +1845,22 @@ static TupleHashEntryData *
18041845
lookup_hash_entry(AggState *aggstate, TupleTableSlot *inputslot)
18051846
{
18061847
TupleTableSlot *hashslot = aggstate->hashslot;
1807-
ListCell *l;
18081848
TupleHashEntryData *entry;
18091849
bool isnew;
1810-
1811-
/* if first time through, initialize hashslot by cloning input slot */
1812-
if (hashslot->tts_tupleDescriptor == NULL)
1813-
{
1814-
ExecSetSlotDescriptor(hashslot, inputslot->tts_tupleDescriptor);
1815-
/* Make sure all unused columns are NULLs */
1816-
ExecStoreAllNullTuple(hashslot);
1817-
}
1850+
int i;
18181851

18191852
/* transfer just the needed columns into hashslot */
1820-
slot_getsomeattrs(inputslot, linitial_int(aggstate->hash_needed));
1821-
foreach(l, aggstate->hash_needed)
1853+
slot_getsomeattrs(inputslot, aggstate->largestGrpColIdx);
1854+
ExecClearTuple(hashslot);
1855+
1856+
for (i = 0; i < aggstate->numhashGrpCols; i++)
18221857
{
1823-
int varNumber = lfirst_int(l) - 1;
1858+
int varNumber = aggstate->hashGrpColIdxInput[i] - 1;
18241859

1825-
hashslot->tts_values[varNumber] = inputslot->tts_values[varNumber];
1826-
hashslot->tts_isnull[varNumber] = inputslot->tts_isnull[varNumber];
1860+
hashslot->tts_values[i] = inputslot->tts_values[varNumber];
1861+
hashslot->tts_isnull[i] = inputslot->tts_isnull[varNumber];
18271862
}
1863+
ExecStoreVirtualTuple(hashslot);
18281864

18291865
/* find or create the hashtable entry using the filtered tuple */
18301866
entry = LookupTupleHashEntry(aggstate->hashtable, hashslot, &isnew);
@@ -2286,6 +2322,7 @@ agg_retrieve_hash_table(AggState *aggstate)
22862322
TupleHashEntryData *entry;
22872323
TupleTableSlot *firstSlot;
22882324
TupleTableSlot *result;
2325+
TupleTableSlot *hashslot;
22892326

22902327
/*
22912328
* get state info from node
@@ -2294,13 +2331,17 @@ agg_retrieve_hash_table(AggState *aggstate)
22942331
econtext = aggstate->ss.ps.ps_ExprContext;
22952332
peragg = aggstate->peragg;
22962333
firstSlot = aggstate->ss.ss_ScanTupleSlot;
2334+
hashslot = aggstate->hashslot;
2335+
22972336

22982337
/*
22992338
* We loop retrieving groups until we find one satisfying
23002339
* aggstate->ss.ps.qual
23012340
*/
23022341
while (!aggstate->agg_done)
23032342
{
2343+
int i;
2344+
23042345
/*
23052346
* Find the next entry in the hash table
23062347
*/
@@ -2322,12 +2363,24 @@ agg_retrieve_hash_table(AggState *aggstate)
23222363
ResetExprContext(econtext);
23232364

23242365
/*
2325-
* Store the copied first input tuple in the tuple table slot reserved
2326-
* for it, so that it can be used in ExecProject.
2366+
* Transform representative tuple back into one with the right
2367+
* columns.
23272368
*/
2328-
ExecStoreMinimalTuple(entry->firstTuple,
2329-
firstSlot,
2330-
false);
2369+
ExecStoreMinimalTuple(entry->firstTuple, hashslot, false);
2370+
slot_getallattrs(hashslot);
2371+
2372+
ExecClearTuple(firstSlot);
2373+
memset(firstSlot->tts_isnull, true,
2374+
firstSlot->tts_tupleDescriptor->natts * sizeof(bool));
2375+
2376+
for (i = 0; i < aggstate->numhashGrpCols; i++)
2377+
{
2378+
int varNumber = aggstate->hashGrpColIdxInput[i] - 1;
2379+
2380+
firstSlot->tts_values[varNumber] = hashslot->tts_values[i];
2381+
firstSlot->tts_isnull[varNumber] = hashslot->tts_isnull[i];
2382+
}
2383+
ExecStoreVirtualTuple(firstSlot);
23312384

23322385
pergroup = (AggStatePerGroup) entry->additional;
23332386

@@ -2604,16 +2657,6 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
26042657
while ((i = bms_next_member(all_grouped_cols, i)) >= 0)
26052658
aggstate->all_grouped_cols = lcons_int(i, aggstate->all_grouped_cols);
26062659

2607-
/*
2608-
* Hashing can only appear in the initial phase.
2609-
*/
2610-
2611-
if (node->aggstrategy == AGG_HASHED)
2612-
execTuplesHashPrepare(node->numCols,
2613-
node->grpOperators,
2614-
&aggstate->phases[0].eqfunctions,
2615-
&aggstate->hashfunctions);
2616-
26172660
/*
26182661
* Initialize current phase-dependent values to initial phase
26192662
*/
@@ -2635,12 +2678,21 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
26352678
aggstate->peragg = peraggs;
26362679
aggstate->pertrans = pertransstates;
26372680

2681+
2682+
/*
2683+
* Hashing can only appear in the initial phase.
2684+
*/
26382685
if (node->aggstrategy == AGG_HASHED)
26392686
{
2687+
find_hash_columns(aggstate);
2688+
2689+
execTuplesHashPrepare(node->numCols,
2690+
node->grpOperators,
2691+
&aggstate->phases[0].eqfunctions,
2692+
&aggstate->hashfunctions);
2693+
26402694
build_hash_table(aggstate);
26412695
aggstate->table_filled = false;
2642-
/* Compute the columns we actually need to hash on */
2643-
aggstate->hash_needed = find_hash_columns(aggstate);
26442696
}
26452697
else
26462698
{

src/include/nodes/execnodes.h

+4-1
Original file line numberDiff line numberDiff line change
@@ -1860,7 +1860,10 @@ typedef struct AggState
18601860
/* these fields are used in AGG_HASHED mode: */
18611861
TupleHashTable hashtable; /* hash table with one entry per group */
18621862
TupleTableSlot *hashslot; /* slot for loading hash table */
1863-
List *hash_needed; /* list of columns needed in hash table */
1863+
int numhashGrpCols; /* number of columns in hash table */
1864+
int largestGrpColIdx; /* largest column required for hashing */
1865+
AttrNumber *hashGrpColIdxInput; /* and their indices in input slot */
1866+
AttrNumber *hashGrpColIdxHash; /* indices for execGrouping in hashtbl */
18641867
bool table_filled; /* hash table filled yet? */
18651868
TupleHashIterator hashiter; /* for iterating through hash table */
18661869
/* support for evaluation of agg inputs */

0 commit comments

Comments
 (0)