
Commit 44e95b5

Fix array size allocation for HashAggregate hash keys.
When there were duplicate columns in the hash key list, the array sizes could be miscomputed, resulting in access off the end of the array. Adjust the computation to ensure the array is always large enough.

(I considered whether the duplicates could be removed in planning, but I can't rule out the possibility that duplicate columns might have different hash functions assigned. Simpler to just make sure it works at execution time regardless.)

Bug apparently introduced in fc4b3de as part of narrowing down the tuples stored in the hashtable. Reported by Colm McHugh of Salesforce, though I didn't use their patch.

Backpatch back to version 10 where the bug was introduced.

Discussion: https://postgr.es/m/CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
1 parent 156c0c2 commit 44e95b5
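
To make the miscomputation concrete, here is a minimal stand-alone sketch in plain C (not PostgreSQL code; the variable names are hypothetical) of the sizing arithmetic for a hash key that lists the same column twice, as in the regression test added below:

#include <stdio.h>

int main(void)
{
    /* Group Key: onek.twothousand, onek.twothousand => duplicate hash key */
    int numCols = 2;            /* entries in grpColIdx, duplicates included */
    int distinctOtherCols = 0;  /* needed columns that are not hash keys */
    int distinctAllCols = 1;    /* distinct columns including the hash keys */

    /*
     * find_hash_columns fills one slot per grpColIdx entry plus one slot per
     * remaining non-key column, so the number of slots actually written is:
     */
    int entriesWritten = numCols + distinctOtherCols;   /* 2 */

    int oldAlloc = distinctAllCols;                     /* 1: too small */
    int newAlloc = distinctOtherCols + numCols;         /* 2: always enough */

    printf("written=%d oldAlloc=%d newAlloc=%d\n",
           entriesWritten, oldAlloc, newAlloc);
    return 0;
}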

3 files changed: +47, -7 lines changed

src/backend/executor/nodeAgg.c

Lines changed: 22 additions & 7 deletions
@@ -1312,9 +1312,14 @@ build_hash_table(AggState *aggstate)
  * by themselves, and secondly ctids for row-marks.
  *
  * To eliminate duplicates, we build a bitmapset of the needed columns, and
- * then build an array of the columns included in the hashtable. Note that
- * the array is preserved over ExecReScanAgg, so we allocate it in the
- * per-query context (unlike the hash table itself).
+ * then build an array of the columns included in the hashtable. We might
+ * still have duplicates if the passed-in grpColIdx has them, which can happen
+ * in edge cases from semijoins/distinct; these can't always be removed,
+ * because it's not certain that the duplicate cols will be using the same
+ * hash function.
+ *
+ * Note that the array is preserved over ExecReScanAgg, so we allocate it in
+ * the per-query context (unlike the hash table itself).
  */
 static void
 find_hash_columns(AggState *aggstate)
@@ -1335,6 +1340,7 @@ find_hash_columns(AggState *aggstate)
     AttrNumber *grpColIdx = perhash->aggnode->grpColIdx;
     List       *hashTlist = NIL;
     TupleDesc   hashDesc;
+    int         maxCols;
     int         i;
 
     perhash->largestGrpColIdx = 0;
@@ -1359,15 +1365,24 @@ find_hash_columns(AggState *aggstate)
             colnos = bms_del_member(colnos, attnum);
         }
     }
-    /* Add in all the grouping columns */
-    for (i = 0; i < perhash->numCols; i++)
-        colnos = bms_add_member(colnos, grpColIdx[i]);
+
+    /*
+     * Compute maximum number of input columns accounting for possible
+     * duplications in the grpColIdx array, which can happen in some edge
+     * cases where HashAggregate was generated as part of a semijoin or a
+     * DISTINCT.
+     */
+    maxCols = bms_num_members(colnos) + perhash->numCols;
 
     perhash->hashGrpColIdxInput =
-        palloc(bms_num_members(colnos) * sizeof(AttrNumber));
+        palloc(maxCols * sizeof(AttrNumber));
     perhash->hashGrpColIdxHash =
         palloc(perhash->numCols * sizeof(AttrNumber));
 
+    /* Add all the grouping columns to colnos */
+    for (i = 0; i < perhash->numCols; i++)
+        colnos = bms_add_member(colnos, grpColIdx[i]);
+
     /*
      * First build mapping for columns directly hashed. These are the
      * first, because they'll be accessed when computing hash values and
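
As a rough cross-check of the new bound, the following stand-alone sketch (plain C; a bool array stands in for PostgreSQL's Bitmapset, and the names are made up) mimics, in spirit, the fill loops that follow in find_hash_columns and asserts that they never write more than maxCols entries, even when grpColIdx repeats a column:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAXATT 16               /* small attribute-number domain for the demo */

/*
 * Count the slots the fill loops consume: one per grpColIdx entry
 * (duplicates included), then one per other needed column that was not
 * already mapped as a hash key.
 */
static int
entries_written(const int *grpColIdx, int numCols, bool *needed)
{
    int written = 0;

    for (int i = 0; i < numCols; i++)
    {
        written++;                      /* slot for this hash key column */
        needed[grpColIdx[i]] = false;   /* mark it as already mapped */
    }

    for (int col = 0; col < MAXATT; col++)
        if (needed[col])
            written++;                  /* slot for a leftover needed column */

    return written;
}

int main(void)
{
    int  grpColIdx[] = {1, 1};          /* duplicate hash keys, as in the test */
    int  numCols = 2;
    bool needed[MAXATT] = {false};      /* columns needed beyond the hash keys */

    needed[3] = true;                   /* pretend one extra non-key column is needed */
    int  distinctOtherCols = 1;         /* number of true entries in needed[] */

    int written = entries_written(grpColIdx, numCols, needed);
    int maxCols = distinctOtherCols + numCols;   /* the new allocation size */

    assert(written <= maxCols);
    printf("written=%d maxCols=%d\n", written, maxCols);
    return 0;
}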

src/test/regress/expected/aggregates.out

Lines changed: 18 additions & 0 deletions
@@ -2270,3 +2270,21 @@ select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
  ba | 0 | 1
 (2 rows)
 
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+explain (costs off)
+  select 1 from tenk1
+   where (hundred, thousand) in (select twothousand, twothousand from onek);
+                          QUERY PLAN
+-------------------------------------------------------------
+ Hash Join
+   Hash Cond: (tenk1.hundred = onek.twothousand)
+   ->  Seq Scan on tenk1
+         Filter: (hundred = thousand)
+   ->  Hash
+         ->  HashAggregate
+               Group Key: onek.twothousand, onek.twothousand
+               ->  Seq Scan on onek
+(8 rows)
+

src/test/regress/sql/aggregates.sql

Lines changed: 7 additions & 0 deletions
@@ -988,3 +988,10 @@ select v||'a', case v||'a' when 'aa' then 1 else 0 end, count(*)
 select v||'a', case when v||'a' = 'aa' then 1 else 0 end, count(*)
   from unnest(array['a','b']) u(v)
  group by v||'a' order by 1;
+
+-- Make sure that generation of HashAggregate for uniqification purposes
+-- does not lead to array overflow due to unexpected duplicate hash keys
+-- see CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
+explain (costs off)
+  select 1 from tenk1
+   where (hundred, thousand) in (select twothousand, twothousand from onek);
