
Commit baba60a

tvondra authored and Commitfest Bot committed
Postpone hashtable growth instead of disabling it
After increasing the number of batches and splitting the current one, we used to disable further growth if all tuples went into only one of the two new batches. It's possible to construct cases where this disables growth prematurely - maybe we can't split the batch now, but that doesn't mean we could not split it later.

This generally requires an underestimated size of the inner relation, so that we need to increase the number of batches, and also hashes that are non-random in some way: there may be a "common" prefix, or the data may simply be correlated.

So instead of hard-disabling the growth permanently, double the memory limit, so that we retry the split after processing more data. Doubling the limit is somewhat arbitrary - it's the earliest point at which we could split the batch in half even if all the current tuples have duplicate hashes (the tuples accumulated since the limit was raised could all land in the other batch). But we could pick any other value, to retry sooner or later.
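To make the decision concrete before reading the diff, here is a minimal standalone C sketch of the "double the limit instead of giving up" policy described above. The struct, function names, and sizes (ToyHashTable, after_split, soft_growth) are hypothetical stand-ins, not PostgreSQL internals; the real logic lives in ExecHashIncreaseNumBatches() below.

/*
 * Toy model of the growth decision after a batch split.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct
{
    size_t  space_allowed;  /* memory limit for the hash table */
    int     nbatch;         /* current number of batches */
    bool    grow_enabled;   /* may we keep doubling nbatch? */
} ToyHashTable;

static bool soft_growth = true;    /* plays the role of enable_hashjoin_growth */

/*
 * Called after a batch split. nfreed is how many tuples moved out of the
 * old batch, ninmemory how many tuples the old batch held in total.
 */
static void
after_split(ToyHashTable *ht, int nfreed, int ninmemory)
{
    if (nfreed == 0 || nfreed == ninmemory)
    {
        /* Useless split: every tuple ended up in the same child batch. */
        if (soft_growth)
            ht->space_allowed *= 2;     /* postpone: retry after more data */
        else
            ht->grow_enabled = false;   /* old behavior: give up forever */
    }
}

int
main(void)
{
    ToyHashTable ht = {4096, 2, true};

    after_split(&ht, 0, 100);    /* a split that moved nothing */
    printf("allowed=%zu nbatch=%d grow_enabled=%d\n",
           ht.space_allowed, ht.nbatch, (int) ht.grow_enabled);
    return 0;
}

Under the new policy the table keeps grow_enabled=true and simply runs with a doubled limit (8192 here), so a later, more useful split is still possible.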
1 parent 1b09197 · commit baba60a

File tree

3 files changed: +62 -1 lines changed


src/backend/executor/nodeHash.c

Lines changed: 49 additions & 1 deletion

@@ -81,6 +81,9 @@ static void ExecParallelHashMergeCounters(HashJoinTable hashtable);
 static void ExecParallelHashCloseBatchAccessors(HashJoinTable hashtable);
 
 
+/* enable soft-disable of nbatch growth */
+bool    enable_hashjoin_growth = false;
+
 /* ----------------------------------------------------------------
  *      ExecHash
  *
@@ -1172,10 +1175,55 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
      * Increasing nbatch will not fix it since there's no way to subdivide the
      * group any more finely. We have to just gut it out and hope the server
      * has enough RAM.
+     *
+     * XXX This logic for hard-disabling nbatch growth assumes that if we're
+     * unable to split the batch now, we'll be unable to split it forever. That
+     * works well for "nice" random data sets, but it's not difficult to come
+     * up with cases where the assumption does not hold.
+     *
+     * 1) The hashes may share the same "prefix", but the next bit may be
+     * random - and doubling the number of batches again would split the batch
+     * about evenly. This is rather unlikely to happen by chance, of course.
+     *
+     * 2) The dataset may be correlated in some way, i.e. produce values with
+     * one hash value first, before producing other values. This might happen
+     * for data read from an index, etc. If we underestimate the amount of data
+     * we will need to add to the hash, this may disable nbatch growth while
+     * still reading the first value, i.e. too soon.
+     *
+     * XXX With the "hard disable" this also handles the case when we run out
+     * of hash bits, because the first doubling after that only has "0" in the
+     * new bit, which means we get (nfreed == ninmemory) and stop adding more
+     * batches. But with the soft disable that's no longer the case, and we'd
+     * try adding more batches after a while. Not sure this is a big deal, as
+     * the memory limit would need to be pretty substantial in that case
+     * already (assuming we already doubled the limit multiple times). But
+     * maybe it could be the first time we're hitting this condition? Perhaps
+     * the right solution is to detect running out of hash bits explicitly,
+     * and just do a "hard disable" in this case?
+     *
+     * XXX This should probably also increase the number of buckets etc.
      */
     if (nfreed == 0 || nfreed == ninmemory)
     {
-        hashtable->growEnabled = false;
+        /*
+         * If enable_hashjoin_growth=true, don't disable the growth
+         * permanently, and instead just increase the limit in the hope that
+         * we'll be able to split the batches in the future.
+         *
+         * XXX Not sure if 2.0 is the optimal growth factor, but it seems quite
+         * reasonable because it aligns with the expectation that we'll be able
+         * to split the memory usage in half. It's not perfect, because if we
+         * add a random bit to the hash, we might split earlier than that.
+         */
+        if (enable_hashjoin_growth)
+        {
+            /* XXX Do we need to worry about overflowing spaceAllowed? */
+            hashtable->spaceAllowed *= 2.0;
+        }
+        else
+            hashtable->growEnabled = false;
 
 #ifdef HJDEBUG
         printf("Hashjoin %p: disabling further increase of nbatch\n",
                hashtable);
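As a concrete instance of case (1) in the comment above, the following self-contained sketch uses hash values whose low three bits are all zero while bit 3 varies. Batch selection is modeled here as hashvalue & (nbatch - 1), a simplification of what the executor's bucket/batch computation does; the hash values are made up for illustration.

/*
 * Hashes sharing a low-bit "prefix": splitting 4 -> 8 batches moves
 * nothing (the newly consulted bit is 0 everywhere), yet splitting
 * 8 -> 16 batches divides the tuples evenly - exactly the case the
 * hard disable gives up on too early.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint32_t hashes[] = {0x10, 0x18, 0x30, 0x38};   /* low 3 bits all 0 */

    for (int nbatch = 4; nbatch <= 16; nbatch *= 2)
    {
        printf("nbatch=%2d -> batch numbers:", nbatch);
        for (int i = 0; i < 4; i++)
            printf(" %u", hashes[i] & (uint32_t) (nbatch - 1));
        printf("\n");
    }
    return 0;
}

This prints all zeros for nbatch=4 and nbatch=8, then "0 8 0 8" for nbatch=16. With the old hard disable, the failed 4 -> 8 split would have frozen nbatch permanently; with the soft disable, the doubled memory limit lets the join keep accumulating until the 8 -> 16 split succeeds.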

src/backend/utils/misc/guc_tables.c

Lines changed: 11 additions & 0 deletions

@@ -46,6 +46,7 @@
 #include "commands/vacuum.h"
 #include "common/file_utils.h"
 #include "common/scram-common.h"
+#include "executor/hashjoin.h"
 #include "jit/jit.h"
 #include "libpq/auth.h"
 #include "libpq/libpq.h"
@@ -901,6 +902,16 @@ struct config_bool ConfigureNamesBool[] =
         true,
         NULL, NULL, NULL
     },
+    {
+        {"enable_hashjoin_growth", PGC_USERSET, QUERY_TUNING_METHOD,
+            gettext_noop("Enables growing the hash table memory limit instead of disabling batch increases permanently."),
+            NULL,
+            GUC_EXPLAIN
+        },
+        &enable_hashjoin_growth,
+        false,
+        NULL, NULL, NULL
+    },
     {
         {"enable_gathermerge", PGC_USERSET, QUERY_TUNING_METHOD,
             gettext_noop("Enables the planner's use of gather merge plans."),

src/include/executor/hashjoin.h

Lines changed: 2 additions & 0 deletions

@@ -71,6 +71,8 @@
  * ----------------------------------------------------------------
  */
 
+extern PGDLLIMPORT bool enable_hashjoin_growth;
+
 /* these are in nodes/execnodes.h: */
 /* typedef struct HashJoinTupleData *HashJoinTuple; */
 /* typedef struct HashJoinTableData *HashJoinTable; */
