After increasing the number of batches and splitting the current one, we
used to disable further growth if all tuples went into only one of the
two new batches. It's possible to construct cases where this disables
growth prematurely - we may be unable to split the batch now, but that
doesn't mean we couldn't split it later.
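For context, a minimal sketch of the old check, assuming surroundings
resembling PostgreSQL's ExecHashIncreaseNumBatches(), where ninmemory
counts the tuples scanned during the split and nfreed counts those moved
to the other new batch (names borrowed from that code, not quoted from
this patch):

    /* Did the split actually separate any tuples? */
    if (nfreed == 0 || nfreed == ninmemory)
    {
        /*
         * Everything landed in a single batch, so the split was
         * ineffective.  Old behavior: give up on growth for good,
         * even though tuples arriving later might split just fine.
         */
        hashtable->growEnabled = false;
    }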
This generally requires an underestimated size of the inner relation, so
that we need to increase the number of batches, combined with hash
values that are non-random in some way - perhaps sharing a common
prefix, or perhaps the data is simply correlated.
So instead of hard-disabling growth permanently, double the memory limit
so that we retry the split after processing more data. Doubling the
limit is somewhat arbitrary - it's the earliest point at which we could
split the batch in half even if all the current tuples have duplicate
hash values. Any other value would work too, making the retry happen
sooner or later.
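Under the same assumptions as the sketch above, the retry variant
replaces the hard disable with a doubled limit (again a sketch, not the
literal patch):

    if (nfreed == 0 || nfreed == ninmemory)
    {
        /*
         * Ineffective split - but instead of disabling growth,
         * double the limit.  The next split attempt then happens
         * only once the batch holds twice as much data, at which
         * point it can be cut in half even if every tuple seen so
         * far carries the same hash value.
         */
        hashtable->spaceAllowed *= 2;
    }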