Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit e9e26b5

Browse files
committed
Invent "multibitmapsets", and use them to speed up antijoin detection.
Implement a data structure that is a List of Bitmapsets, which is essentially a 2-D boolean array except that the rows need not all be the same width. Operations such as union and intersection are meaningful for these, just as they are for Bitmapsets. Eventually we might build many of the same operations that we have written for Bitmapsets, but for the first use-case we just need a few. That first use-case is for antijoin detection: reduce_outer_joins needs to find the set of Vars that are certain to be non-null in a successfully joined (not null-extended) left join row, and also find the set of Vars subject to higher-level IS NULL constraints, and intersect them. We had been doing this by making Lists of the Var nodes and then using list_intersect, which works but is pretty inefficient compared to a bitmapset-like intersection. Potentially it's O(N^2) if there are a lot of Vars involved, which fortunately there generally aren't; still it's not great. Moreover, that method requires the Vars of interest to be exactly equal() in the join condition and the upper IS NULL condition, which is problematic for my WIP patch that labels Vars according to which outer joins have possibly nulled them. Discussion: https://postgr.es/m/892228.1668437838@sss.pgh.pa.us Discussion: https://postgr.es/m/CAMbWs4-mvPPCJ1W6iK6dD5HiNwoJdi6mZp=-7mE8N9Sh+cd0tQ@mail.gmail.com
1 parent 90e4f30 commit e9e26b5

File tree

7 files changed

+238
-25
lines changed

7 files changed

+238
-25
lines changed

src/backend/nodes/Makefile

+1
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ OBJS = \
2121
extensible.o \
2222
list.o \
2323
makefuncs.o \
24+
multibitmapset.o \
2425
nodeFuncs.o \
2526
nodes.o \
2627
outfuncs.o \

src/backend/nodes/README

+1
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,7 @@ FILES IN THIS DIRECTORY (src/backend/nodes/)
5858
Specialized manipulation functions:
5959
bitmapset.c - Bitmapset support
6060
list.c - generic list support
61+
multibitmapset.c - List-of-Bitmapset support
6162
params.c - Param support
6263
tidbitmap.c - TIDBitmap support
6364
value.c - support for value nodes

src/backend/nodes/meson.build

+1
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@ backend_sources += files(
33
'extensible.c',
44
'list.c',
55
'makefuncs.c',
6+
'multibitmapset.c',
67
'nodeFuncs.c',
78
'nodes.c',
89
'params.c',

src/backend/nodes/multibitmapset.c

+162
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,162 @@
1+
/*-------------------------------------------------------------------------
2+
*
3+
* multibitmapset.c
4+
* Lists of Bitmapsets
5+
*
6+
* A multibitmapset is useful in situations where members of a set can
7+
* be identified by two small integers; for example, varno and varattno
8+
* of a group of Vars within a query. The implementation is a List of
9+
* Bitmapsets, so that the empty set can be represented by NIL. (But,
10+
* as with Bitmapsets, that's not the only allowed representation.)
11+
* The zero-based index of a List element is the first identifying value,
12+
* and the (also zero-based) index of a bit within that Bitmapset is
13+
* the second identifying value. There is no expectation that the
14+
* Bitmapsets should all be the same size.
15+
*
16+
* The available operations on multibitmapsets are intended to parallel
17+
* those on bitmapsets, for example union and intersection. So far only
18+
* a small fraction of that has been built out; we'll add more as needed.
19+
*
20+
*
21+
* Copyright (c) 2022, PostgreSQL Global Development Group
22+
*
23+
* IDENTIFICATION
24+
* src/backend/nodes/multibitmapset.c
25+
*
26+
*-------------------------------------------------------------------------
27+
*/
28+
#include "postgres.h"
29+
30+
#include "nodes/multibitmapset.h"
31+
32+
33+
/*
34+
* mbms_add_member
35+
* Add a new member to a multibitmapset.
36+
*
37+
* The new member is identified by "listidx", the zero-based index of the
38+
* List element it should go into, and "bitidx", which specifies the bit
39+
* number to be set therein.
40+
*
41+
* This is like bms_add_member, but for multibitmapsets.
42+
*/
43+
List *
44+
mbms_add_member(List *a, int listidx, int bitidx)
45+
{
46+
Bitmapset *bms;
47+
ListCell *lc;
48+
49+
if (listidx < 0 || bitidx < 0)
50+
elog(ERROR, "negative multibitmapset member index not allowed");
51+
/* Add empty elements as needed */
52+
while (list_length(a) <= listidx)
53+
a = lappend(a, NULL);
54+
/* Update the target element */
55+
lc = list_nth_cell(a, listidx);
56+
bms = lfirst_node(Bitmapset, lc);
57+
bms = bms_add_member(bms, bitidx);
58+
lfirst(lc) = bms;
59+
return a;
60+
}
61+
62+
/*
63+
* mbms_add_members
64+
* Add all members of set b to set a.
65+
*
66+
* This is a UNION operation, but the left input is modified in-place.
67+
*
68+
* This is like bms_add_members, but for multibitmapsets.
69+
*/
70+
List *
71+
mbms_add_members(List *a, const List *b)
72+
{
73+
ListCell *lca,
74+
*lcb;
75+
76+
/* Add empty elements to a, as needed */
77+
while (list_length(a) < list_length(b))
78+
a = lappend(a, NULL);
79+
/* forboth will stop at the end of the shorter list, which is fine */
80+
forboth(lca, a, lcb, b)
81+
{
82+
Bitmapset *bmsa = lfirst_node(Bitmapset, lca);
83+
const Bitmapset *bmsb = lfirst_node(Bitmapset, lcb);
84+
85+
bmsa = bms_add_members(bmsa, bmsb);
86+
lfirst(lca) = bmsa;
87+
}
88+
return a;
89+
}
90+
91+
/*
92+
* mbms_int_members
93+
* Reduce set a to its intersection with set b.
94+
*
95+
* This is an INTERSECT operation, but the left input is modified in-place.
96+
*
97+
* This is like bms_int_members, but for multibitmapsets.
98+
*/
99+
List *
100+
mbms_int_members(List *a, const List *b)
101+
{
102+
ListCell *lca,
103+
*lcb;
104+
105+
/* Remove any elements of a that are no longer of use */
106+
a = list_truncate(a, list_length(b));
107+
/* forboth will stop at the end of the shorter list, which is fine */
108+
forboth(lca, a, lcb, b)
109+
{
110+
Bitmapset *bmsa = lfirst_node(Bitmapset, lca);
111+
const Bitmapset *bmsb = lfirst_node(Bitmapset, lcb);
112+
113+
bmsa = bms_int_members(bmsa, bmsb);
114+
lfirst(lca) = bmsa;
115+
}
116+
return a;
117+
}
118+
119+
/*
120+
* mbms_is_member
121+
* Is listidx/bitidx a member of A?
122+
*
123+
* This is like bms_is_member, but for multibitmapsets.
124+
*/
125+
bool
126+
mbms_is_member(int listidx, int bitidx, const List *a)
127+
{
128+
const Bitmapset *bms;
129+
130+
/* XXX better to just return false for negative indexes? */
131+
if (listidx < 0 || bitidx < 0)
132+
elog(ERROR, "negative multibitmapset member index not allowed");
133+
if (listidx >= list_length(a))
134+
return false;
135+
bms = list_nth_node(Bitmapset, a, listidx);
136+
return bms_is_member(bitidx, bms);
137+
}
138+
139+
/*
140+
* mbms_overlap_sets
141+
* Identify the bitmapsets having common members in a and b.
142+
*
143+
* The result is a bitmapset of the list indexes of bitmapsets that overlap.
144+
*/
145+
Bitmapset *
146+
mbms_overlap_sets(const List *a, const List *b)
147+
{
148+
Bitmapset *result = NULL;
149+
ListCell *lca,
150+
*lcb;
151+
152+
/* forboth will stop at the end of the shorter list, which is fine */
153+
forboth(lca, a, lcb, b)
154+
{
155+
const Bitmapset *bmsa = lfirst_node(Bitmapset, lca);
156+
const Bitmapset *bmsb = lfirst_node(Bitmapset, lcb);
157+
158+
if (bms_overlap(bmsa, bmsb))
159+
result = bms_add_member(result, foreach_current_index(lca));
160+
}
161+
return result;
162+
}

src/backend/optimizer/prep/prepjointree.c

+9-11
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
#include "catalog/pg_type.h"
2929
#include "funcapi.h"
3030
#include "nodes/makefuncs.h"
31+
#include "nodes/multibitmapset.h"
3132
#include "nodes/nodeFuncs.h"
3233
#include "optimizer/clauses.h"
3334
#include "optimizer/optimizer.h"
@@ -2769,7 +2770,7 @@ reduce_outer_joins_pass1(Node *jtnode)
27692770
* state: state data collected by phase 1 for this node
27702771
* root: toplevel planner state
27712772
* nonnullable_rels: set of base relids forced non-null by upper quals
2772-
* forced_null_vars: list of Vars forced null by upper quals
2773+
* forced_null_vars: multibitmapset of Vars forced null by upper quals
27732774
*/
27742775
static void
27752776
reduce_outer_joins_pass2(Node *jtnode,
@@ -2799,8 +2800,8 @@ reduce_outer_joins_pass2(Node *jtnode,
27992800
pass_nonnullable_rels = bms_add_members(pass_nonnullable_rels,
28002801
nonnullable_rels);
28012802
pass_forced_null_vars = find_forced_null_vars(f->quals);
2802-
pass_forced_null_vars = list_concat(pass_forced_null_vars,
2803-
forced_null_vars);
2803+
pass_forced_null_vars = mbms_add_members(pass_forced_null_vars,
2804+
forced_null_vars);
28042805
/* And recurse --- but only into interesting subtrees */
28052806
Assert(list_length(f->fromlist) == list_length(state->sub_states));
28062807
forboth(l, f->fromlist, s, state->sub_states)
@@ -2897,7 +2898,7 @@ reduce_outer_joins_pass2(Node *jtnode,
28972898
if (jointype == JOIN_LEFT)
28982899
{
28992900
List *nonnullable_vars;
2900-
List *overlap;
2901+
Bitmapset *overlap;
29012902

29022903
/* Find Vars in j->quals that must be non-null in joined rows */
29032904
nonnullable_vars = find_nonnullable_vars(j->quals);
@@ -2907,11 +2908,8 @@ reduce_outer_joins_pass2(Node *jtnode,
29072908
* forced_null_vars overlap: we need to know if the overlap
29082909
* includes any RHS variables.
29092910
*/
2910-
overlap = list_intersection(nonnullable_vars,
2911-
forced_null_vars);
2912-
if (overlap != NIL &&
2913-
bms_overlap(pull_varnos(root, (Node *) overlap),
2914-
right_state->relids))
2911+
overlap = mbms_overlap_sets(nonnullable_vars, forced_null_vars);
2912+
if (bms_overlap(overlap, right_state->relids))
29152913
jointype = JOIN_ANTI;
29162914
}
29172915

@@ -2964,8 +2962,8 @@ reduce_outer_joins_pass2(Node *jtnode,
29642962
/* OK to merge upper and local constraints */
29652963
local_nonnullable_rels = bms_add_members(local_nonnullable_rels,
29662964
nonnullable_rels);
2967-
local_forced_null_vars = list_concat(local_forced_null_vars,
2968-
forced_null_vars);
2965+
local_forced_null_vars = mbms_add_members(local_forced_null_vars,
2966+
forced_null_vars);
29692967
}
29702968
}
29712969
else

src/backend/optimizer/util/clauses.c

+25-14
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@
3131
#include "funcapi.h"
3232
#include "miscadmin.h"
3333
#include "nodes/makefuncs.h"
34+
#include "nodes/multibitmapset.h"
3435
#include "nodes/nodeFuncs.h"
3536
#include "nodes/subscripting.h"
3637
#include "nodes/supportnodes.h"
@@ -1566,7 +1567,7 @@ find_nonnullable_rels_walker(Node *node, bool top_level)
15661567
* find_nonnullable_vars
15671568
* Determine which Vars are forced nonnullable by given clause.
15681569
*
1569-
* Returns a list of all level-zero Vars that are referenced in the clause in
1570+
* Returns the set of all level-zero Vars that are referenced in the clause in
15701571
* such a way that the clause cannot possibly return TRUE if any of these Vars
15711572
* is NULL. (It is OK to err on the side of conservatism; hence the analysis
15721573
* here is simplistic.)
@@ -1578,8 +1579,9 @@ find_nonnullable_rels_walker(Node *node, bool top_level)
15781579
* the expression to have been AND/OR flattened and converted to implicit-AND
15791580
* format.
15801581
*
1581-
* The result is a palloc'd List, but we have not copied the member Var nodes.
1582-
* Also, we don't bother trying to eliminate duplicate entries.
1582+
* Attnos of the identified Vars are returned in a multibitmapset (a List of
1583+
* Bitmapsets). List indexes correspond to relids (varnos), while the per-rel
1584+
* Bitmapsets hold varattnos offset by FirstLowInvalidHeapAttributeNumber.
15831585
*
15841586
* top_level is true while scanning top-level AND/OR structure; here, showing
15851587
* the result is either FALSE or NULL is good enough. top_level is false when
@@ -1608,7 +1610,9 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
16081610
Var *var = (Var *) node;
16091611

16101612
if (var->varlevelsup == 0)
1611-
result = list_make1(var);
1613+
result = mbms_add_member(result,
1614+
var->varno,
1615+
var->varattno - FirstLowInvalidHeapAttributeNumber);
16121616
}
16131617
else if (IsA(node, List))
16141618
{
@@ -1623,9 +1627,9 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
16231627
*/
16241628
foreach(l, (List *) node)
16251629
{
1626-
result = list_concat(result,
1627-
find_nonnullable_vars_walker(lfirst(l),
1628-
top_level));
1630+
result = mbms_add_members(result,
1631+
find_nonnullable_vars_walker(lfirst(l),
1632+
top_level));
16291633
}
16301634
}
16311635
else if (IsA(node, FuncExpr))
@@ -1657,7 +1661,12 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
16571661
switch (expr->boolop)
16581662
{
16591663
case AND_EXPR:
1660-
/* At top level we can just recurse (to the List case) */
1664+
1665+
/*
1666+
* At top level we can just recurse (to the List case), since
1667+
* the result should be the union of what we can prove in each
1668+
* arm.
1669+
*/
16611670
if (top_level)
16621671
{
16631672
result = find_nonnullable_vars_walker((Node *) expr->args,
@@ -1689,7 +1698,7 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
16891698
if (result == NIL) /* first subresult? */
16901699
result = subresult;
16911700
else
1692-
result = list_intersection(result, subresult);
1701+
result = mbms_int_members(result, subresult);
16931702

16941703
/*
16951704
* If the intersection is empty, we can stop looking. This
@@ -1788,8 +1797,8 @@ find_nonnullable_vars_walker(Node *node, bool top_level)
17881797
* side of conservatism; hence the analysis here is simplistic. In fact,
17891798
* we only detect simple "var IS NULL" tests at the top level.)
17901799
*
1791-
* The result is a palloc'd List, but we have not copied the member Var nodes.
1792-
* Also, we don't bother trying to eliminate duplicate entries.
1800+
* As with find_nonnullable_vars, we return the varattnos of the identified
1801+
* Vars in a multibitmapset.
17931802
*/
17941803
List *
17951804
find_forced_null_vars(Node *node)
@@ -1804,7 +1813,9 @@ find_forced_null_vars(Node *node)
18041813
var = find_forced_null_var(node);
18051814
if (var)
18061815
{
1807-
result = list_make1(var);
1816+
result = mbms_add_member(result,
1817+
var->varno,
1818+
var->varattno - FirstLowInvalidHeapAttributeNumber);
18081819
}
18091820
/* Otherwise, handle AND-conditions */
18101821
else if (IsA(node, List))
@@ -1815,8 +1826,8 @@ find_forced_null_vars(Node *node)
18151826
*/
18161827
foreach(l, (List *) node)
18171828
{
1818-
result = list_concat(result,
1819-
find_forced_null_vars(lfirst(l)));
1829+
result = mbms_add_members(result,
1830+
find_forced_null_vars((Node *) lfirst(l)));
18201831
}
18211832
}
18221833
else if (IsA(node, BoolExpr))

src/include/nodes/multibitmapset.h

+39
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
/*-------------------------------------------------------------------------
2+
*
3+
* multibitmapset.h
4+
* Lists of Bitmapsets
5+
*
6+
* A multibitmapset is useful in situations where members of a set can
7+
* be identified by two small integers; for example, varno and varattno
8+
* of a group of Vars within a query. The implementation is a List of
9+
* Bitmapsets, so that the empty set can be represented by NIL. (But,
10+
* as with Bitmapsets, that's not the only allowed representation.)
11+
* The zero-based index of a List element is the first identifying value,
12+
* and the (also zero-based) index of a bit within that Bitmapset is
13+
* the second identifying value. There is no expectation that the
14+
* Bitmapsets should all be the same size.
15+
*
16+
* The available operations on multibitmapsets are intended to parallel
17+
* those on bitmapsets, for example union and intersection. So far only
18+
* a small fraction of that has been built out; we'll add more as needed.
19+
*
20+
*
21+
* Copyright (c) 2022, PostgreSQL Global Development Group
22+
*
23+
* src/include/nodes/multibitmapset.h
24+
*
25+
*-------------------------------------------------------------------------
26+
*/
27+
#ifndef MULTIBITMAPSET_H
28+
#define MULTIBITMAPSET_H
29+
30+
#include "nodes/bitmapset.h"
31+
#include "nodes/pg_list.h"
32+
33+
extern List *mbms_add_member(List *a, int listidx, int bitidx);
34+
extern List *mbms_add_members(List *a, const List *b);
35+
extern List *mbms_int_members(List *a, const List *b);
36+
extern bool mbms_is_member(int listidx, int bitidx, const List *a);
37+
extern Bitmapset *mbms_overlap_sets(const List *a, const List *b);
38+
39+
#endif /* MULTIBITMAPSET_H */

0 commit comments

Comments
 (0)