Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 4747655

Browse files
committed
doc: make blooms docs match reality
Parallel execution changed the way bloom queries are executed, so update the EXPLAIN output, and restructure the docs to be clearer and more accurate. Reported-by: Daniel Westermann Discussion: https://postgr.es/m/ZR0P278MB0122119FAE78721A694C30C8D2340@ZR0P278MB0122.CHEP278.PROD.OUTLOOK.COM Author: Daniel Westermann and me Backpatch-through: 9.6
1 parent d88d8ad commit 4747655

File tree

1 file changed

+62
-55
lines changed

1 file changed

+62
-55
lines changed

doc/src/sgml/bloom.sgml

Lines changed: 62 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -110,75 +110,70 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
110110
FROM
111111
generate_series(1,10000000);
112112
SELECT 10000000
113-
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
114-
CREATE INDEX
115-
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
116-
pg_size_pretty
117-
----------------
118-
153 MB
119-
(1 row)
120-
=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
121-
CREATE INDEX
122-
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
123-
pg_size_pretty
124-
----------------
125-
387 MB
126-
(1 row)
127113
</programlisting>
128114

129115
<para>
130116
A sequential scan over this large table takes a long time:
131117
<programlisting>
132118
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
133-
QUERY PLAN
134-
-------------------------------------------------------------------&zwsp;-----------------------------------------
135-
Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
119+
QUERY PLAN
120+
-------------------------------------------------------------------&zwsp;-----------------------------------
121+
Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=15.480..15.480 rows=0 loops=1)
136122
Filter: ((i2 = 898732) AND (i5 = 123451))
137-
Rows Removed by Filter: 10000000
138-
Planning time: 0.177 ms
139-
Execution time: 1445.473 ms
123+
Rows Removed by Filter: 100000
124+
Planning Time: 0.340 ms
125+
Execution Time: 15.501 ms
140126
(5 rows)
141127
</programlisting>
142128
</para>
143129

144130
<para>
145-
So the planner will usually select an index scan if possible.
146-
With a btree index, we get results like this:
131+
Even with the btree index defined the result will still be a
132+
sequential scan:
147133
<programlisting>
134+
=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
135+
CREATE INDEX
136+
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
137+
pg_size_pretty
138+
----------------
139+
3976 kB
140+
(1 row)
148141
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
149-
QUERY PLAN
150-
-------------------------------------------------------------------&zwsp;-------------------------------------------------------------
151-
Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
152-
Index Cond: ((i2 = 898732) AND (i5 = 123451))
153-
Heap Fetches: 0
154-
Planning time: 0.193 ms
155-
Execution time: 445.770 ms
142+
QUERY PLAN
143+
-------------------------------------------------------------------&zwsp;-----------------------------------
144+
Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.604..12.604 rows=0 loops=1)
145+
Filter: ((i2 = 898732) AND (i5 = 123451))
146+
Rows Removed by Filter: 100000
147+
Planning Time: 0.155 ms
148+
Execution Time: 12.617 ms
156149
(5 rows)
157150
</programlisting>
158151
</para>
159152

160153
<para>
161-
Bloom is better than btree in handling this type of search:
154+
Having the bloom index defined on the table is better than btree in
155+
handling this type of search:
162156
<programlisting>
157+
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
158+
CREATE INDEX
159+
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
160+
pg_size_pretty
161+
----------------
162+
1584 kB
163+
(1 row)
163164
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
164-
QUERY PLAN
165-
-------------------------------------------------------------------&zwsp;--------------------------------------------------------
166-
Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
165+
QUERY PLAN
166+
-------------------------------------------------------------------&zwsp;--------------------------------------------------
167+
Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.384..0.384 rows=0 loops=1)
167168
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
168-
Rows Removed by Index Recheck: 2439
169-
Heap Blocks: exact=2408
170-
-&gt; Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
169+
Rows Removed by Index Recheck: 26
170+
Heap Blocks: exact=26
171+
-&gt; Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.350..0.350 rows=26 loops=1)
171172
Index Cond: ((i2 = 898732) AND (i5 = 123451))
172-
Planning time: 0.475 ms
173-
Execution time: 76.778 ms
173+
Planning Time: 0.122 ms
174+
Execution Time: 0.407 ms
174175
(8 rows)
175176
</programlisting>
176-
Note the relatively large number of false positives: 2439 rows were
177-
selected to be visited in the heap, but none actually matched the
178-
query. We could reduce that by specifying a larger signature length.
179-
In this example, creating the index with <literal>length=200</literal>
180-
reduced the number of false positives to 55; but it doubled the index size
181-
(to 306 MB) and ended up being slower for this query (125 ms overall).
182177
</para>
183178

184179
<para>
@@ -187,24 +182,36 @@ CREATE INDEX
187182
A better strategy for btree is to create a separate index on each column.
188183
Then the planner will choose something like this:
189184
<programlisting>
185+
=# CREATE INDEX btreeidx1 ON tbloom (i1);
186+
CREATE INDEX
187+
=# CREATE INDEX btreeidx2 ON tbloom (i2);
188+
CREATE INDEX
189+
=# CREATE INDEX btreeidx3 ON tbloom (i3);
190+
CREATE INDEX
191+
=# CREATE INDEX btreeidx4 ON tbloom (i4);
192+
CREATE INDEX
193+
=# CREATE INDEX btreeidx5 ON tbloom (i5);
194+
CREATE INDEX
195+
=# CREATE INDEX btreeidx6 ON tbloom (i6);
196+
CREATE INDEX
190197
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
191-
QUERY PLAN
192-
-------------------------------------------------------------------&zwsp;-----------------------------------------------------------
193-
Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1)
198+
QUERY PLAN
199+
-------------------------------------------------------------------&zwsp;--------------------------------------------------------
200+
Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.032..0.033 rows=0 loops=1)
194201
Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
195-
-&gt; BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1)
196-
-&gt; Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1)
202+
-&gt; BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.029..0.030 rows=0 loops=1)
203+
-&gt; Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.029..0.029 rows=0 loops=1)
197204
Index Cond: (i5 = 123451)
198-
-&gt; Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1)
205+
-&gt; Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed)
199206
Index Cond: (i2 = 898732)
200-
Planning time: 2.049 ms
201-
Execution time: 0.280 ms
207+
Planning Time: 0.537 ms
208+
Execution Time: 0.064 ms
202209
(9 rows)
203210
</programlisting>
204211
Although this query runs much faster than with either of the single
205-
indexes, we pay a large penalty in index size. Each of the single-column
206-
btree indexes occupies 214 MB, so the total space needed is over 1.2GB,
207-
more than 8 times the space used by the bloom index.
212+
indexes, we pay a penalty in index size. Each of the single-column
213+
btree indexes occupies 2 MB, so the total space needed is 12 MB,
214+
eight times the space used by the bloom index.
208215
</para>
209216
</sect2>
210217

0 commit comments

Comments
 (0)