@@ -110,75 +110,70 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
110
110
FROM
111
111
generate_series(1,10000000);
112
112
SELECT 10000000
113
- =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
114
- CREATE INDEX
115
- =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
116
- pg_size_pretty
117
- ----------------
118
- 153 MB
119
- (1 row)
120
- =# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
121
- CREATE INDEX
122
- =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
123
- pg_size_pretty
124
- ----------------
125
- 387 MB
126
- (1 row)
127
113
</programlisting>
128
114
129
115
<para>
130
116
A sequential scan over this large table takes a long time:
131
117
<programlisting>
132
118
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
133
- QUERY PLAN
134
- -------------------------------------------------------------------&zwsp;-----------------------------------------
135
- Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
119
+ QUERY PLAN
120
+ -------------------------------------------------------------------&zwsp;-----------------------------------
121
+ Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=15.480..15.480 rows=0 loops=1)
136
122
Filter: ((i2 = 898732) AND (i5 = 123451))
137
- Rows Removed by Filter: 10000000
138
- Planning time: 0.177 ms
139
- Execution time: 1445.473 ms
123
+ Rows Removed by Filter: 100000
124
+ Planning Time: 0.340 ms
125
+ Execution Time: 15.501 ms
140
126
(5 rows)
141
127
</programlisting>
142
128
</para>
143
129
144
130
<para>
145
- So the planner will usually select an index scan if possible.
146
- With a btree index, we get results like this:
131
+ Even with the btree index defined the result will still be a
132
+ sequential scan:
147
133
<programlisting>
134
+ =# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
135
+ CREATE INDEX
136
+ =# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
137
+ pg_size_pretty
138
+ ----------------
139
+ 3976 kB
140
+ (1 row)
148
141
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
149
- QUERY PLAN
150
- -------------------------------------------------------------------&zwsp;-------------------------------------------------------------
151
- Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
152
- Index Cond: ((i2 = 898732) AND (i5 = 123451))
153
- Heap Fetches: 0
154
- Planning time: 0.193 ms
155
- Execution time: 445.770 ms
142
+ QUERY PLAN
143
+ -------------------------------------------------------------------&zwsp;-----------------------------------
144
+ Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.604..12.604 rows=0 loops=1)
145
+ Filter: ((i2 = 898732) AND (i5 = 123451))
146
+ Rows Removed by Filter: 100000
147
+ Planning Time: 0.155 ms
148
+ Execution Time: 12.617 ms
156
149
(5 rows)
157
150
</programlisting>
158
151
</para>
159
152
160
153
<para>
161
- Bloom is better than btree in handling this type of search:
154
+ Having the bloom index defined on the table is better than btree in
155
+ handling this type of search:
162
156
<programlisting>
157
+ =# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
158
+ CREATE INDEX
159
+ =# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
160
+ pg_size_pretty
161
+ ----------------
162
+ 1584 kB
163
+ (1 row)
163
164
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
164
- QUERY PLAN
165
- -------------------------------------------------------------------&zwsp;--------------------------------------------------------
166
- Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
165
+ QUERY PLAN
166
+ -------------------------------------------------------------------&zwsp;--------------------------------------------------
167
+ Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.384..0.384 rows=0 loops=1)
167
168
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
168
- Rows Removed by Index Recheck: 2439
169
- Heap Blocks: exact=2408
170
- -> Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
169
+ Rows Removed by Index Recheck: 26
170
+ Heap Blocks: exact=26
171
+ -> Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.350..0.350 rows=26 loops=1)
171
172
Index Cond: ((i2 = 898732) AND (i5 = 123451))
172
- Planning time: 0.475 ms
173
- Execution time: 76.778 ms
173
+ Planning Time: 0.122 ms
174
+ Execution Time: 0.407 ms
174
175
(8 rows)
175
176
</programlisting>
176
- Note the relatively large number of false positives: 2439 rows were
177
- selected to be visited in the heap, but none actually matched the
178
- query. We could reduce that by specifying a larger signature length.
179
- In this example, creating the index with <literal>length=200</literal>
180
- reduced the number of false positives to 55; but it doubled the index size
181
- (to 306 MB) and ended up being slower for this query (125 ms overall).
182
177
</para>
183
178
184
179
<para>
@@ -187,24 +182,36 @@ CREATE INDEX
187
182
A better strategy for btree is to create a separate index on each column.
188
183
Then the planner will choose something like this:
189
184
<programlisting>
185
+ =# CREATE INDEX btreeidx1 ON tbloom (i1);
186
+ CREATE INDEX
187
+ =# CREATE INDEX btreeidx2 ON tbloom (i2);
188
+ CREATE INDEX
189
+ =# CREATE INDEX btreeidx3 ON tbloom (i3);
190
+ CREATE INDEX
191
+ =# CREATE INDEX btreeidx4 ON tbloom (i4);
192
+ CREATE INDEX
193
+ =# CREATE INDEX btreeidx5 ON tbloom (i5);
194
+ CREATE INDEX
195
+ =# CREATE INDEX btreeidx6 ON tbloom (i6);
196
+ CREATE INDEX
190
197
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
191
- QUERY PLAN
192
- -------------------------------------------------------------------&zwsp;-----------------------------------------------------------
193
- Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1)
198
+ QUERY PLAN
199
+ -------------------------------------------------------------------&zwsp;--------------------------------------------------------
200
+ Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.032..0.033 rows=0 loops=1)
194
201
Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
195
- -> BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1)
196
- -> Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1)
202
+ -> BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.029..0.030 rows=0 loops=1)
203
+ -> Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.029..0.029 rows=0 loops=1)
197
204
Index Cond: (i5 = 123451)
198
- -> Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1)
205
+ -> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed)
199
206
Index Cond: (i2 = 898732)
200
- Planning time: 2.049 ms
201
- Execution time: 0.280 ms
207
+ Planning Time: 0.537 ms
208
+ Execution Time: 0.064 ms
202
209
(9 rows)
203
210
</programlisting>
204
211
Although this query runs much faster than with either of the single
205
- indexes, we pay a large penalty in index size. Each of the single-column
206
- btree indexes occupies 214 MB, so the total space needed is over 1.2GB,
207
- more than 8 times the space used by the bloom index.
212
+ indexes, we pay a penalty in index size. Each of the single-column
213
+ btree indexes occupies 2 MB, so the total space needed is 12 MB,
214
+ eight times the space used by the bloom index.
208
215
</para>
209
216
</sect2>
210
217
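Note on the removed false-positive paragraph: the old text reported that rebuilding the bloom index with length=200 lowered the number of false positives at the cost of a larger index. A minimal sketch for repeating that comparison on the resized table is below. It assumes the bloom extension is installed and that tbloom exists as created in the example. The index name bloomidx_len200 is hypothetical and chosen only for this sketch. No sizes or timings are claimed here.

```sql
-- Assumes CREATE EXTENSION bloom; has been run and tbloom exists.
-- bloomidx_len200 is a hypothetical name used only in this sketch.
-- length is the bloom signature length in bits (the default is 80).
CREATE INDEX bloomidx_len200 ON tbloom USING bloom (i1, i2, i3, i4, i5, i6)
  WITH (length = 200);

-- Compare the on-disk size with the default-length bloom index.
SELECT pg_size_pretty(pg_relation_size('bloomidx_len200'));

-- "Rows Removed by Index Recheck" in the plan counts the false positives.
EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
```

If both bloom indexes exist at the same time, the planner may choose either one, so dropping the default-length index first gives a cleaner comparison.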