[HACKERS] [PATCH] kNN for btree
От | Nikita Glukhov |
---|---|
Тема | [HACKERS] [PATCH] kNN for btree |
Дата | |
Msg-id | ce35e97b-cf34-3f5d-6b99-2c25bae49999@postgrespro.ru обсуждение исходный текст |
Ответы |
Re: [HACKERS] [PATCH] kNN for btree
|
Список | pgsql-hackers |
Hi hackers, I'd like to present a series of patches that implements k-Nearest Neighbors (kNN) search forbtree, which can be usedto speed up ORDER BY distance queries like this: SELECT * FROM eventsORDER BY date <-> '2000-01-01'::date ASC LIMIT 100; Now only GiST supports kNN, but kNN on btree can be emulated using contrib/btree_gist. Scanning algorithm ================== Algorithm is very simple: we use bidirectional B-tree index scan starting at the point from which we measure the distance(target point).At each step, we advance this scan in the directionthat has the nearest point. Butwhen the target point does not fall into the scannedrange, we don't even need to useabidirectional scan here --- wecan use ordinaryunidirectional scaninthe right direction. Performance results =================== Test database istaken from original kNN-GiST presentation (PGCon2010). Test query SELECT * FROM events ORDER BY date <-> '1957-10-04'::date ASC LIMIT k; can be optimizedto the next rather complicated UNION form, whichnolonger requireskNN: WITH t1 AS (SELECT * FROM events WHERE date >= '1957-10-04'::date ORDER BY date ASC LIMIT k), t2 AS (SELECT * FROM events WHERE date < '1957-10-04'::date ORDER BY date DESC LIMIT k), t AS (SELECT * FROM t1 UNION SELECT * FROM t2) SELECT * FROM t ORDER BY date <-> '1957-10-04'::date ASC LIMIT k; In each cell of this table shown query execution time in milliseconds and the number of accessed blocks: k | kNN-btree | kNN-GiST| Opt. query | Seq.scan | | (btree_gist) | withUNION | with sort -------|--------------|--------------|---------------|------------ 1 | 0.0414| 0.0794| 0.0608 |41.11824 10 | 0.0487 | 0.0919 | 0.09717 |41.81824 100 | 0.10747 | 0.192 52| 0.342104 |42.31824 1000 | 0.735573 | 0.913 650 | 2.9701160 |43.51824 10000 | 5.070 5622| 6.240 6760| 36.300 11031 | 54.1 1824 100000 | 49.600 51608| 61.900 64194 | 295.100 94980| 115.0 1824 As you can see, kNN-btree can be two times faster than kNN-GiST(btree_gist) when k < 1000,but the number of blocks readis roughly the same. Implementation details ====================== A brief description is given below for each of the patches: 1. Introduce amcanorderbyop() function This patch transformsexisting boolean AMpropertyamcanorderbyop into a method (function pointer).This is necessarybecause, unlike GiST,kNN for btreesupports only a one ordering operator onthe first index column and we need a different pathkey matching logicfor btree (there was acorresponding comment inmatch_pathkeys_to_index()). GiST-specific logic has been moved from match_pathkeys_to_index() to gistcanorderbyop(). 2. Extract substructure BTScanState from BTScanOpaque This refactoringis necessary for bidirectional kNN-scanimplementation. Now, BTScanOpaque'ssubstructure BTScanState containing only thefields related toscanpositionis passed to some functions where thewhole BTScanOpaque waspassedpreviously. 3. Extract get_index_column_opclass(), get_opclass_opfamily_and_input_type(). Extracted two simple common functions usedingistproperty() and btproperty() (see the next patch). 4. Add kNN supportto btree * Added additional optional BTScanState to BTScanOpaque for bidirectional kNN scan. * Implemented bidirectional kNN scan. * Implemented logic for selecting kNN strategy * Implemented btcanorderbyop(), updated btproperty() and btvalidate() B-tree user interface functions have not been altered because ordering operators are used directly. 5. Add distance operators for sometypes These operators for integer, float, date, time, timestamp, interval, cash and oidtypes havebeencopied fromcontrib/btree_gistand added to the existing btree opclasses as ordering operators. Their btree_gist duplicates areremoved in the next patch. 6. Remove duplicate distance operators from contrib/btree_gist. References to their own distance operators in btree_gist opclassesare replaced with references to the built-in operatorsand thanduplicate operators are dropped. But if the user is using somewhere these operators, upgrade of btree_gist from 1.3 to 1.4 should fail. 7. Add regression tests for btree kNN. Tests were added only after the built-in distance operators were added. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
- 0001-introduce-amcanorderby-function-v01.patch
- 0002-extract-structure-BTScanState-v01.patch
- 0003-extract-get_index_column_opclass-and-get_opclass_opfamily_and_input_type-v01.patch
- 0004-add-kNN-support-to-btree-v01.patch
- 0005-add-distance-operators-v01.patch
- 0006-remove-distance-operators-from-btree_gist-v01.patch
- 0007-add-regression-tests-for-kNN-btree-v01.patch
В списке pgsql-hackers по дате отправления: