Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to content

Commit 9b6737c

Browse files
committed
README.
1 parent 6f62e61 commit 9b6737c

File tree

1 file changed

+332
-0
lines changed

1 file changed

+332
-0
lines changed

README.md

Lines changed: 332 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,332 @@
1+
JsQuery – json query language with GIN indexing support
2+
=======================================================
3+
4+
Introduction
5+
------------
6+
7+
JsQuery – is a language to query jsonb data type, introduced in PostgreSQL
8+
release 9.4.
9+
10+
It's primary goal is to provide an additional functionality to jsonb
11+
(currently missing in PostgreSQL), such as a simple and effective way
12+
to search in nested objects and arrays, more comparison operators with
13+
indexes support. We hope, that jsquery will be eventually a part of
14+
PostgreSQL.
15+
16+
Jsquery is released as tsquery data type (similar to tsquery) and @@
17+
match operator for jsonb.
18+
19+
Authors
20+
-------
21+
22+
* Teodor Sigaev <teodor@sigaev.ru>, Postgres Professional, Moscow, Russia
23+
* Alexander Korotkov <aekorotkov@gmail.com>, Postgres Professional, Moscow, Russia
24+
* Oleg Bartunov <oleg@sai.msu.su>, Postgres Professional, Moscow, Russia
25+
26+
Availability
27+
------------
28+
29+
JsQuery is realized as an extension and not available in default PostgreSQL
30+
installation. It is available from
31+
[github](https://github.com/akorotkov/jsquery)
32+
under the same license as
33+
[PostgreSQL](http://www.postgresql.org/about/licence/)
34+
and supports PostgreSQL 9.4+.
35+
36+
Regards
37+
-------
38+
39+
Development is sponsored by [Wargaming.net](http://wargaming.net).
40+
41+
Installation
42+
------------
43+
44+
JsQuery is PostgreSQL extension which requires PostgreSQL 9.4 or higher.
45+
Before build and install you should ensure following:
46+
47+
* PostgreSQL version is 9.4 or higher.
48+
* You have development package of PostgreSQL installed or you built
49+
PostgreSQL from source.
50+
* Your PATH variable is configured so that pg\_config command available.
51+
52+
Typical installation procedure may look like this:
53+
54+
$ git clone https://github.com/akorotkov/jsquery.git
55+
$ cd jsquery
56+
$ make USE_PGXS=1
57+
$ sudo make USE_PGXS=1 install
58+
$ make USE_PGXS=1 installcheck
59+
$ psql DB -c "CREATE EXTENSION jsquery;"
60+
61+
JSON query language
62+
-------------------
63+
64+
JsQuery extension contains `jsquery` datatype which represents whole JSON query
65+
as a single value (like `tsquery` does for fulltext search). The query is an
66+
expression on JSON-document values.
67+
68+
Simple expression is specified as `path binary_operator value` or
69+
`path unary_operator`. See following examples.
70+
71+
* `x = "abc"` – value of key "x" is equal to "abc";
72+
* `$ @> [4, 5, "zzz"]` – the JSON document is an array containing values
73+
4, 5 and "zzz";
74+
* `"abc xyz" >= 10` – value of key "abc xyz" is greater than or equal to 10;
75+
* `volume IS NUMERIC` – type of key "volume" is numeric.
76+
* `$ = true` – the whole JSON document is just a true.
77+
* `similar_ids.@# > 5` – similar\_ids is an array or object of length greater
78+
than 5;
79+
* `similar_product_ids.# = "0684824396"` – array "similar\_product\_ids"
80+
contains string "0684824396".
81+
* `*.color = "red"` – there is object somewhere which key "color" has value
82+
"red".
83+
* `foo = *` – key "foo" exists in object.
84+
85+
Path selects set of JSON values to be checked using given operators. In
86+
the simplest case path is just an key name. In general path is key names and
87+
placeholders combined by dot signs. Path can use following placeholders:
88+
89+
* `#` – any index of array;
90+
* `%` – any key of object;
91+
* `*` – any sequence of array indexes and object keys;
92+
* `@#` – length of array or object, could be only used as last component of
93+
path;
94+
* `$` – the whole JSON document as single value, could be only the whole path.
95+
96+
Expression is true when operator is true against at least one value selected
97+
by path.
98+
99+
Key names could be given either with or without double quotes. Key names
100+
without double quotes shouldn't contain spaces, start with number or concur
101+
with jsquery keyword.
102+
103+
The supported binary operators are:
104+
105+
* Equality operator: `=`;
106+
* Numeric comparison operators: `>`, `>=`, `<`, `<=`;
107+
* Search in the list of scalar values using `IN` operator;
108+
* Array comparison operators: `&&` (overlap), `@>` (contains),
109+
`<@` (contained in).
110+
111+
The supported unary operators are:
112+
113+
* Check for existence operator: `= *`;
114+
* Check for type operators: `IS ARRAY`, `IS NUMERIC`, `IS OBJECT`, `IS STRING`
115+
and `IS BOOLEAN`.
116+
117+
Expressions could be complex. Complex expression is a set of expressions
118+
combined by logical operators (`AND`, `OR`, `NOT`) and grouped using braces.
119+
120+
Examples of complex expressions are given below.
121+
122+
* `a = 1 AND (b = 2 OR c = 3) AND NOT d = 1`
123+
* `x.% = true OR x.# = true`
124+
125+
Prefix expressions are expressions given in the form path (subexpression).
126+
In this case path selects JSON values to be checked using given subexpression.
127+
Check results are aggregated in the same way as in simple expressions.
128+
129+
* `#(a = 1 AND b = 2)` – exists element of array which a key is 1 and b key is 1
130+
* `%($ >= 10 AND $ <= 20)` – exists object key which values is between 10 and 20
131+
132+
Path also could contain following special placeholders with "every" semantics:
133+
134+
* `#:` – every indexes of array;
135+
* `%:` – every key of object;
136+
* `*:` – every sequence of array indexes and object keys.
137+
138+
Consider following example.
139+
140+
%.#:($ >= 0 AND $ <= 1)
141+
142+
This example could be read as following: there is at least one key which value
143+
is array of numerics between 0 and 1.
144+
145+
We can rewrite this example in the following form with extra braces.
146+
147+
%(#:($ >= 0 AND $ <= 1))
148+
149+
The first placeholder `%` checks that expression in braces is true for at least
150+
one value in object. The second placeholder `#:` checks value to be array and
151+
all its elements satisfy expressions in braces.
152+
153+
We can rewrite this example without `#:` placeholder as follows.
154+
155+
%(NOT #(NOT ($ >= 0 AND $ <= 1)) AND $ IS ARRAY)
156+
157+
In this example we transform assertion that every element of array satisfy some
158+
condition to assertion that there is no one element which doesn't satisfy the
159+
same condition.
160+
161+
Some examples of using paths are given below.
162+
163+
* `numbers.#: IS NUMERIC` – every element of "numbers" array is numeric.
164+
* `*:($ IS OBJECT OR $ IS BOOLEAN)` – JSON is a structure of nested objects
165+
with booleans as leaf values.
166+
* `#:.%:($ >= 0 AND $ <= 1)` – each element of array is object containing
167+
only numeric values between 0 and 1.
168+
* `documents.#:.% = *` – "documents" is array of objects containing at least
169+
one key.
170+
* `%.#: ($ IS STRING)` – JSON object contains at least one array of strings.
171+
* `#.% = true` – at least one array element is objects which contains at least
172+
one "true" value.
173+
174+
Usage of path operators and braces need some explanation. When same path
175+
operators are used multiple times they may refer different values while you can
176+
refer same value multiple time by using braces and `$` operator. See following
177+
examples.
178+
179+
* `# < 10 AND # > 20` – exists element less than 10 and exists another element
180+
greater than 20.
181+
* `#($ < 10 AND $ > 20)` – exists element which both less than 10 and greater
182+
than 20 (impossible).
183+
* `#($ >= 10 AND $ <= 20)` – exists element between 10 and 20.
184+
* `# >= 10 AND # <= 20` – exists element great or equal to 10 and exists
185+
another element less or equal to 20. Query can be satisfied by array with
186+
no elements between 10 and 20, for instance [0,30].
187+
188+
Same rules apply when you search inside objects and branchy structures.
189+
190+
See our
191+
[pgconf.eu presentation](http://www.sai.msu.su/~megera/postgres/talks/pgconfeu-2014-jsquery.pdf)
192+
for more examples.
193+
194+
GIN indexes
195+
-----------
196+
197+
JsQuery extension contains two operator classes (opclasses) for GIN which
198+
provide different kinds of query optimization.
199+
200+
* jsonb\_value\_path\_ops
201+
* jsonb\_path\_value\_ops
202+
203+
In each of two GIN opclasses jsonb documents are decomposed into entries. Each
204+
entry is associated with particular value and it's path. Difference between
205+
opclasses is in the entry representation, comparison and usage for search
206+
optimization.
207+
208+
For example, jsonb document
209+
`{"a": [{"b": "xyz", "c": true}, 10], "d": {"e": [7, false]}}`
210+
would be decomposed into following entries:
211+
212+
* "a".#."b"."xyz"
213+
* "a".#."c".true
214+
* "a".#.10
215+
* "d"."e".#.7
216+
* "d"."e".#.false
217+
218+
Since JsQuery doesn't support search in particular array index, we consider
219+
all array elements to be equivalent. Thus, each array element is marked with
220+
same `#` sign in the path.
221+
222+
Major problem in the entries representation is its size. In the given example
223+
key "a" is presented three times. In the large branchy documents with long
224+
keys size of naive entries representation becomes unreasonable. Both opclasses
225+
address this issue but in a slightly different way.
226+
227+
### jsonb\_value\_path\_ops
228+
229+
jsonb\_value\_path\_ops represents entry as pair of path hash and value.
230+
Following pseudocode illustrates it.
231+
232+
(hash(path_item_1.path_item_2. ... .path_item_n); value)
233+
234+
In comparison of entries path hash is the higher part of entry and value is
235+
its lower part. This determines the features of this opclass. Since path
236+
is hashed and it is higher part of entry we need to know the full path to
237+
the value in order to use it for search. However, once path is specified
238+
we can use both exact and range searches very efficiently.
239+
240+
### jsonb\_path\_value\_ops
241+
242+
jsonb\_path\_value\_ops represents entry as pair of value and bloom filter
243+
of path.
244+
245+
(value; bloom(path_item_1) | bloom(path_item_2) | ... | bloom(path_item_n))
246+
247+
In comparison of entries value is the higher part of entry and bloom filter of
248+
path is its lower part. This determines the features of this opclass. Since
249+
value is the higher part of entry we can perform only exact value search
250+
efficiently. Range value search is possible as well but we would have to
251+
filter all the the different paths where matching values occur. Bloom filter
252+
over path items allows index usage for conditions containing `%` and `*` in
253+
their paths.
254+
255+
### Query optimization
256+
257+
JsQuery opclasses perform complex query optimization. Thus it's valuable for
258+
developer or administrator to see the result of such optimization.
259+
Unfortunately, opclasses aren't allowed to do any custom output to the
260+
EXPLAIN. That's why JsQuery provides following functions which allows to see
261+
how particular opclass optimizes given query.
262+
263+
* gin\_debug\_query\_value\_path(jsquery) – for jsonb\_value\_path\_ops
264+
* gin\_debug\_query\_path\_value(jsquery) – for jsonb\_path\_value\_ops
265+
266+
Result of these functions is a textual representation of query tree which
267+
leafs are GIN search entries. Following examples show different results of
268+
query optimization by different opclasses.
269+
270+
# SELECT gin_debug_query_path_value('x = 1 AND (*.y = 1 OR y = 2)');
271+
gin_debug_query_path_value
272+
----------------------------
273+
x = 1 , entry 0 +
274+
275+
# SELECT gin_debug_query_value_path('x = 1 AND (*.y = 1 OR y = 2)');
276+
gin_debug_query_value_path
277+
----------------------------
278+
AND +
279+
x = 1 , entry 0 +
280+
OR +
281+
*.y = 1 , entry 1 +
282+
y = 2 , entry 2 +
283+
284+
Unfortunately, jsonb have no statistics yet. That's why JsQuery optimizer has
285+
to do imperative decision while selecting conditions to be evaluated using
286+
index. This decision is made by assumtion that some condition types are less
287+
selective than others. Optimizer divides conditions into following selectivity
288+
class (listed by descending of selectivity).
289+
290+
1. Equality (x = c)
291+
2. Range (c1 < x < c2)
292+
3. Inequality (x > c)
293+
4. Is (x is type)
294+
5. Any (x = \*)
295+
296+
Optimizer evades index evaluation of less selective conditions when possible.
297+
For example, in the `x = 1 AND y > 0` query `x = 1` is assumed to be more
298+
selective than `y > 0`. That's why index isn't used for evaluation of `y > 0`.
299+
300+
# SELECT gin_debug_query_path_value('x = 1 AND y > 0');
301+
gin_debug_query_path_value
302+
----------------------------
303+
x = 1 , entry 0 +
304+
305+
With lack of statistics decisions made by optimizer can be inaccurate. That's
306+
why JsQuery supports hints. Comments `/*-- index */` and `/*-- noindex */`
307+
placed in the conditions forces optimizer to use and not use index
308+
correspondingly.
309+
310+
SELECT gin_debug_query_path_value('x = 1 AND y /*-- index */ > 0');
311+
gin_debug_query_path_value
312+
----------------------------
313+
AND +
314+
x = 1 , entry 0 +
315+
y > 0 , entry 1 +
316+
317+
SELECT gin_debug_query_path_value('x /*-- noindex */ = 1 AND y > 0');
318+
gin_debug_query_path_value
319+
----------------------------
320+
y > 0 , entry 0 +
321+
322+
Contribution
323+
------------
324+
325+
Please, notice, that JsQuery is still under development and while it's
326+
stable and tested, it may contains some bugs. Don't hesitate to raise
327+
[issues at github](https://github.com/akorotkov/jsquery/issues) with your
328+
bug reports.
329+
330+
If you're lacking of some functionality in JsQuery and feeling power to
331+
implement it then you're welcome to make pull requests.
332+

0 commit comments

Comments
 (0)