Commit 312accd

Add explanation of VOPS projections to README
1 parent 0b486f5 commit 312accd

2 files changed

+137
-2
lines changed

README.md

Lines changed: 135 additions & 0 deletions
@@ -770,6 +770,141 @@ initialization of VOPS extension. After invocation of this function,
770770
extension will be loaded and all subsequent queries will be normally
771771
transformed and produce expected results.
772772

773+
## <span id="projections">VOPS projections and automatic table substitution</span>
774+
775+
VOPS provides functions that simplify the creation and usage of projections.
776+
In the future this may be added to the SQL grammar, so that it is possible to write
777+
`CREATE PROJECTION xxx OF TABLE yyy(column1, column2,...) GROUP BY (column1, column2, ...)`.
778+
For now, a projection is created using the `create_projection(projection_name text, source_table regclass, vector_columns text[], scalar_columns text[] default null, order_by text default null)` function.
779+
The first argument of this function specifies the name of the projection, the second refers to an existing Postgres table, `vector_columns` is an array of
780+
column names which should be stored as VOPS tiles, `scalar_columns` is an array of grouping columns whose types are preserved, and
781+
the optional `order_by` parameter specifies the name of the ordering attribute (explained below).
782+
The `create_projection(PNAME,...)` function does the following:
783+
784+
1. Creates a projection table with the specified name and attributes.
785+
2. Creates a `PNAME_refresh()` function which can be used to update the projection.
786+
3. Creates functional BRIN indexes on the `first()` and `last()` functions of the ordering attribute (if any).
787+
4. Creates a BRIN index on the grouping attributes (if any).
788+
5. Inserts information about the created projection into the `vops_projections` table. This table is used by the optimizer to
789+
automatically substitute the table with its projection.
790+
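As a minimal sketch of the steps listed above, the call below creates a projection with an ordering attribute. The `trades` table, its columns and the projection name `vops_trades` are hypothetical and serve only as an illustration. The named-argument notation is standard Postgres syntax and assumes the parameter names shown in the `create_projection` signature.

```sql
-- Hypothetical source table (not part of VOPS), used only for illustration.
create table trades(
    symbol     char(10),
    trade_date date,
    price      real,
    volume     integer);

-- trade_date, price and volume are stored as VOPS tiles; symbol stays scalar.
-- order_by names one of the vector columns.
select create_projection('vops_trades', 'trades',
                         array['trade_date','price','volume'],
                         array['symbol'],
                         order_by => 'trade_date');

-- The new projection should now be registered in vops_projections.
select * from vops_projections;
```

After this call, the generated refresh function would be named `vops_trades_refresh()`.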
791+
The `order_by` attribute is one of the VOPS projection vector columns by which data is sorted. Usually it is some kind of timestamp
792+
used in *time series* (for example, a trade date). The presence of such a column in a projection allows the projection to be updated incrementally.
793+
The generated `PNAME_refresh()` function calls the `populate` function with the corresponding values of the `predicate` and
794+
`sort` parameters, selecting from the original table only rows whose `order_by` column value is greater than the maximal
795+
value of this column in the projection. This assumes that `order_by` values are unique, or at least that the refresh is done at a moment when there is some gap
796+
in the collected events. In addition to `order_by`, the sort list for `populate` includes all scalar (grouping) columns.
797+
This makes it possible to efficiently group imported data by scalar columns and fill VOPS tiles (vector columns) with data.
798+
799+
When the `order_by` attribute is specified, VOPS creates two functional BRIN indexes on the `first()` and `last()`
800+
functions of this attribute. These indexes make it possible to select time slices efficiently. If the original query contains
801+
predicates like `(trade_date between '01-01-2017' and '01-01-2018')`, then the VOPS projection substitution mechanism adds
802+
`(first(trade_date) <= '01-01-2018' and last(trade_date) >= '01-01-2017')` conjuncts, which allow the Postgres optimizer to use BRIN
803+
indexes to locate affected pages.
804+
805+
In addition to the BRIN indexes for the `order_by` attribute, VOPS also creates a BRIN index on the grouping (scalar) columns.
806+
Such an index makes it possible to select groups efficiently and perform index joins.
807+
808+
Like materialized views, VOPS projections are not updated automatically. It is the programmer's responsibility to refresh them periodically.
809+
It is certainly possible to define a trigger or rule which automatically inserts data into the projection table when the original table is updated.
810+
But such an approach would be extremely inefficient and slow. To take advantage of vector processing, VOPS has to group data in tiles.
811+
This can be done only if there is a batch of data which can be grouped by scalar attributes. If you insert records into the projection table one by one,
812+
then most VOPS tiles will contain just one element.
813+
The most convenient way to refresh a projection is to use the generated `PNAME_refresh()` function.
814+
If the `order_by` attribute is specified, this function imports from the original table only the new data (not yet present in the projection).
815+
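The batch-oriented refresh described above might look like the following sketch. It assumes a projection named `vops_trades` created over a hypothetical `trades` table with `order_by` set to `trade_date`; the table, the projection name and the file path are placeholders, not part of VOPS.

```sql
-- Load a batch of new rows into the source table first
-- (the file path is a placeholder).
copy trades from '/path/to/new_trades.csv' csv;

-- Then import the whole batch into the projection in one call, so that rows
-- can be grouped by the scalar columns and packed into tiles.
-- With order_by specified, only rows newer than those already in the
-- projection are imported.
select vops_trades_refresh();
```

Refreshing after each bulk load, rather than after each inserted row, keeps tiles well filled.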
816+
The main advantage of the VOPS projection mechanism is that queries on original tables can be automatically redirected to projections.
817+
The `vops.auto_substitute_projections` configuration parameter switches on such substitution.
818+
By default it is switched off, because VOPS projections may not be synchronized with the original table, so a query on the projection may return a different result.
819+
Right now projections can be automatically substituted only if:
820+
821+
1. The query doesn't contain joins.
822+
2. The query performs aggregation of vector (tile) columns.
823+
3. All other expressions in the target list and in the `ORDER BY` / `GROUP BY` clauses refer only to scalar attributes of the projection.
824+
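The conditions above can be illustrated with a short sketch. The `trades` and `symbols` tables are hypothetical, and a projection of `trades` with `price` and `volume` as vector columns and `symbol` as a scalar column is assumed to exist. Using `explain` to see which table is scanned is a suggestion for checking the behavior, not a documented VOPS feature.

```sql
set vops.auto_substitute_projections to on;

-- Candidate for substitution: no joins, aggregates over vector columns,
-- and grouping/ordering only by a scalar column of the projection.
select symbol, sum(volume), avg(price)
from trades
group by symbol
order by symbol;

-- Not a candidate: the query contains a join, so it is expected to run
-- against the original trades table.
select t.symbol, sum(t.volume)
from trades t join symbols s on s.symbol = t.symbol
group by t.symbol;

-- The plan shows which table is actually scanned.
explain select symbol, sum(volume) from trades group by symbol;
```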
825+
A projection can be removed using the `drop_projection(projection_name text)` function.
826+
It not only drops the corresponding table, but also removes information about it from the `vops_projections` table
827+
and drops the generated refresh function.
828+
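A minimal usage sketch, assuming a projection with the hypothetical name `vops_trades` was created earlier with `create_projection`:

```sql
-- Drops the projection table, its catalog entry and vops_trades_refresh().
select drop_projection('vops_trades');

-- The catalog table that create_projection fills (vops_projections)
-- should no longer list vops_trades.
select * from vops_projections;
```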
829+
Example of using projections:
830+
```
831+
create extension vops;
832+
833+
create table lineitem(
834+
l_orderkey integer,
835+
l_partkey integer,
836+
l_suppkey integer,
837+
l_linenumber integer,
838+
l_quantity real,
839+
l_extendedprice real,
840+
l_discount real,
841+
l_tax real,
842+
l_returnflag "char",
843+
l_linestatus "char",
844+
l_shipdate date,
845+
l_commitdate date,
846+
l_receiptdate date,
847+
l_shipinstruct char(25),
848+
l_shipmode char(10),
849+
l_comment char(44),
850+
l_dummy char(1));
851+
852+
select create_projection('vops_lineitem','lineitem',array['l_shipdate','l_quantity','l_extendedprice','l_discount','l_tax'],array['l_returnflag','l_linestatus']);
853+
854+
\timing
855+
856+
copy lineitem from '/mnt/data/lineitem.tbl' delimiter '|' csv;
857+
858+
select vops_lineitem_refresh(); -- populate the projection from lineitem
859+
860+
select
861+
l_returnflag,
862+
l_linestatus,
863+
sum(l_quantity) as sum_qty,
864+
sum(l_extendedprice) as sum_base_price,
865+
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
866+
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
867+
avg(l_quantity) as avg_qty,
868+
avg(l_extendedprice) as avg_price,
869+
avg(l_discount) as avg_disc,
870+
count(*) as count_order
871+
from
872+
lineitem
873+
where
874+
l_shipdate <= '1998-12-01'
875+
group by
876+
l_returnflag,
877+
l_linestatus
878+
order by
879+
l_returnflag,
880+
l_linestatus;
881+
882+
set vops.auto_substitute_projections TO on; -- the same query below can now be answered from vops_lineitem
883+
884+
select
885+
l_returnflag,
886+
l_linestatus,
887+
sum(l_quantity) as sum_qty,
888+
sum(l_extendedprice) as sum_base_price,
889+
sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
890+
sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
891+
avg(l_quantity) as avg_qty,
892+
avg(l_extendedprice) as avg_price,
893+
avg(l_discount) as avg_disc,
894+
count(*) as count_order
895+
from
896+
lineitem
897+
where
898+
l_shipdate <= '1998-12-01'
899+
group by
900+
l_returnflag,
901+
l_linestatus
902+
order by
903+
l_returnflag,
904+
l_linestatus;
905+
```
906+
907+
773908
## <span id="example">Example</span>
774909

775910
The most popular benchmark for OLAP is [TPC-H](http://www.tpc.org/tpch).

vops.html

Lines changed: 2 additions & 2 deletions
@@ -805,7 +805,7 @@ <h2><a name="projections">Table projections</a></h2>
805805
indexes to locate affected pages.
806806
</p>
807807
<p>
808-
In in addition to BRIN indexes by <code>order_by</code> attribute, VOPS also creates BRIN index for grouping (scalar) columns.
808+
In in addition to BRIN indexes for <code>order_by</code> attribute, VOPS also creates BRIN index for grouping (scalar) columns.
809809
Such index allows to efficiently select groups and perform index join.
810810
</p>
811811
<p>
@@ -830,7 +830,7 @@ <h2><a name="projections">Table projections</a></h2>
830830
</ul>
831831
<p>
832832
Projection can be removed using <code>drop_projection(projection_name text)</code> function.
833-
It not only drops the correspondent table, but also remove information about it from <code>vops_partitions</code> table
833+
It not only drops the correspondent table, but also removes information about it from <code>vops_partitions</code> table
834834
and drops generated refresh function.
835835
</p>
836836
