author	Bruce Momjian	2007-11-10 23:30:46 +0000
committer	Bruce Momjian	2007-11-10 23:30:46 +0000
commit	c3c69ab4fd25a20749b850d34cbc8ce3f1812e3b (patch)
tree	a7d9b4501297216ac71f597dca3840090ae791d5 /doc/src
parent	6e414a171e8a91966b10ecd14aa367422870bdd2 (diff)
Move most /contrib README files into SGML. Some still need conversion
or will never be converted.
Diffstat (limited to 'doc/src')
32 files changed, 9456 insertions, 2 deletions
diff --git a/doc/src/sgml/adminpack.sgml b/doc/src/sgml/adminpack.sgml new file mode 100644 index 00000000000..10816f5c24d --- /dev/null +++ b/doc/src/sgml/adminpack.sgml @@ -0,0 +1,32 @@ +<sect1> + <title>adminpack</title> + <para> + adminpack is a PostgreSQL standard module that implements a number of + support functions which pgAdmin and other administration and management tools + can use to provide additional functionality if installed on a server. + </para> + + <sect2> + <title>Functions implemented</title> + <para> + Functions implemented by adminpack can only be run by a superuser. Here's a + list of these functions: + </para> + <para> + <programlisting> + int8 pg_catalog.pg_file_write(fname text, data text, append bool) + bool pg_catalog.pg_file_rename(oldname text, newname text, archivename text) + bool pg_catalog.pg_file_rename(oldname text, newname text) + bool pg_catalog.pg_file_unlink(fname text) + setof record pg_catalog.pg_logdir_ls() + + /* Renaming of existing backend functions for pgAdmin compatibility */ + text pg_catalog.pg_file_read(fname text, offs bigint, len bigint) + bigint pg_catalog.pg_file_length(text) + int4 pg_catalog.pg_logfile_rotate() + </programlisting> + </para> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/btree-gist.sgml b/doc/src/sgml/btree-gist.sgml new file mode 100644 index 00000000000..4e1126e33c3 --- /dev/null +++ b/doc/src/sgml/btree-gist.sgml @@ -0,0 +1,40 @@ +<sect1> + <!-- + <indexterm zone="btree-gist"> + <primary>btree-gist</primary> + </indexterm> + --> + + <title>btree-gist</title> + + <para> + btree-gist is a B-Tree implementation using GiST that supports the int2, int4, + int8, float4, float8, timestamp with/without time zone, time + with/without time zone, date, interval, oid, money, macaddr, char, + varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
+ </para> + + <sect2> + <title>Example usage</title> + <programlisting> + CREATE TABLE test (a int4); + -- create index + CREATE INDEX testidx ON test USING gist (a); + -- query + SELECT * FROM test WHERE a < 10; + </programlisting> + </sect2> + + <sect2> + <title>Authors</title> + <para> + All work was done by Teodor Sigaev (<email>teodor@stack.net</email>), + Oleg Bartunov (<email>oleg@sai.msu.su</email>) and Janko Richter + (<email>jankorichter@yahoo.de</email>). See + <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for additional + information. + </para> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/buffercache.sgml b/doc/src/sgml/buffercache.sgml new file mode 100644 index 00000000000..1347c75ba32 --- /dev/null +++ b/doc/src/sgml/buffercache.sgml @@ -0,0 +1,121 @@ +<sect1 id="buffercache"> + <title>pg_buffercache</title> + + <indexterm zone="buffercache"> + <primary>pg_buffercache</primary> + </indexterm> + + <para> + The <literal>pg_buffercache</literal> module provides the means for examining + what's happening in the buffer cache at any given time without having to + restart or rebuild the server with debugging code added. The intent is to + do for the buffer cache what pg_locks does for locks. + </para> + <para> + This module consists of a C function <literal>pg_buffercache_pages()</literal> + that returns a set of records, plus a view <literal>pg_buffercache</literal> + to wrap the function. + </para> + <para> + By default public access is REVOKED from both of these, just in case there + are security issues lurking. + </para> + + <sect2> + <title>Notes</title> + <para> + The definition of the columns exposed in the view is: + </para> + <programlisting> + Column | references | Description + ----------------+----------------------+------------------------------------ + bufferid | | Id, 1..shared_buffers. + relfilenode | pg_class.relfilenode | Relfilenode of the relation. + reltablespace | pg_tablespace.oid | Tablespace oid of the relation.
+ reldatabase | pg_database.oid | Database for the relation. + relblocknumber | | Offset of the page in the relation. + isdirty | | Is the page dirty? + usagecount | | Page LRU count + </programlisting> + <para> + There is one row for each buffer in the shared cache. Unused buffers are + shown with all fields null except bufferid. + </para> + <para> + Because the cache is shared by all the databases, there are pages from + relations not belonging to the current database. + </para> + <para> + When the pg_buffercache view is accessed, internal buffer manager locks are + taken, and a copy of the buffer cache data is made for the view to display. + This ensures that the view produces a consistent set of results, while not + blocking normal buffer activity longer than necessary. Nonetheless there + could be some impact on database performance if this view is read often. + </para> + </sect2> + + <sect2> + <title>Sample output</title> + <programlisting> + regression=# \d pg_buffercache; + View "public.pg_buffercache" + Column | Type | Modifiers + ----------------+----------+----------- + bufferid | integer | + relfilenode | oid | + reltablespace | oid | + reldatabase | oid | + relblocknumber | bigint | + isdirty | boolean | + usagecount | smallint | + + View definition: + SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase, + p.relblocknumber, p.isdirty, p.usagecount + FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid, + reltablespace oid, reldatabase oid, relblocknumber bigint, + isdirty boolean, usagecount smallint); + + regression=# SELECT c.relname, count(*) AS buffers + FROM pg_class c INNER JOIN pg_buffercache b + ON b.relfilenode = c.relfilenode INNER JOIN pg_database d + ON (b.reldatabase = d.oid AND d.datname = current_database()) + GROUP BY c.relname + ORDER BY 2 DESC LIMIT 10; + relname | buffers + ---------------------------------+--------- + tenk2 | 345 + tenk1 | 141 + pg_proc | 46 + pg_class | 45 + pg_attribute | 43 + 
pg_class_relname_nsp_index | 30 + pg_proc_proname_args_nsp_index | 28 + pg_attribute_relid_attnam_index | 26 + pg_depend | 22 + pg_depend_reference_index | 20 + (10 rows) + + regression=# + </programlisting> + </sect2> + + <sect2> + <title>Authors</title> + <itemizedlist> + <listitem> + <para> + Mark Kirkwood <email>markir@paradise.net.nz</email> + </para> + </listitem> + <listitem> + <para>Design suggestions: Neil Conway <email>neilc@samurai.com</email></para> + </listitem> + <listitem> + <para>Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email></para> + </listitem> + </itemizedlist> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/chkpass.sgml b/doc/src/sgml/chkpass.sgml new file mode 100644 index 00000000000..e0179b3971c --- /dev/null +++ b/doc/src/sgml/chkpass.sgml @@ -0,0 +1,84 @@ +<sect1 id="chkpass"> + <title>chkpass</title> + + <!-- + <indexterm zone="chkpass"> + <primary>chkpass</primary> + </indexterm> + --> + <para> + chkpass is a password type that is automatically checked and converted upon + entry. It is stored encrypted. To compare, simply compare against a clear + text password and the comparison function will encrypt it before comparing. + It also returns an error if the code determines that the password is easily + crackable. This is currently a stub that does nothing. + </para> + + <para> + Note that the chkpass data type is not indexable. + <!-- + I haven't worried about making this type indexable. I doubt that anyone + would ever need to sort a file in order of encrypted password. + --> + </para> + + <para> + If you precede the string with a colon, the encryption and checking are + skipped so that you can enter existing passwords into the field. + </para> + + <para> + On output, a colon is prepended. This makes it possible to dump and reload + passwords without re-encrypting them. If you want the password (encrypted) + without the colon then use the raw() function. 
This allows you to use the + type with things like Apache's Auth_PostgreSQL module. + </para> + + <para> + The encryption uses the standard Unix function crypt(), and so it suffers + from all the usual limitations of that function; notably that only the + first eight characters of a password are considered. + </para> + + <para> + Here is some sample usage: + </para> + + <programlisting> +test=# create table test (p chkpass); +CREATE TABLE +test=# insert into test values ('hello'); +INSERT 0 1 +test=# select * from test; + p +---------------- + :dVGkpXdOrE3ko +(1 row) + +test=# select raw(p) from test; + raw +--------------- + dVGkpXdOrE3ko +(1 row) + +test=# select p = 'hello' from test; + ?column? +---------- + t +(1 row) + +test=# select p = 'goodbye' from test; + ?column? +---------- + f +(1 row) + </programlisting> + + <sect2> + <title>Author</title> + <para> + D'Arcy J.M. Cain <email>darcy@druid.net</email> + </para> + </sect2> +</sect1> + diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml new file mode 100644 index 00000000000..54c66d33a5c --- /dev/null +++ b/doc/src/sgml/contrib.sgml @@ -0,0 +1,56 @@ +<chapter id="contrib"> + <title>Standard Modules</title> + + <para> + This section contains information regarding the standard modules which + can be found in the <literal>contrib</literal> directory of the + PostgreSQL distribution. These are porting tools, analysis utilities, + and plug-in features that are not part of the core PostgreSQL system, + mainly because they address a limited audience or are too experimental + to be part of the main source tree. This does not preclude their + usefulness. + </para> + + <para> + Some modules supply new user-defined functions, operators, or types. In + these cases, you will need to run <literal>make</literal> and <literal>make + install</literal> in <literal>contrib/module</literal>. 
After you have + installed the files, you need to register the new entities in the database + system by running the commands in the supplied .sql file. For example, + + <programlisting> + $ psql -d dbname -f module.sql + </programlisting> + </para> + + &adminpack; + &btree-gist; + &chkpass; + &cube; + &dblink; + &earthdistance; + &fuzzystrmatch; + &hstore; + &intagg; + &intarray; + &isn; + &lo; + &ltree; + &oid2name; + &pageinspect; + &pgbench; + &buffercache; + &pgcrypto; + &freespacemap; + &pgrowlocks; + &standby; + &pgstattuple; + &trgm; + &seg; + &sslinfo; + &tablefunc; + &uuid-ossp; + &vacuumlo; + &xml2; +</chapter> + diff --git a/doc/src/sgml/cube.sgml b/doc/src/sgml/cube.sgml new file mode 100644 index 00000000000..da19ae204af --- /dev/null +++ b/doc/src/sgml/cube.sgml @@ -0,0 +1,529 @@ + +<sect1 id="cube"> + <title>cube</title> + + <indexterm zone="cube"> + <primary>cube</primary> + </indexterm> + + <para> + This module contains the user-defined type, CUBE, representing + multidimensional cubes.
+ </para> + + <sect2> + <title>Syntax</title> + + <para> + The following are valid external representations for the CUBE type: + </para> + + <table> + <title>Cube external representations</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>'x'</entry> + <entry>A floating point value representing a one-dimensional point or + one-dimensional zero length cubement + </entry> + </row> + <row> + <entry>'(x)'</entry> + <entry>Same as above</entry> + </row> + <row> + <entry>'x1,x2,x3,...,xn'</entry> + <entry>A point in n-dimensional space, represented internally as a zero + volume box + </entry> + </row> + <row> + <entry>'(x1,x2,x3,...,xn)'</entry> + <entry>Same as above</entry> + </row> + <row> + <entry>'(x),(y)'</entry> + <entry>1-D cubement starting at x and ending at y or vice versa; the + order does not matter + </entry> + </row> + <row> + <entry>'(x1,...,xn),(y1,...,yn)'</entry> + <entry>n-dimensional box represented by a pair of its opposite corners, no + matter which. Functions take care of swapping to achieve "lower left -- + upper right" representation before computing any values + </entry> + </row> + </tbody> + </tgroup> + </table> + </sect2> + + <sect2> + <title>Grammar</title> + <table> + <title>Cube Grammar Rules</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>rule 1</entry> + <entry>box -> O_BRACKET paren_list COMMA paren_list C_BRACKET</entry> + </row> + <row> + <entry>rule 2</entry> + <entry>box -> paren_list COMMA paren_list</entry> + </row> + <row> + <entry>rule 3</entry> + <entry>box -> paren_list</entry> + </row> + <row> + <entry>rule 4</entry> + <entry>box -> list</entry> + </row> + <row> + <entry>rule 5</entry> + <entry>paren_list -> O_PAREN list C_PAREN</entry> + </row> + <row> + <entry>rule 6</entry> + <entry>list -> FLOAT</entry> + </row> + <row> + <entry>rule 7</entry> + <entry>list -> list COMMA FLOAT</entry> + </row> + </tbody> + </tgroup> + </table> + </sect2> + + <sect2> + <title>Tokens</title> + <table> + <title>Cube Grammar 
Tokens</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>n</entry> + <entry>[0-9]+</entry> + </row> + <row> + <entry>integer</entry> + <entry>[+-]?{n}</entry> + </row> + <row> + <entry>real</entry> + <entry>[+-]?({n}\.{n}?|\.{n})</entry> + </row> + <row> + <entry>FLOAT</entry> + <entry>({integer}|{real})([eE]{integer})?</entry> + </row> + <row> + <entry>O_BRACKET</entry> + <entry>\[</entry> + </row> + <row> + <entry>C_BRACKET</entry> + <entry>\]</entry> + </row> + <row> + <entry>O_PAREN</entry> + <entry>\(</entry> + </row> + <row> + <entry>C_PAREN</entry> + <entry>\)</entry> + </row> + <row> + <entry>COMMA</entry> + <entry>\,</entry> + </row> + </tbody> + </tgroup> + </table> + </sect2> + + <sect2> + <title>Examples</title> + <table> + <title>Examples</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>'x'</entry> + <entry>A floating point value representing a one-dimensional point + (or, zero-length one-dimensional interval) + </entry> + </row> + <row> + <entry>'(x)'</entry> + <entry>Same as above</entry> + </row> + <row> + <entry>'x1,x2,x3,...,xn'</entry> + <entry>A point in n-dimensional space, represented internally as a zero + volume cube + </entry> + </row> + <row> + <entry>'(x1,x2,x3,...,xn)'</entry> + <entry>Same as above</entry> + </row> + <row> + <entry>'(x),(y)'</entry> + <entry>A 1-D interval starting at x and ending at y or vice versa; the + order does not matter + </entry> + </row> + <row> + <entry>'[(x),(y)]'</entry> + <entry>Same as above</entry> + </row> + <row> + <entry>'(x1,...,xn),(y1,...,yn)'</entry> + <entry>An n-dimensional box represented by a pair of its diagonally + opposite corners, regardless of order. Swapping is provided + by all comparison routines to ensure the + "lower left -- upper right" representation + before actual comparison takes place.
</entry> + </row> + <row> + <entry>'[(x1,...,xn),(y1,...,yn)]'</entry> + <entry>Same as above</entry> + </row> + </tbody> + </tgroup> + </table> + <para> + White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]' + </para> + </sect2> + <sect2> + <title>Defaults</title> + <para> + I believe this union: + </para> +<programlisting> +select cube_union('(0,5,2),(2,3,1)','0'); +cube_union +------------------- +(0, 0, 0),(2, 5, 2) +(1 row) +</programlisting> + + <para> + does not contradict common sense, nor does the intersection + </para> + +<programlisting> +select cube_inter('(0,-1),(1,1)','(-2),(2)'); +cube_inter +------------- +(0, 0),(1, 0) +(1 row) +</programlisting> + + <para> + In all binary operations on differently sized boxes, I assume the smaller + one to be a Cartesian projection, i.e., having zeroes in place of coordinates + omitted in the string representation. The above examples are equivalent to: + </para> + +<programlisting> +cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)'); +cube_inter('(0,-1),(1,1)','(-2,0),(2,0)'); +</programlisting> + + <para> + The following containment predicate uses the point syntax, + while in fact the second argument is internally represented by a box. + This syntax makes it unnecessary to define the special Point type + and functions for (box,point) predicates. + </para> + +<programlisting> +select cube_contains('(0,0),(1,1)', '0.5,0.5'); +cube_contains +-------------- +t +(1 row) +</programlisting> + </sect2> + <sect2> + <title>Precision</title> + <para> +Values are stored internally as 64-bit floating point numbers. This means that +numbers with more than about 16 significant digits will be truncated. + </para> + </sect2> + + <sect2> + <title>Usage</title> + <para> + The access method for CUBE is a GiST index (gist_cube_ops), which is a + generalization of R-tree.
GiSTs allow the postgres implementation of + R-tree, originally encoded to support 2-D geometric types such as + boxes and polygons, to be used with any data type whose data domain + can be partitioned using the concepts of containment, intersection and + equality. In other words, everything that can intersect or contain + its own kind can be indexed with a GiST. That includes, among other + things, all geometric data types, regardless of their dimensionality + (see also contrib/seg). + </para> + + <para> + The operators supported by the GiST access method include: + </para> + + <programlisting> +a = b Same as + </programlisting> + <para> + The cubements a and b are identical. + </para> + + <programlisting> +a && b Overlaps + </programlisting> + <para> + The cubements a and b overlap. + </para> + + <programlisting> +a @> b Contains + </programlisting> + <para> + The cubement a contains the cubement b. + </para> + + <programlisting> +a <@ b Contained in + </programlisting> + <para> + The cubement a is contained in b. + </para> + + <para> + (Before PostgreSQL 8.2, the containment operators @> and <@ were + respectively called @ and ~. These names are still available, but are + deprecated and will eventually be retired. Notice that the old names + are reversed from the convention formerly followed by the core geometric + datatypes!) + </para> + + <para> + Although the mnemonics of the following operators are questionable, I + preserved them to maintain visual consistency with other geometric + data types defined in Postgres. + </para> + + <para> + Other operators: + </para> + + <programlisting> +[a, b] < [c, d] Less than +[a, b] > [c, d] Greater than + </programlisting> + + <para> + These operators do not make a lot of sense for any practical + purpose other than sorting. These operators first compare (a) to (c), + and if these are equal, compare (b) to (d).
That accounts for + reasonably good sorting in most cases, which is useful if + you want to use ORDER BY with this type. + </para> + + <para> + The following functions are available: + </para> + + <table> + <title>Functions available</title> + <tgroup cols="2"> + <tbody> + <row> + <entry><literal>cube_distance(cube, cube) returns double</literal></entry> + <entry>cube_distance returns the distance between two cubes. If both + cubes are points, this is the normal distance function. + </entry> + </row> + <row> + <entry><literal>cube(float8) returns cube</literal></entry> + <entry>This makes a one dimensional cube with both coordinates the same. + If the type of the argument is a numeric type other than float8 an + explicit cast to float8 may be needed. + <literal>cube(1) == '(1)'</literal> + </entry> + </row> + + <row> + <entry><literal>cube(float8, float8) returns cube</literal></entry> + <entry> + This makes a one dimensional cube. + <literal>cube(1,2) == '(1),(2)'</literal> + </entry> + </row> + + <row> + <entry><literal>cube(float8[]) returns cube</literal></entry> + <entry>This makes a zero-volume cube using the coordinates + defined by the array. <literal>cube(ARRAY[1,2]) == '(1,2)'</literal> + </entry> + </row> + + <row> + <entry><literal>cube(float8[], float8[]) returns cube</literal></entry> + <entry>This makes a cube, with upper right and lower left + coordinates as defined by the 2 float arrays. Arrays must be of the + same length. + <literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)' + </literal> + </entry> + </row> + + <row> + <entry><literal>cube(cube, float8) returns cube</literal></entry> + <entry>This builds a new cube by adding a dimension on to an + existing cube with the same values for both parts of the new coordinate. + This is useful for building cubes piece by piece from calculated values.
+ <literal>cube('(1)',2) == '(1,2),(1,2)'</literal> + </entry> + </row> + + <row> + <entry><literal>cube(cube, float8, float8) returns cube</literal></entry> + <entry>This builds a new cube by adding a dimension on to an + existing cube. This is useful for building cubes piece by piece from + calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal> + </entry> + </row> + + <row> + <entry><literal>cube_dim(cube) returns int</literal></entry> + <entry>cube_dim returns the number of dimensions stored in the + data structure + for a cube. This is useful for constraints on the dimensions of a cube. + </entry> + </row> + + <row> + <entry><literal>cube_ll_coord(cube, int) returns double </literal></entry> + <entry> + cube_ll_coord returns the nth coordinate value for the lower left + corner of a cube. This is useful for doing coordinate transformations. + </entry> + </row> + + <row> + <entry><literal>cube_ur_coord(cube, int) returns double + </literal></entry> + <entry>cube_ur_coord returns the nth coordinate value for the + upper right corner of a cube. This is useful for doing coordinate + transformations. + </entry> + </row> + + <row> + <entry><literal>cube_subset(cube, int[]) returns cube + </literal></entry> + <entry>Builds a new cube from an existing cube, using a list of + dimension indexes + from an array. Can be used to find both the ll and ur coordinate of a single + dimension, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)' + Or can be used to drop dimensions, or reorder them as desired, e.g.: + cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) = + '(5, 3, 1, 1),(8, 7, 6, 6)' + </entry> + </row> + + <row> + <entry><literal>cube_is_point(cube) returns bool</literal></entry> + <entry>cube_is_point returns true if a cube is also a point.
This is true when the two defining corners are the same.</entry> + </row> + + <row> + <entry><literal>cube_enlarge(cube, double, int) returns cube</literal></entry> + <entry> + cube_enlarge increases the size of a cube by a specified + radius in at least + n dimensions. If the radius is negative the box is shrunk instead. This + is useful for creating bounding boxes around a point for searching for + nearby points. All defined dimensions are changed by the radius. If n + is greater than the number of defined dimensions and the cube is being + increased (r >= 0) then 0 is used as the base for the extra coordinates. + LL coordinates are decreased by r and UR coordinates are increased by r. + If a LL coordinate is increased to larger than the corresponding UR + coordinate (this can only happen when r < 0) then both coordinates are + set to their average. To make it harder for people to break things there + is an effective maximum on the dimension of cubes of 100. This is set + in cubedata.h if you need something bigger. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + There are a few other potentially useful functions defined in cube.c + that vanished from the schema because I stopped using them. Some of + these were meant to support type casting. Let me know if I was wrong: + I will then add them back to the schema. I would also appreciate + other ideas that would enhance the type and make it more useful. + </para> + + <para> + For examples of usage, see sql/cube.sql + </para> + </sect2> + + <sect2> + <title>Credits</title> + <para> + This code is essentially based on the example written for + Illustra, <ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink> + </para> + <para> + My thanks are primarily to Prof.
Joe Hellerstein + (<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the + gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>), and + to his former student, Andy Dong + (<ulink url="http://best.me.berkeley.edu/~adong/"></ulink>), for his exemplar. + I am also grateful to all postgres developers, present and past, for enabling + me to create my own world and live undisturbed in it. And I would like to + acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy + for the years of faithful support of my database research. + </para> + + <para> + Gene Selkov, Jr. + Computational Scientist + Mathematics and Computer Science Division + Argonne National Laboratory + 9700 S Cass Ave. + Building 221 + Argonne, IL 60439-4844 + <email>selkovjr@mcs.anl.gov</email> + </para> + + <para> + Minor updates to this package were made by Bruno Wolff III + <email>bruno@wolff.to</email> in August/September of 2002. These include + changing the precision from single precision to double precision and adding + some new functions. + </para> + + <para> + Additional updates were made by Joshua Reich <email>josh@root.net</email> in + July 2006. These include <literal>cube(float8[], float8[])</literal> and + cleaning up the code to use the V1 call protocol instead of the deprecated V0 + form. + </para> + </sect2> +</sect1> + diff --git a/doc/src/sgml/dblink.sgml b/doc/src/sgml/dblink.sgml new file mode 100644 index 00000000000..095d600099b --- /dev/null +++ b/doc/src/sgml/dblink.sgml @@ -0,0 +1,1312 @@ +<!-- <reference> --> +<sect1 id="dblink"> + <title>dblink</title> + <para> + dblink is a contrib module which allows connections to + other databases.
</para> + + <!-- dblink_connect --> + + <refentry> + <refnamediv> + <refname>dblink_connect</refname> + <refpurpose>opens a persistent connection to a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_connect(text connstr) + dblink_connect(text connname, text connstr) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + if two arguments are given, the first is used as a name for a persistent + connection + </para> + </refsect2> + + <refsect2> + <title>connstr</title> + <para> + standard libpq format connection string, + e.g. "hostaddr=127.0.0.1 port=5432 dbname=mydb user=postgres password=mypasswd" + </para> + <para> + if only one argument is given, the connection is unnamed; only one unnamed + connection can exist at a time + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns status = "OK"</para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + select dblink_connect('myconn','dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_disconnect --> + <refentry> + <refnamediv> + <refname>dblink_disconnect</refname> + <refpurpose>closes a persistent connection to a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_disconnect() + dblink_disconnect(text connname) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + if an argument is given, it is used as a name for a persistent + connection to close; otherwise the unnamed connection is closed + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns status = "OK"</para> + </refsect1> + + <refsect1> + <title>Example</title>
<programlisting> + test=# select dblink_disconnect(); + dblink_disconnect + ------------------- + OK + (1 row) + + select dblink_disconnect('myconn'); + dblink_disconnect + ------------------- + OK + (1 row) + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_open --> + <refentry> + <refnamediv> + <refname>dblink_open</refname> + <refpurpose>opens a cursor on a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_open(text cursorname, text sql [, bool fail_on_error]) + dblink_open(text connname, text cursorname, text sql [, bool fail_on_error]) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + if three arguments are present, the first is taken as the specific + connection name to use; otherwise the unnamed connection is assumed + </para> + </refsect2> + + <refsect2> + <title>cursorname</title> + <para> + a reference name for the cursor + </para> + </refsect2> + + <refsect2> + <title>sql</title> + <para> + sql statement that you wish to execute on the remote host + e.g. "select * from pg_class" + </para> + </refsect2> + + <refsect2> + <title>fail_on_error</title> + <para> + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and the return value is set + to 'ERROR'. + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns status = "OK"</para> + </refsect1> + + <refsect1> + <title>Note</title> + <itemizedlist> + <listitem> + <para> + dblink_connect(text connstr) must be executed first + </para> + </listitem> + <listitem> + <para> + dblink_open starts an explicit transaction. If, after using dblink_open, + you use dblink_exec to change data, and then an error occurs or you use + dblink_disconnect without a dblink_close first, your change *will* be + lost. 
Also, using dblink_close explicitly ends the transaction and thus + effectively closes *all* open cursors. + </para> + </listitem> + </itemizedlist> + + </refsect1> + <refsect1> + <title>Example</title> + <programlisting> + test=# select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + test=# select dblink_open('foo','select proname, prosrc from pg_proc'); + dblink_open + ------------- + OK + (1 row) + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_fetch --> + <refentry> + <refnamediv> + <refname>dblink_fetch</refname> + <refpurpose>returns a set from an open cursor on a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_fetch(text cursorname, int32 howmany [, bool fail_on_error]) + dblink_fetch(text connname, text cursorname, int32 howmany [, bool fail_on_error]) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + if three arguments are present, the first is taken as the specific + connection name to use; otherwise the unnamed connection is assumed + </para> + </refsect2> + + <refsect2> + <title>cursorname</title> + <para> + The reference name for the cursor + </para> + </refsect2> + + <refsect2> + <title>howmany</title> + <para> + Maximum number of rows to retrieve. The next howmany rows are fetched, + starting at the current cursor position, moving forward. Once the cursor + has positioned to the end, no more rows are produced. + </para> + </refsect2> + + <refsect2> + <title>fail_on_error</title> + <para> + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and no rows are returned. 
+ </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns setof record</para> + </refsect1> + + <refsect1> + <title>Note</title> + <para> + On a mismatch between the number of return fields as specified in the FROM + clause, and the actual number of fields returned by the remote cursor, an + ERROR will be thrown. In this event, the remote cursor is still advanced + by as many rows as it would have been if the ERROR had not occurred. + </para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + test=# select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + test=# select dblink_open('foo','select proname, prosrc from pg_proc where proname like ''bytea%'''); + dblink_open + ------------- + OK + (1 row) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + ----------+---------- + byteacat | byteacat + byteacmp | byteacmp + byteaeq | byteaeq + byteage | byteage + byteagt | byteagt + (5 rows) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + -----------+----------- + byteain | byteain + byteale | byteale + bytealike | bytealike + bytealt | bytealt + byteane | byteane + (5 rows) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + ------------+------------ + byteanlike | byteanlike + byteaout | byteaout + (2 rows) + + test=# select * from dblink_fetch('foo',5) as (funcname name, source text); + funcname | source + ----------+-------- + (0 rows) + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_close --> + <refentry> + <refnamediv> + <refname>dblink_close</refname> + <refpurpose>closes a cursor on a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_close(text cursorname [, bool fail_on_error]) + dblink_close(text connname, text cursorname [, bool fail_on_error]) + </synopsis> + 
</refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + if two arguments are present, the first is taken as the specific + connection name to use; otherwise the unnamed connection is assumed + </para> + </refsect2> + + <refsect2> + <title>cursorname</title> + <para> + a reference name for the cursor + </para> + </refsect2> + + <refsect2> + <title>fail_on_error</title> + <para> + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and the return value is set + to 'ERROR'. + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns status = "OK"</para> + </refsect1> + + <refsect1> + <title>Note</title> + <para> + dblink_connect(text connstr) or dblink_connect(text connname, text connstr) + must be executed first. + </para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + test=# select dblink_connect('dbname=postgres'); + dblink_connect + ---------------- + OK + (1 row) + + test=# select dblink_open('foo','select proname, prosrc from pg_proc'); + dblink_open + ------------- + OK + (1 row) + + test=# select dblink_close('foo'); + dblink_close + -------------- + OK + (1 row) + + select dblink_connect('myconn','dbname=regression'); + dblink_connect + ---------------- + OK + (1 row) + + select dblink_open('myconn','foo','select proname, prosrc from pg_proc'); + dblink_open + ------------- + OK + (1 row) + + select dblink_close('myconn','foo'); + dblink_close + -------------- + OK + (1 row) + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_exec --> + <refentry> + <refnamediv> + <refname>dblink_exec</refname> + <refpurpose>executes an UPDATE/INSERT/DELETE on a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_exec(text connstr, text sql [, bool fail_on_error]) + 
dblink_exec(text connname, text sql [, bool fail_on_error]) + dblink_exec(text sql [, bool fail_on_error]) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname/connstr</title> + <para> + If two arguments are present, the first is first assumed to be a specific + connection name to use. If the name is not found, the argument is then + assumed to be a valid connection string, of standard libpq format, + e.g.: "hostaddr=127.0.0.1 dbname=mydb user=postgres password=mypasswd" + + If only one argument is used, then the unnamed connection is used. + </para> + </refsect2> + + <refsect2> + <title>sql</title> + <para> + sql statement that you wish to execute on the remote host, e.g.: + insert into foo values(0,'a','{"a0","b0","c0"}'); + </para> + </refsect2> + <refsect2> + <title>fail_on_error</title> + <para> + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and the return value is set + to 'ERROR'. + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns status of the command, or 'ERROR' if the command failed.</para> + </refsect1> + + <refsect1> + <title>Notes</title> + <para> + dblink_open starts an explicit transaction. If, after using dblink_open, + you use dblink_exec to change data, and then an error occurs or you use + dblink_disconnect without a dblink_close first, your change *will* be + lost. 
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ select dblink_connect('dbname=dblink_test_slave');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select dblink_exec('insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
+ dblink_exec
+ -----------------
+ INSERT 943366 1
+ (1 row)
+
+ select dblink_connect('myconn','dbname=regression');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select dblink_exec('myconn','insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
+ dblink_exec
+ ------------------
+ INSERT 6432584 1
+ (1 row)
+
+ select dblink_exec('myconn','insert into pg_class values (''foo'')',false);
+ NOTICE: sql error
+ DETAIL: ERROR: null value in column "relnamespace" violates not-null constraint
+
+ dblink_exec
+ -------------
+ ERROR
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_current_query -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_current_query</refname>
+ <refpurpose>returns the current query string</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_current_query () RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>None</title>
+ <para>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns text -- a copy of the currently executing query</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_current_query() from (select dblink('dbname=postgres','select oid, proname from pg_proc where proname = ''byteacat''') as f1) as t1;
+ dblink_current_query
+ -----------------------------------------------------------------------------------------------------------------------------------------------------
+ select dblink_current_query() from (select dblink('dbname=postgres','select oid, proname from pg_proc where proname = ''byteacat''') as f1) as t1;
+ (1 row)
+ </programlisting>
+ </refsect1>
+
</refentry> + + <!-- dblink_get_pkey --> + <refentry> + <refnamediv> + <refname>dblink_get_pkey</refname> + <refpurpose>returns the position and field names of a relation's + primary key fields + </refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_get_pkey(text relname) RETURNS setof dblink_pkey_results + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>relname</title> + <para> + any relation name; + e.g. 'foobar' + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para> + Returns setof dblink_pkey_results -- one row for each primary key field, + in order of position in the key. dblink_pkey_results is defined as follows: + CREATE TYPE dblink_pkey_results AS (position int4, colname text); + </para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + test=# select * from dblink_get_pkey('foobar'); + position | colname + ----------+--------- + 1 | f1 + 2 | f2 + 3 | f3 + 4 | f4 + 5 | f5 + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_build_sql_insert --> + <refentry> + <refnamediv> + <refname>dblink_build_sql_insert</refname> + <refpurpose> + builds an insert statement using a local tuple, replacing the + selection key field values with alternate supplied values + </refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_build_sql_insert(text relname + ,int2vector primary_key_attnums + ,int2 num_primary_key_atts + ,_text src_pk_att_vals_array + ,_text tgt_pk_att_vals_array) RETURNS text + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>relname</title> + <para> + any relation name; + e.g. 'foobar'; + </para> + </refsect2> + <refsect2> + <title>primary_key_attnums</title> + <para> + vector of primary key attnums (1 based, see pg_index.indkey); + e.g. 
'1 2' + </para> + </refsect2> + <refsect2> + <title>num_primary_key_atts</title> + <para> + number of primary key attnums in the vector; e.g. 2 + </para> + </refsect2> + <refsect2> + <title>src_pk_att_vals_array</title> + <para> + array of primary key values, used to look up the local matching + tuple, the values of which are then used to construct the SQL + statement + </para> + </refsect2> + <refsect2> + <title>tgt_pk_att_vals_array</title> + <para> + array of primary key values, used to replace the local tuple + values in the SQL statement + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns text -- requested SQL statement</para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + test=# select dblink_build_sql_insert('foo','1 2',2,'{"1", "a"}','{"1", "b''a"}'); + dblink_build_sql_insert + -------------------------------------------------- + INSERT INTO foo(f1,f2,f3) VALUES('1','b''a','1') + (1 row) + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_build_sql_delete --> + <refentry> + <refnamediv> + <refname>dblink_build_sql_delete</refname> + <refpurpose>builds a delete statement using supplied values for selection + key field values + </refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_build_sql_delete(text relname + ,int2vector primary_key_attnums + ,int2 num_primary_key_atts + ,_text tgt_pk_att_vals_array) RETURNS text + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>relname</title> + <para> + any relation name; + e.g. 'foobar'; + </para> + </refsect2> + <refsect2> + <title>primary_key_attnums</title> + <para> + vector of primary key attnums (1 based, see pg_index.indkey); + e.g. '1 2' + </para> + </refsect2> + <refsect2> + <title>num_primary_key_atts</title> + <para> + number of primary key attnums in the vector; e.g. 
2
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>tgt_pk_att_vals_array</title>
+ <para>
+ array of primary key values, used in the WHERE clause of the
+ generated DELETE statement
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns text -- requested SQL statement</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_build_sql_delete('MyFoo','1 2',2,'{"1", "b"}');
+ dblink_build_sql_delete
+ ---------------------------------------------
+ DELETE FROM "MyFoo" WHERE f1='1' AND f2='b'
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_build_sql_update -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_build_sql_update</refname>
+ <refpurpose>builds an update statement using a local tuple, replacing
+ the selection key field values with alternate supplied values
+ </refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_build_sql_update(text relname
+ ,int2vector primary_key_attnums
+ ,int2 num_primary_key_atts
+ ,_text src_pk_att_vals_array
+ ,_text tgt_pk_att_vals_array) RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>relname</title>
+ <para>
+ any relation name;
+ e.g. 'foobar';
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>primary_key_attnums</title>
+ <para>
+ vector of primary key attnums (1 based, see pg_index.indkey);
+ e.g. '1 2'
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>num_primary_key_atts</title>
+ <para>
+ number of primary key attnums in the vector; e.g.
2 + </para> + </refsect2> + <refsect2> + <title>src_pk_att_vals_array</title> + <para> + array of primary key values, used to look up the local matching + tuple, the values of which are then used to construct the SQL + statement + </para> + </refsect2> + <refsect2> + <title>tgt_pk_att_vals_array</title> + <para> + array of primary key values, used to replace the local tuple + values in the SQL statement + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns text -- requested SQL statement</para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + test=# select dblink_build_sql_update('foo','1 2',2,'{"1", "a"}','{"1", "b"}'); + dblink_build_sql_update + ------------------------------------------------------------- + UPDATE foo SET f1='1',f2='b',f3='1' WHERE f1='1' AND f2='b' + (1 row) + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_get_connections --> + <refentry> + <refnamediv> + <refname>dblink_get_connections</refname> + <refpurpose>returns a text array of all active named dblink connections</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_get_connections() RETURNS text[] + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>none</title> + <para></para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para>Returns text array of all active named dblink connections</para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + SELECT dblink_get_connections(); + </programlisting> + </refsect1> + </refentry> + + <!-- dblinkd_is_busy --> + <refentry> + <refnamediv> + <refname>dblink_is_busy</refname> + <refpurpose>checks to see if named connection is busy with an async query</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_is_busy(text connname) RETURNS int + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + 
<title>connname</title> + <para> + The specific connection name to use + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para> + Returns 1 if connection is busy, 0 if it is not busy. + If this function returns 0, it is guaranteed that dblink_get_result + will not block. + </para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + SELECT dblink_is_busy('dtest1'); + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_cancel_query --> + <refentry> + <refnamediv> + <refname>dblink_cancel_query</refname> + <refpurpose>cancels any active query on the named connection</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_cancel_query(text connname) RETURNS text + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + The specific connection name to use. + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para> + Returns "OK" on success, or an error message on failure. + </para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + SELECT dblink_cancel_query('dtest1'); + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_error_message --> + <refentry> + <refnamediv> + <refname>dblink_error_message</refname> + <refpurpose>gets last error message on the named connection</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_error_message(text connname) RETURNS text + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + The specific connection name to use. + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para> + Returns last error message. 
+ </para> + </refsect1> + + <refsect1> + <title>Example</title> + <programlisting> + SELECT dblink_error_message('dtest1'); + </programlisting> + </refsect1> + </refentry> + + <!-- dblink --> + <refentry> + <refnamediv> + <refname>dblink</refname> + <refpurpose>returns a set from a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink(text connstr, text sql [, bool fail_on_error]) + dblink(text connname, text sql [, bool fail_on_error]) + dblink(text sql [, bool fail_on_error]) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname/connstr</title> + <para> + If two arguments are present, the first is first assumed to be a specific + connection name to use. If the name is not found, the argument is then + assumed to be a valid connection string, of standard libpq format, + e.g.: "hostaddr=127.0.0.1 dbname=mydb user=postgres password=mypasswd" + + If only one argument is used, then the unnamed connection is used. + </para> + </refsect2> + + <refsect2> + <title>sql</title> + <para> + sql statement that you wish to execute on the remote host + e.g. "select * from pg_class" + </para> + </refsect2> + <refsect2> + <title>fail_on_error</title> + <para> + If true (default when not present) then an ERROR thrown on the remote side + of the connection causes an ERROR to also be thrown locally. If false, the + remote ERROR is locally treated as a NOTICE, and no rows are returned. 
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns setof record</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ select * from dblink('dbname=postgres','select proname, prosrc from pg_proc')
+ as t1(proname name, prosrc text) where proname like 'bytea%';
+ proname | prosrc
+ ------------+------------
+ byteacat | byteacat
+ byteaeq | byteaeq
+ bytealt | bytealt
+ byteale | byteale
+ byteagt | byteagt
+ byteage | byteage
+ byteane | byteane
+ byteacmp | byteacmp
+ bytealike | bytealike
+ byteanlike | byteanlike
+ byteain | byteain
+ byteaout | byteaout
+ (12 rows)
+
+ select dblink_connect('dbname=postgres');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select * from dblink('select proname, prosrc from pg_proc')
+ as t1(proname name, prosrc text) where proname like 'bytea%';
+ proname | prosrc
+ ------------+------------
+ byteacat | byteacat
+ byteaeq | byteaeq
+ bytealt | bytealt
+ byteale | byteale
+ byteagt | byteagt
+ byteage | byteage
+ byteane | byteane
+ byteacmp | byteacmp
+ bytealike | bytealike
+ byteanlike | byteanlike
+ byteain | byteain
+ byteaout | byteaout
+ (12 rows)
+
+ select dblink_connect('myconn','dbname=regression');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select * from dblink('myconn','select proname, prosrc from pg_proc')
+ as t1(proname name, prosrc text) where proname like 'bytea%';
+ proname | prosrc
+ ------------+------------
+ bytearecv | bytearecv
+ byteasend | byteasend
+ byteale | byteale
+ byteagt | byteagt
+ byteage | byteage
+ byteane | byteane
+ byteacmp | byteacmp
+ bytealike | bytealike
+ byteanlike | byteanlike
+ byteacat | byteacat
+ byteaeq | byteaeq
+ bytealt | bytealt
+ byteain | byteain
+ byteaout | byteaout
+ (14 rows)
+ </programlisting>
+ <para>
+ A more convenient way to use dblink may be to create a view:
+ </para>
+ <programlisting>
+ create view
myremote_pg_proc as + select * + from dblink('dbname=postgres','select proname, prosrc from pg_proc') + as t1(proname name, prosrc text); + </programlisting> + <para> + Then you can simply write: + </para> + <programlisting> + select * from myremote_pg_proc where proname like 'bytea%'; + </programlisting> + </refsect1> + </refentry> + + <!-- dblink_send_query --> + <refentry> + <refnamediv> + <refname>dblink_send_query</refname> + <refpurpose>sends an async query to a remote database</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_send_query(text connname, text sql) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + The specific connection name to use. + </para> + </refsect2> + <refsect2> + <title>sql</title> + <para> + sql statement that you wish to execute on the remote host + e.g. "select * from pg_class" + </para> + </refsect2> + </refsect1> + + <refsect1> + <title>Outputs</title> + <para> + Returns int. A return value of 1 if the query was successfully dispatched, + 0 otherwise. If 1, results must be fetched by dblink_get_result(connname). + A running query may be cancelled by dblink_cancel_query(connname). + </para> + </refsect1> + + <refsect1> + <title>Example</title> + <para> + <literal> + SELECT dblink_connect('dtest1', 'dbname=contrib_regression'); + SELECT * FROM + dblink_send_query('dtest1', 'SELECT * FROM foo WHERE f1 < 3') AS t1; + </literal> + </para> + </refsect1> + </refentry> + + <!-- dblink_get_result --> + <refentry> + <refnamediv> + <refname>dblink_get_result</refname> + <refpurpose>gets an async query result</refpurpose> + </refnamediv> + + <refsynopsisdiv> + <synopsis> + dblink_get_result(text connname [, bool fail_on_error]) + </synopsis> + </refsynopsisdiv> + + <refsect1> + <title>Inputs</title> + + <refsect2> + <title>connname</title> + <para> + The specific connection name to use. 
An asynchronous query must
+ have already been sent using dblink_send_query().
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>fail_on_error</title>
+ <para>
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and no rows are returned.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns setof record</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+ <para>
+ Blocks until a result becomes available.
+
+ This function *must* be called if dblink_send_query returned
+ a 1, even on cancelled queries - otherwise the connection
+ can't be used anymore. It must be called once for each query
+ sent, and one additional time to obtain an empty set result,
+ prior to using the connection again.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ contrib_regression=# SELECT dblink_connect('dtest1', 'dbname=contrib_regression');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ contrib_regression=# SELECT * from
+ contrib_regression-# dblink_send_query('dtest1', 'select * from foo where f1 < 3') as t1;
+ t1
+ ----
+ 1
+ (1 row)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ (3 rows)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+----
+ (0 rows)
+
+ contrib_regression=# SELECT * from
+ dblink_send_query('dtest1', 'select * from foo where f1 < 3; select * from foo where f1 > 6') as t1;
+ t1
+ ----
+ 1
+ (1 row)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+
(3 rows)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+---------------
+ 7 | h | {a7,b7,c7}
+ 8 | i | {a8,b8,c8}
+ 9 | j | {a9,b9,c9}
+ 10 | k | {a10,b10,c10}
+ (4 rows)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+----
+ (0 rows)
+ </programlisting>
+ </refsect1>
+ </refentry>
+</sect1>
diff --git a/doc/src/sgml/earthdistance.sgml b/doc/src/sgml/earthdistance.sgml
new file mode 100644
index 00000000000..2d08bb829d6
--- /dev/null
+++ b/doc/src/sgml/earthdistance.sgml
@@ -0,0 +1,133 @@
+<sect1 id="earthdistance">
+ <title>earthdistance</title>
+
+ <indexterm zone="earthdistance">
+ <primary>earthdistance</primary>
+ </indexterm>
+
+ <para>
+ This module contains two different approaches to calculating
+ great circle distances on the surface of the Earth. The one described
+ first depends on the contrib/cube package (which MUST be installed before
+ earthdistance is installed). The second one is based on the point
+ datatype using latitude and longitude for the coordinates. The install
+ script makes the defined functions executable by anyone.
+ </para>
+ <para>
+ A spherical model of the Earth is used.
+ </para>
+ <para>
+ Data is stored in cubes that are points (both corners are the same) using 3
+ coordinates representing the distance from the center of the Earth.
+ </para>
+ <para>
+ The radius of the Earth is obtained from the earth() function. It is
+ given in meters. But by changing this one function you can change it
+ to use some other units or to use a different value of the radius
+ that you feel is more appropriate.
+ </para>
+ <para>
+ This package also has applications to astronomical databases.
+ Astronomers will probably want to change earth() to return a radius of
+ 180/pi() so that distances are in degrees.
+ </para>
+ <para>
+ Functions are provided to allow for input in latitude and longitude (in
+ degrees), to allow for output of latitude and longitude, to calculate
+ the great circle distance between two points and to easily specify a
+ bounding box usable for index searches.
+ </para>
+ <para>
+ The functions are all 'sql' functions. If you want to make these functions
+ executable by other people you will also have to make the referenced
+ cube functions executable. cube(text), cube(float8), cube(cube,float8),
+ cube_distance(cube,cube), cube_ll_coord(cube,int) and
+ cube_enlarge(cube,float8,int) are used indirectly by the earth distance
+ functions. is_point(cube) and cube_dim(cube) are used in constraints for data
+ in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
+ might be useful for looking at bounding box coordinates in user applications.
+ </para>
+ <para>
+ A domain of type cube named earth is defined.
+ There are constraints on it defined to make sure the cube is a point,
+ that it does not have more than 3 dimensions and that it is very near
+ the surface of a sphere centered about the origin with the radius of
+ the Earth.
+ </para>
+ <para>
+ The following functions are provided:
+ </para>
+
+ <table id="earthdistance-functions">
+ <title>EarthDistance functions</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>earth()</literal></entry>
+ <entry>returns the radius of the Earth in meters.</entry>
+ </row>
+ <row>
+ <entry><literal>sec_to_gc(float8)</literal></entry>
+ <entry>converts the normal straight line
+ (secant) distance between two points on the surface of the Earth
+ to the great circle distance between them.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>gc_to_sec(float8)</literal></entry>
+ <entry>Converts the great circle distance
+ between two points on the surface of the Earth to the normal straight line
+ (secant) distance between them.
+ </entry> + </row> + <row> + <entry><literal>ll_to_earth(float8, float8)</literal></entry> + <entry>Returns the location of a point on the surface of the Earth given + its latitude (argument 1) and longitude (argument 2) in degrees. + </entry> + </row> + <row> + <entry><literal>latitude(earth)</literal></entry> + <entry>Returns the latitude in degrees of a point on the surface of the + Earth. + </entry> + </row> + <row> + <entry><literal>longitude(earth)</literal></entry> + <entry>Returns the longitude in degrees of a point on the surface of the + Earth. + </entry> + </row> + <row> + <entry><literal>earth_distance(earth, earth)</literal></entry> + <entry>Returns the great circle distance between two points on the + surface of the Earth. + </entry> + </row> + <row> + <entry><literal>earth_box(earth, float8)</literal></entry> + <entry>Returns a box suitable for an indexed search using the cube @> + operator for points within a given great circle distance of a location. + Some points in this box are further than the specified great circle + distance from the location so a second check using earth_distance + should be made at the same time. + </entry> + </row> + <row> + <entry><literal><@></literal> operator</entry> + <entry>gives the distance in statute miles between + two points on the Earth's surface. Coordinates are in degrees. Points are + taken as (longitude, latitude) and not vice versa as longitude is closer + to the intuitive idea of x-axis and latitude to y-axis. + </entry> + </row> + </tbody> + </tgroup> + </table> + <para> + One advantage of using cube representation over a point using latitude and + longitude for coordinates, is that you don't have to worry about special + conditions at +/- 180 degrees of longitude or near the poles. 
+ </para> +</sect1> + diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 5502bb42086..a1a8d048ed3 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -1,4 +1,4 @@ -<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.51 2007/11/01 17:00:18 momjian Exp $ --> +<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.52 2007/11/10 23:30:46 momjian Exp $ --> <!entity history SYSTEM "history.sgml"> <!entity info SYSTEM "info.sgml"> @@ -89,6 +89,38 @@ <!entity sources SYSTEM "sources.sgml"> <!entity storage SYSTEM "storage.sgml"> +<!-- contrib information --> +<!entity contrib SYSTEM "contrib.sgml"> +<!entity adminpack SYSTEM "adminpack.sgml"> +<!entity btree-gist SYSTEM "btree-gist.sgml"> +<!entity chkpass SYSTEM "chkpass.sgml"> +<!entity cube SYSTEM "cube.sgml"> +<!entity dblink SYSTEM "dblink.sgml"> +<!entity earthdistance SYSTEM "earthdistance.sgml"> +<!entity fuzzystrmatch SYSTEM "fuzzystrmatch.sgml"> +<!entity hstore SYSTEM "hstore.sgml"> +<!entity intagg SYSTEM "intagg.sgml"> +<!entity intarray SYSTEM "intarray.sgml"> +<!entity isn SYSTEM "isn.sgml"> +<!entity lo SYSTEM "lo.sgml"> +<!entity ltree SYSTEM "ltree.sgml"> +<!entity oid2name SYSTEM "oid2name.sgml"> +<!entity pageinspect SYSTEM "pageinspect.sgml"> +<!entity pgbench SYSTEM "pgbench.sgml"> +<!entity buffercache SYSTEM "buffercache.sgml"> +<!entity pgcrypto SYSTEM "pgcrypto.sgml"> +<!entity freespacemap SYSTEM "freespacemap.sgml"> +<!entity pgrowlocks SYSTEM "pgrowlocks.sgml"> +<!entity standby SYSTEM "standby.sgml"> +<!entity pgstattuple SYSTEM "pgstattuple.sgml"> +<!entity trgm SYSTEM "trgm.sgml"> +<!entity seg SYSTEM "seg.sgml"> +<!entity sslinfo SYSTEM "sslinfo.sgml"> +<!entity tablefunc SYSTEM "tablefunc.sgml"> +<!entity uuid-ossp SYSTEM "uuid-ossp.sgml"> +<!entity vacuumlo SYSTEM "vacuumlo.sgml"> +<!entity xml2 SYSTEM "xml2.sgml"> + <!-- appendixes --> <!entity contacts SYSTEM "contacts.sgml"> <!entity cvs SYSTEM "cvs.sgml"> diff --git 
a/doc/src/sgml/freespacemap.sgml b/doc/src/sgml/freespacemap.sgml
new file mode 100644
index 00000000000..70b27415524
--- /dev/null
+++ b/doc/src/sgml/freespacemap.sgml
@@ -0,0 +1,243 @@
+<sect1 id="pgfreespacemap">
+ <title>pgfreespacemap</title>
+
+ <indexterm zone="pgfreespacemap">
+ <primary>pgfreespacemap</primary>
+ </indexterm>
+
+ <para>
+ This module provides the means for examining the free space map (FSM). It
+ consists of two C functions: <literal>pg_freespacemap_relations()</literal>
+ and <literal>pg_freespacemap_pages()</literal> that return a set
+ of records, plus two views <literal>pg_freespacemap_relations</literal> and
+ <literal>pg_freespacemap_pages</literal> for more user-friendly access to
+ the functions.
+ </para>
+ <para>
+ The module provides the ability to examine the contents of the free space
+ map, without having to restart or rebuild the server with additional
+ debugging code.
+ </para>
+ <para>
+ By default public access is REVOKED from the functions and views, just in
+ case there are security issues present in the code.
+ </para> + + <sect2> + <title>Notes</title> + <para> + The definitions for the columns exposed in the views are: + </para> + + <table> + <title>pg_freespacemap_relations</title> + <tgroup cols="3"> + <thead> + <row> + <entry>Column</entry> + <entry>references</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry>reltablespace</entry> + <entry>pg_tablespace.oid</entry> + <entry>Tablespace oid of the relation.</entry> + </row> + <row> + <entry>reldatabase</entry> + <entry>pg_database.oid</entry> + <entry>Database oid of the relation.</entry> + </row> + <row> + <entry>relfilenode</entry> + <entry>pg_class.relfilenode</entry> + <entry>Relfilenode of the relation.</entry> + </row> + <row> + <entry>avgrequest</entry> + <entry></entry> + <entry>Moving average of free space requests (NULL for indexes)</entry> + </row> + <row> + <entry>interestingpages</entry> + <entry></entry> + <entry>Count of pages last reported as containing useful free space.</entry> + </row> + <row> + <entry>storedpages</entry> + <entry></entry> + <entry>Count of pages actually stored in free space map.</entry> + </row> + <row> + <entry>nextpage</entry> + <entry></entry> + <entry>Page index (from 0) to start next search at.</entry> + </row> + </tbody> + </tgroup> + </table> + + <table> + <title>pg_freespacemap_pages</title> + <tgroup cols="3"> + <thead> + <row> + <entry>Column</entry> + <entry> references</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry>reltablespace</entry> + <entry>pg_tablespace.oid</entry> + <entry>Tablespace oid of the relation.</entry> + </row> + <row> + <entry>reldatabase</entry> + <entry>pg_database.oid</entry> + <entry>Database oid of the relation.</entry> + </row> + <row> + <entry>relfilenode</entry> + <entry>pg_class.relfilenode</entry> + <entry>Relfilenode of the relation.</entry> + </row> + <row> + <entry>relblocknumber</entry> + <entry></entry> + <entry>Page number in the relation.</entry> + </row> + <row> 
+       <entry>bytes</entry>
+       <entry></entry>
+       <entry>Free bytes in the page, or NULL for an index page (see below).</entry>
+      </row>
+    </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   For <literal>pg_freespacemap_relations</literal>, there is one row for each
+   relation in the free space map.  <literal>storedpages</literal> is the
+   number of pages actually stored in the map, while
+   <literal>interestingpages</literal> is the number of pages the last VACUUM
+   thought had useful amounts of free space.
+  </para>
+  <para>
+   If <literal>storedpages</literal> is consistently less than
+   <literal>interestingpages</literal>, then it would be a good idea to
+   increase <literal>max_fsm_pages</literal>.  Also,
+   if the number of rows in <literal>pg_freespacemap_relations</literal> is
+   close to <literal>max_fsm_relations</literal>, then you should consider
+   increasing <literal>max_fsm_relations</literal>.
+  </para>
+  <para>
+   For <literal>pg_freespacemap_pages</literal>, there is one row for each page
+   in the free space map.  The number of rows for a relation will match the
+   <literal>storedpages</literal> column in
+   <literal>pg_freespacemap_relations</literal>.
+  </para>
+  <para>
+   For indexes, what is tracked is entirely-unused pages, rather than free
+   space within pages.  Therefore, the average request size and free bytes
+   within a page are not meaningful, and are shown as NULL.
+  </para>
+  <para>
+   Because the map is shared by all the databases, it will include relations
+   not belonging to the current database.
+  </para>
+  <para>
+   When either of the views is accessed, internal free space map locks are
+   taken, and a copy of the map data is made for them to display.
+   This ensures that the views produce a consistent set of results, while not
+   blocking normal activity longer than necessary.  Nonetheless, there
+   could be some impact on database performance if they are read often.
+ </para> + </sect2> + + <sect2> + <title>Sample output - pg_freespacemap_relations</title> + <programlisting> +regression=# \d pg_freespacemap_relations +View "public.pg_freespacemap_relations" + Column | Type | Modifiers +------------------+---------+----------- + reltablespace | oid | + reldatabase | oid | + relfilenode | oid | + avgrequest | integer | + interestingpages | integer | + storedpages | integer | + nextpage | integer | +View definition: + SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage + FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer); + +regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages + FROM pg_freespacemap_relations r INNER JOIN pg_class c + ON c.relfilenode = r.relfilenode INNER JOIN pg_database d + ON r.reldatabase = d.oid AND (d.datname = current_database()) + ORDER BY r.storedpages DESC LIMIT 10; + relname | avgrequest | interestingpages | storedpages +---------------------------------+------------+------------------+------------- + onek | 256 | 109 | 109 + pg_attribute | 167 | 93 | 93 + pg_class | 191 | 49 | 49 + pg_attribute_relid_attnam_index | | 48 | 48 + onek2 | 256 | 37 | 37 + pg_depend | 95 | 26 | 26 + pg_type | 199 | 16 | 16 + pg_rewrite | 1011 | 13 | 13 + pg_class_relname_nsp_index | | 10 | 10 + pg_proc | 302 | 8 | 8 +(10 rows) + </programlisting> + </sect2> + + <sect2> + <title>Sample output - pg_freespacemap_pages</title> + <programlisting> +regression=# \d pg_freespacemap_pages + View "public.pg_freespacemap_pages" + Column | Type | Modifiers +----------------+---------+----------- + reltablespace | oid | + reldatabase | oid | + relfilenode | oid | + relblocknumber | bigint | + bytes | integer | +View definition: + SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes + FROM 
pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer); + +regression=# SELECT c.relname, p.relblocknumber, p.bytes + FROM pg_freespacemap_pages p INNER JOIN pg_class c + ON c.relfilenode = p.relfilenode INNER JOIN pg_database d + ON (p.reldatabase = d.oid AND d.datname = current_database()) + ORDER BY c.relname LIMIT 10; + relname | relblocknumber | bytes +--------------+----------------+------- + a_star | 0 | 8040 + abstime_tbl | 0 | 7908 + aggtest | 0 | 8008 + altinhoid | 0 | 8128 + altstartwith | 0 | 8128 + arrtest | 0 | 7172 + b_star | 0 | 7976 + box_tbl | 0 | 7912 + bt_f8_heap | 54 | 7728 + bt_i4_heap | 49 | 8008 +(10 rows) + </programlisting> + </sect2> + + <sect2> + <title>Author</title> + <para> + Mark Kirkwood <email>markir@paradise.net.nz</email> + </para> + </sect2> +</sect1> + diff --git a/doc/src/sgml/fuzzystrmatch.sgml b/doc/src/sgml/fuzzystrmatch.sgml new file mode 100644 index 00000000000..666e031c0d6 --- /dev/null +++ b/doc/src/sgml/fuzzystrmatch.sgml @@ -0,0 +1,122 @@ + +<sect1 id="fuzzystrmatch"> + <title>fuzzystrmatch</title> + + <para> + This section describes the fuzzystrmatch module which provides different + functions to determine similarities and distance between strings. + </para> + + <sect2> + <title>Soundex</title> + <para> + The Soundex system is a method of matching similar sounding names + (or any words) to the same code. It was initially used by the + United States Census in 1880, 1900, and 1910, but it has little use + beyond English names (or the English pronunciation of names), and + it is not a linguistic tool. + </para> + <para> + When comparing two soundex values to determine similarity, the + difference function reports how close the match is on a scale + from zero to four, with zero being no match and four being an + exact match. 
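As an illustration of how such codes are produced, here is a simplified Python sketch of the classic Soundex encoding (an illustration only, not the module's C implementation; in particular it ignores the special rule for consonants separated by H or W):

```python
def soundex(name: str) -> str:
    # Classic Soundex: keep the first letter, encode the rest as digits,
    # drop vowels/H/W/Y, collapse adjacent duplicate codes, pad to 4 chars.
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = codes.get(letters[0], "")
    for c in letters[1:]:
        code = codes.get(c, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]

print(soundex("Anne"), soundex("Ann"))  # both encode to A500
```

Since 'Anne' and 'Ann' share the code A500, difference() reports an exact match (4) for them.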
+ </para> + <para> + The following are some usage examples: + </para> + <programlisting> +SELECT soundex('hello world!'); + +SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann'); +SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew'); +SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret'); + +CREATE TABLE s (nm text); + +INSERT INTO s VALUES ('john'); +INSERT INTO s VALUES ('joan'); +INSERT INTO s VALUES ('wobbly'); +INSERT INTO s VALUES ('jack'); + +SELECT * FROM s WHERE soundex(nm) = soundex('john'); + +SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid; + +CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS +'select soundex($1) = soundex($2)' +LANGUAGE SQL; + +CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS +'select soundex($1) < soundex($2)' +LANGUAGE SQL; + +CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS +'select soundex($1) > soundex($2)' +LANGUAGE SQL; + +CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS +'select soundex($1) <= soundex($2)' +LANGUAGE SQL; + +CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS +'select soundex($1) >= soundex($2)' +LANGUAGE SQL; + +CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS +'select soundex($1) <> soundex($2)' +LANGUAGE SQL; + +DROP OPERATOR #= (text, text); + +CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=); + +SELECT * FROM s WHERE text_sx_eq(nm, 'john'); + +SELECT * FROM s WHERE s.nm #= 'john'; + +SELECT * FROM s WHERE difference(s.nm, 'john') > 2; + </programlisting> + </sect2> + + <sect2> + <title>levenshtein</title> + <para> + This function calculates the levenshtein distance between two strings: + </para> + <programlisting> + int levenshtein(text source, text target) + </programlisting> + <para> + Both <literal>source</literal> and <literal>target</literal> can be any + NOT NULL string with a maximum of 255 characters. 
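The distance itself is the standard edit-distance recurrence; a minimal dynamic-programming sketch in Python (an illustration, not the module's C implementation):

```python
def levenshtein(source: str, target: str) -> int:
    # Classic DP: prev[j] holds the distance between source[:i-1] and target[:j].
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, 1):
        cur = [i]
        for j, t in enumerate(target, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (s != t)))  # substitution
        prev = cur
    return prev[-1]

print(levenshtein("GUMBO", "GAMBOL"))  # 2: substitute U->A, append L
```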
+ </para>
+ <para>
+  Example:
+ </para>
+ <programlisting>
+ SELECT levenshtein('GUMBO','GAMBOL');
+ </programlisting>
+ </sect2>
+
+ <sect2>
+  <title>metaphone</title>
+  <para>
+   This function calculates and returns the metaphone code of an input string:
+  </para>
+  <programlisting>
+  text metaphone(text source, int max_output_length)
+  </programlisting>
+  <para>
+   <literal>source</literal> has to be a NOT NULL string with a maximum of
+   255 characters.  <literal>max_output_length</literal> sets the maximum
+   length of the output metaphone code; if the code would be longer, the
+   output is truncated to this length.
+  </para>
+  <para>Example:</para>
+  <programlisting>
+  SELECT metaphone('GUMBO',4);
+  </programlisting>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/hstore.sgml b/doc/src/sgml/hstore.sgml
new file mode 100644
index 00000000000..147fc7fba60
--- /dev/null
+++ b/doc/src/sgml/hstore.sgml
@@ -0,0 +1,298 @@
+<sect1 id="hstore">
+ <title>hstore</title>
+
+ <indexterm zone="hstore">
+  <primary>hstore</primary>
+ </indexterm>
+
+ <para>
+  The <literal>hstore</literal> module is useful for storing (key,value) pairs.
+  It can be helpful in various scenarios: tables with many rarely searched
+  attributes, semi-structured data, or a lazy DBA.
+ </para>
+
+ <sect2>
+  <title>Operations</title>
+  <itemizedlist>
+   <listitem>
+    <para>
+     <literal>hstore -> text</literal> - get value, perl analogy $h{key}
+    </para>
+    <programlisting>
+select 'a=>q, b=>g'->'a';
+  ?
+------
+  q
+    </programlisting>
+    <para>
+     Note the use of parentheses in the select below, because the precedence
+     of 'IS' is higher than that of '->':
+    </para>
+    <programlisting>
+SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>hstore || hstore</literal> - concatenation, perl analogy %a=( %b, %c );
+    </para>
+    <programlisting>
+regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
+      ?column?
+-------------------- + "a"=>"b", "c"=>"d" +(1 row) + </programlisting> + + <para> + but, notice + </para> + + <programlisting> +regression=# select 'a=>b'::hstore || 'a=>d'::hstore; + ?column? +---------- + "a"=>"d" +(1 row) + </programlisting> + </listitem> + + <listitem> + <para> + <literal>text => text</literal> - creates hstore type from two text strings + </para> + <programlisting> +select 'a'=>'b'; + ?column? +---------- + "a"=>"b" + </programlisting> + </listitem> + + <listitem> + <para> + <literal>hstore @> hstore</literal> - contains operation, check if left operand contains right. + </para> + <programlisting> +regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c'; + ?column? +---------- + f +(1 row) + +regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1'; + ?column? +---------- + t +(1 row) + </programlisting> + </listitem> + + <listitem> + <para> + <literal>hstore <@ hstore</literal> - contained operation, check if + left operand is contained in right + </para> + <para> + (Before PostgreSQL 8.2, the containment operators @> and <@ were + respectively called @ and ~. These names are still available, but are + deprecated and will eventually be retired. Notice that the old names + are reversed from the convention formerly followed by the core geometric + datatypes!) 
+ </para>
+   </listitem>
+  </itemizedlist>
+ </sect2>
+
+ <sect2>
+  <title>Functions</title>
+
+  <itemizedlist>
+   <listitem>
+    <para>
+     <literal>akeys(hstore)</literal> - returns all keys from hstore as an array
+    </para>
+    <programlisting>
+regression=# select akeys('a=>1,b=>2');
+ akeys
+-------
+ {a,b}
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>skeys(hstore)</literal> - returns all keys from hstore as strings
+    </para>
+    <programlisting>
+regression=# select skeys('a=>1,b=>2');
+ skeys
+-------
+ a
+ b
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>avals(hstore)</literal> - returns all values from hstore as an array
+    </para>
+    <programlisting>
+regression=# select avals('a=>1,b=>2');
+ avals
+-------
+ {1,2}
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>svals(hstore)</literal> - returns all values from hstore as
+     strings
+    </para>
+    <programlisting>
+regression=# select svals('a=>1,b=>2');
+ svals
+-------
+ 1
+ 2
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>delete (hstore,text)</literal> - deletes the (key,value) pair from
+     hstore whose key matches the argument.
+    </para>
+    <programlisting>
+regression=# select delete('a=>1,b=>2','b');
+  delete
+----------
+ "a"=>"1"
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>each(hstore)</literal> - returns (key, value) pairs
+    </para>
+    <programlisting>
+regression=# select * from each('a=>1,b=>2');
+ key | value
+-----+-------
+ a   | 1
+ b   | 2
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>exist (hstore,text)</literal>
+    </para>
+    <para>
+     <literal>hstore ? text</literal> - returns true if the key exists in hstore
+     and false otherwise.
+    </para>
+    <programlisting>
+regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
+ exist | ?column?
+-------+----------
+ t     | t
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>defined (hstore,text)</literal> - returns true if the key exists in
+     hstore and its value is not NULL.
+    </para>
+    <programlisting>
+regression=# select defined('a=>NULL','a');
+ defined
+---------
+ f
+    </programlisting>
+   </listitem>
+  </itemizedlist>
+ </sect2>
+
+ <sect2>
+  <title>Indices</title>
+  <para>
+   The module provides index support for the '@>' and '?' operations.
+  </para>
+  <programlisting>
+CREATE INDEX hidx ON testhstore USING GIST(h);
+CREATE INDEX hidx ON testhstore USING GIN(h);
+  </programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Examples</title>
+
+  <para>
+   Add a key:
+  </para>
+  <programlisting>
+UPDATE tt SET h=h||'c=>3';
+  </programlisting>
+  <para>
+   Delete a key:
+  </para>
+  <programlisting>
+UPDATE tt SET h=delete(h,'k1');
+  </programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Statistics</title>
+  <para>
+   The hstore type, because of its intrinsic liberality, can contain many
+   different keys; checking for valid keys is the application's task.
+   The examples below demonstrate several techniques for examining key
+   statistics.
+  </para>
+
+  <para>
+   Simple example:
+  </para>
+  <programlisting>
+SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1 ');
+  </programlisting>
+
+  <para>
+   Using a table:
+  </para>
+  <programlisting>
+SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore ;
+  </programlisting>
+
+  <para>Online statistics:</para>
+  <programlisting>
+SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key;
+    key    | count
+-----------+-------
+ line      |   883
+ query     |   207
+ pos       |   203
+ node      |   202
+ space     |   197
+ status    |   195
+ public    |   194
+ title     |   190
+ org       |   189
+...................
+ </programlisting> + </sect2> + + <sect2> + <title>Authors</title> + <para> + Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia + </para> + <para> + Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia + </para> + </sect2> +</sect1> + diff --git a/doc/src/sgml/intagg.sgml b/doc/src/sgml/intagg.sgml new file mode 100644 index 00000000000..3fbd5c3281f --- /dev/null +++ b/doc/src/sgml/intagg.sgml @@ -0,0 +1,82 @@ + +<sect1 id="intagg"> + <title>intagg</title> + + <indexterm zone="intagg"> + <primary>intagg</primary> + </indexterm> + + <para> + This section describes the <literal>intagg</literal> module which provides an integer aggregator and an enumerator. + </para> + <para> + Many database systems have the notion of a one to many table. Such a table usually sits between two indexed tables, as: + </para> + <programlisting> +CREATE TABLE one_to_many(left INT, right INT) ; + </programlisting> + + <para> + And it is used like this: + </para> + + <programlisting> + SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right) + WHERE one_to_many.left = item; + </programlisting> + + <para> + This will return all the items in the right hand table for an entry + in the left hand table. This is a very common construct in SQL. + </para> + + <para> + Now, this methodology can be cumbersome with a very large number of + entries in the one_to_many table. Depending on the order in which + data was entered, a join like this could result in an index scan + and a fetch for each right hand entry in the table for a particular + left hand entry. If you have a very dynamic system, there is not much you + can do. However, if you have some data which is fairly static, you can + create a summary table with the aggregator. 
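The aggregate-then-enumerate round trip can be pictured outside SQL; here is a hypothetical Python analogue of what int_array_aggregate and int_array_enum do with the (left, right) pairs (names and sample data are illustrative only):

```python
from collections import defaultdict

# one_to_many rows as (left, right) pairs
one_to_many = [(1, 10), (1, 11), (2, 12), (1, 13)]

# int_array_aggregate: one row per left value, rights collected into an array
summary = defaultdict(list)
for left, right in one_to_many:
    summary[left].append(right)

# int_array_enum: expand a stored array back into individual rows
def enum_rows(left):
    return [(left, r) for r in summary[left]]

print(enum_rows(1))  # [(1, 10), (1, 11), (1, 13)]
```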
+ </para>
+
+ <programlisting>
+CREATE TABLE summary as SELECT left, int_array_aggregate(right)
+ AS right FROM one_to_many GROUP BY left;
+ </programlisting>
+
+ <para>
+  This will create a table with one row per left item, and an array
+  of right items.  Now this is pretty useless without some way of using
+  the array; that's why there is an array enumerator.
+ </para>
+ <programlisting>
+SELECT left, int_array_enum(right) FROM summary WHERE left = item;
+ </programlisting>
+
+ <para>
+  The above query, using int_array_enum, produces the same results as:
+ </para>
+ <programlisting>
+SELECT left, right FROM one_to_many WHERE left = item;
+ </programlisting>
+
+ <para>
+  The difference is that the query against the summary table has to get
+  only one row from the table, whereas the query against "one_to_many"
+  must index scan and fetch a row for each entry.
+ </para>
+ <para>
+  On our system, EXPLAIN shows that a query with a cost of 8488 is reduced
+  to a cost of 329.  The query is a join involving the one_to_many table:
+ </para>
+ <programlisting>
+SELECT right, count(right) FROM
+(
+  SELECT left, int_array_enum(right) AS right FROM summary JOIN
+    (SELECT left FROM left_table WHERE left = item) AS lefts
+  ON (summary.left = lefts.left )
+) AS list GROUP BY right ORDER BY count DESC ;
+ </programlisting>
+</sect1>
+
diff --git a/doc/src/sgml/intarray.sgml b/doc/src/sgml/intarray.sgml
new file mode 100644
index 00000000000..7e538a894d5
--- /dev/null
+++ b/doc/src/sgml/intarray.sgml
@@ -0,0 +1,286 @@
+<sect1 id="intarray">
+ <title>intarray</title>
+
+ <indexterm zone="intarray">
+  <primary>intarray</primary>
+ </indexterm>
+
+ <para>
+  This is an implementation of the RD-tree data structure using the GiST
+  interface of PostgreSQL.  It has built-in lossy compression.
+ </para>
+
+ <para>
+  The current implementation provides index support for one-dimensional arrays
+  of int4's: gist__int_ops, suitable for small- and medium-size arrays (used by
+  default), and gist__intbig_ops for indexing large arrays (we use a
+  superimposed signature with a length of 4096 bits to represent sets).
+ </para>
+
+ <sect2>
+  <title>Functions</title>
+
+  <itemizedlist>
+
+   <listitem>
+    <para>
+     <literal>int icount(int[])</literal> - the number of elements in the intarray
+    </para>
+    <programlisting>
+test=# select icount('{1,2,3}'::int[]);
+ icount
+--------
+      3
+(1 row)
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>int[] sort(int[], 'asc' | 'desc')</literal> - sort intarray
+    </para>
+    <programlisting>
+test=# select sort('{1,2,3}'::int[],'desc');
+  sort
+---------
+ {3,2,1}
+(1 row)
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>int[] sort(int[])</literal> - sort in ascending order
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>int[] sort_asc(int[]),sort_desc(int[])</literal> - shortcuts for sort
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>int[] uniq(int[])</literal> - returns unique elements
+    </para>
+    <programlisting>
+test=# select uniq(sort('{1,2,3,2,1}'::int[]));
+  uniq
+---------
+ {1,2,3}
+(1 row)
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>int idx(int[], int item)</literal> - returns the index of the first
+     array element matching item, or 0 if there is no match.
+    </para>
+    <programlisting>
+test=# select idx('{1,2,3,2,1}'::int[],2);
+ idx
+-----
+   2
+(1 row)
+    </programlisting>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>int[] subarray(int[],int START [, int LEN])</literal> - returns the
+     part of the intarray starting at element number START (counting from 1)
+     and of length LEN.
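The 1-based START/LEN semantics can be mimicked in Python (a sketch of the positive-argument case only):

```python
def subarray(a, start, length=None):
    # START counts from 1; LEN defaults to "through the end of the array".
    i = start - 1
    return a[i:] if length is None else a[i:i + length]

print(subarray([1, 2, 3, 2, 1], 2, 3))  # [2, 3, 2]
```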
+ </para> + <programlisting> +test=# select subarray('{1,2,3,2,1}'::int[],2,3); + subarray +---------- + {2,3,2} +(1 row) + </programlisting> + </listitem> + + <listitem> + <para> + <literal>int[] intset(int4)</literal> - casting int4 to int[] + </para> + <programlisting> +test=# select intset(1); + intset +-------- + {1} +(1 row) + </programlisting> + </listitem> + + </itemizedlist> + </sect2> + + <sect2> + <title>Operations</title> + <table> + <title>Operations</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Operator</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry><literal>int[] && int[]</literal></entry> + <entry>overlap - returns TRUE if arrays have at least one common element</entry> + </row> + <row> + <entry><literal>int[] @> int[]</literal></entry> + <entry>contains - returns TRUE if left array contains right array</entry> + </row> + <row> + <entry><literal>int[] <@ int[]</literal></entry> + <entry>contained - returns TRUE if left array is contained in right array</entry> + </row> + <row> + <entry><literal># int[]</literal></entry> + <entry>returns the number of elements in array</entry> + </row> + <row> + <entry><literal>int[] + int</literal></entry> + <entry>push element to array ( add to end of array)</entry> + </row> + <row> + <entry><literal>int[] + int[] </literal></entry> + <entry>merge of arrays (right array added to the end of left one)</entry> + </row> + <row> + <entry><literal>int[] - int</literal></entry> + <entry>remove entries matched by right argument from array</entry> + </row> + <row> + <entry><literal>int[] - int[]</literal></entry> + <entry>remove right array from left</entry> + </row> + <row> + <entry><literal>int[] | int</literal></entry> + <entry>returns intarray - union of arguments</entry> + </row> + <row> + <entry><literal>int[] | int[]</literal></entry> + <entry>returns intarray as a union of two arrays</entry> + </row> + + <row> + <entry><literal>int[] & int[]</literal></entry> + 
<entry>returns intersection of arrays</entry>
+   </row>
+
+   <row>
+    <entry><literal>int[] @@ query_int</literal></entry>
+    <entry>
+     returns TRUE if array satisfies query (like
+     <literal>'1&(2|3)'</literal>)
+    </entry>
+   </row>
+
+   <row>
+    <entry><literal>query_int ~~ int[]</literal></entry>
+    <entry>returns TRUE if array satisfies query (commutator of @@)</entry>
+   </row>
+  </tbody>
+ </tgroup>
+ </table>
+ <para>
+  (Before PostgreSQL 8.2, the containment operators @> and <@ were
+  respectively called @ and ~.  These names are still available, but are
+  deprecated and will eventually be retired.  Notice that the old names
+  are reversed from the convention formerly followed by the core geometric
+  datatypes!)
+ </para>
+ </sect2>
+
+ <sect2>
+  <title>Example</title>
+
+  <programlisting>
+CREATE TABLE message (mid INT NOT NULL,sections INT[]);
+CREATE TABLE message_section_map (mid INT NOT NULL,sid INT NOT NULL);
+
+-- create indices
+CREATE unique index message_key ON message ( mid );
+CREATE unique index message_section_map_key2 ON message_section_map (sid, mid );
+CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops);
+
+-- select some messages with section in 1 OR 2 - OVERLAP operator
+SELECT message.mid FROM message WHERE message.sections && '{1,2}';
+
+-- select messages contained in sections 1 AND 2 - CONTAINS operator
+SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
+-- the same, CONTAINED operator
+SELECT message.mid FROM message WHERE '{1,2}' <@ message.sections;
+  </programlisting>
+ </sect2>
+
+ <sect2>
+  <title>Benchmark</title>
+  <para>
+   The subdirectory bench contains a benchmark suite.
+  </para>
+  <programlisting>
+  cd ./bench
+  1. createdb TEST
+  2. psql TEST < ../_int.sql
+  3. ./create_test.pl | psql TEST
+  4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
+                  with/without RD-Tree. Run the script without arguments to
+                  see available options.
+ + a)test without RD-Tree (OR) + ./bench.pl -d TEST -c -s 1,2 -v + b)test with RD-Tree + ./bench.pl -d TEST -c -s 1,2 -v -r + + BENCHMARKS: + + Size of table <message>: 200000 + Size of table <message_section_map>: 269133 + + Distribution of messages by sections: + + section 0: 74377 messages + section 1: 16284 messages + section 50: 1229 messages + section 99: 683 messages + + old - without RD-Tree support, + new - with RD-Tree + + +----------+---------------+----------------+ + |Search set|OR, time in sec|AND, time in sec| + | +-------+-------+--------+-------+ + | | old | new | old | new | + +----------+-------+-------+--------+-------+ + | 1| 0.625| 0.101| -| -| + +----------+-------+-------+--------+-------+ + | 99| 0.018| 0.017| -| -| + +----------+-------+-------+--------+-------+ + | 1,2| 0.766| 0.133| 0.628| 0.045| + +----------+-------+-------+--------+-------+ + | 1,2,50,65| 0.794| 0.141| 0.030| 0.006| + +----------+-------+-------+--------+-------+ + </programlisting> + </sect2> + + <sect2> + <title>Authors</title> + <para> + All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and Oleg + Bartunov (<email>oleg@sai.msu.su</email>). See + <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for + additional information. Andrey Oktyabrski did a great work on adding new + functions and operations. + </para> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/isn.sgml b/doc/src/sgml/isn.sgml new file mode 100644 index 00000000000..c6fef47f084 --- /dev/null +++ b/doc/src/sgml/isn.sgml @@ -0,0 +1,502 @@ +<sect1 id="isn"> + <title>isn</title> + + <indexterm zone="isn"> + <primary>isn</primary> + </indexterm> + + <para> + The <literal>isn</literal> module adds data types for the following + international-standard namespaces: EAN13, UPC, ISBN (books), ISMN (music), + and ISSN (serials). This module is inspired by Garrett A. Wollman's + isbn_issn code. 
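The namespaces above all rest on the standard EAN13 check digit; a minimal Python sketch of the computation (an illustration of the standard algorithm, not the module's actual C implementation):

```python
def ean13_check_digit(digits12: str) -> int:
    # Weights alternate 1,3,1,3,... from the left over the first 12 digits;
    # the check digit brings the weighted sum up to a multiple of 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits12))
    return (10 - total % 10) % 10

print(ean13_check_digit("978030640615"))  # 7 -> full number 9780306406157
```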
+ </para>
+ <para>
+  This module validates the numbers and automatically adds the correct
+  hyphenation to them.  It also supports the new ISBN-13
+  numbers to be used starting in January 2007.
+ </para>
+
+ <para>
+  Premises:
+ </para>
+
+ <orderedlist>
+  <listitem>
+   <para>ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers</para>
+  </listitem>
+  <listitem>
+   <para>EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)</para>
+  </listitem>
+  <listitem>
+   <para>some ISBN13 numbers can be displayed as ISBN</para>
+  </listitem>
+  <listitem>
+   <para>some ISMN13 numbers can be displayed as ISMN</para>
+  </listitem>
+  <listitem>
+   <para>some ISSN13 numbers can be displayed as ISSN</para>
+  </listitem>
+  <listitem>
+   <para>all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers</para>
+  </listitem>
+ </orderedlist>
+
+ <note>
+  <para>
+   All types are internally represented as 64 bit integers,
+   and internally all are consistently interchangeable.
+  </para>
+ </note>
+ <note>
+  <para>
+   We have two operator classes (for btree and for hash) so each data type
+   can be indexed for faster access.
+  </para>
+ </note>
+
+ <sect2>
+  <title>Data types</title>
+
+  <para>
+   We have the following data types:
+  </para>
+
+  <table>
+   <title>Data types</title>
+   <tgroup cols="2">
+    <thead>
+     <row>
+      <entry><para>Data type</para></entry>
+      <entry><para>Description</para></entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><para><literal>EAN13</literal></para></entry>
+      <entry>
+       <para>
+        European Article Numbers.  This type will always show the EAN13-display
+        format.  The output function for this is <literal>ean13_out()</literal>
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry><para><literal>ISBN13</literal></para></entry>
+      <entry>
+       <para>
+        For International Standard Book Numbers to be displayed in
+        the new EAN13-display format.
+ </para> + </entry> + </row> + + <row> + <entry><para><literal>ISMN13</literal></para></entry> + <entry> + <para> + For International Standard Music Numbers to be displayed in + the new EAN13-display format. + </para> + </entry> + </row> + <row> + <entry><para><literal>ISSN13</literal></para></entry> + <entry> + <para> + For International Standard Serial Numbers to be displayed in the new + EAN13-display format. + </para> + </entry> + </row> + <row> + <entry><para><literal>ISBN</literal></para></entry> + <entry> + <para> + For International Standard Book Numbers to be displayed in the current + short-display format. + </para> + </entry> + </row> + <row> + <entry><para><literal>ISMN</literal></para></entry> + <entry> + <para> + For International Standard Music Numbers to be displayed in the + current short-display format. + </para> + </entry> + </row> + <row> + <entry><para><literal>ISSN</literal></para></entry> + <entry> + <para> + For International Standard Serial Numbers to be displayed in the + current short-display format. These types will display the short + version of the ISxN (ISxN 10) whenever it's possible, and it will + show ISxN 13 when it's impossible to show the short version. The + output function to do this is <literal>isn_out()</literal> + </para> + </entry> + </row> + <row> + <entry><para><literal>UPC</literal></para></entry> + <entry> + <para> + For Universal Product Codes. UPC numbers are a subset of the EAN13 + numbers (they are basically EAN13 without the first '0' digit.) + The output function to do this is also <literal>isn_out()</literal> + </para> + </entry> + </row> + </tbody> + </tgroup> + </table> + + <note> + <para> + <literal>EAN13</literal>, <literal>ISBN13</literal>, + <literal>ISMN13</literal> and <literal>ISSN13</literal> types will always + display the long version of the ISxN (EAN13). The output function to do + this is <literal>ean13_out()</literal>. 
+ </para>
+ <para>
+  The need for these types is just to display the same data in
+  different ways: <literal>ISBN13</literal> is actually the same as
+  <literal>ISBN</literal>, <literal>ISMN13=ISMN</literal> and
+  <literal>ISSN13=ISSN</literal>.
+ </para>
+ </note>
+ </sect2>
+
+ <sect2>
+  <title>Input functions</title>
+
+  <para>
+   We have the following input functions:
+  </para>
+
+  <table>
+   <title>Input functions</title>
+   <tgroup cols="2">
+    <thead>
+     <row>
+      <entry>Function</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry><para><literal>ean13_in()</literal></para></entry>
+      <entry>
+       <para>
+        To take a string and return an EAN13.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry><para><literal>isbn_in()</literal></para></entry>
+      <entry>
+       <para>
+        To take a string and return valid ISBN or ISBN13 numbers.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry><para><literal>ismn_in()</literal></para></entry>
+      <entry>
+       <para>
+        To take a string and return valid ISMN or ISMN13 numbers.
+       </para>
+      </entry>
+     </row>
+
+     <row>
+      <entry><para><literal>issn_in()</literal></para></entry>
+      <entry>
+       <para>
+        To take a string and return valid ISSN or ISSN13 numbers.
+       </para>
+      </entry>
+     </row>
+     <row>
+      <entry><para><literal>upc_in()</literal></para></entry>
+      <entry>
+       <para>
+        To take a string and return a UPC code.
+ </para>
+      </entry>
+     </row>
+    </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
+ <sect2>
+  <title>Casts</title>
+
+  <para>
+   We are able to cast from:
+  </para>
+  <itemizedlist>
+   <listitem>
+    <para>
+     ISBN13 -> EAN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISMN13 -> EAN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISSN13 -> EAN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISBN -> EAN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISMN -> EAN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISSN -> EAN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     UPC -> EAN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISBN <-> ISBN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISMN <-> ISMN13
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     ISSN <-> ISSN13
+    </para>
+   </listitem>
+  </itemizedlist>
+ </sect2>
+
+ <sect2>
+  <title>C API</title>
+  <para>
+   The C API is implemented as:
+  </para>
+  <programlisting>
+  extern Datum isn_out(PG_FUNCTION_ARGS);
+  extern Datum ean13_out(PG_FUNCTION_ARGS);
+  extern Datum ean13_in(PG_FUNCTION_ARGS);
+  extern Datum isbn_in(PG_FUNCTION_ARGS);
+  extern Datum ismn_in(PG_FUNCTION_ARGS);
+  extern Datum issn_in(PG_FUNCTION_ARGS);
+  extern Datum upc_in(PG_FUNCTION_ARGS);
+  </programlisting>
+
+  <para>
+   On success:
+  </para>
+  <itemizedlist>
+   <listitem>
+    <para>
+     <literal>isn_out()</literal> takes any of our types and returns a string containing
+     the shortest possible representation of the number.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     <literal>ean13_out()</literal> takes any of our types and returns the
+     EAN13 (long) representation of the number.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     <literal>ean13_in()</literal> takes a string and returns an EAN13 which, as
+     stated in premise (2), might or might not be any of our other types, but
+     certainly is an EAN13 number.  It succeeds only if the string is a valid
+     EAN13 number; otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>isbn_in()</literal> takes a string and returns an ISBN/ISBN13.
+ It succeeds only if the string really is an ISBN/ISBN13; otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ismn_in()</literal> takes a string and returns an ISMN/ISMN13.
+ It succeeds only if the string really is an ISMN/ISMN13; otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>issn_in()</literal> takes a string and returns an ISSN/ISSN13.
+ It succeeds only if the string really is an ISSN/ISSN13; otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>upc_in()</literal> takes a string and returns a UPC. It succeeds
+ only if the string really is a UPC; otherwise it fails.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ (On failure, the functions report the error via 'ereport'.)
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Testing functions</title>
+ <table>
+ <title>Testing functions</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry><para>Function</para></entry>
+ <entry><para>Description</para></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><para><literal>isn_weak(boolean)</literal></para></entry>
+ <entry><para>Sets the weak input mode.</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>isn_weak()</literal></para></entry>
+ <entry><para>Gets the current status of the weak mode.</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>make_valid()</literal></para></entry>
+ <entry><para>Validates an invalid number (deleting the invalid flag).</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>is_valid()</literal></para></entry>
+ <entry><para>Checks for the presence of the invalid flag.</para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ <literal>Weak</literal> mode is used to insert invalid data into a
+ table. Invalid as in the check digit being wrong, not as in missing digits.
+ </para>
+ <para>
+ Why would you want to use the weak mode?
Well, you might have a huge
+ collection of ISBN numbers, so many that for whatever reason some have the
+ wrong check digit (perhaps the numbers were scanned from a printed list and
+ the OCR misread them, perhaps the numbers were captured manually... who
+ knows). You might want to clean the mess up, but still keep all the numbers
+ in your database, and perhaps use an external tool to locate the invalid
+ numbers so you can verify the information and validate it more easily, for
+ example by selecting all the invalid numbers in the table.
+ </para>
+ <para>
+ When you insert invalid numbers in a table using the weak mode, the number
+ will be inserted with the corrected check digit, but it will be flagged
+ with an exclamation mark ('!') at the end (e.g. 0-11-000322-5!).
+ </para>
+ <para>
+ You can also force the insertion of invalid numbers even when not in the
+ weak mode, by appending the '!' character at the end of the number.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+ <programlisting>
+--Using the types directly:
+SELECT isbn('978-0-393-04002-9');
+SELECT isbn13('0901690546');
+SELECT issn('1436-4522');
+
+--Casting types:
+-- note that you can only cast from ean13 to another type when the cast
+-- number would be valid in the realm of the target type;
+-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
+-- but these will:
+SELECT upc(ean13('0220356483481'));
+SELECT ean13(upc('220356483481'));
+
+--Create a table with a single column to hold ISBN numbers:
+CREATE TABLE test ( id isbn );
+INSERT INTO test VALUES('9780393040029');
+
+--Automatically calculating check digits (observe the '?'):
+INSERT INTO test VALUES('220500896?');
+INSERT INTO test VALUES('978055215372?');
+
+SELECT issn('3251231?');
+SELECT ismn('979047213542?');
+
+--Using the weak mode:
+SELECT isn_weak(true);
+INSERT INTO test VALUES('978-0-11-000533-4');
+INSERT INTO test VALUES('9780141219307');
+INSERT INTO test VALUES('2-205-00876-X');
+SELECT isn_weak(false);
+
+SELECT id FROM test WHERE NOT is_valid(id);
+UPDATE test SET id=make_valid(id) WHERE id = '2-205-00876-X!';
+
+SELECT * FROM test;
+
+SELECT isbn13(id) FROM test;
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Bibliography</title>
+ <para>
+ The information used to implement this module was collected from
+ several sites, including:
+ </para>
+ <programlisting>
+ http://www.isbn-international.org/
+ http://www.issn.org/
+ http://www.ismn-international.org/
+ http://www.wikipedia.org/
+ </programlisting>
+ <para>
+ The prefixes used for hyphenation were also compiled from:
+ </para>
+ <programlisting>
+ http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
+ http://www.isbn-international.org/en/identifiers.html
+ http://www.ismn-international.org/ranges.html
+ </programlisting>
+ <para>
+ Care was taken during the creation of the algorithms and they
+ were meticulously verified against the
suggested algorithms
+ in the official ISBN, ISMN, ISSN User Manuals.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Germán Méndez Bravo (Kronuz), 2004 - 2006
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/lo.sgml b/doc/src/sgml/lo.sgml
new file mode 100644
index 00000000000..2a23a5b5cd0
--- /dev/null
+++ b/doc/src/sgml/lo.sgml
@@ -0,0 +1,118 @@
+
+<sect1 id="lo">
+ <title>lo</title>
+
+ <indexterm zone="lo">
+ <primary>lo</primary>
+ </indexterm>
+
+ <para>
+ PostgreSQL type extension for managing Large Objects
+ </para>
+
+ <sect2>
+ <title>Overview</title>
+ <para>
+ One of the problems with the JDBC driver (and this affects the ODBC driver
+ also) is that the specification assumes that references to BLOBs (Binary
+ Large OBjects) are stored within a table, and that if such an entry is
+ changed, the associated BLOB is deleted from the database.
+ </para>
+ <para>
+ As PostgreSQL stands, this doesn't occur. Large objects are treated as
+ objects in their own right; a table entry can reference a large object by
+ OID, but there can be multiple table entries referencing the same large
+ object OID, so the system doesn't delete the large object just because you
+ change or remove one such entry.
+ </para>
+ <para>
+ Now this is fine for new PostgreSQL-specific applications, but existing ones
+ using JDBC or ODBC won't delete the objects, resulting in orphaning: objects
+ that are not referenced by anything and simply occupy disk space.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>The Fix</title>
+ <para>
+ I've fixed this by creating a new data type 'lo', some support functions, and
+ a trigger which handles the orphaning problem. The trigger essentially just
+ does a 'lo_unlink' whenever you delete or modify a value referencing a large
+ object. When you use this trigger, you are assuming that there is only one
+ database reference to any large object that is referenced in a
+ trigger-controlled column!
+ </para> + <para> + The 'lo' type was created because we needed to differentiate between plain + OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily, + but (after talking to Byron), the ODBC driver needed a unique type. They had + created an 'lo' type, but not the solution to orphaning. + </para> + <para> + You don't actually have to use the 'lo' type to use the trigger, but it may be + convenient to use it to keep track of which columns in your database represent + large objects that you are managing with the trigger. + </para> + </sect2> + + <sect2> + <title>How to Use</title> + <para> + The easiest way is by an example: + </para> + <programlisting> + CREATE TABLE image (title TEXT, raster lo); + CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image + FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster); + </programlisting> + <para> + Create a trigger for each column that contains a lo type, and give the column + name as the trigger procedure argument. You can have more than one trigger on + a table if you need multiple lo columns in the same table, but don't forget to + give a different name to each trigger. + </para> + </sect2> + + <sect2> + <title>Issues</title> + + <itemizedlist> + <listitem> + <para> + Dropping a table will still orphan any objects it contains, as the trigger + is not executed. + </para> + <para> + Avoid this by preceding the 'drop table' with 'delete from {table}'. + </para> + <para> + If you already have, or suspect you have, orphaned large objects, see + the contrib/vacuumlo module to help you clean them up. It's a good idea + to run contrib/vacuumlo occasionally as a back-stop to the lo_manage + trigger. + </para> + </listitem> + <listitem> + <para> + Some frontends may create their own tables, and will not create the + associated trigger(s). Also, users may not remember (or know) to create + the triggers. 
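+ </para>
+ <para>
+ Whatever the cause, a manual cleanup before dropping a managed table keeps
+ the trigger in the loop. A minimal sketch, using the image table from the
+ example above:
+ </para>
+ <programlisting>
+ -- fires the lo_manage trigger once per row, lo_unlink'ing each object
+ DELETE FROM image;
+ -- the table can now be dropped without orphaning large objects
+ DROP TABLE image;
+ </programlisting>
+ <para>
+ Anything already orphaned is left for contrib/vacuumlo to collect.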
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ As the ODBC driver needs a permanent lo type (and JDBC could be optimised to
+ use it if its OID were fixed), and as the above issues can only be fixed by
+ some internal changes, I feel it should become a permanent built-in type.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Peter Mount <email>peter@retep.org.uk</email> June 13 1998
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
new file mode 100644
index 00000000000..75c02013c73
--- /dev/null
+++ b/doc/src/sgml/ltree.sgml
@@ -0,0 +1,771 @@
+
+<sect1 id="ltree">
+ <title>ltree</title>
+
+ <indexterm zone="ltree">
+ <primary>ltree</primary>
+ </indexterm>
+
+ <para>
+ <literal>ltree</literal> is a PostgreSQL module that contains an
+ implementation of data types, indexed access methods, and queries for data
+ organized as a tree-like structure.
+ </para>
+
+ <sect2>
+ <title>Definitions</title>
+ <para>
+ A <emphasis>label</emphasis> of a node is a sequence of one or more words
+ separated by the character '_' and containing letters and digits (for
+ example, [a-zA-Z0-9] for the C locale). The length of a label is limited
+ to 256 bytes.
+ </para>
+ <para>
+ Example: 'Countries', 'Personal_Services'
+ </para>
+ <para>
+ A <emphasis>label path</emphasis> of a node is a sequence of one or more
+ dot-separated labels l1.l2...ln, representing the path from the root to
+ the node. The length of a label path is limited to 65 kB, but sizes of
+ 2 kB or less are preferable. We don't consider this a strict limitation
+ (the maximal size of a label path for the DMOZ catalogue,
+ <ulink url="http://www.dmoz.org"></ulink>, is about 240 bytes!)
+ </para>
+ <para>
+ Example: <literal>'Top.Countries.Europe.Russia'</literal>
+ </para>
+ <para>
+ We introduce several datatypes:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>ltree</literal> - is a datatype for label paths.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[]</literal> - is a datatype for arrays of ltree.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>lquery</literal>
+ - is a path expression with regular-expression-like wildcards in the
+ label path, used for ltree matching. The star symbol (*) is used to
+ specify any number of labels (levels) and can be used at the beginning
+ and the end of an lquery, for example, '*.Europe.*'.
+ </para>
+ <para>
+ The following quantifiers are recognized for '*' (as in Perl):
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>{n} Match exactly n levels</para>
+ </listitem>
+ <listitem>
+ <para>{n,} Match at least n levels</para>
+ </listitem>
+ <listitem>
+ <para>{n,m} Match at least n but not more than m levels</para>
+ </listitem>
+ <listitem>
+ <para>{,m} Match at most m levels (equivalent to {0,m})</para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ It is possible to use several modifiers at the end of a label:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>@ Do case-insensitive label matching</para>
+ </listitem>
+ <listitem>
+ <para>* Do prefix matching for a label</para>
+ </listitem>
+ <listitem>
+ <para>% Don't treat the word separator '_' as significant in label
+ matching; that is, 'Russian%' would match 'Russian_nations', but not
+ 'Russian'
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ <literal>lquery</literal> can contain logical '!' (NOT) at the beginning
+ of a label and '|' (OR) to specify possible alternatives for label
+ matching.
+ </para>
+ <para>
+ Example of <literal>lquery</literal>:
+ </para>
+ <programlisting>
+ Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
+ a) b) c) d) e)
+ </programlisting>
+ <para>
+ A label path should
+ </para>
+ <orderedlist numeration='loweralpha'>
+ <listitem>
+ <para>
+ begin at a node with label 'Top',
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ be followed by zero to two labels, until
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ a node whose label begins with the case-insensitive prefix 'sport',
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ be followed by a node whose label matches neither 'football' nor
+ 'tennis', and
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ end at a node whose label begins with 'Russ' or exactly matches
+ 'Spain'.
+ </para>
+ </listitem>
+ </orderedlist>
+
+ </listitem>
+
+ <listitem>
+ <para><literal>ltxtquery</literal>
+ - is a datatype for label searching (like the type 'query' for full text
+ searching, see contrib/tsearch). It's possible to use the modifiers @,%,*
+ at the end of a word. The meanings of the modifiers are the same as for
+ lquery.
+ </para>
+ <para>
+ Example: <literal>'Europe & Russia*@ & !Transportation'</literal>
+ </para>
+ <para>
+ This searches for paths that contain the words 'Europe' and 'Russia*'
+ (case-insensitively) but not 'Transportation'. Note that the order of the
+ words as they appear in the label path is not important!
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Operations</title>
+ <para>
+ The following operations are defined for type ltree:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal><,>,<=,>=,=, <></literal>
+ - Have their usual meanings. Comparison is done in tree-traversal order;
+ the children of a node are sorted lexicographically.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree @> ltree</literal>
+ - returns TRUE if the left argument is an ancestor of the right argument
+ (or equal).
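+ </para>
+ <para>
+ For example, with two made-up paths (both return TRUE, since a path is
+ considered its own ancestor):
+ </para>
+ <programlisting>
+ SELECT 'Top.Science'::ltree @> 'Top.Science.Astronomy'::ltree;
+ SELECT 'Top.Science'::ltree @> 'Top.Science'::ltree;
+ </programlisting>
+ <para>
+ The <@ operator below is simply the same test with the arguments swapped.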
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree <@ ltree </literal>
+ - returns TRUE if the left argument is a descendant of the right argument
+ (or equal).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree ~ lquery, lquery ~ ltree</literal>
+ - return TRUE if the node represented by ltree satisfies lquery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree ? lquery[], lquery ? ltree[]</literal>
+ - return TRUE if the node represented by ltree satisfies at least one
+ lquery from the array.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree @ ltxtquery, ltxtquery @ ltree</literal>
+ - return TRUE if the node represented by ltree satisfies ltxtquery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree || ltree, ltree || text, text || ltree</literal>
+ - return the concatenated ltree.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ Operations for arrays of ltree (<literal>ltree[]</literal>):
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>ltree[] @> ltree, ltree <@ ltree[]</literal>
+ - returns TRUE if the array ltree[] contains an ancestor of ltree.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree @> ltree[], ltree[] <@ ltree</literal>
+ - returns TRUE if the array ltree[] contains a descendant of ltree.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] ~ lquery, lquery ~ ltree[]</literal>
+ - returns TRUE if the array ltree[] contains a label path matching lquery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] ? lquery[], lquery[] ? ltree[]</literal>
+ - returns TRUE if the array ltree[] contains a label path matching at
+ least one lquery from the array.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] @ ltxtquery, ltxtquery @ ltree[]</literal>
+ - returns TRUE if the array ltree[] contains a label path matching
+ ltxtquery (full text search).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery</literal>
+
+ - returns the first element of the array ltree[] that satisfies the
+ corresponding condition, or NULL if none does.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Remark</title>
+
+ <para>
+ Operations <literal><@</literal>, <literal>@></literal>, <literal>@</literal> and
+ <literal>~</literal> have analogues - <literal>^<@, ^@>, ^@, ^~,</literal> which don't use
+ indices!
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Indices</title>
+ <para>
+ Various indices can be created to speed up execution of operations:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ B-tree index over ltree: <literal><, <=, =, >=, ></literal>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ GiST index over ltree: <literal><, <=, =, >=, >, @>, <@, @, ~, ?</literal>
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+ CREATE INDEX path_gist_idx ON test USING GIST (path);
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>GiST index over ltree[]:
+ <literal>ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.</literal>
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+ CREATE INDEX path_gist_idx ON test USING GIST (array_path);
+ </programlisting>
+ <para>
+ Note: this index is lossy.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Functions</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>ltree subltree(ltree, start, end)</literal>
+ returns the subpath of ltree from position start to position end-1
+ (counting from 0).
+ </para>
+ <programlisting>
+ # select subltree('Top.Child1.Child2',1,2);
+ subltree
+ --------
+ Child1
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree subpath(ltree, OFFSET,LEN)</literal> and
+ <literal>ltree subpath(ltree, OFFSET)</literal>
+ return the subpath of ltree starting at position OFFSET, with length LEN.
+ If OFFSET is negative, the subpath starts that far from the end
+ of the path. If LEN is omitted, it returns everything to the end
+ of the path. If LEN is negative, it leaves that many labels off
+ the end of the path.
+ </para>
+ <programlisting>
+ # select subpath('Top.Child1.Child2',1,2);
+ subpath
+ -------
+ Child1.Child2
+
+ # select subpath('Top.Child1.Child2',-2,1);
+ subpath
+ ---------
+ Child1
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>int4 nlevel(ltree)</literal> - returns the level of the node
+ (the number of labels in the path).
+ </para>
+ <programlisting>
+ # select nlevel('Top.Child1.Child2');
+ nlevel
+ --------
+ 3
+ </programlisting>
+ <para>
+ Note that the arguments start, end, OFFSET, and LEN are measured in
+ levels (labels) of the path!
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>int4 index(ltree,ltree)</literal> and
+ <literal>int4 index(ltree,ltree,OFFSET)</literal>
+ return the level of the first occurrence of the second argument within
+ the first one, searching from OFFSET; if OFFSET is negative, the search
+ begins |OFFSET| levels from the end of the path.
+ </para>
+ <programlisting>
+ SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
+ index
+ -------
+ 6
+ SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
+ index
+ -------
+ 9
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree text2ltree(text)</literal> and
+ <literal>text ltree2text(ltree)</literal> - cast functions between ltree
+ and text.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree lca(ltree,ltree,...) (up to 8 arguments)</literal> and
+ <literal>ltree lca(ltree[])</literal> return the Lowest Common Ancestor (lca).
+ </para>
+ <programlisting>
+ # select lca('1.2.2.3','1.2.3.4.5.6');
+ lca
+ -----
+ 1.2
+ # select lca('{la.2.3,1.2.3.4.5.6}') is null;
+ ?column?
+ ----------
+ f
+ </programlisting>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Installation</title>
+ <programlisting>
+ cd contrib/ltree
+ make
+ make install
+ make installcheck
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Example</title>
+ <programlisting>
+ createdb ltreetest
+ psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
+ psql ltreetest < ltreetest.sql
+ </programlisting>
+
+ <para>
+Now we have a database ltreetest populated with data describing the
+hierarchy shown below:
+ </para>
+
+ <programlisting>
+
+
+                        TOP
+                     /   |   \
+             Science Hobbies Collections
+                 /       |       \
+        Astronomy Amateurs_Astronomy Pictures
+           /  \                        |
+  Astrophysics Cosmology           Astronomy
+                                    /  |  \
+                             Galaxies Stars Astronauts
+ </programlisting>
+ <para>
+ Inheritance:
+ </para>
+
+ <programlisting>
+ltreetest=# select path from test where path <@ 'Top.Science';
+                path
+------------------------------------
+ Top.Science
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+(4 rows)
+ </programlisting>
+ <para>
+ Matching:
+ </para>
+ <programlisting>
+ltreetest=# select path from test where path ~ '*.Astronomy.*';
+                     path
+-----------------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+ Top.Collections.Pictures.Astronomy
+ Top.Collections.Pictures.Astronomy.Stars
+ Top.Collections.Pictures.Astronomy.Galaxies
+ Top.Collections.Pictures.Astronomy.Astronauts
+(7 rows)
+ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
+                path
+------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+ <para>
+ Full text search:
+ </para>
+ <programlisting>
+ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
+                path
+------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+
Top.Science.Astronomy.Cosmology
+ Top.Hobbies.Amateurs_Astronomy
+(4 rows)
+
+ltreetest=# select path from test where path @ 'Astro* & !pictures@';
+                path
+------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+ <para>
+ Using Functions:
+ </para>
+ <programlisting>
+ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
+                 ?column?
+------------------------------------------
+ Top.Science.Space.Astronomy
+ Top.Science.Space.Astronomy.Astrophysics
+ Top.Science.Space.Astronomy.Cosmology
+(3 rows)
+We can create an SQL function:
+CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
+AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
+LANGUAGE SQL IMMUTABLE;
+ </programlisting>
+ <para>
+ and the previous select can be rewritten as:
+ </para>
+
+ <programlisting>
+ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
+                ins_label
+------------------------------------------
+ Top.Science.Space.Astronomy
+ Top.Science.Space.Astronomy.Astrophysics
+ Top.Science.Space.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+
+ <para>
+ Or with different arguments:
+ </para>
+
+ <programlisting>
+CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
+AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
+LANGUAGE SQL IMMUTABLE;
+
+ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
+                ins_label
+------------------------------------------
+ Top.Science.Space.Astronomy
+ Top.Science.Space.Astronomy.Astrophysics
+ Top.Science.Space.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Additional data</title>
+ <para>
+ To get a better feel for the ltree module you can download
+ dmozltree-eng.sql.gz (about a 3 MB tar.gz archive containing 300,274
+ nodes), available from
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist/ltree/"></ulink>.
+ This is the DMOZ catalogue, prepared for use with ltree. Set up your test
+ database (dmoz), load the ltree module, and issue the command:
+ </para>
+ <programlisting>
+ zcat dmozltree-eng.sql.gz| psql dmoz
+ </programlisting>
+ <para>
+ The data will be loaded into the database dmoz and all indices will be
+ created.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Benchmarks</title>
+ <para>
+ All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750 MHz) using
+ DMOZ data, containing 300,274 nodes (see above for the download link). We
+ used some basic queries typical for walking through a catalog.
+ </para>
+
+ <sect3>
+ <title>Queries</title>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Q0: Count all rows (sort of a base time for comparison)
+ </para>
+ <programlisting>
+ select count(*) from dmoz;
+  count
+ --------
+  300274
+ (1 row)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q1: Get direct children (without inheritance)
+ </para>
+ <programlisting>
+ select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
+                path
+ -----------------------------------
+  Top.Adult.Arts.Animation.Cartoons
+  Top.Adult.Arts.Animation.Anime
+ (2 rows)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q2: The same as Q1 but with counting of successors
+ </para>
+ <programlisting>
+ select path as parentpath , (select count(*)-1 from dmoz where path <@
+ p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
+             parentpath             | count
+ -----------------------------------+-------
+  Top.Adult.Arts.Animation.Cartoons |     2
+  Top.Adult.Arts.Animation.Anime    |    61
+ (2 rows)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q3: Get all parents
+ </para>
+ <programlisting>
+ select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
+ path asc;
+            path
+ --------------------------
+  Top
+  Top.Adult
+  Top.Adult.Arts
+  Top.Adult.Arts.Animation
+ (4 rows)
+
</programlisting> + </listitem> + <listitem> + <para> + Q4: Get all parents with counting of children + </para> + <programlisting> + select path, (select count(*)-1 from dmoz where path <@ p.path) as count + from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc; + path | count + --------------------------+-------- + Top | 300273 + Top.Adult | 4913 + Top.Adult.Arts | 339 + Top.Adult.Arts.Animation | 65 + (4 rows) + </programlisting> + </listitem> + <listitem> + <para> + Q5: Get all children with levels + </para> + <programlisting> + select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level + from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc; + path | level + ------------------------------------------------+------- + Top.Adult.Arts.Animation.Anime | 1 + Top.Adult.Arts.Animation.Anime.Fan_Works | 2 + Top.Adult.Arts.Animation.Anime.Games | 2 + Top.Adult.Arts.Animation.Anime.Genres | 2 + Top.Adult.Arts.Animation.Anime.Image_Galleries | 2 + Top.Adult.Arts.Animation.Anime.Multimedia | 2 + Top.Adult.Arts.Animation.Anime.Resources | 2 + Top.Adult.Arts.Animation.Anime.Titles | 2 + Top.Adult.Arts.Animation.Cartoons | 1 + Top.Adult.Arts.Animation.Cartoons.AVS | 2 + Top.Adult.Arts.Animation.Cartoons.Members | 2 + (11 rows) + </programlisting> + </listitem> + </itemizedlist> + </sect3> + + <sect3> + <title>Timings</title> + <programlisting> ++---------------------------------------------+ +|Query|Rows|Time (ms) index|Time (ms) no index| +|-----+----+---------------+------------------| +| Q0| 1| NA| 1453.44| +|-----+----+---------------+------------------| +| Q1| 2| 0.49| 1001.54| +|-----+----+---------------+------------------| +| Q2| 2| 1.48| 3009.39| +|-----+----+---------------+------------------| +| Q3| 4| 0.55| 906.98| +|-----+----+---------------+------------------| +| Q4| 4| 24385.07| 4951.91| +|-----+----+---------------+------------------| +| Q5| 11| 0.85| 1003.23| ++---------------------------------------------+ + 
</programlisting>
+ <para>
+ The timings without indices were obtained using the operations which
+ don't use indices (see above).
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Remarks</title>
+ <para>
+ We didn't run full-scale tests, and we haven't yet presented data for
+ operations with arrays of ltree (ltree[]) or full text searching. We'll
+ appreciate your input. So far, some (rather obvious) results:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Indices do help query execution
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Q4 performs badly because one needs to read almost all data from the HDD
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect3>
+ </sect2>
+ <sect2>
+ <title>Some Background</title>
+ <para>
+ The approach we use for ltree is much like the one we used in our other
+ GiST-based contrib modules (intarray, tsearch, tree, btree_gist,
+ rtree_gist). The theoretical background is available in papers referenced
+ from our GiST development page
+ (<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>).
+ </para>
+ <para>
+ A hierarchical data structure (tree) is a set of nodes. Each node has a
+ signature (LPS) of a fixed size, which is a hashed label path of that node.
+ While traversing the tree, we can *certainly* prune branches if
+ </para>
+ <programlisting>
+ LQS (bitwise AND) LPS != LQS
+ </programlisting>
+ <para>
+ where LQS is the signature of the lquery or ltxtquery, obtained in the
+ same way as LPS.
+ </para>
+ <programlisting>
+ ltree[]:
+ </programlisting>
+ <para>
+ For an array of ltree, LPS is the bitwise OR of the signatures of *ALL*
+ children reachable from that node. Signatures are stored in an RD-tree,
+ implemented using GiST, which provides indexed access.
+ </para>
+ <programlisting>
+ ltree:
+ </programlisting>
+ <para>
+ For ltree we store LPS in a B-tree, implemented using GiST.
Each node entry is
+ represented by (left_bound, signature, right_bound), so that we can speed
+ up the operations <literal><, <=, =, >=, ></literal> using left_bound and
+ right_bound, and prune branches of the tree using the signature.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Authors</title>
+ <para>
+ All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
+ Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
+ additional information. The authors would like to thank Eugeny Rodichev for
+ helpful discussions. Comments and bug reports are welcome.
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/oid2name.sgml b/doc/src/sgml/oid2name.sgml
new file mode 100644
index 00000000000..2c5c396522f
--- /dev/null
+++ b/doc/src/sgml/oid2name.sgml
@@ -0,0 +1,205 @@
+<sect1 id="oid2name">
+ <title>oid2name</title>
+
+ <indexterm zone="oid2name">
+ <primary>oid2name</primary>
+ </indexterm>
+
+ <para>
+ This utility allows administrators to examine the file structure used by
+ PostgreSQL. To make use of it, you need to be familiar with the file
+ structure, which is described in <xref linkend="storage">.
+ </para>
+
+ <sect2>
+ <title>Overview</title>
+ <para>
+ <literal>oid2name</literal> connects to the database and extracts OID,
+ filenode, and table name information. You can also have it show database
+ OIDs and tablespace OIDs.
+ </para>
+ <para>
+ When displaying specific tables, you can select which tables to show by
+ using -o, -f and -t. The first switch takes an OID, the second takes
+ a filenode, and the third takes a table name (actually, it's a LIKE
+ pattern, so you can use things like "foo%"). Note that you can use as many
+ of these switches as you like, and the listing will include all objects
+ matched by any of the switches. Also note that these switches can only
+ show objects in the database given in -d.
+ </para>
+ <para>
+ If you don't give any of -o, -f or -t, it will dump all the tables in the
+ database given in -d. If you don't give -d, it will show a database
+ listing. Alternatively, you can give -s to get a tablespace listing.
+ </para>
+ <table>
+ <title>Additional switches</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>-i</literal></entry>
+ <entry>include indexes and sequences in the database listing.</entry>
+ </row>
+ <row>
+ <entry><literal>-x</literal></entry>
+ <entry>display more information about each object shown: tablespace name,
+ schema name, OID.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-S</literal></entry>
+ <entry>also show system objects (those in the information_schema, pg_toast
+ and pg_catalog schemas)
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-q</literal></entry>
+ <entry>don't display headers (useful for scripting)</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+
+ <programlisting>
+$ oid2name
+All databases:
+    Oid  Database Name  Tablespace
+----------------------------------
+  17228       alvherre  pg_default
+  17255     regression  pg_default
+  17227      template0  pg_default
+      1      template1  pg_default
+
+$ oid2name -s
+All tablespaces:
+     Oid  Tablespace Name
+-------------------------
+    1663       pg_default
+    1664        pg_global
+  155151         fastdisk
+  155152          bigdisk
+
+$ cd $PGDATA/17228
+
+$ # get top 10 db objects in the default tablespace, ordered by size
+$ ls -lS * | head -10
+-rw------- 1 alvherre alvherre 136536064 sep 14 09:51 155173
+-rw------- 1 alvherre alvherre  17965056 sep 14 09:51 1155291
+-rw------- 1 alvherre alvherre   1204224 sep 14 09:51 16717
+-rw------- 1 alvherre alvherre    581632 sep  6 17:51 1255
+-rw------- 1 alvherre alvherre    237568 sep 14 09:50 16674
+-rw------- 1 alvherre alvherre    212992 sep 14 09:51 1249
+-rw------- 1 alvherre alvherre    204800 sep 14 09:51 16684
+-rw------- 1 alvherre alvherre    196608 sep 14 09:50 16700
+-rw------- 1 alvherre alvherre    163840 sep 14 09:50
16699
+-rw------- 1 alvherre alvherre 122880 sep 6 17:51 16751
+
+$ oid2name -d alvherre -f 155173
+From database "alvherre":
+ Filenode Table Name
+----------------------
+ 155173 accounts
+
+$ # you can ask for more than one object
+$ oid2name -d alvherre -f 155173 -f 1155291
+From database "alvherre":
+ Filenode Table Name
+-------------------------
+ 155173 accounts
+ 1155291 accounts_pkey
+
+$ # you can also mix the options, and get more details
+$ oid2name -d alvherre -t accounts -f 1155291 -x
+From database "alvherre":
+ Filenode Table Name Oid Schema Tablespace
+------------------------------------------------------
+ 155173 accounts 155173 public pg_default
+ 1155291 accounts_pkey 1155291 public pg_default
+
+$ # show disk space for every db object
+$ du [0-9]* |
+> while read SIZE FILENODE
+> do
+> echo "$SIZE `oid2name -q -d alvherre -i -f $FILENODE`"
+> done
+16 1155287 branches_pkey
+16 1155289 tellers_pkey
+17561 1155291 accounts_pkey
+...
+
+$ # same, but sort by size
+$ du [0-9]* | sort -rn | while read SIZE FN
+> do
+> echo "$SIZE `oid2name -q -d alvherre -f $FN`"
+> done
+133466 155173 accounts
+17561 1155291 accounts_pkey
+1177 16717 pg_proc_proname_args_nsp_index
+...
+
+$ # If you want to see what's in tablespaces, use the pg_tblspc directory
+$ cd $PGDATA/pg_tblspc
+$ oid2name -s
+All tablespaces:
+ Oid Tablespace Name
+-------------------------
+ 1663 pg_default
+ 1664 pg_global
+ 155151 fastdisk
+ 155152 bigdisk
+
+$ # what databases have objects in tablespace "fastdisk"?
+$ ls -d 155151/*
+155151/17228/ 155151/PG_VERSION
+
+$ # Oh, what was database 17228 again?
+$ oid2name
+All databases:
+ Oid Database Name Tablespace
+----------------------------------
+ 17228 alvherre pg_default
+ 17255 regression pg_default
+ 17227 template0 pg_default
+ 1 template1 pg_default
+
+$ # Let's see what objects this database has in the tablespace. 
+$ cd 155151/17228
+$ ls -l
+total 0
+-rw------- 1 postgres postgres 0 sep 13 23:20 155156
+
+$ # OK, this is a pretty small table ... but which one is it?
+$ oid2name -d alvherre -f 155156
+From database "alvherre":
+ Filenode Table Name
+----------------------
+ 155156 foo
+
+$ # end of sample session.
+ </programlisting>
+
+ <para>
+ You can also get approximate size data for each object using psql. For
+ example,
+ </para>
+ <programlisting>
+ SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC;
+ </programlisting>
+ <para>
+ Each page is typically 8 kB. relpages is updated by VACUUM.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ b. palmer, <email>bpalmer@crimelabs.net</email>
+ </para>
+ </sect2>
+
+</sect1>
+ 
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml new file mode 100644 index 00000000000..3fe1edf378f --- /dev/null +++ b/doc/src/sgml/pageinspect.sgml @@ -0,0 +1,125 @@ +
+<sect1 id="pageinspect">
+ <title>pageinspect</title>
+
+ <indexterm zone="pageinspect">
+ <primary>pageinspect</primary>
+ </indexterm>
+
+ <para>
+ The functions in this module allow you to inspect the contents of data pages
+ at a low level, for debugging purposes.
+ </para>
+
+ <sect2>
+ <title>Functions included</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>get_raw_page</literal> reads one block of the named table and returns a copy as a
+ bytea field. This allows a single time-consistent copy of the block to be
+ made. Use of this function is restricted to superusers.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>page_header</literal> shows fields which are common to all PostgreSQL heap and index
+ pages. Use of this function is restricted to superusers. 
+ </para>
+ <para>
+ A page image obtained with <literal>get_raw_page</literal> should be passed as the argument:
+ </para>
+ <programlisting>
+ test=# SELECT * FROM page_header(get_raw_page('pg_class',0));
+ lsn | tli | flags | lower | upper | special | pagesize | version
+ ----------+-----+-------+-------+-------+---------+----------+---------
+ 0/3C5614 | 1 | 1 | 216 | 256 | 8192 | 8192 | 4
+ (1 row)
+ </programlisting>
+ <para>
+ The returned columns correspond to the fields in the PageHeaderData struct;
+ see src/include/storage/bufpage.h for more details.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>heap_page_items</literal> shows all line pointers on a heap page. For those line
+ pointers that are in use, tuple headers are also shown. All tuples are
+ shown, whether or not the tuples were visible to an MVCC snapshot at the
+ time the raw page was copied. Use of this function is restricted to
+ superusers.
+ </para>
+ <para>
+ A heap page image obtained with <literal>get_raw_page</literal> should be passed as the argument:
+ </para>
+ <programlisting>
+ test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
+ </programlisting>
+ <para>
+ See src/include/storage/itemid.h and src/include/access/htup.h for
+ explanations of the fields returned. 
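Outside the server, the raw bytes returned by <literal>get_raw_page</literal> can be decoded with any struct-unpacking tool. A hedged Python sketch of the fixed header fields shown in the <literal>page_header</literal> example above (layout and offsets are assumptions based on bufpage.h of this era; little-endian hardware assumed):

```python
import struct

def parse_page_header(page: bytes) -> dict:
    """Decode the fixed fields at the start of a heap/index page.
    Assumed layout (from src/include/storage/bufpage.h of this era):
    pd_lsn (2 x uint32), then pd_tli, pd_flags, pd_lower, pd_upper,
    pd_special, pd_pagesize_version (all uint16), little-endian."""
    xlogid, xrecoff, tli, flags, lower, upper, special, psv = \
        struct.unpack_from("<IIHHHHHH", page, 0)
    return {
        "lsn": f"{xlogid:X}/{xrecoff:X}",
        "tli": tli,
        "flags": flags,
        "lower": lower,
        "upper": upper,
        "special": special,
        "pagesize": psv & 0xFF00,   # page size lives in the high bits
        "version": psv & 0x00FF,    # layout version in the low byte
    }

# Synthetic header mirroring the page_header() example output above
hdr = struct.pack("<IIHHHHHH", 0x0, 0x3C5614, 1, 1, 216, 256, 8192, 8192 | 4)
print(parse_page_header(hdr))
```

In practice you would feed this the bytea returned by <literal>get_raw_page</literal>; the synthetic header here only demonstrates the field packing.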
+ </para> + </listitem> + + <listitem> + <para> + <literal>bt_metap()</literal> returns information about the btree index metapage: + </para> + <programlisting> + test=> SELECT * FROM bt_metap('pg_cast_oid_index'); + -[ RECORD 1 ]----- + magic | 340322 + version | 2 + root | 1 + level | 0 + fastroot | 1 + fastlevel | 0 + </programlisting> + </listitem> + + <listitem> + <para> + <literal>bt_page_stats()</literal> shows information about single btree pages: + </para> + <programlisting> + test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1); + -[ RECORD 1 ]-+----- + blkno | 1 + type | l + live_items | 256 + dead_items | 0 + avg_item_size | 12 + page_size | 8192 + free_size | 4056 + btpo_prev | 0 + btpo_next | 0 + btpo | 0 + btpo_flags | 3 + </programlisting> + </listitem> + + <listitem> + <para> + <literal>bt_page_items()</literal> returns information about specific items on btree pages: + </para> + <programlisting> + test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1); + itemoffset | ctid | itemlen | nulls | vars | data + ------------+---------+---------+-------+------+------------- + 1 | (0,1) | 12 | f | f | 23 27 00 00 + 2 | (0,2) | 12 | f | f | 24 27 00 00 + 3 | (0,3) | 12 | f | f | 25 27 00 00 + 4 | (0,4) | 12 | f | f | 26 27 00 00 + 5 | (0,5) | 12 | f | f | 27 27 00 00 + 6 | (0,6) | 12 | f | f | 28 27 00 00 + 7 | (0,7) | 12 | f | f | 29 27 00 00 + 8 | (0,8) | 12 | f | f | 2a 27 00 00 + </programlisting> + </listitem> + </itemizedlist> + </sect2> +</sect1> + diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml new file mode 100644 index 00000000000..7f73dfa9eb3 --- /dev/null +++ b/doc/src/sgml/pgbench.sgml @@ -0,0 +1,422 @@ + +<sect1 id="pgbench"> + <title>pgbench</title> + + <indexterm zone="pgbench"> + <primary>pgbench</primary> + </indexterm> + + <para> + <literal>pgbench</literal> is a simple program to run a benchmark test. + <literal>pgbench</literal> is a client application of PostgreSQL and runs + with PostgreSQL only. 
It repeatedly runs small, simple transactions
+ involving SELECT/UPDATE/INSERT operations, then reports the number of
+ transactions successfully completed within a second (transactions
+ per second, tps). The test data includes a table with at least 100k
+ tuples.
+ </para>
+ <para>
+ Example output from pgbench looks like:
+ </para>
+ <programlisting>
+number of clients: 4
+number of transactions per client: 100
+number of processed transactions: 400/400
+tps = 19.875015(including connections establishing)
+tps = 20.098827(excluding connections establishing)
+ </programlisting>
+ <para> A similar program called "JDBCBench" already exists, but it requires
+ Java, which may not be available on every platform. Moreover, some
+ people are concerned that the overhead of Java might lead to
+ inaccurate results. So I decided to write it in pure C, and named
+ it "pgbench."
+ </para>
+
+ <para>
+ Features of pgbench:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ pgbench is written in C using libpq only, so it is very portable
+ and easy to install.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ pgbench can simulate concurrent connections using the asynchronous
+ capabilities of libpq. No threading is required.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <sect2>
+ <title>Overview</title>
+ <orderedlist>
+ <listitem>
+ <para>(Optional) Initialize the database with:</para>
+ <programlisting>
+pgbench -i <dbname>
+ </programlisting>
+ <para>
+ where <dbname> is the name of the database. pgbench uses four tables:
+ accounts, branches, history and tellers. These tables will be
+ destroyed, so be very careful if you have existing tables of the
+ same names. The default test data contains:
+ </para>
+ <programlisting>
+table # of tuples
+-------------------------
+branches 1
+tellers 10
+accounts 100000
+history 0
+ </programlisting>
+ <para>
+ You can increase the number of tuples using the -s option. The branches,
+ tellers and accounts tables are created with a fillfactor, which is
+ set using the -F option. 
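The default row counts above scale linearly with the -s option; a quick sketch (the helper name is ours, for illustration only, not part of pgbench):

```python
def pgbench_init_rows(scale=1):
    """Rows created by `pgbench -i -s <scale>`, per the defaults above.
    (Illustrative helper only -- not part of pgbench itself.)"""
    return {
        "branches": 1 * scale,
        "tellers": 10 * scale,
        "accounts": 100000 * scale,
        "history": 0,   # empty at initialization; filled during a run
    }

# -s 100 implies 10,000,000 rows in the accounts table
print(pgbench_init_rows(100)["accounts"])
```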
See the option table below.
+ </para>
+ </listitem>
+ <listitem>
+ <para>Run the benchmark test:</para>
+ <programlisting>
+pgbench <dbname>
+ </programlisting>
+ <para>
+ The default configuration is:
+ </para>
+ <programlisting>
+ number of clients: 1
+ number of transactions per client: 10
+ </programlisting>
+ </listitem>
+ </orderedlist>
+
+ <table>
+ <title><literal>pgbench</literal> options</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Parameter</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>-h hostname</literal></entry>
+ <entry>
+ <para>
+ Hostname where the backend is running. If this option
+ is omitted, pgbench connects to localhost via a
+ Unix domain socket.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-p port</literal></entry>
+ <entry>
+ <para>
+ The port number the backend is listening on. The default is
+ libpq's default, usually 5432.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-c number_of_clients</literal></entry>
+ <entry>
+ <para>
+ Number of clients simulated. The default is 1.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-t number_of_transactions</literal></entry>
+ <entry>
+ <para>
+ Number of transactions each client runs. The default is 10.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-s scaling_factor</literal></entry>
+ <entry>
+ <para>
+ This should be used with the -i (initialize) option.
+ The number of tuples generated will be a multiple of the
+ scaling factor. For example, -s 100 implies 10M
+ (10,000,000) tuples in the accounts table.
+ The default is 1.
+ </para>
+ <para>
+ NOTE: the scaling factor should be at least
+ as large as the largest number of clients you intend
+ to test; otherwise you'll mostly be measuring update contention.
+ Regular (not initializing) runs using one of the
+ built-in tests will detect the scale from the number of
+ branches in the database. For custom (-f) runs it can
+ be specified manually with this parameter. 
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-D varname=value</literal></entry>
+ <entry>
+ <para>
+ Define a variable. It can be referred to by a script
+ provided with the -f option. Multiple -D options are allowed.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-U login</literal></entry>
+ <entry>
+ <para>
+ Specify the database user's login name if it is different from
+ the Unix login name.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-P password</literal></entry>
+ <entry>
+ <para>
+ Specify the database password. CAUTION: using this option
+ might be a security hole since the ps command will
+ show the password. Use this for TESTING PURPOSES ONLY.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-n</literal></entry>
+ <entry>
+ <para>
+ Do not vacuum or clean the history table before
+ running the test.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-v</literal></entry>
+ <entry>
+ <para>
+ Do vacuuming before testing. This will take some time.
+ With neither -n nor -v, pgbench will vacuum the tellers and
+ branches tables only.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-S</literal></entry>
+ <entry>
+ <para>
+ Perform select-only transactions instead of the TPC-B-like test.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-N</literal></entry>
+ <entry>
+ <para>
+ Do not update "branches" and "tellers". This avoids
+ heavy update contention on the branches and tellers tables,
+ but the transactions are then no longer TPC-B-like.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-f filename</literal></entry>
+ <entry>
+ <para>
+ Read the transaction script from a file. A detailed
+ explanation appears later.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-C</literal></entry>
+ <entry>
+ <para>
+ Establish a connection for each transaction, rather than
+ doing it just once at the beginning as in the normal
+ mode. This is useful for measuring connection overhead. 
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-l</literal></entry>
+ <entry>
+ <para>
+ Write the time taken by each transaction to a logfile,
+ with the name "pgbench_log.xxx", where xxx is the PID
+ of the pgbench process. The format of the log is:
+ </para>
+ <programlisting>
+ client_id transaction_no time file_no time-epoch time-us
+ </programlisting>
+ <para>
+ where time is measured in microseconds, file_no indicates
+ which test file was used (useful when multiple were
+ specified with -f), and time-epoch/time-us are a
+ UNIX epoch format timestamp followed by an offset
+ in microseconds (suitable for creating an ISO 8601
+ timestamp with a fraction of a second) of when
+ the transaction completed.
+ </para>
+ <para>
+ Here is some example output:
+ </para>
+ <programlisting>
+ 0 199 2241 0 1175850568 995598
+ 0 200 2465 0 1175850568 998079
+ 0 201 2513 0 1175850569 608
+ 0 202 2038 0 1175850569 2663
+ </programlisting>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-F fillfactor</literal></entry>
+ <entry>
+ <para>
+ Create the tables (accounts, tellers and branches) with the given
+ fillfactor. The default is 100. This should be used with the -i
+ (initialize) option.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-d</literal></entry>
+ <entry>
+ <para>
+ Debug option. 
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>What is the "transaction" actually performed in pgbench?</title>
+ <orderedlist>
+ <listitem><para><literal>begin;</literal></para></listitem>
+
+ <listitem><para><literal>update accounts set abalance = abalance + :delta where aid = :aid;</literal></para></listitem>
+
+ <listitem><para><literal>select abalance from accounts where aid = :aid;</literal></para></listitem>
+
+ <listitem><para><literal>update tellers set tbalance = tbalance + :delta where tid = :tid;</literal></para></listitem>
+
+ <listitem><para><literal>update branches set bbalance = bbalance + :delta where bid = :bid;</literal></para></listitem>
+
+ <listitem><para><literal>insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);</literal></para></listitem>
+
+ <listitem><para><literal>end;</literal></para></listitem>
+ </orderedlist>
+ <para>
+ If you specify -N, steps (4) and (5) are not included in the transaction.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Script file</title>
+ <para>
+ <literal>pgbench</literal> has support for reading a transaction script
+ from a specified file (<literal>-f</literal> option). The file should
+ contain one SQL command per line; SQL commands spanning multiple
+ lines are not supported. Empty lines and lines beginning with "--" are ignored.
+ </para>
+ <para>
+ Multiple <literal>-f</literal> options are allowed. In this case each
+ transaction is assigned a randomly chosen script.
+ </para>
+ <para>
+ SQL commands can include "meta commands", which begin with "\" (back
+ slash). A meta command takes arguments separated by white
+ space. Currently the following meta commands are supported:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>\set name operand1 [ operator operand2 ]</literal>
+ - Sets variable "name" to the value calculated from "operand1"
+ "operator" "operand2". 
If "operator" and "operand2"
+ are omitted, the value of "operand1" is assigned to variable "name".
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+\set ntellers 10 * :scale
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>\setrandom name min max</literal>
+ - Assigns a random integer between min and max to variable "name".
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+\setrandom aid 1 100000
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Variables can be referred to in SQL commands by prefixing the
+ variable name with ":".
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+SELECT abalance FROM accounts WHERE aid = :aid
+ </programlisting>
+ <para>
+ Variables can also be defined with the -D option.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+ <para>
+ For example, a TPC-B-like benchmark can be defined as follows (scaling
+ factor = 1):
+ </para>
+ <programlisting>
+\set nbranches :scale
+\set ntellers 10 * :scale
+\set naccounts 100000 * :scale
+\setrandom aid 1 :naccounts
+\setrandom bid 1 :nbranches
+\setrandom tid 1 :ntellers
+\setrandom delta 1 10000
+BEGIN
+UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
+SELECT abalance FROM accounts WHERE aid = :aid
+UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
+UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
+INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
+END
+ </programlisting>
+ <para>
+ If you want to automatically set the scaling factor from the number of
+ tuples in the branches table, use the -s option and a shell command
+ like this:
+ </para>
+ <programlisting>
+pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
+ </programlisting>
+ <para>
+ Notice that with the -f option pgbench does not vacuum or clear the
+ history table before starting the benchmark. 
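To make the meta-command semantics concrete, here is a small Python sketch that interprets the \set / \setrandom subset and the :variable substitution used by scripts like the one above. This is an illustration of the semantics only, not pgbench's actual implementation (which is C):

```python
import random
import re

def run_script(lines, variables=None):
    """Interpret a pgbench-style script: \\set, \\setrandom, and
    :variable substitution in plain SQL lines (semantics sketch only)."""
    vars_ = dict(variables or {})   # seeded from -s / -D, e.g. {"scale": 1}
    subst = lambda s: re.sub(r":(\w+)", lambda m: str(vars_[m.group(1)]), s)
    sql = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("--"):
            continue                # empty and comment lines are ignored
        if line.startswith("\\setrandom "):
            _, name, lo, hi = subst(line).split()
            vars_[name] = random.randint(int(lo), int(hi))
        elif line.startswith("\\set "):
            # \set name operand1 [operator operand2]
            parts = subst(line).split()
            # eval() is a toy stand-in for pgbench's tiny expression grammar;
            # never use it on untrusted input
            vars_[parts[1]] = int(eval(" ".join(parts[2:])))
        else:
            sql.append(subst(line))  # a plain SQL line, variables expanded
    return sql, vars_

sql, v = run_script(
    ["\\set ntellers 10 * :scale",
     "\\setrandom tid 1 :ntellers",
     "SELECT tbalance FROM tellers WHERE tid = :tid"],
    {"scale": 2})
```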
+ </para>
+ </sect2>
+
+</sect1>
+ 
diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml new file mode 100644 index 00000000000..4da29e03779 --- /dev/null +++ b/doc/src/sgml/pgcrypto.sgml @@ -0,0 +1,1144 @@ +
+<sect1 id="pgcrypto">
+ <title>pgcrypto</title>
+
+ <indexterm zone="pgcrypto">
+ <primary>pgcrypto</primary>
+ </indexterm>
+
+ <para>
+ This module provides cryptographic functions for PostgreSQL.
+ </para>
+
+ <sect2>
+ <title>Notes</title>
+ <sect3>
+ <title>Configuration</title>
+ <para>
+ pgcrypto configures itself according to the findings of the main PostgreSQL
+ <literal>configure</literal> script. The options that affect it are
+ <literal>--with-zlib</literal> and <literal>--with-openssl</literal>.
+ </para>
+ <para>
+ When compiled with zlib, PGP encryption functions are able to
+ compress data before encrypting.
+ </para>
+ <para>
+ When compiled with OpenSSL, there will be more algorithms available.
+ Public-key encryption functions will also be faster, as OpenSSL
+ has more optimized BIGNUM functions. 
+ </para>
+ <para>
+ Summary of functionality with and without OpenSSL:
+ </para>
+ <table>
+ <title>Summary of functionality with and without OpenSSL</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Functionality</entry>
+ <entry>built-in</entry>
+ <entry>OpenSSL</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>MD5</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>SHA1</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>SHA224/256/384/512</entry>
+ <entry>yes</entry>
+ <entry>yes (3)</entry>
+ </row>
+ <row>
+ <entry>Any other digest algo</entry>
+ <entry>no</entry>
+ <entry>yes (1)</entry>
+ </row>
+ <row>
+ <entry>Blowfish</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>AES</entry>
+ <entry>yes</entry>
+ <entry>yes (2)</entry>
+ </row>
+ <row>
+ <entry>DES/3DES/CAST5</entry>
+ <entry>no</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>Raw encryption</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>PGP Symmetric encryption</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>PGP Public-Key encryption</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <orderedlist>
+ <listitem>
+ <para>
+ Any digest algorithm OpenSSL supports is automatically picked up.
+ This is not possible with ciphers, which need to be supported
+ explicitly.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ AES has been included in OpenSSL since version 0.9.7. If pgcrypto is
+ compiled against an older version, it will use the built-in AES code,
+ so AES is always available.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SHA2 algorithms were added to OpenSSL in version 0.9.8. For
+ older versions, pgcrypto will use built-in code.
+ </para>
+ </listitem>
+ </orderedlist>
+ </sect3>
+
+ <sect3>
+ <title>NULL handling</title>
+ <para>
+ As is standard in SQL, all functions return NULL if any of the arguments
+ are NULL. 
This may create security risks if used carelessly.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Security</title>
+ <para>
+ All the functions here run inside the database server. That means that all
+ the data and passwords move between pgcrypto and the client application in
+ clear text. Thus you must:
+ </para>
+
+ <orderedlist>
+ <listitem>
+ <para>Connect locally or use SSL connections.</para>
+ </listitem>
+ <listitem>
+ <para>Trust both the system and database administrator.</para>
+ </listitem>
+ </orderedlist>
+ <para>
+ If you cannot, then it is better to do the crypto inside the client
+ application.
+ </para>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>General hashing</title>
+
+ <sect3>
+ <title><literal>digest(data, type)</literal></title>
+ <programlisting>
+ digest(data text, type text) RETURNS bytea
+ digest(data bytea, type text) RETURNS bytea
+ </programlisting>
+ <para>
+ Here type is the algorithm to use. Standard algorithms are `md5` and
+ `sha1`, although there may be more supported, depending on build
+ options.
+ </para>
+ <para>
+ Returns the binary hash.
+ </para>
+ <para>
+ If you want a hexadecimal string, use `encode()` on the result. Example:
+ </para>
+ <programlisting>
+ CREATE OR REPLACE FUNCTION sha1(bytea) RETURNS text AS $$
+ SELECT encode(digest($1, 'sha1'), 'hex')
+ $$ LANGUAGE SQL STRICT IMMUTABLE;
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title><literal>hmac(data, key, type)</literal></title>
+ <programlisting>
+ hmac(data text, key text, type text) RETURNS bytea
+ hmac(data bytea, key text, type text) RETURNS bytea
+ </programlisting>
+ <para>
+ Calculates a hashed MAC over the data. `type` is the same as in `digest()`.
+ If the key is larger than the hash block size, it will first be hashed and
+ the hash will be used as the key.
+ </para>
+ <para>
+ It is similar to digest() but the hash can be recalculated only by knowing
+ the key. This avoids the scenario of someone altering the data and also
+ changing the hash to match.
+ </para>
+ <para>
+ Returns the binary hash. 
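The key-hashing rule can be demonstrated with Python's hmac module, which follows the same RFC 2104 convention as pgcrypto's hmac() (a sketch; SHA-1's block size is 64 bytes):

```python
import hashlib
import hmac

# Per RFC 2104, a key longer than the hash's block size is first hashed,
# and the digest is used as the actual key. Python's hmac module applies
# the same rule, so hashing the key ourselves gives an identical MAC.
long_key = b"k" * 100                       # > SHA-1's 64-byte block size
direct = hmac.new(long_key, b"payload", hashlib.sha1).hexdigest()
prehashed = hmac.new(hashlib.sha1(long_key).digest(),
                     b"payload", hashlib.sha1).hexdigest()
assert direct == prehashed
print(direct)
```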
+ </para>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>Password hashing</title>
+ <para>
+ The functions <literal>crypt()</literal> and <literal>gen_salt()</literal> are specifically designed
+ for hashing passwords. <literal>crypt()</literal> does the hashing and `gen_salt()`
+ prepares algorithm parameters for it.
+ </para>
+ <para>
+ The algorithms in `crypt()` differ from the usual hashing algorithms like
+ MD5 or SHA1 in the following respects:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ They are slow. Since the amount of data is so small, this is the only
+ way to make brute-forcing passwords hard.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ They include a random 'salt' with the result, so that users having the
+ same password will have different crypted passwords. This is also an
+ additional defense against reversing the algorithm.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ They include the algorithm type in the result, so passwords hashed with
+ different algorithms can co-exist.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Some of them are adaptive - that means that as computers get
+ faster, you can tune the algorithm to be slower, without
+ introducing incompatibility with existing passwords.
+ </para>
+ </listitem>
+ </orderedlist>
+
+ <para>
+ Supported algorithms:
+ </para>
+ <programlisting>
+`------`-------------`---------`----------`---------------------------
+ Type Max password Adaptive Salt bits Description
+----------------------------------------------------------------------
+`bf` 72 yes 128 Blowfish-based, variant 2a
+`md5` unlimited no 48 md5-based crypt()
+`xdes` 8 yes 24 Extended DES
+`des` 8 no 12 Original UNIX crypt
+----------------------------------------------------------------------
+ </programlisting>
+
+ <sect3>
+ <title>crypt(password, salt)</title>
+ <programlisting>
+ crypt(password text, salt text) RETURNS text
+ </programlisting>
+ <para>
+ Calculates a UN*X crypt(3)-style hash of the password. 
When storing a new
+ password, you need to use the function `gen_salt()` to generate a new salt.
+ When checking a password, you should use the existing hash as the salt.
+ </para>
+ <para>
+ Example - setting a new password:
+ </para>
+ <programlisting>
+ UPDATE .. SET pswhash = crypt('new password', gen_salt('md5'));
+ </programlisting>
+ <para>
+ Example - authentication:
+ </para>
+ <programlisting>
+ SELECT pswhash = crypt('entered password', pswhash) WHERE .. ;
+ </programlisting>
+ <para>
+ This returns true or false depending on whether the entered password is
+ correct. It can also return NULL if the `pswhash` field is NULL.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>gen_salt(type)</title>
+ <programlisting>
+ gen_salt(type text) RETURNS text
+ </programlisting>
+ <para>
+ Generates a new random salt for use in `crypt()`. For adaptive
+ algorithms, it uses the default iteration count.
+ </para>
+ <para>
+ Accepted types are: `des`, `xdes`, `md5` and `bf`.
+ </para>
+ </sect3>
+ <sect3>
+ <title>gen_salt(type, rounds)</title>
+ <programlisting>
+ gen_salt(type text, rounds integer) RETURNS text
+ </programlisting>
+ <para>
+ Same as above, but lets the user specify the iteration count for some
+ algorithms. The higher the count, the more time it takes to hash
+ the password and therefore the more time to break it. Although with
+ too high a count the time to calculate a hash may be several years
+ - which is somewhat impractical.
+ </para>
+ <para>
+ The count is algorithm specific:
+ </para>
+ <programlisting>
+`-----'---------'-----'----------
+ type default min max
+---------------------------------
+ `xdes` 725 1 16777215
+ `bf` 6 4 31
+---------------------------------
+ </programlisting>
+ <para>
+ In the case of xdes there is an additional limitation that the count must
+ be an odd number.
+ </para>
+ <para>
+ Notes:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Original DES crypt was designed to have a speed of 4 hashes per
+ second on the hardware of that time. 
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Slower than 4 hashes per second would probably dampen usability.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Faster than 100 hashes per second is probably too fast.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ See the next section for possible values for `crypt-bf`.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect3>
+
+ <sect3>
+ <title>Comparison of crypt and regular hashes</title>
+ <para>
+ Here is a table that should give an overview of the relative slowness
+ of different hashing algorithms.
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ The goal is to crack an 8-character password, which consists of:
+ </para>
+ <orderedlist>
+ <listitem><para>only lowercase letters</para></listitem>
+ <listitem><para>numbers, lower- and uppercase letters.</para></listitem>
+ </orderedlist>
+ </listitem>
+ <listitem>
+ <para>
+ The table below shows how much time it would take to try all
+ combinations of characters.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>crypt-bf</literal> is featured in several settings - the number
+ after the slash is the <literal>rounds</literal> parameter of
+ <literal>gen_salt()</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <programlisting>
+`------------'----------'--------------'--------------------
+Algorithm Hashes/sec Chars: [a-z] Chars: [A-Za-z0-9]
+------------------------------------------------------------
+crypt-bf/8 28 246 years 251322 years
+crypt-bf/7 57 121 years 123457 years
+crypt-bf/6 112 62 years 62831 years
+crypt-bf/5 211 33 years 33351 years
+crypt-md5 2681 2.6 years 2625 years
+crypt-des 362837 7 days 19 years
+sha1 590223 4 days 12 years
+md5 2345086 1 day 3 years
+------------------------------------------------------------
+ </programlisting>
+ <itemizedlist>
+ <listitem>
+ <para>
+ The machine used was a 1.5GHz Pentium 4. 
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ crypt-des and crypt-md5 algorithm numbers are taken from
+ John the Ripper v1.6.38 `-test` output.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ MD5 numbers are from mdcrack 1.2.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SHA1 numbers are from lcrack-20031130-beta.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>crypt-bf</literal> numbers are taken using a simple program that
+ loops over 1000 8-character passwords. That way I can show the speed with
+ different numbers of rounds. For reference: <literal>john -test</literal>
+ shows 213 loops/sec for crypt-bf/5. (The small difference in results is
+ consistent with the fact that the <literal>crypt-bf</literal> implementation in pgcrypto
+ is the same one used in John the Ripper.)
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ Note that "try all combinations" is not a realistic exercise.
+ Usually password cracking is done with the help of dictionaries, which
+ contain both regular words and various mutations of them. So even
+ somewhat word-like passwords could be cracked much faster than the above
+ numbers suggest, while a 6-character non-word-like password may escape
+ cracking. Or not.
+ </para>
+ </sect3>
+ </sect2>
+
+
+ <sect2>
+ <title>PGP encryption</title>
+ <para>
+ The functions here implement the encryption part of the OpenPGP (RFC2440)
+ standard. Both symmetric-key and public-key encryption are supported.
+ </para>
+
+ <sect3>
+ <title>Overview</title>
+ <para>
+ An encrypted PGP message consists of 2 packets:
+ </para>
+ <itemizedlist>
+ <listitem><para>Packet for the session key - either symmetric- or public-key encrypted.</para></listitem>
+ <listitem><para>Packet for the session-key encrypted data.</para></listitem>
+ </itemizedlist>
+ <para>
+ When encrypting with a password:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ The given password is hashed using the String2Key (S2K) algorithm. 
This
+ is rather similar to the `crypt()` algorithms - purposefully slow
+ and with a random salt - but it produces a full-length binary key.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If a separate session key is requested, a new random key will be
+ generated. Otherwise the S2K key will be used directly as the session key.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If the S2K key is to be used directly, then only the S2K settings will be
+ put into the session key packet. Otherwise the session key will be
+ encrypted with the S2K key and put into the session key packet.
+ </para>
+ </listitem>
+ </orderedlist>
+ <para>
+ When encrypting with a public key:
+ </para>
+ <orderedlist>
+ <listitem><para>A new random session key is generated.</para></listitem>
+ <listitem><para>It is encrypted using the public key and put into the session key packet.</para></listitem>
+ </orderedlist>
+
+ <para>
+ Now the common part, the session-key encrypted data packet:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ Optional data manipulation: compression, conversion to UTF-8,
+ conversion of line endings.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The data is prefixed with a block of random bytes. This is equivalent
+ to using a random IV.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ A SHA1 hash of the random prefix and data is appended.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ All this is encrypted with the session key.
+ </para>
+ </listitem>
+ </orderedlist>
+ </sect3>
+
+ <sect3>
+ <title><literal>pgp_sym_encrypt(data, psw)</literal></title>
+ <programlisting>
+ pgp_sym_encrypt(data text, psw text [, options text] ) RETURNS bytea
+ pgp_sym_encrypt_bytea(data bytea, psw text [, options text] ) RETURNS bytea
+ </programlisting>
+ <para>
+ Returns a symmetric-key encrypted PGP message.
+ </para>
+ <para>
+ Options are described in section 5.8. 
+   </para>
+  </sect3>
+
+  <sect3>
+   <title><literal>pgp_sym_decrypt(msg, psw)</literal></title>
+   <programlisting>
+ pgp_sym_decrypt(msg bytea, psw text [, options text] ) RETURNS text
+ pgp_sym_decrypt_bytea(msg bytea, psw text [, options text] ) RETURNS bytea
+   </programlisting>
+   <para>
+    Decrypt a symmetric-key encrypted PGP message.
+   </para>
+   <para>
+    Decrypting bytea data with <literal>pgp_sym_decrypt</literal> is disallowed.
+    This is to avoid outputting invalid character data.  Decrypting
+    originally textual data with <literal>pgp_sym_decrypt_bytea</literal> is fine.
+   </para>
+   <para>
+    Options are described in the section
+    <quote>Options for PGP functions</quote> below.
+   </para>
+  </sect3>
+
+  <sect3>
+   <title><literal>pgp_pub_encrypt(data, pub_key)</literal></title>
+   <programlisting>
+ pgp_pub_encrypt(data text, key bytea [, options text] ) RETURNS bytea
+ pgp_pub_encrypt_bytea(data bytea, key bytea [, options text] ) RETURNS bytea
+   </programlisting>
+   <para>
+    Encrypt data with a public key.  Giving this function a secret key will
+    produce an error.
+   </para>
+   <para>
+    Options are described in the section
+    <quote>Options for PGP functions</quote> below.
+   </para>
+  </sect3>
+
+  <sect3>
+   <title><literal>pgp_pub_decrypt(msg, sec_key [, psw])</literal></title>
+   <programlisting>
+ pgp_pub_decrypt(msg bytea, key bytea [, psw text [, options text]] ) RETURNS text
+ pgp_pub_decrypt_bytea(msg bytea, key bytea [, psw text [, options text]] ) RETURNS bytea
+   </programlisting>
+   <para>
+    Decrypt a public-key encrypted message with a secret key.  If the secret
+    key is password-protected, you must give the password in
+    <literal>psw</literal>.  If there is no password, but you want to specify
+    options for the function, you need to give an empty password.
+   </para>
+   <para>
+    Decrypting bytea data with <literal>pgp_pub_decrypt</literal> is disallowed.
+    This is to avoid outputting invalid character data.  Decrypting
+    originally textual data with <literal>pgp_pub_decrypt_bytea</literal> is fine.
+   </para>
+   <para>
+    Options are described in the section
+    <quote>Options for PGP functions</quote> below.
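+   </para>
+   <para>
+    As an illustration, a public-key round trip might look like this
+    (the table <literal>keytbl</literal> and its columns are hypothetical;
+    they are assumed to hold ASCII-armored keys exported from GnuPG):
+   </para>
+   <programlisting>
+SELECT pgp_pub_decrypt(
+         pgp_pub_encrypt('Secret message', dearmor(pubkey)),
+         dearmor(seckey))
+  FROM keytbl WHERE name = 'testkey';
+   </programlisting>
+   <para>
+    If the secret key has a password, it would be given as the third
+    argument to <literal>pgp_pub_decrypt</literal>.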
+   </para>
+  </sect3>
+
+  <sect3>
+   <title><literal>pgp_key_id(key / msg)</literal></title>
+   <programlisting>
+ pgp_key_id(key or msg bytea) RETURNS text
+   </programlisting>
+   <para>
+    Shows the key ID of a PGP public or secret key, or, if given an
+    encrypted message, the key ID that was used for encrypting the data.
+   </para>
+   <para>
+    It can return two special key IDs:
+   </para>
+   <itemizedlist>
+    <listitem>
+     <para>
+      SYMKEY:
+     </para>
+     <para>
+      The data is encrypted with a symmetric key.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      ANYKEY:
+     </para>
+     <para>
+      The data is public-key encrypted, but the key ID is cleared.
+      That means you need to try all your secret keys on it to see
+      which one decrypts it.  pgcrypto itself does not produce such
+      messages.
+     </para>
+    </listitem>
+   </itemizedlist>
+   <para>
+    Note that different keys may have the same ID.  This is rare but a
+    normal event.  The client application should then try to decrypt with
+    each one, to see which fits - as when handling ANYKEY.
+   </para>
+  </sect3>
+
+  <sect3>
+   <title><literal>armor / dearmor</literal></title>
+   <programlisting>
+ armor(data bytea) RETURNS text
+ dearmor(data text) RETURNS bytea
+   </programlisting>
+   <para>
+    These functions wrap/unwrap data in PGP ASCII armor, which is basically
+    Base64 with a CRC and additional formatting.
+   </para>
+  </sect3>
+
+  <sect3>
+   <title>Options for PGP functions</title>
+   <para>
+    Options are named to be similar to GnuPG.  Values should be given after
+    an equal sign; separate options from each other with commas.  Example:
+   </para>
+   <programlisting>
+ pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')
+   </programlisting>
+   <para>
+    All of the options except <literal>convert-crlf</literal> apply only to
+    encrypt functions.  Decrypt functions get the parameters from the PGP data.
+   </para>
+   <para>
+    The most interesting options are probably <literal>compress-algo</literal>
+    and <literal>unicode-mode</literal>.  The rest should have reasonable
+    defaults.
+   </para>
+  </sect3>
+
+  <sect3>
+   <title>cipher-algo</title>
+   <para>
+    Which cipher algorithm to use.
+   </para>
+   <programlisting>
+ Values: bf, aes128, aes192, aes256 (OpenSSL-only: 3des, cast5)
+ Default: aes128
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>compress-algo</title>
+   <para>
+    Which compression algorithm to use.  Requires PostgreSQL to be built
+    with zlib.
+   </para>
+   <para>
+    Values:
+   </para>
+   <programlisting>
+ 0 - no compression
+ 1 - ZIP compression
+ 2 - ZLIB compression [=ZIP plus meta-data and block-CRC's]
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>compress-level</title>
+   <para>
+    How much to compress.  A bigger level compresses smaller but is slower.
+    0 disables compression.
+   </para>
+   <programlisting>
+ Values: 0, 1-9
+ Default: 6
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>convert-crlf</title>
+   <para>
+    Whether to convert <literal>\n</literal> into <literal>\r\n</literal> when
+    encrypting and <literal>\r\n</literal> to <literal>\n</literal> when
+    decrypting.  RFC2440 specifies that text data should be stored using
+    <literal>\r\n</literal> line-feeds.  Use this to get fully RFC-compliant
+    behavior.
+   </para>
+   <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt, pgp_sym_decrypt, pgp_pub_decrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>disable-mdc</title>
+   <para>
+    Do not protect data with a SHA-1 hash.  The only good reason to use this
+    option is to achieve compatibility with ancient PGP products, as the
+    SHA-1 protected packet is from an upcoming update to RFC2440.  (Currently
+    at version RFC2440bis-14.)  Recent gnupg.org and pgp.com software
+    supports it fine.
+   </para>
+   <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>enable-session-key</title>
+   <para>
+    Use a separate session key.
Public-key encryption always uses a separate
+    session key; this option is for symmetric-key encryption, which by
+    default uses the S2K key directly.
+   </para>
+   <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>s2k-mode</title>
+   <para>
+    Which S2K algorithm to use.
+   </para>
+   <programlisting>
+ Values:
+   0 - Without salt.  Dangerous!
+   1 - With salt but with fixed iteration count.
+   3 - Variable iteration count.
+ Default: 3
+ Applies: pgp_sym_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>s2k-digest-algo</title>
+   <para>
+    Which digest algorithm to use in the S2K calculation.
+   </para>
+   <programlisting>
+ Values: md5, sha1
+ Default: sha1
+ Applies: pgp_sym_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>s2k-cipher-algo</title>
+   <para>
+    Which cipher to use for encrypting the separate session key.
+   </para>
+   <programlisting>
+ Values: bf, aes, aes128, aes192, aes256
+ Default: use cipher-algo.
+ Applies: pgp_sym_encrypt
+   </programlisting>
+  </sect3>
+
+  <sect3>
+   <title>unicode-mode</title>
+   <para>
+    Whether to convert textual data from the database internal encoding to
+    UTF-8 and back.  If your database is already UTF-8, no conversion will
+    be done, but the data will be tagged as UTF-8.  Without this option
+    it will not be.
+   </para>
+   <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+   </programlisting>
+  </sect3>
+ </sect2>
+
+ <sect2>
+  <title>Generating keys with GnuPG</title>
+  <para>
+   Generate a new key:
+  </para>
+  <programlisting>
+ gpg --gen-key
+  </programlisting>
+  <para>
+   The preferred key type is "DSA and Elgamal".
+  </para>
+  <para>
+   For RSA encryption you must create either a DSA or an RSA sign-only key
+   as the master key and then add an RSA encryption subkey with
+   <literal>gpg --edit-key</literal>.
+  </para>
+  <para>
+   List keys:
+  </para>
+  <programlisting>
+ gpg --list-secret-keys
+  </programlisting>
+  <para>
+   Export an ASCII-armored public key:
+  </para>
+  <programlisting>
+ gpg -a --export KEYID > public.key
+  </programlisting>
+  <para>
+   Export an ASCII-armored secret key:
+  </para>
+  <programlisting>
+ gpg -a --export-secret-keys KEYID > secret.key
+  </programlisting>
+  <para>
+   You need to use <literal>dearmor()</literal> on these keys before giving
+   them to the pgp_pub_* functions.  Or, if you can handle binary data, you
+   can drop <literal>-a</literal> from the gpg commands.
+  </para>
+  <para>
+   For more details see <literal>man gpg</literal>,
+   <ulink url="http://www.gnupg.org/gph/en/manual.html">The GNU
+   Privacy Handbook</ulink> and other documentation on the
+   <ulink url="http://www.gnupg.org"></ulink> site.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Limitations of PGP code</title>
+  <itemizedlist>
+   <listitem>
+    <para>
+     No support for signing.  That also means that it is not checked
+     whether the encryption subkey belongs to the master key.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     No support for an encryption key as the master key.  As such practice
+     is generally discouraged, this should not be a problem.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     No support for several subkeys.  This may seem like a problem, as it
+     is common practice.  On the other hand, you should not use your regular
+     GPG/PGP keys with pgcrypto, but create new ones, as the usage scenario
+     is rather different.
+    </para>
+   </listitem>
+  </itemizedlist>
+ </sect2>
+
+ <sect2>
+  <title>Raw encryption</title>
+  <para>
+   These functions only run a cipher over data; they don't have any of the
+   advanced features of PGP encryption.  Therefore they have some major
+   problems:
+  </para>
+  <orderedlist>
+   <listitem>
+    <para>
+     They use the user key directly as the cipher key.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     They don't provide any integrity checking, to see
+     whether the encrypted data was modified.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     They expect that users manage all encryption parameters
+     themselves, even the IV.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     They don't handle text.
+    </para>
+   </listitem>
+  </orderedlist>
+  <para>
+   So, with the introduction of PGP encryption, use of the raw
+   encryption functions is discouraged.
+  </para>
+  <programlisting>
+ encrypt(data bytea, key bytea, type text) RETURNS bytea
+ decrypt(data bytea, key bytea, type text) RETURNS bytea
+
+ encrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea
+ decrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea
+  </programlisting>
+  <para>
+   Encrypt/decrypt data with a cipher, padding the data if needed.
+  </para>
+  <para>
+   The <literal>type</literal> parameter is described in pseudo-notation:
+  </para>
+  <programlisting>
+ algo ['-' mode] ['/pad:' padding]
+  </programlisting>
+  <para>
+   Supported algorithms:
+  </para>
+  <itemizedlist>
+   <listitem><para><literal>bf</literal> - Blowfish</para></listitem>
+   <listitem><para><literal>aes</literal> - AES (Rijndael-128)</para></listitem>
+  </itemizedlist>
+  <para>
+   Modes:
+  </para>
+  <itemizedlist>
+   <listitem>
+    <para>
+     <literal>cbc</literal> - the next block depends on the previous one. (default)
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     <literal>ecb</literal> - each block is encrypted separately. (for testing
+     only)
+    </para>
+   </listitem>
+  </itemizedlist>
+  <para>
+   Padding:
+  </para>
+  <itemizedlist>
+   <listitem>
+    <para>
+     <literal>pkcs</literal> - data may be any length (default)
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     <literal>none</literal> - data must be a multiple of the cipher block size.
+    </para>
+   </listitem>
+  </itemizedlist>
+  <para>
+   The IV is the initial value for the mode; it defaults to all zeroes.  It
+   is ignored for ECB.  It is clipped or padded with zeroes if not exactly
+   block size.
+ </para> + <para> + So, example: + </para> + <programlisting> + encrypt(data, 'fooz', 'bf') + </programlisting> + <para> + is equal to + </para> + <programlisting> + encrypt(data, 'fooz', 'bf-cbc/pad:pkcs') + </programlisting> + </sect2> + + <sect2> + <title>Random bytes</title> + <programlisting> + gen_random_bytes(count integer) + </programlisting> + <para> + Returns `count` cryptographically strong random bytes as bytea value. + There can be maximally 1024 bytes extracted at a time. This is to avoid + draining the randomness generator pool. + </para> + </sect2> + + <sect2> + <title>References/Links</title> + + <sect3> + <title>Useful reading</title> + <itemizedlist> + <listitem> + <para><ulink url="http://www.gnupg.org/gph/en/manual.html"></ulink>:</para> + <para>The GNU Privacy Handbook</para> + </listitem> + <listitem> + <para><ulink url="http://www.openwall.com/crypt/"></ulink>:</para> + <para>Describes the crypt-blowfish algorithm.</para> + </listitem> + <listitem> + <para> + <ulink url="http://www.stack.nl/~galactus/remailers/passphrase-faq.html"></ulink>: + </para> + <para>How to choose good password.</para> + </listitem> + <listitem> + <para><ulink url="http://world.std.com/~reinhold/diceware.html"></ulink>:</para> + <para>Interesting idea for picking passwords.</para> + </listitem> + <listitem> + <para> + <ulink url="http://www.interhack.net/people/cmcurtin/snake-oil-faq.html"></ulink>: + </para> + <para>Describes good and bad cryptography.</para> + </listitem> + </itemizedlist> + </sect3> + + <sect3> + <title>Technical references</title> + <itemizedlist> + <listitem> + <para><ulink url="http://www.ietf.org/rfc/rfc2440.txt"></ulink>:</para> + <para>OpenPGP message format</para> + </listitem> + <listitem> + <para> + <ulink url="http://www.imc.org/draft-ietf-openpgp-rfc2440bis"></ulink>: + </para> + <para>New version of RFC2440.</para> + </listitem> + <listitem> + <para><ulink url="http://www.ietf.org/rfc/rfc1321.txt"></ulink>:</para> + <para>The MD5 
Message-Digest Algorithm</para> + </listitem> + <listitem> + <para><ulink url="http://www.ietf.org/rfc/rfc2104.txt"></ulink>:</para> + <para>HMAC: Keyed-Hashing for Message Authentication</para> + </listitem> + <listitem> + <para> + <ulink url="http://www.usenix.org/events/usenix99/provos.html"></ulink>: + </para> + <para>Comparison of crypt-des, crypt-md5 and bcrypt algorithms.</para> + </listitem> + <listitem> + <para><ulink url="http://csrc.nist.gov/cryptval/des.htm"></ulink>:</para> + <para>Standards for DES, 3DES and AES.</para> + </listitem> + <listitem> + <para> + <ulink url="http://en.wikipedia.org/wiki/Fortuna_(PRNG)"></ulink>: + </para> + <para>Description of Fortuna CSPRNG.</para> + </listitem> + <listitem> + <para><ulink url="http://jlcooke.ca/random/"></ulink>:</para> + <para>Jean-Luc Cooke Fortuna-based /dev/random driver for Linux.</para> + </listitem> + <listitem> + <para><ulink url="http://www.cs.ut.ee/~helger/crypto/"></ulink>:</para> + <para>Collection of cryptology pointers.</para> + </listitem> + </itemizedlist> + </sect3> + </sect2> + + <sect2> + <title>Credits</title> + <para> + <literal>pgcrypto</literal> uses code from the following sources: + </para> + <table> + <title>Credits</title> + <tgroup cols="3"> + <thead> + <row> + <entry><para>Algorithm</para></entry> + <entry><para>Author</para></entry> + <entry><para>Source origin</para></entry> + </row> + </thead> + <tbody> + <row> + <entry><para>DES crypt()</para></entry> + <entry><para>David Burren and others</para></entry> + <entry><para>FreeBSD libcrypt</para></entry> + </row> + <row> + <entry><para>MD5 crypt()</para></entry> + <entry><para>Poul-Henning Kamp</para></entry> + <entry><para>FreeBSD libcrypt</para></entry> + </row> + <row> + <entry><para>Blowfish crypt()</para></entry> + <entry><para>Solar Designer</para></entry> + <entry><para>www.openwall.com</para></entry> + </row> + <row> + <entry><para>Blowfish cipher</para></entry> + <entry><para>Simon Tatham</para></entry> + 
<entry><para>PuTTY</para></entry> + </row> + <row> + <entry><para>Rijndael cipher</para></entry> + <entry><para>Brian Gladman</para></entry> + <entry><para>OpenBSD sys/crypto</para></entry> + </row> + <row> + <entry><para>MD5 and SHA1</para></entry> + <entry><para>WIDE Project</para></entry> + <entry><para>KAME kame/sys/crypto</para></entry> + </row> + <row> + <entry><para>SHA256/384/512 </para></entry> + <entry><para>Aaron D. Gifford</para></entry> + <entry><para>OpenBSD sys/crypto</para></entry> + </row> + <row> + <entry><para>BIGNUM math</para></entry> + <entry><para>Michael J. Fromberger</para></entry> + <entry><para>dartmouth.edu/~sting/sw/imath</para></entry> + </row> + </tbody> + </tgroup> + </table> + </sect2> + + <sect2> + <title>Author</title> + <para> + Marko Kreen <email>markokr@gmail.com</email> + </para> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/pgrowlocks.sgml b/doc/src/sgml/pgrowlocks.sgml new file mode 100644 index 00000000000..f7b1e479a06 --- /dev/null +++ b/doc/src/sgml/pgrowlocks.sgml @@ -0,0 +1,123 @@ + +<sect1 id="pgrowlocks"> + <title>pgrowlocks</title> + + <indexterm zone="pgrowlocks"> + <primary>pgrowlocks</primary> + </indexterm> + + <para> + The <literal>pgrowlocks</literal> module provides a function to show row + locking information for a specified table. + </para> + + <sect2> + <title>Overview</title> + <programlisting> +pgrowlocks(text) RETURNS pgrowlocks_type + </programlisting> + <para> + The parameter is a name of table. And <literal>pgrowlocks_type</literal> is + defined as: + </para> + <programlisting> +CREATE TYPE pgrowlocks_type AS ( + locked_row TID, -- row TID + lock_type TEXT, -- lock type + locker XID, -- locking XID + multi bool, -- multi XID? 
+       xids xid[],      -- multi XIDs
+       pids INTEGER[]   -- locker's process ids
+);
+   </programlisting>
+
+   <table>
+    <title>pgrowlocks_type</title>
+    <tgroup cols="2">
+     <tbody>
+      <row>
+       <entry>locked_row</entry>
+       <entry>tuple ID (TID) of each locked row</entry>
+      </row>
+      <row>
+       <entry>lock_type</entry>
+       <entry>"Shared" for a shared lock, "Exclusive" for an exclusive lock</entry>
+      </row>
+      <row>
+       <entry>locker</entry>
+       <entry>transaction ID of the locker (Note 1)</entry>
+      </row>
+      <row>
+       <entry>multi</entry>
+       <entry>"t" if the locker is a multi-transaction, otherwise "f"</entry>
+      </row>
+      <row>
+       <entry>xids</entry>
+       <entry>XIDs of the lockers (Note 2)</entry>
+      </row>
+      <row>
+       <entry>pids</entry>
+       <entry>process ids of the locking backends</entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+   <para>
+    Note 1: If the locker is a multi-transaction, this column shows the multi ID.
+   </para>
+   <para>
+    Note 2: If the locker is a multi-transaction, multiple XIDs are shown.
+   </para>
+
+   <para>
+    The calling sequence for <literal>pgrowlocks</literal> is as follows:
+    <literal>pgrowlocks</literal> acquires an AccessShareLock on the target
+    table and reads each row one by one to get the row locking information.
+    You should notice that:
+   </para>
+   <orderedlist>
+    <listitem>
+     <para>
+      if the table is exclusively locked by someone else,
+      <literal>pgrowlocks</literal> will be blocked.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      <literal>pgrowlocks</literal> may show incorrect information if a new
+      lock is taken or a lock is freed during its execution.
+     </para>
+    </listitem>
+   </orderedlist>
+   <para>
+    <literal>pgrowlocks</literal> does not show the contents of locked rows.
If
+    you want to take a look at the row contents at the same time, you could do
+    something like this:
+   </para>
+   <programlisting>
+SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_row = a.ctid;
+   </programlisting>
+  </sect2>
+
+  <sect2>
+   <title>Example</title>
+   <para>
+    Here is a sample execution of pgrowlocks:
+   </para>
+   <programlisting>
+test=# SELECT * FROM pgrowlocks('t1');
+ locked_row | lock_type | locker | multi |   xids    |     pids
+------------+-----------+--------+-------+-----------+---------------
+ (0,1)      | Shared    |     19 | t     | {804,805} | {29066,29068}
+ (0,2)      | Shared    |     19 | t     | {804,805} | {29066,29068}
+ (0,3)      | Exclusive |    804 | f     | {804}     | {29066}
+ (0,4)      | Exclusive |    804 | f     | {804}     | {29066}
+(4 rows)
+   </programlisting>
+
+  </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/pgstattuple.sgml b/doc/src/sgml/pgstattuple.sgml
new file mode 100644
index 00000000000..e8fa71602cf
--- /dev/null
+++ b/doc/src/sgml/pgstattuple.sgml
@@ -0,0 +1,158 @@
+
+<sect1 id="pgstattuple">
+ <title>pgstattuple</title>
+
+ <indexterm zone="pgstattuple">
+  <primary>pgstattuple</primary>
+ </indexterm>
+
+ <para>
+  The <literal>pgstattuple</literal> module provides various functions to
+  obtain tuple statistics.
+ </para>
+
+ <sect2>
+  <title>Functions</title>
+
+  <itemizedlist>
+   <listitem>
+    <para>
+     <literal>pgstattuple()</literal> returns the relation length, the
+     percentage of "dead" tuples in the relation, and other info.  This may
+     help users determine whether vacuum is necessary.  Here is an example
+     session:
+    </para>
+    <programlisting>
+test=> \x
+Expanded display is on.
+test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
+-[ RECORD 1 ]------+-------
+table_len          | 458752
+tuple_count        | 1470
+tuple_len          | 438896
+tuple_percent      | 95.67
+dead_tuple_count   | 11
+dead_tuple_len     | 3157
+dead_tuple_percent | 0.69
+free_space         | 8932
+free_percent       | 1.95
+    </programlisting>
+    <para>
+     Here are explanations for each column:
+    </para>
+
+    <table>
+     <title><literal>pgstattuple()</literal> column descriptions</title>
+     <tgroup cols="2">
+      <thead>
+       <row>
+        <entry>Column</entry>
+        <entry>Description</entry>
+       </row>
+      </thead>
+      <tbody>
+       <row>
+        <entry>table_len</entry>
+        <entry>physical relation length in bytes</entry>
+       </row>
+       <row>
+        <entry>tuple_count</entry>
+        <entry>number of live tuples</entry>
+       </row>
+       <row>
+        <entry>tuple_len</entry>
+        <entry>total length of live tuples in bytes</entry>
+       </row>
+       <row>
+        <entry>tuple_percent</entry>
+        <entry>live tuples in %</entry>
+       </row>
+       <row>
+        <entry>dead_tuple_count</entry>
+        <entry>number of dead tuples</entry>
+       </row>
+       <row>
+        <entry>dead_tuple_len</entry>
+        <entry>total length of dead tuples in bytes</entry>
+       </row>
+       <row>
+        <entry>dead_tuple_percent</entry>
+        <entry>dead tuples in %</entry>
+       </row>
+       <row>
+        <entry>free_space</entry>
+        <entry>free space in bytes</entry>
+       </row>
+       <row>
+        <entry>free_percent</entry>
+        <entry>free space in %</entry>
+       </row>
+      </tbody>
+     </tgroup>
+    </table>
+    <para>
+     <note>
+      <para>
+       <literal>pgstattuple</literal> acquires only a read lock on the
+       relation, so a concurrent update may affect the result.
+      </para>
+     </note>
+     <note>
+      <para>
+       <literal>pgstattuple</literal> considers a tuple "dead" if
+       HeapTupleSatisfiesNow() returns false.
+      </para>
+     </note>
+    </para>
+   </listitem>
+
+
+   <listitem>
+    <para>
+     <literal>pg_relpages()</literal> returns the number of pages in the
+     relation.
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     <literal>pgstatindex()</literal> returns a record showing information
+     about a B-tree index:
+    </para>
+    <programlisting>
+test=> \x
+Expanded display is on.
+test=> SELECT * FROM pgstatindex('pg_cast_oid_index'); +-[ RECORD 1 ]------+------ +version | 2 +tree_level | 0 +index_size | 8192 +root_block_no | 1 +internal_pages | 0 +leaf_pages | 1 +empty_pages | 0 +deleted_pages | 0 +avg_leaf_density | 50.27 +leaf_fragmentation | 0 + </programlisting> + </listitem> + </itemizedlist> + </sect2> + + <sect2> + <title>Usage</title> + <para> + <literal>pgstattuple</literal> may be called as a relation function and is + defined as follows: + </para> + <programlisting> + CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type + AS 'MODULE_PATHNAME', 'pgstattuple' + LANGUAGE C STRICT; + + CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type + AS 'MODULE_PATHNAME', 'pgstattuplebyid' + LANGUAGE C STRICT; + </programlisting> + <para> + The argument is the relation name (optionally it may be qualified) + or the OID of the relation. Note that pgstattuple only returns + one row. + </para> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 35fb92d5739..2cc4d573130 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -1,4 +1,4 @@ -<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.83 2007/11/01 17:00:18 momjian Exp $ --> +<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.84 2007/11/10 23:30:46 momjian Exp $ --> <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [ @@ -102,6 +102,7 @@ &typeconv; &indices; &textsearch; + &contrib; &mvcc; &perform; diff --git a/doc/src/sgml/seg.sgml b/doc/src/sgml/seg.sgml new file mode 100644 index 00000000000..c60f3425def --- /dev/null +++ b/doc/src/sgml/seg.sgml @@ -0,0 +1,450 @@ + +<sect1 id="seg"> + <title>seg</title> + + <indexterm zone="seg"> + <primary>seg</primary> + </indexterm> + + <para> + The <literal>seg</literal> module contains the code for the user-defined + type, <literal>SEG</literal>, representing laboratory measurements as + floating point intervals. 
+ </para> + + <sect2> + <title>Rationale</title> + <para> + The geometry of measurements is usually more complex than that of a + point in a numeric continuum. A measurement is usually a segment of + that continuum with somewhat fuzzy limits. The measurements come out + as intervals because of uncertainty and randomness, as well as because + the value being measured may naturally be an interval indicating some + condition, such as the temperature range of stability of a protein. + </para> + <para> + Using just common sense, it appears more convenient to store such data + as intervals, rather than pairs of numbers. In practice, it even turns + out more efficient in most applications. + </para> + <para> + Further along the line of common sense, the fuzziness of the limits + suggests that the use of traditional numeric data types leads to a + certain loss of information. Consider this: your instrument reads + 6.50, and you input this reading into the database. What do you get + when you fetch it? Watch: + </para> + <programlisting> +test=> select 6.50 as "pH"; + pH +--- +6.5 +(1 row) + </programlisting> + <para> + In the world of measurements, 6.50 is not the same as 6.5. It may + sometimes be critically different. The experimenters usually write + down (and publish) the digits they trust. 6.50 is actually a fuzzy + interval contained within a bigger and even fuzzier interval, 6.5, + with their center points being (probably) the only common feature they + share. We definitely do not want such different data items to appear the + same. + </para> + <para> + Conclusion? It is nice to have a special data type that can record the + limits of an interval with arbitrarily variable precision. Variable in + a sense that each data element records its own precision. + </para> + <para> + Check this out: + </para> + <programlisting> +test=> select '6.25 .. 6.50'::seg as "pH"; + pH +------------ +6.25 .. 
6.50 +(1 row) + </programlisting> + </sect2> + + <sect2> + <title>Syntax</title> + <para> + The external representation of an interval is formed using one or two + floating point numbers joined by the range operator ('..' or '...'). + Optional certainty indicators (<, > and ~) are ignored by the internal + logics, but are retained in the data. + </para> + + <table> + <title>Rules</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>rule 1</entry> + <entry>seg -> boundary PLUMIN deviation</entry> + </row> + <row> + <entry>rule 2</entry> + <entry>seg -> boundary RANGE boundary</entry> + </row> + <row> + <entry>rule 3</entry> + <entry>seg -> boundary RANGE</entry> + </row> + <row> + <entry>rule 4</entry> + <entry>seg -> RANGE boundary</entry> + </row> + <row> + <entry>rule 5</entry> + <entry>seg -> boundary</entry> + </row> + <row> + <entry>rule 6</entry> + <entry>boundary -> FLOAT</entry> + </row> + <row> + <entry>rule 7</entry> + <entry>boundary -> EXTENSION FLOAT</entry> + </row> + <row> + <entry>rule 8</entry> + <entry>deviation -> FLOAT</entry> + </row> + </tbody> + </tgroup> + </table> + + <table> + <title>Tokens</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>RANGE</entry> + <entry>(\.\.)(\.)?</entry> + </row> + <row> + <entry>PLUMIN</entry> + <entry>\'\+\-\'</entry> + </row> + <row> + <entry>integer</entry> + <entry>[+-]?[0-9]+</entry> + </row> + <row> + <entry>real</entry> + <entry>[+-]?[0-9]+\.[0-9]+</entry> + </row> + <row> + <entry>FLOAT</entry> + <entry>({integer}|{real})([eE]{integer})?</entry> + </row> + <row> + <entry>EXTENSION</entry> + <entry>[<>~]</entry> + </row> + </tbody> + </tgroup> + </table> + + <table> + <title>Examples of valid <literal>SEG</literal> representations</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>Any number</entry> + <entry> + (rules 5,6) -- creates a zero-length segment (a point, + if you will) + </entry> + </row> + <row> + <entry>~5.0</entry> + <entry> + (rules 5,7) -- creates a zero-length segment AND 
records
+      '~' in the data.  This notation reads 'approximately 5.0',
+      but its meaning is not recognized by the code.  It is ignored
+      until you get the value back.  View it as a short-hand comment.
+     </entry>
+    </row>
+    <row>
+     <entry><5.0</entry>
+     <entry>
+      (rules 5,7) -- creates a point at 5.0; '<' is ignored but
+      is preserved as a comment
+     </entry>
+    </row>
+    <row>
+     <entry>>5.0</entry>
+     <entry>
+      (rules 5,7) -- creates a point at 5.0; '>' is ignored but
+      is preserved as a comment
+     </entry>
+    </row>
+    <row>
+     <entry><para>5(+-)0.3</para><para>5'+-'0.3</para></entry>
+     <entry>
+      <para>
+       (rules 1,8) -- creates an interval '4.7..5.3'.  As of this
+       writing (02/09/2000), this mechanism isn't completely accurate
+       in determining the number of significant digits for the
+       boundaries.  For example, it adds an extra digit to the lower
+       boundary if the resulting interval includes a power of ten:
+      </para>
+      <programlisting>
+postgres=> select '10(+-)1'::seg as seg;
+    seg
+---------
+9.0 .. 11  -- should be: 9 .. 11
+      </programlisting>
+      <para>
+       Also, the (+-) notation is not preserved: 'a(+-)b' will
+       always be returned as '(a-b) .. (a+b)'.  The purpose of this
+       notation is to allow input from certain data sources without
+       conversion.
+      </para>
+     </entry>
+    </row>
+    <row>
+     <entry>50 .. </entry>
+     <entry>(rule 3) -- everything that is greater than or equal to 50</entry>
+    </row>
+    <row>
+     <entry>.. 0</entry>
+     <entry>(rule 4) -- everything that is less than or equal to 0</entry>
+    </row>
+    <row>
+     <entry>1.5e-2 .. 2E-2 </entry>
+     <entry>(rule 2) -- creates an interval (0.015 .. 0.02)</entry>
+    </row>
+    <row>
+     <entry>1 ... 2</entry>
+     <entry>
+      The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
+      Because of the widespread use of '...' in the data sources,
+      I decided to stick to it as a range operator.
This, and + also the fact that the white space around the range operator + is ignored, creates a parsing conflict with numeric constants + starting with a decimal point. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <table> + <title>Examples</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>.1e7</entry> + <entry>should be: 0.1e7</entry> + </row> + <row> + <entry>.1 .. .2</entry> + <entry>should be: 0.1 .. 0.2</entry> + </row> + <row> + <entry>2.4 E4</entry> + <entry>should be: 2.4E4</entry> + </row> + </tbody> + </tgroup> + </table> + <para> + The following, although it is not a syntax error, is disallowed to improve + the sanity of the data: + </para> + <table> + <title></title> + <tgroup cols="2"> + <tbody> + <row> + <entry>5 .. 2</entry> + <entry>should be: 2 .. 5</entry> + </row> + </tbody> + </tgroup> + </table> + </sect2> + + <sect2> + <title>Precision</title> + <para> + The segments are stored internally as pairs of 32-bit floating point + numbers. It means that the numbers with more than 7 significant digits + will be truncated. + </para> + <para> + The numbers with less than or exactly 7 significant digits retain their + original precision. That is, if your query returns 0.00, you will be + sure that the trailing zeroes are not the artifacts of formatting: they + reflect the precision of the original data. The number of leading + zeroes does not affect precision: the value 0.0067 is considered to + have just 2 significant digits. + </para> + </sect2> + + <sect2> + <title>Usage</title> + <para> + The access method for SEG is a GiST index (gist_seg_ops), which is a + generalization of R-tree. GiSTs allow the postgres implementation of + R-tree, originally encoded to support 2-D geometric types such as + boxes and polygons, to be used with any data type whose data domain + can be partitioned using the concepts of containment, intersection and + equality. 
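+  </para>
+  <para>
+   For instance, an index using this access method can be created and used
+   as follows (the table and column names are examples only):
+  </para>
+  <programlisting>
+CREATE TABLE measurements (reading seg);
+CREATE INDEX measurements_reading_ix ON measurements USING gist (reading);
+SELECT * FROM measurements WHERE reading && '6.0 .. 6.5'::seg;
+  </programlisting>
+  <para>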
In other words, everything that can intersect or contain
+   its own kind can be indexed with a GiST. That includes, among other
+   things, all geometric data types, regardless of their dimensionality
+   (see also contrib/cube).
+  </para>
+  <para>
+   The operators supported by the GiST access method include:
+  </para>
+  <itemizedlist>
+   <listitem>
+    <programlisting>
+[a, b] << [c, d]	Is left of
+    </programlisting>
+    <para>
+     The left operand, [a, b], occurs entirely to the left of the
+     right operand, [c, d], on the axis (-inf, inf). That is,
+     [a, b] << [c, d] is true if b < c and false otherwise
+    </para>
+   </listitem>
+   <listitem>
+    <programlisting>
+[a, b] >> [c, d]	Is right of
+    </programlisting>
+    <para>
+     [a, b] occurs entirely to the right of [c, d].
+     [a, b] >> [c, d] is true if a > d and false otherwise
+    </para>
+   </listitem>
+   <listitem>
+    <programlisting>
+[a, b] &< [c, d]	Overlaps or is left of
+    </programlisting>
+    <para>
+     This might be better read as "does not extend to right of".
+     It is true when b <= d.
+    </para>
+   </listitem>
+   <listitem>
+    <programlisting>
+[a, b] &> [c, d]	Overlaps or is right of
+    </programlisting>
+    <para>
+     This might be better read as "does not extend to left of".
+     It is true when a >= c.
+    </para>
+   </listitem>
+   <listitem>
+    <programlisting>
+[a, b] = [c, d]	Same as
+    </programlisting>
+    <para>
+     The segments [a, b] and [c, d] are identical, that is, a == c
+     and b == d
+    </para>
+   </listitem>
+   <listitem>
+    <programlisting>
+[a, b] && [c, d]	Overlaps
+    </programlisting>
+    <para>
+     The segments [a, b] and [c, d] overlap.
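The predicates above all reduce to simple comparisons on the segment boundaries. As a rough illustration only (plain Python modeling the documented semantics, not code from the seg module), a segment can be treated as a (lower, upper) pair:

```python
# Illustrative model of the seg operators described above.
# A segment is a (lower, upper) tuple of floats with lower <= upper.
# This is a sketch of the documented semantics, not the module's C code.

def is_left_of(s, t):    # [a,b] << [c,d]: true if b < c
    return s[1] < t[0]

def is_right_of(s, t):   # [a,b] >> [c,d]: true if a > d
    return s[0] > t[1]

def over_left(s, t):     # [a,b] &< [c,d]: "does not extend to right of", b <= d
    return s[1] <= t[1]

def over_right(s, t):    # [a,b] &> [c,d]: "does not extend to left of", a >= c
    return s[0] >= t[0]

def same_as(s, t):       # [a,b] = [c,d]: identical segments
    return s == t

def overlaps(s, t):      # [a,b] && [c,d]: the segments share at least one point
    return s[0] <= t[1] and t[0] <= s[1]
```

Note, for example, that `is_left_of((1.0, 3.0), (2.0, 5.0))` is false even though the first segment starts further left: `<<` requires the whole segment to lie strictly left of the other.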
+ </para> + </listitem> + <listitem> + <programlisting> +[a, b] @> [c, d] Contains + </programlisting> + <para> + The segment [a, b] contains the segment [c, d], that is, + a <= c and b >= d + </para> + </listitem> + <listitem> + <programlisting> +[a, b] <@ [c, d] Contained in + </programlisting> + <para> + The segment [a, b] is contained in [c, d], that is, + a >= c and b <= d + </para> + </listitem> + </itemizedlist> + <para> + (Before PostgreSQL 8.2, the containment operators @> and <@ were + respectively called @ and ~. These names are still available, but are + deprecated and will eventually be retired. Notice that the old names + are reversed from the convention formerly followed by the core geometric + datatypes!) + </para> + <para> + Although the mnemonics of the following operators is questionable, I + preserved them to maintain visual consistency with other geometric + data types defined in Postgres. + </para> + <para> + Other operators: + </para> + + <programlisting> +[a, b] < [c, d] Less than +[a, b] > [c, d] Greater than + </programlisting> + <para> + These operators do not make a lot of sense for any practical + purpose but sorting. These operators first compare (a) to (c), + and if these are equal, compare (b) to (d). That accounts for + reasonably good sorting in most cases, which is useful if + you want to use ORDER BY with this type + </para> + + <para> + There are a few other potentially useful functions defined in seg.c + that vanished from the schema because I stopped using them. Some of + these were meant to support type casting. Let me know if I was wrong: + I will then add them back to the schema. I would also appreciate + other ideas that would enhance the type and make it more useful. + </para> + <para> + For examples of usage, see sql/seg.sql + </para> + <para> + NOTE: The performance of an R-tree index can largely depend on the + order of input values. 
It may be very helpful to sort the input table
+   on the SEG column (see the script sort-segments.pl for an example).
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Credits</title>
+  <para>
+   My thanks are primarily to Prof. Joe Hellerstein
+   (<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
+   gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
+   also grateful to all postgres developers, present and past, for enabling
+   me to create my own world and live undisturbed in it. And I would like
+   to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
+   Energy for the years of faithful support of my database research.
+  </para>
+  <programlisting>
+   Gene Selkov, Jr.
+   Computational Scientist
+   Mathematics and Computer Science Division
+   Argonne National Laboratory
+   9700 S Cass Ave.
+   Building 221
+   Argonne, IL 60439-4844
+  </programlisting>
+  <para>
+   <email>selkovjr@mcs.anl.gov</email>
+  </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/sslinfo.sgml b/doc/src/sgml/sslinfo.sgml
new file mode 100644
index 00000000000..828fca2591e
--- /dev/null
+++ b/doc/src/sgml/sslinfo.sgml
@@ -0,0 +1,164 @@
+
+<sect1 id="sslinfo">
+ <title>sslinfo</title>
+
+ <indexterm zone="sslinfo">
+  <primary>sslinfo</primary>
+ </indexterm>
+
+ <para>
+  This module provides information about the current SSL certificate for PostgreSQL.
+ </para>
+
+ <sect2>
+  <title>Notes</title>
+  <para>
+   This extension won't build unless your PostgreSQL server is configured
+   with --with-openssl. The information provided by these functions is
+   useless if you don't use SSL to connect to the database.
+  </para>
+ </sect2>
+
+ <sect2>
+  <title>Functions Description</title>
+
+  <itemizedlist>
+   <listitem>
+    <programlisting>
+ssl_is_used() RETURNS boolean;
+    </programlisting>
+    <para>
+     Returns TRUE if the current connection to the server uses SSL,
+     and FALSE otherwise.
+    </para>
+   </listitem>
+
+   <listitem>
+    <programlisting>
+ssl_client_cert_present() RETURNS boolean
+    </programlisting>
+    <para>
+     Returns TRUE if the current client has presented a valid SSL client
+     certificate to the server, and FALSE otherwise (e.g., no SSL, or
+     no certificate was requested by the server).
+    </para>
+   </listitem>
+
+   <listitem>
+    <programlisting>
+ssl_client_serial() RETURNS numeric
+    </programlisting>
+    <para>
+     Returns the serial number of the current client certificate. The
+     combination of certificate serial number and certificate issuer is
+     guaranteed to uniquely identify a certificate (but not its owner --
+     the owner ought to regularly change his keys, and get new
+     certificates from the issuer).
+    </para>
+    <para>
+     So, if you run your own CA and allow only certificates from this CA to
+     be accepted by the server, the serial number is the most reliable
+     (albeit not very mnemonic) means to identify a user.
+    </para>
+   </listitem>
+
+   <listitem>
+    <programlisting>
+ssl_client_dn() RETURNS text
+    </programlisting>
+    <para>
+     Returns the full subject of the current client certificate, converting
+     character data into the current database encoding. It is assumed that
+     if you use non-Latin characters in the certificate names, your
+     database is able to represent these characters, too. If your database
+     uses the SQL_ASCII encoding, non-Latin characters in the name will be
+     represented as UTF-8 sequences.
+    </para>
+    <para>
+     The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
+    </para>
+   </listitem>
+
+   <listitem>
+    <programlisting>
+ssl_issuer_dn()
+    </programlisting>
+    <para>
+     Returns the full issuer name of the client certificate, converting
+     character data into the current database encoding.
+    </para>
+    <para>
+     The combination of the return value of this function with the
+     certificate serial number uniquely identifies the certificate.
+    </para>
+    <para>
+     The result of this function is really useful only if you have more
+     than one trusted CA certificate in your server's root.crt file, or if
+     this CA has issued some intermediate certificate authority
+     certificates.
+    </para>
+   </listitem>
+
+   <listitem>
+    <programlisting>
+ssl_client_dn_field(fieldName text) RETURNS text
+    </programlisting>
+    <para>
+     This function returns the value of the specified field in the
+     certificate subject. Field names are string constants that are
+     converted into ASN1 object identifiers using the OpenSSL object
+     database. The following values are acceptable:
+    </para>
+    <programlisting>
+commonName (alias CN)
+surname (alias SN)
+name
+givenName (alias GN)
+countryName (alias C)
+localityName (alias L)
+stateOrProvinceName (alias ST)
+organizationalUnitName (alias OU)
+organizationName (alias O)
+title
+description
+initials
+postalCode
+streetAddress
+generationQualifier
+description
+dnQualifier
+x500UniqueIdentifier
+pseudonym
+role
+emailAddress
+    </programlisting>
+    <para>
+     All of these fields are optional, except commonName. It depends
+     entirely on your CA policy which of them would be included and which
+     wouldn't. The meaning of these fields, however, is strictly defined by
+     the X.500 and X.509 standards, so you cannot just assign arbitrary
+     meaning to them.
+    </para>
+   </listitem>
+
+   <listitem>
+    <programlisting>
+ssl_issuer_field(fieldName text) RETURNS text;
+    </programlisting>
+    <para>
+     Does the same as ssl_client_dn_field, but for the certificate issuer
+     rather than the certificate subject.
+ </para> + </listitem> + </itemizedlist> + </sect2> + + <sect2> + <title>Author</title> + <para> + Victor Wagner <email>vitus@cryptocom.ru</email>, Cryptocom LTD + E-Mail of Cryptocom OpenSSL development group: + <email>openssl@cryptocom.ru</email> + </para> + </sect2> +</sect1> + diff --git a/doc/src/sgml/standby.sgml b/doc/src/sgml/standby.sgml new file mode 100644 index 00000000000..120fed4c2c0 --- /dev/null +++ b/doc/src/sgml/standby.sgml @@ -0,0 +1,249 @@ + +<sect1 id="pgstandby"> + <title>pg_standby</title> + + <indexterm zone="pgstandby"> + <primary>pgstandby</primary> + </indexterm> + + <para> + <literal>pg_standby</literal> is a production-ready program that can be used + to create a Warm Standby server. Other configuration is required as well, + all of which is described in the main server manual. + </para> + <para> + The program is designed to be a wait-for <literal>restore_command</literal>, + required to turn a normal archive recovery into a Warm Standby. Within the + <literal>restore_command</literal> of the <literal>recovery.conf</literal> + you could configure <literal>pg_standby</literal> in the following way: + </para> + <programlisting> + restore_command = 'pg_standby archiveDir %f %p' + </programlisting> + <para> + which would be sufficient to define that files will be restored from + archiveDir. + </para> + + <para> + <literal>pg_standby</literal> features include: + </para> + <itemizedlist> + <listitem> + <para> + It is written in C. So it is very portable + and easy to install. 
+ </para> + </listitem> + <listitem> + <para> + Supports copy or link from a directory (only) + </para> + </listitem> + <listitem> + <para> + Source easy to modify, with specifically designated + sections to modify for your own needs, allowing + interfaces to be written for additional Backup Archive Restore + (BAR) systems + </para> + </listitem> + <listitem> + <para> + Already tested on Linux and Windows + </para> + </listitem> + </itemizedlist> + + <sect2> + <title>Usage</title> + <para> + <literal>pg_standby</literal> should be used within the + <literal>restore_command</literal> of the <literal>recovery.conf</literal> + file. + </para> + <para> + The basic usage should be like this: + </para> + <programlisting> + restore_command = 'pg_standby archiveDir %f %p' + </programlisting> + <para> + with the pg_standby command usage as + </para> + <programlisting> + pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH] + </programlisting> + <para> + When used within the <literal>restore_command</literal> the %f and %p macros + will provide the actual file and path required for the restore/recovery. + </para> + + <table> + <title>Options</title> + <tgroup cols="2"> + <tbody> + <row> + <entry>-c</entry> + <entry> use copy/cp command to restore WAL files from archive</entry> + </row> + <row> + <entry>-d</entry> + <entry>debug/logging option.</entry> + </row> + <row> + <entry>-k numfiles</entry> + <entry> + <para> + Cleanup files in the archive so that we maintain no more + than this many files in the archive. + </para> + <para> + You should be wary against setting this number too low, + since this may mean you cannot restart the standby. This + is because the last restartpoint marked in the WAL files + may be many files in the past and can vary considerably. + This should be set to a value exceeding the number of WAL + files that can be recovered in 2*checkpoint_timeout seconds, + according to the value in the warm standby postgresql.conf. 
It is wholly unrelated to the setting of checkpoint_segments
+       on either primary or standby.
+      </para>
+      <para>
+       If in doubt, use a large value or do not set a value at all.
+      </para>
+     </entry>
+    </row>
+    <row>
+     <entry>-l</entry>
+     <entry>
+      <para>
+       use ln command to restore WAL files from archive;
+       WAL files will remain in archive
+      </para>
+      <para>
+       Link is more efficient, but the default is copy to
+       allow you to maintain the WAL archive for recovery
+       purposes as well as high-availability.
+      </para>
+      <para>
+       This option uses the Windows Vista command mklink
+       to provide a file-to-file symbolic link. -l will
+       not work on versions of Windows prior to Vista.
+       Use the -c option instead.
+       See <ulink url="http://en.wikipedia.org/wiki/NTFS_symbolic_link"></ulink>
+      </para>
+     </entry>
+    </row>
+    <row>
+     <entry>-r maxretries</entry>
+     <entry>
+      <para>
+       the maximum number of times to retry the restore command if it
+       fails. After each failure, we wait for sleeptime * num_retries
+       so that the wait time increases progressively, so by default
+       we will wait 5 secs, 10 secs, then 15 secs before reporting
+       the failure back to the database server. This will be
+       interpreted as an end of recovery and the Standby will come
+       up fully as a result. <literal>Default=3</literal>
+      </para>
+     </entry>
+    </row>
+    <row>
+     <entry>-s sleeptime</entry>
+     <entry>
+      the number of seconds to sleep between testing to see
+      if the file to be restored is available in the archive yet.
+      The default setting is not necessarily recommended;
+      consult the main database server manual for discussion.
+      <literal>Default=5</literal>
+     </entry>
+    </row>
+    <row>
+     <entry>-t triggerfile</entry>
+     <entry>
+      the presence of the triggerfile will cause recovery to end
+      whether or not the next file is available.
+      It is recommended that you use a structured filename to
+      avoid confusion as to which server is being triggered
+      when multiple servers exist on the same system,
+      e.g.
/tmp/pgsql.trigger.5432
+     </entry>
+    </row>
+    <row>
+     <entry>-w maxwaittime</entry>
+     <entry>
+      the maximum number of seconds to wait for the next file,
+      after which recovery will end and the Standby will come up.
+      The default setting is not necessarily recommended;
+      consult the main database server manual for discussion.
+      <literal>Default=0</literal>
+     </entry>
+    </row>
+   </tbody>
+  </tgroup>
+ </table>
+ <note>
+  <para>
+   <literal>--help</literal> is not supported since
+   <literal>pg_standby</literal> is not intended for interactive use, except
+   during development and testing.
+  </para>
+ </note>
+ </sect2>
+
+ <sect2>
+  <title>Examples</title>
+
+  <itemizedlist>
+   <listitem>
+    <para>Example on Linux</para>
+    <programlisting>
+archive_command = 'cp %p ../archive/%f'
+
+restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
+    </programlisting>
+    <para>
+     which will
+    </para>
+    <itemizedlist>
+     <listitem><para>use an ln command to restore WAL files from archive</para></listitem>
+     <listitem><para>produce logfile output in standby.log</para></listitem>
+     <listitem><para>keep the last 255 full WAL files, plus the current one</para></listitem>
+     <listitem><para>sleep for 2 seconds between checks for the next WAL file</para></listitem>
+     <listitem><para>never time out if the file is not found</para></listitem>
+     <listitem><para>stop waiting when a trigger file called /tmp/pgsql.trigger.5442 appears</para></listitem>
+    </itemizedlist>
+   </listitem>
+
+   <listitem>
+    <para>
+     Example on Windows
+    </para>
+    <programlisting>
+archive_command = 'copy %p ..\\archive\\%f'
+    </programlisting>
+    <para>
+     Note that backslashes need to be doubled in the archive_command, but
+     *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
+    </para>
+    <programlisting>
+restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442
+                    ..\archive %f %p 2>> standby.log'
+    </programlisting>
+    <para>
+     which will
+    </para>
+    <itemizedlist>
+     <listitem><para>use a copy command to restore WAL files from archive</para></listitem>
+     <listitem><para>produce logfile output in standby.log</para></listitem>
+     <listitem><para>sleep for 5 seconds between checks for the next WAL file</para></listitem>
+     <listitem><para>never time out if the file is not found</para></listitem>
+     <listitem><para>stop waiting when a trigger file called C:\pgsql.trigger.5442 appears</para></listitem>
+    </itemizedlist>
+   </listitem>
+  </itemizedlist>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/tablefunc.sgml b/doc/src/sgml/tablefunc.sgml
new file mode 100644
index 00000000000..23d12b55804
--- /dev/null
+++ b/doc/src/sgml/tablefunc.sgml
@@ -0,0 +1,765 @@
+
+<sect1 id="tablefunc">
+ <title>tablefunc</title>
+
+ <indexterm zone="tablefunc">
+  <primary>tablefunc</primary>
+ </indexterm>
+
+ <para>
+  <literal>tablefunc</literal> provides functions to convert query rows into columns.
+ </para>
+ <sect2>
+  <title>Functions</title>
+  <table>
+   <title></title>
+   <tgroup cols="3">
+    <thead>
+     <row>
+      <entry>Function</entry>
+      <entry>Returns</entry>
+      <entry>Comments</entry>
+     </row>
+    </thead>
+    <tbody>
+     <row>
+      <entry>
+       <literal>
+        normal_rand(int numvals, float8 mean, float8 stddev)
+       </literal>
+      </entry>
+      <entry>
+       returns a set of normally distributed float8 values
+      </entry>
+      <entry></entry>
+     </row>
+     <row>
+      <entry><literal>crosstabN(text sql)</literal></entry>
+      <entry>returns a set of row_name plus N category value columns</entry>
+      <entry>
+       crosstab2(), crosstab3(), and crosstab4() are defined for you,
+       but you can create additional crosstab functions per the instructions
+       in the documentation below.
+ </entry> + </row> + <row> + <entry><literal>crosstab(text sql)</literal></entry> + <entry>returns a set of row_name plus N category value columns</entry> + <entry> + requires anonymous composite type syntax in the FROM clause. See + the instructions in the documentation below. + </entry> + </row> + <row> + <entry><literal>crosstab(text sql, N int)</literal></entry> + <entry></entry> + <entry> + <para>obsolete version of crosstab()</para> + <para> + the argument N is now ignored, since the number of value columns + is always determined by the calling query + </para> + </entry> + </row> + <row> + <entry> + <literal> + connectby(text relname, text keyid_fld, text parent_keyid_fld + [, text orderby_fld], text start_with, int max_depth + [, text branch_delim]) + </literal> + </entry> + <entry> + returns keyid, parent_keyid, level, and an optional branch string + and an optional serial column for ordering siblings + </entry> + <entry> + requires anonymous composite type syntax in the FROM clause. See + the instructions in the documentation below. + </entry> + </row> + </tbody> + </tgroup> + </table> + + <sect3> + <title><literal>normal_rand</literal></title> + <programlisting> +normal_rand(int numvals, float8 mean, float8 stddev) RETURNS SETOF float8 + </programlisting> + <para> + Where <literal>numvals</literal> is the number of values to be returned + from the function. <literal>mean</literal> is the mean of the normal + distribution of values and <literal>stddev</literal> is the standard + deviation of the normal distribution of values. + </para> + <para> + Returns a float8 set of random values normally distributed (Gaussian + distribution). + </para> + <para> + Example: + </para> + <programlisting> + test=# SELECT * FROM + test=# normal_rand(1000, 5, 3); + normal_rand +---------------------- + 1.56556322244898 + 9.10040991424657 + 5.36957140345079 + -0.369151492880995 + 0.283600703686639 + . + . + . 
+ 4.82992125404908 + 9.71308014517282 + 2.49639286969028 +(1000 rows) + </programlisting> + <para> + Returns 1000 values with a mean of 5 and a standard deviation of 3. + </para> + </sect3> + + + <sect3> + <title><literal>crosstabN(text sql)</literal></title> + <programlisting> +crosstabN(text sql) + </programlisting> + <para> + The <literal>sql</literal> parameter is a SQL statement which produces the + source set of data. The SQL statement must return one row_name column, one + category column, and one value column. <literal>row_name</literal> and + value must be of type text. The function returns a set of + <literal>row_name</literal> plus N category value columns. + </para> + <para> + Provided <literal>sql</literal> must produce a set something like: + </para> +<programlisting> +row_name cat value +---------+-------+------- + row1 cat1 val1 + row1 cat2 val2 + row1 cat3 val3 + row1 cat4 val4 + row2 cat1 val5 + row2 cat2 val6 + row2 cat3 val7 + row2 cat4 val8 + </programlisting> + <para> + The returned value is a <literal>SETOF table_crosstab_N</literal>, which + is defined by: + </para> + <programlisting> +CREATE TYPE tablefunc_crosstab_N AS ( + row_name TEXT, + category_1 TEXT, + category_2 TEXT, + . + . + . + category_N TEXT +); + </programlisting> + <para> + for the default installed functions, where N is 2, 3, or 4. + </para> + <para> + e.g. the provided crosstab2 function produces a set something like: + </para> + <programlisting> + <== values columns ==> + row_name category_1 category_2 + ---------+------------+------------ + row1 val1 val2 + row2 val5 val6 + </programlisting> + <note> + <orderedlist> + <listitem><para>The sql result must be ordered by 1,2.</para></listitem> + <listitem> + <para> + The number of values columns depends on the tuple description + of the function's declared return type. + </para> + </listitem> + <listitem> + <para> + Missing values (i.e. 
not enough adjacent rows of same row_name to + fill the number of result values columns) are filled in with nulls. + </para> + </listitem> + <listitem> + <para> + Extra values (i.e. too many adjacent rows of same row_name to fill + the number of result values columns) are skipped. + </para> + </listitem> + <listitem> + <para> + Rows with all nulls in the values columns are skipped. + </para> + </listitem> + <listitem> + <para> + The installed defaults are for illustration purposes. You + can create your own return types and functions based on the + crosstab() function of the installed library. See below for + details. + </para> + </listitem> + </orderedlist> + </note> + <para> + Example: + </para> + <programlisting> +create table ct(id serial, rowclass text, rowid text, attribute text, value text); +insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1'); +insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2'); +insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3'); +insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4'); +insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5'); +insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6'); +insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7'); +insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8'); + +select * from crosstab3( + 'select rowid, attribute, value + from ct + where rowclass = ''group1'' + and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;'); + + row_name | category_1 | category_2 | category_3 +----------+------------+------------+------------ + test1 | val2 | val3 | + test2 | val6 | val7 | +(2 rows) + </programlisting> + </sect3> + + <sect3> + <title><literal>crosstab(text)</literal></title> + <programlisting> +crosstab(text 
sql) +crosstab(text sql, int N) + </programlisting> + <para> + The <literal>sql</literal> parameter is a SQL statement which produces the + source set of data. The SQL statement must return one + <literal>row_name</literal> column, one <literal>category</literal> column, + and one <literal>value</literal> column. <literal>N</literal> is an + obsolete argument; ignored if supplied (formerly this had to match the + number of category columns determined by the calling query). + </para> + <para> + </para> + <para> + e.g. provided sql must produce a set something like: + </para> +<programlisting> + row_name cat value +----------+-------+------- + row1 cat1 val1 + row1 cat2 val2 + row1 cat3 val3 + row1 cat4 val4 + row2 cat1 val5 + row2 cat2 val6 + row2 cat3 val7 + row2 cat4 val8 +</programlisting> + <para> + Returns a <literal>SETOF RECORD</literal>, which must be defined with a + column definition in the FROM clause of the SELECT statement, e.g.: + </para> + <programlisting> + SELECT * + FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text); + </programlisting> + <para> + the example crosstab function produces a set something like: + </para> + <programlisting> + <== values columns ==> +row_name category_1 category_2 + ---------+------------+------------ + row1 val1 val2 + row2 val5 val6 + </programlisting> + <para> + Note that it follows these rules: + </para> + <orderedlist> + <listitem><para>The sql result must be ordered by 1,2.</para></listitem> + <listitem> + <para> + The number of values columns is determined by the column definition + provided in the FROM clause. The FROM clause must define one + row_name column (of the same datatype as the first result column + of the sql query) followed by N category columns (of the same + datatype as the third result column of the sql query). You can + set up as many category columns as you wish. + </para> + </listitem> + <listitem> + <para> + Missing values (i.e. 
not enough adjacent rows of same row_name to + fill the number of result values columns) are filled in with nulls. + </para> + </listitem> + <listitem> + <para> + Extra values (i.e. too many adjacent rows of same row_name to fill + the number of result values columns) are skipped. + </para> + </listitem> + <listitem> + <para> + Rows with all nulls in the values columns are skipped. + </para> + </listitem> + <listitem> + <para> + You can avoid always having to write out a FROM clause that defines the + output columns by setting up a custom crosstab function that has + the desired output row type wired into its definition. + </para> + </listitem> + </orderedlist> + <para> + There are two ways you can set up a custom crosstab function: + </para> + <itemizedlist> + <listitem> + <para> + Create a composite type to define your return type, similar to the + examples in the installation script. Then define a unique function + name accepting one text parameter and returning setof your_type_name. + For example, if your source data produces row_names that are TEXT, + and values that are FLOAT8, and you want 5 category columns: + </para> + <programlisting> + CREATE TYPE my_crosstab_float8_5_cols AS ( + row_name TEXT, + category_1 FLOAT8, + category_2 FLOAT8, + category_3 FLOAT8, + category_4 FLOAT8, + category_5 FLOAT8 + ); + + CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text) + RETURNS setof my_crosstab_float8_5_cols + AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT; + </programlisting> + </listitem> + <listitem> + <para> + Use OUT parameters to define the return type implicitly. 
+ The same example could also be done this way: + </para> + <programlisting> + CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text, + OUT row_name TEXT, + OUT category_1 FLOAT8, + OUT category_2 FLOAT8, + OUT category_3 FLOAT8, + OUT category_4 FLOAT8, + OUT category_5 FLOAT8) + RETURNS setof record + AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT; + </programlisting> + </listitem> + </itemizedlist> + <para> + Example: + </para> + <programlisting> +CREATE TABLE ct(id SERIAL, rowclass TEXT, rowid TEXT, attribute TEXT, value TEXT); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att1','val1'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att2','val2'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att3','val3'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att4','val4'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att1','val5'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att2','val6'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att3','val7'); +INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att4','val8'); + +SELECT * +FROM crosstab( + 'select rowid, attribute, value + from ct + where rowclass = ''group1'' + and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3) +AS ct(row_name text, category_1 text, category_2 text, category_3 text); + + row_name | category_1 | category_2 | category_3 +----------+------------+------------+------------ + test1 | val2 | val3 | + test2 | val6 | val7 | +(2 rows) + </programlisting> + + </sect3> + + <sect3> + <title><literal>crosstab(text, text)</literal></title> + <programlisting> +crosstab(text source_sql, text category_sql) + </programlisting> + + <para> + Where <literal>source_sql</literal> is a SQL statement which produces the + source set of data. 
The SQL statement must return one + <literal>row_name</literal> column, one <literal>category</literal> column, + and one <literal>value</literal> column. It may also have one or more + <emphasis>extra</emphasis> columns. + </para> + <para> + The <literal>row_name</literal> column must be first. The + <literal>category</literal> and <literal>value</literal> columns must be + the last two columns, in that order. <emphasis>extra</emphasis> columns must + be columns 2 through (N - 2), where N is the total number of columns. + </para> + <para> + The <emphasis>extra</emphasis> columns are assumed to be the same for all + rows with the same <literal>row_name</literal>. The values returned are + copied from the first row with a given <literal>row_name</literal> and + subsequent values of these columns are ignored until + <literal>row_name</literal> changes. + </para> + <para> + e.g. <literal>source_sql</literal> must produce a set something like: + </para> + <programlisting> + SELECT row_name, extra_col, cat, value FROM foo; + + row_name extra_col cat value + ----------+------------+-----+--------- + row1 extra1 cat1 val1 + row1 extra1 cat2 val2 + row1 extra1 cat4 val4 + row2 extra2 cat1 val5 + row2 extra2 cat2 val6 + row2 extra2 cat3 val7 + row2 extra2 cat4 val8 + </programlisting> + + <para> + <literal>category_sql</literal> has to be a SQL statement which produces + the distinct set of categories. The SQL statement must return one category + column only. <literal>category_sql</literal> must produce at least one + result row or an error will be generated. <literal>category_sql</literal> + must not produce duplicate categories or an error will be generated. 
e.g.: + </para> + <programlisting> +SELECT DISTINCT cat FROM foo; + cat + ------- + cat1 + cat2 + cat3 + cat4 + </programlisting> + <para> + The function returns <literal>SETOF RECORD</literal>, which must be defined + with a column definition in the FROM clause of the SELECT statement, e.g.: + </para> + <programlisting> + SELECT * FROM crosstab(source_sql, cat_sql) + AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text); + </programlisting> + <para> + the example crosstab function produces a set something like: + </para> + <programlisting> + <== values columns ==> + row_name extra cat1 cat2 cat3 cat4 + ---------+-------+------+------+------+------ + row1 extra1 val1 val2 val4 + row2 extra2 val5 val6 val7 val8 + </programlisting> + <para> + Note that it follows these rules: + </para> + <orderedlist> + <listitem><para>source_sql must be ordered by row_name (column 1).</para></listitem> + <listitem> + <para> + The number of values columns is determined at run-time. The + column definition provided in the FROM clause must provide for + the correct number of columns of the proper data types. + </para> + </listitem> + <listitem> + <para> + Missing values (i.e. not enough adjacent rows of same row_name to + fill the number of result values columns) are filled in with nulls. + </para> + </listitem> + <listitem> + <para> + Extra values (i.e. source rows with category not found in category_sql + result) are skipped. + </para> + </listitem> + <listitem> + <para> + Rows with a null row_name column are skipped. + </para> + </listitem> + <listitem> + <para> + You can create predefined functions to avoid having to write out + the result column names/types in each query. See the examples + for crosstab(text). 
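The pivoting behavior described by the rules above can be sketched in a few lines of plain Python (an illustrative model of the documented semantics only; the function name and data layout here are my own, not tablefunc's implementation):

```python
# Rough model of crosstab(source_sql, category_sql): pivot ordered
# (row_name, category, value) rows against a fixed category list.
# Extra columns are omitted from this sketch for brevity.

def crosstab(rows, categories):
    out = []
    for row_name, cat, val in rows:
        if row_name is None:
            continue                 # rows with a null row_name are skipped
        if not out or out[-1][0] != row_name:
            # new row_name: start a result row, missing values default to None
            out.append([row_name] + [None] * len(categories))
        if cat in categories:        # categories not in category_sql are skipped
            out[-1][1 + categories.index(cat)] = val
    return [tuple(r) for r in out]
```

As in the SQL version, the input must be ordered by row_name, missing categories come back as nulls, and unknown categories are silently dropped.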
+ </para> + </listitem> + </orderedlist> + + <programlisting> +CREATE TABLE cth(id serial, rowid text, rowdt timestamp, attribute text, val text); +INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','temperature','42'); +INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','test_result','PASS'); +INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','volts','2.6987'); +INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','temperature','53'); +INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_result','FAIL'); +INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003'); +INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','volts','3.1234'); + +SELECT * FROM crosstab +( + 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1', + 'SELECT DISTINCT attribute FROM cth ORDER BY 1' +) +AS +( + rowid text, + rowdt timestamp, + temperature int4, + test_result text, + test_startdate timestamp, + volts float8 +); + rowid | rowdt | temperature | test_result | test_startdate | volts +-------+--------------------------+-------------+-------------+--------------------------+-------- + test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987 + test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234 +(2 rows) + </programlisting> + </sect3> + <sect3> + <title> + <literal>connectby(text, text, text[, text], text, text, int[, text])</literal> + </title> + <programlisting> +connectby(text relname, text keyid_fld, text parent_keyid_fld + [, text orderby_fld], text start_with, int max_depth + [, text branch_delim]) + </programlisting> + <table> + <title><literal>connectby</literal> parameters</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Parameter</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry><literal>relname</literal></entry> + <entry>Name of the source relation</entry> + </row> + <row> + <entry><literal>keyid_fld</literal></entry> + <entry>Name of the key 
field</entry>
 + </row>
 + <row>
 + <entry><literal>parent_keyid_fld</literal></entry>
 + <entry>Name of the key_parent field</entry>
 + </row>
 + <row>
 + <entry><literal>orderby_fld</literal></entry>
 + <entry>
 + Name of the field by which to order siblings, if sibling
 + ordering is desired
 + </entry>
 + </row>
 + <row>
 + <entry><literal>start_with</literal></entry>
 + <entry>
 + Root value of the tree, input as a text value regardless of
 + the type of <literal>keyid_fld</literal>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>max_depth</literal></entry>
 + <entry>
 + Zero (0) for unlimited depth, otherwise restrict level to this depth
 + </entry>
 + </row>
 + <row>
 + <entry><literal>branch_delim</literal></entry>
 + <entry>
 + If optional branch value is desired, this string is used as the delimiter.
 + When not provided, a default value of '~' is used for internal
 + recursion detection only, and no "branch" field is returned.
 + </entry>
 + </row>
 + </tbody>
 + </tgroup>
 + </table>
 + <para>
 + The function returns <literal>SETOF RECORD</literal>, which must be defined
 + with a column definition in the FROM clause of the SELECT statement, e.g.:
 + </para>
 + <programlisting>
 + SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
 + AS t(keyid text, parent_keyid text, level int, branch text);
 + </programlisting>
 + <para>
 + or
 + </para>
 + <programlisting>
 + SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
 + AS t(keyid text, parent_keyid text, level int);
 + </programlisting>
 + <para>
 + or
 + </para>
 + <programlisting>
 + SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
 + AS t(keyid text, parent_keyid text, level int, branch text, pos int);
 + </programlisting>
 + <para>
 + or
 + </para>
 + <programlisting>
 + SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
 + AS t(keyid text, parent_keyid text, level int, pos int);
 + </programlisting>
 + <para>
 + Note that it follows these rules: 
+ </para>
 + </listitem>
 + </orderedlist>
 + <orderedlist>
 + <listitem><para>keyid and parent_keyid must be the same data type</para></listitem>
 + <listitem>
 + <para>
 + The column definition *must* include a third column of type INT4 for
 + the level value output
 + </para>
 + </listitem>
 + <listitem>
 + <para>
 + If the branch field is not desired, omit both the branch_delim input
 + parameter *and* the branch field in the query column definition. Note
 + that when branch_delim is not provided, a default value of '~' is still
 + used internally for recursion detection, even though the branch
 + field is not returned.
 + </para>
 + </listitem>
 + <listitem>
 + <para>
 + If the branch field is desired, it must be the fourth column in the query
 + column definition, and it must be type TEXT.
 + </para>
 + </listitem>
 + <listitem>
 + <para>
 + The parameters representing table and field names must include double
 + quotes if the names are mixed-case or contain special characters.
 + </para>
 + </listitem>
 + <listitem>
 + <para>
 + If sorting of siblings is desired, the orderby_fld input parameter *and*
 + a name for the resulting serial field (type INT4) in the query column
 + definition must be given. 
+ </para> + </listitem> + </orderedlist> + <para> + Example: + </para> + <programlisting> +CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int); + +INSERT INTO connectby_tree VALUES('row1',NULL, 0); +INSERT INTO connectby_tree VALUES('row2','row1', 0); +INSERT INTO connectby_tree VALUES('row3','row1', 0); +INSERT INTO connectby_tree VALUES('row4','row2', 1); +INSERT INTO connectby_tree VALUES('row5','row2', 0); +INSERT INTO connectby_tree VALUES('row6','row4', 0); +INSERT INTO connectby_tree VALUES('row7','row3', 0); +INSERT INTO connectby_tree VALUES('row8','row6', 0); +INSERT INTO connectby_tree VALUES('row9','row5', 0); + +-- with branch, without orderby_fld +SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~') + AS t(keyid text, parent_keyid text, level int, branch text); + keyid | parent_keyid | level | branch +-------+--------------+-------+--------------------- + row2 | | 0 | row2 + row4 | row2 | 1 | row2~row4 + row6 | row4 | 2 | row2~row4~row6 + row8 | row6 | 3 | row2~row4~row6~row8 + row5 | row2 | 1 | row2~row5 + row9 | row5 | 2 | row2~row5~row9 +(6 rows) + +-- without branch, without orderby_fld +SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0) + AS t(keyid text, parent_keyid text, level int); + keyid | parent_keyid | level +-------+--------------+------- + row2 | | 0 + row4 | row2 | 1 + row6 | row4 | 2 + row8 | row6 | 3 + row5 | row2 | 1 + row9 | row5 | 2 +(6 rows) + +-- with branch, with orderby_fld (notice that row5 comes before row4) +SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~') + AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos; + keyid | parent_keyid | level | branch | pos +-------+--------------+-------+---------------------+----- + row2 | | 0 | row2 | 1 + row5 | row2 | 1 | row2~row5 | 2 + row9 | row5 | 2 | row2~row5~row9 | 3 + row4 | row2 | 1 | row2~row4 | 4 + row6 | row4 | 2 | row2~row4~row6 | 5 
+ row8 | row6 | 3 | row2~row4~row6~row8 | 6
 +(6 rows)
 +
 +-- without branch, with orderby_fld (notice that row5 comes before row4)
 +SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
 + AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos;
 + keyid | parent_keyid | level | pos
 +-------+--------------+-------+-----
 + row2 | | 0 | 1
 + row5 | row2 | 1 | 2
 + row9 | row5 | 2 | 3
 + row4 | row2 | 1 | 4
 + row6 | row4 | 2 | 5
 + row8 | row6 | 3 | 6
 +(6 rows)
 + </programlisting>
 + </sect3>
 + </sect2>
 + <sect2>
 + <title>Author</title>
 + <para>
 + Joe Conway
 + </para>
 + </sect2>
 +</sect1>
 +
diff --git a/doc/src/sgml/trgm.sgml b/doc/src/sgml/trgm.sgml
new file mode 100644
index 00000000000..62a5e30382f
--- /dev/null
+++ b/doc/src/sgml/trgm.sgml
@@ -0,0 +1,214 @@
+<sect1 id="pgtrgm">
 + <title>pg_trgm</title>
 +
 + <indexterm zone="pgtrgm">
 + <primary>pgtrgm</primary>
 + </indexterm>
 +
 + <para>
 + The <literal>pg_trgm</literal> module provides functions and index classes
 + for determining the similarity of text based on trigram matching.
 + </para>
 +
 + <sect2>
 + <title>Trigram (or Trigraph)</title>
 + <para>
 + A trigram is a group of three consecutive characters taken
 + from a string. A string is considered to have two spaces
 + prefixed and one space suffixed when determining the set
 + of trigrams that comprise the string.
 + </para>
 + <para>
 + e.g. The set of trigrams in the word "cat" is " c", " ca",
 + "at " and "cat".
 + </para>
 + </sect2>
 +
 + <sect2>
 + <title>Public Functions</title>
 + <table>
 + <title><literal>pg_trgm</literal> functions</title>
 + <tgroup cols="2">
 + <thead>
 + <row>
 + <entry>Function</entry>
 + <entry>Description</entry>
 + </row>
 + </thead>
 + <tbody>
 + <row>
 + <entry><literal>real similarity(text, text)</literal></entry>
 + <entry>
 + <para>
 + Returns a number that indicates how similar the two
 + arguments are. 
A zero result indicates that the two words
 + are completely dissimilar, and a result of one indicates that
 + the two words are identical.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>real show_limit()</literal></entry>
 + <entry>
 + <para>
 + Returns the current similarity threshold used by the '%'
 + operator. This threshold is, in effect, the minimum similarity
 + two words must have in order to be considered similar enough to
 + be, for example, misspellings of each other.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>real set_limit(real)</literal></entry>
 + <entry>
 + <para>
 + Sets the similarity threshold that is used by the '%'
 + operator and returned by the show_limit() function.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>text[] show_trgm(text)</literal></entry>
 + <entry>
 + <para>
 + Returns an array of all the trigrams of the supplied text
 + parameter.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry>Operator: <literal>text % text (returns boolean)</literal></entry>
 + <entry>
 + <para>
 + The '%' operator returns TRUE if its two arguments have a similarity
 + that is greater than the similarity threshold set by set_limit(),
 + and FALSE otherwise.
 + </para>
 + </entry>
 + </row>
 + </tbody>
 + </tgroup>
 + </table>
 + </sect2>
 +
 + <sect2>
 + <title>Public Index Operator Class</title>
 + <para>
 + The <literal>pg_trgm</literal> module comes with the
 + <literal>gist_trgm_ops</literal> index operator class that allows a
 + developer to create an index over a text column for the purpose
 + of very fast similarity searches.
 + </para>
 + <para>
 + To use this index, the '%' operator must be used and an appropriate
 + similarity threshold for the application must be set. 
Example:
 + </para>
 + <programlisting>
 +CREATE TABLE test_trgm (t text);
 +CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
 + </programlisting>
 + <para>
 + At this point, you will have an index on the t text column that you
 + can use for similarity searching. Example:
 + </para>
 + <programlisting>
 +SELECT
 + t,
 + similarity(t, 'word') AS sml
 +FROM
 + test_trgm
 +WHERE
 + t % 'word'
 +ORDER BY
 + sml DESC, t;
 + </programlisting>
 + <para>
 + This will return all values in the text column that are sufficiently
 + similar to 'word', sorted from best match to worst. The index will
 + be used to make this a fast operation over very large data sets.
 + </para>
 + </sect2>
 +
 + <sect2>
 + <title>Tsearch2 Integration</title>
 + <para>
 + Trigram matching is a very useful tool when used in conjunction
 + with a text index created by the Tsearch2 contrib module. (See
 + contrib/tsearch2)
 + </para>
 + <para>
 + The first step is to generate an auxiliary table containing all
 + the unique words in the Tsearch2 index:
 + </para>
 + <programlisting>
 +CREATE TABLE words AS SELECT word FROM
 + stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
 + </programlisting>
 + <para>
 + Here 'documents' is a table that has a text field 'bodytext'
 + that Tsearch2 is used to search. The 'simple' dictionary is used
 + with the to_tsvector function, instead of just using the already
 + existing vector, to avoid creating a list of already stemmed
 + words. This way, only the original, unstemmed words are added
 + to the word list.
 + </para>
 + <para>
 + Next, create a trigram index on the word column:
 + </para>
 + <programlisting>
 +CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
 + </programlisting>
 + <para>
 + or
 + </para>
 + <programlisting>
 +CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);
 + </programlisting>
 + <para>
 + Now, a <literal>SELECT</literal> query similar to the example above can be
 + used to suggest spellings for misspelled words in user search terms. 
A + useful extra clause is to ensure that the similar words are also + of similar length to the misspelled word. + </para> + <para> + <note> + <para> + Since the 'words' table has been generated as a separate, + static table, it will need to be periodically regenerated so that + it remains up to date with the word list in the Tsearch2 index. + </para> + </note> + </para> + </sect2> + + <sect2> + <title>References</title> + <para> + Tsearch2 Development Site + <ulink url="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/"></ulink> + </para> + <para> + GiST Development Site + <ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink> + </para> + </sect2> + + <sect2> + <title>Authors</title> + <para> + Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia + </para> + <para> + Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia + </para> + <para> + Documentation: Christopher Kings-Lynne + </para> + <para> + This module is sponsored by Delta-Soft Ltd., Moscow, Russia. + </para> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/uuid-ossp.sgml b/doc/src/sgml/uuid-ossp.sgml new file mode 100644 index 00000000000..93e6c0faeac --- /dev/null +++ b/doc/src/sgml/uuid-ossp.sgml @@ -0,0 +1,163 @@ + +<sect1 id="uuid-ossp"> + <title>uuid-ossp</title> + + <indexterm zone="uuid-ossp"> + <primary>uuid-ossp</primary> + </indexterm> + + <para> + This module provides functions to generate universally unique + identifiers (UUIDs) using one of the several standard algorithms, as + well as functions to produce certain special UUID constants. + </para> + + <sect2> + <title>UUID Generation</title> + <para> + The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC + 4122 specify four algorithms for generating UUIDs, identified by the + version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.) + Each of these algorithms could be suitable for a different set of + applications. 
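+ For example, a time-based and a purely random UUID can be
 + generated, respectively, with the functions described below:
 + <programlisting>
 +SELECT uuid_generate_v1();
 +SELECT uuid_generate_v4();
 + </programlisting>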
+ </para> + + <table> + <title><literal>uuid-ossp</literal> functions</title> + <tgroup cols="2"> + <thead> + <row> + <entry>Function</entry> + <entry>Description</entry> + </row> + </thead> + <tbody> + <row> + <entry><literal>uuid_generate_v1()</literal></entry> + <entry> + <para> + This function generates a version 1 UUID. This involves the MAC + address of the computer and a time stamp. Note that UUIDs of this + kind reveal the identity of the computer that created the identifier + and the time at which it did so, which might make it unsuitable for + certain security-sensitive applications. + </para> + </entry> + </row> + <row> + <entry><literal>uuid_generate_v1mc()</literal></entry> + <entry> + <para> + This function generates a version 1 UUID but uses a random multicast + MAC address instead of the real MAC address of the computer. + </para> + </entry> + </row> + <row> + <entry><literal>uuid_generate_v3(namespace uuid, name text)</literal></entry> + <entry> + <para> + This function generates a version 3 UUID in the given namespace using + the specified input name. The namespace should be one of the special + constants produced by the uuid_ns_*() functions shown below. (It + could be any UUID in theory.) The name is an identifier in the + selected namespace. For example: + </para> + </entry> + </row> + <row> + <entry><literal>uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')</literal></entry> + <entry> + <para> + The name parameter will be MD5-hashed, so the cleartext cannot be + derived from the generated UUID. + </para> + <para> + The generation of UUIDs by this method has no random or + environment-dependent element and is therefore reproducible. + </para> + </entry> + </row> + <row> + <entry><literal>uuid_generate_v4()</literal></entry> + <entry> + <para> + This function generates a version 4 UUID, which is derived entirely + from random numbers. 
+ </para> + </entry> + </row> + <row> + <entry><literal>uuid_generate_v5(namespace uuid, name text)</literal></entry> + <entry> + <para> + This function generates a version 5 UUID, which works like a version 3 + UUID except that SHA-1 is used as a hashing method. Version 5 should + be preferred over version 3 because SHA-1 is thought to be more secure + than MD5. + </para> + </entry> + </row> + </tbody> + </tgroup> + </table> + + <table> + <title>UUID Constants</title> + <tgroup cols="2"> + <tbody> + <row> + <entry><literal>uuid_nil()</literal></entry> + <entry> + <para> + A "nil" UUID constant, which does not occur as a real UUID. + </para> + </entry> + </row> + <row> + <entry><literal>uuid_ns_dns()</literal></entry> + <entry> + <para> + Constant designating the DNS namespace for UUIDs. + </para> + </entry> + </row> + <row> + <entry><literal>uuid_ns_url()</literal></entry> + <entry> + <para> + Constant designating the URL namespace for UUIDs. + </para> + </entry> + </row> + <row> + <entry><literal>uuid_ns_oid()</literal></entry> + <entry> + <para> + Constant designating the ISO object identifier (OID) namespace for + UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in + PostgreSQL.) + </para> + </entry> + </row> + <row> + <entry><literal>uuid_ns_x500()</literal></entry> + <entry> + <para> + Constant designating the X.500 distinguished name (DN) namespace for + UUIDs. 
+ </para>
 + </entry>
 + </row>
 + </tbody>
 + </tgroup>
 + </table>
 + </sect2>
 + <sect2>
 + <title>Author</title>
 + <para>
 + Peter Eisentraut <email>peter_e@gmx.net</email>
 + </para>
 + </sect2>
 +</sect1>
 +
diff --git a/doc/src/sgml/vacuumlo.sgml b/doc/src/sgml/vacuumlo.sgml
new file mode 100644
index 00000000000..28219d2b257
--- /dev/null
+++ b/doc/src/sgml/vacuumlo.sgml
@@ -0,0 +1,74 @@
+<sect1 id="vacuumlo">
 + <title>vacuumlo</title>
 +
 + <indexterm zone="vacuumlo">
 + <primary>vacuumlo</primary>
 + </indexterm>
 +
 + <para>
 + This is a simple utility that will remove any orphaned large objects from a
 + PostgreSQL database. An orphaned LO is considered to be any LO whose OID
 + does not appear in any OID data column of the database.
 + </para>
 + <para>
 + If you use this, you may also be interested in the lo_manage trigger in
 + contrib/lo. The lo_manage trigger is useful for avoiding the creation of
 + orphaned LOs in the first place.
 + </para>
 + <para>
 + <note>
 + <para>
 + It was decided to place this in contrib as it needs further testing, but
 + hopefully this (or a variant of it) will make it into the backend as a
 + "vacuum lo" command in a later release.
 + </para>
 + </note>
 + </para>
 +
 + <sect2>
 + <title>Usage</title>
 + <programlisting>
 +vacuumlo [options] database [database2 ... databasen]
 + </programlisting>
 + <para>
 + All databases named on the command line are processed. Available options
 + include:
 + </para>
 + <programlisting>
 +-v Write a lot of progress messages
 +-n Don't remove large objects, just show what would be done
 +-U username Username to connect as
 +-W Prompt for password
 +-h hostname Database server host
 +-p port Database server port
 + </programlisting>
 + </sect2>
 +
 + <sect2>
 + <title>Method</title>
 + <para>
 + First, it builds a temporary table containing the OIDs of all
 + large objects in that database. 
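+ In SQL terms, this first step is roughly equivalent to the following
 + (a simplified sketch: the temporary table name is illustrative, and
 + the real utility issues its queries through libpq):
 + <programlisting>
 +SELECT DISTINCT loid AS lo INTO TEMP TABLE vacuum_l FROM pg_largeobject;
 + </programlisting>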
+ </para> + <para> + It then scans through all columns in the database that are of type "oid" + or "lo", and removes matching entries from the temporary table. + </para> + <para> + The remaining entries in the temp table identify orphaned LOs. These are + removed. + </para> + </sect2> + + <sect2> + <title>Author</title> + <para> + Peter Mount <email>peter@retep.org.uk</email> + </para> + <para> + <ulink url="http://www.retep.org.uk"></ulink> + </para> + </sect2> + +</sect1> + diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml new file mode 100644 index 00000000000..d73789a155d --- /dev/null +++ b/doc/src/sgml/xml2.sgml @@ -0,0 +1,436 @@ +<sect1 id="xml2"> + <title>xml2: XML-handling functions</title> + + <indexterm zone="xml2"> + <primary>xml2</primary> + </indexterm> + + <sect2> + <title>Deprecation notice</title> + <para> + From PostgreSQL 8.3 on, there is XML-related + functionality based on the SQL/XML standard in the core server. + That functionality covers XML syntax checking and XPath queries, + which is what this module does as well, and more, but the API is + not at all compatible. It is planned that this module will be + removed in PostgreSQL 8.4 in favor of the newer standard API, so + you are encouraged to try converting your applications. If you + find that some of the functionality of this module is not + available in an adequate form with the newer API, please explain + your issue to pgsql-hackers@postgresql.org so that the deficiency + can be addressed. + </para> + </sect2> + + <sect2> + <title>Description of functions</title> + <para> + The first set of functions are straightforward XML parsing and XPath queries: + </para> + + <table> + <title>Functions</title> + <tgroup cols="2"> + <tbody> + <row> + <entry> + <programlisting> + xml_is_well_formed(document) RETURNS bool + </programlisting> + </entry> + <entry> + <para> + This parses the document text in its parameter and returns true if the + document is well-formed XML. 
(Note: before PostgreSQL 8.2, this function
 + was called xml_valid(). That is the wrong name since validity and
 + well-formedness have different meanings in XML. The old name is still
 + available, but is deprecated and will be removed in 8.3.)
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry>
 + <programlisting>
 + xpath_string(document,query) RETURNS text
 + xpath_number(document,query) RETURNS float4
 + xpath_bool(document,query) RETURNS bool
 + </programlisting>
 + </entry>
 + <entry>
 + <para>
 + These functions evaluate the XPath query on the supplied document, and
 + cast the result to the specified type.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry>
 + <programlisting>
 + xpath_nodeset(document,query,toptag,itemtag) RETURNS text
 + </programlisting>
 + </entry>
 + <entry>
 + <para>
 + This evaluates query on document and wraps the result in XML tags. If
 + the result is multivalued, the output will look like:
 + </para>
 + <literal>
 + <toptag>
 + <itemtag>Value 1 which could be an XML fragment</itemtag>
 + <itemtag>Value 2....</itemtag>
 + </toptag>
 + </literal>
 + <para>
 + If either toptag or itemtag is an empty string, the relevant tag is omitted.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry>
 + <programlisting>
 + xpath_nodeset(document,query) RETURNS text
 + </programlisting>
 + </entry>
 + <entry>
 + <para>
 + Like xpath_nodeset(document,query,toptag,itemtag) but omits both tags.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry>
 + <programlisting>
 + xpath_nodeset(document,query,itemtag) RETURNS text
 + </programlisting>
 + </entry>
 + <entry>
 + <para>
 + Like xpath_nodeset(document,query,toptag,itemtag) but omits toptag.
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry>
 + <programlisting>
 + xpath_list(document,query,separator) RETURNS text
 + </programlisting>
 + </entry>
 + <entry>
 + <para>
 + This function returns multiple values separated by the specified
 + separator, e.g. Value 1,Value 2,Value 3 if separator=','. 
+ </para>
 + </entry>
 + </row>
 + <row>
 + <entry>
 + <programlisting>
 + xpath_list(document,query) RETURNS text
 + </programlisting>
 + </entry>
 + <entry>
 + This is a wrapper for the above function that uses ',' as the separator.
 + </entry>
 + </row>
 + </tbody>
 + </tgroup>
 + </table>
 + </sect2>
 +
 +
 + <sect2>
 + <title><literal>xpath_table</literal></title>
 + <para>
 + This is a table function which evaluates a set of XPath queries on
 + each of a set of documents and returns the results as a table. The
 + primary key field from the original document table is returned as the
 + first column of the result so that the resultset from xpath_table can
 + be readily used in joins.
 + </para>
 + <para>
 + The function itself takes 5 arguments, all text.
 + </para>
 + <programlisting>
 + xpath_table(key,document,relation,xpaths,criteria)
 + </programlisting>
 + <table>
 + <title>Parameters</title>
 + <tgroup cols="2">
 + <tbody>
 + <row>
 + <entry><literal>key</literal></entry>
 + <entry>
 + <para>
 + the name of the "key" field - this is just a field to be used as
 + the first column of the output table, i.e. it identifies the record from
 + which each output row came (see note below about multiple values).
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>document</literal></entry>
 + <entry>
 + <para>
 + the name of the field containing the XML document
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>relation</literal></entry>
 + <entry>
 + <para>
 + the name of the table or view containing the documents
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>xpaths</literal></entry>
 + <entry>
 + <para>
 + multiple XPath expressions separated by <literal>|</literal>
 + </para>
 + </entry>
 + </row>
 + <row>
 + <entry><literal>criteria</literal></entry>
 + <entry>
 + <para>
 + The contents of the WHERE clause. This needs to be specified,
 + so use "true" or "1=1" here if you want to process all the rows in the
 + relation. 
+ </para> + </entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + NB These parameters (except the XPath strings) are just substituted + into a plain SQL SELECT statement, so you have some flexibility - the + statement is + </para> + + <para> + <literal> + SELECT <key>,<document> FROM <relation> WHERE <criteria> + </literal> + </para> + + <para> + so those parameters can be *anything* valid in those particular + locations. The result from this SELECT needs to return exactly two + columns (which it will unless you try to list multiple fields for key + or document). Beware that this simplistic approach requires that you + validate any user-supplied values to avoid SQL injection attacks. + </para> + + <para> + Using the function + </para> + + <para> + The function has to be used in a FROM expression. This gives the following + form: + </para> + + <programlisting> +SELECT * FROM +xpath_table('article_id', + 'article_xml', + 'articles', + '/article/author|/article/pages|/article/title', + 'date_entered > ''2003-01-01'' ') +AS t(article_id integer, author text, page_count integer, title text); + </programlisting> + + <para> + The AS clause defines the names and types of the columns in the + virtual table. If there are more XPath queries than result columns, + the extra queries will be ignored. If there are more result columns + than XPath queries, the extra columns will be NULL. + </para> + + <para> + Note that I've said in this example that pages is an integer. The + function deals internally with string representations, so when you say + you want an integer in the output, it will take the string + representation of the XPath result and use PostgreSQL input functions + to transform it into an integer (or whatever type the AS clause + requests). An error will result if it can't do this - for example if + the result is empty - so you may wish to just stick to 'text' as the + column type if you think your data has any problems. 
+ </para> + <para> + The select statement doesn't need to use * alone - it can reference the + columns by name or join them to other tables. The function produces a + virtual table with which you can perform any operation you wish (e.g. + aggregation, joining, sorting etc). So we could also have: + </para> + + <programlisting> +SELECT t.title, p.fullname, p.email +FROM xpath_table('article_id','article_xml','articles', + '/article/title|/article/author/@id', + 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ') + AS t(article_id integer, title text, author_id integer), + tblPeopleInfo AS p +WHERE t.author_id = p.person_id; + </programlisting> + + <para> + as a more complicated example. Of course, you could wrap all + of this in a view for convenience. + </para> + <sect3> + <title>Multivalued results</title> + <para> + The xpath_table function assumes that the results of each XPath query + might be multi-valued, so the number of rows returned by the function + may not be the same as the number of input documents. The first row + returned contains the first result from each query, the second row the + second result from each query. If one of the queries has fewer values + than the others, NULLs will be returned instead. + </para> + <para> + In some cases, a user will know that a given XPath query will return + only a single result (perhaps a unique document identifier) - if used + alongside an XPath query returning multiple results, the single-valued + result will appear only on the first row of the result. The solution + to this is to use the key field as part of a join against a simpler + XPath query. 
As an example:
 + </para>
 +
 + <para>
 + <programlisting>
 + CREATE TABLE test
 + (
 + id int4 NOT NULL,
 + xml text,
 + CONSTRAINT pk PRIMARY KEY (id)
 + )
 + WITHOUT OIDS;
 +
 + INSERT INTO test VALUES (1, '<doc num="C1">
 + <line num="L1"><a>1</a><b>2</b><c>3</c></line>
 + <line num="L2"><a>11</a><b>22</b><c>33</c></line>
 + </doc>');
 +
 + INSERT INTO test VALUES (2, '<doc num="C2">
 + <line num="L1"><a>111</a><b>222</b><c>333</c></line>
 + <line num="L2"><a>111</a><b>222</b><c>333</c></line>
 + </doc>');
 + </programlisting>
 + </para>
 + </sect3>
 +
 + <sect3>
 + <title>The query</title>
 +
 + <programlisting>
 + SELECT * FROM xpath_table('id','xml','test',
 + '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
 + AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
 + val2 int4, val3 int4)
 + WHERE id = 1 ORDER BY doc_num, line_num
 + </programlisting>
 +
 + <para>
 + This gives the result:
 + </para>
 +
 + <programlisting>
 + id | doc_num | line_num | val1 | val2 | val3
 + ----+---------+----------+------+------+------
 + 1 | C1 | L1 | 1 | 2 | 3
 + 1 | | L2 | 11 | 22 | 33
 + </programlisting>
 +
 + <para>
 + To get doc_num on every line, the solution is to use two invocations
 + of xpath_table and join the results:
 + </para>
 +
 + <programlisting>
 + SELECT t.*,i.doc_num FROM
 + xpath_table('id','xml','test',
 + '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
 + AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
 + xpath_table('id','xml','test','/doc/@num','1=1')
 + AS i(id int4, doc_num varchar(10))
 + WHERE i.id=t.id AND i.id=1
 + ORDER BY doc_num, line_num;
 + </programlisting>
 +
 + <para>
 + which gives the desired result:
 + </para>
 +
 + <programlisting>
 + id | line_num | val1 | val2 | val3 | doc_num
 + ----+----------+------+------+------+---------
 + 1 | L1 | 1 | 2 | 3 | C1
 + 1 | L2 | 11 | 22 | 33 | C1
 + (2 rows)
 + </programlisting>
 + </sect3>
 + </sect2>
 +
 +
 + <sect2>
 + <title>XSLT functions</title>
 + <para>
 + The following functions are available if libxslt is installed 
(this is
 + not currently detected automatically, so you will have to amend the
 + Makefile)
 + </para>
 +
 + <sect3>
 + <title><literal>xslt_process</literal></title>
 + <programlisting>
 + xslt_process(document,stylesheet,paramlist) RETURNS text
 + </programlisting>
 +
 + <para>
 + This function applies the XSL stylesheet to the document and returns
 + the transformed result. The paramlist is a list of parameter
 + assignments to be used in the transformation, specified in the form
 + 'a=1,b=2'. Note that this is also proof-of-concept code and the
 + parameter parsing is very simple-minded (e.g. parameter values cannot
 + contain commas!).
 + </para>
 + <para>
 + Also note that if either the document or stylesheet values do not
 + begin with a < then they will be treated as URLs and libxslt will
 + fetch them. It thus follows that you can use xslt_process as a means
 + to fetch the contents of URLs - you should be aware of the security
 + implications of this.
 + </para>
 + <para>
 + There is also a two-parameter version of xslt_process which does not
 + pass any parameters to the transformation.
 + </para>
 + </sect3>
 + </sect2>
 +
 + <sect2>
 + <title>Credits</title>
 + <para>
 + Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com).
 + It has the same BSD licence as PostgreSQL.
 + </para>
 + <para>
 + This version of the XML functions provides both XPath querying and
 + XSLT functionality. There is also a new table function which allows
 + the straightforward return of multiple XML results. Note that the current code
 + doesn't take any particular care over character sets - this is
 + something that should be fixed at some point!
 + </para>
 + <para>
 + If you have any comments or suggestions, please do contact me at
 + <email>jgray@azuli.co.uk</email>. Unfortunately, this isn't my main job, so
 + I can't guarantee a rapid response to your query!
 + </para>
 + </sect2>
 +</sect1>
 +