Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
summaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
authorBruce Momjian2007-11-10 23:30:46 +0000
committerBruce Momjian2007-11-10 23:30:46 +0000
commitc3c69ab4fd25a20749b850d34cbc8ce3f1812e3b (patch)
treea7d9b4501297216ac71f597dca3840090ae791d5 /doc/src
parent6e414a171e8a91966b10ecd14aa367422870bdd2 (diff)
Move most /contrib README files into SGML. Some still need conversion
or will never be converted.
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/adminpack.sgml32
-rw-r--r--doc/src/sgml/btree-gist.sgml40
-rw-r--r--doc/src/sgml/buffercache.sgml121
-rw-r--r--doc/src/sgml/chkpass.sgml84
-rw-r--r--doc/src/sgml/contrib.sgml56
-rw-r--r--doc/src/sgml/cube.sgml529
-rw-r--r--doc/src/sgml/dblink.sgml1312
-rw-r--r--doc/src/sgml/earthdistance.sgml133
-rw-r--r--doc/src/sgml/filelist.sgml34
-rw-r--r--doc/src/sgml/freespacemap.sgml243
-rw-r--r--doc/src/sgml/fuzzystrmatch.sgml122
-rw-r--r--doc/src/sgml/hstore.sgml298
-rw-r--r--doc/src/sgml/intagg.sgml82
-rw-r--r--doc/src/sgml/intarray.sgml286
-rw-r--r--doc/src/sgml/isn.sgml502
-rw-r--r--doc/src/sgml/lo.sgml118
-rw-r--r--doc/src/sgml/ltree.sgml771
-rw-r--r--doc/src/sgml/oid2name.sgml205
-rw-r--r--doc/src/sgml/pageinspect.sgml125
-rw-r--r--doc/src/sgml/pgbench.sgml422
-rw-r--r--doc/src/sgml/pgcrypto.sgml1144
-rw-r--r--doc/src/sgml/pgrowlocks.sgml123
-rw-r--r--doc/src/sgml/pgstattuple.sgml158
-rw-r--r--doc/src/sgml/postgres.sgml3
-rw-r--r--doc/src/sgml/seg.sgml450
-rw-r--r--doc/src/sgml/sslinfo.sgml164
-rw-r--r--doc/src/sgml/standby.sgml249
-rw-r--r--doc/src/sgml/tablefunc.sgml765
-rw-r--r--doc/src/sgml/trgm.sgml214
-rw-r--r--doc/src/sgml/uuid-ossp.sgml163
-rw-r--r--doc/src/sgml/vacuumlo.sgml74
-rw-r--r--doc/src/sgml/xml2.sgml436
32 files changed, 9456 insertions, 2 deletions
diff --git a/doc/src/sgml/adminpack.sgml b/doc/src/sgml/adminpack.sgml
new file mode 100644
index 00000000000..10816f5c24d
--- /dev/null
+++ b/doc/src/sgml/adminpack.sgml
@@ -0,0 +1,32 @@
+<sect1>
+ <title>adminpack</title>
+ <para>
+ adminpack is a PostgreSQL standard module that implements a number of
+ support functions which pgAdmin and other administration and management tools
+ can use to provide additional functionality if installed on a server.
+ </para>
+
+ <sect2>
+ <title>Functions implemented</title>
+ <para>
+ Functions implemented by adminpack can only be run by a superuser. Here's a
+ list of these functions:
+ </para>
+ <para>
+ <programlisting>
+ int8 pg_catalog.pg_file_write(fname text, data text, append bool)
+ bool pg_catalog.pg_file_rename(oldname text, newname text, archivname text)
+ bool pg_catalog.pg_file_rename(oldname text, newname text)
+ bool pg_catalog.pg_file_unlink(fname text)
+ setof record pg_catalog.pg_logdir_ls()
+
+ /* Renaming of existing backend functions for pgAdmin compatibility */
+ int8 pg_catalog.pg_file_read(fname text, data text, append bool)
+ bigint pg_catalog.pg_file_length(text)
+ int4 pg_catalog.pg_logfile_rotate()
+ </programlisting>
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/btree-gist.sgml b/doc/src/sgml/btree-gist.sgml
new file mode 100644
index 00000000000..4e1126e33c3
--- /dev/null
+++ b/doc/src/sgml/btree-gist.sgml
@@ -0,0 +1,40 @@
+<sect1>
+ <!--
+ <indexterm zone="btree-gist">
+ <primary>btree-gist</primary>
+ </indexterm>
+ -->
+
+ <title>btree-gist</title>
+
+ <para>
+ btree-gist is a B-Tree implementation using GiST that supports the int2, int4,
+ int8, float4, float8 timestamp with/without time zone, time
+ with/without time zone, date, interval, oid, money, macaddr, char,
+ varchar/text, bytea, numeric, bit, varbit and inet/cidr types.
+ </para>
+
+ <sect2>
+ <title>Example usage</title>
+ <programlisting>
+ CREATE TABLE test (a int4);
+ -- create index
+ CREATE INDEX testidx ON test USING gist (a);
+ -- query
+ SELECT * FROM test WHERE a < 10;
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Authors</title>
+ <para>
+ All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) ,
+ Oleg Bartunov (<email>oleg@sai.msu.su</email>), Janko Richter
+ (<email>jankorichter@yahoo.de</email>). See
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for additional
+ information.
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/buffercache.sgml b/doc/src/sgml/buffercache.sgml
new file mode 100644
index 00000000000..1347c75ba32
--- /dev/null
+++ b/doc/src/sgml/buffercache.sgml
@@ -0,0 +1,121 @@
+<sect1 id="buffercache">
+ <title>pg_buffercache</title>
+
+ <indexterm zone="buffercache">
+ <primary>pg_buffercache</primary>
+ </indexterm>
+
+ <para>
+ <literal>pg_buffercache</literal> module provides the means for examining
+ what's happening to the buffercache at any given time without having to
+ restart or rebuild the server with debugging code added. The intent is to
+ do for the buffercache what pg_locks does for locks.
+ </para>
+ <para>
+ This module consists of a C function <literal>pg_buffercache_pages()</literal>
+ that returns a set of records, plus a view <literal>pg_buffercache</literal>
+ to wrapper the function.
+ </para>
+ <para>
+ By default public access is REVOKED from both of these, just in case there
+ are security issues lurking.
+ </para>
+
+ <sect2>
+ <title>Notes</title>
+ <para>
+ The definition of the columns exposed in the view is:
+ </para>
+ <programlisting>
+ Column | references | Description
+ ----------------+----------------------+------------------------------------
+ bufferid | | Id, 1..shared_buffers.
+ relfilenode | pg_class.relfilenode | Refilenode of the relation.
+ reltablespace | pg_tablespace.oid | Tablespace oid of the relation.
+ reldatabase | pg_database.oid | Database for the relation.
+ relblocknumber | | Offset of the page in the relation.
+ isdirty | | Is the page dirty?
+ usagecount | | Page LRU count
+ </programlisting>
+ <para>
+ There is one row for each buffer in the shared cache. Unused buffers are
+ shown with all fields null except bufferid.
+ </para>
+ <para>
+ Because the cache is shared by all the databases, there are pages from
+ relations not belonging to the current database.
+ </para>
+ <para>
+ When the pg_buffercache view is accessed, internal buffer manager locks are
+ taken, and a copy of the buffer cache data is made for the view to display.
+ This ensures that the view produces a consistent set of results, while not
+ blocking normal buffer activity longer than necessary. Nonetheless there
+ could be some impact on database performance if this view is read often.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Sample output</title>
+ <programlisting>
+ regression=# \d pg_buffercache;
+ View "public.pg_buffercache"
+ Column | Type | Modifiers
+ ----------------+----------+-----------
+ bufferid | integer |
+ relfilenode | oid |
+ reltablespace | oid |
+ reldatabase | oid |
+ relblocknumber | bigint |
+ isdirty | boolean |
+ usagecount | smallint |
+
+ View definition:
+ SELECT p.bufferid, p.relfilenode, p.reltablespace, p.reldatabase,
+ p.relblocknumber, p.isdirty, p.usagecount
+ FROM pg_buffercache_pages() p(bufferid integer, relfilenode oid,
+ reltablespace oid, reldatabase oid, relblocknumber bigint,
+ isdirty boolean, usagecount smallint);
+
+ regression=# SELECT c.relname, count(*) AS buffers
+ FROM pg_class c INNER JOIN pg_buffercache b
+ ON b.relfilenode = c.relfilenode INNER JOIN pg_database d
+ ON (b.reldatabase = d.oid AND d.datname = current_database())
+ GROUP BY c.relname
+ ORDER BY 2 DESC LIMIT 10;
+ relname | buffers
+ ---------------------------------+---------
+ tenk2 | 345
+ tenk1 | 141
+ pg_proc | 46
+ pg_class | 45
+ pg_attribute | 43
+ pg_class_relname_nsp_index | 30
+ pg_proc_proname_args_nsp_index | 28
+ pg_attribute_relid_attnam_index | 26
+ pg_depend | 22
+ pg_depend_reference_index | 20
+ (10 rows)
+
+ regression=#
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Authors</title>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Mark Kirkwood <email>markir@paradise.net.nz</email>
+ </para>
+ </listitem>
+ <listitem>
+ <para>Design suggestions: Neil Conway <email>neilc@samurai.com</email></para>
+ </listitem>
+ <listitem>
+ <para>Debugging advice: Tom Lane <email>tgl@sss.pgh.pa.us</email></para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/chkpass.sgml b/doc/src/sgml/chkpass.sgml
new file mode 100644
index 00000000000..e0179b3971c
--- /dev/null
+++ b/doc/src/sgml/chkpass.sgml
@@ -0,0 +1,84 @@
+<sect1 id="chkpass">
+ <title>chkpass</title>
+
+ <!--
+ <indexterm zone="chkpass">
+ <primary>chkpass</primary>
+ </indexterm>
+ -->
+ <para>
+ chkpass is a password type that is automatically checked and converted upon
+ entry. It is stored encrypted. To compare, simply compare against a clear
+ text password and the comparison function will encrypt it before comparing.
+ It also returns an error if the code determines that the password is easily
+ crackable. This is currently a stub that does nothing.
+ </para>
+
+ <para>
+ Note that the chkpass data type is not indexable.
+ <!--
+ I haven't worried about making this type indexable. I doubt that anyone
+ would ever need to sort a file in order of encrypted password.
+ -->
+ </para>
+
+ <para>
+ If you precede the string with a colon, the encryption and checking are
+ skipped so that you can enter existing passwords into the field.
+ </para>
+
+ <para>
+ On output, a colon is prepended. This makes it possible to dump and reload
+ passwords without re-encrypting them. If you want the password (encrypted)
+ without the colon then use the raw() function. This allows you to use the
+ type with things like Apache's Auth_PostgreSQL module.
+ </para>
+
+ <para>
+ The encryption uses the standard Unix function crypt(), and so it suffers
+ from all the usual limitations of that function; notably that only the
+ first eight characters of a password are considered.
+ </para>
+
+ <para>
+ Here is some sample usage:
+ </para>
+
+ <programlisting>
+test=# create table test (p chkpass);
+CREATE TABLE
+test=# insert into test values ('hello');
+INSERT 0 1
+test=# select * from test;
+ p
+----------------
+ :dVGkpXdOrE3ko
+(1 row)
+
+test=# select raw(p) from test;
+ raw
+---------------
+ dVGkpXdOrE3ko
+(1 row)
+
+test=# select p = 'hello' from test;
+ ?column?
+----------
+ t
+(1 row)
+
+test=# select p = 'goodbye' from test;
+ ?column?
+----------
+ f
+(1 row)
+ </programlisting>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ D'Arcy J.M. Cain <email>darcy@druid.net</email>
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
new file mode 100644
index 00000000000..54c66d33a5c
--- /dev/null
+++ b/doc/src/sgml/contrib.sgml
@@ -0,0 +1,56 @@
+<chapter id="contrib">
+ <title>Standard Modules</title>
+
+ <para>
+ This section contains information regarding the standard modules which
+ can be found in the <literal>contrib</literal> directory of the
+ PostgreSQL distribution. These are porting tools, analysis utilities,
+ and plug-in features that are not part of the core PostgreSQL system,
+ mainly because they address a limited audience or are too experimental
+ to be part of the main source tree. This does not preclude their
+ usefulness.
+ </para>
+
+ <para>
+ Some modules supply new user-defined functions, operators, or types. In
+ these cases, you will need to run <literal>make</literal> and <literal>make
+ install</literal> in <literal>contrib/module</literal>. After you have
+ installed the files you need to register the new entities in the database
+ system by running the commands in the supplied .sql file. For example,
+
+ <programlisting>
+ $ psql -d dbname -f module.sql
+ </programlisting>
+ </para>
+
+ &adminpack;
+ &btree-gist;
+ &chkpass;
+ &cube;
+ &dblink;
+ &earthdistance;
+ &fuzzystrmatch;
+ &hstore;
+ &intagg;
+ &intarray;
+ &isn;
+ &lo;
+ &ltree;
+ &oid2name;
+ &pageinspect;
+ &pgbench;
+ &buffercache;
+ &pgcrypto;
+ &freespacemap;
+ &pgrowlocks;
+ &standby;
+ &pgstattuple;
+ &trgm;
+ &seg;
+ &sslinfo;
+ &tablefunc;
+ &uuid-ossp;
+ &vacuumlo;
+ &xml2;
+</chapter>
+
diff --git a/doc/src/sgml/cube.sgml b/doc/src/sgml/cube.sgml
new file mode 100644
index 00000000000..da19ae204af
--- /dev/null
+++ b/doc/src/sgml/cube.sgml
@@ -0,0 +1,529 @@
+
+<sect1 id="cube">
+ <title>cube</title>
+
+ <indexterm zone="cube">
+ <primary>cube</primary>
+ </indexterm>
+
+ <para>
+ This module contains the user-defined type, CUBE, representing
+ multidimensional cubes.
+ </para>
+
+ <sect2>
+ <title>Syntax</title>
+
+ <para>
+ The following are valid external representations for the CUBE type:
+ </para>
+
+ <table>
+ <title>Cube external representations</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>'x'</entry>
+ <entry>A floating point value representing a one-dimensional point or
+ one-dimensional zero length cubement
+ </entry>
+ </row>
+ <row>
+ <entry>'(x)'</entry>
+ <entry>Same as above</entry>
+ </row>
+ <row>
+ <entry>'x1,x2,x3,...,xn'</entry>
+ <entry>A point in n-dimensional space, represented internally as a zero
+ volume box
+ </entry>
+ </row>
+ <row>
+ <entry>'(x1,x2,x3,...,xn)'</entry>
+ <entry>Same as above</entry>
+ </row>
+ <row>
+ <entry>'(x),(y)'</entry>
+ <entry>1-D cubement starting at x and ending at y or vice versa; the
+ order does not matter
+ </entry>
+ </row>
+ <row>
+ <entry>'(x1,...,xn),(y1,...,yn)'</entry>
+ <entry>n-dimensional box represented by a pair of its opposite corners, no
+ matter which. Functions take care of swapping to achieve "lower left --
+ upper right" representation before computing any values
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Grammar</title>
+ <table>
+ <title>Cube Grammar Rules</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>rule 1</entry>
+ <entry>box -> O_BRACKET paren_list COMMA paren_list C_BRACKET</entry>
+ </row>
+ <row>
+ <entry>rule 2</entry>
+ <entry>box -> paren_list COMMA paren_list</entry>
+ </row>
+ <row>
+ <entry>rule 3</entry>
+ <entry>box -> paren_list</entry>
+ </row>
+ <row>
+ <entry>rule 4</entry>
+ <entry>box -> list</entry>
+ </row>
+ <row>
+ <entry>rule 5</entry>
+ <entry>paren_list -> O_PAREN list C_PAREN</entry>
+ </row>
+ <row>
+ <entry>rule 6</entry>
+ <entry>list -> FLOAT</entry>
+ </row>
+ <row>
+ <entry>rule 7</entry>
+ <entry>list -> list COMMA FLOAT</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Tokens</title>
+ <table>
+ <title>Cube Grammar Rules</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>n</entry>
+ <entry>[0-9]+</entry>
+ </row>
+ <row>
+ <entry>i</entry>
+ <entry>nteger [+-]?{n}</entry>
+ </row>
+ <row>
+ <entry>real</entry>
+ <entry>[+-]?({n}\.{n}?|\.{n})</entry>
+ </row>
+ <row>
+ <entry>FLOAT</entry>
+ <entry>({integer}|{real})([eE]{integer})?</entry>
+ </row>
+ <row>
+ <entry>O_BRACKET</entry>
+ <entry>\[</entry>
+ </row>
+ <row>
+ <entry>C_BRACKET</entry>
+ <entry>\]</entry>
+ </row>
+ <row>
+ <entry>O_PAREN</entry>
+ <entry>\(</entry>
+ </row>
+ <row>
+ <entry>C_PAREN</entry>
+ <entry>\)</entry>
+ </row>
+ <row>
+ <entry>COMMA</entry>
+ <entry>\,</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+ <table>
+ <title>Examples</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>'x'</entry>
+ <entry>A floating point value representing a one-dimensional point
+ (or, zero-length one-dimensional interval)
+ </entry>
+ </row>
+ <row>
+ <entry>'(x)'</entry>
+ <entry>Same as above</entry>
+ </row>
+ <row>
+ <entry>'x1,x2,x3,...,xn'</entry>
+ <entry>A point in n-dimensional space,represented internally as a zero
+ volume cube
+ </entry>
+ </row>
+ <row>
+ <entry>'(x1,x2,x3,...,xn)'</entry>
+ <entry>Same as above</entry>
+ </row>
+ <row>
+ <entry>'(x),(y)'</entry>
+ <entry>A 1-D interval starting at x and ending at y or vice versa; the
+ order does not matter
+ </entry>
+ </row>
+ <row>
+ <entry>'[(x),(y)]'</entry>
+ <entry>Same as above</entry>
+ </row>
+ <row>
+ <entry>'(x1,...,xn),(y1,...,yn)'</entry>
+ <entry>An n-dimensional box represented by a pair of its diagonally
+ opposite corners, regardless of order. Swapping is provided
+ by all comarison routines to ensure the
+ "lower left -- upper right" representation
+ before actaul comparison takes place.
+ </entry>
+ </row>
+ <row>
+ <entry>'[(x1,...,xn),(y1,...,yn)]'</entry>
+ <entry>Same as above</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ White space is ignored, so '[(x),(y)]' can be: '[ ( x ), ( y ) ]'
+ </para>
+ </sect2>
+ <sect2>
+ <title>Defaults</title>
+ <para>
+ I believe this union:
+ </para>
+<programlisting>
+select cube_union('(0,5,2),(2,3,1)','0');
+cube_union
+-------------------
+(0, 0, 0),(2, 5, 2)
+(1 row)
+</programlisting>
+
+ <para>
+ does not contradict to the common sense, neither does the intersection
+ </para>
+
+<programlisting>
+select cube_inter('(0,-1),(1,1)','(-2),(2)');
+cube_inter
+-------------
+(0, 0),(1, 0)
+(1 row)
+</programlisting>
+
+ <para>
+ In all binary operations on differently sized boxes, I assume the smaller
+ one to be a cartesian projection, i. e., having zeroes in place of coordinates
+ omitted in the string representation. The above examples are equivalent to:
+ </para>
+
+<programlisting>
+cube_union('(0,5,2),(2,3,1)','(0,0,0),(0,0,0)');
+cube_inter('(0,-1),(1,1)','(-2,0),(2,0)');
+</programlisting>
+
+ <para>
+ The following containment predicate uses the point syntax,
+ while in fact the second argument is internally represented by a box.
+ This syntax makes it unnecessary to define the special Point type
+ and functions for (box,point) predicates.
+ </para>
+
+<programlisting>
+select cube_contains('(0,0),(1,1)', '0.5,0.5');
+cube_contains
+--------------
+t
+(1 row)
+</programlisting>
+ </sect2>
+ <sect2>
+ <title>Precision</title>
+ <para>
+Values are stored internally as 64-bit floating point numbers. This means that
+numbers with more than about 16 significant digits will be truncated.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Usage</title>
+ <para>
+ The access method for CUBE is a GiST index (gist_cube_ops), which is a
+ generalization of R-tree. GiSTs allow the postgres implementation of
+ R-tree, originally encoded to support 2-D geometric types such as
+ boxes and polygons, to be used with any data type whose data domain
+ can be partitioned using the concepts of containment, intersection and
+ equality. In other words, everything that can intersect or contain
+ its own kind can be indexed with a GiST. That includes, among other
+ things, all geometric data types, regardless of their dimensionality
+ (see also contrib/seg).
+ </para>
+
+ <para>
+ The operators supported by the GiST access method include:
+ </para>
+
+ <programlisting>
+a = b Same as
+ </programlisting>
+ <para>
+ The cubements a and b are identical.
+ </para>
+
+ <programlisting>
+a && b Overlaps
+ </programlisting>
+ <para>
+ The cubements a and b overlap.
+ </para>
+
+ <programlisting>
+a @> b Contains
+ </programlisting>
+ <para>
+ The cubement a contains the cubement b.
+ </para>
+
+ <programlisting>
+a <@ b Contained in
+ </programlisting>
+ <para>
+ The cubement a is contained in b.
+ </para>
+
+ <para>
+ (Before PostgreSQL 8.2, the containment operators @> and <@ were
+ respectively called @ and ~. These names are still available, but are
+ deprecated and will eventually be retired. Notice that the old names
+ are reversed from the convention formerly followed by the core geometric
+ datatypes!)
+ </para>
+
+ <para>
+ Although the mnemonics of the following operators is questionable, I
+ preserved them to maintain visual consistency with other geometric
+ data types defined in Postgres.
+ </para>
+
+ <para>
+ Other operators:
+ </para>
+
+ <programlisting>
+[a, b] < [c, d] Less than
+[a, b] > [c, d] Greater than
+ </programlisting>
+
+ <para>
+ These operators do not make a lot of sense for any practical
+ purpose but sorting. These operators first compare (a) to (c),
+ and if these are equal, compare (b) to (d). That accounts for
+ reasonably good sorting in most cases, which is useful if
+ you want to use ORDER BY with this type
+ </para>
+
+ <para>
+ The following functions are available:
+ </para>
+
+ <table>
+ <title>Functions available</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>cube_distance(cube, cube) returns double</literal></entry>
+ <entry>cube_distance returns the distance between two cubes. If both
+ cubes are points, this is the normal distance function.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>cube(float8) returns cube</literal></entry>
+ <entry>This makes a one dimensional cube with both coordinates the same.
+ If the type of the argument is a numeric type other than float8 an
+ explicit cast to float8 may be needed.
+ <literal>cube(1) == '(1)'</literal>
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube(float8, float8) returns cube</literal></entry>
+ <entry>
+ This makes a one dimensional cube.
+ <literal>cube(1,2) == '(1),(2)'</literal>
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube(float8[]) returns cube</literal></entry>
+ <entry>This makes a zero-volume cube using the coordinates
+ defined by thearray.<literal>cube(ARRAY[1,2]) == '(1,2)'</literal>
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube(float8[], float8[]) returns cube</literal></entry>
+ <entry>This makes a cube, with upper right and lower left
+ coordinates as defined by the 2 float arrays. Arrays must be of the
+ same length.
+ <literal>cube('{1,2}'::float[], '{3,4}'::float[]) == '(1,2),(3,4)'
+ </literal>
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube(cube, float8) returns cube</literal></entry>
+ <entry>This builds a new cube by adding a dimension on to an
+ existing cube with the same values for both parts of the new coordinate.
+ This is useful for building cubes piece by piece from calculated values.
+ <literal>cube('(1)',2) == '(1,2),(1,2)'</literal>
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube(cube, float8, float8) returns cube</literal></entry>
+ <entry>This builds a new cube by adding a dimension on to an
+ existing cube. This is useful for building cubes piece by piece from
+ calculated values. <literal>cube('(1,2)',3,4) == '(1,3),(2,4)'</literal>
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube_dim(cube) returns int</literal></entry>
+ <entry>cube_dim returns the number of dimensions stored in the
+ the data structure
+ for a cube. This is useful for constraints on the dimensions of a cube.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube_ll_coord(cube, int) returns double </literal></entry>
+ <entry>
+ cube_ll_coord returns the nth coordinate value for the lower left
+ corner of a cube. This is useful for doing coordinate transformations.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube_ur_coord(cube, int) returns double
+ </literal></entry>
+ <entry>cube_ur_coord returns the nth coordinate value for the
+ upper right corner of a cube. This is useful for doing coordinate
+ transformations.
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube_subset(cube, int[]) returns cube
+ </literal></entry>
+ <entry>Builds a new cube from an existing cube, using a list of
+ dimension indexes
+ from an array. Can be used to find both the ll and ur coordinate of single
+ dimenion, e.g.: cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[2]) = '(3),(7)'
+ Or can be used to drop dimensions, or reorder them as desired, e.g.:
+ cube_subset(cube('(1,3,5),(6,7,8)'), ARRAY[3,2,1,1]) =
+ '(5, 3, 1, 1),(8, 7, 6, 6)'
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>cube_is_point(cube) returns bool</literal></entry>
+ <entry>cube_is_point returns true if a cube is also a point.
+ This is true when the two defining corners are the same.</entry>
+ </row>
+
+ <row>
+ <entry><literal>cube_enlarge(cube, double, int) returns cube</literal></entry>
+ <entry>
+ cube_enlarge increases the size of a cube by a specified
+ radius in at least
+ n dimensions. If the radius is negative the box is shrunk instead. This
+ is useful for creating bounding boxes around a point for searching for
+ nearby points. All defined dimensions are changed by the radius. If n
+ is greater than the number of defined dimensions and the cube is being
+ increased (r >= 0) then 0 is used as the base for the extra coordinates.
+ LL coordinates are decreased by r and UR coordinates are increased by r.
+ If a LL coordinate is increased to larger than the corresponding UR
+ coordinate (this can only happen when r < 0) than both coordinates are
+ set to their average. To make it harder for people to break things there
+ is an effective maximum on the dimension of cubes of 100. This is set
+ in cubedata.h if you need something bigger.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ There are a few other potentially useful functions defined in cube.c
+ that vanished from the schema because I stopped using them. Some of
+ these were meant to support type casting. Let me know if I was wrong:
+ I will then add them back to the schema. I would also appreciate
+ other ideas that would enhance the type and make it more useful.
+ </para>
+
+ <para>
+ For examples of usage, see sql/cube.sql
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Credits</title>
+ <para>
+ This code is essentially based on the example written for
+ Illustra, <ulink url="http://garcia.me.berkeley.edu/~adong/rtree"></ulink>
+ </para>
+ <para>
+ My thanks are primarily to Prof. Joe Hellerstein
+ (<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
+ gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>), and
+ to his former student, Andy Dong
+ (<ulink url="http://best.me.berkeley.edu/~adong/"></ulink>), for his exemplar.
+ I am also grateful to all postgres developers, present and past, for enabling
+ myself to create my own world and live undisturbed in it. And I would like to
+ acknowledge my gratitude to Argonne Lab and to the U.S. Department of Energy
+ for the years of faithful support of my database research.
+ </para>
+
+ <para>
+ Gene Selkov, Jr.
+ Computational Scientist
+ Mathematics and Computer Science Division
+ Argonne National Laboratory
+ 9700 S Cass Ave.
+ Building 221
+ Argonne, IL 60439-4844
+ <email>selkovjr@mcs.anl.gov</email>
+ </para>
+
+ <para>
+ Minor updates to this package were made by Bruno Wolff III
+ <email>bruno@wolff.to</email> in August/September of 2002. These include
+ changing the precision from single precision to double precision and adding
+ some new functions.
+ </para>
+
+ <para>
+ Additional updates were made by Joshua Reich <email>josh@root.net</email> in
+ July 2006. These include <literal>cube(float8[], float8[])</literal> and
+ cleaning up the code to use the V1 call protocol instead of the deprecated V0
+ form.
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/dblink.sgml b/doc/src/sgml/dblink.sgml
new file mode 100644
index 00000000000..095d600099b
--- /dev/null
+++ b/doc/src/sgml/dblink.sgml
@@ -0,0 +1,1312 @@
+<!-- <reference> -->
+<sect1 id="dblink">
+ <title>dblink</title>
+ <para>
+ dblink is a contrib module which allows connections with
+ other databases.
+ </para>
+
+ <!-- dblink_connect -->
+
+ <refentry>
+ <refnamediv>
+ <refname>dblink_connect</refname>
+ <refpurpose>opens a persistent connection to a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_connect(text connstr)
+ dblink_connect(text connname, text connstr)
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ if 2 arguments ar given, the first is used as a name for a persistent
+ connection
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>connstr</title>
+ <para>
+ standard libpq format connection string,
+ e.g. "hostaddr=127.0.0.1 port=5432 dbname=mydb user=postgres password=mypasswd"
+ </para>
+ <para>
+ if only one argument is given, the connection is unnamed; only one unnamed
+ connection can exist at a time
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns status = "OK"</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ select dblink_connect('dbname=postgres');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select dblink_connect('myconn','dbname=postgres');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_disconnect -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_disconnect</refname>
+ <refpurpose>closes a persistent connection to a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_disconnect()
+ dblink_disconnect(text connname)
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ if an argument is given, it is used as a name for a persistent
+ connection to close; otherwiase the unnamed connection is closed
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns status = "OK"</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_disconnect();
+ dblink_disconnect
+ -------------------
+ OK
+ (1 row)
+
+ select dblink_disconnect('myconn');
+ dblink_disconnect
+ -------------------
+ OK
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_open -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_open</refname>
+ <refpurpose>opens a cursor on a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_open(text cursorname, text sql [, bool fail_on_error])
+ dblink_open(text connname, text cursorname, text sql [, bool fail_on_error])
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ if three arguments are present, the first is taken as the specific
+ connection name to use; otherwise the unnamed connection is assumed
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>cursorname</title>
+ <para>
+ a reference name for the cursor
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>sql</title>
+ <para>
+ sql statement that you wish to execute on the remote host
+ e.g. "select * from pg_class"
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>fail_on_error</title>
+ <para>
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and the return value is set
+ to 'ERROR'.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns status = "OK"</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Note</title>
+ <itemizedlist>
+ <listitem>
+ <para>
+ dblink_connect(text connstr) must be executed first
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ dblink_open starts an explicit transaction. If, after using dblink_open,
+ you use dblink_exec to change data, and then an error occurs or you use
+ dblink_disconnect without a dblink_close first, your change *will* be
+ lost. Also, using dblink_close explicitly ends the transaction and thus
+ effectively closes *all* open cursors.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ </refsect1>
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_connect('dbname=postgres');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ test=# select dblink_open('foo','select proname, prosrc from pg_proc');
+ dblink_open
+ -------------
+ OK
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_fetch -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_fetch</refname>
+ <refpurpose>returns a set from an open cursor on a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_fetch(text cursorname, int32 howmany [, bool fail_on_error])
+ dblink_fetch(text connname, text cursorname, int32 howmany [, bool fail_on_error])
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ if three arguments are present, the first is taken as the specific
+ connection name to use; otherwise the unnamed connection is assumed
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>cursorname</title>
+ <para>
+ The reference name for the cursor
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>howmany</title>
+ <para>
+ Maximum number of rows to retrieve. The next howmany rows are fetched,
+ starting at the current cursor position, moving forward. Once the cursor
+ has positioned to the end, no more rows are produced.
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>fail_on_error</title>
+ <para>
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and no rows are returned.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns setof record</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Note</title>
+ <para>
+ On a mismatch between the number of return fields as specified in the FROM
+ clause, and the actual number of fields returned by the remote cursor, an
+ ERROR will be thrown. In this event, the remote cursor is still advanced
+ by as many rows as it would have been if the ERROR had not occurred.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_connect('dbname=postgres');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ test=# select dblink_open('foo','select proname, prosrc from pg_proc where proname like ''bytea%''');
+ dblink_open
+ -------------
+ OK
+ (1 row)
+
+ test=# select * from dblink_fetch('foo',5) as (funcname name, source text);
+ funcname | source
+ ----------+----------
+ byteacat | byteacat
+ byteacmp | byteacmp
+ byteaeq | byteaeq
+ byteage | byteage
+ byteagt | byteagt
+ (5 rows)
+
+ test=# select * from dblink_fetch('foo',5) as (funcname name, source text);
+ funcname | source
+ -----------+-----------
+ byteain | byteain
+ byteale | byteale
+ bytealike | bytealike
+ bytealt | bytealt
+ byteane | byteane
+ (5 rows)
+
+ test=# select * from dblink_fetch('foo',5) as (funcname name, source text);
+ funcname | source
+ ------------+------------
+ byteanlike | byteanlike
+ byteaout | byteaout
+ (2 rows)
+
+ test=# select * from dblink_fetch('foo',5) as (funcname name, source text);
+ funcname | source
+ ----------+--------
+ (0 rows)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_close -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_close</refname>
+ <refpurpose>closes a cursor on a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_close(text cursorname [, bool fail_on_error])
+ dblink_close(text connname, text cursorname [, bool fail_on_error])
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ if two arguments are present, the first is taken as the specific
+ connection name to use; otherwise the unnamed connection is assumed
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>cursorname</title>
+ <para>
+ a reference name for the cursor
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>fail_on_error</title>
+ <para>
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and the return value is set
+ to 'ERROR'.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns status = "OK"</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Note</title>
+ <para>
+ dblink_connect(text connstr) or dblink_connect(text connname, text connstr)
+ must be executed first.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_connect('dbname=postgres');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ test=# select dblink_open('foo','select proname, prosrc from pg_proc');
+ dblink_open
+ -------------
+ OK
+ (1 row)
+
+ test=# select dblink_close('foo');
+ dblink_close
+ --------------
+ OK
+ (1 row)
+
+ select dblink_connect('myconn','dbname=regression');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select dblink_open('myconn','foo','select proname, prosrc from pg_proc');
+ dblink_open
+ -------------
+ OK
+ (1 row)
+
+ select dblink_close('myconn','foo');
+ dblink_close
+ --------------
+ OK
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_exec -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_exec</refname>
+ <refpurpose>executes an UPDATE/INSERT/DELETE on a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_exec(text connstr, text sql [, bool fail_on_error])
+ dblink_exec(text connname, text sql [, bool fail_on_error])
+ dblink_exec(text sql [, bool fail_on_error])
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname/connstr</title>
+ <para>
+ If two arguments are present, the first is first assumed to be a specific
+ connection name to use. If the name is not found, the argument is then
+ assumed to be a valid connection string, of standard libpq format,
+ e.g.: "hostaddr=127.0.0.1 dbname=mydb user=postgres password=mypasswd"
+
+ If only one argument is used, then the unnamed connection is used.
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>sql</title>
+ <para>
+ sql statement that you wish to execute on the remote host, e.g.:
+ insert into foo values(0,'a','{"a0","b0","c0"}');
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>fail_on_error</title>
+ <para>
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and the return value is set
+ to 'ERROR'.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns status of the command, or 'ERROR' if the command failed.</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+ <para>
+ dblink_open starts an explicit transaction. If, after using dblink_open,
+ you use dblink_exec to change data, and then an error occurs or you use
+ dblink_disconnect without a dblink_close first, your change *will* be
+ lost.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ select dblink_connect('dbname=dblink_test_slave');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select dblink_exec('insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
+ dblink_exec
+ -----------------
+ INSERT 943366 1
+ (1 row)
+
+ select dblink_connect('myconn','dbname=regression');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select dblink_exec('myconn','insert into foo values(21,''z'',''{"a0","b0","c0"}'');');
+ dblink_exec
+ ------------------
+ INSERT 6432584 1
+ (1 row)
+
+ select dblink_exec('myconn','insert into pg_class values (''foo'')',false);
+ NOTICE: sql error
+ DETAIL: ERROR: null value in column "relnamespace" violates not-null constraint
+
+ dblink_exec
+ -------------
+ ERROR
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_current_query -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_current_query</refname>
+ <refpurpose>returns the current query string</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_current_query () RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>None</title>
+ <para>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns test -- a copy of the currenty executing query</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_current_query() from (select dblink('dbname=postgres','select oid, proname from pg_proc where proname = ''byteacat''') as f1) as t1;
+ dblink_current_query
+ -----------------------------------------------------------------------------------------------------------------------------------------------------
+ select dblink_current_query() from (select dblink('dbname=postgres','select oid, proname from pg_proc where proname = ''byteacat''') as f1) as t1;
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_get_pkey -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_get_pkey</refname>
+ <refpurpose>returns the position and field names of a relation's
+ primary key fields
+ </refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_get_pkey(text relname) RETURNS setof dblink_pkey_results
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>relname</title>
+ <para>
+ any relation name;
+ e.g. 'foobar'
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>
+ Returns setof dblink_pkey_results -- one row for each primary key field,
+ in order of position in the key. dblink_pkey_results is defined as follows:
+ CREATE TYPE dblink_pkey_results AS (position int4, colname text);
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select * from dblink_get_pkey('foobar');
+ position | colname
+ ----------+---------
+ 1 | f1
+ 2 | f2
+ 3 | f3
+ 4 | f4
+ 5 | f5
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_build_sql_insert -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_build_sql_insert</refname>
+ <refpurpose>
+ builds an insert statement using a local tuple, replacing the
+ selection key field values with alternate supplied values
+ </refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_build_sql_insert(text relname
+ ,int2vector primary_key_attnums
+ ,int2 num_primary_key_atts
+ ,_text src_pk_att_vals_array
+ ,_text tgt_pk_att_vals_array) RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>relname</title>
+ <para>
+ any relation name;
+ e.g. 'foobar';
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>primary_key_attnums</title>
+ <para>
+ vector of primary key attnums (1 based, see pg_index.indkey);
+ e.g. '1 2'
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>num_primary_key_atts</title>
+ <para>
+ number of primary key attnums in the vector; e.g. 2
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>src_pk_att_vals_array</title>
+ <para>
+ array of primary key values, used to look up the local matching
+ tuple, the values of which are then used to construct the SQL
+ statement
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>tgt_pk_att_vals_array</title>
+ <para>
+ array of primary key values, used to replace the local tuple
+ values in the SQL statement
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns text -- requested SQL statement</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_build_sql_insert('foo','1 2',2,'{"1", "a"}','{"1", "b''a"}');
+ dblink_build_sql_insert
+ --------------------------------------------------
+ INSERT INTO foo(f1,f2,f3) VALUES('1','b''a','1')
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_build_sql_delete -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_build_sql_delete</refname>
+ <refpurpose>builds a delete statement using supplied values for selection
+ key field values
+ </refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_build_sql_delete(text relname
+ ,int2vector primary_key_attnums
+ ,int2 num_primary_key_atts
+ ,_text tgt_pk_att_vals_array) RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>relname</title>
+ <para>
+ any relation name;
+ e.g. 'foobar';
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>primary_key_attnums</title>
+ <para>
+ vector of primary key attnums (1 based, see pg_index.indkey);
+ e.g. '1 2'
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>num_primary_key_atts</title>
+ <para>
+ number of primary key attnums in the vector; e.g. 2
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>src_pk_att_vals_array</title>
+ <para>
+ array of primary key values, used to look up the local matching
+ tuple, the values of which are then used to construct the SQL
+ statement
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>tgt_pk_att_vals_array</title>
+ <para>
+ array of primary key values, used to replace the local tuple
+ values in the SQL statement
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns text -- requested SQL statement</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_build_sql_delete('MyFoo','1 2',2,'{"1", "b"}');
+ dblink_build_sql_delete
+ ---------------------------------------------
+ DELETE FROM "MyFoo" WHERE f1='1' AND f2='b'
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_build_sql_update -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_build_sql_update</refname>
+ <refpurpose>builds an update statement using a local tuple, replacing
+ the selection key field values with alternate supplied values
+ </refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_build_sql_update(text relname
+ ,int2vector primary_key_attnums
+ ,int2 num_primary_key_atts
+ ,_text src_pk_att_vals_array
+ ,_text tgt_pk_att_vals_array) RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>relname</title>
+ <para>
+ any relation name;
+ e.g. 'foobar';
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>primary_key_attnums</title>
+ <para>
+ vector of primary key attnums (1 based, see pg_index.indkey);
+ e.g. '1 2'
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>num_primary_key_atts</title>
+ <para>
+ number of primary key attnums in the vector; e.g. 2
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>src_pk_att_vals_array</title>
+ <para>
+ array of primary key values, used to look up the local matching
+ tuple, the values of which are then used to construct the SQL
+ statement
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>tgt_pk_att_vals_array</title>
+ <para>
+ array of primary key values, used to replace the local tuple
+ values in the SQL statement
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns text -- requested SQL statement</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ test=# select dblink_build_sql_update('foo','1 2',2,'{"1", "a"}','{"1", "b"}');
+ dblink_build_sql_update
+ -------------------------------------------------------------
+ UPDATE foo SET f1='1',f2='b',f3='1' WHERE f1='1' AND f2='b'
+ (1 row)
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_get_connections -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_get_connections</refname>
+ <refpurpose>returns a text array of all active named dblink connections</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_get_connections() RETURNS text[]
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>none</title>
+ <para></para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns text array of all active named dblink connections</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ SELECT dblink_get_connections();
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblinkd_is_busy -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_is_busy</refname>
+ <refpurpose>checks to see if named connection is busy with an async query</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_is_busy(text connname) RETURNS int
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ The specific connection name to use
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>
+ Returns 1 if connection is busy, 0 if it is not busy.
+ If this function returns 0, it is guaranteed that dblink_get_result
+ will not block.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ SELECT dblink_is_busy('dtest1');
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_cancel_query -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_cancel_query</refname>
+ <refpurpose>cancels any active query on the named connection</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_cancel_query(text connname) RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ The specific connection name to use.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>
+ Returns "OK" on success, or an error message on failure.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ SELECT dblink_cancel_query('dtest1');
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_error_message -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_error_message</refname>
+ <refpurpose>gets last error message on the named connection</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_error_message(text connname) RETURNS text
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ The specific connection name to use.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>
+ Returns last error message.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ SELECT dblink_error_message('dtest1');
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink</refname>
+ <refpurpose>returns a set from a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink(text connstr, text sql [, bool fail_on_error])
+ dblink(text connname, text sql [, bool fail_on_error])
+ dblink(text sql [, bool fail_on_error])
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname/connstr</title>
+ <para>
+ If two arguments are present, the first is first assumed to be a specific
+ connection name to use. If the name is not found, the argument is then
+ assumed to be a valid connection string, of standard libpq format,
+ e.g.: "hostaddr=127.0.0.1 dbname=mydb user=postgres password=mypasswd"
+
+ If only one argument is used, then the unnamed connection is used.
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>sql</title>
+ <para>
+ sql statement that you wish to execute on the remote host
+ e.g. "select * from pg_class"
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>fail_on_error</title>
+ <para>
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and no rows are returned.
+ </para>
+ </refsect2>
+ <refsect2>
+ <title></title>
+ <para>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns setof record</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ select * from dblink('dbname=postgres','select proname, prosrc from pg_proc')
+ as t1(proname name, prosrc text) where proname like 'bytea%';
+ proname | prosrc
+ ------------+------------
+ byteacat | byteacat
+ byteaeq | byteaeq
+ bytealt | bytealt
+ byteale | byteale
+ byteagt | byteagt
+ byteage | byteage
+ byteane | byteane
+ byteacmp | byteacmp
+ bytealike | bytealike
+ byteanlike | byteanlike
+ byteain | byteain
+ byteaout | byteaout
+ (12 rows)
+
+ select dblink_connect('dbname=postgres');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select * from dblink('select proname, prosrc from pg_proc')
+ as t1(proname name, prosrc text) where proname like 'bytea%';
+ proname | prosrc
+ ------------+------------
+ byteacat | byteacat
+ byteaeq | byteaeq
+ bytealt | bytealt
+ byteale | byteale
+ byteagt | byteagt
+ byteage | byteage
+ byteane | byteane
+ byteacmp | byteacmp
+ bytealike | bytealike
+ byteanlike | byteanlike
+ byteain | byteain
+ byteaout | byteaout
+ (12 rows)
+
+ select dblink_connect('myconn','dbname=regression');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ select * from dblink('myconn','select proname, prosrc from pg_proc')
+ as t1(proname name, prosrc text) where proname like 'bytea%';
+ proname | prosrc
+ ------------+------------
+ bytearecv | bytearecv
+ byteasend | byteasend
+ byteale | byteale
+ byteagt | byteagt
+ byteage | byteage
+ byteane | byteane
+ byteacmp | byteacmp
+ bytealike | bytealike
+ byteanlike | byteanlike
+ byteacat | byteacat
+ byteaeq | byteaeq
+ bytealt | bytealt
+ byteain | byteain
+ byteaout | byteaout
+ (14 rows)
+ </programlisting>
+ <para>
+ A more convenient way to use dblink may be to create a view:
+ </para>
+ <programlisting>
+ create view myremote_pg_proc as
+ select *
+ from dblink('dbname=postgres','select proname, prosrc from pg_proc')
+ as t1(proname name, prosrc text);
+ </programlisting>
+ <para>
+ Then you can simply write:
+ </para>
+ <programlisting>
+ select * from myremote_pg_proc where proname like 'bytea%';
+ </programlisting>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_send_query -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_send_query</refname>
+ <refpurpose>sends an async query to a remote database</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_send_query(text connname, text sql)
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ The specific connection name to use.
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>sql</title>
+ <para>
+ sql statement that you wish to execute on the remote host
+ e.g. "select * from pg_class"
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>
+ Returns int. A return value of 1 if the query was successfully dispatched,
+ 0 otherwise. If 1, results must be fetched by dblink_get_result(connname).
+ A running query may be cancelled by dblink_cancel_query(connname).
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <para>
+ <literal>
+ SELECT dblink_connect('dtest1', 'dbname=contrib_regression');
+ SELECT * FROM
+ dblink_send_query('dtest1', 'SELECT * FROM foo WHERE f1 < 3') AS t1;
+ </literal>
+ </para>
+ </refsect1>
+ </refentry>
+
+ <!-- dblink_get_result -->
+ <refentry>
+ <refnamediv>
+ <refname>dblink_get_result</refname>
+ <refpurpose>gets an async query result</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <synopsis>
+ dblink_get_result(text connname [, bool fail_on_error])
+ </synopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Inputs</title>
+
+ <refsect2>
+ <title>connname</title>
+ <para>
+ The specific connection name to use. An asynchronous query must
+ have already been sent using dblink_send_query()
+ </para>
+ </refsect2>
+ <refsect2>
+ <title>fail_on_error</title>
+ <para>
+ If true (default when not present) then an ERROR thrown on the remote side
+ of the connection causes an ERROR to also be thrown locally. If false, the
+ remote ERROR is locally treated as a NOTICE, and no rows are returned.
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Outputs</title>
+ <para>Returns setof record</para>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+ <para>
+ Blocks until a result gets available.
+
+ This function *must* be called if dblink_send_query returned
+ a 1, even on cancelled queries - otherwise the connection
+ can't be used anymore. It must be called once for each query
+ sent, and one additional time to obtain an empty set result,
+ prior to using the connection again.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Example</title>
+ <programlisting>
+ contrib_regression=# SELECT dblink_connect('dtest1', 'dbname=contrib_regression');
+ dblink_connect
+ ----------------
+ OK
+ (1 row)
+
+ contrib_regression=# SELECT * from
+ contrib_regression-# dblink_send_query('dtest1', 'select * from foo where f1 < 3') as t1;
+ t1
+ ----
+ 1
+ (1 row)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ (3 rows)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+----
+ (0 rows)
+
+ contrib_regression=# SELECT * from
+ dblink_send_query('dtest1', 'select * from foo where f1 < 3; select * from foo where f1 > 6') as t1;
+ t1
+ ----
+ 1
+ (1 row)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+------------
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ (3 rows)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+---------------
+ 7 | h | {a7,b7,c7}
+ 8 | i | {a8,b8,c8}
+ 9 | j | {a9,b9,c9}
+ 10 | k | {a10,b10,c10}
+ (4 rows)
+
+ contrib_regression=# SELECT * from dblink_get_result('dtest1') as t1(f1 int, f2 text, f3 text[]);
+ f1 | f2 | f3
+ ----+----+----
+ (0 rows)
+ </programlisting>
+ </refsect1>
+ </refentry>
+</sect1>
diff --git a/doc/src/sgml/earthdistance.sgml b/doc/src/sgml/earthdistance.sgml
new file mode 100644
index 00000000000..2d08bb829d6
--- /dev/null
+++ b/doc/src/sgml/earthdistance.sgml
@@ -0,0 +1,133 @@
+<sect1 id="earthdistance">
+ <title>earthdistance</title>
+
+ <indexterm zone="earthdistance">
+ <primary>earthdistance</primary>
+ </indexterm>
+
+ <para>
+ This module contains two different approaches to calculating
+ great circle distances on the surface of the Earth. The one described
+ first depends on the contrib/cube package (which MUST be installed before
+ earthdistance is installed). The second one is based on the point
+ datatype using latitude and longitude for the coordinates. The install
+ script makes the defined functions executable by anyone.
+ </para>
+ <para>
+ A spherical model of the Earth is used.
+ </para>
+ <para>
+ Data is stored in cubes that are points (both corners are the same) using 3
+ coordinates representing the distance from the center of the Earth.
+ </para>
+ <para>
+ The radius of the Earth is obtained from the earth() function. It is
+ given in meters. But by changing this one function you can change it
+ to use some other units or to use a different value of the radius
+ that you feel is more appropiate.
+ </para>
+ <para>
+ This package also has applications to astronomical databases as well.
+ Astronomers will probably want to change earth() to return a radius of
+ 180/pi() so that distances are in degrees.
+ </para>
+ <para>
+ Functions are provided to allow for input in latitude and longitude (in
+ degrees), to allow for output of latitude and longitude, to calculate
+ the great circle distance between two points and to easily specify a
+ bounding box usable for index searches.
+ </para>
+ <para>
+ The functions are all 'sql' functions. If you want to make these functions
+ executable by other people you will also have to make the referenced
+ cube functions executable. cube(text), cube(float8), cube(cube,float8),
+ cube_distance(cube,cube), cube_ll_coord(cube,int) and
+ cube_enlarge(cube,float8,int) are used indirectly by the earth distance
+ functions. is_point(cube) and cube_dim(cube) are used in constraints for data
+ in domain earth. cube_ur_coord(cube,int) is used in the regression tests and
+ might be useful for looking at bounding box coordinates in user applications.
+ </para>
+ <para>
+ A domain of type cube named earth is defined.
+ There are constraints on it defined to make sure the cube is a point,
+ that it does not have more than 3 dimensions and that it is very near
+ the surface of a sphere centered about the origin with the radius of
+ the Earth.
+ </para>
+ <para>
+ The following functions are provided:
+ </para>
+
+ <table id="earthdistance-functions">
+ <title>EarthDistance functions</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>earth()</literal></entry>
+ <entry>returns the radius of the Earth in meters.</entry>
+ </row>
+ <row>
+ <entry><literal>sec_to_gc(float8)</literal></entry>
+ <entry>converts the normal straight line
+ (secant) distance between between two points on the surface of the Earth
+ to the great circle distance between them.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>gc_to_sec(float8)</literal></entry>
+ <entry>Converts the great circle distance
+ between two points on the surface of the Earth to the normal straight line
+ (secant) distance between them.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>ll_to_earth(float8, float8)</literal></entry>
+ <entry>Returns the location of a point on the surface of the Earth given
+ its latitude (argument 1) and longitude (argument 2) in degrees.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>latitude(earth)</literal></entry>
+ <entry>Returns the latitude in degrees of a point on the surface of the
+ Earth.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>longitude(earth)</literal></entry>
+ <entry>Returns the longitude in degrees of a point on the surface of the
+ Earth.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>earth_distance(earth, earth)</literal></entry>
+ <entry>Returns the great circle distance between two points on the
+ surface of the Earth.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>earth_box(earth, float8)</literal></entry>
+ <entry>Returns a box suitable for an indexed search using the cube @>
+ operator for points within a given great circle distance of a location.
+ Some points in this box are further than the specified great circle
+ distance from the location so a second check using earth_distance
+ should be made at the same time.
+ </entry>
+ </row>
+ <row>
+ <entry><literal><@></literal> operator</entry>
+ <entry>gives the distance in statute miles between
+ two points on the Earth's surface. Coordinates are in degrees. Points are
+ taken as (longitude, latitude) and not vice versa as longitude is closer
+ to the intuitive idea of x-axis and latitude to y-axis.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ One advantage of using cube representation over a point using latitude and
+ longitude for coordinates, is that you don't have to worry about special
+ conditions at +/- 180 degrees of longitude or near the poles.
+ </para>
+</sect1>
+
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 5502bb42086..a1a8d048ed3 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.51 2007/11/01 17:00:18 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.52 2007/11/10 23:30:46 momjian Exp $ -->
<!entity history SYSTEM "history.sgml">
<!entity info SYSTEM "info.sgml">
@@ -89,6 +89,38 @@
<!entity sources SYSTEM "sources.sgml">
<!entity storage SYSTEM "storage.sgml">
+<!-- contrib information -->
+<!entity contrib SYSTEM "contrib.sgml">
+<!entity adminpack SYSTEM "adminpack.sgml">
+<!entity btree-gist SYSTEM "btree-gist.sgml">
+<!entity chkpass SYSTEM "chkpass.sgml">
+<!entity cube SYSTEM "cube.sgml">
+<!entity dblink SYSTEM "dblink.sgml">
+<!entity earthdistance SYSTEM "earthdistance.sgml">
+<!entity fuzzystrmatch SYSTEM "fuzzystrmatch.sgml">
+<!entity hstore SYSTEM "hstore.sgml">
+<!entity intagg SYSTEM "intagg.sgml">
+<!entity intarray SYSTEM "intarray.sgml">
+<!entity isn SYSTEM "isn.sgml">
+<!entity lo SYSTEM "lo.sgml">
+<!entity ltree SYSTEM "ltree.sgml">
+<!entity oid2name SYSTEM "oid2name.sgml">
+<!entity pageinspect SYSTEM "pageinspect.sgml">
+<!entity pgbench SYSTEM "pgbench.sgml">
+<!entity buffercache SYSTEM "buffercache.sgml">
+<!entity pgcrypto SYSTEM "pgcrypto.sgml">
+<!entity freespacemap SYSTEM "freespacemap.sgml">
+<!entity pgrowlocks SYSTEM "pgrowlocks.sgml">
+<!entity standby SYSTEM "standby.sgml">
+<!entity pgstattuple SYSTEM "pgstattuple.sgml">
+<!entity trgm SYSTEM "trgm.sgml">
+<!entity seg SYSTEM "seg.sgml">
+<!entity sslinfo SYSTEM "sslinfo.sgml">
+<!entity tablefunc SYSTEM "tablefunc.sgml">
+<!entity uuid-ossp SYSTEM "uuid-ossp.sgml">
+<!entity vacuumlo SYSTEM "vacuumlo.sgml">
+<!entity xml2 SYSTEM "xml2.sgml">
+
<!-- appendixes -->
<!entity contacts SYSTEM "contacts.sgml">
<!entity cvs SYSTEM "cvs.sgml">
diff --git a/doc/src/sgml/freespacemap.sgml b/doc/src/sgml/freespacemap.sgml
new file mode 100644
index 00000000000..70b27415524
--- /dev/null
+++ b/doc/src/sgml/freespacemap.sgml
@@ -0,0 +1,243 @@
+<sect1 id="pgfreespacemap">
+ <title>pgfreespacemap</title>
+
+ <indexterm zone="pgfreespacemap">
+ <primary>pgfreespacemap</primary>
+ </indexterm>
+
+ <para>
+ This modules provides the means for examining the free space map (FSM). It
+ consists of two C functions: <literal>pg_freespacemap_relations()</literal>
+ and <literal>pg_freespacemap_pages()</literal> that return a set
+ of records, plus two views <literal>pg_freespacemap_relations</literal> and
+ <literal>pg_freespacemap_pages</literal> for more user-friendly access to
+ the functions.
+ </para>
+ <para>
+ The module provides the ability to examine the contents of the free space
+ map, without having to restart or rebuild the server with additional
+ debugging code.
+ </para>
+ <para>
+ By default public access is REVOKED from the functions and views, just in
+ case there are security issues present in the code.
+ </para>
+
+ <sect2>
+ <title>Notes</title>
+ <para>
+ The definitions for the columns exposed in the views are:
+ </para>
+
+ <table>
+ <title>pg_freespacemap_relations</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry>references</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>reltablespace</entry>
+ <entry>pg_tablespace.oid</entry>
+ <entry>Tablespace oid of the relation.</entry>
+ </row>
+ <row>
+ <entry>reldatabase</entry>
+ <entry>pg_database.oid</entry>
+ <entry>Database oid of the relation.</entry>
+ </row>
+ <row>
+ <entry>relfilenode</entry>
+ <entry>pg_class.relfilenode</entry>
+ <entry>Relfilenode of the relation.</entry>
+ </row>
+ <row>
+ <entry>avgrequest</entry>
+ <entry></entry>
+ <entry>Moving average of free space requests (NULL for indexes)</entry>
+ </row>
+ <row>
+ <entry>interestingpages</entry>
+ <entry></entry>
+ <entry>Count of pages last reported as containing useful free space.</entry>
+ </row>
+ <row>
+ <entry>storedpages</entry>
+ <entry></entry>
+ <entry>Count of pages actually stored in free space map.</entry>
+ </row>
+ <row>
+ <entry>nextpage</entry>
+ <entry></entry>
+ <entry>Page index (from 0) to start next search at.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table>
+ <title>pg_freespacemap_pages</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry> references</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>reltablespace</entry>
+ <entry>pg_tablespace.oid</entry>
+ <entry>Tablespace oid of the relation.</entry>
+ </row>
+ <row>
+ <entry>reldatabase</entry>
+ <entry>pg_database.oid</entry>
+ <entry>Database oid of the relation.</entry>
+ </row>
+ <row>
+ <entry>relfilenode</entry>
+ <entry>pg_class.relfilenode</entry>
+ <entry>Relfilenode of the relation.</entry>
+ </row>
+ <row>
+ <entry>relblocknumber</entry>
+ <entry></entry>
+ <entry>Page number in the relation.</entry>
+ </row>
+ <row>
+ <entry>bytes</entry>
+ <entry></entry>
+ <entry>Free bytes in the page, or NULL for an index page (see below).</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ For <literal>pg_freespacemap_relations</literal>, there is one row for each
+ relation in the free space map. <literal>storedpages</literal> is the
+ number of pages actually stored in the map, while
+ <literal>interestingpages</literal> is the number of pages the last VACUUM
+ thought had useful amounts of free space.
+ </para>
+ <para>
+ If <literal>storedpages</literal> is consistently less than interestingpages
+ then it'd be a good idea to increase <literal>max_fsm_pages</literal>. Also,
+ if the number of rows in <literal>pg_freespacemap_relations</literal> is
+ close to <literal>max_fsm_relations</literal>, then you should consider
+ increasing <literal>max_fsm_relations</literal>.
+ </para>
+ <para>
+ For <literal>pg_freespacemap_pages</literal>, there is one row for each page
+ in the free space map. The number of rows for a relation will match the
+ <literal>storedpages</literal> column in
+ <literal>pg_freespacemap_relations</literal>.
+ </para>
+ <para>
+ For indexes, what is tracked is entirely-unused pages, rather than free
+ space within pages. Therefore, the average request size and free bytes
+ within a page are not meaningful, and are shown as NULL.
+ </para>
+ <para>
+ Because the map is shared by all the databases, it will include relations
+ not belonging to the current database.
+ </para>
+ <para>
+ When either of the views are accessed, internal free space map locks are
+ taken, and a copy of the map data is made for them to display.
+ This ensures that the views produce a consistent set of results, while not
+ blocking normal activity longer than necessary. Nonetheless there
+ could be some impact on database performance if they are read often.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Sample output - pg_freespacemap_relations</title>
+ <programlisting>
+regression=# \d pg_freespacemap_relations
+View "public.pg_freespacemap_relations"
+ Column | Type | Modifiers
+------------------+---------+-----------
+ reltablespace | oid |
+ reldatabase | oid |
+ relfilenode | oid |
+ avgrequest | integer |
+ interestingpages | integer |
+ storedpages | integer |
+ nextpage | integer |
+View definition:
+ SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.avgrequest, p.interestingpages, p.storedpages, p.nextpage
+ FROM pg_freespacemap_relations() p(reltablespace oid, reldatabase oid, relfilenode oid, avgrequest integer, interestingpages integer, storedpages integer, nextpage integer);
+
+regression=# SELECT c.relname, r.avgrequest, r.interestingpages, r.storedpages
+ FROM pg_freespacemap_relations r INNER JOIN pg_class c
+ ON c.relfilenode = r.relfilenode INNER JOIN pg_database d
+ ON r.reldatabase = d.oid AND (d.datname = current_database())
+ ORDER BY r.storedpages DESC LIMIT 10;
+ relname | avgrequest | interestingpages | storedpages
+---------------------------------+------------+------------------+-------------
+ onek | 256 | 109 | 109
+ pg_attribute | 167 | 93 | 93
+ pg_class | 191 | 49 | 49
+ pg_attribute_relid_attnam_index | | 48 | 48
+ onek2 | 256 | 37 | 37
+ pg_depend | 95 | 26 | 26
+ pg_type | 199 | 16 | 16
+ pg_rewrite | 1011 | 13 | 13
+ pg_class_relname_nsp_index | | 10 | 10
+ pg_proc | 302 | 8 | 8
+(10 rows)
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Sample output - pg_freespacemap_pages</title>
+ <programlisting>
+regression=# \d pg_freespacemap_pages
+ View "public.pg_freespacemap_pages"
+ Column | Type | Modifiers
+----------------+---------+-----------
+ reltablespace | oid |
+ reldatabase | oid |
+ relfilenode | oid |
+ relblocknumber | bigint |
+ bytes | integer |
+View definition:
+ SELECT p.reltablespace, p.reldatabase, p.relfilenode, p.relblocknumber, p.bytes
+ FROM pg_freespacemap_pages() p(reltablespace oid, reldatabase oid, relfilenode oid, relblocknumber bigint, bytes integer);
+
+regression=# SELECT c.relname, p.relblocknumber, p.bytes
+ FROM pg_freespacemap_pages p INNER JOIN pg_class c
+ ON c.relfilenode = p.relfilenode INNER JOIN pg_database d
+ ON (p.reldatabase = d.oid AND d.datname = current_database())
+ ORDER BY c.relname LIMIT 10;
+ relname | relblocknumber | bytes
+--------------+----------------+-------
+ a_star | 0 | 8040
+ abstime_tbl | 0 | 7908
+ aggtest | 0 | 8008
+ altinhoid | 0 | 8128
+ altstartwith | 0 | 8128
+ arrtest | 0 | 7172
+ b_star | 0 | 7976
+ box_tbl | 0 | 7912
+ bt_f8_heap | 54 | 7728
+ bt_i4_heap | 49 | 8008
+(10 rows)
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Mark Kirkwood <email>markir@paradise.net.nz</email>
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/fuzzystrmatch.sgml b/doc/src/sgml/fuzzystrmatch.sgml
new file mode 100644
index 00000000000..666e031c0d6
--- /dev/null
+++ b/doc/src/sgml/fuzzystrmatch.sgml
@@ -0,0 +1,122 @@
+
+<sect1 id="fuzzystrmatch">
+ <title>fuzzystrmatch</title>
+
+ <para>
+ This section describes the fuzzystrmatch module which provides different
+ functions to determine similarities and distance between strings.
+ </para>
+
+ <sect2>
+ <title>Soundex</title>
+ <para>
+ The Soundex system is a method of matching similar sounding names
+ (or any words) to the same code. It was initially used by the
+ United States Census in 1880, 1900, and 1910, but it has little use
+ beyond English names (or the English pronunciation of names), and
+ it is not a linguistic tool.
+ </para>
+ <para>
+ When comparing two soundex values to determine similarity, the
+ difference function reports how close the match is on a scale
+ from zero to four, with zero being no match and four being an
+ exact match.
+ </para>
+ <para>
+ The following are some usage examples:
+ </para>
+ <programlisting>
+SELECT soundex('hello world!');
+
+SELECT soundex('Anne'), soundex('Ann'), difference('Anne', 'Ann');
+SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
+SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
+
+CREATE TABLE s (nm text);
+
+INSERT INTO s VALUES ('john');
+INSERT INTO s VALUES ('joan');
+INSERT INTO s VALUES ('wobbly');
+INSERT INTO s VALUES ('jack');
+
+SELECT * FROM s WHERE soundex(nm) = soundex('john');
+
+SELECT a.nm, b.nm FROM s a, s b WHERE soundex(a.nm) = soundex(b.nm) AND a.oid <> b.oid;
+
+CREATE FUNCTION text_sx_eq(text, text) RETURNS boolean AS
+'select soundex($1) = soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_lt(text, text) RETURNS boolean AS
+'select soundex($1) < soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_gt(text, text) RETURNS boolean AS
+'select soundex($1) > soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_le(text, text) RETURNS boolean AS
+'select soundex($1) <= soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_ge(text, text) RETURNS boolean AS
+'select soundex($1) >= soundex($2)'
+LANGUAGE SQL;
+
+CREATE FUNCTION text_sx_ne(text, text) RETURNS boolean AS
+'select soundex($1) <> soundex($2)'
+LANGUAGE SQL;
+
+DROP OPERATOR #= (text, text);
+
+CREATE OPERATOR #= (leftarg=text, rightarg=text, procedure=text_sx_eq, commutator = #=);
+
+SELECT * FROM s WHERE text_sx_eq(nm, 'john');
+
+SELECT * FROM s WHERE s.nm #= 'john';
+
+SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>levenshtein</title>
+ <para>
+ This function calculates the levenshtein distance between two strings:
+ </para>
+ <programlisting>
+ int levenshtein(text source, text target)
+ </programlisting>
+ <para>
+ Both <literal>source</literal> and <literal>target</literal> can be any
+ NOT NULL string with a maximum of 255 characters.
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+ SELECT levenshtein('GUMBO','GAMBOL');
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>metaphone</title>
+ <para>
+ This function calculates and returns the metaphone code of an input string:
+ </para>
+ <programlisting>
+ text metahpone(text source, int max_output_length)
+ </programlisting>
+ <para>
+ <literal>source</literal> has to be a NOT NULL string with a maximum of
+ 255 characters. <literal>max_output_length</literal> fixes the maximum
+ length of the output metaphone code; if longer, the output is truncated
+ to this length.
+ </para>
+ <para>Example</para>
+ <programlisting>
+ SELECT metaphone('GUMBO',4);
+ </programlisting>
+ </sect2>
+
+</sect1>
diff --git a/doc/src/sgml/hstore.sgml b/doc/src/sgml/hstore.sgml
new file mode 100644
index 00000000000..147fc7fba60
--- /dev/null
+++ b/doc/src/sgml/hstore.sgml
@@ -0,0 +1,298 @@
+<sect1 id="hstore">
+ <title>hstore</title>
+
+ <indexterm zone="hstore">
+ <primary>hstore</primary>
+ </indexterm>
+
+ <para>
+ The <literal>hstore</literal> module is usefull for storing (key,value) pairs.
+ This module can be useful in different scenarios: case with many attributes
+ rarely searched, semistructural data or a lazy DBA.
+ </para>
+
+ <sect2>
+ <title>Operations</title>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>hstore -> text</literal> - get value , perl analogy $h{key}
+ </para>
+ <programlisting>
+select 'a=>q, b=>g'->'a';
+ ?
+------
+ q
+ </programlisting>
+ <para>
+ Note the use of parenthesis in the select below, because priority of 'is' is
+ higher than that of '->':
+ </para>
+ <programlisting>
+SELECT id FROM entrants WHERE (info->'education_period') IS NOT NULL;
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>hstore || hstore</literal> - concatenation, perl analogy %a=( %b, %c );
+ </para>
+ <programlisting>
+regression=# select 'a=>b'::hstore || 'c=>d'::hstore;
+ ?column?
+--------------------
+ "a"=>"b", "c"=>"d"
+(1 row)
+ </programlisting>
+
+ <para>
+ but, notice
+ </para>
+
+ <programlisting>
+regression=# select 'a=>b'::hstore || 'a=>d'::hstore;
+ ?column?
+----------
+ "a"=>"d"
+(1 row)
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>text => text</literal> - creates hstore type from two text strings
+ </para>
+ <programlisting>
+select 'a'=>'b';
+ ?column?
+----------
+ "a"=>"b"
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>hstore @> hstore</literal> - contains operation, check if left operand contains right.
+ </para>
+ <programlisting>
+regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'a=>c';
+ ?column?
+----------
+ f
+(1 row)
+
+regression=# select 'a=>b, b=>1, c=>NULL'::hstore @> 'b=>1';
+ ?column?
+----------
+ t
+(1 row)
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>hstore &lt;@ hstore</literal> - contained operation, check if
+ left operand is contained in right
+ </para>
+ <para>
+ (Before PostgreSQL 8.2, the containment operators @&gt; and &lt;@ were
+ respectively called @ and ~. These names are still available, but are
+ deprecated and will eventually be retired. Notice that the old names
+ are reversed from the convention formerly followed by the core geometric
+ datatypes!)
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Functions</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>akeys(hstore)</literal> - returns all keys from hstore as array
+ </para>
+ <programlisting>
+regression=# select akeys('a=>1,b=>2');
+ akeys
+-------
+ {a,b}
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>skeys(hstore)</literal> - returns all keys from hstore as strings
+ </para>
+ <programlisting>
+regression=# select skeys('a=>1,b=>2');
+ skeys
+-------
+ a
+ b
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>avals(hstore)</literal> - returns all values from hstore as array
+ </para>
+ <programlisting>
+regression=# select avals('a=>1,b=>2');
+ avals
+-------
+ {1,2}
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>svals(hstore)</literal> - returns all values from hstore as
+ strings
+ </para>
+ <programlisting>
+regression=# select svals('a=>1,b=>2');
+ svals
+-------
+ 1
+ 2
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>delete (hstore,text)</literal> - delete (key,value) from hstore if
+ key matches argument.
+ </para>
+ <programlisting>
+regression=# select delete('a=>1,b=>2','b');
+ delete
+----------
+ "a"=>"1"
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>each(hstore)</literal> - return (key, value) pairs
+ </para>
+ <programlisting>
+regression=# select * from each('a=>1,b=>2');
+ key | value
+-----+-------
+ a | 1
+ b | 2
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>exist (hstore,text)</literal>
+ </para>
+ <para>
+ <literal>hstore ? text</literal> - returns 'true if key is exists in hstore
+ and false otherwise.
+ </para>
+ <programlisting>
+regression=# select exist('a=>1','a'), 'a=>1' ? 'a';
+ exist | ?column?
+-------+----------
+ t | t
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>defined (hstore,text)</literal> - returns true if key is exists in
+ hstore and its value is not NULL.
+ </para>
+ <programlisting>
+regression=# select defined('a=>NULL','a');
+ defined
+---------
+ f
+ </programlisting>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Indices</title>
+ <para>
+ Module provides index support for '@>' and '?' operations.
+ </para>
+ <programlisting>
+CREATE INDEX hidx ON testhstore USING GIST(h);
+CREATE INDEX hidx ON testhstore USING GIN(h);
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+
+ <para>
+ Add a key:
+ </para>
+ <programlisting>
+UPDATE tt SET h=h||'c=>3';
+ </programlisting>
+ <para>
+ Delete a key:
+ </para>
+ <programlisting>
+UPDATE tt SET h=delete(h,'k1');
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Statistics</title>
+ <para>
+hstore type, because of its intrinsic liberality, could contain a lot of
+different keys. Checking for valid keys is the task of application.
+Examples below demonstrate several techniques how to check keys statistics.
+ </para>
+
+ <para>
+ Simple example
+ </para>
+ <programlisting>
+SELECT * FROM each('aaa=>bq, b=>NULL, ""=>1 ');
+ </programlisting>
+
+ <para>
+ Using table
+ </para>
+ <programlisting>
+SELECT (each(h)).key, (each(h)).value INTO stat FROM testhstore ;
+ </programlisting>
+
+ <para>Online stat</para>
+ <programlisting>
+SELECT key, count(*) FROM (SELECT (each(h)).key FROM testhstore) AS stat GROUP BY key ORDER BY count DESC, key;
+ key | count
+-----------+-------
+ line | 883
+ query | 207
+ pos | 203
+ node | 202
+ space | 197
+ status | 195
+ public | 194
+ title | 190
+ org | 189
+...................
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Authors</title>
+ <para>
+ Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
+ </para>
+ <para>
+ Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/intagg.sgml b/doc/src/sgml/intagg.sgml
new file mode 100644
index 00000000000..3fbd5c3281f
--- /dev/null
+++ b/doc/src/sgml/intagg.sgml
@@ -0,0 +1,82 @@
+
+<sect1 id="intagg">
+ <title>intagg</title>
+
+ <indexterm zone="intagg">
+ <primary>intagg</primary>
+ </indexterm>
+
+ <para>
+ This section describes the <literal>intagg</literal> module which provides an integer aggregator and an enumerator.
+ </para>
+ <para>
+ Many database systems have the notion of a one to many table. Such a table usually sits between two indexed tables, as:
+ </para>
+ <programlisting>
+CREATE TABLE one_to_many(left INT, right INT) ;
+ </programlisting>
+
+ <para>
+ And it is used like this:
+ </para>
+
+ <programlisting>
+ SELECT right.* from right JOIN one_to_many ON (right.id = one_to_many.right)
+ WHERE one_to_many.left = item;
+ </programlisting>
+
+ <para>
+ This will return all the items in the right hand table for an entry
+ in the left hand table. This is a very common construct in SQL.
+ </para>
+
+ <para>
+ Now, this methodology can be cumbersome with a very large number of
+ entries in the one_to_many table. Depending on the order in which
+ data was entered, a join like this could result in an index scan
+ and a fetch for each right hand entry in the table for a particular
+ left hand entry. If you have a very dynamic system, there is not much you
+ can do. However, if you have some data which is fairly static, you can
+ create a summary table with the aggregator.
+ </para>
+
+ <programlisting>
+CREATE TABLE summary as SELECT left, int_array_aggregate(right)
+ AS right FROM one_to_many GROUP BY left;
+ </programlisting>
+
+ <para>
+ This will create a table with one row per left item, and an array
+ of right items. Now this is pretty useless without some way of using
+ the array, thats why there is an array enumerator.
+ </para>
+ <programlisting>
+SELECT left, int_array_enum(right) FROM summary WHERE left = item;
+ </programlisting>
+
+ <para>
+ The above query using int_array_enum, produces the same results as:
+ </para>
+ <programlisting>
+SELECT left, right FROM one_to_many WHERE left = item;
+ </programlisting>
+
+ <para>
+ The difference is that the query against the summary table has to get
+ only one row from the table, where as the query against "one_to_many"
+ must index scan and fetch a row for each entry.
+ </para>
+ <para>
+ On our system, an EXPLAIN shows a query with a cost of 8488 gets reduced
+ to a cost of 329. The query is a join between the one_to_many table,
+ </para>
+ <programlisting>
+SELECT right, count(right) FROM
+(
+ SELECT left, int_array_enum(right) AS right FROM summary JOIN
+ (SELECT left FROM left_table WHERE left = item) AS lefts
+ ON (summary.left = lefts.left )
+) AS list GROUP BY right ORDER BY count DESC ;
+ </programlisting>
+</sect1>
+
diff --git a/doc/src/sgml/intarray.sgml b/doc/src/sgml/intarray.sgml
new file mode 100644
index 00000000000..7e538a894d5
--- /dev/null
+++ b/doc/src/sgml/intarray.sgml
@@ -0,0 +1,286 @@
+<sect1 id="intarray">
+ <title>intarray</title>
+
+ <indexterm zone="intarray">
+ <primary>intarray</primary>
+ </indexterm>
+
+ <para>
+ This is an implementation of RD-tree data structure using GiST interface
+ of PostgreSQL. It has built-in lossy compression.
+ </para>
+
+ <para>
+ Current implementation provides index support for one-dimensional array of
+ int4's - gist__int_ops, suitable for small and medium size of arrays (used on
+ default), and gist__intbig_ops for indexing large arrays (we use superimposed
+ signature with length of 4096 bits to represent sets).
+ </para>
+
+ <sect2>
+ <title>Functions</title>
+
+ <itemizedlist>
+
+ <listitem>
+ <para>
+ <literal>int icount(int[])</literal> - the number of elements in intarray
+ </para>
+ <programlisting>
+test=# select icount('{1,2,3}'::int[]);
+ icount
+--------
+ 3
+(1 row)
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>int[] sort(int[], 'asc' | 'desc')</literal> - sort intarray
+ </para>
+ <programlisting>
+test=# select sort('{1,2,3}'::int[],'desc');
+ sort
+---------
+ {3,2,1}
+(1 row)
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>int[] sort(int[])</literal> - sort in ascending order
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>int[] sort_asc(int[]),sort_desc(int[])</literal> - shortcuts for sort
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>int[] uniq(int[])</literal> - returns unique elements
+ </para>
+ <programlisting>
+test=# select uniq(sort('{1,2,3,2,1}'::int[]));
+ uniq
+---------
+ {1,2,3}
+(1 row)
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>int idx(int[], int item)</literal> - returns index of first
+ intarray matching element to item, or '0' if matching failed.
+ </para>
+ <programlisting>
+test=# select idx('{1,2,3,2,1}'::int[],2);
+ idx
+-----
+ 2
+(1 row)
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>int[] subarray(int[],int START [, int LEN])</literal> - returns
+ part of intarray starting from element number START (from 1) and length LEN.
+ </para>
+ <programlisting>
+test=# select subarray('{1,2,3,2,1}'::int[],2,3);
+ subarray
+----------
+ {2,3,2}
+(1 row)
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>int[] intset(int4)</literal> - casting int4 to int[]
+ </para>
+ <programlisting>
+test=# select intset(1);
+ intset
+--------
+ {1}
+(1 row)
+ </programlisting>
+ </listitem>
+
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Operations</title>
+ <table>
+ <title>Operations</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Operator</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>int[] && int[]</literal></entry>
+ <entry>overlap - returns TRUE if arrays have at least one common element</entry>
+ </row>
+ <row>
+ <entry><literal>int[] @> int[]</literal></entry>
+ <entry>contains - returns TRUE if left array contains right array</entry>
+ </row>
+ <row>
+ <entry><literal>int[] <@ int[]</literal></entry>
+ <entry>contained - returns TRUE if left array is contained in right array</entry>
+ </row>
+ <row>
+ <entry><literal># int[]</literal></entry>
+ <entry>returns the number of elements in array</entry>
+ </row>
+ <row>
+ <entry><literal>int[] + int</literal></entry>
+ <entry>push element to array ( add to end of array)</entry>
+ </row>
+ <row>
+ <entry><literal>int[] + int[] </literal></entry>
+ <entry>merge of arrays (right array added to the end of left one)</entry>
+ </row>
+ <row>
+ <entry><literal>int[] - int</literal></entry>
+ <entry>remove entries matched by right argument from array</entry>
+ </row>
+ <row>
+ <entry><literal>int[] - int[]</literal></entry>
+ <entry>remove right array from left</entry>
+ </row>
+ <row>
+ <entry><literal>int[] | int</literal></entry>
+ <entry>returns intarray - union of arguments</entry>
+ </row>
+ <row>
+ <entry><literal>int[] | int[]</literal></entry>
+ <entry>returns intarray as a union of two arrays</entry>
+ </row>
+
+ <row>
+ <entry><literal>int[] & int[]</literal></entry>
+ <entry>returns intersection of arrays</entry>
+ </row>
+
+ <row>
+ <entry><literal>int[] @@ query_int</literal></entry>
+ <entry>
+ returns TRUE if array satisfies query (like
+ <literal>'1&amp;(2|3)'</literal>)
+ </entry>
+ </row>
+
+ <row>
+ <entry><literal>query_int ~~ int[]</literal></entry>
+ <entry>returns TRUE if array satisfies query (commutator of @@)</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ (Before PostgreSQL 8.2, the containment operators @> and <@ were
+ respectively called @ and ~. These names are still available, but are
+ deprecated and will eventually be retired. Notice that the old names
+ are reversed from the convention formerly followed by the core geometric
+ datatypes!)
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example</title>
+
+ <programlisting>
+CREATE TABLE message (mid INT NOT NULL,sections INT[]);
+CREATE TABLE message_section_map (mid INT NOT NULL,sid INT NOT NULL);
+
+-- create indices
+CREATE unique index message_key ON message ( mid );
+CREATE unique index message_section_map_key2 ON message_section_map (sid, mid );
+CREATE INDEX message_rdtree_idx ON message USING GIST ( sections gist__int_ops);
+
+-- select some messages with section in 1 OR 2 - OVERLAP operator
+SELECT message.mid FROM message WHERE message.sections && '{1,2}';
+
+-- select messages contains in sections 1 AND 2 - CONTAINS operator
+SELECT message.mid FROM message WHERE message.sections @> '{1,2}';
+-- the same, CONTAINED operator
+SELECT message.mid FROM message WHERE '{1,2}' <@ message.sections;
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Benchmark</title>
+ <para>
+ subdirectory bench contains benchmark suite.
+ </para>
+ <programlisting>
+ cd ./bench
+ 1. createdb TEST
+ 2. psql TEST < ../_int.sql
+ 3. ./create_test.pl | psql TEST
+ 4. ./bench.pl - perl script to benchmark queries, supports OR, AND queries
+ with/without RD-Tree. Run script without arguments to
+ see availbale options.
+
+ a)test without RD-Tree (OR)
+ ./bench.pl -d TEST -c -s 1,2 -v
+ b)test with RD-Tree
+ ./bench.pl -d TEST -c -s 1,2 -v -r
+
+ BENCHMARKS:
+
+ Size of table &lt;message>: 200000
+ Size of table &lt;message_section_map>: 269133
+
+ Distribution of messages by sections:
+
+ section 0: 74377 messages
+ section 1: 16284 messages
+ section 50: 1229 messages
+ section 99: 683 messages
+
+ old - without RD-Tree support,
+ new - with RD-Tree
+
+ +----------+---------------+----------------+
+ |Search set|OR, time in sec|AND, time in sec|
+ | +-------+-------+--------+-------+
+ | | old | new | old | new |
+ +----------+-------+-------+--------+-------+
+ | 1| 0.625| 0.101| -| -|
+ +----------+-------+-------+--------+-------+
+ | 99| 0.018| 0.017| -| -|
+ +----------+-------+-------+--------+-------+
+ | 1,2| 0.766| 0.133| 0.628| 0.045|
+ +----------+-------+-------+--------+-------+
+ | 1,2,50,65| 0.794| 0.141| 0.030| 0.006|
+ +----------+-------+-------+--------+-------+
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Authors</title>
+ <para>
+ All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and Oleg
+ Bartunov (<email>oleg@sai.msu.su</email>). See
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
+ additional information. Andrey Oktyabrski did a great work on adding new
+ functions and operations.
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/isn.sgml b/doc/src/sgml/isn.sgml
new file mode 100644
index 00000000000..c6fef47f084
--- /dev/null
+++ b/doc/src/sgml/isn.sgml
@@ -0,0 +1,502 @@
+<sect1 id="isn">
+ <title>isn</title>
+
+ <indexterm zone="isn">
+ <primary>isn</primary>
+ </indexterm>
+
+ <para>
+ The <literal>isn</literal> module adds data types for the following
+ international-standard namespaces: EAN13, UPC, ISBN (books), ISMN (music),
+ and ISSN (serials). This module is inspired by Garrett A. Wollman's
+ isbn_issn code.
+ </para>
+ <para>
+ This module validates, and automatically adds the correct
+ hyphenations to the numbers. Also, it supports the new ISBN-13
+ numbers to be used starting in January 2007.
+ </para>
+
+ <para>
+ Premises:
+ </para>
+
+ <orderedlist>
+ <listitem>
+ <para>ISBN13, ISMN13, ISSN13 numbers are all EAN13 numbers</para>
+ </listitem>
+ <listitem>
+ <para>EAN13 numbers aren't always ISBN13, ISMN13 or ISSN13 (some are)</para>
+ </listitem>
+ <listitem>
+ <para>some ISBN13 numbers can be displayed as ISBN</para>
+ </listitem>
+ <listitem>
+ <para>some ISMN13 numbers can be displayed as ISMN</para>
+ </listitem>
+ <listitem>
+ <para>some ISSN13 numbers can be displayed as ISSN</para>
+ </listitem>
+ <listitem>
+ <para>all UPC, ISBN, ISMN and ISSN can be represented as EAN13 numbers</para>
+ </listitem>
+ </orderedlist>
+
+ <note>
+ <para>
+ All types are internally represented as 64 bit integers,
+ and internally all are consistently interchangeable.
+ </para>
+ </note>
+ <note>
+ <para>
+ We have two operator classes (for btree and for hash) so each data type
+ can be indexed for faster access.
+ </para>
+ </note>
+
+ <sect2>
+ <title>Data types</title>
+
+ <para>
+ We have the following data types:
+ </para>
+
+ <table>
+ <title>Data types</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry><para>Data type</para></entry>
+ <entry><para>Description</para></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><para><literal>EAN13</literal></para></entry>
+ <entry>
+ <para>
+ European Article Numbers. This type will always show the EAN13-display
+ format. Te output function for this is <literal>ean13_out()</literal>
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry><para><literal>ISBN13</literal></para></entry>
+ <entry>
+ <para>
+ For International Standard Book Numbers to be displayed in
+ the new EAN13-display format.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry><para><literal>ISMN13</literal></para></entry>
+ <entry>
+ <para>
+ For International Standard Music Numbers to be displayed in
+ the new EAN13-display format.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><para><literal>ISSN13</literal></para></entry>
+ <entry>
+ <para>
+ For International Standard Serial Numbers to be displayed in the new
+ EAN13-display format.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><para><literal>ISBN</literal></para></entry>
+ <entry>
+ <para>
+ For International Standard Book Numbers to be displayed in the current
+ short-display format.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><para><literal>ISMN</literal></para></entry>
+ <entry>
+ <para>
+ For International Standard Music Numbers to be displayed in the
+ current short-display format.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><para><literal>ISSN</literal></para></entry>
+ <entry>
+ <para>
+ For International Standard Serial Numbers to be displayed in the
+ current short-display format. These types will display the short
+ version of the ISxN (ISxN 10) whenever it's possible, and it will
+ show ISxN 13 when it's impossible to show the short version. The
+ output function to do this is <literal>isn_out()</literal>
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><para><literal>UPC</literal></para></entry>
+ <entry>
+ <para>
+ For Universal Product Codes. UPC numbers are a subset of the EAN13
+ numbers (they are basically EAN13 without the first '0' digit.)
+ The output function to do this is also <literal>isn_out()</literal>
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <note>
+ <para>
+ <literal>EAN13</literal>, <literal>ISBN13</literal>,
+ <literal>ISMN13</literal> and <literal>ISSN13</literal> types will always
+ display the long version of the ISxN (EAN13). The output function to do
+ this is <literal>ean13_out()</literal>.
+ </para>
+ <para>
+ The need for these types is just for displaying in different ways the same
+ data: <literal>ISBN13</literal> is actually the same as
+ <literal>ISBN</literal>, <literal>ISMN13=ISMN</literal> and
+ <literal>ISSN13=ISSN</literal>.
+ </para>
+ </note>
+ </sect2>
+
+ <sect2>
+ <title>Input functions</title>
+
+ <para>
+ We have the following input functions:
+ </para>
+
+ <table>
+ <title>Input functions</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Function</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><para><literal>ean13_in()</literal></para></entry>
+ <entry>
+ <para>
+ To take a string and return an EAN13.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry><para><literal>isbn_in()</literal></para></entry>
+ <entry>
+ <para>
+ To take a string and return valid ISBN or ISBN13 numbers.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry><para><literal>ismn_in()</literal></para></entry>
+ <entry>
+ <para>
+ To take a string and return valid ISMN or ISMN13 numbers.
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry><para><literal>issn_in()</literal></para></entry>
+ <entry>
+ <para>
+ To take a string and return valid ISSN or ISSN13 numbers.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><para><literal>upc_in()</literal></para></entry>
+ <entry>
+ <para>
+ To take a string and return an UPC codes.
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Casts</title>
+
+ <para>
+ We are able to cast from:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ ISBN13 -> EAN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISMN13 -> EAN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISSN13 -> EAN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISBN -> EAN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISMN -> EAN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISSN -> EAN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ UPC -> EAN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISBN <-> ISBN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISMN <-> ISMN13
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ISSN <-> ISSN13
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>C API</title>
+ <para>
+ The C API is implemented as:
+ </para>
+ <programlisting>
+ extern Datum isn_out(PG_FUNCTION_ARGS);
+ extern Datum ean13_out(PG_FUNCTION_ARGS);
+ extern Datum ean13_in(PG_FUNCTION_ARGS);
+ extern Datum isbn_in(PG_FUNCTION_ARGS);
+ extern Datum ismn_in(PG_FUNCTION_ARGS);
+ extern Datum issn_in(PG_FUNCTION_ARGS);
+ extern Datum upc_in(PG_FUNCTION_ARGS);
+ </programlisting>
+
+ <para>
+ On success:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>isn_out()</literal> takes any of our types and returns a string containing
+ the shortes possible representation of the number.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ean13_out()</literal> takes any of our types and returns the
+ EAN13 (long) representation of the number.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ean13_in()</literal> takes a string and return a EAN13. Which, as stated in (2)
+ could or could not be any of our types, but it certainly is an EAN13
+ number. Only if the string is a valid EAN13 number, otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>isbn_in()</literal> takes a string and return an ISBN/ISBN13. Only if the string
+ is really a ISBN/ISBN13, otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ismn_in()</literal> takes a string and return an ISMN/ISMN13. Only if the string
+ is really a ISMN/ISMN13, otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>issn_in()</literal> takes a string and return an ISSN/ISSN13. Only if the string
+ is really a ISSN/ISSN13, otherwise it fails.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>upc_in()</literal> takes a string and return an UPC. Only if the string is
+ really a UPC, otherwise it fails.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ (on failure, the functions 'ereport' the error)
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Testing functions</title>
+ <table>
+ <title>Testing functions</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry><para>Function</para></entry>
+ <entry><para>Description</para></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><para><literal>isn_weak(boolean)</literal></para></entry>
+ <entry><para>Sets the weak input mode.</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>isn_weak()</literal></para></entry>
+ <entry><para>Gets the current status of the weak mode.</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>make_valid()</literal></para></entry>
+ <entry><para>Validates an invalid number (deleting the invalid flag).</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>is_valid()</literal></para></entry>
+ <entry><para>Checks for the invalid flag prsence.</para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ <literal>Weak</literal> mode is used to be able to insert invalid data to
+ a table. Invalid as in the check digit being wrong, not missing numbers.
+ </para>
+ <para>
+ Why would you want to use the weak mode? Well, it could be that
+ you have a huge collection of ISBN numbers, and that there are so many of
+ them that for weird reasons some have the wrong check digit (perhaps the
+ numbers where scanned from a printed list and the OCR got the numbers wrong,
+ perhaps the numbers were manually captured... who knows.) Anyway, the thing
+ is you might want to clean the mess up, but you still want to be able to have
+ all the numbers in your database and maybe use an external tool to access
+ the invalid numbers in the database so you can verify the information and
+ validate it more easily; as selecting all the invalid numbers in the table.
+ </para>
+ <para>
+ When you insert invalid numbers in a table using the weak mode, the number
+ will be inserted with the corrected check digit, but it will be flagged
+ with an exclamation mark ('!') at the end (i.e. 0-11-000322-5!)
+ </para>
+ <para>
+ You can also force the insertion of invalid numbers even not in the weak mode,
+ appending the '!' character at the end of the number.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+ <programlisting>
+--Using the types directly:
+SELECT isbn('978-0-393-04002-9');
+SELECT isbn13('0901690546');
+SELECT issn('1436-4522');
+
+--Casting types:
+-- note that you can only cast from ean13 to other type when the casted
+-- number would be valid in the realm of the casted type;
+-- thus, the following will NOT work: select isbn(ean13('0220356483481'));
+-- but these will:
+SELECT upc(ean13('0220356483481'));
+SELECT ean13(upc('220356483481'));
+
+--Create a table with a single column to hold ISBN numbers:
+CREATE TABLE test ( id isbn );
+INSERT INTO test VALUES('9780393040029');
+
+--Automatically calculating check digits (observe the '?'):
+INSERT INTO test VALUES('220500896?');
+INSERT INTO test VALUES('978055215372?');
+
+SELECT issn('3251231?');
+SELECT ismn('979047213542?');
+
+--Using the weak mode:
+SELECT isn_weak(true);
+INSERT INTO test VALUES('978-0-11-000533-4');
+INSERT INTO test VALUES('9780141219307');
+INSERT INTO test VALUES('2-205-00876-X');
+SELECT isn_weak(false);
+
+SELECT id FROM test WHERE NOT is_valid(id);
+UPDATE test SET id=make_valid(id) WHERE id = '2-205-00876-X!';
+
+SELECT * FROM test;
+
+SELECT isbn13(id) FROM test;
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Bibliography</title>
+ <para>
+ The information to implement this module was collected through
+ several sites, including:
+ </para>
+ <programlisting>
+ http://www.isbn-international.org/
+ http://www.issn.org/
+ http://www.ismn-international.org/
+ http://www.wikipedia.org/
+ </programlisting>
+ <para>
+ the prefixes used for hyphenation where also compiled from:
+ </para>
+ <programlisting>
+ http://www.gs1.org/productssolutions/idkeys/support/prefix_list.html
+ http://www.isbn-international.org/en/identifiers.html
+ http://www.ismn-international.org/ranges.html
+ </programlisting>
+ <para>
+ Care was taken during the creation of the algorithms and they
+ were meticulously verified against the suggested algorithms
+ in the official ISBN, ISMN, ISSN User Manuals.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Germ�n M�ndez Bravo (Kronuz), 2004 - 2006
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/lo.sgml b/doc/src/sgml/lo.sgml
new file mode 100644
index 00000000000..2a23a5b5cd0
--- /dev/null
+++ b/doc/src/sgml/lo.sgml
@@ -0,0 +1,118 @@
+
+<sect1 id="lo">
+ <title>lo</title>
+
+ <indexterm zone="lo">
+ <primary>lo</primary>
+ </indexterm>
+
+ <para>
+ PostgreSQL type extension for managing Large Objects
+ </para>
+
+ <sect2>
+ <title>Overview</title>
+ <para>
+ One of the problems with the JDBC driver (and this affects the ODBC driver
+ also), is that the specification assumes that references to BLOBS (Binary
+ Large OBjectS) are stored within a table, and if that entry is changed, the
+ associated BLOB is deleted from the database.
+ </para>
+ <para>
+ As PostgreSQL stands, this doesn't occur. Large objects are treated as
+ objects in their own right; a table entry can reference a large object by
+ OID, but there can be multiple table entries referencing the same large
+ object OID, so the system doesn't delete the large object just because you
+ change or remove one such entry.
+ </para>
+ <para>
+ Now this is fine for new PostgreSQL-specific applications, but existing ones
+ using JDBC or ODBC won't delete the objects, resulting in orphaning - objects
+ that are not referenced by anything, and simply occupy disk space.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>The Fix</title>
+ <para>
+ I've fixed this by creating a new data type 'lo', some support functions, and
+ a Trigger which handles the orphaning problem. The trigger essentially just
+ does a 'lo_unlink' whenever you delete or modify a value referencing a large
+ object. When you use this trigger, you are assuming that there is only one
+ database reference to any large object that is referenced in a
+ trigger-controlled column!
+ </para>
+ <para>
+ The 'lo' type was created because we needed to differentiate between plain
+ OIDs and Large Objects. Currently the JDBC driver handles this dilemma easily,
+ but (after talking to Byron), the ODBC driver needed a unique type. They had
+ created an 'lo' type, but not the solution to orphaning.
+ </para>
+ <para>
+ You don't actually have to use the 'lo' type to use the trigger, but it may be
+ convenient to use it to keep track of which columns in your database represent
+ large objects that you are managing with the trigger.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>How to Use</title>
+ <para>
+ The easiest way is by an example:
+ </para>
+ <programlisting>
+ CREATE TABLE image (title TEXT, raster lo);
+ CREATE TRIGGER t_raster BEFORE UPDATE OR DELETE ON image
+ FOR EACH ROW EXECUTE PROCEDURE lo_manage(raster);
+ </programlisting>
+ <para>
+ Create a trigger for each column that contains a lo type, and give the column
+ name as the trigger procedure argument. You can have more than one trigger on
+ a table if you need multiple lo columns in the same table, but don't forget to
+ give a different name to each trigger.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Issues</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Dropping a table will still orphan any objects it contains, as the trigger
+ is not executed.
+ </para>
+ <para>
+ Avoid this by preceding the 'drop table' with 'delete from {table}'.
+ </para>
+ <para>
+ If you already have, or suspect you have, orphaned large objects, see
+ the contrib/vacuumlo module to help you clean them up. It's a good idea
+ to run contrib/vacuumlo occasionally as a back-stop to the lo_manage
+ trigger.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Some frontends may create their own tables, and will not create the
+ associated trigger(s). Also, users may not remember (or know) to create
+ the triggers.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ As the ODBC driver needs a permanent lo type (& JDBC could be optimised to
+ use it if it's Oid is fixed), and as the above issues can only be fixed by
+ some internal changes, I feel it should become a permanent built-in type.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Peter Mount <email>peter@retep.org.uk</email> June 13 1998
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml
new file mode 100644
index 00000000000..75c02013c73
--- /dev/null
+++ b/doc/src/sgml/ltree.sgml
@@ -0,0 +1,771 @@
+
+<sect1 id="ltree">
+ <title>ltree</title>
+
+ <indexterm zone="ltree">
+ <primary>ltree</primary>
+ </indexterm>
+
+ <para>
+ <literal>ltree</literal> is a PostgreSQL module that contains implementation
+ of data types, indexed access methods and queries for data organized as a
+ tree-like structures.
+ </para>
+
+ <sect2>
+ <title>Definitions</title>
+ <para>
+ A <emphasis>label</emphasis> of a node is a sequence of one or more words
+ separated by blank character '_' and containing letters and digits ( for
+ example, [a-zA-Z0-9] for C locale). The length of a label is limited by 256
+ bytes.
+ </para>
+ <para>
+ Example: 'Countries', 'Personal_Services'
+ </para>
+ <para>
+ A <emphasis>label path</emphasis> of a node is a sequence of one or more
+ dot-separated labels l1.l2...ln, represents path from root to the node. The
+ length of a label path is limited by 65Kb, but size <= 2Kb is preferrable.
+ We consider it's not a strict limitation (maximal size of label path for
+ DMOZ catalogue - <ulink url="http://www.dmoz.org"></ulink>, is about 240
+ bytes!)
+ </para>
+ <para>
+ Example: <literal>'Top.Countries.Europe.Russia'</literal>
+ </para>
+ <para>
+ We introduce several datatypes:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>ltree</literal> - is a datatype for label path.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[]</literal> - is a datatype for arrays of ltree.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>lquery</literal>
+ - is a path expression that has regular expression in the label path and
+ used for ltree matching. Star symbol (*) is used to specify any number of
+ labels (levels) and could be used at the beginning and the end of lquery,
+ for example, '*.Europe.*'.
+ </para>
+ <para>
+ The following quantifiers are recognized for '*' (like in Perl):
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>{n} Match exactly n levels</para>
+ </listitem>
+ <listitem>
+ <para>{n,} Match at least n levels</para>
+ </listitem>
+ <listitem>
+ <para>{n,m} Match at least n but not more than m levels</para>
+ </listitem>
+ <listitem>
+ <para>{,m} Match at maximum m levels (eq. to {0,m})</para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ It is possible to use several modifiers at the end of a label:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>@ Do case-insensitive label matching</para>
+ </listitem>
+ <listitem>
+ <para>* Do prefix matching for a label</para>
+ </listitem>
+ <listitem>
+ <para>% Don't account word separator '_' in label matching, that is
+ 'Russian%' would match 'Russian_nations', but not 'Russian'
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ <literal>lquery</literal> can contain logical '!' (NOT) at the beginning
+ of the label and '|' (OR) to specify possible alternatives for label
+ matching.
+ </para>
+ <para>
+ Example of <literal>lquery</literal>:
+ </para>
+ <programlisting>
+ Top.*{0,2}.sport*@.!football|tennis.Russ*|Spain
+ a) b) c) d) e)
+ </programlisting>
+ <para>
+ A label path should
+ </para>
+ <orderedlist numeration='loweralpha'>
+ <listitem>
+ <para>
+ begin from a node with label 'Top'
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ and following zero or 2 labels until
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ a node with label beginning from case-insensitive prefix 'sport'
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ following node with label not matched 'football' or 'tennis' and
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ end on node with label beginning from 'Russ' or strictly matched
+ 'Spain'.
+ </para>
+ </listitem>
+ </orderedlist>
+
+ </listitem>
+
+ <listitem>
+ <para><literal>ltxtquery</literal>
+ - is a datatype for label searching (like type 'query' for full text
+ searching, see contrib/tsearch). It's possible to use modifiers @,%,* at
+ the end of word. The meaning of modifiers are the same as for lquery.
+ </para>
+ <para>
+ Example: <literal>'Europe & Russia*@ & !Transportation'</literal>
+ </para>
+ <para>
+ Search paths contain words 'Europe' and 'Russia*' (case-insensitive) and
+ not 'Transportation'. Notice, the order of words as they appear in label
+ path is not important !
+ </para>
+ </listitem>
+
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Operations</title>
+ <para>
+ The following operations are defined for type ltree:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal><,>,<=,>=,=, <></literal>
+ - Have their usual meanings. Comparison is doing in the order of direct
+ tree traversing, children of a node are sorted lexicographic.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree @> ltree</literal>
+ - returns TRUE if left argument is an ancestor of right argument (or
+ equal).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree <@ ltree </literal>
+ - returns TRUE if left argument is a descendant of right argument (or
+ equal).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree ~ lquery, lquery ~ ltree</literal>
+ - return TRUE if node represented by ltree satisfies lquery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree ? lquery[], lquery ? ltree[]</literal>
+ - return TRUE if node represented by ltree satisfies at least one lquery
+ from array.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree @ ltxtquery, ltxtquery @ ltree</literal>
+ - return TRUE if node represented by ltree satisfies ltxtquery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree || ltree, ltree || text, text || ltree</literal>
+ - return concatenated ltree.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ Operations for arrays of ltree (<literal>ltree[]</literal>):
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>ltree[] @> ltree, ltree <@ ltree[]</literal>
+ - returns TRUE if array ltree[] contains an ancestor of ltree.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree @> ltree[], ltree[] <@ ltree</literal>
+ - returns TRUE if array ltree[] contains a descendant of ltree.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] ~ lquery, lquery ~ ltree[]</literal>
+ - returns TRUE if array ltree[] contains label paths matched lquery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] ? lquery[], lquery[] ? ltree[]</literal>
+ - returns TRUE if array ltree[] contains label paths matched atleaset one
+ lquery from array.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] @ ltxtquery, ltxtquery @ ltree[]</literal>
+ - returns TRUE if array ltree[] contains label paths matched ltxtquery
+ (full text search).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree[] ?@> ltree, ltree ?<@ ltree[], ltree[] ?~ lquery, ltree[] ?@ ltxtquery</literal>
+
+ - returns first element of array ltree[] satisfies corresponding condition
+ and NULL in vice versa.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Remark</title>
+
+ <para>
+ Operations <literal>&lt;@</literal>, <literal>@&gt;</literal>, <literal>@</literal> and
+ <literal>~</literal> have analogues - <literal>^&lt;@, ^@&gt;, ^@, ^~,</literal> which don't use
+ indices!
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Indices</title>
+ <para>
+ Various indices could be created to speed up execution of operations:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ B-tree index over ltree: <literal>&lt;, &lt;=, =, &gt;=, &gt;</literal>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ GiST index over ltree: <literal>&lt;, &lt;=, =, &gt;=, &gt;, @&gt;, &lt;@, @, ~, ?</literal>
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+ CREATE INDEX path_gist_idx ON test USING GIST (path);
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>GiST index over ltree[]:
+ <literal>ltree[]<@ ltree, ltree @> ltree[], @, ~, ?.</literal>
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+ CREATE INDEX path_gist_idx ON test USING GIST (array_path);
+ </programlisting>
+ <para>
+ Notices: This index is lossy.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Functions</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>ltree subltree(ltree, start, end)</literal>
+ returns subpath of ltree from start (inclusive) until the end.
+ </para>
+ <programlisting>
+ # select subltree('Top.Child1.Child2',1,2);
+ subltree
+ --------
+ Child1
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree subpath(ltree, OFFSET,LEN)</literal> and
+ <literal>ltree subpath(ltree, OFFSET)</literal>
+ returns subpath of ltree from OFFSET (inclusive) with length LEN.
+ If OFFSET is negative returns subpath starts that far from the end
+ of the path. If LENGTH is omitted, returns everything to the end
+ of the path. If LENGTH is negative, leaves that many labels off
+ the end of the path.
+ </para>
+ <programlisting>
+ # select subpath('Top.Child1.Child2',1,2);
+ subpath
+ -------
+ Child1.Child2
+
+ # select subpath('Top.Child1.Child2',-2,1);
+ subpath
+ ---------
+ Child1
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>int4 nlevel(ltree)</literal> - returns level of the node.
+ </para>
+ <programlisting>
+ # select nlevel('Top.Child1.Child2');
+ nlevel
+ --------
+ 3
+ </programlisting>
+ <para>
+ Note, that arguments start, end, OFFSET, LEN have meaning of level of the
+ node !
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>int4 index(ltree,ltree)</literal> and
+ <literal>int4 index(ltree,ltree,OFFSET)</literal>
+ returns number of level of the first occurence of second argument in first
+ one beginning from OFFSET. if OFFSET is negative, than search begins from |
+ OFFSET| levels from the end of the path.
+ </para>
+ <programlisting>
+ SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',3);
+ index
+ -------
+ 6
+ SELECT index('0.1.2.3.5.4.5.6.8.5.6.8','5.6',-4);
+ index
+ -------
+ 9
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree text2ltree(text)</literal> and
+ <literal>text ltree2text(text)</literal> cast functions for ltree and text.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ltree lca(ltree,ltree,...) (up to 8 arguments)</literal> and
+ <literal>ltree lca(ltree[])</literal> Returns Lowest Common Ancestor (lca).
+ </para>
+ <programlisting>
+ # select lca('1.2.2.3','1.2.3.4.5.6');
+ lca
+ -----
+ 1.2
+ # select lca('{la.2.3,1.2.3.4.5.6}') is null;
+ ?column?
+ ----------
+ f
+ </programlisting>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Installation</title>
+ <programlisting>
+ cd contrib/ltree
+ make
+ make install
+ make installcheck
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Example</title>
+ <programlisting>
+ createdb ltreetest
+ psql ltreetest < /usr/local/pgsql/share/contrib/ltree.sql
+ psql ltreetest < ltreetest.sql
+ </programlisting>
+
+ <para>
+Now, we have a database ltreetest populated with a data describing hierarchy
+shown below:
+ </para>
+
+ <programlisting>
+
+
+ TOP
+ / | \
+ Science Hobbies Collections
+ / | \
+ Astronomy Amateurs_Astronomy Pictures
+ / \ |
+ Astrophysics Cosmology Astronomy
+ / | \
+ Galaxies Stars Astronauts
+ </programlisting>
+ <para>
+ Inheritance:
+ </para>
+
+ <programlisting>
+ltreetest=# select path from test where path <@ 'Top.Science';
+ path
+------------------------------------
+ Top.Science
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+(4 rows)
+ </programlisting>
+ <para>
+ Matching:
+ </para>
+ <programlisting>
+ltreetest=# select path from test where path ~ '*.Astronomy.*';
+ path
+-----------------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+ Top.Collections.Pictures.Astronomy
+ Top.Collections.Pictures.Astronomy.Stars
+ Top.Collections.Pictures.Astronomy.Galaxies
+ Top.Collections.Pictures.Astronomy.Astronauts
+(7 rows)
+ltreetest=# select path from test where path ~ '*.!pictures@.*.Astronomy.*';
+ path
+------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+ <para>
+ Full text search:
+ </para>
+ <programlisting>
+ltreetest=# select path from test where path @ 'Astro*% & !pictures@';
+ path
+------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+ Top.Hobbies.Amateurs_Astronomy
+(4 rows)
+
+ltreetest=# select path from test where path @ 'Astro* & !pictures@';
+ path
+------------------------------------
+ Top.Science.Astronomy
+ Top.Science.Astronomy.Astrophysics
+ Top.Science.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+ <para>
+ Using Functions:
+ </para>
+ <programlisting>
+ltreetest=# select subpath(path,0,2)||'Space'||subpath(path,2) from test where path <@ 'Top.Science.Astronomy';
+ ?column?
+------------------------------------------
+ Top.Science.Space.Astronomy
+ Top.Science.Space.Astronomy.Astrophysics
+ Top.Science.Space.Astronomy.Cosmology
+(3 rows)
+We could create SQL-function:
+CREATE FUNCTION ins_label(ltree, int4, text) RETURNS ltree
+AS 'select subpath($1,0,$2) || $3 || subpath($1,$2);'
+LANGUAGE SQL IMMUTABLE;
+ </programlisting>
+ <para>
+ and previous select could be rewritten as:
+ </para>
+
+ <programlisting>
+ltreetest=# select ins_label(path,2,'Space') from test where path <@ 'Top.Science.Astronomy';
+ ins_label
+------------------------------------------
+ Top.Science.Space.Astronomy
+ Top.Science.Space.Astronomy.Astrophysics
+ Top.Science.Space.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+
+ <para>
+ Or with another arguments:
+ </para>
+
+ <programlisting>
+CREATE FUNCTION ins_label(ltree, ltree, text) RETURNS ltree
+AS 'select subpath($1,0,nlevel($2)) || $3 || subpath($1,nlevel($2));'
+LANGUAGE SQL IMMUTABLE;
+
+ltreetest=# select ins_label(path,'Top.Science'::ltree,'Space') from test where path <@ 'Top.Science.Astronomy';
+ ins_label
+------------------------------------------
+ Top.Science.Space.Astronomy
+ Top.Science.Space.Astronomy.Astrophysics
+ Top.Science.Space.Astronomy.Cosmology
+(3 rows)
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Additional data</title>
+ <para>
+ To get more feeling from our ltree module you could download
+ dmozltree-eng.sql.gz (about 3Mb tar.gz archive containing 300,274 nodes),
+ available from
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist/ltree/"></ulink>
+ dmozltree-eng.sql.gz, which is DMOZ catalogue, prepared for use with ltree.
+ Setup your test database (dmoz), load ltree module and issue command:
+ </para>
+ <programlisting>
+ zcat dmozltree-eng.sql.gz| psql dmoz
+ </programlisting>
+ <para>
+ Data will be loaded into database dmoz and all indices will be created.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Benchmarks</title>
+ <para>
+ All runs were performed on my IBM ThinkPad T21 (256 MB RAM, 750Mhz) using DMOZ
+ data, containing 300,274 nodes (see above for download link). We used some
+ basic queries typical for walking through catalog.
+ </para>
+
+ <sect3>
+ <title>Queries</title>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Q0: Count all rows (sort of base time for comparison)
+ </para>
+ <programlisting>
+ select count(*) from dmoz;
+ count
+ --------
+ 300274
+ (1 row)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q1: Get direct children (without inheritance)
+ </para>
+ <programlisting>
+ select path from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1}';
+ path
+ -----------------------------------
+ Top.Adult.Arts.Animation.Cartoons
+ Top.Adult.Arts.Animation.Anime
+ (2 rows)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q2: The same as Q1 but with counting of successors
+ </para>
+ <programlisting>
+ select path as parentpath , (select count(*)-1 from dmoz where path <@
+ p.path) as count from dmoz p where path ~ 'Top.Adult.Arts.Animation.*{1}';
+ parentpath | count
+ -----------------------------------+-------
+ Top.Adult.Arts.Animation.Cartoons | 2
+ Top.Adult.Arts.Animation.Anime | 61
+ (2 rows)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q3: Get all parents
+ </para>
+ <programlisting>
+ select path from dmoz where path @> 'Top.Adult.Arts.Animation' order by
+ path asc;
+ path
+ --------------------------
+ Top
+ Top.Adult
+ Top.Adult.Arts
+ Top.Adult.Arts.Animation
+ (4 rows)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q4: Get all parents with counting of children
+ </para>
+ <programlisting>
+ select path, (select count(*)-1 from dmoz where path <@ p.path) as count
+ from dmoz p where path @> 'Top.Adult.Arts.Animation' order by path asc;
+ path | count
+ --------------------------+--------
+ Top | 300273
+ Top.Adult | 4913
+ Top.Adult.Arts | 339
+ Top.Adult.Arts.Animation | 65
+ (4 rows)
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Q5: Get all children with levels
+ </para>
+ <programlisting>
+ select path, nlevel(path) - nlevel('Top.Adult.Arts.Animation') as level
+ from dmoz where path ~ 'Top.Adult.Arts.Animation.*{1,2}' order by path asc;
+ path | level
+ ------------------------------------------------+-------
+ Top.Adult.Arts.Animation.Anime | 1
+ Top.Adult.Arts.Animation.Anime.Fan_Works | 2
+ Top.Adult.Arts.Animation.Anime.Games | 2
+ Top.Adult.Arts.Animation.Anime.Genres | 2
+ Top.Adult.Arts.Animation.Anime.Image_Galleries | 2
+ Top.Adult.Arts.Animation.Anime.Multimedia | 2
+ Top.Adult.Arts.Animation.Anime.Resources | 2
+ Top.Adult.Arts.Animation.Anime.Titles | 2
+ Top.Adult.Arts.Animation.Cartoons | 1
+ Top.Adult.Arts.Animation.Cartoons.AVS | 2
+ Top.Adult.Arts.Animation.Cartoons.Members | 2
+ (11 rows)
+ </programlisting>
+ </listitem>
+ </itemizedlist>
+ </sect3>
+
+ <sect3>
+ <title>Timings</title>
+ <programlisting>
++---------------------------------------------+
+|Query|Rows|Time (ms) index|Time (ms) no index|
+|-----+----+---------------+------------------|
+| Q0| 1| NA| 1453.44|
+|-----+----+---------------+------------------|
+| Q1| 2| 0.49| 1001.54|
+|-----+----+---------------+------------------|
+| Q2| 2| 1.48| 3009.39|
+|-----+----+---------------+------------------|
+| Q3| 4| 0.55| 906.98|
+|-----+----+---------------+------------------|
+| Q4| 4| 24385.07| 4951.91|
+|-----+----+---------------+------------------|
+| Q5| 11| 0.85| 1003.23|
++---------------------------------------------+
+ </programlisting>
+ <para>
+ Timings without indices were obtained using operations which doesn't use
+ indices (see above)
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Remarks</title>
+ <para>
+ We didn't run full-scale tests, also we didn't present (yet) data for
+ operations with arrays of ltree (ltree[]) and full text searching. We'll
+ appreciate your input. So far, below some (rather obvious) results:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Indices does help execution of queries
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Q4 performs bad because one needs to read almost all data from the HDD
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect3>
+ </sect2>
+ <sect2>
+ <title>Some Backgrounds</title>
+ <para>
+ The approach we use for ltree is much like one we used in our other GiST based
+ contrib modules (intarray, tsearch, tree, btree_gist, rtree_gist). Theoretical
+ background is available in papers referenced from our GiST development page
+ (<ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink>).
+ </para>
+ <para>
+ A hierarchical data structure (tree) is a set of nodes. Each node has a
+ signature (LPS) of a fixed size, which is a hashed label path of that node.
+ Traversing a tree we could *certainly* prune branches if
+ </para>
+ <programlisting>
+ LQS (bitwise AND) LPS != LQS
+ </programlisting>
+ <para>
+ where LQS is a signature of lquery or ltxtquery, obtained in the same way as
+ LPS.
+ </para>
+ <programlisting>
+ ltree[]:
+ </programlisting>
+ <para>
+ For array of ltree LPS is a bitwise OR-ed signatures of *ALL* children
+ reachable from that node. Signatures are stored in RD-tree, implemented using
+ GiST, which provides indexed access.
+ </para>
+ <programlisting>
+ ltree:
+ </programlisting>
+ <para>
+ For ltree we store LPS in a B-tree, implemented using GiST. Each node entry is
+ represented by (left_bound, signature, right_bound), so that we could speedup
+ operations <literal><, <=, =, >=, ></literal> using left_bound, right_bound and prune branches of
+ a tree using signature.
+ </para>
+ </sect2>
+ <sect2>
+ <title>Authors</title>
+ <para>
+ All work was done by Teodor Sigaev (<email>teodor@stack.net</email>) and
+ Oleg Bartunov (<email>oleg@sai.msu.su</email>). See
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist"></ulink> for
+ additional information. Authors would like to thank Eugeny Rodichev for
+ helpful discussions. Comments and bug reports are welcome.
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/oid2name.sgml b/doc/src/sgml/oid2name.sgml
new file mode 100644
index 00000000000..2c5c396522f
--- /dev/null
+++ b/doc/src/sgml/oid2name.sgml
@@ -0,0 +1,205 @@
+<sect1 id="oid2name">
+ <title>oid2name</title>
+
+ <indexterm zone="oid2name">
+ <primary>oid2name</primary>
+ </indexterm>
+
+ <para>
+ This utility allows administrators to examine the file structure used by
+ PostgreSQL. To make use of it, you need to be familiar with the file
+ structure, which is described in <xref linkend="storage">.
+ </para>
+
+ <sect2>
+ <title>Overview</title>
+ <para>
+ <literal>oid2name</literal> connects to the database and extracts OID,
+ filenode, and table name information. You can also have it show database
+ OIDs and tablespace OIDs.
+ </para>
+ <para>
+ When displaying specific tables, you can select which tables to show by
+ using -o, -f and -t. The first switch takes an OID, the second takes
+ a filenode, and the third takes a tablename (actually, it's a LIKE
+ pattern, so you can use things like "foo%"). Note that you can use as many
+ of these switches as you like, and the listing will include all objects
+ matched by any of the switches. Also note that these switches can only
+ show objects in the database given in -d.
+ </para>
+ <para>
+ If you don't give any of -o, -f or -t it will dump all the tables in the
+ database given in -d. If you don't give -d, it will show a database
+ listing. Alternatively you can give -s to get a tablespace listing.
+ </para>
+ <table>
+ <title>Additional switches</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>-i</literal></entry>
+ <entry>include indexes and sequences in the database listing.</entry>
+ </row>
+ <row>
+ <entry><literal>-x</literal></entry>
+ <entry>display more information about each object shown: tablespace name,
+ schema name, OID.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-S</literal></entry>
+ <entry>also show system objects (those in information_schema, pg_toast
+ and pg_catalog schemas)
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-q</literal></entry>
+ <entry>don't display headers(useful for scripting)</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+
+ <programlisting>
+$ oid2name
+All databases:
+ Oid Database Name Tablespace
+----------------------------------
+ 17228 alvherre pg_default
+ 17255 regression pg_default
+ 17227 template0 pg_default
+ 1 template1 pg_default
+
+$ oid2name -s
+All tablespaces:
+ Oid Tablespace Name
+-------------------------
+ 1663 pg_default
+ 1664 pg_global
+ 155151 fastdisk
+ 155152 bigdisk
+
+$ cd $PGDATA/17228
+
+$ # get top 10 db objects in the default tablespace, ordered by size
+$ ls -lS * | head -10
+-rw------- 1 alvherre alvherre 136536064 sep 14 09:51 155173
+-rw------- 1 alvherre alvherre 17965056 sep 14 09:51 1155291
+-rw------- 1 alvherre alvherre 1204224 sep 14 09:51 16717
+-rw------- 1 alvherre alvherre 581632 sep 6 17:51 1255
+-rw------- 1 alvherre alvherre 237568 sep 14 09:50 16674
+-rw------- 1 alvherre alvherre 212992 sep 14 09:51 1249
+-rw------- 1 alvherre alvherre 204800 sep 14 09:51 16684
+-rw------- 1 alvherre alvherre 196608 sep 14 09:50 16700
+-rw------- 1 alvherre alvherre 163840 sep 14 09:50 16699
+-rw------- 1 alvherre alvherre 122880 sep 6 17:51 16751
+
+$ oid2name -d alvherre -f 155173
+From database "alvherre":
+ Filenode Table Name
+----------------------
+ 155173 accounts
+
+$ # you can ask for more than one object
+$ oid2name -d alvherre -f 155173 -f 1155291
+From database "alvherre":
+ Filenode Table Name
+-------------------------
+ 155173 accounts
+ 1155291 accounts_pkey
+
+$ # you can also mix the options, and have more details
+$ oid2name -d alvherre -t accounts -f 1155291 -x
+From database "alvherre":
+ Filenode Table Name Oid Schema Tablespace
+------------------------------------------------------
+ 155173 accounts 155173 public pg_default
+ 1155291 accounts_pkey 1155291 public pg_default
+
+$ # show disk space for every db object
+$ du [0-9]* |
+> while read SIZE FILENODE
+> do
+> echo "$SIZE `oid2name -q -d alvherre -i -f $FILENODE`"
+> done
+16 1155287 branches_pkey
+16 1155289 tellers_pkey
+17561 1155291 accounts_pkey
+...
+
+$ # same, but sort by size
+$ du [0-9]* | sort -rn | while read SIZE FN
+> do
+> echo "$SIZE `oid2name -q -d alvherre -f $FN`"
+> done
+133466 155173 accounts
+17561 1155291 accounts_pkey
+1177 16717 pg_proc_proname_args_nsp_index
+...
+
+$ # If you want to see what's in tablespaces, use the pg_tblspc directory
+$ cd $PGDATA/pg_tblspc
+$ oid2name -s
+All tablespaces:
+ Oid Tablespace Name
+-------------------------
+ 1663 pg_default
+ 1664 pg_global
+ 155151 fastdisk
+ 155152 bigdisk
+
+$ # what databases have objects in tablespace "fastdisk"?
+$ ls -d 155151/*
+155151/17228/ 155151/PG_VERSION
+
+$ # Oh, what was database 17228 again?
+$ oid2name
+All databases:
+ Oid Database Name Tablespace
+----------------------------------
+ 17228 alvherre pg_default
+ 17255 regression pg_default
+ 17227 template0 pg_default
+ 1 template1 pg_default
+
+$ # Let's see what objects does this database have in the tablespace.
+$ cd 155151/17228
+$ ls -l
+total 0
+-rw------- 1 postgres postgres 0 sep 13 23:20 155156
+
+$ # OK, this is a pretty small table ... but which one is it?
+$ oid2name -d alvherre -f 155156
+From database "alvherre":
+ Filenode Table Name
+----------------------
+ 155156 foo
+
+$ # end of sample session.
+ </programlisting>
+
+ <para>
+ You can also get approximate size data for each object using psql. For
+ example,
+ </para>
+ <programlisting>
+ SELECT relpages, relfilenode, relname FROM pg_class ORDER BY relpages DESC;
+ </programlisting>
+ <para>
+ Each page is typically 8k. Relpages is updated by VACUUM.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ b. palmer, <email>bpalmer@crimelabs.net</email>
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
new file mode 100644
index 00000000000..3fe1edf378f
--- /dev/null
+++ b/doc/src/sgml/pageinspect.sgml
@@ -0,0 +1,125 @@
+
+<sect1 id="pageinspect">
+ <title>pageinspect</title>
+
+ <indexterm zone="pageinspect">
+ <primary>pageinspect</primary>
+ </indexterm>
+
+ <para>
+ The functions in this module allow you to inspect the contents of data pages
+ at a low level, for debugging purposes.
+ </para>
+
+ <sect2>
+ <title>Functions included</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>get_raw_page</literal> reads one block of the named table and returns a copy as a
+ bytea field. This allows a single time-consistent copy of the block to be
+ made. Use of this functions is restricted to superusers.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>page_header</literal> shows fields which are common to all PostgreSQL heap and index
+ pages. Use of this function is restricted to superusers.
+ </para>
+ <para>
+ A page image obtained with <literal>get_raw_page</literal> should be passed as argument:
+ </para>
+ <programlisting>
+ test=# SELECT * FROM page_header(get_raw_page('pg_class',0));
+ lsn | tli | flags | lower | upper | special | pagesize | version
+ ----------+-----+-------+-------+-------+---------+----------+---------
+ 0/3C5614 | 1 | 1 | 216 | 256 | 8192 | 8192 | 4
+ (1 row)
+ </programlisting>
+ <para>
+ The returned columns correspond to the fields in the PageHeaderData-struct,
+ see src/include/storage/bufpage.h for more details.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>heap_page_items</literal> shows all line pointers on a heap page. For those line
+ pointers that are in use, tuple headers are also shown. All tuples are
+ shown, whether or not the tuples were visible to an MVCC snapshot at the
+ time the raw page was copied. Use of this function is restricted to
+ superusers.
+ </para>
+ <para>
+ A heap page image obtained with <literal>get_raw_page</literal> should be passed as argument:
+ </para>
+ <programlisting>
+ test=# SELECT * FROM heap_page_items(get_raw_page('pg_class',0));
+ </programlisting>
+ <para>
+ See src/include/storage/itemid.h and src/include/access/htup.h for
+ explanations of the fields returned.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>bt_metap()</literal> returns information about the btree index metapage:
+ </para>
+ <programlisting>
+ test=> SELECT * FROM bt_metap('pg_cast_oid_index');
+ -[ RECORD 1 ]-----
+ magic | 340322
+ version | 2
+ root | 1
+ level | 0
+ fastroot | 1
+ fastlevel | 0
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>bt_page_stats()</literal> shows information about single btree pages:
+ </para>
+ <programlisting>
+ test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
+ -[ RECORD 1 ]-+-----
+ blkno | 1
+ type | l
+ live_items | 256
+ dead_items | 0
+ avg_item_size | 12
+ page_size | 8192
+ free_size | 4056
+ btpo_prev | 0
+ btpo_next | 0
+ btpo | 0
+ btpo_flags | 3
+ </programlisting>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>bt_page_items()</literal> returns information about specific items on btree pages:
+ </para>
+ <programlisting>
+ test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
+ itemoffset | ctid | itemlen | nulls | vars | data
+ ------------+---------+---------+-------+------+-------------
+ 1 | (0,1) | 12 | f | f | 23 27 00 00
+ 2 | (0,2) | 12 | f | f | 24 27 00 00
+ 3 | (0,3) | 12 | f | f | 25 27 00 00
+ 4 | (0,4) | 12 | f | f | 26 27 00 00
+ 5 | (0,5) | 12 | f | f | 27 27 00 00
+ 6 | (0,6) | 12 | f | f | 28 27 00 00
+ 7 | (0,7) | 12 | f | f | 29 27 00 00
+ 8 | (0,8) | 12 | f | f | 2a 27 00 00
+ </programlisting>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
new file mode 100644
index 00000000000..7f73dfa9eb3
--- /dev/null
+++ b/doc/src/sgml/pgbench.sgml
@@ -0,0 +1,422 @@
+
+<sect1 id="pgbench">
+ <title>pgbench</title>
+
+ <indexterm zone="pgbench">
+ <primary>pgbench</primary>
+ </indexterm>
+
+ <para>
+ <literal>pgbench</literal> is a simple program to run a benchmark test.
+ <literal>pgbench</literal> is a client application of PostgreSQL and runs
+ with PostgreSQL only. It performs lots of small and simple transactions
+ including SELECT/UPDATE/INSERT operations then calculates number of
+ transactions successfully completed within a second (transactions
+ per second, tps). Targeting data includes a table with at least 100k
+ tuples.
+ </para>
+ <para>
+ Example outputs from pgbench look like:
+ </para>
+ <programlisting>
+number of clients: 4
+number of transactions per client: 100
+number of processed transactions: 400/400
+tps = 19.875015(including connections establishing)
+tps = 20.098827(excluding connections establishing)
+ </programlisting>
+ <para> Similar program called "JDBCBench" already exists, but it requires
+ Java that may not be available on every platform. Moreover some
+ people concerned about the overhead of Java that might lead
+ inaccurate results. So I decided to write in pure C, and named
+ it "pgbench."
+ </para>
+
+ <para>
+ Features of pgbench:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ pgbench is written in C using libpq only. So it is very portable
+ and easy to install.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ pgbench can simulate concurrent connections using asynchronous
+ capability of libpq. No threading is required.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <sect2>
+ <title>Overview</title>
+ <orderedlist>
+ <listitem>
+ <para>(optional)Initialize database by:</para>
+ <programlisting>
+pgbench -i &lt;dbname&gt;
+ </programlisting>
+ <para>
+ where &lt;dbname&gt; is the name of database. pgbench uses four tables
+ accounts, branches, history and tellers. These tables will be
+ destroyed. Be very careful if you have tables having same
+ names. Default test data contains:
+ </para>
+ <programlisting>
+table # of tuples
+-------------------------
+branches 1
+tellers 10
+accounts 100000
+history 0
+ </programlisting>
+ <para>
+ You can increase the number of tuples by using -s option. branches,
+ tellers and accounts tables are created with a fillfactor which is
+ set using -F option. See below.
+ </para>
+ </listitem>
+ <listitem>
+ <para>Run the benchmark test</para>
+ <programlisting>
+pgbench &lt;dbname&gt;
+ </programlisting>
+ <para>
+ The default configuration is:
+ </para>
+ <programlisting>
+ number of clients: 1
+ number of transactions per client: 10
+ </programlisting>
+ </listitem>
+ </orderedlist>
+
+ <table>
+ <title><literal>pgbench</literal> options</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Parameter</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>-h hostname</literal></entry>
+ <entry>
+ <para>
+ hostname where the backend is running. If this option
+ is omitted, pgbench will connect to the localhost via
+ Unix domain socket.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-p port</literal></entry>
+ <entry>
+ <para>
+ the port number that the backend is accepting. default is
+ libpq's default, usually 5432.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-c number_of_clients</literal></entry>
+ <entry>
+ <para>
+ Number of clients simulated. default is 1.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-t number_of_transactions</literal></entry>
+ <entry>
+ <para>
+ Number of transactions each client runs. default is 10.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-s scaling_factor</literal></entry>
+ <entry>
+ <para>
+ this should be used with -i (initialize) option.
+ number of tuples generated will be multiple of the
+ scaling factor. For example, -s 100 will imply 10M
+ (10,000,000) tuples in the accounts table.
+ default is 1.
+ </para>
+ <para>
+ NOTE: scaling factor should be at least
+ as large as the largest number of clients you intend
+ to test; else you'll mostly be measuring update contention.
+ Regular (not initializing) runs using one of the
+ built-in tests will detect scale based on the number of
+ branches in the database. For custom (-f) runs it can
+ be manually specified with this parameter.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-D varname=value</literal></entry>
+ <entry>
+ <para>
+ Define a variable. It can be refered to by a script
+ provided by using -f option. Multiple -D options are allowed.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-U login</literal></entry>
+ <entry>
+ <para>
+ Specify db user's login name if it is different from
+ the Unix login name.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-P password</literal></entry>
+ <entry>
+ <para>
+ Specify the db password. CAUTION: using this option
+ might be a security hole since ps command will
+ show the password. Use this for TESTING PURPOSE ONLY.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-n</literal></entry>
+ <entry>
+ <para>
+ No vacuuming and cleaning the history table prior to the
+ test is performed.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-v</literal></entry>
+ <entry>
+ <para>
+ Do vacuuming before testing. This will take some time.
+ With neither -n nor -v, pgbench will vacuum tellers and
+ branches tables only.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-S</literal></entry>
+ <entry>
+ <para>
+ Perform select only transactions instead of TPC-B.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-N</literal></entry>
+ <entry>
+ <para>
+ Do not update "branches" and "tellers". This will
+ avoid heavy update contention on branches and tellers,
+ while it will not make pgbench supporting TPC-B like
+ transactions.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-f filename</literal></entry>
+ <entry>
+ <para>
+ Read transaction script from file. Detailed
+ explanation will appear later.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-C</literal></entry>
+ <entry>
+ <para>
+ Establish connection for each transaction, rather than
+ doing it just once at beginning of pgbench in the normal
+ mode. This is useful to measure the connection overhead.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-l</literal></entry>
+ <entry>
+ <para>
+ Write the time taken by each transaction to a logfile,
+ with the name "pgbench_log.xxx", where xxx is the PID
+ of the pgbench process. The format of the log is:
+ </para>
+ <programlisting>
+ client_id transaction_no time file_no time-epoch time-us
+ </programlisting>
+ <para>
+ where time is measured in microseconds, , the file_no is
+ which test file was used (useful when multiple were
+ specified with -f), and time-epoch/time-us are a
+ UNIX epoch format timestamp followed by an offset
+ in microseconds (suitable for creating a ISO 8601
+ timestamp with a fraction of a second) of when
+ the transaction completed.
+ </para>
+ <para>
+ Here are example outputs:
+ </para>
+ <programlisting>
+ 0 199 2241 0 1175850568 995598
+ 0 200 2465 0 1175850568 998079
+ 0 201 2513 0 1175850569 608
+ 0 202 2038 0 1175850569 2663
+ </programlisting>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-F fillfactor</literal></entry>
+ <entry>
+ <para>
+ Create tables(accounts, tellers and branches) with the given
+ fillfactor. Default is 100. This should be used with -i
+ (initialize) option.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>-d</literal></entry>
+ <entry>
+ <para>
+ debug option.
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>What is the "transaction" actually performed in pgbench?</title>
+ <orderedlist>
+ <listitem><para><literal>begin;</literal></para></listitem>
+
+ <listitem><para><literal>update accounts set abalance = abalance + :delta where aid = :aid;</literal></para></listitem>
+
+ <listitem><para><literal>select abalance from accounts where aid = :aid;</literal></para></listitem>
+
+ <listitem><para><literal>update tellers set tbalance = tbalance + :delta where tid = :tid;</literal></para></listitem>
+
+ <listitem><para><literal>update branches set bbalance = bbalance + :delta where bid = :bid;</literal></para></listitem>
+
+ <listitem><para><literal>insert into history(tid,bid,aid,delta) values(:tid,:bid,:aid,:delta);</literal></para></listitem>
+
+ <listitem><para><literal>end;</literal></para></listitem>
+ </orderedlist>
+ <para>
+ If you specify -N, (4) and (5) aren't included in the transaction.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Script file</title>
+ <para>
+ <literal>pgbench</literal> has support for reading a transaction script
+ from a specified file (<literal>-f</literal> option). This file should
+ include SQL commands in each line. SQL command consists of multiple lines
+ are not supported. Empty lines and lines begging with "--" will be ignored.
+ </para>
+ <para>
+ Multiple <literal>-f</literal> options are allowed. In this case each
+ transaction is assigned randomly chosen script.
+ </para>
+ <para>
+ SQL commands can include "meta command" which begins with "\" (back
+ slash). A meta command takes some arguments separted by white
+ spaces. Currently following meta command is supported:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>\set name operand1 [ operator operand2 ]</literal>
+ - Sets the calculated value using "operand1" "operator"
+ "operand2" to variable "name". If "operator" and "operand2"
+ are omitted, the value of operand1 is set to variable "name".
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+\set ntellers 10 * :scale
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>\setrandom name min max</literal>
+ - Assigns random integer to name between min and max
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+\setrandom aid 1 100000
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Variables can be referred to in SQL comands by adding ":" in front
+ of the varible name.
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+SELECT abalance FROM accounts WHERE aid = :aid
+ </programlisting>
+ <para>
+ Variables can also be defined by using -D option.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+ <para>
+ Example, TPC-B like benchmark can be defined as follows(scaling
+ factor = 1):
+ </para>
+ <programlisting>
+\set nbranches :scale
+\set ntellers 10 * :scale
+\set naccounts 100000 * :scale
+\setrandom aid 1 :naccounts
+\setrandom bid 1 :nbranches
+\setrandom tid 1 :ntellers
+\setrandom delta 1 10000
+BEGIN
+UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid
+SELECT abalance FROM accounts WHERE aid = :aid
+UPDATE tellers SET tbalance = tbalance + :delta WHERE tid = :tid
+UPDATE branches SET bbalance = bbalance + :delta WHERE bid = :bid
+INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, 'now')
+END
+ </programlisting>
+ <para>
+ If you want to automatically set the scaling factor from the number of
+ tuples in branches table, use -s option and shell command like this:
+ </para>
+ <programlisting>
+pgbench -s $(psql -At -c "SELECT count(*) FROM branches") -f tpc_b.sql
+ </programlisting>
+ <para>
+ Notice that -f option does not execute vacuum and clearing history
+ table before starting benchmark.
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/pgcrypto.sgml b/doc/src/sgml/pgcrypto.sgml
new file mode 100644
index 00000000000..4da29e03779
--- /dev/null
+++ b/doc/src/sgml/pgcrypto.sgml
@@ -0,0 +1,1144 @@
+
+<sect1 id="pgcrypto">
+ <title>pgcrypto</title>
+
+ <indexterm zone="pgcrypto">
+ <primary>pgcrypto</primary>
+ </indexterm>
+
+ <para>
+ This module provides cryptographic functions for PostgreSQL.
+ </para>
+
+ <sect2>
+ <title>Notes</title>
+ <sect3>
+ <title>Configuration</title>
+ <para>
+ pgcrypto configures itself according to the findings of main PostgreSQL
+ <literal>configure</literal> script. The options that affect it are
+ <literal>--with-zlib</literal> and <literal>--with-openssl</literal>.
+ </para>
+ <para>
+ When compiled with zlib, PGP encryption functions are able to
+ compress data before encrypting.
+ </para>
+ <para>
+ When compiled with OpenSSL there will be more algorithms available.
+ Also public-key encryption functions will be faster as OpenSSL
+ has more optimized BIGNUM functions.
+ </para>
+ <para>
+ Summary of functionality with and without OpenSSL:
+ </para>
+ <table>
+ <title>Summary of functionality with and without OpenSSL</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Functionality</entry>
+ <entry>built-in</entry>
+ <entry>OpenSSL</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>MD5</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>SHA1</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>SHA224/256/384/512</entry>
+ <entry>yes</entry>
+ <entry>yes (3)</entry>
+ </row>
+ <row>
+ <entry>Any other digest algo</entry>
+ <entry>no</entry>
+ <entry>yes (1)</entry>
+ </row>
+ <row>
+ <entry>Blowfish</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>AES</entry>
+ <entry>yes</entry>
+ <entry>yes (2)</entry>
+ </row>
+ <row>
+ <entry>DES/3DES/CAST5</entry>
+ <entry>no</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>Raw encryption</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>PGP Symetric encryption</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ <row>
+ <entry>PGP Public-Key encryption</entry>
+ <entry>yes</entry>
+ <entry>yes</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <orderedlist>
+ <listitem>
+ <para>
+ Any digest algorithm OpenSSL supports is automatically picked up.
+ This is not possible with ciphers, which need to be supported
+ explicitly.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ AES is included in OpenSSL since version 0.9.7. If pgcrypto is
+ compiled against older version, it will use built-in AES code,
+ so it has AES always available.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SHA2 algorithms were added to OpenSSL in version 0.9.8. For
+ older versions, pgcrypto will use built-in code.
+ </para>
+ </listitem>
+ </orderedlist>
+ </sect3>
+
+ <sect3>
+ <title>NULL handling</title>
+ <para>
+ As standard in SQL, all functions return NULL, if any of the arguments
+ are NULL. This may create security risks on careless usage.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Security</title>
+ <para>
+ All the functions here run inside database server. That means that all
+ the data and passwords move between pgcrypto and client application in
+ clear-text. Thus you must:
+ </para>
+
+ <orderedlist>
+ <listitem>
+ <para>Connect locally or use SSL connections.</para>
+ </listitem>
+ <listitem>
+ <para>Trust both system and database administrator.</para>
+ </listitem>
+ </orderedlist>
+ <para>
+ If you cannot, then better do crypto inside client application.
+ </para>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>General hashing</title>
+
+ <sect3>
+ <title><literal>digest(data, type)</literal></title>
+ <programlisting>
+ digest(data text, type text) RETURNS bytea
+ digest(data bytea, type text) RETURNS bytea
+ </programlisting>
+ <para>
+ Type is here the algorithm to use. Standard algorithms are `md5` and
+ `sha1`, although there may be more supported, depending on build
+ options.
+ </para>
+ <para>
+ Returns binary hash.
+ </para>
+ <para>
+ If you want hexadecimal string, use `encode()` on result. Example:
+ </para>
+ <programlisting>
+ CREATE OR REPLACE FUNCTION sha1(bytea) RETURNS text AS $$
+ SELECT encode(digest($1, 'sha1'), 'hex')
+ $$ LANGUAGE SQL STRICT IMMUTABLE;
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title><literal>hmac(data, key, type)</literal></title>
+ <programlisting>
+ hmac(data text, key text, type text) RETURNS bytea
+ hmac(data bytea, key text, type text) RETURNS bytea
+ </programlisting>
+ <para>
+ Calculates Hashed MAC over data. `type` is the same as in `digest()`.
+ If the key is larger than hash block size it will first hashed and the
+ hash will be used as key.
+ </para>
+ <para>
+ It is similar to digest() but the hash can be recalculated only knowing
+ the key. This avoids the scenario of someone altering data and also
+ changing the hash.
+ </para>
+ <para>
+ Returns binary hash.
+ </para>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>Password hashing</title>
+ <para>
+ The functions <literal>crypt()</literal> and <literal>gen_salt()</literal> are specifically designed
+ for hashing passwords. <literal>crypt()</literal> does the hashing and `gen_salt()`
+ prepares algorithm parameters for it.
+ </para>
+ <para>
+ The algorithms in `crypt()` differ from usual hashing algorithms like
+ MD5 or SHA1 in following respects:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ They are slow. As the amount of data is so small, this is only
+ way to make brute-forcing passwords hard.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Include random 'salt' with result, so that users having same
+ password would have different crypted passwords. This is also
+ additional defense against reversing the algorithm.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Include algorithm type in the result, so passwords hashed with
+ different algorithms can co-exist.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Some of them are adaptive - that means after computers get
+ faster, you can tune the algorithm to be slower, without
+ introducing incompatibility with existing passwords.
+ </para>
+ </listitem>
+ </orderedlist>
+
+ <para>
+ Supported algorithms:
+ </para>
+ <programlisting>
+`------`-------------`---------`----------`---------------------------
+ Type Max password Adaptive Salt bits Description
+----------------------------------------------------------------------
+`bf` 72 yes 128 Blowfish-based, variant 2a
+`md5` unlimited no 48 md5-based crypt()
+`xdes` 8 yes 24 Extended DES
+`des` 8 no 12 Original UNIX crypt
+----------------------------------------------------------------------
+ </programlisting>
+
+ <sect3>
+ <title>crypt(password, salt)</title>
+ <programlisting>
+ crypt(password text, salt text) RETURNS text
+ </programlisting>
+ <para>
+ Calculates UN*X crypt(3) style hash of password. When storing new
+ password, you need to use function `gen_salt()` to generate new salt.
+ When checking password you should use existing hash as salt.
+ </para>
+ <para>
+ Example - setting new password:
+ </para>
+ <programlisting>
+ UPDATE .. SET pswhash = crypt('new password', gen_salt('md5'));
+ </programlisting>
+ <para>
+ Example - authentication:
+ </para>
+ <programlisting>
+ SELECT pswhash = crypt('entered password', pswhash) WHERE .. ;
+ </programlisting>
+ <para>
+ returns true or false whether the entered password is correct.
+ It also can return NULL if `pswhash` field is NULL.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>gen_salt(type)</title>
+ <programlisting>
+ gen_salt(type text) RETURNS text
+ </programlisting>
+ <para>
+ Generates a new random salt for usage in `crypt()`. For adaptible
+ algorithms, it uses the default iteration count.
+ </para>
+ <para>
+ Accepted types are: `des`, `xdes`, `md5` and `bf`.
+ </para>
+ </sect3>
+ <sect3>
+ <title>gen_salt(type, rounds)</title>
+ <programlisting>
+ gen_salt(type text, rounds integer) RETURNS text
+ </programlisting>
+ <para>
+ algorithms. The higher the count, the more time it takes to hash
+ the password and therefore the more time to break it. Although with
+ too high count the time to calculate a hash may be several years
+ - which is somewhat impractical.
+ </para>
+ <para>
+ Number is algorithm specific:
+ </para>
+ <programlisting>
+`-----'---------'-----'----------
+ type default min max
+---------------------------------
+ `xdes` 725 1 16777215
+ `bf` 6 4 31
+---------------------------------
+ </programlisting>
+ <para>
+ In case of xdes there is a additional limitation that the count must be
+ a odd number.
+ </para>
+ <para>
+ Notes:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Original DES crypt was designed to have the speed of 4 hashes per
+ second on the hardware of that time.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Slower than 4 hashes per second would probably dampen usability.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Faster than 100 hashes per second is probably too fast.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ See next section about possible values for `crypt-bf`.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect3>
+
+ <sect3>
+ <title>Comparison of crypt and regular hashes</title>
+ <para>
+ Here is a table that should give overview of relative slowness
+ of different hashing algorithms.
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ The goal is to crack a 8-character password, which consists:
+ </para>
+ <orderedlist>
+ <listitem><para>Only of lowercase letters</para></listitem>
+ <listitem><para>Numbers, lower- and uppercase letters.</para></listitem>
+ </orderedlist>
+ </listitem>
+ <listitem>
+ <para>
+ The table below shows how much time it would take to try all
+ combinations of characters.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <literal>crypt-bf</literal> is featured in several settings - the number
+ after slash is the <literal>rounds</literal> parameter of
+ <literal>gen_salt()</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <programlisting>
+`------------'----------'--------------'--------------------
+Algorithm Hashes/sec Chars: [a-z] Chars: [A-Za-z0-9]
+------------------------------------------------------------
+crypt-bf/8 28 246 years 251322 years
+crypt-bf/7 57 121 years 123457 years
+crypt-bf/6 112 62 years 62831 years
+crypt-bf/5 211 33 years 33351 years
+crypt-md5 2681 2.6 years 2625 years
+crypt-des 362837 7 days 19 years
+sha1 590223 4 days 12 years
+md5 2345086 1 day 3 years
+------------------------------------------------------------
+ </programlisting>
+ <itemizedlist>
+ <listitem>
+ <para>
+ The machine used is 1.5GHz Pentium 4.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ crypt-des and crypt-md5 algorithm numbers are taken from
+ John the Ripper v1.6.38 `-test` output.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ MD5 numbers are from mdcrack 1.2.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SHA1 numbers are from lcrack-20031130-beta.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>crypt-bf</literal> numbers are taken using simple program that loops
+ over 1000 8-character passwords. That way I can show the speed with
+ different number of rounds. For reference: <literal>john -test</literal>
+ shows 213 loops/sec for crypt-bf/5. (The small difference in results is
+ in accordance to the fact that the <literal>crypt-bf</literal> implementation in pgcrypto
+ is same one that is used in John the Ripper.)
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>
+ Note that "try all combinations" is not a realistic exercise.
+ Usually password cracking is done with the help of dictionaries, which
+ contain both regular words and various mutations of them. So, even
+ somewhat word-like passwords could be cracked much faster than the above
+ numbers suggest, and a 6-character non-word like password may escape
+ cracking. Or not.
+ </para>
+ </sect3>
+ </sect2>
+
+
+ <sect2>
+ <title>PGP encryption</title>
+ <para>
+ The functions here implement the encryption part of OpenPGP (RFC2440)
+ standard. Supported are both symmetric-key and public-key encryption.
+ </para>
+
+ <sect3>
+ <title>Overview</title>
+ <para>
+ Encrypted PGP message consists of 2 packets:
+ </para>
+ <itemizedlist>
+ <listitem><para>Packet for session key - either symmetric- or public-key encrypted.</para></listitem>
+ <listitem><para>Packet for session-key encrypted data.</para></listitem>
+ </itemizedlist>
+ <para>
+ When encrypting with password:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ Given password is hashed using String2Key (S2K) algorithm. This
+ is rather similar to `crypt()` algorithm - purposefully slow
+ and with random salt - but it produces a full-length binary key.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If separate session key is requested, new random key will be
+ generated. Otherwise S2K key will be used directly as session key.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If S2K key is to be used directly, then only S2K settings will be put
+ into session key packet. Otherwise session key will be encrypted with
+ S2K key and put into session key packet.
+ </para>
+ </listitem>
+ </orderedlist>
+ <para>
+ When encrypting with public key:
+ </para>
+ <orderedlist>
+ <listitem><para>New random session key is generated.</para></listitem>
+ <listitem><para>It is encrypted using public key and put into session key packet.</para></listitem>
+ </orderedlist>
+
+ <para>
+ Now common part, the session-key encrypted data packet:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ Optional data-manipulation: compression, conversion to UTF-8,
+ conversion of line-endings.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Data is prefixed with block of random bytes. This is equal
+ to using random IV.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ A SHA1 hash of random prefix and data is appended.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ All this is encrypted with session key.
+ </para>
+ </listitem>
+ </orderedlist>
+ </sect3>
+
+ <sect3>
+ <title><literal>pgp_sym_encrypt(data, psw)</literal></title>
+ <programlisting>
+ pgp_sym_encrypt(data text, psw text [, options text] ) RETURNS bytea
+ pgp_sym_encrypt_bytea(data bytea, psw text [, options text] ) RETURNS bytea
+ </programlisting>
+ <para>
+ Return a symmetric-key encrypted PGP message.
+ </para>
+ <para>
+ Options are described in section 5.8.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title><literal>pgp_sym_decrypt(msg, psw)</literal></title>
+ <programlisting>
+ pgp_sym_decrypt(msg bytea, psw text [, options text] ) RETURNS text
+ pgp_sym_decrypt_bytea(msg bytea, psw text [, options text] ) RETURNS bytea
+ </programlisting>
+ <para>
+ Decrypt a symmetric-key encrypted PGP message.
+ </para>
+ <para>
+ Decrypting bytea data with `pgp_sym_decrypt` is disallowed.
+ This is to avoid outputting invalid character data. Decrypting
+ originally textual data with `pgp_sym_decrypt_bytea` is fine.
+ </para>
+ <para>
+ Options are described in section 5.8.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title><literal>pgp_pub_encrypt(data, pub_key)</literal></title>
+ <programlisting>
+ pgp_pub_encrypt(data text, key bytea [, options text] ) RETURNS bytea
+ pgp_pub_encrypt_bytea(data bytea, key bytea [, options text] ) RETURNS bytea
+ </programlisting>
+ <para>
+ Encrypt data with a public key. Giving this function a secret key will
+ produce a error.
+ </para>
+ <para>
+ Options are described in section 5.8.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title><literal>pgp_pub_decrypt(msg, sec_key [, psw])</literal></title>
+ <programlisting>
+ pgp_pub_decrypt(msg bytea, key bytea [, psw text [, options text]] ) RETURNS text
+ pgp_pub_decrypt_bytea(msg bytea, key bytea [,psw text [, options text]] ) RETURNS bytea
+ </programlisting>
+ <para>
+ Decrypt a public-key encrypted message with secret key. If the secret
+ key is password-protected, you must give the password in `psw`. If
+ there is no password, but you want to specify option for function, you
+ need to give empty password.
+ </para>
+ <para>
+ Decrypting bytea data with `pgp_pub_decrypt` is disallowed.
+ This is to avoid outputting invalid character data. Decrypting
+ originally textual data with `pgp_pub_decrypt_bytea` is fine.
+ </para>
+ <para>
+ Options are described in section 5.8.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title><literal>pgp_key_id(key / msg)</literal></title>
+ <programlisting>
+ pgp_key_id(key or msg bytea) RETURNS text
+ </programlisting>
+ <para>
+ It shows you either key ID if given PGP public or secret key. Or it
+ gives the key ID that was used for encrypting the data, if given
+ encrypted message.
+ </para>
+ <para>
+ It can return 2 special key IDs:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ SYMKEY:
+ </para>
+ <para>
+ The data is encrypted with symmetric key.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ ANYKEY:
+ </para>
+ <para>
+ The data is public-key encrypted, but the key ID is cleared.
+ That means you need to try all your secret keys on it to see
+ which one decrypts it. pgcrypto itself does not produce such
+ messages.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ Note that different keys may have same ID. This is rare but normal
+ event. Client application should then try to decrypt with each one,
+ to see which fits - like handling ANYKEY.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title><literal>armor / dearmor</literal></title>
+ <programlisting>
+ armor(data bytea) RETURNS text
+ dearmor(data text) RETURNS bytea
+ </programlisting>
+ <para>
+ Those wrap/unwrap data into PGP Ascii Armor which is basically Base64
+ with CRC and additional formatting.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>Options for PGP functions</title>
+ <para>
+ Options are named to be similar to GnuPG. Values should be given after
+ an equal sign; separate options from each other with commas. Example:
+ </para>
+ <programlisting>
+ pgp_sym_encrypt(data, psw, 'compress-algo=1, cipher-algo=aes256')
+ </programlisting>
+ <para>
+ All of the options except `convert-crlf` apply only to encrypt
+ functions. Decrypt functions get the parameters from PGP data.
+ </para>
+ <para>
+ Most interesting options are probably `compression-algo` and
+ <literal>unicode-mode</literal>. The rest should have reasonable defaults.
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>cipher-algo</title>
+ <para>
+ What cipher algorithm to use.
+ </para>
+ <programlisting>
+ Values: bf, aes128, aes192, aes256 (OpenSSL-only: `3des`, `cast5`)
+ Default: aes128
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>compress-algo</title>
+ <para>
+ Which compression algorithm to use. Needs building with zlib.
+ </para>
+ <para>
+ Values:
+ </para>
+ <programlisting>
+ 0 - no compression
+ 1 - ZIP compression
+ 2 - ZLIB compression [=ZIP plus meta-data and block-CRC's]
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>compress-level</title>
+ <para>
+ How much to compress. Bigger level compresses smaller but is slower.
+ 0 disables compression.
+ </para>
+ <programlisting>
+ Values: 0, 1-9
+ Default: 6
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>convert-crlf</title>
+ <para>
+ Whether to convert `\n` into `\r\n` when encrypting and `\r\n` to `\n`
+ when decrypting. RFC2440 specifies that text data should be stored
+ using `\r\n` line-feeds. Use this to get fully RFC-compliant
+ behavior.
+ </para>
+ <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt, pgp_sym_decrypt, pgp_pub_decrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>disable-mdc</title>
+ <para>
+ Do not protect data with SHA-1. Only good reason to use this
+ option is to achieve compatibility with ancient PGP products, as the
+ SHA-1 protected packet is from upcoming update to RFC2440. (Currently
+ at version RFC2440bis-14.) Recent gnupg.org and pgp.com software
+ supports it fine.
+ </para>
+ <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>enable-session-key</title>
+ <para>
+ Use separate session key. Public-key encryption always uses separate
+ session key, this is for symmetric-key encryption, which by default
+ uses S2K directly.
+ </para>
+ <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>s2k-mode</title>
+ <para>
+ Which S2K algorithm to use.
+ </para>
+ <programlisting>
+ Values:
+ 0 - Without salt. Dangerous!
+ 1 - With salt but with fixed iteration count.
+ 3 - Variable iteration count.
+ Default: 3
+ Applies: pgp_sym_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>s2k-digest-algo</title>
+ <para>
+ Which digest algorithm to use in S2K calculation.
+ </para>
+ <programlisting>
+ Values: md5, sha1
+ Default: sha1
+ Applies: pgp_sym_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>s2k-cipher-algo</title>
+ <para>
+ Which cipher to use for encrypting separate session key.
+ </para>
+ <programlisting>
+ Values: bf, aes, aes128, aes192, aes256
+ Default: use cipher-algo.
+ Applies: pgp_sym_encrypt
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title>unicode-mode</title>
+ <para>
+ Whether to convert textual data from database internal encoding to
+ UTF-8 and back. If your database already is UTF-8, no conversion will
+ be done, only the data will be tagged as UTF-8. Without this option
+ it will not be.
+ </para>
+ <programlisting>
+ Values: 0, 1
+ Default: 0
+ Applies: pgp_sym_encrypt, pgp_pub_encrypt
+ </programlisting>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>Generating keys with GnuPG</title>
+ <para>
+ Generate a new key:
+ </para>
+ <programlisting>
+ gpg --gen-key
+ </programlisting>
+ <para>
+ The preferred key type is "DSA and Elgamal".
+ </para>
+ <para>
+ For RSA encryption you must create either DSA or RSA sign-only key
+ as master and then add RSA encryption subkey with `gpg --edit-key`.
+ </para>
+ <para>
+ List keys:
+ </para>
+ <programlisting>
+ gpg --list-secret-keys
+ </programlisting>
+ <para>
+ Export ascii-armored public key:
+ </para>
+ <programlisting>
+ gpg -a --export KEYID > public.key
+ </programlisting>
+ <para>
+ Export ascii-armored secret key:
+ </para>
+ <programlisting>
+ gpg -a --export-secret-keys KEYID > secret.key
+ </programlisting>
+ <para>
+ You need to use `dearmor()` on them before giving them to
+ pgp_pub_* functions. Or if you can handle binary data, you can drop
+ "-a" from gpg.
+ </para>
+ <para>
+ For more details see `man gpg`,
+ <ulink url="http://www.gnupg.org/gph/en/manual.html"></ulink>[The GNU
+ Privacy Handbook] and other docs on
+ <ulink url="http://www.gnupg.org"></ulink> site.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Limitations of PGP code</title>
+ <itemizedlist>
+ <listitem>
+ <para>
+ No support for signing. That also means that it is not checked
+ whether the encryption subkey belongs to master key.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ No support for encryption key as master key. As such practice
+ is generally discouraged, it should not be a problem.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ No support for several subkeys. This may seem like a problem, as this
+ is common practice. On the other hand, you should not use your regular
+ GPG/PGP keys with pgcrypto, but create new ones, as the usage scenario
+ is rather different.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Raw encryption</title>
+ <para>
+ Those functions only run a cipher over data, they don't have any advanced
+ features of PGP encryption. Therefore they have some major problems:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ They use user key directly as cipher key.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ They don't provide any integrity checking, to see
+ if the encrypted data was modified.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ They expect that users manage all encryption parameters
+ themselves, even IV.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ They don't handle text.
+ </para>
+ </listitem>
+ </orderedlist>
+ <para>
+ So, with the introduction of PGP encryption, usage of raw
+ encryption functions is discouraged.
+ </para>
+ <programlisting>
+ encrypt(data bytea, key bytea, type text) RETURNS bytea
+ decrypt(data bytea, key bytea, type text) RETURNS bytea
+
+ encrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea
+ decrypt_iv(data bytea, key bytea, iv bytea, type text) RETURNS bytea
+ </programlisting>
+ <para>
+ Encrypt/decrypt data with cipher, padding data if needed.
+ </para>
+ <para>
+ <literal>type</literal> parameter description in pseudo-noteup:
+ </para>
+ <programlisting>
+ algo ['-' mode] ['/pad:' padding]
+ </programlisting>
+ <para>
+ Supported algorithms:
+ </para>
+ <itemizedlist>
+ <listitem><para><literal>bf</literal>- Blowfish</para></listitem>
+ <listitem><para><literal>aes</literal>- AES (Rijndael-128)</para></listitem>
+ </itemizedlist>
+ <para>
+ Modes:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>cbc</literal>- next block depends on previous. (default)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>ecb</literal>- each block is encrypted separately. (for testing
+ only)
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ Padding:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>pkcs</literal>-data may be any length (default)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>none</literal>- data must be multiple of cipher block size.
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ IV is initial value for mode, defaults to all zeroes. It is ignored for
+ ECB. It is clipped or padded with zeroes if not exactly block size.
+ </para>
+ <para>
+ So, example:
+ </para>
+ <programlisting>
+ encrypt(data, 'fooz', 'bf')
+ </programlisting>
+ <para>
+ is equal to
+ </para>
+ <programlisting>
+ encrypt(data, 'fooz', 'bf-cbc/pad:pkcs')
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Random bytes</title>
+ <programlisting>
+ gen_random_bytes(count integer)
+ </programlisting>
+ <para>
+ Returns `count` cryptographically strong random bytes as bytea value.
+ There can be maximally 1024 bytes extracted at a time. This is to avoid
+ draining the randomness generator pool.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>References/Links</title>
+
+ <sect3>
+ <title>Useful reading</title>
+ <itemizedlist>
+ <listitem>
+ <para><ulink url="http://www.gnupg.org/gph/en/manual.html"></ulink>:</para>
+ <para>The GNU Privacy Handbook</para>
+ </listitem>
+ <listitem>
+ <para><ulink url="http://www.openwall.com/crypt/"></ulink>:</para>
+ <para>Describes the crypt-blowfish algorithm.</para>
+ </listitem>
+ <listitem>
+ <para>
+ <ulink url="http://www.stack.nl/~galactus/remailers/passphrase-faq.html"></ulink>:
+ </para>
+ <para>How to choose good password.</para>
+ </listitem>
+ <listitem>
+ <para><ulink url="http://world.std.com/~reinhold/diceware.html"></ulink>:</para>
+ <para>Interesting idea for picking passwords.</para>
+ </listitem>
+ <listitem>
+ <para>
+ <ulink url="http://www.interhack.net/people/cmcurtin/snake-oil-faq.html"></ulink>:
+ </para>
+ <para>Describes good and bad cryptography.</para>
+ </listitem>
+ </itemizedlist>
+ </sect3>
+
+ <sect3>
+ <title>Technical references</title>
+ <itemizedlist>
+ <listitem>
+ <para><ulink url="http://www.ietf.org/rfc/rfc2440.txt"></ulink>:</para>
+ <para>OpenPGP message format</para>
+ </listitem>
+ <listitem>
+ <para>
+ <ulink url="http://www.imc.org/draft-ietf-openpgp-rfc2440bis"></ulink>:
+ </para>
+ <para>New version of RFC2440.</para>
+ </listitem>
+ <listitem>
+ <para><ulink url="http://www.ietf.org/rfc/rfc1321.txt"></ulink>:</para>
+ <para>The MD5 Message-Digest Algorithm</para>
+ </listitem>
+ <listitem>
+ <para><ulink url="http://www.ietf.org/rfc/rfc2104.txt"></ulink>:</para>
+ <para>HMAC: Keyed-Hashing for Message Authentication</para>
+ </listitem>
+ <listitem>
+ <para>
+ <ulink url="http://www.usenix.org/events/usenix99/provos.html"></ulink>:
+ </para>
+ <para>Comparison of crypt-des, crypt-md5 and bcrypt algorithms.</para>
+ </listitem>
+ <listitem>
+ <para><ulink url="http://csrc.nist.gov/cryptval/des.htm"></ulink>:</para>
+ <para>Standards for DES, 3DES and AES.</para>
+ </listitem>
+ <listitem>
+ <para>
+ <ulink url="http://en.wikipedia.org/wiki/Fortuna_(PRNG)"></ulink>:
+ </para>
+ <para>Description of Fortuna CSPRNG.</para>
+ </listitem>
+ <listitem>
+ <para><ulink url="http://jlcooke.ca/random/"></ulink>:</para>
+ <para>Jean-Luc Cooke Fortuna-based /dev/random driver for Linux.</para>
+ </listitem>
+ <listitem>
+ <para><ulink url="http://www.cs.ut.ee/~helger/crypto/"></ulink>:</para>
+ <para>Collection of cryptology pointers.</para>
+ </listitem>
+ </itemizedlist>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>Credits</title>
+ <para>
+ <literal>pgcrypto</literal> uses code from the following sources:
+ </para>
+ <table>
+ <title>Credits</title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry><para>Algorithm</para></entry>
+ <entry><para>Author</para></entry>
+ <entry><para>Source origin</para></entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><para>DES crypt()</para></entry>
+ <entry><para>David Burren and others</para></entry>
+ <entry><para>FreeBSD libcrypt</para></entry>
+ </row>
+ <row>
+ <entry><para>MD5 crypt()</para></entry>
+ <entry><para>Poul-Henning Kamp</para></entry>
+ <entry><para>FreeBSD libcrypt</para></entry>
+ </row>
+ <row>
+ <entry><para>Blowfish crypt()</para></entry>
+ <entry><para>Solar Designer</para></entry>
+ <entry><para>www.openwall.com</para></entry>
+ </row>
+ <row>
+ <entry><para>Blowfish cipher</para></entry>
+ <entry><para>Simon Tatham</para></entry>
+ <entry><para>PuTTY</para></entry>
+ </row>
+ <row>
+ <entry><para>Rijndael cipher</para></entry>
+ <entry><para>Brian Gladman</para></entry>
+ <entry><para>OpenBSD sys/crypto</para></entry>
+ </row>
+ <row>
+ <entry><para>MD5 and SHA1</para></entry>
+ <entry><para>WIDE Project</para></entry>
+ <entry><para>KAME kame/sys/crypto</para></entry>
+ </row>
+ <row>
+ <entry><para>SHA256/384/512 </para></entry>
+ <entry><para>Aaron D. Gifford</para></entry>
+ <entry><para>OpenBSD sys/crypto</para></entry>
+ </row>
+ <row>
+ <entry><para>BIGNUM math</para></entry>
+ <entry><para>Michael J. Fromberger</para></entry>
+ <entry><para>dartmouth.edu/~sting/sw/imath</para></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Marko Kreen <email>markokr@gmail.com</email>
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/pgrowlocks.sgml b/doc/src/sgml/pgrowlocks.sgml
new file mode 100644
index 00000000000..f7b1e479a06
--- /dev/null
+++ b/doc/src/sgml/pgrowlocks.sgml
@@ -0,0 +1,123 @@
+
+<sect1 id="pgrowlocks">
+ <title>pgrowlocks</title>
+
+ <indexterm zone="pgrowlocks">
+ <primary>pgrowlocks</primary>
+ </indexterm>
+
+ <para>
+ The <literal>pgrowlocks</literal> module provides a function to show row
+ locking information for a specified table.
+ </para>
+
+ <sect2>
+ <title>Overview</title>
+ <programlisting>
+pgrowlocks(text) RETURNS pgrowlocks_type
+ </programlisting>
+ <para>
+ The parameter is a name of table. And <literal>pgrowlocks_type</literal> is
+ defined as:
+ </para>
+ <programlisting>
+CREATE TYPE pgrowlocks_type AS (
+ locked_row TID, -- row TID
+ lock_type TEXT, -- lock type
+ locker XID, -- locking XID
+ multi bool, -- multi XID?
+ xids xid[], -- multi XIDs
+ pids INTEGER[] -- locker's process id
+);
+ </programlisting>
+
+ <table>
+ <title>pgrowlocks_type</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>locked_row</entry>
+ <entry>tuple ID(TID) of each locked rows</entry>
+ </row>
+ <row>
+ <entry>lock_type</entry>
+ <entry>"Shared" for shared lock, "Exclusive" for exclusive lock</entry>
+ </row>
+ <row>
+ <entry>locker</entry>
+ <entry>transaction ID of locker (Note 1)</entry>
+ </row>
+ <row>
+ <entry>multi</entry>
+ <entry>"t" if locker is a multi transaction, otherwise "f"</entry>
+ </row>
+ <row>
+ <entry>xids</entry>
+ <entry>XIDs of lockers (Note 2)</entry>
+ </row>
+ <row>
+ <entry>pids</entry>
+ <entry>process ids of locking backends</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ Note1: If the locker is multi transaction, it represents the multi ID.
+ </para>
+ <para>
+ Note2: If the locker is multi, multiple data are shown.
+ </para>
+
+ <para>
+ The calling sequence for <literal>pgrowlocks</literal> is as follows:
+ <literal>pgrowlocks</literal> grabs AccessShareLock for the target table and
+ reads each row one by one to get the row locking information. You should
+ notice that:
+ </para>
+ <orderedlist>
+ <listitem>
+ <para>
+ if the table is exclusive locked by someone else,
+ <literal>pgrowlocks</literal> will be blocked.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <literal>pgrowlocks</literal> may show incorrect information if there's a
+ new lock or a lock is freeed while its execution.
+ </para>
+ </listitem>
+ </orderedlist>
+ <para>
+ <literal>pgrowlocks</literal> does not show the contents of locked rows. If
+ you want to take a look at the row contents at the same time, you could do
+ something like this:
+ </para>
+ <programlisting>
+SELECT * FROM accounts AS a, pgrowlocks('accounts') AS p WHERE p.locked_ row = a.ctid;
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Example</title>
+ <para>
+ <literal>pgrowlocks</literal> returns the following data type:
+ </para>
+ <para>
+ Here is a sample execution of pgrowlocks:
+ </para>
+ <programlisting>
+test=# SELECT * FROM pgrowlocks('t1');
+ locked_row | lock_type | locker | multi | xids | pids
+------------+-----------+--------+-------+-----------+---------------
+ (0,1) | Shared | 19 | t | {804,805} | {29066,29068}
+ (0,2) | Shared | 19 | t | {804,805} | {29066,29068}
+ (0,3) | Exclusive | 804 | f | {804} | {29066}
+ (0,4) | Exclusive | 804 | f | {804} | {29066}
+(4 rows)
+ </programlisting>
+
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/pgstattuple.sgml b/doc/src/sgml/pgstattuple.sgml
new file mode 100644
index 00000000000..e8fa71602cf
--- /dev/null
+++ b/doc/src/sgml/pgstattuple.sgml
@@ -0,0 +1,158 @@
+
+<sect1 id="pgstattuple">
+ <title>pgstattuple</title>
+
+ <indexterm zone="pgstattuple">
+ <primary>pgstattuple</primary>
+ </indexterm>
+
+ <para>
+ <literal>pgstattuple</literal> modules provides various functions to obtain
+ tuple statistics.
+ </para>
+
+ <sect2>
+ <title>Functions</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <literal>pgstattuple()</literal> returns the relation length, percentage
+ of the "dead" tuples of a relation and other info. This may help users to
+ determine whether vacuum is necessary or not. Here is an example session:
+ </para>
+ <programlisting>
+test=> \x
+Expanded display is on.
+test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
+-[ RECORD 1 ]------+-------
+table_len | 458752
+tuple_count | 1470
+tuple_len | 438896
+tuple_percent | 95.67
+dead_tuple_count | 11
+dead_tuple_len | 3157
+dead_tuple_percent | 0.69
+free_space | 8932
+free_percent | 1.95
+ </programlisting>
+ <para>
+ Here are explanations for each column:
+ </para>
+
+ <table>
+ <title><literal>pgstattuple()</literal> column descriptions</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Column</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>table_len</entry>
+ <entry>physical relation length in bytes</entry>
+ </row>
+ <row>
+ <entry>tuple_count</entry>
+ <entry>number of live tuples</entry>
+ </row>
+ <row>
+ <entry>tuple_len</entry>
+ <entry>total tuples length in bytes</entry>
+ </row>
+ <row>
+ <entry>tuple_percent</entry>
+ <entry>live tuples in %</entry>
+ </row>
+ <row>
+ <entry>dead_tuple_len</entry>
+ <entry>total dead tuples length in bytes</entry>
+ </row>
+ <row>
+ <entry>dead_tuple_percent</entry>
+ <entry>dead tuples in %</entry>
+ </row>
+ <row>
+ <entry>free_space</entry>
+ <entry>free space in bytes</entry>
+ </row>
+ <row>
+ <entry>free_percent</entry>
+ <entry>free space in %</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ <note>
+ <para>
+ <literal>pgstattuple</literal> acquires only a read lock on the relation. So
+ concurrent update may affect the result.
+ </para>
+ </note>
+ <note>
+ <para>
+ <literal>pgstattuple</literal> judges a tuple is "dead" if HeapTupleSatisfiesNow()
+ returns false.
+ </para>
+ </note>
+ </para>
+ </listitem>
+
+
+ <listitem>
+ <para>
+ <literal>pg_relpages()</literal> returns the number of pages in the relation.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ <literal>pgstatindex()</literal> returns an array showing the information about an index:
+ </para>
+ <programlisting>
+test=> \x
+Expanded display is on.
+test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
+-[ RECORD 1 ]------+------
+version | 2
+tree_level | 0
+index_size | 8192
+root_block_no | 1
+internal_pages | 0
+leaf_pages | 1
+empty_pages | 0
+deleted_pages | 0
+avg_leaf_density | 50.27
+leaf_fragmentation | 0
+ </programlisting>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Usage</title>
+ <para>
+ <literal>pgstattuple</literal> may be called as a relation function and is
+ defined as follows:
+ </para>
+ <programlisting>
+ CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
+ AS 'MODULE_PATHNAME', 'pgstattuple'
+ LANGUAGE C STRICT;
+
+ CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
+ AS 'MODULE_PATHNAME', 'pgstattuplebyid'
+ LANGUAGE C STRICT;
+ </programlisting>
+ <para>
+ The argument is the relation name (optionally it may be qualified)
+ or the OID of the relation. Note that pgstattuple only returns
+ one row.
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index 35fb92d5739..2cc4d573130 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.83 2007/11/01 17:00:18 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.84 2007/11/10 23:30:46 momjian Exp $ -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
@@ -102,6 +102,7 @@
&typeconv;
&indices;
&textsearch;
+ &contrib;
&mvcc;
&perform;
diff --git a/doc/src/sgml/seg.sgml b/doc/src/sgml/seg.sgml
new file mode 100644
index 00000000000..c60f3425def
--- /dev/null
+++ b/doc/src/sgml/seg.sgml
@@ -0,0 +1,450 @@
+
+<sect1 id="seg">
+ <title>seg</title>
+
+ <indexterm zone="seg">
+ <primary>seg</primary>
+ </indexterm>
+
+ <para>
+ The <literal>seg</literal> module contains the code for the user-defined
+ type, <literal>SEG</literal>, representing laboratory measurements as
+ floating point intervals.
+ </para>
+
+ <sect2>
+ <title>Rationale</title>
+ <para>
+ The geometry of measurements is usually more complex than that of a
+ point in a numeric continuum. A measurement is usually a segment of
+ that continuum with somewhat fuzzy limits. The measurements come out
+ as intervals because of uncertainty and randomness, as well as because
+ the value being measured may naturally be an interval indicating some
+ condition, such as the temperature range of stability of a protein.
+ </para>
+ <para>
+ Using just common sense, it appears more convenient to store such data
+ as intervals, rather than pairs of numbers. In practice, it even turns
+ out more efficient in most applications.
+ </para>
+ <para>
+ Further along the line of common sense, the fuzziness of the limits
+ suggests that the use of traditional numeric data types leads to a
+ certain loss of information. Consider this: your instrument reads
+ 6.50, and you input this reading into the database. What do you get
+ when you fetch it? Watch:
+ </para>
+ <programlisting>
+test=> select 6.50 as "pH";
+ pH
+---
+6.5
+(1 row)
+ </programlisting>
+ <para>
+ In the world of measurements, 6.50 is not the same as 6.5. It may
+ sometimes be critically different. The experimenters usually write
+ down (and publish) the digits they trust. 6.50 is actually a fuzzy
+ interval contained within a bigger and even fuzzier interval, 6.5,
+ with their center points being (probably) the only common feature they
+ share. We definitely do not want such different data items to appear the
+ same.
+ </para>
+ <para>
+ Conclusion? It is nice to have a special data type that can record the
+ limits of an interval with arbitrarily variable precision. Variable in
+ a sense that each data element records its own precision.
+ </para>
+ <para>
+ Check this out:
+ </para>
+ <programlisting>
+test=> select '6.25 .. 6.50'::seg as "pH";
+ pH
+------------
+6.25 .. 6.50
+(1 row)
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Syntax</title>
+ <para>
+ The external representation of an interval is formed using one or two
+ floating point numbers joined by the range operator ('..' or '...').
+ Optional certainty indicators (<, > and ~) are ignored by the internal
+ logics, but are retained in the data.
+ </para>
+
+ <table>
+ <title>Rules</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>rule 1</entry>
+ <entry>seg -> boundary PLUMIN deviation</entry>
+ </row>
+ <row>
+ <entry>rule 2</entry>
+ <entry>seg -> boundary RANGE boundary</entry>
+ </row>
+ <row>
+ <entry>rule 3</entry>
+ <entry>seg -> boundary RANGE</entry>
+ </row>
+ <row>
+ <entry>rule 4</entry>
+ <entry>seg -> RANGE boundary</entry>
+ </row>
+ <row>
+ <entry>rule 5</entry>
+ <entry>seg -> boundary</entry>
+ </row>
+ <row>
+ <entry>rule 6</entry>
+ <entry>boundary -> FLOAT</entry>
+ </row>
+ <row>
+ <entry>rule 7</entry>
+ <entry>boundary -> EXTENSION FLOAT</entry>
+ </row>
+ <row>
+ <entry>rule 8</entry>
+ <entry>deviation -> FLOAT</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table>
+ <title>Tokens</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>RANGE</entry>
+ <entry>(\.\.)(\.)?</entry>
+ </row>
+ <row>
+ <entry>PLUMIN</entry>
+ <entry>\'\+\-\'</entry>
+ </row>
+ <row>
+ <entry>integer</entry>
+ <entry>[+-]?[0-9]+</entry>
+ </row>
+ <row>
+ <entry>real</entry>
+ <entry>[+-]?[0-9]+\.[0-9]+</entry>
+ </row>
+ <row>
+ <entry>FLOAT</entry>
+ <entry>({integer}|{real})([eE]{integer})?</entry>
+ </row>
+ <row>
+ <entry>EXTENSION</entry>
+ <entry>[<>~]</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table>
+ <title>Examples of valid <literal>SEG</literal> representations</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>Any number</entry>
+ <entry>
+ (rules 5,6) -- creates a zero-length segment (a point,
+ if you will)
+ </entry>
+ </row>
+ <row>
+ <entry>~5.0</entry>
+ <entry>
+ (rules 5,7) -- creates a zero-length segment AND records
+ '~' in the data. This notation reads 'approximately 5.0',
+ but its meaning is not recognized by the code. It is ignored
+ until you get the value back. View it is a short-hand comment.
+ </entry>
+ </row>
+ <row>
+ <entry><5.0</entry>
+ <entry>
+ (rules 5,7) -- creates a point at 5.0; '<' is ignored but
+ is preserved as a comment
+ </entry>
+ </row>
+ <row>
+ <entry>>5.0</entry>
+ <entry>
+ (rules 5,7) -- creates a point at 5.0; '>' is ignored but
+ is preserved as a comment
+ </entry>
+ </row>
+ <row>
+ <entry><para>5(+-)0.3</para><para>5'+-'0.3</para></entry>
+ <entry>
+ <para>
+ (rules 1,8) -- creates an interval '4.7..5.3'. As of this
+ writing (02/09/2000), this mechanism isn't completely accurate
+ in determining the number of significant digits for the
+ boundaries. For example, it adds an extra digit to the lower
+ boundary if the resulting interval includes a power of ten:
+ </para>
+ <programlisting>
+postgres=> select '10(+-)1'::seg as seg;
+ seg
+---------
+9.0 .. 11 -- should be: 9 .. 11
+ </programlisting>
+ <para>
+ Also, the (+-) notation is not preserved: 'a(+-)b' will
+ always be returned as '(a-b) .. (a+b)'. The purpose of this
+ notation is to allow input from certain data sources without
+ conversion.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>50 .. </entry>
+ <entry>(rule 3) -- everything that is greater than or equal to 50</entry>
+ </row>
+ <row>
+ <entry>.. 0</entry>
+ <entry>(rule 4) -- everything that is less than or equal to 0</entry>
+ </row>
+ <row>
+ <entry>1.5e-2 .. 2E-2 </entry>
+ <entry>(rule 2) -- creates an interval (0.015 .. 0.02)</entry>
+ </row>
+ <row>
+ <entry>1 ... 2</entry>
+ <entry>
+ The same as 1...2, or 1 .. 2, or 1..2 (space is ignored).
+ Because of the widespread use of '...' in the data sources,
+ I decided to stick to is as a range operator. This, and
+ also the fact that the white space around the range operator
+ is ignored, creates a parsing conflict with numeric constants
+ starting with a decimal point.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table>
+ <title>Examples</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>.1e7</entry>
+ <entry>should be: 0.1e7</entry>
+ </row>
+ <row>
+ <entry>.1 .. .2</entry>
+ <entry>should be: 0.1 .. 0.2</entry>
+ </row>
+ <row>
+ <entry>2.4 E4</entry>
+ <entry>should be: 2.4E4</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ The following, although it is not a syntax error, is disallowed to improve
+ the sanity of the data:
+ </para>
+ <table>
+ <title></title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>5 .. 2</entry>
+ <entry>should be: 2 .. 5</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Precision</title>
+ <para>
+ The segments are stored internally as pairs of 32-bit floating point
+ numbers. It means that the numbers with more than 7 significant digits
+ will be truncated.
+ </para>
+ <para>
+ The numbers with less than or exactly 7 significant digits retain their
+ original precision. That is, if your query returns 0.00, you will be
+ sure that the trailing zeroes are not the artifacts of formatting: they
+ reflect the precision of the original data. The number of leading
+ zeroes does not affect precision: the value 0.0067 is considered to
+ have just 2 significant digits.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Usage</title>
+ <para>
+ The access method for SEG is a GiST index (gist_seg_ops), which is a
+ generalization of R-tree. GiSTs allow the postgres implementation of
+ R-tree, originally encoded to support 2-D geometric types such as
+ boxes and polygons, to be used with any data type whose data domain
+ can be partitioned using the concepts of containment, intersection and
+ equality. In other words, everything that can intersect or contain
+ its own kind can be indexed with a GiST. That includes, among other
+ things, all geometric data types, regardless of their dimensionality
+ (see also contrib/cube).
+ </para>
+ <para>
+ The operators supported by the GiST access method include:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <programlisting>
+[a, b] << [c, d] Is left of
+ </programlisting>
+ <para>
+ The left operand, [a, b], occurs entirely to the left of the
+ right operand, [c, d], on the axis (-inf, inf). It means,
+ [a, b] << [c, d] is true if b < c and false otherwise
+ </para>
+ </listitem>
+ <listitem>
+ <programlisting>
+[a, b] >> [c, d] Is right of
+ </programlisting>
+ <para>
+ [a, b] is occurs entirely to the right of [c, d].
+ [a, b] >> [c, d] is true if a > d and false otherwise
+ </para>
+ </listitem>
+ <listitem>
+ <programlisting>
+[a, b] &< [c, d] Overlaps or is left of
+ </programlisting>
+ <para>
+ This might be better read as "does not extend to right of".
+ It is true when b <= d.
+ </para>
+ </listitem>
+ <listitem>
+ <programlisting>
+[a, b] &> [c, d] Overlaps or is right of
+ </programlisting>
+ <para>
+ This might be better read as "does not extend to left of".
+ It is true when a >= c.
+ </para>
+ </listitem>
+ <listitem>
+ <programlisting>
+[a, b] = [c, d] Same as
+ </programlisting>
+ <para>
+ The segments [a, b] and [c, d] are identical, that is, a == b
+ and c == d
+ </para>
+ </listitem>
+ <listitem>
+ <programlisting>
+[a, b] && [c, d] Overlaps
+ </programlisting>
+ <para>
+ The segments [a, b] and [c, d] overlap.
+ </para>
+ </listitem>
+ <listitem>
+ <programlisting>
+[a, b] @> [c, d] Contains
+ </programlisting>
+ <para>
+ The segment [a, b] contains the segment [c, d], that is,
+ a <= c and b >= d
+ </para>
+ </listitem>
+ <listitem>
+ <programlisting>
+[a, b] <@ [c, d] Contained in
+ </programlisting>
+ <para>
+ The segment [a, b] is contained in [c, d], that is,
+ a >= c and b <= d
+ </para>
+ </listitem>
+ </itemizedlist>
+ <para>
+ (Before PostgreSQL 8.2, the containment operators @> and <@ were
+ respectively called @ and ~. These names are still available, but are
+ deprecated and will eventually be retired. Notice that the old names
+ are reversed from the convention formerly followed by the core geometric
+ datatypes!)
+ </para>
+ <para>
+ Although the mnemonics of the following operators is questionable, I
+ preserved them to maintain visual consistency with other geometric
+ data types defined in Postgres.
+ </para>
+ <para>
+ Other operators:
+ </para>
+
+ <programlisting>
+[a, b] < [c, d] Less than
+[a, b] > [c, d] Greater than
+ </programlisting>
+ <para>
+ These operators do not make a lot of sense for any practical
+ purpose but sorting. These operators first compare (a) to (c),
+ and if these are equal, compare (b) to (d). That accounts for
+ reasonably good sorting in most cases, which is useful if
+ you want to use ORDER BY with this type
+ </para>
+
+ <para>
+ There are a few other potentially useful functions defined in seg.c
+ that vanished from the schema because I stopped using them. Some of
+ these were meant to support type casting. Let me know if I was wrong:
+ I will then add them back to the schema. I would also appreciate
+ other ideas that would enhance the type and make it more useful.
+ </para>
+ <para>
+ For examples of usage, see sql/seg.sql
+ </para>
+ <para>
+ NOTE: The performance of an R-tree index can largely depend on the
+ order of input values. It may be very helpful to sort the input table
+ on the SEG column (see the script sort-segments.pl for an example)
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Credits</title>
+ <para>
+ My thanks are primarily to Prof. Joe Hellerstein
+ (<ulink url="http://db.cs.berkeley.edu/~jmh/"></ulink>) for elucidating the
+ gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am
+ also grateful to all postgres developers, present and past, for enabling
+ myself to create my own world and live undisturbed in it. And I would like
+ to acknowledge my gratitude to Argonne Lab and to the U.S. Department of
+ Energy for the years of faithful support of my database research.
+ </para>
+ <programlisting>
+ Gene Selkov, Jr.
+ Computational Scientist
+ Mathematics and Computer Science Division
+ Argonne National Laboratory
+ 9700 S Cass Ave.
+ Building 221
+ Argonne, IL 60439-4844
+ </programlisting>
+ <para>
+ <email>selkovjr@mcs.anl.gov</email>
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/sslinfo.sgml b/doc/src/sgml/sslinfo.sgml
new file mode 100644
index 00000000000..828fca2591e
--- /dev/null
+++ b/doc/src/sgml/sslinfo.sgml
@@ -0,0 +1,164 @@
+
+<sect1 id="sslinfo">
+ <title>sslinfo</title>
+
+ <indexterm zone="sslinfo">
+ <primary>sslinfo</primary>
+ </indexterm>
+
+ <para>
+ This modules provides information about current SSL certificate for PostgreSQL.
+ </para>
+
+ <sect2>
+ <title>Notes</title>
+ <para>
+ This extension won't build unless your PostgreSQL server is configured
+ with --with-openssl. Information provided with these functions would
+ be completely useless if you don't use SSL to connect to database.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Functions Description</title>
+
+ <itemizedlist>
+ <listitem>
+ <programlisting>
+ssl_is_used() RETURNS boolean;
+ </programlisting>
+ <para>
+ Returns TRUE, if current connection to server uses SSL and FALSE
+ otherwise.
+ </para>
+ </listitem>
+
+ <listitem>
+ <programlisting>
+ssl_client_cert_present() RETURNS boolean
+ </programlisting>
+ <para>
+ Returns TRUE if current client have presented valid SSL client
+ certificate to the server and FALSE otherwise (e.g., no SSL,
+ certificate hadn't be requested by server).
+ </para>
+ </listitem>
+
+ <listitem>
+ <programlisting>
+ssl_client_serial() RETURNS numeric
+ </programlisting>
+ <para>
+ Returns serial number of current client certificate. The combination
+ of certificate serial number and certificate issuer is guaranteed to
+ uniquely identify certificate (but not its owner -- the owner ought to
+ regularily change his keys, and get new certificates from the issuer).
+ </para>
+ <para>
+ So, if you run you own CA and allow only certificates from this CA to
+ be accepted by server, the serial number is the most reliable (albeit
+ not very mnemonic) means to indentify user.
+ </para>
+ </listitem>
+
+ <listitem>
+ <programlisting>
+ssl_client_dn() RETURNS text
+ </programlisting>
+ <para>
+ Returns the full subject of current client certificate, converting
+ character data into the current database encoding. It is assumed that
+ if you use non-Latin characters in the certificate names, your
+ database is able to represent these characters, too. If your database
+ uses the SQL_ASCII encoding, non-Latin characters in the name will be
+ represented as UTF-8 sequences.
+ </para>
+ <para>
+ The result looks like '/CN=Somebody /C=Some country/O=Some organization'.
+ </para>
+ </listitem>
+
+ <listitem>
+ <programlisting>
+ssl_issuer_dn()
+ </programlisting>
+ <para>
+ Returns the full issuer name of the client certificate, converting
+ character data into current database encoding.
+ </para>
+ <para>
+ The combination of the return value of this function with the
+ certificate serial number uniquely identifies the certificate.
+ </para>
+ <para>
+ The result of this function is really useful only if you have more
+ than one trusted CA certificate in your server's root.crt file, or if
+ this CA has issued some intermediate certificate authority
+ certificates.
+ </para>
+ </listitem>
+
+ <listitem>
+ <programlisting>
+ssl_client_dn_field(fieldName text) RETURNS text
+ </programlisting>
+ <para>
+ This function returns the value of the specified field in the
+ certificate subject. Field names are string constants that are
+ converted into ASN1 object identificators using the OpenSSL object
+ database. The following values are acceptable:
+ </para>
+ <programlisting>
+commonName (alias CN)
+surname (alias SN)
+name
+givenName (alias GN)
+countryName (alias C)
+localityName (alias L)
+stateOrProvinceName (alias ST)
+organizationName (alias O)
+organizationUnitName (alias OU)
+title
+description
+initials
+postalCode
+streetAddress
+generationQualifier
+description
+dnQualifier
+x500UniqueIdentifier
+pseudonim
+role
+emailAddress
+ </programlisting>
+ <para>
+ All of these fields are optional, except commonName. It depends
+ entirely on your CA policy which of them would be included and which
+ wouldn't. The meaning of these fields, howeer, is strictly defined by
+ the X.500 and X.509 standards, so you cannot just assign arbitrary
+ meaning to them.
+ </para>
+ </listitem>
+
+ <listitem>
+ <programlisting>
+ssl_issuer_field(fieldName text) RETURNS text;
+ </programlisting>
+ <para>
+ Does same as ssl_client_dn_field, but for the certificate issuer
+ rather than the certificate subject.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Victor Wagner <email>vitus@cryptocom.ru</email>, Cryptocom LTD
+ E-Mail of Cryptocom OpenSSL development group:
+ <email>openssl@cryptocom.ru</email>
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/standby.sgml b/doc/src/sgml/standby.sgml
new file mode 100644
index 00000000000..120fed4c2c0
--- /dev/null
+++ b/doc/src/sgml/standby.sgml
@@ -0,0 +1,249 @@
+
+<sect1 id="pgstandby">
+ <title>pg_standby</title>
+
+ <indexterm zone="pgstandby">
+ <primary>pgstandby</primary>
+ </indexterm>
+
+ <para>
+ <literal>pg_standby</literal> is a production-ready program that can be used
+ to create a Warm Standby server. Other configuration is required as well,
+ all of which is described in the main server manual.
+ </para>
+ <para>
+ The program is designed to be a wait-for <literal>restore_command</literal>,
+ required to turn a normal archive recovery into a Warm Standby. Within the
+ <literal>restore_command</literal> of the <literal>recovery.conf</literal>
+ you could configure <literal>pg_standby</literal> in the following way:
+ </para>
+ <programlisting>
+ restore_command = 'pg_standby archiveDir %f %p'
+ </programlisting>
+ <para>
+ which would be sufficient to define that files will be restored from
+ archiveDir.
+ </para>
+
+ <para>
+ <literal>pg_standby</literal> features include:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ It is written in C. So it is very portable
+ and easy to install.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Supports copy or link from a directory (only)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Source easy to modify, with specifically designated
+ sections to modify for your own needs, allowing
+ interfaces to be written for additional Backup Archive Restore
+ (BAR) systems
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Already tested on Linux and Windows
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ <sect2>
+ <title>Usage</title>
+ <para>
+ <literal>pg_standby</literal> should be used within the
+ <literal>restore_command</literal> of the <literal>recovery.conf</literal>
+ file.
+ </para>
+ <para>
+ The basic usage should be like this:
+ </para>
+ <programlisting>
+ restore_command = 'pg_standby archiveDir %f %p'
+ </programlisting>
+ <para>
+ with the pg_standby command usage as
+ </para>
+ <programlisting>
+ pg_standby [OPTION]... [ARCHIVELOCATION] [NEXTWALFILE] [XLOGFILEPATH]
+ </programlisting>
+ <para>
+ When used within the <literal>restore_command</literal> the %f and %p macros
+ will provide the actual file and path required for the restore/recovery.
+ </para>
+
+ <table>
+ <title>Options</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>-c</entry>
+ <entry> use copy/cp command to restore WAL files from archive</entry>
+ </row>
+ <row>
+ <entry>-d</entry>
+ <entry>debug/logging option.</entry>
+ </row>
+ <row>
+ <entry>-k numfiles</entry>
+ <entry>
+ <para>
+ Cleanup files in the archive so that we maintain no more
+ than this many files in the archive.
+ </para>
+ <para>
+ You should be wary against setting this number too low,
+ since this may mean you cannot restart the standby. This
+ is because the last restartpoint marked in the WAL files
+ may be many files in the past and can vary considerably.
+ This should be set to a value exceeding the number of WAL
+ files that can be recovered in 2*checkpoint_timeout seconds,
+ according to the value in the warm standby postgresql.conf.
+ It is wholly unrelated to the setting of checkpoint_segments
+ on either primary or standby.
+ </para>
+ <para>
+ If in doubt, use a large value or do not set a value at all.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>-l</entry>
+ <entry>
+ <para>
+ use ln command to restore WAL files from archive
+ WAL files will remain in archive
+ </para>
+ <para>
+ Link is more efficient, but the default is copy to
+ allow you to maintain the WAL archive for recovery
+ purposes as well as high-availability.
+ </para>
+ <para>
+ This option uses the Windows Vista command mklink
+ to provide a file-to-file symbolic link. -l will
+ not work on versions of Windows prior to Vista.
+ Use the -c option instead.
+ see <ulink url="http://en.wikipedia.org/wiki/NTFS_symbolic_link"></ulink>
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>-r maxretries</entry>
+ <entry>
+ <para>
+ the maximum number of times to retry the restore command if it
+ fails. After each failure, we wait for sleeptime * num_retries
+ so that the wait time increases progressively, so by default
+ we will wait 5 secs, 10 secs then 15 secs before reporting
+ the failure back to the database server. This will be
+ interpreted as and end of recovery and the Standby will come
+ up fully as a result. <literal>Default=3</literal>
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>-s sleeptime</entry>
+ <entry>
+ the number of seconds to sleep between testing to see
+ if the file to be restored is available in the archive yet.
+ The default setting is not necessarily recommended,
+ consult the main database server manual for discussion.
+ <literal>Default=5</literal>
+ </entry>
+ </row>
+ <row>
+ <entry>-t triggerfile</entry>
+ <entry>
+ the presence of the triggerfile will cause recovery to end
+ whether or not the next file is available
+ It is recommended that you use a structured filename to
+ avoid confusion as to which server is being triggered
+ when multiple servers exist on same system.
+ e.g. /tmp/pgsql.trigger.5432
+ </entry>
+ </row>
+ <row>
+ <entry>-w maxwaittime</entry>
+ <entry>
+ the maximum number of seconds to wait for the next file,
+ after which recovery will end and the Standby will come up.
+ The default setting is not necessarily recommended,
+ consult the main database server manual for discussion.
+ <literal>Default=0</literal>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <note>
+ <para>
+ <literal>--help</literal> is not supported since
+ <literal>pg_standby</literal> is not intended for interactive use, except
+ during development and testing.
+ </para>
+ </note>
+ </sect2>
+
+ <sect2>
+ <title>Examples</title>
+
+ <itemizedlist>
+ <listitem>
+ <para>Example on Linux</para>
+ <programlisting>
+archive_command = 'cp %p ../archive/%f'
+
+restore_command = 'pg_standby -l -d -k 255 -r 2 -s 2 -w 0 -t /tmp/pgsql.trigger.5442 $PWD/../archive %f %p 2>> standby.log'
+ </programlisting>
+ <para>
+ which will
+ </para>
+ <itemizedlist>
+ <listitem><para>use a ln command to restore WAL files from archive</para></listitem>
+ <listitem><para>produce logfile output in standby.log</para></listitem>
+ <listitem><para>keep the last 255 full WAL files, plus the current one</para></listitem>
+ <listitem><para>sleep for 2 seconds between checks for next WAL file is full</para></listitem>
+ <listitem><para>never timeout if file not found</para></listitem>
+ <listitem><para>stop waiting when a trigger file called /tmp.pgsql.trigger.5442 appears</para></listitem>
+ </itemizedlist>
+ </listitem>
+
+ <listitem>
+ <para>
+ Example on Windows
+ </para>
+ <programlisting>
+archive_command = 'copy %p ..\\archive\\%f'
+ </programlisting>
+ <para>
+ Note that backslashes need to be doubled in the archive_command, but
+ *not* in the restore_command, in 8.2, 8.1, 8.0 on Windows.
+ </para>
+ <programlisting>
+restore_command = 'pg_standby -c -d -s 5 -w 0 -t C:\pgsql.trigger.5442
+ ..\archive %f %p 2>> standby.log'
+ </programlisting>
+ <para>
+ which will
+ </para>
+ <itemizedlist>
+ <listitem><para>use a copy command to restore WAL files from archive</para></listitem>
+ <listitem><para>produce logfile output in standby.log</para></listitem>
+ <listitem><para>sleep for 5 seconds between checks for next WAL file is full</para></listitem>
+ <listitem><para>never timeout if file not found</para></listitem>
+ <listitem><para>stop waiting when a trigger file called C:\pgsql.trigger.5442 appears</para></listitem>
+ </itemizedlist>
+ </listitem>
+ </itemizedlist>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/tablefunc.sgml b/doc/src/sgml/tablefunc.sgml
new file mode 100644
index 00000000000..23d12b55804
--- /dev/null
+++ b/doc/src/sgml/tablefunc.sgml
@@ -0,0 +1,765 @@
+
+<sect1 id="tablefunc">
+ <title>tablefunc</title>
+
+ <indexterm zone="tablefunc">
+ <primary>tablefunc</primary>
+ </indexterm>
+
+ <para>
+ <literal>tablefunc</literal> provides functions to convert query rows into fields.
+ </para>
+ <sect2>
+ <title>Functions</title>
+ <table>
+ <title></title>
+ <tgroup cols="3">
+ <thead>
+ <row>
+ <entry>Function</entry>
+ <entry>Returns</entry>
+ <entry>Comments</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>
+ <literal>
+ normal_rand(int numvals, float8 mean, float8 stddev)
+ </literal>
+ </entry>
+ <entry>
+ returns a set of normally distributed float8 values
+ </entry>
+ <entry></entry>
+ </row>
+ <row>
+ <entry><literal>crosstabN(text sql)</literal></entry>
+ <entry>returns a set of row_name plus N category value columns</entry>
+ <entry>
+ crosstab2(), crosstab3(), and crosstab4() are defined for you,
+ but you can create additional crosstab functions per the instructions
+ in the documentation below.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>crosstab(text sql)</literal></entry>
+ <entry>returns a set of row_name plus N category value columns</entry>
+ <entry>
+ requires anonymous composite type syntax in the FROM clause. See
+ the instructions in the documentation below.
+ </entry>
+ </row>
+ <row>
+ <entry><literal>crosstab(text sql, N int)</literal></entry>
+ <entry></entry>
+ <entry>
+ <para>obsolete version of crosstab()</para>
+ <para>
+ the argument N is now ignored, since the number of value columns
+ is always determined by the calling query
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
+ connectby(text relname, text keyid_fld, text parent_keyid_fld
+ [, text orderby_fld], text start_with, int max_depth
+ [, text branch_delim])
+ </literal>
+ </entry>
+ <entry>
+ returns keyid, parent_keyid, level, and an optional branch string
+ and an optional serial column for ordering siblings
+ </entry>
+ <entry>
+ requires anonymous composite type syntax in the FROM clause. See
+ the instructions in the documentation below.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <sect3>
+ <title><literal>normal_rand</literal></title>
+ <programlisting>
+normal_rand(int numvals, float8 mean, float8 stddev) RETURNS SETOF float8
+ </programlisting>
+ <para>
+ Where <literal>numvals</literal> is the number of values to be returned
+ from the function. <literal>mean</literal> is the mean of the normal
+ distribution of values and <literal>stddev</literal> is the standard
+ deviation of the normal distribution of values.
+ </para>
+ <para>
+ Returns a float8 set of random values normally distributed (Gaussian
+ distribution).
+ </para>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+ test=# SELECT * FROM
+ test=# normal_rand(1000, 5, 3);
+ normal_rand
+----------------------
+ 1.56556322244898
+ 9.10040991424657
+ 5.36957140345079
+ -0.369151492880995
+ 0.283600703686639
+ .
+ .
+ .
+ 4.82992125404908
+ 9.71308014517282
+ 2.49639286969028
+(1000 rows)
+ </programlisting>
+ <para>
+ Returns 1000 values with a mean of 5 and a standard deviation of 3.
+ </para>
+ </sect3>
+
+
+ <sect3>
+ <title><literal>crosstabN(text sql)</literal></title>
+ <programlisting>
+crosstabN(text sql)
+ </programlisting>
+ <para>
+ The <literal>sql</literal> parameter is a SQL statement which produces the
+ source set of data. The SQL statement must return one row_name column, one
+ category column, and one value column. <literal>row_name</literal> and
+ value must be of type text. The function returns a set of
+ <literal>row_name</literal> plus N category value columns.
+ </para>
+ <para>
+ Provided <literal>sql</literal> must produce a set something like:
+ </para>
+<programlisting>
+row_name cat value
+---------+-------+-------
+ row1 cat1 val1
+ row1 cat2 val2
+ row1 cat3 val3
+ row1 cat4 val4
+ row2 cat1 val5
+ row2 cat2 val6
+ row2 cat3 val7
+ row2 cat4 val8
+ </programlisting>
+ <para>
+ The returned value is a <literal>SETOF table_crosstab_N</literal>, which
+ is defined by:
+ </para>
+ <programlisting>
+CREATE TYPE tablefunc_crosstab_N AS (
+ row_name TEXT,
+ category_1 TEXT,
+ category_2 TEXT,
+ .
+ .
+ .
+ category_N TEXT
+);
+ </programlisting>
+ <para>
+ for the default installed functions, where N is 2, 3, or 4.
+ </para>
+ <para>
+ e.g. the provided crosstab2 function produces a set something like:
+ </para>
+ <programlisting>
+ <== values columns ==>
+ row_name category_1 category_2
+ ---------+------------+------------
+ row1 val1 val2
+ row2 val5 val6
+ </programlisting>
+ <note>
+ <orderedlist>
+ <listitem><para>The sql result must be ordered by 1,2.</para></listitem>
+ <listitem>
+ <para>
+ The number of values columns depends on the tuple description
+ of the function's declared return type.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Missing values (i.e. not enough adjacent rows of same row_name to
+ fill the number of result values columns) are filled in with nulls.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Extra values (i.e. too many adjacent rows of same row_name to fill
+ the number of result values columns) are skipped.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Rows with all nulls in the values columns are skipped.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The installed defaults are for illustration purposes. You
+ can create your own return types and functions based on the
+ crosstab() function of the installed library. See below for
+ details.
+ </para>
+ </listitem>
+ </orderedlist>
+ </note>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+create table ct(id serial, rowclass text, rowid text, attribute text, value text);
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att1','val1');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att2','val2');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att3','val3');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test1','att4','val4');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att1','val5');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att2','val6');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att3','val7');
+insert into ct(rowclass, rowid, attribute, value) values('group1','test2','att4','val8');
+
+select * from crosstab3(
+ 'select rowid, attribute, value
+ from ct
+ where rowclass = ''group1''
+ and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;');
+
+ row_name | category_1 | category_2 | category_3
+----------+------------+------------+------------
+ test1 | val2 | val3 |
+ test2 | val6 | val7 |
+(2 rows)
+ </programlisting>
+ </sect3>
+
+ <sect3>
+ <title><literal>crosstab(text)</literal></title>
+ <programlisting>
+crosstab(text sql)
+crosstab(text sql, int N)
+ </programlisting>
+ <para>
+ The <literal>sql</literal> parameter is a SQL statement which produces the
+ source set of data. The SQL statement must return one
+ <literal>row_name</literal> column, one <literal>category</literal> column,
+ and one <literal>value</literal> column. <literal>N</literal> is an
+ obsolete argument; ignored if supplied (formerly this had to match the
+ number of category columns determined by the calling query).
+ </para>
+ <para>
+ </para>
+ <para>
+ e.g. provided sql must produce a set something like:
+ </para>
+<programlisting>
+ row_name cat value
+----------+-------+-------
+ row1 cat1 val1
+ row1 cat2 val2
+ row1 cat3 val3
+ row1 cat4 val4
+ row2 cat1 val5
+ row2 cat2 val6
+ row2 cat3 val7
+ row2 cat4 val8
+</programlisting>
+ <para>
+ Returns a <literal>SETOF RECORD</literal>, which must be defined with a
+ column definition in the FROM clause of the SELECT statement, e.g.:
+ </para>
+ <programlisting>
+ SELECT *
+ FROM crosstab(sql) AS ct(row_name text, category_1 text, category_2 text);
+ </programlisting>
+ <para>
+ the example crosstab function produces a set something like:
+ </para>
+ <programlisting>
+ <== values columns ==>
+row_name category_1 category_2
+ ---------+------------+------------
+ row1 val1 val2
+ row2 val5 val6
+ </programlisting>
+ <para>
+ Note that it follows these rules:
+ </para>
+ <orderedlist>
+ <listitem><para>The sql result must be ordered by 1,2.</para></listitem>
+ <listitem>
+ <para>
+ The number of values columns is determined by the column definition
+ provided in the FROM clause. The FROM clause must define one
+ row_name column (of the same datatype as the first result column
+ of the sql query) followed by N category columns (of the same
+ datatype as the third result column of the sql query). You can
+ set up as many category columns as you wish.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Missing values (i.e. not enough adjacent rows of same row_name to
+ fill the number of result values columns) are filled in with nulls.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Extra values (i.e. too many adjacent rows of same row_name to fill
+ the number of result values columns) are skipped.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Rows with all nulls in the values columns are skipped.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ You can avoid always having to write out a FROM clause that defines the
+ output columns by setting up a custom crosstab function that has
+ the desired output row type wired into its definition.
+ </para>
+ </listitem>
+ </orderedlist>
+ <para>
+ There are two ways you can set up a custom crosstab function:
+ </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ Create a composite type to define your return type, similar to the
+ examples in the installation script. Then define a unique function
+ name accepting one text parameter and returning setof your_type_name.
+ For example, if your source data produces row_names that are TEXT,
+ and values that are FLOAT8, and you want 5 category columns:
+ </para>
+ <programlisting>
+ CREATE TYPE my_crosstab_float8_5_cols AS (
+ row_name TEXT,
+ category_1 FLOAT8,
+ category_2 FLOAT8,
+ category_3 FLOAT8,
+ category_4 FLOAT8,
+ category_5 FLOAT8
+ );
+
+ CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(text)
+ RETURNS setof my_crosstab_float8_5_cols
+ AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
+ </programlisting>
+ </listitem>
+ <listitem>
+ <para>
+ Use OUT parameters to define the return type implicitly.
+ The same example could also be done this way:
+ </para>
+ <programlisting>
+ CREATE OR REPLACE FUNCTION crosstab_float8_5_cols(IN text,
+ OUT row_name TEXT,
+ OUT category_1 FLOAT8,
+ OUT category_2 FLOAT8,
+ OUT category_3 FLOAT8,
+ OUT category_4 FLOAT8,
+ OUT category_5 FLOAT8)
+ RETURNS setof record
+ AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT;
+ </programlisting>
+ </listitem>
+ </itemizedlist>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+CREATE TABLE ct(id SERIAL, rowclass TEXT, rowid TEXT, attribute TEXT, value TEXT);
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att1','val1');
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att2','val2');
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att3','val3');
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test1','att4','val4');
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att1','val5');
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att2','val6');
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att3','val7');
+INSERT INTO ct(rowclass, rowid, attribute, value) VALUES('group1','test2','att4','val8');
+
+SELECT *
+FROM crosstab(
+ 'select rowid, attribute, value
+ from ct
+ where rowclass = ''group1''
+ and (attribute = ''att2'' or attribute = ''att3'') order by 1,2;', 3)
+AS ct(row_name text, category_1 text, category_2 text, category_3 text);
+
+ row_name | category_1 | category_2 | category_3
+----------+------------+------------+------------
+ test1 | val2 | val3 |
+ test2 | val6 | val7 |
+(2 rows)
+ </programlisting>
+
+ </sect3>
+
+ <sect3>
+ <title><literal>crosstab(text, text)</literal></title>
+ <programlisting>
+crosstab(text source_sql, text category_sql)
+ </programlisting>
+
+ <para>
+ Where <literal>source_sql</literal> is a SQL statement which produces the
+ source set of data. The SQL statement must return one
+ <literal>row_name</literal> column, one <literal>category</literal> column,
+ and one <literal>value</literal> column. It may also have one or more
+ <emphasis>extra</emphasis> columns.
+ </para>
+ <para>
+ The <literal>row_name</literal> column must be first. The
+ <literal>category</literal> and <literal>value</literal> columns must be
+ the last two columns, in that order. <emphasis>extra</emphasis> columns must
+ be columns 2 through (N - 2), where N is the total number of columns.
+ </para>
+ <para>
+ The <emphasis>extra</emphasis> columns are assumed to be the same for all
+ rows with the same <literal>row_name</literal>. The values returned are
+ copied from the first row with a given <literal>row_name</literal> and
+ subsequent values of these columns are ignored until
+ <literal>row_name</literal> changes.
+ </para>
+ <para>
+ e.g. <literal>source_sql</literal> must produce a set something like:
+ </para>
+ <programlisting>
+ SELECT row_name, extra_col, cat, value FROM foo;
+
+ row_name extra_col cat value
+ ----------+------------+-----+---------
+ row1 extra1 cat1 val1
+ row1 extra1 cat2 val2
+ row1 extra1 cat4 val4
+ row2 extra2 cat1 val5
+ row2 extra2 cat2 val6
+ row2 extra2 cat3 val7
+ row2 extra2 cat4 val8
+ </programlisting>
+
+ <para>
+ <literal>category_sql</literal> has to be a SQL statement which produces
+ the distinct set of categories. The SQL statement must return one category
+ column only. <literal>category_sql</literal> must produce at least one
+ result row or an error will be generated. <literal>category_sql</literal>
+ must not produce duplicate categories or an error will be generated. e.g.:
+ </para>
+ <programlisting>
+SELECT DISTINCT cat FROM foo;
+ cat
+ -------
+ cat1
+ cat2
+ cat3
+ cat4
+ </programlisting>
+ <para>
+ The function returns <literal>SETOF RECORD</literal>, which must be defined
+ with a column definition in the FROM clause of the SELECT statement, e.g.:
+ </para>
+ <programlisting>
+ SELECT * FROM crosstab(source_sql, cat_sql)
+ AS ct(row_name text, extra text, cat1 text, cat2 text, cat3 text, cat4 text);
+ </programlisting>
+ <para>
+ the example crosstab function produces a set something like:
+ </para>
+ <programlisting>
+ <== values columns ==>
+ row_name extra cat1 cat2 cat3 cat4
+ ---------+-------+------+------+------+------
+ row1 extra1 val1 val2 val4
+ row2 extra2 val5 val6 val7 val8
+ </programlisting>
+ <para>
+ Note that it follows these rules:
+ </para>
+ <orderedlist>
+ <listitem><para>source_sql must be ordered by row_name (column 1).</para></listitem>
+ <listitem>
+ <para>
+ The number of values columns is determined at run-time. The
+ column definition provided in the FROM clause must provide for
+ the correct number of columns of the proper data types.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Missing values (i.e. not enough adjacent rows of same row_name to
+ fill the number of result values columns) are filled in with nulls.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Extra values (i.e. source rows with category not found in category_sql
+ result) are skipped.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Rows with a null row_name column are skipped.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ You can create predefined functions to avoid having to write out
+ the result column names/types in each query. See the examples
+ for crosstab(text).
+ </para>
+ </listitem>
+ </orderedlist>
+
+ <programlisting>
+CREATE TABLE cth(id serial, rowid text, rowdt timestamp, attribute text, val text);
+INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','temperature','42');
+INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','test_result','PASS');
+INSERT INTO cth VALUES(DEFAULT,'test1','01 March 2003','volts','2.6987');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','temperature','53');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_result','FAIL');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','test_startdate','01 March 2003');
+INSERT INTO cth VALUES(DEFAULT,'test2','02 March 2003','volts','3.1234');
+
+SELECT * FROM crosstab
+(
+ 'SELECT rowid, rowdt, attribute, val FROM cth ORDER BY 1',
+ 'SELECT DISTINCT attribute FROM cth ORDER BY 1'
+)
+AS
+(
+ rowid text,
+ rowdt timestamp,
+ temperature int4,
+ test_result text,
+ test_startdate timestamp,
+ volts float8
+);
+ rowid | rowdt | temperature | test_result | test_startdate | volts
+-------+--------------------------+-------------+-------------+--------------------------+--------
+ test1 | Sat Mar 01 00:00:00 2003 | 42 | PASS | | 2.6987
+ test2 | Sun Mar 02 00:00:00 2003 | 53 | FAIL | Sat Mar 01 00:00:00 2003 | 3.1234
+(2 rows)
+ </programlisting>
+ </sect3>
+ <sect3>
+ <title>
+ <literal>connectby(text, text, text[, text], text, text, int[, text])</literal>
+ </title>
+ <programlisting>
+connectby(text relname, text keyid_fld, text parent_keyid_fld
+ [, text orderby_fld], text start_with, int max_depth
+ [, text branch_delim])
+ </programlisting>
+ <table>
+ <title><literal>connectby</literal> parameters</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Parameter</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>relname</literal></entry>
+ <entry>Name of the source relation</entry>
+ </row>
+ <row>
+ <entry><literal>keyid_fld</literal></entry>
+ <entry>Name of the key field</entry>
+ </row>
+ <row>
+ <entry><literal>parent_keyid_fld</literal></entry>
+ <entry>Name of the key_parent field</entry>
+ </row>
+ <row>
+ <entry><literal>orderby_fld</literal></entry>
+ <entry>
+ If optional ordering of siblings is desired: Name of the field to
+ order siblings
+ </entry>
+ </row>
+ <row>
+ <entry><literal>start_with</literal></entry>
+ <entry>
+ Root value of the tree input as a text value regardless of
+ <literal>keyid_fld</literal>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>max_depth</literal></entry>
+ <entry>
+ Zero (0) for unlimited depth, otherwise restrict level to this depth
+ </entry>
+ </row>
+ <row>
+ <entry><literal>branch_delim</literal></entry>
+ <entry>
+ If optional branch value is desired, this string is used as the delimiter.
+ When not provided, a default value of '~' is used for internal
+ recursion detection only, and no "branch" field is returned.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ The function returns <literal>SETOF RECORD</literal>, which must defined
+ with a column definition in the FROM clause of the SELECT statement, e.g.:
+ </para>
+ <programlisting>
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text);
+ </programlisting>
+ <para>
+ or
+ </para>
+ <programlisting>
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
+ AS t(keyid text, parent_keyid text, level int);
+ </programlisting>
+ <para>
+ or
+ </para>
+ <programlisting>
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text, pos int);
+ </programlisting>
+ <para>
+ or
+ </para>
+ <programlisting>
+ SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
+ AS t(keyid text, parent_keyid text, level int, pos int);
+ </programlisting>
+ <para>
+ Note that it follows these rules:
+ </para>
+ <orderedlist>
+ <listitem><para>keyid and parent_keyid must be the same data type</para></listitem>
+ <listitem>
+ <para>
+ The column definition *must* include a third column of type INT4 for
+ the level value output
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If the branch field is not desired, omit both the branch_delim input
+ parameter *and* the branch field in the query column definition. Note
+ that when branch_delim is not provided, a default value of '~' is used
+ for branch_delim for internal recursion detection, even though the branch
+ field is not returned.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If the branch field is desired, it must be the fourth column in the query
+ column definition, and it must be type TEXT.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The parameters representing table and field names must include double
+ quotes if the names are mixed-case or contain special characters.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ If sorting of siblings is desired, the orderby_fld input parameter *and*
+ a name for the resulting serial field (type INT32) in the query column
+ definition must be given.
+ </para>
+ </listitem>
+ </orderedlist>
+ <para>
+ Example:
+ </para>
+ <programlisting>
+CREATE TABLE connectby_tree(keyid text, parent_keyid text, pos int);
+
+INSERT INTO connectby_tree VALUES('row1',NULL, 0);
+INSERT INTO connectby_tree VALUES('row2','row1', 0);
+INSERT INTO connectby_tree VALUES('row3','row1', 0);
+INSERT INTO connectby_tree VALUES('row4','row2', 1);
+INSERT INTO connectby_tree VALUES('row5','row2', 0);
+INSERT INTO connectby_tree VALUES('row6','row4', 0);
+INSERT INTO connectby_tree VALUES('row7','row3', 0);
+INSERT INTO connectby_tree VALUES('row8','row6', 0);
+INSERT INTO connectby_tree VALUES('row9','row5', 0);
+
+-- with branch, without orderby_fld
+SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text);
+ keyid | parent_keyid | level | branch
+-------+--------------+-------+---------------------
+ row2 | | 0 | row2
+ row4 | row2 | 1 | row2~row4
+ row6 | row4 | 2 | row2~row4~row6
+ row8 | row6 | 3 | row2~row4~row6~row8
+ row5 | row2 | 1 | row2~row5
+ row9 | row5 | 2 | row2~row5~row9
+(6 rows)
+
+-- without branch, without orderby_fld
+SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'row2', 0)
+ AS t(keyid text, parent_keyid text, level int);
+ keyid | parent_keyid | level
+-------+--------------+-------
+ row2 | | 0
+ row4 | row2 | 1
+ row6 | row4 | 2
+ row8 | row6 | 3
+ row5 | row2 | 1
+ row9 | row5 | 2
+(6 rows)
+
+-- with branch, with orderby_fld (notice that row5 comes before row4)
+SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~')
+ AS t(keyid text, parent_keyid text, level int, branch text, pos int) ORDER BY t.pos;
+ keyid | parent_keyid | level | branch | pos
+-------+--------------+-------+---------------------+-----
+ row2 | | 0 | row2 | 1
+ row5 | row2 | 1 | row2~row5 | 2
+ row9 | row5 | 2 | row2~row5~row9 | 3
+ row4 | row2 | 1 | row2~row4 | 4
+ row6 | row4 | 2 | row2~row4~row6 | 5
+ row8 | row6 | 3 | row2~row4~row6~row8 | 6
+(6 rows)
+
+-- without branch, with orderby_fld (notice that row5 comes before row4)
+SELECT * FROM connectby('connectby_tree', 'keyid', 'parent_keyid', 'pos', 'row2', 0)
+ AS t(keyid text, parent_keyid text, level int, pos int) ORDER BY t.pos;
+ keyid | parent_keyid | level | pos
+-------+--------------+-------+-----
+ row2 | | 0 | 1
+ row5 | row2 | 1 | 2
+ row9 | row5 | 2 | 3
+ row4 | row2 | 1 | 4
+ row6 | row4 | 2 | 5
+ row8 | row6 | 3 | 6
+(6 rows)
+ </programlisting>
+ </sect3>
+ </sect2>
+ <sect2>
+ <title>Author</title>
+ <para>
+ Joe Conway
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/trgm.sgml b/doc/src/sgml/trgm.sgml
new file mode 100644
index 00000000000..62a5e30382f
--- /dev/null
+++ b/doc/src/sgml/trgm.sgml
@@ -0,0 +1,214 @@
+<sect1 id="pgtrgm">
+ <title>pg_trgm</title>
+
+ <indexterm zone="pgtrgm">
+ <primary>pgtrgm</primary>
+ </indexterm>
+
+ <para>
+ The <literal>pg_trgm</literal> module provides functions and index classes
+ for determining the similarity of text based on trigram matching.
+ </para>
+
+ <sect2>
+ <title>Trigram (or Trigraph)</title>
+ <para>
+ A trigram is a set of three consecutive characters taken
+ from a string. A string is considered to have two spaces
+ prefixed and one space suffixed when determining the set
+ of trigrams that comprise the string.
+ </para>
+ <para>
+ eg. The set of trigrams in the word "cat" is " c", " ca",
+ "at " and "cat".
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Public Functions</title>
+ <table>
+ <title><literal>pg_trgm</literal> functions</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Function</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>real similarity(text, text)</literal></entry>
+ <entry>
+ <para>
+ Returns a number that indicates how closely matches the two
+ arguments are. A zero result indicates that the two words
+ are completely dissimilar, and a result of one indicates that
+ the two words are identical.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>real show_limit()</literal></entry>
+ <entry>
+ <para>
+ Returns the current similarity threshold used by the '%'
+ operator. This in effect sets the minimum similarity between
+ two words in order that they be considered similar enough to
+ be misspellings of each other, for example.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>real set_limit(real)</literal></entry>
+ <entry>
+ <para>
+ Sets the current similarity threshold that is used by the '%'
+ operator, and is returned by the show_limit() function.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>text[] show_trgm(text)</literal></entry>
+ <entry>
+ <para>
+ Returns an array of all the trigrams of the supplied text
+ parameter.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>Operator: <literal>text % text (returns boolean)</literal></entry>
+ <entry>
+ <para>
+ The '%' operator returns TRUE if its two arguments have a similarity
+ that is greater than the similarity threshold set by set_limit(). It
+ will return FALSE if the similarity is less than the current
+ threshold.
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+ <sect2>
+ <title>Public Index Operator Class</title>
+ <para>
+ The <literal>pg_trgm</literal> module comes with the
+ <literal>gist_trgm_ops</literal> index operator class that allows a
+ developer to create an index over a text column for the purpose
+ of very fast similarity searches.
+ </para>
+ <para>
+ To use this index, the '%' operator must be used and an appropriate
+ similarity threshold for the application must be set. Example:
+ </para>
+ <programlisting>
+CREATE TABLE test_trgm (t text);
+CREATE INDEX trgm_idx ON test_trgm USING gist (t gist_trgm_ops);
+ </programlisting>
+ <para>
+ At this point, you will have an index on the t text column that you
+ can use for similarity searching. Example:
+ </para>
+ <programlisting>
+SELECT
+ t,
+ similarity(t, 'word') AS sml
+FROM
+ test_trgm
+WHERE
+ t % 'word'
+ORDER BY
+ sml DESC, t;
+ </programlisting>
+ <para>
+ This will return all values in the text column that are sufficiently
+ similar to 'word', sorted from best match to worst. The index will
+ be used to make this a fast operation over very large data sets.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Tsearch2 Integration</title>
+ <para>
+ Trigram matching is a very useful tool when used in conjunction
+ with a text index created by the Tsearch2 contrib module. (See
+ contrib/tsearch2)
+ </para>
+ <para>
+ The first step is to generate an auxiliary table containing all
+ the unique words in the Tsearch2 index:
+ </para>
+ <programlisting>
+CREATE TABLE words AS SELECT word FROM
+ stat('SELECT to_tsvector(''simple'', bodytext) FROM documents');
+ </programlisting>
+ <para>
+ Where 'documents' is a table that has a text field 'bodytext'
+ that TSearch2 is used to search. The use of the 'simple' dictionary
+ with the to_tsvector function, instead of just using the already
+ existing vector is to avoid creating a list of already stemmed
+ words. This way, only the original, unstemmed words are added
+ to the word list.
+ </para>
+ <para>
+ Next, create a trigram index on the word column:
+ </para>
+ <programlisting>
+CREATE INDEX words_idx ON words USING gist(word gist_trgm_ops);
+ </programlisting>
+ <para>
+ or
+ </para>
+ <programlisting>
+CREATE INDEX words_idx ON words USING gin(word gist_trgm_ops);
+ </programlisting>
+ <para>
+ Now, a <literal>SELECT</literal> query similar to the example above can be
+ used to suggest spellings for misspelled words in user search terms. A
+ useful extra clause is to ensure that the similar words are also
+ of similar length to the misspelled word.
+ </para>
+ <para>
+ <note>
+ <para>
+ Since the 'words' table has been generated as a separate,
+ static table, it will need to be periodically regenerated so that
+ it remains up to date with the word list in the Tsearch2 index.
+ </para>
+ </note>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>References</title>
+ <para>
+ Tsearch2 Development Site
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/"></ulink>
+ </para>
+ <para>
+ GiST Development Site
+ <ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Authors</title>
+ <para>
+ Oleg Bartunov <email>oleg@sai.msu.su</email>, Moscow, Moscow University, Russia
+ </para>
+ <para>
+ Teodor Sigaev <email>teodor@sigaev.ru</email>, Moscow, Delta-Soft Ltd.,Russia
+ </para>
+ <para>
+ Documentation: Christopher Kings-Lynne
+ </para>
+ <para>
+ This module is sponsored by Delta-Soft Ltd., Moscow, Russia.
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/uuid-ossp.sgml b/doc/src/sgml/uuid-ossp.sgml
new file mode 100644
index 00000000000..93e6c0faeac
--- /dev/null
+++ b/doc/src/sgml/uuid-ossp.sgml
@@ -0,0 +1,163 @@
+
+<sect1 id="uuid-ossp">
+ <title>uuid-ossp</title>
+
+ <indexterm zone="uuid-ossp">
+ <primary>uuid-ossp</primary>
+ </indexterm>
+
+ <para>
+ This module provides functions to generate universally unique
+ identifiers (UUIDs) using one of the several standard algorithms, as
+ well as functions to produce certain special UUID constants.
+ </para>
+
+ <sect2>
+ <title>UUID Generation</title>
+ <para>
+ The relevant standards ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC
+ 4122 specify four algorithms for generating UUIDs, identified by the
+ version numbers 1, 3, 4, and 5. (There is no version 2 algorithm.)
+ Each of these algorithms could be suitable for a different set of
+ applications.
+ </para>
+
+ <table>
+ <title><literal>uuid-ossp</literal> functions</title>
+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Function</entry>
+ <entry>Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><literal>uuid_generate_v1()</literal></entry>
+ <entry>
+ <para>
+ This function generates a version 1 UUID. This involves the MAC
+ address of the computer and a time stamp. Note that UUIDs of this
+ kind reveal the identity of the computer that created the identifier
+ and the time at which it did so, which might make it unsuitable for
+ certain security-sensitive applications.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_generate_v1mc()</literal></entry>
+ <entry>
+ <para>
+ This function generates a version 1 UUID but uses a random multicast
+ MAC address instead of the real MAC address of the computer.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_generate_v3(namespace uuid, name text)</literal></entry>
+ <entry>
+ <para>
+ This function generates a version 3 UUID in the given namespace using
+ the specified input name. The namespace should be one of the special
+ constants produced by the uuid_ns_*() functions shown below. (It
+ could be any UUID in theory.) The name is an identifier in the
+ selected namespace. For example:
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_generate_v3(uuid_ns_url(), 'http://www.postgresql.org')</literal></entry>
+ <entry>
+ <para>
+ The name parameter will be MD5-hashed, so the cleartext cannot be
+ derived from the generated UUID.
+ </para>
+ <para>
+ The generation of UUIDs by this method has no random or
+ environment-dependent element and is therefore reproducible.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_generate_v4()</literal></entry>
+ <entry>
+ <para>
+ This function generates a version 4 UUID, which is derived entirely
+ from random numbers.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_generate_v5(namespace uuid, name text)</literal></entry>
+ <entry>
+ <para>
+ This function generates a version 5 UUID, which works like a version 3
+ UUID except that SHA-1 is used as a hashing method. Version 5 should
+ be preferred over version 3 because SHA-1 is thought to be more secure
+ than MD5.
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <table>
+ <title>UUID Constants</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>uuid_nil()</literal></entry>
+ <entry>
+ <para>
+ A "nil" UUID constant, which does not occur as a real UUID.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_ns_dns()</literal></entry>
+ <entry>
+ <para>
+ Constant designating the DNS namespace for UUIDs.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_ns_url()</literal></entry>
+ <entry>
+ <para>
+ Constant designating the URL namespace for UUIDs.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_ns_oid()</literal></entry>
+ <entry>
+ <para>
+ Constant designating the ISO object identifier (OID) namespace for
+ UUIDs. (This pertains to ASN.1 OIDs, unrelated to the OIDs used in
+ PostgreSQL.)
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>uuid_ns_x500()</literal></entry>
+ <entry>
+ <para>
+ Constant designating the X.500 distinguished name (DN) namespace for
+ UUIDs.
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+ <sect2>
+ <title>Author</title>
+ <para>
+ Peter Eisentraut <email>peter_e@gmx.net</email>
+ </para>
+ </sect2>
+</sect1>
+
diff --git a/doc/src/sgml/vacuumlo.sgml b/doc/src/sgml/vacuumlo.sgml
new file mode 100644
index 00000000000..28219d2b257
--- /dev/null
+++ b/doc/src/sgml/vacuumlo.sgml
@@ -0,0 +1,74 @@
+<sect1 id="vacuumlo">
+ <title>vacuumlo</title>
+
+ <indexterm zone="vacuumlo">
+ <primary>vacuumlo</primary>
+ </indexterm>
+
+ <para>
+ This is a simple utility that will remove any orphaned large objects out of a
+ PostgreSQL database. An orphaned LO is considered to be any LO whose OID
+ does not appear in any OID data column of the database.
+ </para>
+ <para>
+ If you use this, you may also be interested in the lo_manage trigger in
+ contrib/lo. lo_manage is useful to try to avoid creating orphaned LOs
+ in the first place.
+ </para>
+ <para>
+ <note>
+ <para>
+ It was decided to place this in contrib as it needs further testing, but hopefully,
+ this (or a variant of it) would make it into the backend as a "vacuum lo"
+ command in a later release.
+ </para>
+ </note>
+ </para>
+
+ <sect2>
+ <title>Usage</title>
+ <programlisting>
+vacuumlo [options] database [database2 ... databasen]
+ </programlisting>
+ <para>
+ All databases named on the command line are processed. Available options
+ include:
+ </para>
+ <programlisting>
+-v Write a lot of progress messages
+-n Don't remove large objects, just show what would be done
+-U username Username to connect as
+-W Prompt for password
+-h hostname Database server host
+-p port Database server port
+ </programlisting>
+ </sect2>
+
+ <sect2>
+ <title>Method</title>
+ <para>
+ First, it builds a temporary table which contains all of the OIDs of the
+ large objects in that database.
+ </para>
+ <para>
+ It then scans through all columns in the database that are of type "oid"
+ or "lo", and removes matching entries from the temporary table.
+ </para>
+ <para>
+ The remaining entries in the temp table identify orphaned LOs. These are
+ removed.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+ <para>
+ Peter Mount <email>peter@retep.org.uk</email>
+ </para>
+ <para>
+ <ulink url="http://www.retep.org.uk"></ulink>
+ </para>
+ </sect2>
+
+</sect1>
+
diff --git a/doc/src/sgml/xml2.sgml b/doc/src/sgml/xml2.sgml
new file mode 100644
index 00000000000..d73789a155d
--- /dev/null
+++ b/doc/src/sgml/xml2.sgml
@@ -0,0 +1,436 @@
+<sect1 id="xml2">
+ <title>xml2: XML-handling functions</title>
+
+ <indexterm zone="xml2">
+ <primary>xml2</primary>
+ </indexterm>
+
+ <sect2>
+ <title>Deprecation notice</title>
+ <para>
+ From PostgreSQL 8.3 on, there is XML-related
+ functionality based on the SQL/XML standard in the core server.
+ That functionality covers XML syntax checking and XPath queries,
+ which is what this module does as well, and more, but the API is
+ not at all compatible. It is planned that this module will be
+ removed in PostgreSQL 8.4 in favor of the newer standard API, so
+ you are encouraged to try converting your applications. If you
+ find that some of the functionality of this module is not
+ available in an adequate form with the newer API, please explain
+ your issue to pgsql-hackers@postgresql.org so that the deficiency
+ can be addressed.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Description of functions</title>
+ <para>
+ The first set of functions are straightforward XML parsing and XPath queries:
+ </para>
+
+ <table>
+ <title>Functions</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>
+ <programlisting>
+ xml_is_well_formed(document) RETURNS bool
+ </programlisting>
+ </entry>
+ <entry>
+ <para>
+ This parses the document text in its parameter and returns true if the
+ document is well-formed XML. (Note: before PostgreSQL 8.2, this function
+ was called xml_valid(). That is the wrong name since validity and
+ well-formedness have different meanings in XML. The old name is still
+ available, but is deprecated and will be removed in 8.3.)
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <programlisting>
+ xpath_string(document,query) RETURNS text
+ xpath_number(document,query) RETURNS float4
+ xpath_bool(document,query) RETURNS bool
+ </programlisting>
+ </entry>
+ <entry>
+ <para>
+ These functions evaluate the XPath query on the supplied document, and
+ cast the result to the specified type.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <programlisting>
+ xpath_nodeset(document,query,toptag,itemtag) RETURNS text
+ </programlisting>
+ </entry>
+ <entry>
+ <para>
+ This evaluates query on document and wraps the result in XML tags. If
+ the result is multivalued, the output will look like:
+ </para>
+ <literal>
+ &lt;toptag>
+ &lt;itemtag>Value 1 which could be an XML fragment&lt;/itemtag>
+ &lt;itemtag>Value 2....&lt;/itemtag>
+ &lt;/toptag>
+ </literal>
+ <para>
+ If either toptag or itemtag is an empty string, the relevant tag is omitted.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <programlisting>
+ xpath_nodeset(document,query) RETURNS
+ </programlisting>
+ </entry>
+ <entry>
+ <para>
+ Like xpath_nodeset(document,query,toptag,itemtag) but text omits both tags.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <programlisting>
+ xpath_nodeset(document,query,itemtag) RETURNS
+ </programlisting>
+ </entry>
+ <entry>
+ <para>
+ Like xpath_nodeset(document,query,toptag,itemtag) but text omits toptag.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <programlisting>
+ xpath_list(document,query,seperator) RETURNS text
+ </programlisting>
+ </entry>
+ <entry>
+ <para>
+ This function returns multiple values seperated by the specified
+ seperator, e.g. Value 1,Value 2,Value 3 if seperator=','.
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry>
+ <programlisting>
+ xpath_list(document,query) RETURNS text
+ </programlisting>
+ </entry>
+ <entry>
+ This is a wrapper for the above function that uses ',' as the seperator.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </sect2>
+
+
+ <sect2>
+ <title><literal>xpath_table</literal></title>
+ <para>
+ This is a table function which evaluates a set of XPath queries on
+ each of a set of documents and returns the results as a table. The
+ primary key field from the original document table is returned as the
+ first column of the result so that the resultset from xpath_table can
+ be readily used in joins.
+ </para>
+ <para>
+ The function itself takes 5 arguments, all text.
+ </para>
+ <programlisting>
+ xpath_table(key,document,relation,xpaths,criteria)
+ </programlisting>
+ <table>
+ <title>Parameters</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>key</literal></entry>
+ <entry>
+ <para>
+ the name of the "key" field - this is just a field to be used as
+ the first column of the output table i.e. it identifies the record from
+ which each output row came (see note below about multiple values).
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>document</literal></entry>
+ <entry>
+ <para>
+ the name of the field containing the XML document
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>relation</literal></entry>
+ <entry>
+ <para>
+ the name of the table or view containing the documents
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>xpaths</literal></entry>
+ <entry>
+ <para>
+ multiple xpath expressions separated by <literal>|</literal>
+ </para>
+ </entry>
+ </row>
+ <row>
+ <entry><literal>criteria</literal></entry>
+ <entry>
+ <para>
+ The contents of the where clause. This needs to be specified,
+ so use "true" or "1=1" here if you want to process all the rows in the
+ relation.
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>
+ NB These parameters (except the XPath strings) are just substituted
+ into a plain SQL SELECT statement, so you have some flexibility - the
+ statement is
+ </para>
+
+ <para>
+ <literal>
+ SELECT &lt;key>,&lt;document> FROM &lt;relation> WHERE &lt;criteria>
+ </literal>
+ </para>
+
+ <para>
+ so those parameters can be *anything* valid in those particular
+ locations. The result from this SELECT needs to return exactly two
+ columns (which it will unless you try to list multiple fields for key
+ or document). Beware that this simplistic approach requires that you
+ validate any user-supplied values to avoid SQL injection attacks.
+ </para>
+
+ <para>
+ Using the function
+ </para>
+
+ <para>
+ The function has to be used in a FROM expression. This gives the following
+ form:
+ </para>
+
+ <programlisting>
+SELECT * FROM
+xpath_table('article_id',
+ 'article_xml',
+ 'articles',
+ '/article/author|/article/pages|/article/title',
+ 'date_entered > ''2003-01-01'' ')
+AS t(article_id integer, author text, page_count integer, title text);
+ </programlisting>
+
+ <para>
+ The AS clause defines the names and types of the columns in the
+ virtual table. If there are more XPath queries than result columns,
+ the extra queries will be ignored. If there are more result columns
+ than XPath queries, the extra columns will be NULL.
+ </para>
+
+ <para>
+ Note that I've said in this example that pages is an integer. The
+ function deals internally with string representations, so when you say
+ you want an integer in the output, it will take the string
+ representation of the XPath result and use PostgreSQL input functions
+ to transform it into an integer (or whatever type the AS clause
+ requests). An error will result if it can't do this - for example if
+ the result is empty - so you may wish to just stick to 'text' as the
+ column type if you think your data has any problems.
+ </para>
+ <para>
+ The select statement doesn't need to use * alone - it can reference the
+ columns by name or join them to other tables. The function produces a
+ virtual table with which you can perform any operation you wish (e.g.
+ aggregation, joining, sorting etc). So we could also have:
+ </para>
+
+ <programlisting>
+SELECT t.title, p.fullname, p.email
+FROM xpath_table('article_id','article_xml','articles',
+ '/article/title|/article/author/@id',
+ 'xpath_string(article_xml,''/article/@date'') > ''2003-03-20'' ')
+ AS t(article_id integer, title text, author_id integer),
+ tblPeopleInfo AS p
+WHERE t.author_id = p.person_id;
+ </programlisting>
+
+ <para>
+ as a more complicated example. Of course, you could wrap all
+ of this in a view for convenience.
+ </para>
+ <sect3>
+ <title>Multivalued results</title>
+ <para>
+ The xpath_table function assumes that the results of each XPath query
+ might be multi-valued, so the number of rows returned by the function
+ may not be the same as the number of input documents. The first row
+ returned contains the first result from each query, the second row the
+ second result from each query. If one of the queries has fewer values
+ than the others, NULLs will be returned instead.
+ </para>
+ <para>
+ In some cases, a user will know that a given XPath query will return
+ only a single result (perhaps a unique document identifier) - if used
+ alongside an XPath query returning multiple results, the single-valued
+ result will appear only on the first row of the result. The solution
+ to this is to use the key field as part of a join against a simpler
+ XPath query. As an example:
+ </para>
+
+ <para>
+ <literal>
+ CREATE TABLE test
+ (
+ id int4 NOT NULL,
+ xml text,
+ CONSTRAINT pk PRIMARY KEY (id)
+ )
+ WITHOUT OIDS;
+
+ INSERT INTO test VALUES (1, '&lt;doc num="C1">
+ &lt;line num="L1">&lt;a>1&lt;/a>&lt;b>2&lt;/b>&lt;c>3&lt;/c>&lt;/line>
+ &lt;line num="L2">&lt;a>11&lt;/a>&lt;b>22&lt;/b>&lt;c>33&lt;/c>&lt;/line>
+ &lt;/doc>');
+
+ INSERT INTO test VALUES (2, '&lt;doc num="C2">
+ &lt;line num="L1">&lt;a>111&lt;/a>&lt;b>222&lt;/b>&lt;c>333&lt;/c>&lt;/line>
+ &lt;line num="L2">&lt;a>111&lt;/a>&lt;b>222&lt;/b>&lt;c>333&lt;/c>&lt;/line>
+ &lt;/doc>');
+ </literal>
+ </para>
+ </sect3>
+
+ <sect3>
+ <title>The query</title>
+
+ <programlisting>
+ SELECT * FROM xpath_table('id','xml','test',
+ '/doc/@num|/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
+ AS t(id int4, doc_num varchar(10), line_num varchar(10), val1 int4,
+ val2 int4, val3 int4)
+ WHERE id = 1 ORDER BY doc_num, line_num
+ </programlisting>
+
+ <para>
+ Gives the result:
+ </para>
+
+ <programlisting>
+ id | doc_num | line_num | val1 | val2 | val3
+ ----+---------+----------+------+------+------
+ 1 | C1 | L1 | 1 | 2 | 3
+ 1 | | L2 | 11 | 22 | 33
+ </programlisting>
+
+ <para>
+ To get doc_num on every line, the solution is to use two invocations
+ of xpath_table and join the results:
+ </para>
+
+ <programlisting>
+ SELECT t.*,i.doc_num FROM
+ xpath_table('id','xml','test',
+ '/doc/line/@num|/doc/line/a|/doc/line/b|/doc/line/c','1=1')
+ AS t(id int4, line_num varchar(10), val1 int4, val2 int4, val3 int4),
+ xpath_table('id','xml','test','/doc/@num','1=1')
+ AS i(id int4, doc_num varchar(10))
+ WHERE i.id=t.id AND i.id=1
+ ORDER BY doc_num, line_num;
+ </programlisting>
+
+ <para>
+ which gives the desired result:
+ </para>
+
+ <programlisting>
+ id | line_num | val1 | val2 | val3 | doc_num
+ ----+----------+------+------+------+---------
+ 1 | L1 | 1 | 2 | 3 | C1
+ 1 | L2 | 11 | 22 | 33 | C1
+ (2 rows)
+ </programlisting>
+ </sect3>
+ </sect2>
+
+
+ <sect2>
+ <title>XSLT functions</title>
+ <para>
+ The following functions are available if libxslt is installed (this is
+ not currently detected automatically, so you will have to amend the
+ Makefile)
+ </para>
+
+ <sect3>
+ <title><literal>xslt_process</literal></title>
+ <programlisting>
+ xslt_process(document,stylesheet,paramlist) RETURNS text
+ </programlisting>
+
+ <para>
+ This function appplies the XSL stylesheet to the document and returns
+ the transformed result. The paramlist is a list of parameter
+ assignments to be used in the transformation, specified in the form
+ 'a=1,b=2'. Note that this is also proof-of-concept code and the
+ parameter parsing is very simple-minded (e.g. parameter values cannot
+ contain commas!)
+ </para>
+ <para>
+ Also note that if either the document or stylesheet values do not
+ begin with a < then they will be treated as URLs and libxslt will
+ fetch them. It thus follows that you can use xslt_process as a means
+ to fetch the contents of URLs - you should be aware of the security
+ implications of this.
+ </para>
+ <para>
+ There is also a two-parameter version of xslt_process which does not
+ pass any parameters to the transformation.
+ </para>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>Credits</title>
+ <para>
+ Development of this module was sponsored by Torchbox Ltd. (www.torchbox.com)
+ It has the same BSD licence as PostgreSQL.
+ </para>
+ <para>
+ This version of the XML functions provides both XPath querying and
+ XSLT functionality. There is also a new table function which allows
+ the straightforward return of multiple XML results. Note that the current code
+ doesn't take any particular care over character sets - this is
+ something that should be fixed at some point!
+ </para>
+ <para>
+ If you have any comments or suggestions, please do contact me at
+ <email>jgray@azuli.co.uk.</email> Unfortunately, this isn't my main job, so
+ I can't guarantee a rapid response to your query!
+ </para>
+ </sect2>
+</sect1>
+