From 414c5a2ea65cbd38d79ffdf9b1fde7cc75c134e0 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Tue, 8 Feb 2011 23:04:18 +0200
Subject: Per-column collation support
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
---
doc/src/sgml/catalogs.sgml | 110 ++++++++++++++++++++++++
doc/src/sgml/charset.sgml | 164 ++++++++++++++++++++++++++++++++++++
doc/src/sgml/func.sgml | 15 +++-
doc/src/sgml/indices.sgml | 42 +++++++++
doc/src/sgml/ref/create_domain.sgml | 12 +++
doc/src/sgml/ref/create_index.sgml | 16 +++-
doc/src/sgml/ref/create_table.sgml | 13 ++-
doc/src/sgml/ref/create_type.sgml | 11 +++
doc/src/sgml/regress.sgml | 21 +++++
doc/src/sgml/syntax.sgml | 48 +++++++++++
10 files changed, 448 insertions(+), 4 deletions(-)
(limited to 'doc/src')
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index be132f2eb7b..f31662c2720 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -103,6 +103,11 @@
check constraints, unique constraints, primary key constraints, foreign key constraints
+
+ pg_collation
+ collations (locale information)
+
+
pg_conversionencoding conversion information
@@ -1113,6 +1118,16 @@
+
+ attcollation
+ oid
+ pg_collation.oid
+
+ The defined collation of the column, zero if the column does
+ not have a collatable type.
+
+
+
attaclaclitem[]
@@ -2050,6 +2065,76 @@
+
+ pg_collation
+
+
+ pg_collation
+
+
+
+ The catalog pg_collation describes the
+ available collations, which are essentially mappings from an SQL
+ name to operating system locale categories.
+ See for more information.
+
+
+
+ pg_collation> Columns
+
+
+
+
+ Name
+ Type
+ References
+ Description
+
+
+
+
+
+ collname
+ name
+
+ Collation name (unique per namespace and encoding)
+
+
+
+ collnamespace
+ oid
+ pg_namespace.oid
+
+ The OID of the namespace that contains this collation
+
+
+
+
+ collencoding
+ int4
+
+ Encoding to which the collation is applicable
+
+
+
+ collcollate
+ name
+
+ LC_COLLATE for this collation object
+
+
+
+ collctype
+ name
+
+ LC_CTYPE for this collation object
+
+
+
+
+
+
+
pg_conversion
@@ -3125,6 +3210,16 @@
+
+ indcollation
+ oidvector
+ pg_collation.oid
+
+ For each column in the index key, this contains the OID of the
+ collation to use for the index.
+
+
+
indclassoidvector
@@ -5866,6 +5961,21 @@
+
+ typcollation
+ oid
+ pg_collation.oid
+
+ typcollation specifies the collation
+ of the type. If a type does not support collations, this will
+ be zero, collation analysis at parse time is skipped, and
+ the use of COLLATE clauses with the type is
+ invalid. A base type that supports collations will have
+ DEFAULT_COLLATION_OID here. A domain can have
+ another collation OID, if one was defined for the domain.
+
+
+
typdefaultbinpg_node_tree
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 644c711dccc..49e1bd25b43 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -304,6 +304,170 @@ initdb --locale=sv_SE
+
+ Collation Support
+
+
+ The collation support allows specifying the sort order and certain
+ other locale aspects of data per column or per operation at run
+ time. This alleviates the problem that the
+ LC_COLLATE and LC_CTYPE settings
+ of a database cannot be changed after its creation.
+
+
+
+
+ The collation support feature is currently only known to work on
+ Linux/glibc and Mac OS X platforms.
+
+
+
+
+ Concepts
+
+
+ Conceptually, every datum of a collatable data type has a
+ collation. (Collatable data types in the base system are
+ text, varchar, and char.
+ User-defined base types can also be marked collatable.) If the
+ datum is a column reference, the collation of the datum is the
+ defined collation of the column. If the datum is a constant, the
+ collation is the default collation of the data type of the
+ constant. The collation of more complex expressions is derived
+ from the input collations as described below.
+
+
+
+ The collation of a datum can also be the default
+ collation, which reverts to the locale settings defined for the
+ database. In some cases, a datum can also have no known
+ collation. In such cases, ordering operations and other
+ operations that need to know the collation will fail.
+
+
+
+ When the database system has to perform an ordering or a
+ comparison, it considers the collation of the input data. This
+ happens in two situations: an ORDER BY clause
+ and a function or operator call such as <.
+ The collation to apply for the performance of the ORDER
+ BY clause is simply the collation of the sort key. The
+ collation to apply for a function or operator call is derived from
+ the arguments, as described below. Additionally, collations are
+ taken into account by functions that convert between lower and
+ upper case letters, that is, lower,
+ upper, and initcap.
+
+
+
+ For a function call, the collation that is derived from combining
+ the argument collations is both used for performing any
+ comparisons or ordering and for the collation of the function
+ result, if the result type is collatable.
+
+
+
+ The collation derivation of a datum can be
+ implicit or explicit. This distinction affects how collations are
+ combined when multiple different collations appear in an
+ expression. An explicit collation derivation arises when a
+ COLLATE clause is used; all other collation
+ derivations are implicit. When multiple collations need to be
+ combined, for example in a function call, the following rules are
+ used:
+
+
+
+
+ If any input item has an explicit collation derivation, then
+ all explicitly derived collations among the input items must be
+ the same, otherwise an error is raised. If an explicitly
+ derived collation is present, that is the result of the
+ collation combination.
+
+
+
+
+
+ Otherwise, all input items must have the same implicit
+ collation derivation or the default collation. If an
+ implicitly derived collation is present, that is the result of
+ the collation combination. Otherwise, the result is the
+ default collation.
+
+
+
+
+ For example, take this table definition:
+
+CREATE TABLE test1 (
+ a text COLLATE "x",
+ ...
+);
+
+
+ Then in
+
+SELECT a || 'foo' FROM test1;
+
+ the result collation of the || operator is
+ "x" because it combines an implicitly derived
+ collation with the default collation. But in
+
+SELECT a || ('foo' COLLATE "y") FROM test1;
+
+ the result collation is "y" because the explicit
+ collation derivation overrides the implicit one.
+
+
+
+
+ Managing Collations
+
+
+ A collation is an SQL schema object that maps an SQL name to
+ operating system locales. In particular, it maps to a combination
+ of LC_COLLATE and LC_CTYPE. (As
+ the name would indicate, the main purpose of a collation is to set
+ LC_COLLATE, which controls the sort order. But
+ it is rarely necessary in practice to have an
+ LC_CTYPE setting that is different from
+ LC_COLLATE, so it is more convenient to collect
+ these under one concept than to create another infrastructure for
+ setting LC_CTYPE per datum.) Also, a collation
+ is tied to a character encoding. The same collation name may
+ exist for different encodings.
+
+
+
+ When a database system is initialized, initdb
+ populates the system catalog pg_collation with
+ collations based on all the locales it finds on the operating
+ system at the time. For example, the operating system might
+ provide a locale named de_DE.utf8.
+ initdb would then create a collation named
+ de_DE.utf8 for encoding UTF8
+ that has both LC_COLLATE and
+ LC_CTYPE set to de_DE.utf8.
+ It will also create a collation with the .utf8
+ tag stripped off the name. So you could also use the collation
+ under the name de_DE, which is less cumbersome
+ to write and makes the name less encoding-dependent. Note that,
+ nevertheless, the initial set of collation names is
+ platform-dependent.
+
+
+
+ In case a collation is needed that has different values for
+ LC_COLLATE and LC_CTYPE, or a
+ different name is needed for a collation (for example, for
+ compatibility with existing applications), a new collation may be
+ created. But there is currently no SQL-level support for creating
+ or changing collations.
+
+
+
+
Character Set Support
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 897078ac3e6..70a1bd9ee9f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13059,6 +13059,12 @@ SELECT relname FROM pg_class WHERE pg_table_is_visible(oid);
+
+ pg_collation_is_visible(collation_oid)
+
+ boolean
+ is collation visible in search path
+ pg_conversion_is_visible(conversion_oid)
@@ -13123,6 +13129,9 @@ SELECT relname FROM pg_class WHERE pg_table_is_visible(oid);
+
+ pg_collation_is_visible
+ pg_conversion_is_visible
@@ -13256,7 +13265,7 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
- format_type(type_oid, typemod>)
+ format_type(type_oid [, typemod> [, collation_oid> ]])textget SQL name of a data type
@@ -13392,7 +13401,9 @@ SELECT pg_type_is_visible('myschema.widget'::regtype);
format_type returns the SQL name of a data type that
is identified by its type OID and possibly a type modifier. Pass NULL
- for the type modifier if no specific modifier is known.
+ for the type modifier or omit the argument if no specific modifier is known.
+ If a collation is given as third argument, a COLLATE> clause
+ followed by a formatted collation name is appended.
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 8fec13ee1d8..15fbe0d614e 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -921,6 +921,7 @@ CREATE INDEX name ON table
defining two operator classes for the data type and then selecting
the proper class when making an index. The operator class determines
the basic sort ordering (which can then be modified by adding sort options
+ COLLATE,
ASC>/DESC> and/or
NULLS FIRST>/NULLS LAST>).
@@ -1002,6 +1003,47 @@ SELECT am.amname AS index_method,
+
+ Collations and Indexes
+
+
+ An index can only support one collation for one column or
+ expression. If multiple collations are of interest, multiple
+ indexes may be created.
+
+
+
+ Consider these statements:
+
+CREATE TABLE test1c (
+ id integer,
+ content varchar COLLATE "x"
+);
+
+CREATE INDEX test1c_content_index ON test1c (content);
+
+ The created index automatically follows the collation of the
+ underlying column, and so a query of the form
+
+SELECT * FROM test1c WHERE content = constant;
+
+ could use the index.
+
+
+
+ If in addition, a query of the form, say,
+
+SELECT * FROM test1c WHERE content > constant COLLATE "y";
+
+ is of interest, an additional index could be created that supports
+ the "y" collation, like so:
+
+CREATE INDEX test1c_content_index ON test1c (content COLLATE "y");
+
+
+
+
+
Examining Index Usage
diff --git a/doc/src/sgml/ref/create_domain.sgml b/doc/src/sgml/ref/create_domain.sgml
index 87a7654d6ce..83be889c6d9 100644
--- a/doc/src/sgml/ref/create_domain.sgml
+++ b/doc/src/sgml/ref/create_domain.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
CREATE DOMAIN name [ AS ] data_type
+ [ COLLATE collation ]
[ DEFAULT expression ]
[ constraint [ ... ] ]
@@ -83,6 +84,17 @@ CREATE DOMAIN name [ AS ]
+
+ collation
+
+
+ An optional collation for the domain. If no collation is
+ specified, the database default collation is used (which can
+ be overridden when the domain is used to define a column).
+
+
+
+
DEFAULT expression
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 45c298e371e..8ec7abbbd49 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -22,7 +22,7 @@ PostgreSQL documentation
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
- ( { column | ( expression ) } [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
+ ( { column | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace ]
[ WHERE predicate ]
@@ -181,6 +181,20 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name
+
+ collation
+
+
+ The name of the collation to use for the index. By default,
+ the index uses the collation declared for the column to be
+ indexed or the result collation of the expression to be
+ indexed. Indexes with nondefault collations are
+ available for use by queries that involve expressions using
+ nondefault collations.
+
+
+
+
opclass
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 5bffe303364..9d2d99ad2e3 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -22,7 +22,7 @@ PostgreSQL documentation
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
- { column_namedata_type [ column_constraint [ ... ] ]
+ { column_namedata_type [ COLLATE collation ] [ column_constraint [ ... ] ]
| table_constraint
| LIKE parent_table [ like_option ... ] }
[, ... ]
@@ -244,6 +244,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
+
+ COLLATE collation
+
+
+ The COLLATE> clause assigns a nondefault collation to
+ the column. By default, the locale settings of the database are
+ used.
+
+
+
+
INHERITS ( parent_table [, ... ] )
diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index a3c75b51d0f..805fc2cf760 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -45,6 +45,7 @@ CREATE TYPE name (
[ , DEFAULT = default ]
[ , ELEMENT = element ]
[ , DELIMITER = delimiter ]
+ [ , COLLATABLE = collatable ]
)
CREATE TYPE name
@@ -352,6 +353,16 @@ CREATE TYPE name
with the array element type, not the array type itself.
+
+ If the optional
+ parameter collatable
+ is true, column definitions and expressions of the type may carry
+ collation information and allow the use of
+ the COLLATE clause. It is up to the
+ implementations of the functions operating on the type to actually
+ make use of the collation information; this does not happen
+ automatically merely by marking the type collatable.
+
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 61a87ff74d2..6f5e2b1ab88 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -236,6 +236,27 @@ gmake check LANG=C MULTIBYTE=EUC_JP
existing installation.
+
+
+ Extra tests
+
+
+ The regression test suite contains a few test files that are not
+ run by default, because they might be platform-dependent or take a
+ very long time to run. You can run these or other extra test
+ files by setting the variable EXTRA_TESTS. For
+ example, to run the numeric_big test:
+
+gmake check EXTRA_TESTS=numeric_big
+
+ To run the collation tests:
+
+gmake check EXTRA_TESTS=collate.linux.utf8 LANG=en_US.utf8
+
+ This test works only on Linux/glibc platforms and when run in a
+ UTF-8 locale.
+
+
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 004205f1265..00f665ae544 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1899,6 +1899,54 @@ CAST ( expression AS type
+
+ COLLATE Clause
+
+
+ COLLATE
+
+
+
+ The COLLATE clause overrides the collation of
+ an expression. It is appended to the expression it applies to:
+
+expr COLLATE collation
+
+ where collation is a possibly
+ schema-qualified identifier. The COLLATE
+ clause binds tighter than operators; parentheses can be used when
+ necessary.
+
+
+
+ If no collation is explicitly specified, the database system
+ either derives a collation from the columns involved in the
+ expression, or it defaults to the default collation of the
+ database if no column is involved in the expression.
+
+
+
+ The two typical uses of the COLLATE clause are
+ overriding the sort order in an ORDER BY> clause, for
+ example:
+
+SELECT a, b, c FROM tbl WHERE ... ORDER BY a COLLATE "C";
+
+ and overriding the collation of a function or operator call that
+ has locale-sensitive results, for example:
+
+SELECT * FROM tbl WHERE a > 'foo' COLLATE "C";
+
+ In the latter case it doesn't matter which argument of the
+ operator of function call the COLLATE> clause is
+ attached to, because the collation that is applied by the operator
+ or function is derived from all arguments, and
+ the COLLATE> clause will override the collations of all
+ other arguments. Attaching nonmatching COLLATE>
+ clauses to more than one argument, however, is an error.
+
+
+
Scalar Subqueries
--
cgit v1.2.3