
Commit a2a8c7a

Support hex-string input and output for type BYTEA.
Both hex format and the traditional "escape" format are automatically handled on input. The output format is selected by the new GUC variable bytea_output.

As committed, bytea_output defaults to HEX, which is an *incompatible change*. We will keep it this way for a while for testing purposes, but should consider whether to switch to the more backwards-compatible default of ESCAPE before 8.5 is released.

Peter Eisentraut
1 parent f192e4a commit a2a8c7a

21 files changed, +442 -111 lines changed

doc/src/sgml/config.sgml

+18 -1
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.222 2009/07/16 20:55:44 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.223 2009/08/04 16:08:35 tgl Exp $ -->

 <chapter Id="runtime-config">
 <title>Server Configuration</title>
@@ -4060,6 +4060,23 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 </listitem>
 </varlistentry>

+<varlistentry id="guc-bytea-output" xreflabel="bytea_output">
+<term><varname>bytea_output</varname> (<type>enum</type>)</term>
+<indexterm>
+<primary><varname>bytea_output</> configuration parameter</primary>
+</indexterm>
+<listitem>
+<para>
+Sets the output format for values of type <type>bytea</type>.
+Valid values are <literal>hex</literal> (the default)
+and <literal>escape</literal> (the traditional PostgreSQL
+format). See <xref linkend="datatype-binary"> for more
+information. The <type>bytea</type> type always
+accepts both formats on input, regardless of this setting.
+</para>
+</listitem>
+</varlistentry>
+
 <varlistentry id="guc-xmlbinary" xreflabel="xmlbinary">
 <term><varname>xmlbinary</varname> (<type>enum</type>)</term>
 <indexterm>

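Note on the bytea_output documentation above: the two output formats can be illustrated with a small standalone sketch. This is not the code added by this commit. The names ByteaOutput, OUTPUT_ESCAPE, OUTPUT_HEX and format_bytea are hypothetical, and the escape branch assumes the rule stated in datatype.sgml (backslash is doubled, octets outside the range 32 to 126 become a backslash plus three octal digits).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two bytea_output settings. */
typedef enum { OUTPUT_ESCAPE, OUTPUT_HEX } ByteaOutput;

/*
 * Render len bytes from src as NUL-terminated text in dst and return the
 * text length. dst must have room for 4 * len + 3 bytes, which covers the
 * worst case of either format plus the terminator.
 */
size_t
format_bytea(const unsigned char *src, size_t len, ByteaOutput fmt, char *dst)
{
    static const char hexdigits[] = "0123456789abcdef";
    char *p = dst;

    assert(src != NULL && dst != NULL);

    if (fmt == OUTPUT_HEX)
    {
        /* Hex format: "\x" prefix, then two digits per byte, high nibble first. */
        *p++ = '\\';
        *p++ = 'x';
        for (size_t i = 0; i < len; i++)
        {
            *p++ = hexdigits[src[i] >> 4];
            *p++ = hexdigits[src[i] & 0x0F];
        }
    }
    else
    {
        /* Escape format (assumed rule, see lead-in). */
        for (size_t i = 0; i < len; i++)
        {
            unsigned char c = src[i];

            if (c == '\\')
            {
                *p++ = '\\';
                *p++ = '\\';
            }
            else if (c < 32 || c > 126)
            {
                *p++ = '\\';
                *p++ = (char) ('0' + ((c >> 6) & 3));
                *p++ = (char) ('0' + ((c >> 3) & 7));
                *p++ = (char) ('0' + (c & 7));
            }
            else
                *p++ = (char) c;
        }
    }
    *p = '\0';
    return (size_t) (p - dst);
}
```

Under these assumptions the same four bytes DE AD BE EF come out as a compact ten-character string in hex mode, while escape mode expands every non-printable byte to four characters.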
doc/src/sgml/datatype.sgml

+77 -15
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.240 2009/07/08 17:21:55 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.241 2009/08/04 16:08:35 tgl Exp $ -->

 <chapter id="datatype">
 <title id="datatype-title">Data Types</title>
@@ -1177,7 +1177,7 @@ SELECT b, char_length(b) FROM test2;
 <para>
 A binary string is a sequence of octets (or bytes). Binary
 strings are distinguished from character strings in two
-ways: First, binary strings specifically allow storing
+ways. First, binary strings specifically allow storing
 octets of value zero and other <quote>non-printable</quote>
 octets (usually, octets outside the range 32 to 126).
 Character strings disallow zero octets, and also disallow any
@@ -1191,13 +1191,82 @@ SELECT b, char_length(b) FROM test2;
 </para>

 <para>
-When entering <type>bytea</type> values, octets of certain
-values <emphasis>must</emphasis> be escaped (but all octet
-values <emphasis>can</emphasis> be escaped) when used as part
-of a string literal in an <acronym>SQL</acronym> statement. In
+The <type>bytea</type> type supports two external formats for
+input and output: <productname>PostgreSQL</productname>'s historical
+<quote>escape</quote> format, and <quote>hex</quote> format. Both
+of these are always accepted on input. The output format depends
+on the configuration parameter <xref linkend="guc-bytea-output">;
+the default is hex. (Note that the hex format was introduced in
+<productname>PostgreSQL</productname> 8.5; earlier versions and some
+tools don't understand it.)
+</para>
+
+<para>
+The <acronym>SQL</acronym> standard defines a different binary
+string type, called <type>BLOB</type> or <type>BINARY LARGE
+OBJECT</type>. The input format is different from
+<type>bytea</type>, but the provided functions and operators are
+mostly the same.
+</para>
+
+<sect2>
+<title><type>bytea</> hex format</title>
+
+<para>
+The <quote>hex</> format encodes binary data as 2 hexadecimal digits
+per byte, most significant nibble first. The entire string is
+preceded by the sequence <literal>\x</literal> (to distinguish it
+from the escape format). In some contexts, the initial backslash may
+need to be escaped by doubling it, in the same cases in which backslashes
+have to be doubled in escape format; details appear below.
+The hexadecimal digits can
+be either upper or lower case, and whitespace is permitted between
+digit pairs (but not within a digit pair nor in the starting
+<literal>\x</literal> sequence).
+The hex format is compatible with a wide
+range of external applications and protocols, and it tends to be
+faster to convert than the escape format, so its use is preferred.
+</para>
+
+<para>
+Example:
+<programlisting>
+SELECT E'\\xDEADBEEF';
+</programlisting>
+</para>
+</sect2>
+
+<sect2>
+<title><type>bytea</> escape format</title>
+
+<para>
+The <quote>escape</quote> format is the traditional
+<productname>PostgreSQL</productname> format for the <type>bytea</type>
+type. It
+takes the approach of representing a binary string as a sequence
+of ASCII characters, while converting those bytes that cannot be
+represented as an ASCII character into special escape sequences.
+If, from the point of view of the application, representing bytes
+as characters makes sense, then this representation can be
+convenient. But in practice it is usually confusing becauses it
+fuzzes up the distinction between binary strings and character
+strings, and also the particular escape mechanism that was chosen is
+somewhat unwieldy. So this format should probably be avoided
+for most new applications.
+</para>
+
+<para>
+When entering <type>bytea</type> values in escape format,
+octets of certain
+values <emphasis>must</emphasis> be escaped, while all octet
+values <emphasis>can</emphasis> be escaped. In
 general, to escape an octet, convert it into its three-digit
 octal value and precede it
-by two backslashes. <xref linkend="datatype-binary-sqlesc">
+by a backslash (or two backslashes, if writing the value as a
+literal using escape string syntax).
+Backslash itself (octet value 92) can alternatively be represented by
+double backslashes.
+<xref linkend="datatype-binary-sqlesc">
 shows the characters that must be escaped, and gives the alternative
 escape sequences where applicable.
 </para>
@@ -1343,14 +1412,7 @@ SELECT b, char_length(b) FROM test2;
 have to escape line feeds and carriage returns if your interface
 automatically translates these.
 </para>
-
-<para>
-The <acronym>SQL</acronym> standard defines a different binary
-string type, called <type>BLOB</type> or <type>BINARY LARGE
-OBJECT</type>. The input format is different from
-<type>bytea</type>, but the provided functions and operators are
-mostly the same.
-</para>
+</sect2>
 </sect1>


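Note on the escape-format input rule documented in datatype.sgml above: an octet may be written as a backslash followed by its three-digit octal value, and a backslash may also be written as two backslashes. The following is a minimal sketch of a parser for that rule, not the server's input routine. The name parse_escape_bytea is hypothetical, the input is assumed to be the text after any SQL string-literal processing (so single backslashes), and the sketch rejects octal values above 377 because they do not fit in one octet.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Parse escape-format text into dst. Returns the number of bytes written,
 * or -1 if a backslash is not followed by another backslash or by three
 * octal digits. dst needs room for at most strlen(text) bytes.
 */
long
parse_escape_bytea(const char *text, unsigned char *dst)
{
    const char *s = text;
    long n = 0;

    assert(text != NULL && dst != NULL);

    while (*s != '\0')
    {
        if (*s != '\\')
        {
            /* Ordinary character: stored as its own byte value. */
            dst[n++] = (unsigned char) *s++;
        }
        else if (s[1] == '\\')
        {
            /* Doubled backslash stands for one backslash (octet 92). */
            dst[n++] = '\\';
            s += 2;
        }
        else if (s[1] >= '0' && s[1] <= '3' &&
                 s[2] >= '0' && s[2] <= '7' &&
                 s[3] >= '0' && s[3] <= '7')
        {
            /* Backslash plus three octal digits, 000 through 377. */
            dst[n++] = (unsigned char) (((s[1] - '0') << 6) |
                                        ((s[2] - '0') << 3) |
                                        (s[3] - '0'));
            s += 4;
        }
        else
            return -1;
    }
    return n;
}
```

The short-circuit evaluation in the octal branch stops at the string terminator, so truncated input such as a backslash followed by only two digits is rejected without reading past the end.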
src/backend/catalog/pg_largeobject.c

+2 -2
@@ -8,7 +8,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/catalog/pg_largeobject.c,v 1.32 2009/01/01 17:23:37 momjian Exp $
+* $PostgreSQL: pgsql/src/backend/catalog/pg_largeobject.c,v 1.33 2009/08/04 16:08:36 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -18,7 +18,7 @@
 #include "access/heapam.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_largeobject.h"
-#include "utils/builtins.h"
+#include "utils/bytea.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/tqual.h"

src/backend/commands/trigger.c

+2 -1
@@ -7,7 +7,7 @@
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/commands/trigger.c,v 1.251 2009/07/30 02:45:36 tgl Exp $
+* $PostgreSQL: pgsql/src/backend/commands/trigger.c,v 1.252 2009/08/04 16:08:36 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -37,6 +37,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/bytea.h"
 #include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"

src/backend/optimizer/path/indxpath.c

+2 -1
@@ -9,7 +9,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/optimizer/path/indxpath.c,v 1.240 2009/06/11 14:48:58 momjian Exp $
+* $PostgreSQL: pgsql/src/backend/optimizer/path/indxpath.c,v 1.241 2009/08/04 16:08:36 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -31,6 +31,7 @@
 #include "optimizer/restrictinfo.h"
 #include "optimizer/var.h"
 #include "utils/builtins.h"
+#include "utils/bytea.h"
 #include "utils/lsyscache.h"
 #include "utils/pg_locale.h"
 #include "utils/selfuncs.h"

src/backend/utils/adt/encode.c

+5 -5
@@ -7,7 +7,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/utils/adt/encode.c,v 1.23 2009/01/01 17:23:49 momjian Exp $
+* $PostgreSQL: pgsql/src/backend/utils/adt/encode.c,v 1.24 2009/08/04 16:08:36 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -109,7 +109,7 @@ binary_decode(PG_FUNCTION_ARGS)
 * HEX
 */

-static const char *hextbl = "0123456789abcdef";
+static const char hextbl[] = "0123456789abcdef";

 static const int8 hexlookup[128] = {
 -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -122,7 +122,7 @@ static const int8 hexlookup[128] = {
 -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
 };

-static unsigned
+unsigned
 hex_encode(const char *src, unsigned len, char *dst)
 {
 const char *end = src + len;
@@ -136,7 +136,7 @@ hex_encode(const char *src, unsigned len, char *dst)
 return len * 2;
 }

-static char
+static inline char
 get_hex(char c)
 {
 int res = -1;
@@ -152,7 +152,7 @@ get_hex(char c)
 return (char) res;
 }

-static unsigned
+unsigned
 hex_decode(const char *src, unsigned len, char *dst)
 {
 const char *s,

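Note on the encode.c hunks above: the diff only shows the signatures of hex_encode and hex_decode, the "return len * 2" line, and the head of the 128-entry lookup table, so the following is a standalone sketch of the table-driven technique rather than a copy of the PostgreSQL functions. The names sketch_get_hex, sketch_hex_encode and sketch_hex_decode are hypothetical. The table stores each digit's value plus one so that unlisted entries default to "invalid", and the decoder signals errors with -1, which is an assumption of this sketch. Whitespace is skipped only between digit pairs, matching the rule in datatype.sgml.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char hexdigits[] = "0123456789abcdef";

/* Digit value plus one, indexed by ASCII code; 0 marks "not a hex digit". */
static const signed char hexval_plus1[128] = {
    ['0'] = 1,  ['1'] = 2,  ['2'] = 3,  ['3'] = 4,  ['4'] = 5,
    ['5'] = 6,  ['6'] = 7,  ['7'] = 8,  ['8'] = 9,  ['9'] = 10,
    ['a'] = 11, ['b'] = 12, ['c'] = 13, ['d'] = 14, ['e'] = 15, ['f'] = 16,
    ['A'] = 11, ['B'] = 12, ['C'] = 13, ['D'] = 14, ['E'] = 15, ['F'] = 16,
};

/* Return the value of hex digit c (either case), or -1 if it is not one. */
int
sketch_get_hex(char c)
{
    unsigned char u = (unsigned char) c;

    return (u < 128) ? hexval_plus1[u] - 1 : -1;
}

/* Write two lower-case digits per input byte; returns len * 2. */
unsigned
sketch_hex_encode(const char *src, unsigned len, char *dst)
{
    assert(src != NULL && dst != NULL);
    for (unsigned i = 0; i < len; i++)
    {
        unsigned char b = (unsigned char) src[i];

        *dst++ = hexdigits[b >> 4];
        *dst++ = hexdigits[b & 0x0F];
    }
    return len * 2;
}

/*
 * Decode digit pairs from src into dst. Returns the number of bytes
 * written, or -1 on an invalid digit or an incomplete pair.
 */
int
sketch_hex_decode(const char *src, unsigned len, char *dst)
{
    const char *s = src;
    const char *end = src + len;
    char *p = dst;

    assert(src != NULL && dst != NULL);
    while (s < end)
    {
        int hi, lo;

        /* Whitespace is allowed between pairs, never inside one. */
        if (*s == ' ' || *s == '\n' || *s == '\t' || *s == '\r')
        {
            s++;
            continue;
        }
        hi = sketch_get_hex(*s++);
        if (hi < 0 || s >= end)
            return -1;
        lo = sketch_get_hex(*s++);
        if (lo < 0)
            return -1;
        *p++ = (char) ((hi << 4) | lo);
    }
    return (int) (p - dst);
}
```

A caller sizes the text buffer at twice the input length for encoding, and at most half the text length for decoding. Neither function handles the leading \x of the bytea hex format, which a caller would add or strip separately.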
src/backend/utils/adt/selfuncs.c

+2 -1
@@ -15,7 +15,7 @@
 *
 *
 * IDENTIFICATION
-* $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.261 2009/06/11 14:49:04 momjian Exp $
+* $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.262 2009/08/04 16:08:36 tgl Exp $
 *
 *-------------------------------------------------------------------------
 */
@@ -109,6 +109,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parsetree.h"
 #include "utils/builtins.h"
+#include "utils/bytea.h"
 #include "utils/date.h"
 #include "utils/datum.h"
 #include "utils/fmgroids.h"
