- <!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.240 2009/07/08 17:21:55 tgl Exp $ -->
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.241 2009/08/04 16:08:35 tgl Exp $ -->
<chapter id="datatype">
<title id="datatype-title">Data Types</title>
@@ -1177,7 +1177,7 @@ SELECT b, char_length(b) FROM test2;
<para>
A binary string is a sequence of octets (or bytes). Binary
strings are distinguished from character strings in two
- ways: First, binary strings specifically allow storing
+ ways. First, binary strings specifically allow storing
octets of value zero and other <quote>non-printable</quote>
octets (usually, octets outside the range 32 to 126).
Character strings disallow zero octets, and also disallow any
@@ -1191,13 +1191,82 @@ SELECT b, char_length(b) FROM test2;
</para>
<para>
- When entering <type>bytea</type> values, octets of certain
- values <emphasis>must</emphasis> be escaped (but all octet
- values <emphasis>can</emphasis> be escaped) when used as part
- of a string literal in an <acronym>SQL</acronym> statement. In
+ The <type>bytea</type> type supports two external formats for
+ input and output: <productname>PostgreSQL</productname>'s historical
+ <quote>escape</quote> format, and <quote>hex</quote> format. Both
+ of these are always accepted on input. The output format depends
+ on the configuration parameter <xref linkend="guc-bytea-output">;
+ the default is hex. (Note that the hex format was introduced in
+ <productname>PostgreSQL</productname> 8.5; earlier versions and some
+ tools don't understand it.)
+ </para>
+
+ <para>
+ The <acronym>SQL</acronym> standard defines a different binary
+ string type, called <type>BLOB</type> or <type>BINARY LARGE
+ OBJECT</type>. The input format is different from
+ <type>bytea</type>, but the provided functions and operators are
+ mostly the same.
+ </para>
+
+ <sect2>
+ <title><type>bytea</> hex format</title>
+
+ <para>
+ The <quote>hex</> format encodes binary data as 2 hexadecimal digits
+ per byte, most significant nibble first. The entire string is
+ preceded by the sequence <literal>\x</literal> (to distinguish it
+ from the escape format). In some contexts, the initial backslash may
+ need to be escaped by doubling it, in the same cases in which backslashes
+ have to be doubled in escape format; details appear below.
+ The hexadecimal digits can
+ be either upper or lower case, and whitespace is permitted between
+ digit pairs (but not within a digit pair nor in the starting
+ <literal>\x</literal> sequence).
+ The hex format is compatible with a wide
+ range of external applications and protocols, and it tends to be
+ faster to convert than the escape format, so its use is preferred.
+ </para>
+
+ <para>
+ Example:
+ <programlisting>
+ SELECT E'\\xDEADBEEF';
+ </programlisting>
+ </para>
+ </sect2>
+
+ <sect2>
+ <title><type>bytea</> escape format</title>
+
+ <para>
+ The <quote>escape</quote> format is the traditional
+ <productname>PostgreSQL</productname> format for the <type>bytea</type>
+ type. It
+ takes the approach of representing a binary string as a sequence
+ of ASCII characters, while converting those bytes that cannot be
+ represented as an ASCII character into special escape sequences.
+ If, from the point of view of the application, representing bytes
+ as characters makes sense, then this representation can be
+ convenient. But in practice it is usually confusing because it
+ fuzzes up the distinction between binary strings and character
+ strings, and also the particular escape mechanism that was chosen is
+ somewhat unwieldy. So this format should probably be avoided
+ for most new applications.
+ </para>
+
+ <para>
+ When entering <type>bytea</type> values in escape format,
+ octets of certain
+ values <emphasis>must</emphasis> be escaped, while all octet
+ values <emphasis>can</emphasis> be escaped. In
general, to escape an octet, convert it into its three-digit
octal value and precede it
- by two backslashes. <xref linkend="datatype-binary-sqlesc">
+ by a backslash (or two backslashes, if writing the value as a
+ literal using escape string syntax).
+ Backslash itself (octet value 92) can alternatively be represented by
+ double backslashes.
+ <xref linkend="datatype-binary-sqlesc">
shows the characters that must be escaped, and gives the alternative
escape sequences where applicable.
</para>
@@ -1343,14 +1412,7 @@ SELECT b, char_length(b) FROM test2;
have to escape line feeds and carriage returns if your interface
automatically translates these.
</para>
-
- <para>
- The <acronym>SQL</acronym> standard defines a different binary
- string type, called <type>BLOB</type> or <type>BINARY LARGE
- OBJECT</type>. The input format is different from
- <type>bytea</type>, but the provided functions and operators are
- mostly the same.
- </para>
+ </sect2>
</sect1>
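
For readers reviewing this change, the two input formats described in the added text can be sketched as a pair of small Python decoders. This is only an illustration of the rules as the patch words them, not PostgreSQL's implementation. The function names `decode_bytea_hex` and `decode_bytea_escape` are hypothetical, the exact set of whitespace characters accepted between hex digit pairs is an assumption, and both functions expect the value after any SQL string-literal processing (that is, with a single leading backslash, not the doubled form used inside `E'...'`).

```python
import re

# Exactly two hexadecimal digits, upper or lower case.
_HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")

# Assumption: the whitespace allowed between digit pairs is limited to these.
_WHITESPACE = " \t\n\r"


def decode_bytea_hex(text: str) -> bytes:
    """Decode hex format: a leading \\x, then two hex digits per byte."""
    if not text.startswith("\\x"):
        raise ValueError("hex format must begin with \\x")
    body = text[2:]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] in _WHITESPACE:
            # Whitespace is permitted between digit pairs only.
            i += 1
            continue
        pair = body[i:i + 2]
        if not _HEX_PAIR.fullmatch(pair):
            raise ValueError(f"invalid hex digit pair at offset {i}: {pair!r}")
        # Most significant nibble comes first.
        out.append(int(pair, 16))
        i += 2
    return bytes(out)


def decode_bytea_escape(text: str) -> bytes:
    """Decode escape format: ASCII characters, \\ooo octal escapes, and \\\\."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            if ord(ch) > 127:
                raise ValueError(f"non-ASCII character at offset {i}")
            out.append(ord(ch))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            # A double backslash stands for octet value 92.
            out.append(0x5C)
            i += 2
        else:
            digits = text[i + 1:i + 4]
            valid = len(digits) == 3 and all(d in "01234567" for d in digits)
            if not valid or int(digits, 8) > 255:
                raise ValueError(f"invalid octal escape at offset {i}")
            out.append(int(digits, 8))
            i += 4
    return bytes(out)
```

Under these assumptions, `decode_bytea_hex("\\xde ad BE ef")` and `decode_bytea_hex("\\xDEADBEEF")` yield the same four bytes, while a space inside a digit pair is rejected.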