- <!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.123 2008/06/26 22:24:42 momjian Exp $ -->
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.124 2008/10/29 08:04:52 petere Exp $ -->

<chapter id="sql-syntax">
<title>SQL Syntax</title>
@@ -189,6 +189,57 @@ UPDATE "my_table" SET "a" = 5;
ampersands. The length limitation still applies.
</para>

+ <para>
+ <indexterm><primary>Unicode escape</primary><secondary>in
+ identifiers</secondary></indexterm> A variant of quoted
+ identifiers allows including escaped Unicode characters identified
+ by their code points. This variant starts
+ with <literal>U&</literal> (upper or lower case U followed by
+ ampersand) immediately before the opening double quote, without
+ any spaces in between, for example <literal>U&"foo"</literal>.
+ (Note that this creates an ambiguity with the
+ operator <literal>&</literal>. Use spaces around the operator to
+ avoid this problem.) Inside the quotes, Unicode characters can be
+ specified in escaped form by writing a backslash followed by the
+ four-digit hexadecimal code point number or alternatively a
+ backslash followed by a plus sign followed by a six-digit
+ hexadecimal code point number. For example, the
+ identifier <literal>"data"</literal> could be written as
+ <programlisting>
+ U&"d\0061t\+000061"
+ </programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+ <programlisting>
+ U&"\0441\043B\043E\043D"
+ </programlisting>
+ </para>
+
+ <para>
+ If an escape character other than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+ <programlisting>
+ U&"d!0061t!+000061" UESCAPE '!'
+ </programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character. Note that the escape character is
+ written in single quotes, not double quotes.
+ </para>
+
+ <para>
+ To include the escape character in the identifier literally, write
+ it twice.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be specified.
+ </para>
+
<para>
Quoting an identifier also makes it case-sensitive, whereas
unquoted names are always folded to lower case. For example, the
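The decoding rules added in this hunk (escape character plus four hex digits, escape character plus a plus sign plus six hex digits, and a doubled escape character standing for itself) can be checked with a small Python sketch. `decode_unicode_escapes` is a hypothetical helper written for illustration, not part of PostgreSQL. It works only on the text between the quotes and does not model server-encoding restrictions or any other lexer behavior.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Hypothetical helper: decode the text between the quotes of a
    U&"..." identifier (or U&'...' string) into ordinary characters.
    Does not check the server encoding."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            out.append(ch)                     # ordinary character, copied as-is
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        if nxt == escape:
            out.append(escape)                 # doubled escape char -> one literal
            i += 2
            continue
        if nxt == "+":
            start, width = i + 2, 6            # escape, plus sign, six hex digits
        else:
            start, width = i + 1, 4            # escape followed by four hex digits
        digits = body[start:start + width]
        if len(digits) != width or not set(digits) <= HEX_DIGITS:
            raise ValueError(f"invalid Unicode escape at position {i}")
        out.append(chr(int(digits, 16)))       # chr() raises ValueError above U+10FFFF
        i = start + width
    return "".join(out)
```

Running it on `d\0061t\+000061` yields `data`, and with `escape="!"` the `UESCAPE '!'` variant `d!0061t!+000061` decodes to the same identifier.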
@@ -245,7 +296,7 @@ UPDATE "my_table" SET "a" = 5;
write two adjacent single quotes, e.g.
<literal>'Dianne''s horse'</literal>.
Note that this is <emphasis>not</> the same as a double-quote
- character (<literal>"</>).
+ character (<literal>"</>). <!-- font-lock sanity: " -->
</para>

<para>
@@ -269,14 +320,19 @@ SELECT 'foo' 'bar';
by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
following the standard.)
</para>
+ </sect3>

- <para>
- <indexterm>
+ <sect3 id="sql-syntax-strings-escape">
+ <title>String Constants with C-Style Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-escape">
<primary>escape string syntax</primary>
</indexterm>
- <indexterm>
+ <indexterm zone="sql-syntax-strings-escape" >
<primary>backslash escapes</primary>
</indexterm>
+
+ <para>
<productname>PostgreSQL</productname> also accepts <quote>escape</>
string constants, which are an extension to the SQL standard.
An escape string constant is specified by writing the letter
@@ -287,7 +343,8 @@ SELECT 'foo' 'bar';
Within an escape string, a backslash character (<literal>\</>) begins a
C-like <firstterm>backslash escape</> sequence, in which the combination
of backslash and following character(s) represent a special byte
- value:
+ value, as shown in <xref linkend="sql-backslash-table">.
+ </para>

<table id="sql-backslash-table">
<title>Backslash Escape Sequences</title>
@@ -341,14 +398,24 @@ SELECT 'foo' 'bar';
</tgroup>
</table>

- It is your responsibility that the byte sequences you create are
- valid characters in the server character set encoding. Any other
+ <para>
+ Any other
character following a backslash is taken literally. Thus, to
include a backslash character, write two backslashes (<literal>\\</>).
Also, a single quote can be included in an escape string by writing
<literal>\'</literal>, in addition to the normal way of <literal>''</>.
</para>

+ <para>
+ It is your responsibility to ensure that the byte sequences you create are
+ valid characters in the server character set encoding. When the
+ server encoding is UTF-8, the alternative Unicode escape
+ syntax, explained in <xref linkend="sql-syntax-strings-uescape">,
+ should be used instead. (The other option would be to do the
+ UTF-8 encoding by hand and write out the bytes, which would be
+ very cumbersome.)
+ </para>
+
<caution>
<para>
If the configuration parameter
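To see why doing the UTF-8 encoding by hand is called cumbersome, the following Python sketch spells out a string as `\xhh` byte escapes inside an `E'...'` constant. The helper name `utf8_escape_string` is hypothetical, and the sketch assumes a UTF-8 server encoding and the `\xhh` hexadecimal byte escape from the backslash escape table (whose rows are not shown in this diff).

```python
def utf8_escape_string(text: str) -> str:
    """Hypothetical helper: spell out text as an E'...' constant in which
    every non-ASCII character is written as its UTF-8 bytes (\\xhh)."""
    parts = []
    for ch in text:
        if ch == "\\":
            parts.append("\\\\")              # a literal backslash must be doubled
        elif ch == "'":
            parts.append("''")                # standard quote doubling
        elif ch.isascii() and ch.isprintable():
            parts.append(ch)                  # printable ASCII is written as-is
        else:
            # One \xhh escape per UTF-8 byte, i.e. the encoding done by hand.
            parts.extend(f"\\x{b:02X}" for b in ch.encode("utf-8"))
    return "E'" + "".join(parts) + "'"
```

For the Russian word used in the examples this produces `E'\xD1\x81\xD0\xBB\xD0\xBE\xD0\xBD'`: eight byte escapes for four characters, compared with the four code points in `U&'\0441\043B\043E\043D'`.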
@@ -379,6 +446,65 @@ SELECT 'foo' 'bar';
</para>
</sect3>

+ <sect3 id="sql-syntax-strings-uescape">
+ <title>String Constants with Unicode Escapes</title>
+
+ <indexterm zone="sql-syntax-strings-uescape">
+ <primary>Unicode escape</primary>
+ <secondary>in string constants</secondary>
+ </indexterm>
+
+ <para>
+ <productname>PostgreSQL</productname> also supports another type
+ of escape syntax for strings that allows specifying arbitrary
+ Unicode characters by code point. A Unicode escape string
+ constant starts with <literal>U&</literal> (upper or lower case
+ letter U followed by ampersand) immediately before the opening
+ quote, without any spaces in between, for
+ example <literal>U&'foo'</literal>. (Note that this creates an
+ ambiguity with the operator <literal>&</literal>. Use spaces
+ around the operator to avoid this problem.) Inside the quotes,
+ Unicode characters can be specified in escaped form by writing a
+ backslash followed by the four-digit hexadecimal code point
+ number or alternatively a backslash followed by a plus sign
+ followed by a six-digit hexadecimal code point number. For
+ example, the string <literal>'data'</literal> could be written as
+ <programlisting>
+ U&'d\0061t\+000061'
+ </programlisting>
+ The following less trivial example writes the Russian
+ word <quote>slon</quote> (elephant) in Cyrillic letters:
+ <programlisting>
+ U&'\0441\043B\043E\043D'
+ </programlisting>
+ </para>
+
+ <para>
+ If an escape character other than backslash is desired, it can
+ be specified using
+ the <literal>UESCAPE</literal><indexterm><primary>UESCAPE</primary></indexterm>
+ clause after the string, for example:
+ <programlisting>
+ U&'d!0061t!+000061' UESCAPE '!'
+ </programlisting>
+ The escape character can be any single character other than a
+ hexadecimal digit, the plus sign, a single quote, a double quote,
+ or a whitespace character.
+ </para>
+
+ <para>
+ The Unicode escape syntax works only when the server encoding is
+ UTF8. When other server encodings are used, only code points in
+ the ASCII range (up to <literal>\007F</literal>) can be
+ specified.
+ </para>
+
+ <para>
+ To include the escape character in the string literally, write it
+ twice.
+ </para>
+ </sect3>
+
<sect3 id="sql-syntax-dollar-quoting">
<title>Dollar-Quoted String Constants</title>

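Going the other way, a Python sketch can build a `U&'...'` constant from an arbitrary string, choosing between the four-digit and six-digit forms and adding a `UESCAPE` clause when a non-default escape character is requested. Both function names are hypothetical. The escape-character check follows only the rule stated in this hunk, and the sketch does not check whether the server encoding is UTF8 (with other encodings only code points up to `\007F` would be accepted).

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def is_valid_uescape_char(ch: str) -> bool:
    """Rule from the text: a single character that is not a hex digit,
    the plus sign, a single or double quote, or whitespace."""
    return (
        len(ch) == 1
        and ch not in HEX_DIGITS
        and ch not in "+'\""
        and not ch.isspace()
    )


def to_unicode_escape_literal(text: str, escape: str = "\\") -> str:
    """Hypothetical helper: render text as a U&'...' string constant.
    Does not check the server encoding."""
    if not is_valid_uescape_char(escape):
        raise ValueError(f"invalid UESCAPE character: {escape!r}")
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == escape:
            parts.append(escape * 2)            # doubled escape = literal escape char
        elif 0x20 <= cp < 0x7F and ch != "'":
            parts.append(ch)                    # printable ASCII (except ') as-is
        elif cp <= 0xFFFF:
            parts.append(f"{escape}{cp:04X}")   # four-digit form; ' becomes 0027
        else:
            parts.append(f"{escape}+{cp:06X}")  # plus sign and six-digit form
    literal = "U&'" + "".join(parts) + "'"
    if escape != "\\":
        literal += f" UESCAPE '{escape}'"
    return literal
```

For example, `to_unicode_escape_literal("data", "!")` returns `U&'data' UESCAPE '!'`, and a character outside the Basic Multilingual Plane such as U+1F418 is written with the six-digit form as `\+01F418`.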