1
- <!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.2 2001/02/01 19:13:47 momjian Exp $ -->
1
+ <!-- $Header: /cvsroot/pgsql/doc/src/sgml/queries.sgml,v 1.3 2001/02/10 08:30:13 tgl Exp $ -->
2
2
3
3
<chapter id="queries">
4
4
<title>Queries</title>
5
5
6
6
<para>
7
- A <firstterm>query</firstterm> is the process of or the command to
8
- retrieve data from a database. In SQL the <command>SELECT</command>
7
+ A <firstterm>query</firstterm> is the process of retrieving or the command
8
+ to retrieve data from a database. In SQL the <command>SELECT</command>
9
9
command is used to specify queries. The general syntax of the
10
10
<command>SELECT</command> command is
11
11
<synopsis>
@@ -65,11 +65,11 @@ SELECT random();
65
65
</para>
66
66
67
67
<para>
68
- The WHERE, GROUP BY, and HAVING clauses in the table expression
68
+ The optional WHERE, GROUP BY, and HAVING clauses in the table expression
69
69
specify a pipeline of successive transformations performed on the
70
- table derived in the FROM clause. The final transformed table that
71
- is derived provides the input rows used to derive output rows as
72
- specified by the select list of derived column value expressions.
70
+ table derived in the FROM clause. The derived table that is produced by
71
+ all these transformations provides the input rows used to compute output
72
+ rows as specified by the select list of column value expressions.
73
73
</para>
74
74
75
75
<sect2 id="queries-from">
@@ -91,10 +91,12 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
91
91
</para>
92
92
93
93
<para>
94
- If a table reference is a simple table name and it is the
95
- supertable in a table inheritance hierarchy, rows of the table
96
- include rows from all of its subtable successors unless the
97
- keyword ONLY precedes the table name.
94
+ When a table reference names a table that is the
95
+ supertable of a table inheritance hierarchy, the table reference
96
+ produces rows of not only that table but all of its subtable successors,
97
+ unless the keyword ONLY precedes the table name. However, the reference
98
+ produces only the columns that appear in the named table --- any columns
99
+ added in subtables are ignored.
98
100
</para>
99
101
100
102
<sect3 id="queries-join">
@@ -124,7 +126,7 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
124
126
row consisting of all columns in <replaceable>T1</replaceable>
125
127
followed by all columns in <replaceable>T2</replaceable>. If
126
128
the tables have N and M rows respectively, the joined
127
- table will have N * M rows. A cross join is essentially an
129
+ table will have N * M rows. A cross join is equivalent to an
128
130
<literal>INNER JOIN ON TRUE</literal>.
129
131
</para>
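The N * M row count and the equivalence with an inner join on an always-true condition can be checked with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for Postgres; the tables t1 and t2 and their contents are hypothetical, and the condition is spelled 1 = 1 because older SQLite versions lack a TRUE literal.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; t1/t2 are made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b TEXT);
    INSERT INTO t1 VALUES (1), (2), (3);   -- N = 3 rows
    INSERT INTO t2 VALUES ('x'), ('y');    -- M = 2 rows
""")

# Every row of t1 is paired with every row of t2.
cross_rows = conn.execute("SELECT a, b FROM t1 CROSS JOIN t2").fetchall()

# The same pairing written as an inner join whose condition is always true.
inner_rows = conn.execute("SELECT a, b FROM t1 INNER JOIN t2 ON 1 = 1").fetchall()

print(len(cross_rows), len(inner_rows))
```

Both queries should return the same set of six rows; the order in which they come back is not specified.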
130
132
@@ -189,11 +191,11 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
189
191
190
192
<listitem>
191
193
<para>
192
- First, an INNER JOIN is performed. Then, for a row in T1
194
+ First, an INNER JOIN is performed. Then, for each row in T1
193
195
that does not satisfy the join condition with any row in
194
196
T2, a joined row is returned with NULL values in columns of
195
- T2. Thus, the joined table unconditionally has a row for each
196
- row in T1.
197
+ T2. Thus, the joined table unconditionally has at least one
198
+ row for each row in T1.
197
199
</para>
198
200
</listitem>
199
201
</varlistentry>
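The "at least one row for each row in T1" wording can be illustrated with a minimal sketch, again assuming SQLite via Python's sqlite3 module rather than Postgres. The tables and data are hypothetical: id 1 matches two rows in t2, id 2 matches none, and id 3 matches one.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; t1/t2 are made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x'), (1, 'y'), (3, 'z');
""")

# Matching rows are joined as in an inner join; the unmatched t1 row (id 2)
# is still returned, with NULL (None in Python) in the t2 column.
left_rows = conn.execute("""
    SELECT t1.id, t1.name, t2.val
    FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
    ORDER BY t1.id, t2.val
""").fetchall()

print(left_rows)
```

The row with id 1 appears twice (one per match), which is why the revised text says "at least one row" rather than "a row".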
@@ -203,7 +205,7 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
203
205
204
206
<listitem>
205
207
<para>
206
- This is like a left join, only that the result table will
208
+ This is the converse of a left join: the result table will
207
209
unconditionally have a row for each row in T2.
208
210
</para>
209
211
</listitem>
@@ -237,19 +239,19 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
237
239
<para>
238
240
A natural join creates a joined table where every pair of matching
239
241
column names between the two tables are merged into one column. The
240
- join specification is effectively a USING clause containing all the
241
- common column names and is otherwise like a Qualified JOIN.
242
+ result is the same as a qualified join with a USING clause that lists
243
+ all the common column names of the two tables.
242
244
</para>
243
245
</listitem>
244
246
</varlistentry>
245
247
</variablelist>
246
248
247
249
<para>
248
- Joins of all types can be chained together or nested where either
250
+ Joins of all types can be chained together or nested: either
249
251
or both of <replaceable>T1</replaceable> and
250
- <replaceable>T2</replaceable> may be JOINed tables. Parenthesis
251
- can be used around JOIN clauses to control the join order which
252
- are otherwise left to right.
252
+ <replaceable>T2</replaceable> may be JOINed tables. Parentheses
253
+ may be used around JOIN clauses to control the join order. In the
254
+ absence of parentheses, JOIN clauses nest left-to-right.
253
255
</para>
254
256
</sect3>
255
257
@@ -258,7 +260,7 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
258
260
259
261
<para>
260
262
Subqueries specifying a derived table must be enclosed in
261
- parenthesis and <emphasis>must</emphasis> be named using an AS
263
+ parentheses and <emphasis>must</emphasis> be named using an AS
262
264
clause. (See <xref linkend="queries-table-aliases">.)
263
265
</para>
264
266
@@ -287,17 +289,17 @@ FROM <replaceable>table_reference</replaceable> AS <replaceable>alias</replaceab
287
289
Here, <replaceable>alias</replaceable> can be any regular
288
290
identifier. The alias becomes the new name of the table
289
291
reference for the current query -- it is no longer possible to
290
- refer to the table by the original name (if the table reference
291
- was an ordinary base table). Thus
292
+ refer to the table by the original name. Thus
292
293
<programlisting>
293
294
SELECT * FROM my_table AS m WHERE my_table.a > 5;
294
295
</programlisting>
295
- is not valid SQL syntax. What will happen instead, as a
296
- <productname>Postgres</productname> extension, is that an implicit
296
+ is not valid SQL syntax. What will actually happen (this is a
297
+ <productname>Postgres</productname> extension to the standard)
298
+ is that an implicit
297
299
table reference is added to the FROM clause, so the query is
298
- processed as if it was written as
300
+ processed as if it were written as
299
301
<programlisting>
300
- SELECT * FROM my_table AS m, my_table WHERE my_table.a > 5;
302
+ SELECT * FROM my_table AS m, my_table AS my_table WHERE my_table.a > 5;
301
303
</programlisting>
302
304
Table aliases are mainly for notational convenience, but it is
303
305
necessary to use them when joining a table to itself, e.g.,
@@ -309,7 +311,7 @@ SELECT * FROM my_table AS a CROSS JOIN my_table AS b ...
309
311
</para>
310
312
311
313
<para>
312
- Parenthesis are used to resolve ambiguities. The following
314
+ Parentheses are used to resolve ambiguities. The following
313
315
statement will assign the alias <literal>b</literal> to the
314
316
result of the join, unlike the previous example:
315
317
<programlisting>
@@ -321,7 +323,7 @@ SELECT * FROM (my_table AS a CROSS JOIN my_table) AS b ...
321
323
<synopsis>
322
324
FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
323
325
</synopsis>
324
- This form is equivalent the previously treated one; the
326
+ This form is equivalent to the previously treated one; the
325
327
<token>AS</token> key word is noise.
326
328
</para>
327
329
@@ -330,8 +332,9 @@ FROM <replaceable>table_reference</replaceable> <replaceable>alias</replaceable>
330
332
FROM <replaceable>table_reference</replaceable> <optional>AS</optional> <replaceable>alias</replaceable> ( <replaceable>column1</replaceable> <optional>, <replaceable>column2</replaceable> <optional>, ...</optional></optional> )
331
333
</synopsis>
332
334
In addition to renaming the table as described above, the columns
333
- of the table are also given temporary names. If less column
334
- aliases are specified than the actual table has columns, the last
335
+ of the table are also given temporary names for use by the surrounding
336
+ query. If fewer column
337
+ aliases are specified than the actual table has columns, the remaining
335
338
columns are not renamed. This syntax is especially useful for
336
339
self-joins or subqueries.
337
340
</para>
@@ -359,7 +362,7 @@ FROM (SELECT * FROM T1) DT1, T2, T3
359
362
Above are some examples of joined tables and complex derived
360
363
tables. Notice how the AS clause renames or names a derived
361
364
table and how the optional comma-separated list of column names
362
- that follows gives names or renames the columns. The last two
365
+ that follows renames the columns. The last two
363
366
FROM clauses produce the same derived table from T1, T2, and T3.
364
367
The AS keyword was omitted in naming the subquery as DT1. The
365
368
keywords OUTER and INNER are noise that can be omitted also.
@@ -410,7 +413,10 @@ FROM a NATURAL JOIN b WHERE b.val > 5
410
413
Which one of these you use is mainly a matter of style. The JOIN
411
414
syntax in the FROM clause is probably not as portable to other
412
415
products. For outer joins there is no choice in any case: they
413
- must be done in the FROM clause.
416
+ must be done in the FROM clause. An outer join's ON/USING clause
417
+ is <emphasis>not</> equivalent to a WHERE condition, because it
418
+ determines the addition of rows (for unmatched input rows) as well
419
+ as the removal of rows from the final result.
414
420
</para>
415
421
</note>
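The difference between a condition in an outer join's ON clause and the same condition in WHERE can be seen in a small sketch. It assumes SQLite via Python's sqlite3 module as a stand-in for Postgres; the tables a and b and the threshold 5 are hypothetical, loosely following the b.val > 5 examples in this section.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; a/b are made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, val INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (2, 3);
""")

# Condition in ON: row a.id = 2 finds no qualifying match, so it is kept
# and padded with NULL.
on_rows = conn.execute("""
    SELECT a.id, b.val
    FROM a LEFT JOIN b ON a.id = b.id AND b.val > 5
    ORDER BY a.id
""").fetchall()

# Condition in WHERE: the join matches both rows first, then the filter
# removes the row whose val is 3.
where_rows = conn.execute("""
    SELECT a.id, b.val
    FROM a LEFT JOIN b ON a.id = b.id
    WHERE b.val > 5
    ORDER BY a.id
""").fetchall()

print(on_rows)
print(where_rows)
```

With this data the ON form returns two rows and the WHERE form returns one, so the two placements are not interchangeable for outer joins.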
416
422
@@ -439,7 +445,7 @@ FROM FDT WHERE
439
445
subqueries as value expressions (C2 assumed UNIQUE). Just like
440
446
any other query, the subqueries can employ complex table
441
447
expressions. Notice how FDT is referenced in the subqueries.
442
- Qualifying C1 as FDT.C1 is only necessary if C1 is the name of a
448
+ Qualifying C1 as FDT.C1 is only necessary if C1 is also the name of a
443
449
column in the derived input table of the subquery. Qualifying the
444
450
column name adds clarity even when it is not needed. The column
445
451
naming scope of an outer query extends into its inner queries.
@@ -471,17 +477,17 @@ SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</opti
471
477
</para>
472
478
473
479
<para>
474
- Once a table is grouped, columns that are not included in the
475
- grouping cannot be referenced, except in aggregate expressions,
480
+ Once a table is grouped, columns that are not used in the
481
+ grouping cannot be referenced except in aggregate expressions,
476
482
since a specific value in those columns is ambiguous - which row
477
483
in the group should it come from? The grouped-by columns can be
478
484
referenced in select list column expressions since they have a
479
485
known constant value per group. Aggregate functions on the
480
486
ungrouped columns provide values that span the rows of a group,
481
487
not of the whole table. For instance, a
482
- <function>sum(sales)</function> on a grouped table by product code
488
+ <function>sum(sales)</function> on a table grouped by product code
483
489
gives the total sales for each product, not the total sales on all
484
- products. The aggregates of the ungrouped columns are
490
+ products. Aggregates computed on the ungrouped columns are
485
491
representative of the group, whereas their individual values may
486
492
not be.
487
493
</para>
@@ -516,12 +522,12 @@ SELECT <replaceable>select_list</replaceable> FROM ... <optional>WHERE ...</opti
516
522
If a table has been grouped using a GROUP BY clause, but then only
517
523
certain groups are of interest, the HAVING clause can be used,
518
524
much like a WHERE clause, to eliminate groups from a grouped
519
- table. For some queries, Postgres allows a HAVING clause to be
520
- used without a GROUP BY and then it acts just like another WHERE
521
- clause, but the point in using HAVING that way is not clear. Since
522
- HAVING operates on groups, only grouped columns can be listed in
523
- the HAVING clause. If selection based on some ungrouped column is
524
- desired, it should be expressed in the WHERE clause.
525
+ table. Postgres allows a HAVING clause to be
526
+ used without a GROUP BY, in which case it acts like another WHERE
527
+ clause, but the point in using HAVING that way is not clear. A good
528
+ rule of thumb is that a HAVING condition should refer to the results
529
+ of aggregate functions. A restriction that does not involve an
530
+ aggregate is more efficiently expressed in the WHERE clause.
525
531
</para>
526
532
527
533
<para>
@@ -533,11 +539,11 @@ SELECT pid AS "Products",
533
539
FROM products p LEFT JOIN sales s USING ( pid )
534
540
WHERE s.date > CURRENT_DATE - INTERVAL '4 weeks'
535
541
GROUP BY pid, p.name, p.price, p.cost
536
- HAVING p.price > 5000;
542
+ HAVING sum( p.price * s.units) > 5000;
537
543
</programlisting>
538
544
In the example above, the WHERE clause is selecting rows by a
539
545
column that is not grouped, while the HAVING clause
540
- is selecting groups with a price greater than 5000.
546
+ restricts the output to groups with total gross sales over 5000.
541
547
</para>
542
548
</sect2>
543
549
</sect1>
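The revised HAVING example, which filters on an aggregate rather than on a plain column, can be tried out with a reduced sketch. It assumes SQLite via Python's sqlite3 module instead of Postgres, and it simplifies the products and sales tables to a few hypothetical columns and rows; the date filter and the LEFT JOIN of the original example are left out.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; schema and data are
# simplified, made-up versions of the products/sales tables in the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (pid INTEGER, name TEXT, price INTEGER);
    CREATE TABLE sales (pid INTEGER, units INTEGER);
    INSERT INTO products VALUES (1, 'widget', 100), (2, 'gadget', 2000);
    INSERT INTO sales VALUES (1, 10), (1, 20), (2, 1), (2, 2);
""")

query = """
    SELECT p.pid, sum(p.price * s.units) AS gross
    FROM products p JOIN sales s ON p.pid = s.pid
    GROUP BY p.pid
    {having}
    ORDER BY p.pid
"""

# Without HAVING: one output row per group.
all_groups = conn.execute(query.format(having="")).fetchall()

# With HAVING: only groups whose total gross sales exceed 5000 remain.
big_groups = conn.execute(
    query.format(having="HAVING sum(p.price * s.units) > 5000")
).fetchall()

print(all_groups)
print(big_groups)
```

The sum is computed per group, so the widget group (3000) is dropped while the gadget group (6000) is kept.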
@@ -552,8 +558,8 @@ SELECT pid AS "Products",
552
558
tables, views, eliminating rows, grouping, etc. This table is
553
559
finally passed on to processing by the select list. The select
554
560
list determines which <emphasis>columns</emphasis> of the
555
- intermediate table are retained. The simplest kind of select list
556
- is <literal>*</literal> which retains all columns that the table
561
+ intermediate table are actually output. The simplest kind of select list
562
+ is <literal>*</literal> which emits all columns that the table
557
563
expression produces. Otherwise, a select list is a comma-separated
558
564
list of value expressions (as defined in <xref
559
565
linkend="sql-expressions">). For instance, it could be a list of
@@ -562,7 +568,7 @@ SELECT pid AS "Products",
562
568
SELECT a, b, c FROM ...
563
569
</programlisting>
564
570
The column names a, b, and c are either the actual names of the
565
- columns of table referenced in the FROM clause, or the aliases
571
+ columns of tables referenced in the FROM clause, or the aliases
566
572
given to them as explained in <xref linkend="queries-table-aliases">.
567
573
The name space available in the select list is the same as in the
568
574
WHERE clause (unless grouping is used, in which case it is the same
@@ -578,9 +584,9 @@ SELECT tbl1.a, tbl2.b, tbl1.c FROM ...
578
584
If an arbitrary value expression is used in the select list, it
579
585
conceptually adds a new virtual column to the returned table. The
580
586
value expression is effectively evaluated once for each retrieved
581
- row with real values substituted for any column references. But
587
+ row, with the row's values substituted for any column references. But
582
588
the expressions in the select list do not have to reference any
583
- columns in the table expression of the FROM clause; they can be
589
+ columns in the table expression of the FROM clause; they could be
584
590
constant arithmetic expressions as well, for instance.
585
591
</para>
586
592
@@ -595,12 +601,12 @@ SELECT tbl1.a, tbl2.b, tbl1.c FROM ...
595
601
<programlisting>
596
602
SELECT a AS value, b + c AS sum FROM ...
597
603
</programlisting>
598
- The AS key word can in fact be omitted.
599
604
</para>
600
605
601
606
<para>
602
- If no name is chosen, the system assigns a default. For simple
603
- column references, this is the name of the column. For function
607
+ If no output column name is specified via AS, the system assigns a
608
+ default name. For simple column references, this is the name of the
609
+ referenced column. For function
604
610
calls, this is the name of the function. For complex expressions,
605
611
the system will generate a generic name.
606
612
</para>
@@ -634,7 +640,7 @@ SELECT DISTINCT <replaceable>select_list</replaceable> ...
634
640
<para>
635
641
Obviously, two rows are considered distinct if they differ in at
636
642
least one column value. NULLs are considered equal in this
637
- consideration.
643
+ comparison.
638
644
</para>
639
645
640
646
<para>
@@ -645,18 +651,21 @@ SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceab
645
651
</synopsis>
646
652
Here <replaceable>expression</replaceable> is an arbitrary value
647
653
expression that is evaluated for all rows. A set of rows for
648
- which all the expressions is equal are considered duplicates and
649
- only the first row is kept in the output. Note that the
654
+ which all the expressions are equal are considered duplicates, and
655
+ only the first row of the set is kept in the output. Note that the
650
656
<quote>first row</quote> of a set is unpredictable unless the
651
- query is sorted.
657
+ query is sorted on enough columns to guarantee a unique ordering
658
+ of the rows arriving at the DISTINCT filter. (DISTINCT ON processing
659
+ occurs after ORDER BY sorting.)
652
660
</para>
653
661
654
662
<para>
655
663
The DISTINCT ON clause is not part of the SQL standard and is
656
- sometimes considered bad style because of the indeterminate nature
664
+ sometimes considered bad style because of the potentially indeterminate
665
+ nature
657
666
of its results. With judicious use of GROUP BY and subselects in
658
- FROM the construct can be avoided, but it is very often the much
659
- more convenient alternative.
667
+ FROM the construct can be avoided, but it is very often the most
668
+ convenient alternative.
660
669
</para>
661
670
</sect2>
662
671
</sect1>
@@ -689,9 +698,9 @@ SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceab
689
698
<command>UNION</command> effectively appends the result of
690
699
<replaceable>query2</replaceable> to the result of
691
700
<replaceable>query1</replaceable> (although there is no guarantee
692
- that this is the order in which the rows are actually returned) and
693
- eliminates all duplicate rows, in the sense of DISTINCT, unless ALL
694
- is specified.
701
+ that this is the order in which the rows are actually returned).
702
+ Furthermore, it eliminates all duplicate rows, in the sense of DISTINCT,
703
+ unless ALL is specified.
695
704
</para>
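The duplicate elimination described for UNION, and its absence under UNION ALL, can be sketched as follows. The example assumes SQLite via Python's sqlite3 module as a stand-in for Postgres; q1 and q2 are hypothetical one-column tables playing the role of query1 and query2. Because the order of returned rows is not guaranteed, the results are sorted in Python before being compared.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; q1/q2 are made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (x INTEGER);
    CREATE TABLE q2 (x INTEGER);
    INSERT INTO q1 VALUES (1), (2), (2);
    INSERT INTO q2 VALUES (2), (3);
""")

# UNION: duplicates are removed across and within both inputs.
union_rows = sorted(
    conn.execute("SELECT x FROM q1 UNION SELECT x FROM q2").fetchall()
)

# UNION ALL: every input row is kept.
union_all_rows = sorted(
    conn.execute("SELECT x FROM q1 UNION ALL SELECT x FROM q2").fetchall()
)

print(union_rows)
print(union_all_rows)
```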
696
705
697
706
<para>
@@ -727,7 +736,7 @@ SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceab
727
736
chosen, the rows will be returned in random order. The actual
728
737
order in that case will depend on the scan and join plan types and
729
738
the order on disk, but it must not be relied on. A particular
730
- ordering can only be guaranteed if the sort step is explicitly
739
+ output ordering can only be guaranteed if the sort step is explicitly
731
740
chosen.
732
741
</para>
733
742
@@ -737,8 +746,7 @@ SELECT DISTINCT ON (<replaceable>expression</replaceable> <optional>, <replaceab
737
746
SELECT <replaceable>select_list</replaceable> FROM <replaceable>table_expression</replaceable> ORDER BY <replaceable>column1</replaceable> <optional>ASC | DESC</optional> <optional>, <replaceable>column2</replaceable> <optional>ASC | DESC</optional> ...</optional>
738
747
</synopsis>
739
748
<replaceable>column1</replaceable>, etc., refer to select list
740
- columns: It can either be the name of a column (either the
741
- explicit column label or default name, as explained in <xref
749
+ columns. These can be either the output name of a column (see
742
750
linkend="queries-column-labels">) or the number of a column. Some
743
751
examples:
744
752
<programlisting>
@@ -759,8 +767,8 @@ SELECT a, b FROM table1 ORDER BY a + b;
759
767
<programlisting>
760
768
SELECT a AS b FROM table1 ORDER BY a;
761
769
</programlisting>
762
- But this does not work in queries involving UNION, INTERSECT, or
763
- EXCEPT, and is not portable.
770
+ But these extensions do not work in queries involving UNION, INTERSECT,
771
+ or EXCEPT, and are not portable to other DBMSes.
764
772
</para>
765
773
766
774
<para>
@@ -773,8 +781,8 @@ SELECT a AS b FROM table1 ORDER BY a;
773
781
</para>
774
782
775
783
<para>
776
- If more than one sort column is specified the later entries are
777
- used to sort the rows that are equal under the order imposed by the
784
+ If more than one sort column is specified, the later entries are
785
+ used to sort rows that are equal under the order imposed by the
778
786
earlier sort specifications.
779
787
</para>
780
788
</sect1>
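How later sort columns break ties left by earlier ones, and how a sort column can be given by output name or by number, can be checked with a short sketch. It assumes SQLite via Python's sqlite3 module as a stand-in for Postgres; table1 matches the name used in the examples above, but its contents are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a INTEGER, b INTEGER);
    INSERT INTO table1 VALUES (2, 1), (1, 2), (1, 1), (2, 3);
""")

# Sort by a ascending; rows with equal a are then ordered by b descending.
by_name = conn.execute(
    "SELECT a, b FROM table1 ORDER BY a ASC, b DESC"
).fetchall()

# The same ordering, referring to the select list columns by position.
by_number = conn.execute(
    "SELECT a, b FROM table1 ORDER BY 1, 2 DESC"
).fetchall()

print(by_name)
print(by_number)
```

Since no two rows here tie on both sort columns, the ordering is fully determined and both queries return the same sequence.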