Commit 8372304

Improve documentation's description of JOIN clauses.
In bug #12000, Andreas Kunert complained that the documentation was misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2". That's correct as far as it goes, but the equivalence doesn't hold when you consider three or more tables, since JOIN binds more tightly than comma. I added a <note> to explain this, and ended up rearranging some of the existing text so that the note would make sense in context.

In passing, rewrite the description of JOIN USING, which was unnecessarily vague, and hadn't been helped any by somebody's reliance on markup as a substitute for clear writing. (Mostly this involved reintroducing a concrete example that was unaccountably removed by commit 032f3b7.)

Back-patch to all supported branches.
1 parent 88fc719 commit 8372304
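
The rewritten JOIN USING description can be checked with a small, hedged sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for PostgreSQL; the tables t1(a, b, x) and t2(a, b, y), their rows, and the helper run() are hypothetical and not part of the commit. It compares USING (a, b) with the spelled-out ON condition and shows that USING keeps only one copy of each join column.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only (not taken from the commit).
SETUP = """
CREATE TABLE t1 (a INTEGER, b INTEGER, x TEXT);
CREATE TABLE t2 (a INTEGER, b INTEGER, y TEXT);
INSERT INTO t1 VALUES (1, 1, 'p'), (1, 2, 'q'), (2, 2, 'r');
INSERT INTO t2 VALUES (1, 1, 'u'), (2, 2, 'v'), (3, 3, 'w');
"""


def run(sql):
    """Run sql on a fresh in-memory database; return (column names, sorted rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        return columns, sorted(cur.fetchall())
    finally:
        conn.close()


# USING (a, b) is shorthand for an equality comparison on each listed column.
using_cols, using_rows = run("SELECT * FROM t1 JOIN t2 USING (a, b)")
on_cols, on_rows = run(
    "SELECT * FROM t1 JOIN t2 ON t1.a = t2.a AND t1.b = t2.b"
)

print(len(using_cols), using_rows)  # one copy of a and b per row
print(len(on_cols), on_rows)        # a and b appear once per input table
```

Both queries match the same row pairs. The ON form returns six columns and the USING form four. The sketch cannot show the comma-versus-JOIN precedence issue from bug #12000, because SQLite is not a reliable stand-in for PostgreSQL on that point.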

1 file changed: +98 -85 lines changed


doc/src/sgml/queries.sgml

+98-85
Original file line number / Diff line number / Diff line change
@@ -118,10 +118,12 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
118118
</synopsis>
119119

120120
A table reference can be a table name (possibly schema-qualified),
121-
or a derived table such as a subquery, a table join, or complex
122-
combinations of these. If more than one table reference is listed
123-
in the <literal>FROM</> clause they are cross-joined (see below)
124-
to form the intermediate virtual table that can then be subject to
121+
or a derived table such as a subquery, a <literal>JOIN</> construct, or
122+
complex combinations of these. If more than one table reference is
123+
listed in the <literal>FROM</> clause, the tables are cross-joined
124+
(that is, the Cartesian product of their rows is formed; see below).
125+
The result of the <literal>FROM</> list is an intermediate virtual
126+
table that can then be subject to
125127
transformations by the <literal>WHERE</>, <literal>GROUP BY</>,
126128
and <literal>HAVING</> clauses and is finally the result of the
127129
overall table expression.
@@ -161,6 +163,16 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
161163
A joined table is a table derived from two other (real or
162164
derived) tables according to the rules of the particular join
163165
type. Inner, outer, and cross-joins are available.
166+
The general syntax of a joined table is
167+
<synopsis>
168+
<replaceable>T1</replaceable> <replaceable>join_type</replaceable> <replaceable>T2</replaceable> <optional> <replaceable>join_condition</replaceable> </optional>
169+
</synopsis>
170+
Joins of all types can be chained together, or nested: either or
171+
both <replaceable>T1</replaceable> and
172+
<replaceable>T2</replaceable> can be joined tables. Parentheses
173+
can be used around <literal>JOIN</> clauses to control the join
174+
order. In the absence of parentheses, <literal>JOIN</> clauses
175+
nest left-to-right.
164176
</para>
165177

166178
<variablelist>
@@ -197,10 +209,28 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
197209
<para>
198210
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
199211
<replaceable>T2</replaceable></literal> is equivalent to
200-
<literal>FROM <replaceable>T1</replaceable>,
201-
<replaceable>T2</replaceable></literal>. It is also equivalent to
202212
<literal>FROM <replaceable>T1</replaceable> INNER JOIN
203213
<replaceable>T2</replaceable> ON TRUE</literal> (see below).
214+
It is also equivalent to
215+
<literal>FROM <replaceable>T1</replaceable>,
216+
<replaceable>T2</replaceable></literal>.
217+
<note>
218+
<para>
219+
This latter equivalence does not hold exactly when more than two
220+
tables appear, because <literal>JOIN</> binds more tightly than
221+
comma. For example
222+
<literal>FROM <replaceable>T1</replaceable> CROSS JOIN
223+
<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
224+
ON <replaceable>condition</replaceable></literal>
225+
is not the same as
226+
<literal>FROM <replaceable>T1</replaceable>,
227+
<replaceable>T2</replaceable> INNER JOIN <replaceable>T3</replaceable>
228+
ON <replaceable>condition</replaceable></literal>
229+
because the <replaceable>condition</replaceable> can
230+
reference <replaceable>T1</replaceable> in the first case but not
231+
the second.
232+
</para>
233+
</note>
204234
</para>
205235
</listitem>
206236
</varlistentry>
@@ -240,76 +270,6 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
240270
<quote>match</quote>, as explained in detail below.
241271
</para>
242272

243-
<para>
244-
The <literal>ON</> clause is the most general kind of join
245-
condition: it takes a Boolean value expression of the same
246-
kind as is used in a <literal>WHERE</> clause. A pair of rows
247-
from <replaceable>T1</> and <replaceable>T2</> match if the
248-
<literal>ON</> expression evaluates to true.
249-
</para>
250-
251-
<para>
252-
The <literal>USING</> clause allows you to take advantage of
253-
the specific situation where both sides of the join use the
254-
same name for the joining columns. It takes a
255-
comma-separated list of the shared column names
256-
and forms a join using the equals operator. Furthermore, the
257-
output of <literal>JOIN USING</> has one column for each of the
258-
listed columns, followed by the remaining columns from each table.
259-
</para>
260-
261-
<para>The output column difference between <literal>ON</> and
262-
<literal>USING</> when invoking <literal>SELECT *</> is:</para>
263-
<itemizedlist>
264-
<listitem>
265-
<para>
266-
<literal>ON</> - all columns from <replaceable>T1</> followed
267-
by all columns from <replaceable>T2</>
268-
</para>
269-
</listitem>
270-
<listitem>
271-
<para>
272-
<literal>USING</> - all join columns, one copy each
273-
and in the listed order, followed by non-join columns
274-
in <replaceable>T1</> followed by non-join columns in
275-
<replaceable>T2</>
276-
</para>
277-
</listitem>
278-
<listitem>
279-
<para>
280-
Examples provided below
281-
</para>
282-
</listitem>
283-
</itemizedlist>
284-
285-
<para>
286-
<indexterm>
287-
<primary>join</primary>
288-
<secondary>natural</secondary>
289-
</indexterm>
290-
<indexterm>
291-
<primary>natural join</primary>
292-
</indexterm>
293-
Finally, <literal>NATURAL</> is a shorthand form of
294-
<literal>USING</>: it forms a <literal>USING</> list
295-
consisting of all column names that appear in both
296-
input tables. As with <literal>USING</>, these columns appear
297-
only once in the output table. If there are no common
298-
columns, <literal>NATURAL</literal> behaves like
299-
<literal>CROSS JOIN</literal>.
300-
</para>
301-
302-
<note>
303-
<para>
304-
<literal>USING</literal> is reasonably safe from column changes
305-
in the joined relations since only the specific columns mentioned
306-
are considered. <literal>NATURAL</> is considerably more problematic
307-
if you are referring to relations only by name (views and tables)
308-
since any schema changes to either relation that cause a new matching
309-
column name to be present will cause the join to consider that new column.
310-
</para>
311-
</note>
312-
313273
<para>
314274
The possible types of qualified join are:
315275

@@ -387,19 +347,70 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
387347
</varlistentry>
388348
</variablelist>
389349
</para>
350+
351+
<para>
352+
The <literal>ON</> clause is the most general kind of join
353+
condition: it takes a Boolean value expression of the same
354+
kind as is used in a <literal>WHERE</> clause. A pair of rows
355+
from <replaceable>T1</> and <replaceable>T2</> match if the
356+
<literal>ON</> expression evaluates to true.
357+
</para>
358+
359+
<para>
360+
The <literal>USING</> clause is a shorthand that allows you to take
361+
advantage of the specific situation where both sides of the join use
362+
the same name for the joining column(s). It takes a
363+
comma-separated list of the shared column names
364+
and forms a join condition that includes an equality comparison
365+
for each one. For example, joining <replaceable>T1</>
366+
and <replaceable>T2</> with <literal>USING (a, b)</> produces
367+
the join condition <literal>ON <replaceable>T1</>.a
368+
= <replaceable>T2</>.a AND <replaceable>T1</>.b
369+
= <replaceable>T2</>.b</literal>.
370+
</para>
371+
372+
<para>
373+
Furthermore, the output of <literal>JOIN USING</> suppresses
374+
redundant columns: there is no need to print both of the matched
375+
columns, since they must have equal values. While <literal>JOIN
376+
ON</> produces all columns from <replaceable>T1</> followed by all
377+
columns from <replaceable>T2</>, <literal>JOIN USING</> produces one
378+
output column for each of the listed column pairs (in the listed
379+
order), followed by any remaining columns from <replaceable>T1</>,
380+
followed by any remaining columns from <replaceable>T2</>.
381+
</para>
382+
383+
<para>
384+
<indexterm>
385+
<primary>join</primary>
386+
<secondary>natural</secondary>
387+
</indexterm>
388+
<indexterm>
389+
<primary>natural join</primary>
390+
</indexterm>
391+
Finally, <literal>NATURAL</> is a shorthand form of
392+
<literal>USING</>: it forms a <literal>USING</> list
393+
consisting of all column names that appear in both
394+
input tables. As with <literal>USING</>, these columns appear
395+
only once in the output table. If there are no common
396+
column names, <literal>NATURAL</literal> behaves like
397+
<literal>CROSS JOIN</literal>.
398+
</para>
399+
400+
<note>
401+
<para>
402+
<literal>USING</literal> is reasonably safe from column changes
403+
in the joined relations since only the listed columns
404+
are combined. <literal>NATURAL</> is considerably more risky since
405+
any schema changes to either relation that cause a new matching
406+
column name to be present will cause the join to combine that new
407+
column as well.
408+
</para>
409+
</note>
390410
</listitem>
391411
</varlistentry>
392412
</variablelist>
393413

394-
<para>
395-
Joins of all types can be chained together or nested: either or
396-
both <replaceable>T1</replaceable> and
397-
<replaceable>T2</replaceable> can be joined tables. Parentheses
398-
can be used around <literal>JOIN</> clauses to control the join
399-
order. In the absence of parentheses, <literal>JOIN</> clauses
400-
nest left-to-right.
401-
</para>
402-
403414
<para>
404415
To put this together, assume we have tables <literal>t1</literal>:
405416
<programlisting>
@@ -516,6 +527,8 @@ FROM <replaceable>table_reference</replaceable> <optional>, <replaceable>table_r
516527
clause is processed <emphasis>before</> the join, while
517528
a restriction placed in the <literal>WHERE</> clause is processed
518529
<emphasis>after</> the join.
530+
That does not matter with inner joins, but it matters a lot with outer
531+
joins.
519532
</para>
520533
</sect3>
521534

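
The sentence added in the last hunk (ON versus WHERE restrictions with outer joins) can be illustrated with a minimal sketch. It assumes SQLite via Python's sqlite3 module rather than PostgreSQL. The table contents and the helper rows() are illustrative assumptions, since the example tables are not shown in this diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative data only; the diff truncates the documentation's own example tables.
conn.executescript("""
CREATE TABLE t1 (num INTEGER, name TEXT);
CREATE TABLE t2 (num INTEGER, value TEXT);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO t2 VALUES (1, 'xxx'), (3, 'yyy'), (5, 'zzz');
""")


def rows(sql):
    """Run the query with a deterministic ordering on t1.num."""
    return conn.execute(sql + " ORDER BY t1.num").fetchall()


# Restriction in ON: applied while matching, so unmatched t1 rows survive with NULLs.
left_on = rows(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx'"
)
# Restriction in WHERE: applied after the join, so the NULL-extended rows are dropped.
left_where = rows(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx'"
)
# With an inner join, the placement of the restriction does not change the result.
inner_on = rows(
    "SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num AND t2.value = 'xxx'"
)
inner_where = rows(
    "SELECT * FROM t1 INNER JOIN t2 ON t1.num = t2.num WHERE t2.value = 'xxx'"
)

print(left_on)
print(left_where)
print(inner_on == inner_where)
```

Under these assumptions the LEFT JOIN with the restriction in ON keeps all three t1 rows, while the WHERE variant keeps only the matching one. The two inner-join variants return the same rows.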