SQL Outer Joins for Fun and Profit
Bill Karwin
Proprietor/Chief Architect
bill@karwin.com
www.karwin.com
Introduction
- Overview of SQL joins: inner and outer
- Applications of outer joins
- Solving Sudoku puzzles with outer joins

2006-07-27

OSCON 2006

Joins in SQL
- Joins:
  - The SQL way to express relations between data in tables
  - Form a new row in the result set, from matching rows in each joined table
  - As fundamental to using a relational database as a loop is in other programming languages

Inner joins refresher
- ANSI SQL-89 syntax:

    SELECT ...
    FROM products p, orders o
    WHERE p.product_id = o.product_id;

- ANSI SQL-92 syntax:

    SELECT ...
    FROM products p JOIN orders o
      ON p.product_id = o.product_id;

Inner join example
Products

    product_id
    Abc
    Def
    Efg

Orders

    product_id   order_id
    Abc          10
    Abc          11
    Def          9

Inner join example
Query result set

    product_id   Product attributes   order_id   Order attributes
    Abc          $10.00               10         2006/2/1
    Abc          $10.00               11         2006/3/10
    Def          $5.00                9          2005/5/2

    SELECT ...
    FROM products p JOIN orders o
      ON p.product_id = o.product_id;

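The two inner-join syntaxes can be checked against each other with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in database, which is an assumption of this example and not something the slides prescribe. The `price` column and the ISO-formatted dates are illustrative stand-ins for the "Product attributes" and "Order attributes" columns.

```python
import sqlite3

# In-memory SQLite database standing in for the products/orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', '$10.00'), ('Def', '$5.00'), ('Efg', '$17.00');
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9,  'Def', '2005-05-02');
""")

# SQL-89 style: comma-separated tables, join condition in WHERE.
sql89_rows = conn.execute("""
    SELECT p.product_id, p.price, o.order_id, o.date
    FROM products p, orders o
    WHERE p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id
""").fetchall()

# SQL-92 style: explicit JOIN with the condition in ON.
sql92_rows = conn.execute("""
    SELECT p.product_id, p.price, o.order_id, o.date
    FROM products p JOIN orders o
      ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id
""").fetchall()

for row in sql92_rows:
    print(row)
```

Both queries return the same three rows; product Efg is absent because it has no order.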
Outer joins
- Returns all rows in one table, but only matching rows in the joined table. Returns NULL where no row matches.
- Not supported in SQL-89
- SQL-92 syntax:

    SELECT ...
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id;

Types of outer joins
- LEFT OUTER JOIN
  Returns all rows from the table on the left. Returns NULLs in columns of the right table where no row matches.
- RIGHT OUTER JOIN
  Returns all rows from the table on the right. Returns NULLs in columns of the left table where no row matches.
- FULL OUTER JOIN
  Returns all rows from both tables. Returns NULLs in columns of each where no row matches.

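The three join types can be compared side by side in a rough sketch. It assumes Python's built-in `sqlite3` module as the database, and a hypothetical order 12 for a product `Xyz` that does not exist in products, so that the right-hand side also has an unmatched row. Because not every engine or version supports RIGHT and FULL OUTER JOIN directly, the sketch writes RIGHT as a LEFT join with the tables swapped and emulates FULL with UNION ALL. That emulation is a common workaround, not the slides' own technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    -- Order 12 for 'Xyz' is a hypothetical row with no matching product.
    INSERT INTO orders VALUES (10, 'Abc'), (11, 'Abc'), (9, 'Def'), (12, 'Xyz');
""")

# LEFT OUTER JOIN: every product, with NULL where it has no order.
LEFT_SQL = """
    SELECT p.product_id, o.order_id
    FROM products p
    LEFT OUTER JOIN orders o ON p.product_id = o.product_id
"""

# RIGHT OUTER JOIN, written as a LEFT OUTER JOIN with the tables swapped:
# every order, with NULL where it has no product.
RIGHT_SQL = """
    SELECT p.product_id, o.order_id
    FROM orders o
    LEFT OUTER JOIN products p ON p.product_id = o.product_id
"""

# FULL OUTER JOIN, emulated: all left-join rows plus the unmatched orders.
FULL_SQL = LEFT_SQL + " UNION ALL " + RIGHT_SQL + " WHERE p.product_id IS NULL"

left_rows = conn.execute(LEFT_SQL).fetchall()
right_rows = conn.execute(RIGHT_SQL).fetchall()
full_rows = conn.execute(FULL_SQL).fetchall()

print("left: ", left_rows)
print("right:", right_rows)
print("full: ", full_rows)
```

The left result keeps Efg with a NULL order, the right result keeps order 12 with a NULL product, and the full result contains both.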
Support for OUTER JOIN
Open-source RDBMS products compared: Hypersonic HSQLDB, PostgreSQL, SQLite, Ingres R3, MySQL, Firebird, Apache Derby.

[Table: check marks per product for LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. LEFT OUTER JOIN is checked for all seven products.]

Outer join example
Products

    product_id
    Abc
    Def
    Efg

Orders

    product_id   order_id
    Abc          10
    Abc          11
    Def          9
    NULL         NULL       (paired with Efg, which has no order)

Outer join example
Query result set

    product_id   Product attributes   order_id   Order attributes
    Abc          $10.00               10         2006/2/1
    Abc          $10.00               11         2006/3/10
    Def          $5.00                9          2005/5/2
    Efg          $17.00               NULL       NULL

    SELECT ...
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id;

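To see exactly what the outer join adds, the inner and left outer results can be compared as sets. This is a minimal sketch that assumes Python's built-in `sqlite3` module and the three products and three orders from the example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc'), (11, 'Abc'), (9, 'Def');
""")

inner_rows = set(conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p JOIN orders o
      ON p.product_id = o.product_id
""").fetchall())

outer_rows = set(conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
""").fetchall())

# Rows present only in the outer join: products with no matching order.
# SQL NULL arrives in Python as None.
unmatched = outer_rows - inner_rows
print(unmatched)
```

Every inner-join row also appears in the outer join; the only extra row is the one for Efg, whose order columns are NULL.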
So what?
- The difference seems trivial and uninteresting
- SQL works with sets and relations
- Operations on sets combine in powerful ways (just like operations on numbers, strings, or booleans)

[Figure: diagrams labeled INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN]

Solutions using outer joins
- Extra join conditions
- Subtotals per day
- Localization
- Mimic NOT IN (subquery)
- Greatest row per group
- Top three per group
- Finding attributes in EAV (entity-attribute-value) tables
- Sudoku puzzle solver

Extra join conditions
- Problem: match only with orders created this year.
- Put extra conditions on the outer-joined table (here, orders) into the ON clause. The conditions are then applied while rows are being matched, not after the join, so products with no qualifying order still appear with NULLs:

    SELECT ...
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
      AND o.date >= '2006-01-01';

Extra join conditions
Products

    product_id
    Abc
    Def
    Efg

Orders

    product_id   order_id   date
    Abc          10         2006/2/1
    Abc          11         2006/3/10
    Def          9          2005/5/2
    NULL         NULL       NULL        (paired with products that have no order this year)

Extra join conditions
Query result set

    product_id   Product attributes   order_id   Order attributes
    Abc          $10.00               10         2006/2/1
    Abc          $10.00               11         2006/3/10
    Def          $5.00                NULL       NULL
    Efg          $17.00               NULL       NULL

    SELECT ...
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
      AND o.date >= '2006-01-01';

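The effect of where the date condition is placed can be seen by running the query both ways. The sketch below assumes Python's built-in `sqlite3` module and stores dates as ISO strings so that string comparison matches date order. The WHERE variant is included only for contrast and is not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9,  'Def', '2005-05-02');
""")

# Condition in ON: it restricts which orders can match,
# but every product is still returned.
on_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
      AND o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

# Condition in WHERE: it filters the joined rows, and NULL >= '2006-01-01'
# is not true, so the products without a qualifying order disappear.
where_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

print(on_rows)
print(where_rows)
```

With the condition in ON, Def and Efg are kept with NULL order columns. With the condition in WHERE, the query behaves like an inner join and returns only the two Abc rows.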
Subtotals per day
- Problem: show all days, and the subtotal of orders per day even when there are zero.
- Requires an additional table containing all dates in the desired range.

    SELECT d.date, COUNT(o.order_id)
    FROM days d
    LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date;

Subtotals per day
Days

    date
    2005/5/2
    . . .
    2006/2/1
    . . .
    2006/3/10
    . . .

Orders

    date         order_id
    2005/5/2     9
    2006/2/1     10
    2006/3/10    11
    NULL         NULL       (paired with each day that has no order)

Subtotals per day
Query result set

    date         COUNT()
    2005/5/2     1
    . . .        0
    2006/2/1     1
    . . .        0
    2006/3/10    1
    . . .        0

    SELECT d.date, COUNT(o.order_id)
    FROM days d
    LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date;

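A small sketch of the days-table approach, assuming Python's built-in `sqlite3` module and a five-day window chosen only for illustration. The calendar rows are generated in Python; how the days table is populated is left open by the slides. The extra `COUNT(*)` column is included to show why the query counts `o.order_id` instead: `COUNT(*)` counts the NULL-extended row too, so an empty day would report 1.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (date TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, date TEXT);
    INSERT INTO orders VALUES (9, '2005-05-02'), (10, '2006-02-01'), (11, '2006-03-10');
""")

# Hypothetical reporting window: 2006-01-30 through 2006-02-03.
start = date(2006, 1, 30)
conn.executemany(
    "INSERT INTO days (date) VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(5)],
)

per_day = conn.execute("""
    SELECT d.date,
           COUNT(o.order_id) AS orders_that_day,  -- ignores NULLs: 0 on empty days
           COUNT(*)          AS joined_rows       -- counts the NULL row: 1 on empty days
    FROM days d
    LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date
    ORDER BY d.date
""").fetchall()

for row in per_day:
    print(row)
```

Every day in the window appears once, and only 2006-02-01 has a non-zero order count.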
Localization
- Problem: show translated messages, or the default-language message if no translation is available.

    SELECT en.message_id,
      COALESCE(sp.message, en.message)
    FROM messages AS sp
    RIGHT OUTER JOIN messages AS en
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
    WHERE en.language = 'en';

- COALESCE() returns its first non-null argument.
- The filter on en belongs in the WHERE clause. In the ON clause it would not remove the non-English rows of en, because a RIGHT OUTER JOIN preserves every row of the right-hand table.

Localization
messages

    message_id   language   message
    123          en         Thank you
    123          sp         Gracias
    456          en         Hello
    NULL                                (no 'sp' row exists for message 456)

Localization
Query result set

    message_id   message
    123          Gracias
    456          Hello

    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS sp
    RIGHT OUTER JOIN messages AS en
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
    WHERE en.language = 'en';

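The fallback lookup can be tried out with the three message rows from the example. This sketch assumes Python's built-in `sqlite3` module and writes the join as a LEFT OUTER JOIN from the English rows, which is equivalent to a RIGHT OUTER JOIN with the tables swapped and runs on SQLite versions that lack RIGHT OUTER JOIN. It also runs a variant with the `en.language` filter inside ON, to show that a filter on the preserved table does not remove rows there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language TEXT, message TEXT);
    INSERT INTO messages VALUES (123, 'en', 'Thank you'),
                                (123, 'sp', 'Gracias'),
                                (456, 'en', 'Hello');
""")

# Filter on the preserved (English) side in WHERE: one row per English message.
translated = conn.execute("""
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en
    LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
    WHERE en.language = 'en'
    ORDER BY en.message_id
""").fetchall()

# Same filter placed in ON: the 'sp' row of en is still preserved by the
# outer join, so message 123 shows up twice.
filter_in_on = conn.execute("""
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en
    LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
      AND en.language = 'en'
    ORDER BY en.message_id
""").fetchall()

print(translated)
print(filter_in_on)
```

The first query returns the Spanish text for message 123 and falls back to English for message 456. The second returns an extra row for message 123.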
Mimic NOT IN subquery
- Problem: find rows for which there is no match.
- Often implemented using NOT IN (subquery):

    SELECT ...
    FROM products p
    WHERE p.product_id NOT IN
      (SELECT o.product_id FROM orders o);

Mimic NOT IN subquery
- Can also be implemented using an outer join:

    SELECT ...
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL;

- Useful when subqueries are not supported (e.g. MySQL 4.0)

Mimic NOT IN subquery
Products

    product_id
    Abc
    Def
    Efg

Orders

    product_id   order_id
    Abc          10
    Abc          11
    Def          9
    NULL         NULL       (paired with Efg, which has no order)

Mimic NOT IN subquery
Query result set

    product_id   Product attributes   order_id   Order attributes
    Efg          $17.00               NULL       NULL

    SELECT ...
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL;

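Both formulations can be run against the example data to confirm they agree. The sketch assumes Python's built-in `sqlite3` module. It then adds a hypothetical order 12 whose `product_id` is NULL, which is not in the slides, to show one case where the two differ: `NOT IN` over a list containing NULL is never true, while the outer-join form is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc'), (11, 'Abc'), (9, 'Def');
""")

NOT_IN_SQL = """
    SELECT p.product_id
    FROM products p
    WHERE p.product_id NOT IN (SELECT o.product_id FROM orders o)
"""

OUTER_JOIN_SQL = """
    SELECT p.product_id
    FROM products p
    LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL
"""

not_in_before = conn.execute(NOT_IN_SQL).fetchall()
outer_before = conn.execute(OUTER_JOIN_SQL).fetchall()

# Hypothetical order with an unknown product.
conn.execute("INSERT INTO orders VALUES (12, NULL)")

not_in_after = conn.execute(NOT_IN_SQL).fetchall()
outer_after = conn.execute(OUTER_JOIN_SQL).fetchall()

print(not_in_before, outer_before)
print(not_in_after, outer_after)
```

Before the NULL row is added, both queries return only Efg. Afterwards the `NOT IN` query returns nothing, and the outer join still returns Efg.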
Greatest row per group
- Problem: find the row in each group with the greatest value in one column.

    SELECT ...
    FROM products p JOIN orders o1
      ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id
      AND o1.date < o2.date
    WHERE o2.product_id IS NULL;

- I.e., show the rows for which no other row exists with a greater date and the same product_id.

Greatest row per group
Products

    product_id
    Abc
    Def
    Efg

Orders o1

    product_id   order_id   date
    Abc          10         2006/2/1
    Abc          11         2006/3/10
    Def          9          2005/5/2

Orders o2

    product_id   order_id   date
    Abc          10         2006/2/1
    Abc          11         2006/3/10
    Def          9          2005/5/2
    NULL                                (paired with each o1 row that has no later order for the same product)

Greatest row per group
Query result set

    product_id   Product attributes   order_id   Order attributes
    Abc          $10.00               11         2006/3/10
    Def          $5.00                9          2005/5/2

    SELECT ...
    FROM products p JOIN orders o1
      ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id
      AND o1.date < o2.date
    WHERE o2.product_id IS NULL;

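The latest-order-per-product query can be run as written against the example data. This sketch assumes Python's built-in `sqlite3` module and ISO date strings, so that `<` on text compares dates correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9,  'Def', '2005-05-02');
""")

# o1 is the candidate "latest" order; o2 searches for any later order of the
# same product. Keeping only rows where o2 found nothing leaves the latest one.
latest = conn.execute("""
    SELECT p.product_id, o1.order_id, o1.date
    FROM products p JOIN orders o1
      ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id
      AND o1.date < o2.date
    WHERE o2.product_id IS NULL
    ORDER BY p.product_id
""").fetchall()

for row in latest:
    print(row)
```

Efg does not appear because the first join is an inner join. If two orders of one product shared the greatest date, both would be returned, since neither has a strictly later order.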
Top three per group
- Problem: list the largest three cities per US state.

    SELECT c.state, c.city_name, c.population
    FROM cities AS c
    LEFT JOIN cities AS c2 ON c.state = c2.state
      AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC;

- I.e., show the cities for which the number of cities in the same state with an equal or greater population (counting the city itself) is less than or equal to three.

Top three per group
Cities c

    state   city_name       population
    CA      Los Angeles     3485K
    CA      San Diego       1110K
    CA      San Jose        782K
    CA      San Francisco   724K

Cities c2

    state   city_name       population
    CA      Los Angeles     3485K
    CA      San Diego       1110K
    CA      San Jose        782K
    CA      San Francisco   724K

Top three per group
Query result set

    state   city_name     population
    CA      Los Angeles   3485K
    CA      San Diego     1110K
    CA      San Jose      782K

    SELECT c.state, c.city_name, c.population
    FROM cities AS c
    LEFT JOIN cities AS c2 ON c.state = c2.state
      AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC;

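The ranking-by-self-join idea can be tried with the four California cities. This is a sketch that assumes Python's built-in `sqlite3` module and stores populations as integers in thousands (an assumption made so that `<=` compares numbers). The extra `COUNT(*)` column exposes the rank each city receives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (state TEXT, city_name TEXT, population INTEGER);
    -- Populations in thousands, as shown in the example tables.
    INSERT INTO cities VALUES ('CA', 'Los Angeles',   3485),
                              ('CA', 'San Diego',     1110),
                              ('CA', 'San Jose',       782),
                              ('CA', 'San Francisco',  724);
""")

# Each city c is joined to every city c2 in the same state that is at least
# as large (including c itself), so COUNT(*) is c's rank within its state.
top_three = conn.execute("""
    SELECT c.state, c.city_name, c.population, COUNT(*) AS rank_in_state
    FROM cities AS c
    LEFT JOIN cities AS c2 ON c.state = c2.state
      AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC
""").fetchall()

for row in top_three:
    print(row)
```

San Francisco is joined to all four cities, so its count is 4 and the HAVING clause drops it.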
Fetching EAV attributes
- Entity-Attribute-Value table structure for dynamic attributes
  - Not normalized schema design
  - Lacks integrity enforcement
  - Not scalable
- Nevertheless, EAV is used widely and is sometimes the only solution when attributes evolve quickly

Fetching EAV attributes
Products

    product_id
    Abc
    Def
    Efg

Attributes

    product_id   attribute   value
    Abc          Media       DVD
    Abc          Discs       2
    Abc          Format      Widescreen
    Abc          Length      108 min.

Fetching EAV attributes
- Need an outer join per attribute:

    SELECT p.product_id, media.value AS media, discs.value AS discs,
      format.value AS format, length.value AS length
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = 'Abc';

Fetching EAV attributes
Query result set

    product_id   media   discs   format       length
    Abc          DVD     2       Widescreen   108 min.

    SELECT p.product_id, media.value AS media, discs.value AS discs,
      format.value AS format, length.value AS length
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = 'Abc';

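The one-join-per-attribute query can be exercised for a product that has attributes and for one that has none. This sketch assumes Python's built-in `sqlite3` module, stores every value as text (as an EAV table typically must), and takes the product id as a bound parameter; the helper name `fetch_attributes` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE attributes (product_id TEXT, attribute TEXT, value TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO attributes VALUES ('Abc', 'Media',  'DVD'),
                                  ('Abc', 'Discs',  '2'),
                                  ('Abc', 'Format', 'Widescreen'),
                                  ('Abc', 'Length', '108 min.');
""")

EAV_SQL = """
    SELECT p.product_id, media.value, discs.value, format.value, length.value
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = ?
"""


def fetch_attributes(product_id):
    """Return one pivoted row for the product (hypothetical helper)."""
    return conn.execute(EAV_SQL, (product_id,)).fetchone()


abc_row = fetch_attributes("Abc")
efg_row = fetch_attributes("Efg")
print(abc_row)
print(efg_row)
```

Because every join is an outer join, a product with no attribute rows still produces a row, with NULL in each attribute column. With inner joins, a single missing attribute would remove the product from the result.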
Sudoku puzzles
[Figure: example Sudoku puzzle grid]

Sudoku schema
    CREATE TABLE one_to_nine (
      value INTEGER NOT NULL );

    INSERT INTO one_to_nine (value) VALUES
      (1), (2), (3), (4), (5), (6), (7), (8), (9);

    -- COLUMN is a reserved word in MySQL, and ROW is reserved in MySQL 8.0,
    -- so both names are quoted with backticks where they are declared.
    -- After a table qualifier (e.g. s.column) no quoting is needed.
    CREATE TABLE sudoku (
      `column` INTEGER NOT NULL,
      `row`    INTEGER NOT NULL,
      value    INTEGER NOT NULL );

    INSERT INTO sudoku (`column`, `row`, value) VALUES
      (6,1,3), (8,1,5), (9,1,1), (1,2,1), (2,2,4), (5,2,7),
      (7,2,6), (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2),
      (3,4,2), (4,4,3), (7,4,1), (9,4,7), (1,5,5), (2,5,3),
      (8,5,6), (1,6,9), (4,6,8), (5,6,6), (6,6,4), (8,6,2),
      (2,7,5), (4,7,1), (6,7,2), (8,7,8), (1,8,6), (3,8,7),
      (4,8,5), (8,8,9), (6,9,7), (7,9,3), (8,9,1);

Showing puzzle state
    SELECT GROUP_CONCAT(COALESCE(s.value, '_')
        ORDER BY x.value SEPARATOR ' ') AS `Puzzle_state`
    FROM one_to_nine AS x
      INNER JOIN one_to_nine AS y
      LEFT OUTER JOIN sudoku AS s
        ON s.column = x.value
        AND s.row = y.value
    GROUP BY y.value;

    +-------------------+
    | Puzzle_state      |
    +-------------------+
    | _ _ _ _ _ 3 _ 5 1 |
    | 1 4 _ _ 7 _ 6 _ _ |
    | _ 8 5 9 _ _ 4 _ 2 |
    | _ _ 2 3 _ _ 1 _ 7 |
    | 5 3 _ _ _ _ _ 6 _ |
    | 9 _ _ 8 6 4 _ 2 _ |
    | _ 5 _ 1 _ 2 _ 8 _ |
    | 6 _ 7 5 _ _ _ 9 _ |
    | _ _ _ _ _ 7 3 1 _ |
    +-------------------+

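The grid rendering can be approximated outside MySQL. The sketch below assumes Python's built-in `sqlite3` module, renames the `column` and `row` columns to `col` and `row_num` to sidestep keyword clashes (a choice of this example), and joins each row's cells in Python instead of using MySQL's `GROUP_CONCAT ... SEPARATOR` syntax. The outer join against the 9 x 9 cross product is the same idea as in the query.

```python
import sqlite3

GIVENS = [  # (col, row_num, value), copied from the schema's INSERT statement
    (6,1,3), (8,1,5), (9,1,1), (1,2,1), (2,2,4), (5,2,7),
    (7,2,6), (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2),
    (3,4,2), (4,4,3), (7,4,1), (9,4,7), (1,5,5), (2,5,3),
    (8,5,6), (1,6,9), (4,6,8), (5,6,6), (6,6,4), (8,6,2),
    (2,7,5), (4,7,1), (6,7,2), (8,7,8), (1,8,6), (3,8,7),
    (4,8,5), (8,8,9), (6,9,7), (7,9,3), (8,9,1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_to_nine (value INTEGER NOT NULL)")
conn.executemany("INSERT INTO one_to_nine VALUES (?)", [(n,) for n in range(1, 10)])
conn.execute("""CREATE TABLE sudoku (
    col INTEGER NOT NULL, row_num INTEGER NOT NULL, value INTEGER NOT NULL)""")
conn.executemany("INSERT INTO sudoku VALUES (?, ?, ?)", GIVENS)

# Cross product of 9 columns x 9 rows, outer-joined to the filled cells.
# Empty cells have no sudoku row, so COALESCE substitutes '_'.
cells = conn.execute("""
    SELECT y.value, x.value, COALESCE(s.value, '_')
    FROM one_to_nine AS x
    CROSS JOIN one_to_nine AS y
    LEFT OUTER JOIN sudoku AS s
      ON s.col = x.value
      AND s.row_num = y.value
    ORDER BY y.value, x.value
""").fetchall()

# Nine consecutive cells form one printed line of the grid.
lines = [
    " ".join(str(cell) for _, _, cell in cells[i:i + 9])
    for i in range(0, 81, 9)
]
print("\n".join(lines))
```

The printed grid should match the `Puzzle_state` output shown with the query.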
Revealing possible values
    SELECT x_loop.value AS x, y_loop.value AS y,
      GROUP_CONCAT(cell.value ORDER BY cell.value) AS possibilities
    -- Cartesian product: loop x over 1..9 columns,
    -- loop y over 1..9 rows, loop cell over 1..9 values
    FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
    -- Is there any value already in the cell x, y?
    LEFT OUTER JOIN sudoku AS occupied
      ON (occupied.column = x_loop.value
      AND occupied.row = y_loop.value)
    -- Does the value appear in column x?
    LEFT OUTER JOIN sudoku AS num_in_col
      ON (num_in_col.column = x_loop.value
      AND num_in_col.value = cell.value)
    -- Does the value appear in row y?
    LEFT OUTER JOIN sudoku AS num_in_row
      ON (num_in_row.row = y_loop.value
      AND num_in_row.value = cell.value)
    -- Does the value appear in the sub-square containing x, y?
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
      AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
      AND cell.value = num_in_box.value)
    -- Select for cases where all four outer joins find no matches
    WHERE COALESCE(occupied.value, num_in_col.value,
      num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value;

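The candidate search can be run end to end in a sketch that assumes Python's built-in `sqlite3` module. Three adaptations are choices of this example and not part of the slides: the `column` and `row` columns are renamed `col` and `row_num`, the sub-square test uses integer division `(n - 1) / 3` in place of `CEIL(n/3)` (SQLite divides integers as integers), and candidates are returned one per row and grouped in Python instead of with `GROUP_CONCAT`.

```python
import sqlite3

GIVENS = [  # (col, row_num, value), copied from the schema's INSERT statement
    (6,1,3), (8,1,5), (9,1,1), (1,2,1), (2,2,4), (5,2,7),
    (7,2,6), (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2),
    (3,4,2), (4,4,3), (7,4,1), (9,4,7), (1,5,5), (2,5,3),
    (8,5,6), (1,6,9), (4,6,8), (5,6,6), (6,6,4), (8,6,2),
    (2,7,5), (4,7,1), (6,7,2), (8,7,8), (1,8,6), (3,8,7),
    (4,8,5), (8,8,9), (6,9,7), (7,9,3), (8,9,1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_to_nine (value INTEGER NOT NULL)")
conn.executemany("INSERT INTO one_to_nine VALUES (?)", [(n,) for n in range(1, 10)])
conn.execute("""CREATE TABLE sudoku (
    col INTEGER NOT NULL, row_num INTEGER NOT NULL, value INTEGER NOT NULL)""")
conn.executemany("INSERT INTO sudoku VALUES (?, ?, ?)", GIVENS)

# One row per (x, y, candidate) for which none of the four outer joins matched.
CANDIDATES_SQL = """
    SELECT x_loop.value AS x, y_loop.value AS y, cell.value AS candidate
    FROM one_to_nine AS x_loop
    CROSS JOIN one_to_nine AS y_loop
    CROSS JOIN one_to_nine AS cell
    LEFT OUTER JOIN sudoku AS occupied
      ON occupied.col = x_loop.value AND occupied.row_num = y_loop.value
    LEFT OUTER JOIN sudoku AS num_in_col
      ON num_in_col.col = x_loop.value AND num_in_col.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_row
      ON num_in_row.row_num = y_loop.value AND num_in_row.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (x_loop.value - 1) / 3 = (num_in_box.col - 1) / 3
      AND (y_loop.value - 1) / 3 = (num_in_box.row_num - 1) / 3
      AND num_in_box.value = cell.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    ORDER BY x, y, candidate
"""

candidates = conn.execute(CANDIDATES_SQL).fetchall()

# Group the candidate values by cell, mirroring the "possibilities" column.
possibilities = {}
for x, y, value in candidates:
    possibilities.setdefault((x, y), []).append(value)

for (x, y), values in sorted(possibilities.items())[:5]:
    print(f"column {x}, row {y}: {values}")
```

Each surviving row is a value that does not yet appear in the cell's column, row, or sub-square, for a cell that is still empty.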
Revealing singleton values
    -- MIN() keeps the query valid when ONLY_FULL_GROUP_BY is enabled;
    -- each selected group holds exactly one row, so it returns that value.
    SELECT x_loop.value AS x, y_loop.value AS y,
      MIN(cell.value) AS possibilities
    FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
    LEFT OUTER JOIN sudoku AS occupied
      ON (occupied.column = x_loop.value
      AND occupied.row = y_loop.value)
    LEFT OUTER JOIN sudoku AS num_in_col
      ON (num_in_col.column = x_loop.value
      AND num_in_col.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_row
      ON (num_in_row.row = y_loop.value
      AND num_in_row.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
      AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
      AND cell.value = num_in_box.value)
    WHERE COALESCE(occupied.value, num_in_col.value,
      num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    -- Limit the groups to those with only one value remaining
    HAVING COUNT(*) = 1;

Updating the puzzle
    -- Insert these singletons back into the table, then we can try again
    INSERT INTO sudoku (`column`, `row`, value)
    SELECT x_loop.value AS x, y_loop.value AS y,
      MIN(cell.value) AS possibilities
    FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
    LEFT OUTER JOIN sudoku AS occupied
      ON (occupied.column = x_loop.value
      AND occupied.row = y_loop.value)
    LEFT OUTER JOIN sudoku AS num_in_col
      ON (num_in_col.column = x_loop.value
      AND num_in_col.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_row
      ON (num_in_row.row = y_loop.value
      AND num_in_row.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
      AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
      AND cell.value = num_in_box.value)
    WHERE COALESCE(occupied.value, num_in_col.value,
      num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    HAVING COUNT(*) = 1;

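The "insert, then try again" step can be wrapped in a loop that stops once a pass adds no new cells. This is a sketch under several assumptions: Python's built-in `sqlite3` module as the database, the `column` and `row` columns renamed to `col` and `row_num`, integer division `(n - 1) / 3` in place of `CEIL(n/3)`, and `MIN(candidate)` to pick the single remaining value in each group. The technique only fills cells that have exactly one candidate left, so it may stop before the grid is complete on harder puzzles.

```python
import sqlite3

GIVENS = [  # (col, row_num, value), copied from the schema's INSERT statement
    (6,1,3), (8,1,5), (9,1,1), (1,2,1), (2,2,4), (5,2,7),
    (7,2,6), (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2),
    (3,4,2), (4,4,3), (7,4,1), (9,4,7), (1,5,5), (2,5,3),
    (8,5,6), (1,6,9), (4,6,8), (5,6,6), (6,6,4), (8,6,2),
    (2,7,5), (4,7,1), (6,7,2), (8,7,8), (1,8,6), (3,8,7),
    (4,8,5), (8,8,9), (6,9,7), (7,9,3), (8,9,1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_to_nine (value INTEGER NOT NULL)")
conn.executemany("INSERT INTO one_to_nine VALUES (?)", [(n,) for n in range(1, 10)])
conn.execute("""CREATE TABLE sudoku (
    col INTEGER NOT NULL, row_num INTEGER NOT NULL, value INTEGER NOT NULL)""")
conn.executemany("INSERT INTO sudoku VALUES (?, ?, ?)", GIVENS)

# Insert every empty cell that has exactly one remaining candidate.
INSERT_SINGLETONS_SQL = """
    INSERT INTO sudoku (col, row_num, value)
    SELECT x_loop.value, y_loop.value, MIN(cell.value)
    FROM one_to_nine AS x_loop
    CROSS JOIN one_to_nine AS y_loop
    CROSS JOIN one_to_nine AS cell
    LEFT OUTER JOIN sudoku AS occupied
      ON occupied.col = x_loop.value AND occupied.row_num = y_loop.value
    LEFT OUTER JOIN sudoku AS num_in_col
      ON num_in_col.col = x_loop.value AND num_in_col.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_row
      ON num_in_row.row_num = y_loop.value AND num_in_row.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (x_loop.value - 1) / 3 = (num_in_box.col - 1) / 3
      AND (y_loop.value - 1) / 3 = (num_in_box.row_num - 1) / 3
      AND num_in_box.value = cell.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    HAVING COUNT(*) = 1
"""


def filled_cells():
    """Number of cells currently stored in the sudoku table."""
    return conn.execute("SELECT COUNT(*) FROM sudoku").fetchone()[0]


passes = 0
filled = filled_cells()
while True:
    conn.execute(INSERT_SINGLETONS_SQL)
    now_filled = filled_cells()
    if now_filled == filled:  # no progress in this pass: stop
        break
    filled = now_filled
    passes += 1

print(f"{filled} of 81 cells filled after {passes} productive passes")
```

The loop always terminates, because each productive pass fills at least one of the at most 81 cells and occupied cells are never selected again.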
Finish
- Outer joins are an indispensable part of SQL programming.

Thank you!
