SQL Antipatterns
Bill Karwin
1
Antipattern Categories
Logical Database Antipatterns | Physical Database Antipatterns
CREATE TABLE BugsProducts (
  bug_id INTEGER REFERENCES Bugs,
  product VARCHAR(100) REFERENCES Products,
  PRIMARY KEY (bug_id, product)
);
Query Antipatterns
SELECT b.product, COUNT(*)
FROM BugsProducts AS b
GROUP BY b.product;
Application Antipatterns
$dbHandle = new PDO('mysql:dbname=test');
$stmt = $dbHandle->prepare($sql);
$result = $stmt->fetchAll();
2
Logical Database Antipatterns
1. Comma-Separated Lists
2. Multi-Column Attributes
3. Entity-Attribute-Value
4. Metadata Tribbles
4
Comma-Separated Lists
5
Comma-Separated Lists
• Example: table BUGS references table
PRODUCTS using a foreign key
BUGS (* .. 1) PRODUCTS
6
Comma-Separated Lists
CREATE TABLE products (
  product_id SERIAL PRIMARY KEY,
  product_name VARCHAR(50)
)
CREATE TABLE bugs (
  bug_id SERIAL PRIMARY KEY,
  description VARCHAR(200),
  product_id BIGINT REFERENCES products
)
7
Comma-Separated Lists
• Objective: Allow a bug to reference
multiple products
BUGS (* .. *) PRODUCTS
8
Comma-Separated Lists
• Antipattern: reference multiple
products in a comma-separated list
CREATE TABLE bugs (
  bug_id SERIAL PRIMARY KEY,
  description VARCHAR(200),
  product_id VARCHAR(50)
)
9
Comma-Separated Lists
• Solution: a many-to-many relationship
always requires an intersection table.
BUGS (1 .. *) BUGS_PRODS (* .. 1) PRODUCTS
16
Comma-Separated Lists
Add a product:
INSERT INTO bugs_prods
VALUES (1234, 2)
19
Comma-Separated Lists
Join bugs to products:
SELECT * FROM bugs
JOIN bugs_prods USING (bug_id)
JOIN products USING (product_id)
WHERE bug_id = 1234
Count bugs per product:
SELECT product_id, COUNT(*)
FROM bugs_prods
GROUP BY product_id
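The intersection-table queries above can be tried end to end. The following is a minimal sketch, assuming an in-memory SQLite database driven from Python in place of the deck's database; the bug descriptions and the initial rows are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the deck's database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name VARCHAR(50));
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, description VARCHAR(200));
    CREATE TABLE bugs_prods (
        bug_id     INTEGER REFERENCES bugs,
        product_id INTEGER REFERENCES products,
        PRIMARY KEY (bug_id, product_id)
    );
    -- Hypothetical sample rows.
    INSERT INTO products VALUES (1, 'Open RoundFile'), (2, 'Visual TurboBuilder');
    INSERT INTO bugs VALUES (1234, 'crash on save'), (3456, 'wrong totals');
    INSERT INTO bugs_prods VALUES (1234, 1), (3456, 2);
""")

# Add a product to bug 1234: one new row, no string manipulation.
conn.execute("INSERT INTO bugs_prods VALUES (1234, 2)")

# Join bugs to products.
products_for_bug = [row[0] for row in conn.execute("""
    SELECT product_name FROM bugs
    JOIN bugs_prods USING (bug_id)
    JOIN products USING (product_id)
    WHERE bug_id = 1234
    ORDER BY product_id""")]

# Count bugs per product.
bugs_per_product = dict(conn.execute("""
    SELECT product_id, COUNT(*) FROM bugs_prods GROUP BY product_id"""))
```

Here products_for_bug lists both product names for bug 1234, and bugs_per_product maps each product_id to its bug count.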
20
Multi-Column Attributes
21
Multi-Column Attributes
• Objective: support an attribute with
multiple values
22
Multi-Column Attributes
• Antipattern: add more columns
CREATE TABLE bugs (
  bug_id SERIAL PRIMARY KEY,
  description VARCHAR(200),
  product_id1 BIGINT REFERENCES products,
  product_id2 BIGINT REFERENCES products,
  product_id3 BIGINT REFERENCES products
)
23
Multi-Column Attributes
• Search for multiple products (two forms):
SELECT * FROM bugs
WHERE (product_id1 = 1 OR product_id2 = 1 OR product_id3 = 1)
  AND (product_id1 = 3 OR product_id2 = 3 OR product_id3 = 3)
SELECT * FROM bugs
WHERE 1 IN (product_id1, product_id2, product_id3)
  AND 3 IN (product_id1, product_id2, product_id3)
25
Multi-Column Attributes
• Remove product 2 from a row:
        bug_id  product_id1  product_id2  product_id3
BEFORE  1234    1            3            2
AFTER   1234    1            3            NULL
UPDATE bugs
SET
product_id1 = NULLIF(product_id1, 2),
product_id2 = NULLIF(product_id2, 2),
product_id3 = NULLIF(product_id3, 2)
WHERE bug_id = 1234
26
Multi-Column Attributes
• Add product 3 to a row:
bug_id product_id1 product_id2 product_id3
UPDATE bugs
SET
product_id1 = CASE
WHEN 3 IN (product_id2, product_id3) THEN product_id1
ELSE COALESCE(product_id1, 3) END,
product_id2 = CASE
WHEN 3 IN (product_id1, product_id3) THEN product_id2
ELSE COALESCE(product_id2, 3) END,
product_id3 = CASE
WHEN 3 IN (product_id1, product_id2) THEN product_id3
ELSE COALESCE(product_id3, 3) END
WHERE bug_id = 1234
27
Multi-Column Attributes
• How many columns are enough?
• Add a new column for fourth product:
ALTER TABLE bugs ADD COLUMN product_id4
BIGINT REFERENCES products
28
Entity-Attribute-Value
30
Entity-Attribute-Value
• Objective: make a table with a variable
set of attributes
bug_id bug_type priority description severity sponsor
31
Entity-Attribute-Value
• Difficult to ensure consistent attribute
names
34
Entity-Attribute-Value
• Difficult to enforce data type integrity
35
Entity-Attribute-Value
• Difficult to enforce referential integrity for
attribute values
bug_id attr_name attr_value
1234 priority new
3456 priority fixed
5678 priority banana
37
Entity-Attribute-Value
• Difficult to reconstruct a row of attributes
(need one JOIN per attribute):
SELECT b.bug_id,
  e1.attr_value AS `created_date`,
  e2.attr_value AS `priority`
FROM bugs b
LEFT JOIN eav e1 ON (b.bug_id = e1.bug_id
  AND e1.attr_name = 'created_date')
LEFT JOIN eav e2 ON (b.bug_id = e2.bug_id
  AND e2.attr_name = 'priority')
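To see the one-JOIN-per-attribute cost concretely, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The eav column types and the sample values are assumptions for illustration; the deck does not show the eav table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY);
    -- Assumed EAV layout: every value is stored as text, whatever its real type.
    CREATE TABLE eav (bug_id INTEGER, attr_name TEXT, attr_value TEXT);
    INSERT INTO bugs VALUES (1234);
    INSERT INTO eav VALUES (1234, 'created_date', '2008-04-01'),
                           (1234, 'priority', 'high');
""")

# Each attribute that should appear as a column needs its own LEFT JOIN.
row = conn.execute("""
    SELECT b.bug_id,
           e1.attr_value AS created_date,
           e2.attr_value AS priority
    FROM bugs b
    LEFT JOIN eav e1 ON (b.bug_id = e1.bug_id AND e1.attr_name = 'created_date')
    LEFT JOIN eav e2 ON (b.bug_id = e2.bug_id AND e2.attr_name = 'priority')
""").fetchone()
```

Adding a third attribute to the output means adding a third join, and both values come back as plain strings.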
38
Entity-Attribute-Value
• Define similar tables for similar types:
“Sister Tables”
40
Entity-Attribute-Value
• Define related tables for related types:
“Inherited Tables”
CREATE TABLE issues (
  issue_id SERIAL PRIMARY KEY,
  created_date DATE NOT NULL,
  priority VARCHAR(20),
  description TEXT
)
41
Metadata Tribbles
43
Metadata Tribbles
My database has
fifty... thousand... tables.
44
Metadata Tribbles
• Solution #1: use horizontal partitioning
• Physically split, while logically whole
BUGS → BUGS (2006), BUGS (2007), BUGS (2008)
51
Metadata Tribbles
• Solution #2: use vertical partitioning
• Move bulky and seldom-used columns to a
second table in one-to-one relationship
PRODUCTS (1 .. 1) INSTALLERS
52
Metadata Tribbles
• Solution #3: add a dependent table
CREATE TABLE bugs_prods (
  bug_id BIGINT REFERENCES bugs,
  product_id BIGINT REFERENCES products,
  PRIMARY KEY (bug_id, product_id)
)
BUGS (1 .. *) BUGS_PRODS
54
Physical Database Antipatterns
5. ID Required
6. Phantom Files
7. FLOAT Antipattern
8. ENUM Antipattern
9. Readable Passwords
56
ID Required
57
ID Required
• Issues:
• Natural keys
• Compound keys
• Duplicate rows
• Obscure meaning
60
ID Required
• Natural keys
• Not auto-generated
61
ID Required
• Compound keys
• More than one column
62
ID Required
• “id” prevents JOIN...USING
SELECT *
FROM bugs
JOIN bugs_prods USING (bug_id)
BUGS.bug_id = BUGS_PRODS.bug_id (same name, so USING works)
66
ID Required
• “id” prevents JOIN...USING
SELECT *
FROM bugs b
JOIN bugs_prods p ON (b.id = p.bug_id)
BUGS.id = BUGS_PRODS.bug_id (different names, so ON is required)
67
ID Required
• Solution:
• Use natural keys when needed
• Use compound keys when needed
• Choose sensible names
• Use same name in foreign keys
68
Phantom Files
69
Phantom Files
• Objective: store screenshot images
BUGS SCREENSHOTS
70
Phantom Files
• Antipattern: store path to image in
database, image on filesystem
[Diagram: BUGS, SCREENSHOTS, and image.png (file A) on the filesystem]
CREATE TABLE screenshots (
  bug_id BIGINT REFERENCES bugs,
  image_path VARCHAR(200),
  comment VARCHAR(200)
)
71
Phantom Files
• Files don’t obey DELETE
Insert row; create file A
72
Phantom Files
• Files don’t obey DELETE
Delete row; file A is orphaned
73
Phantom Files
• Files don’t obey UPDATE
1. Client #1 updates row, acquires row lock
Update row; replace file A with B
74
Phantom Files
• Files don’t obey UPDATE
2. Client #2 replaces image file, but not row
2nd update gets conflict error, but replaces file B with C anyway
75
Phantom Files
• Files don’t obey ROLLBACK
1. Start transaction and INSERT
Insert row; create file A
76
Phantom Files
• Files don’t obey ROLLBACK
2. ROLLBACK
Row is discarded; file A is orphaned
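The orphaned-file case can be reproduced in a few lines. This is a minimal sketch, assuming SQLite from Python and a temporary directory in place of the real screenshot store; the file content is a placeholder, not a real image.

```python
import os
import sqlite3
import tempfile

# isolation_level=None lets the sketch issue BEGIN and ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE screenshots (bug_id INTEGER, image_path VARCHAR(200))")

workdir = tempfile.mkdtemp()  # hypothetical screenshot directory
image_path = os.path.join(workdir, "image.png")

# 1. Start transaction, INSERT the row, and create the file.
conn.execute("BEGIN")
conn.execute("INSERT INTO screenshots VALUES (1234, ?)", (image_path,))
with open(image_path, "wb") as f:
    f.write(b"placeholder image bytes")

# 2. ROLLBACK undoes the row but knows nothing about the file.
conn.execute("ROLLBACK")

row_count = conn.execute("SELECT COUNT(*) FROM screenshots").fetchone()[0]
file_still_exists = os.path.exists(image_path)

# Clean up the orphan by hand, which is exactly the chore the slide warns about.
os.remove(image_path)
os.rmdir(workdir)
```

After the rollback, row_count is 0 while file_still_exists is True.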
77
Phantom Files
• Files don’t obey ROLLBACK
1. Start transaction and UPDATE
Update row; replace file A with B
78
Phantom Files
• Files don’t obey ROLLBACK
2. ROLLBACK
Row reverts to old; new file B doesn’t revert
79
Phantom Files
• Files don’t obey transaction isolation
1. Client #1 starts transaction and UPDATE
Update row; replace file A with B
80
Phantom Files
• Files don’t obey transaction isolation
2. Client #2 queries before #1 COMMITs
Query reads old row... but reads new image file B
81
Phantom Files
• Files don’t obey database backup tools
Row is included in backup; file A is excluded from backup
82
Phantom Files
• Files don’t obey SQL access privileges
You can’t read this row, but you can read this file A
83
Phantom Files
• Solution: consider storing images inside
the database, in a BLOB column
CREATE TABLE screenshots (
  bug_id BIGINT REFERENCES bugs,
  image_data BLOB,
  comment VARCHAR(200)
)
BUGS SCREENSHOTS
84
FLOAT Antipattern
86
FLOAT Antipattern
• FLOAT is inexact
SELECT work_estimate_hrs
FROM bugs WHERE bug_id = 1234
‣ 3.3
SELECT work_estimate_hrs * 1000000000
FROM bugs WHERE bug_id = 1234
‣ 3299999952.3163
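The rounding shown above is not specific to one database. A minimal sketch in Python, assuming the FLOAT column is a 4-byte IEEE 754 single-precision value; the exact decimal type is shown only for contrast and stands in for a SQL NUMERIC or DECIMAL column.

```python
import struct
from decimal import Decimal

def as_float32(x):
    """Round-trip x through 4-byte IEEE 754 storage (assumed FLOAT layout)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# 3.3 has no exact binary representation, so the stored value is slightly off.
stored = as_float32(3.3)
scaled_float = stored * 1000000000

# An exact decimal type keeps the value as written.
scaled_decimal = Decimal("3.3") * 1000000000
```

scaled_float comes out near 3299999952.3163, matching the slide, while scaled_decimal is exactly 3300000000.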
89
ENUM Antipattern
93
ENUM Antipattern
• Use a lookup table if values may change
CREATE TABLE bug_status (
  status VARCHAR(10) PRIMARY KEY
)
INSERT INTO bug_status (status)
VALUES ('new'), ('open'), ('fixed')
BUGS BUG_STATUS
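A lookup table restricts values through a foreign key, and changing the allowed set is plain data manipulation. The sketch below assumes SQLite from Python; the bugs.status column and the 'duplicate' status are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE bug_status (status VARCHAR(10) PRIMARY KEY)")
conn.execute("INSERT INTO bug_status (status) VALUES ('new'), ('open'), ('fixed')")
# Hypothetical bugs table whose status column references the lookup table.
conn.execute("""
    CREATE TABLE bugs (
        bug_id INTEGER PRIMARY KEY,
        status VARCHAR(10) NOT NULL REFERENCES bug_status
    )""")
conn.commit()

def try_insert(bug_id, status):
    """Return True if the row was accepted, False if the foreign key rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO bugs VALUES (?, ?)", (bug_id, status))
        return True
    except sqlite3.IntegrityError:
        return False

new_accepted = try_insert(1234, "new")
banana_accepted = try_insert(5678, "banana")

# Adding a status is an ordinary INSERT, not a schema change.
conn.execute("INSERT INTO bug_status (status) VALUES ('duplicate')")
duplicate_accepted = try_insert(5678, "duplicate")
```

The 'banana' row is rejected, and 'duplicate' is accepted as soon as it exists in bug_status.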
99
Readable Passwords
102
Readable Passwords
• Objective:
help users who forget their password
103
Readable Passwords
• Antipattern: send password in plain
text in email to user upon request
From: daemon
To: bill@example.com
Subject: password request
...
106
Readable Passwords
• Better: concatenate a salt before using
the hash function
CREATE TABLE accounts (
  ...
  password_hash CHAR(32) NOT NULL,
  salt BINARY(4) NOT NULL
)
SELECT (a.password_hash = MD5(
  CONCAT('xyzzy', a.salt))) AS is_correct
FROM accounts a
WHERE a.acct_name = 'bill'
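The same check can be done in application code so the plain-text password never appears in a SQL statement. This is a minimal sketch, not the deck's implementation: it swaps SHA-256 in for MD5 (so the hex digest is 64 characters rather than 32), and the helper names make_account and is_correct are hypothetical.

```python
import hashlib
import hmac
import os

def make_account(password):
    """Hash password + salt; only the hash and the salt are stored."""
    salt = os.urandom(4)  # 4 random bytes, matching the BINARY(4) column
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return {"password_hash": digest, "salt": salt}

def is_correct(account, attempt):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    candidate = hashlib.sha256(attempt.encode("utf-8") + account["salt"]).hexdigest()
    return hmac.compare_digest(candidate, account["password_hash"])

account = make_account("xyzzy")
right = is_correct(account, "xyzzy")
wrong = is_correct(account, "plugh")
```

Because only the hash is stored, the application has nothing readable to email back, which is why the reset flow replaces the recovery flow.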
109
Readable Passwords
• Reset password to a temporary random
string, require user to change it
From: daemon
To: bill@example.com
Subject: password reset
111
Query Antipatterns
113
Ambiguous GROUP BY
114
Ambiguous GROUP BY
• Antipattern: bug_id isn’t that of the
latest per product
Source rows:
product_name         bug_id  created_date
Open RoundFile       1234    2007-12-19
Open RoundFile       2248    2008-04-01
Visual TurboBuilder  3456    2008-02-16
Visual TurboBuilder  4077    2008-02-10
ReConsider           5678    2008-01-01
ReConsider           8063    2007-11-09
Query result:
product_name         bug_id  latest
Open RoundFile       1234    2008-04-01
Visual TurboBuilder  3456    2008-02-16
ReConsider           5678    2008-01-01
116
Ambiguous GROUP BY
• Functional dependency:
• For a given product_name, there is guaranteed to
be one value in a functionally dependent attribute
product_name    bug_id  created_date
Open RoundFile  1234    2007-12-19
Open RoundFile  2248    2008-04-01
(multiple values per product name)
118
Ambiguous GROUP BY
• Solution #1: use only functionally
dependent attributes in select-list
SELECT prod_name,
  MAX(created_date) AS latest
FROM bugs
GROUP BY prod_name
product_name         latest
Open RoundFile       2008-04-01
Visual TurboBuilder  2008-02-16
ReConsider           2008-01-01
119
Ambiguous GROUP BY
• Solution #2: use GROUP_CONCAT()
function in MySQL to get all values
SELECT prod_name,
GROUP_CONCAT(bug_id) as bug_id_list,
MAX(created_date) as latest
FROM bugs
GROUP BY prod_name
120
Ambiguous GROUP BY
• Solution #3: use this OUTER JOIN
query instead of GROUP BY
SELECT b.prod_name, b.bug_id,
b.created_date AS latest
FROM bugs b LEFT OUTER JOIN bugs b2
ON (b.prod_name = b2.prod_name AND
b.created_date < b2.created_date)
WHERE b2.bug_id IS NULL
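The outer-join form can be checked against the sample rows from the earlier slide. A minimal sketch, assuming SQLite from Python and dates stored as ISO-8601 text so that string comparison orders them correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bugs (prod_name TEXT, bug_id INTEGER PRIMARY KEY, created_date TEXT)")
conn.executemany("INSERT INTO bugs VALUES (?, ?, ?)", [
    ("Open RoundFile", 1234, "2007-12-19"),
    ("Open RoundFile", 2248, "2008-04-01"),
    ("Visual TurboBuilder", 3456, "2008-02-16"),
    ("Visual TurboBuilder", 4077, "2008-02-10"),
    ("ReConsider", 5678, "2008-01-01"),
    ("ReConsider", 8063, "2007-11-09"),
])

# Keep a row only when no other row for the same product has a later date.
latest = conn.execute("""
    SELECT b.prod_name, b.bug_id, b.created_date AS latest
    FROM bugs b LEFT OUTER JOIN bugs b2
      ON (b.prod_name = b2.prod_name AND b.created_date < b2.created_date)
    WHERE b2.bug_id IS NULL
    ORDER BY b.bug_id""").fetchall()
```

Unlike the ambiguous GROUP BY result, Open RoundFile is now paired with bug 2248, the bug actually created on 2008-04-01.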
121
HAVING Antipattern
122
HAVING Antipattern
not
allowed!
123
HAVING Antipattern
2. SELECT expressions
3. Define column aliases
4. GROUP BY
5. HAVING expressions
6. ORDER BY
Postponing row filtering causes the SELECT expressions to execute for all rows
131
Poor Man’s Search Engine
135
Poor Man’s Search Engine
• Or regular expressions
SELECT * FROM bugs
WHERE description RLIKE 'crash'
137
Poor Man’s Search Engine
• Performance issues
• Indexes don’t benefit from substring searches
138
Poor Man’s Search Engine
• Accuracy issues
• Substrings find irrelevant or false matches
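A small illustration of the accuracy problem, assuming SQLite from Python and made-up bug descriptions. The word-boundary regular expression is used here only as a yardstick for what a word-aware search would return; it is not a replacement for a full-text engine.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, description TEXT)")
# Hypothetical descriptions: only the first contains the word "one".
conn.executemany("INSERT INTO bugs VALUES (?, ?)", [
    (1, "only one user can log in"),
    (2, "money field rounds incorrectly"),
    (3, "lonely widget has no parent"),
])

# Substring search: matches "one" inside "money" and "lonely" as well.
substring_hits = [r[0] for r in conn.execute(
    "SELECT bug_id FROM bugs WHERE description LIKE '%one%' ORDER BY bug_id")]

# Whole-word matching, done in the application for comparison.
word = re.compile(r"\bone\b")
word_hits = [bug_id for bug_id, text in conn.execute(
    "SELECT bug_id, description FROM bugs ORDER BY bug_id") if word.search(text)]
```

The LIKE pattern returns all three rows, two of which are false matches.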
141
Poor Man’s Search Engine
• http://lucene.apache.org/
• http://www.sphinxsearch.com/
142
Implicit Columns
143
Implicit Columns
• Reposition a column
• Rename a column
145
Application Antipatterns
150
User-Supplied SQL
151
User-supplied SQL
• Antipattern:
• Let users write their own SQL expressions
153
SQL Injection
157
Parameter Façade
163
Parameter Façade
• Antipatterns:
• Trying to use parameters to change syntax
165
Parameter Façade
[Parse tree: SELECT, expr-list *, bug_id = parameter placeholder]
173
Parameter Façade
• Runs query
could invalidate
optimization plan
174
Parameter Façade
[Parse tree: SELECT, expr-list *, bug_id = 1234 (supplied value)]
175
Parameter Façade
[Parse tree: SELECT, expr-list *, bug_id = "1234 or 1=1" (supplied value)]
176
Parameter Façade
[Parse tree: SELECT, expr-list *, bug_id = 1234 (intended value)]
178
Parameter Façade
[Parse tree: SELECT, expr-list *, WHERE expr: bug_id = 1234 OR 1 = 1 (SQL injection)]
179
Parameter Façade
must supply
exactly four values
181
Parameter Façade
Scenario: single value ('1234')
  Interpolation: SELECT * FROM bugs WHERE bug_id = $id
  Parameter:     SELECT * FROM bugs WHERE bug_id = ?
Scenario: multiple values ('1234, 3456, 5678')
  Interpolation: SELECT * FROM bugs WHERE bug_id IN ($list)
  Parameter:     SELECT * FROM bugs WHERE bug_id IN ( ?, ?, ? )
182
Parameter Façade
• Solution:
• Use parameters only for individual values
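One way to follow this rule for an IN list is to generate one placeholder per value and bind the values separately. A minimal sketch, assuming SQLite from Python; the helper name find_bugs and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany("INSERT INTO bugs VALUES (?, ?)",
                 [(1234, "a"), (3456, "b"), (5678, "c"), (9999, "d")])

def find_bugs(conn, bug_ids):
    """Look up bugs by id, binding each id to its own placeholder."""
    if not bug_ids:
        return []
    # Only the "?" marks are interpolated into the SQL text, never the values.
    placeholders = ", ".join("?" for _ in bug_ids)
    sql = f"SELECT bug_id FROM bugs WHERE bug_id IN ({placeholders}) ORDER BY bug_id"
    return [r[0] for r in conn.execute(sql, list(bug_ids))]

found = find_bugs(conn, [1234, 3456, 5678])
# A malicious string is bound as one value, so it cannot change the syntax.
injected = find_bugs(conn, ["1234 OR 1=1"])
```

The injection attempt matches no rows because the whole string is compared as a single value.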
183
Pseudokey Neat Freak
184
Pseudokey Neat Freak
• Objective: address the discomfort over
the presence of “gaps” in the primary key
185
Pseudokey Neat Freak
• Antipattern #1:
changing values to close the gaps
187
Pseudokey Neat Freak
• Reusing primary key values exposes data
1. Insert row with bug_id = 1234
2. Notify the user of bug_id 1234
3. Delete row with bug_id = 1234 while the user is away ("Zzz...")
4. The user looks up the bug: "no bug 1234?"
5. Re-insert a row with bug_id = 1234
6. The user looks again: "bug 1234 is not mine!"
195
Pseudokey Neat Freak
• Solution:
• Don’t re-use primary key values
• Key values won’t run out: a 64-bit key allows 1,000 rows per second 24x7 for 584 million years
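The 584-million-year figure is consistent with an unsigned 64-bit key. The slide does not name the key type, so that is an assumption here; the arithmetic is a quick check.

```python
ROWS_PER_SECOND = 1000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Assumption: the pseudokey is an unsigned 64-bit integer.
KEY_SPACE = 2 ** 64

# Whole years before the key space is exhausted at the stated insert rate.
years_to_exhaust = KEY_SPACE // (ROWS_PER_SECOND * SECONDS_PER_YEAR)
```

years_to_exhaust comes out a little under 585 million, so gaps in the sequence cost nothing that matters.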
196
Session Coupling
197
Session Coupling
• CGI applications
198
Session Coupling
• Antipattern: persistent connections
199
Session Coupling
• Scalability
• One database connection per Apache thread
Apache httpd → Database server
200
Session Coupling
• Uniform authentication
• All apps must use identical user & password
201
Session Coupling
202
Session Coupling
• Request #2:
mysteriously inherits connection character set,
uses Russian 8-bit encoding and collation
203
Session Coupling
• Request #2:
mysteriously inherits autocommit behavior,
ROLLBACK ineffective
204
Session Coupling
• Request #2:
LAST_INSERT_ID() returns bug_id
generated in previous request
205
Session Coupling
• Wishlist:
• Method to reset connection to default state
207
Phantom Side Effects
208
Phantom Side Effects
• External effects don’t obey ROLLBACK
1. Start transaction and INSERT
Insert row bug_id = 1234; notify of bug_id 1234
211
Phantom Side Effects
• External effects don’t obey ROLLBACK
2. ROLLBACK
Discard row; recipient: "I got email, but no row 1234?"
212
Phantom Side Effects
• External effects don’t obey transaction
isolation
1. Start transaction and INSERT
Insert row bug_id = 1234; notify of bug_id 1234
213
Phantom Side Effects
• External effects don’t obey transaction
isolation
2. Email is received before row is visible
Row pending commit; recipient: "I got email, but no row 1234?"
214
Phantom Side Effects
• Auditing/logging confusion
215
Phantom Side Effects
• Solution:
• Operate only on database in triggers, stored
procedures, database functions
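One way to keep a trigger inside the database is to have it queue the notification in a table instead of sending email directly. The sketch below assumes SQLite from Python; the pending_notifications table and the trigger name are hypothetical, and the step that actually sends mail after commit is left out.

```python
import sqlite3

# isolation_level=None lets the sketch issue BEGIN, ROLLBACK and COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, description TEXT)")
conn.execute("CREATE TABLE pending_notifications (bug_id INTEGER)")
# The trigger touches only the database, so it obeys the transaction.
conn.execute("""
    CREATE TRIGGER queue_notification AFTER INSERT ON bugs
    BEGIN
        INSERT INTO pending_notifications (bug_id) VALUES (NEW.bug_id);
    END""")

def pending():
    return [r[0] for r in conn.execute("SELECT bug_id FROM pending_notifications")]

# Rolled-back insert: the queued notification disappears with the row.
conn.execute("BEGIN")
conn.execute("INSERT INTO bugs VALUES (1234, 'crash on save')")
conn.execute("ROLLBACK")
after_rollback = pending()

# Committed insert: the notification becomes visible together with the row.
conn.execute("BEGIN")
conn.execute("INSERT INTO bugs VALUES (1234, 'crash on save')")
conn.execute("COMMIT")
after_commit = pending()
```

A separate job can then read pending_notifications and send the email, so nobody is notified about a row that was never committed.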
218
Thank You
Bill Karwin
220