© 2011–2013 PERCONA
Extensible Data Modeling with MySQL
Bill Karwin
Percona Live MySQL Conference & Expo
Percona Live 2013
“I need to add a new column, but I don’t want ALTER TABLE to lock the application for a long time.”
How MySQL Does ALTER TABLE
1.  Lock the table.
2.  Make a new, empty table like the original.
3.  Modify the columns of the new empty table.
4.  Copy all rows of data from the original to the new table, no matter how long it takes.
5.  Swap the old and new tables.
6.  Unlock the tables and drop the original.
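A minimal sketch of the same copy-based sequence, using Python's standard sqlite3 module with SQLite as a stand-in for MySQL (SQLite's locking and ALTER support differ, so this models the steps, not MySQL's code). The rating column and the Title_new/Title_old names are hypothetical. BEGIN EXCLUSIVE plays the part of step 1, and step 4 is the single INSERT ... SELECT that has to touch every row while the lock is held.

```python
import sqlite3

def copy_alter_add_rating(conn):
    """Lock-copy-swap ALTER sketch: add a hypothetical `rating` column to Title."""
    conn.executescript("""
        BEGIN EXCLUSIVE;                        -- 1. lock (SQLite locks the whole database)
        CREATE TABLE Title_new (                -- 2-3. new, empty table with the new column
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            rating TEXT);
        INSERT INTO Title_new (id, title)       -- 4. copy every row, however long it takes
            SELECT id, title FROM Title;
        ALTER TABLE Title RENAME TO Title_old;  -- 5. swap the old and new tables
        ALTER TABLE Title_new RENAME TO Title;
        DROP TABLE Title_old;                   -- 6. drop the original...
        COMMIT;                                 -- ...and release the lock
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO Title (title) VALUES (?)",
                 [("Goldfinger",), ("Trials and Tribble-ations",)])
conn.commit()

copy_alter_add_rating(conn)
print([row[1] for row in conn.execute("PRAGMA table_info(Title)")])
# ['id', 'title', 'rating']
```

Treat this only as a model of the algorithm on the slide: the cost grows with the number of rows copied in step 4, and nothing else can write to the table until step 6.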
Extensibility
•  How can we add new attributes without the pain of
schema changes?
–  Object-oriented modeling
–  Sparse columns
•  Especially to support user-defined attributes at
runtime or after deployment:
–  Content management systems
–  E-commerce frameworks
–  Games
Solutions
•  “Extra Columns”
•  Entity-Attribute-Value
•  Class Table Inheritance
•  Serialized LOB & Inverted Indexes
•  Online Schema Changes
•  Non-Relational Databases
EXTRA COLUMNS
Table with Fixed Columns
CREATE TABLE Title (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title text NOT NULL,
  imdb_index varchar(12) DEFAULT NULL,
  kind_id int(11) NOT NULL,
  production_year int(11) DEFAULT NULL,
  imdb_id int(11) DEFAULT NULL,
  phonetic_code varchar(5) DEFAULT NULL,
  episode_of_id int(11) DEFAULT NULL,
  season_nr int(11) DEFAULT NULL,
  episode_nr int(11) DEFAULT NULL,
  series_years varchar(49) DEFAULT NULL,
  title_crc32 int(10) unsigned DEFAULT NULL
);
Table with Extra Columns
CREATE TABLE Title (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title text NOT NULL,
  imdb_index varchar(12) DEFAULT NULL,
  kind_id int(11) NOT NULL,
  production_year int(11) DEFAULT NULL,
  imdb_id int(11) DEFAULT NULL,
  phonetic_code varchar(5) DEFAULT NULL,
  extra_data1 TEXT DEFAULT NULL,
  extra_data2 TEXT DEFAULT NULL,
  extra_data3 TEXT DEFAULT NULL,
  extra_data4 TEXT DEFAULT NULL,
  extra_data5 TEXT DEFAULT NULL,
  extra_data6 TEXT DEFAULT NULL
);
The extra_data columns are for whatever comes up that we didn’t think of at the start of the project.
Adding a New Attribute
UPDATE Title
  SET extra_data3 = 'PG-13'
  WHERE id = 207468;
Remember which column you used for each new attribute!
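The database keeps no record of which spare column means what, so in practice that mapping ends up in application code. A minimal sketch of such a registry, using SQLite through Python's sqlite3 module as a stand-in for MySQL; EXTRA_COLUMN_FOR, set_extra, and get_extra are hypothetical names, and the budget to extra_data4 assignment is invented for illustration.

```python
import sqlite3

# Hypothetical registry recording the "project decision" for each spare column.
EXTRA_COLUMN_FOR = {"rating": "extra_data3", "budget": "extra_data4"}

def set_extra(conn, title_id, attribute, value):
    column = EXTRA_COLUMN_FOR[attribute]  # KeyError: nobody has claimed a column yet
    # Column names cannot be bound as parameters; this one comes only from the registry.
    conn.execute(f"UPDATE Title SET {column} = ? WHERE id = ?", (value, title_id))

def get_extra(conn, title_id, attribute):
    column = EXTRA_COLUMN_FOR[attribute]
    row = conn.execute(f"SELECT {column} FROM Title WHERE id = ?", (title_id,)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Title (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL,
    extra_data1 TEXT, extra_data2 TEXT, extra_data3 TEXT,
    extra_data4 TEXT, extra_data5 TEXT, extra_data6 TEXT)""")
conn.execute("INSERT INTO Title (id, title) VALUES (207468, 'Goldfinger')")
set_extra(conn, 207468, "rating", "PG-13")
print(get_extra(conn, 207468, "rating"))  # PG-13
```

Every piece of code that touches these columns has to consult the same registry; nothing in the schema stops another component from writing something else into extra_data3.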
Pros and Cons
•  Good solution:
–  No ALTER TABLE necessary to use a column for a new
attribute—only a project decision is needed.
–  Related to Single Table Inheritance (STI)
http://martinfowler.com/eaaCatalog/singleTableInheritance.html
Pros and Cons
•  Bad solution:
–  If you run out of extra columns, then you’re back to
ALTER TABLE.
–  Anyone can put any data in the columns—you can’t
assume consistent usage on every row.
–  Columns lack descriptive names or the right data type.
ENTITY-ATTRIBUTE-VALUE
EAV
•  Store each attribute in a row instead of a column.
CREATE TABLE Attributes (
  entity INT NOT NULL,
  attribute VARCHAR(20) NOT NULL,
  value TEXT,
  FOREIGN KEY (entity)
    REFERENCES Title (id)
);
Example EAV Data
SELECT * FROM Attributes;
+--------+-----------------+---------------------+
| entity | attribute       | value               |
+--------+-----------------+---------------------+
| 207468 | title           | Goldfinger          |
| 207468 | production_year | 1964                |
| 207468 | rating          | 7.8                 |
| 207468 | length          | 110 min             |
+--------+-----------------+---------------------+
Adding a New Attribute
•  Simply use INSERT with a new attribute name.
INSERT INTO Attributes (entity, attribute, value)
  VALUES (207468, 'budget', '$3,000,000');
Query EAV as a Pivot
SELECT a.entity AS id,
  a.value AS title,
  y.value AS production_year,
  r.value AS rating,
  b.value AS budget
FROM Attributes AS a
JOIN Attributes AS y USING (entity)
JOIN Attributes AS r USING (entity)
JOIN Attributes AS b USING (entity)
WHERE a.attribute = 'title'
  AND y.attribute = 'production_year'
  AND r.attribute = 'rating'
  AND b.attribute = 'budget';
+--------+------------+-----------------+--------+------------+
| id     | title      | production_year | rating | budget     |
+--------+------------+-----------------+--------+------------+
| 207468 | Goldfinger | 1964            | 7.8    | $3,000,000 |
+--------+------------+-----------------+--------+------------+
Another join is required for each additional attribute.
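Below, a sketch that runs the slide's self-join pivot on SQLite via Python's sqlite3 module (written with ON instead of USING; the result is the same here), next to a technique not taken from the slides: conditional aggregation with MAX(CASE ...), which reads the table once with a GROUP BY instead of adding one join per attribute. Both versions still hard-code every attribute name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Attributes (entity INT NOT NULL, "
             "attribute VARCHAR(20) NOT NULL, value TEXT)")
conn.executemany("INSERT INTO Attributes VALUES (?, ?, ?)", [
    (207468, "title", "Goldfinger"),
    (207468, "production_year", "1964"),
    (207468, "rating", "7.8"),
    (207468, "length", "110 min"),
    (207468, "budget", "$3,000,000"),
])

# The slide's approach: one self-join per attribute.
JOIN_PIVOT = """
    SELECT a.entity AS id, a.value AS title, y.value AS production_year,
           r.value AS rating, b.value AS budget
    FROM Attributes AS a
    JOIN Attributes AS y ON y.entity = a.entity
    JOIN Attributes AS r ON r.entity = a.entity
    JOIN Attributes AS b ON b.entity = a.entity
    WHERE a.attribute = 'title' AND y.attribute = 'production_year'
      AND r.attribute = 'rating' AND b.attribute = 'budget'"""

# Alternative (not from the slides): one pass with conditional aggregation.
AGG_PIVOT = """
    SELECT entity AS id,
           MAX(CASE WHEN attribute = 'title' THEN value END) AS title,
           MAX(CASE WHEN attribute = 'production_year' THEN value END) AS production_year,
           MAX(CASE WHEN attribute = 'rating' THEN value END) AS rating,
           MAX(CASE WHEN attribute = 'budget' THEN value END) AS budget
    FROM Attributes
    GROUP BY entity"""

for sql in (JOIN_PIVOT, AGG_PIVOT):
    print(conn.execute(sql).fetchall())
# both print [(207468, 'Goldfinger', '1964', '7.8', '$3,000,000')]
```

One behavioral difference: the inner-join version drops an entity that lacks any one of the listed attributes, while the aggregation version returns it with NULL in that column.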
Sounds Simple Enough, But…
•  NOT NULL doesn’t work
•  FOREIGN KEY doesn’t work
•  UNIQUE KEY doesn’t work
•  Data types don’t work
•  Searches don’t scale
•  Indexes and storage are inefficient
Constraints Don’t Work
CREATE TABLE Attributes (
  entity INT NOT NULL,
  attribute VARCHAR(20) NOT NULL,
  value TEXT NOT NULL,
  FOREIGN KEY (entity)
    REFERENCES Title (id),
  FOREIGN KEY (value)
    REFERENCES Ratings (rating)
);
Constraints apply to all rows, not just to rows for a specific attribute.
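To see the problem concretely, here is a sketch on SQLite (via Python's sqlite3, with foreign-key enforcement switched on) that uses the slide's idea of a foreign key from value to a Ratings lookup table; the Ratings contents are hypothetical. The constraint accepts a rating, but it also rejects a perfectly valid production year, because it cannot be limited to rows where attribute = 'rating'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Title (id INTEGER PRIMARY KEY);
    CREATE TABLE Ratings (rating TEXT PRIMARY KEY);
    CREATE TABLE Attributes (
        entity INT NOT NULL REFERENCES Title (id),
        attribute VARCHAR(20) NOT NULL,
        value TEXT NOT NULL REFERENCES Ratings (rating));
    INSERT INTO Title VALUES (207468);
    INSERT INTO Ratings VALUES ('PG-13');
""")

def try_insert(attribute, value):
    try:
        conn.execute("INSERT INTO Attributes VALUES (207468, ?, ?)", (attribute, value))
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected ({err})"

print(try_insert("rating", "PG-13"))          # accepted
print(try_insert("production_year", "1964"))  # rejected (FOREIGN KEY constraint failed)
```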
Data Types Don’t Work
INSERT INTO Attributes (entity, attribute, value)
  VALUES (207468, 'budget', 'banana');
The database cannot prevent the application from storing nonsensical data.
Add Typed Value Columns?
CREATE TABLE Attributes (
  entity INT NOT NULL,
  attribute VARCHAR(20) NOT NULL,
  intvalue BIGINT,
  floatvalue FLOAT,
  textvalue TEXT,
  datevalue DATE,
  datetimevalue DATETIME,
  FOREIGN KEY (entity)
    REFERENCES Title (id)
);
Now my application needs to know which data type column to use for each attribute when inserting and querying.
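One way an application might carry that knowledge is a per-attribute type map that picks the column and checks the value before writing. This is a hypothetical sketch (ATTRIBUTE_TYPES, COLUMN_FOR_TYPE, set_attribute, and get_attribute are illustrative names) on SQLite through Python's sqlite3. It also shows the application, not the database, rejecting the 'banana' budget from the previous slide. The date columns are left unused to keep it short.

```python
import sqlite3

# Hypothetical application-side metadata: the typing the EAV table cannot express.
ATTRIBUTE_TYPES = {"title": str, "production_year": int, "rating": float, "budget": int}
COLUMN_FOR_TYPE = {str: "textvalue", int: "intvalue", float: "floatvalue"}

def set_attribute(conn, entity, attribute, value):
    expected = ATTRIBUTE_TYPES[attribute]
    if type(value) is not expected:  # exact match, so True is not accepted as an int
        raise TypeError(f"{attribute} expects {expected.__name__}, got {value!r}")
    column = COLUMN_FOR_TYPE[expected]  # safe to interpolate: chosen from a fixed mapping
    conn.execute(f"INSERT INTO Attributes (entity, attribute, {column}) VALUES (?, ?, ?)",
                 (entity, attribute, value))

def get_attribute(conn, entity, attribute):
    column = COLUMN_FOR_TYPE[ATTRIBUTE_TYPES[attribute]]
    row = conn.execute(f"SELECT {column} FROM Attributes WHERE entity = ? AND attribute = ?",
                       (entity, attribute)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Attributes (
    entity INT NOT NULL, attribute VARCHAR(20) NOT NULL,
    intvalue BIGINT, floatvalue FLOAT, textvalue TEXT,
    datevalue DATE, datetimevalue DATETIME)""")
set_attribute(conn, 207468, "budget", 3000000)
set_attribute(conn, 207468, "rating", 7.8)
print(get_attribute(conn, 207468, "budget"))  # 3000000
try:
    set_attribute(conn, 207468, "budget", "banana")
except TypeError as err:
    print(err)  # budget expects int, got 'banana'
```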
Searches Don’t Scale
•  You must hard-code each attribute name:
–  One JOIN per attribute!
•  Alternatively, you can query all attributes, but the result is one attribute per row:
SELECT attribute, value
  FROM Attributes
  WHERE entity = 207468;
–  …and sort it out in your application code.
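A minimal sketch of that application-side step, using SQLite via Python's sqlite3: the one-row-per-attribute result is folded back into a dictionary. load_entity is a hypothetical helper, and converting production_year to an int is an illustrative assumption; the point is that restoring data types also falls to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Attributes (entity INT NOT NULL, "
             "attribute VARCHAR(20) NOT NULL, value TEXT)")
conn.executemany("INSERT INTO Attributes VALUES (?, ?, ?)", [
    (207468, "title", "Goldfinger"),
    (207468, "production_year", "1964"),
    (207468, "rating", "7.8"),
    (207468, "length", "110 min"),
])

def load_entity(conn, entity):
    """Fold one-attribute-per-row results back into a single record (hypothetical helper)."""
    rows = conn.execute("SELECT attribute, value FROM Attributes WHERE entity = ?", (entity,))
    record = dict(rows)  # each row is an (attribute, value) pair
    # Every value comes back as text; converting types is also the application's job.
    if "production_year" in record:
        record["production_year"] = int(record["production_year"])
    return record

print(load_entity(conn, 207468))
```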
Indexes and Storage Are Inefficient
•  Many rows, with few distinct attribute names.
–  Poor index cardinality.
•  The entity and attribute columns use extra
space for every attribute of every “row.”
–  In a conventional table, the entity is the primary key, so
it’s stored only once per row.
–  The attribute name is in the table definition, so it’s
stored only once per table.
Pros and Cons
•  Good solution:
–  No ALTER TABLE needed again—ever!
–  Supports ultimate flexibility: potentially, any “row” can have its own distinct set of attributes.
Pros and Cons
•  Bad solution:
–  SQL operations become more complex.
–  Lots of application code required to reinvent features
that an RDBMS already provides.
–  Doesn’t scale well—pivots required.
CLASS TABLE INHERITANCE
Subtypes
•  Titles include:
–  Films
–  TV shows
–  TV episodes
–  Video games
•  Some attributes apply to all subtypes; others apply to only one subtype.
Title Table
CREATE TABLE Title (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title text NOT NULL,
  imdb_index varchar(12) DEFAULT NULL,
  kind_id int(11) NOT NULL,
  production_year int(11) DEFAULT NULL,
  imdb_id int(11) DEFAULT NULL,
  phonetic_code varchar(5) DEFAULT NULL,
  episode_of_id int(11) DEFAULT NULL,
  season_nr int(11) DEFAULT NULL,
  episode_nr int(11) DEFAULT NULL,
  series_years varchar(49) DEFAULT NULL,
  title_crc32 int(10) unsigned DEFAULT NULL
);
The columns episode_of_id, season_nr, episode_nr, and series_years are used only for TV shows.
Title Table with Subtype Tables
CREATE TABLE Title (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title text NOT NULL,
  imdb_index varchar(12) DEFAULT NULL,
  kind_id int(11) NOT NULL,
  production_year int(11) DEFAULT NULL,
  imdb_id int(11) DEFAULT NULL,
  phonetic_code varchar(5) DEFAULT NULL,
  title_crc32 int(10) unsigned DEFAULT NULL
);

CREATE TABLE Film (
  id int(11) NOT NULL PRIMARY KEY,
  aspect_ratio varchar(20),
  FOREIGN KEY (id) REFERENCES Title(id)
);

CREATE TABLE TVShow (
  id int(11) NOT NULL PRIMARY KEY,
  episode_of_id int(11) DEFAULT NULL,
  season_nr int(11) DEFAULT NULL,
  episode_nr int(11) DEFAULT NULL,
  series_years varchar(49) DEFAULT NULL,
  FOREIGN KEY (id) REFERENCES Title(id)
);
(Diagram: Title has a 1:1 relationship with Film and a 1:1 relationship with TVShow.)
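The following sketch exercises the Film and TVShow layout on SQLite through Python's sqlite3, with the column lists trimmed to a few fields. The add_title helper, the kind_id values, and the "Example Film" row are hypothetical. It shows two of the costs listed under the cons of this design: each entry needs two INSERTs (wrapped in one transaction here), and listing all titles with their subtype attributes needs one LEFT JOIN per subtype table.

```python
import sqlite3

KIND_FILM, KIND_EPISODE = 1, 7  # hypothetical kind_id values

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Title (id INTEGER PRIMARY KEY, title TEXT NOT NULL, kind_id INT NOT NULL);
    CREATE TABLE Film (id INT NOT NULL PRIMARY KEY REFERENCES Title (id),
                       aspect_ratio VARCHAR(20));
    CREATE TABLE TVShow (id INT NOT NULL PRIMARY KEY REFERENCES Title (id),
                         episode_of_id INT, season_nr INT, episode_nr INT,
                         series_years VARCHAR(49));
""")

def add_title(conn, title, kind_id, subtype_table, **subtype_columns):
    """Two INSERTs per entry: the common Title row, then the subtype row sharing its id."""
    with conn:  # commit both rows together, or roll both back
        title_id = conn.execute("INSERT INTO Title (title, kind_id) VALUES (?, ?)",
                                (title, kind_id)).lastrowid
        # Table and column names are trusted input in this sketch.
        names = ", ".join(["id", *subtype_columns])
        marks = ", ".join("?" * (len(subtype_columns) + 1))
        conn.execute(f"INSERT INTO {subtype_table} ({names}) VALUES ({marks})",
                     (title_id, *subtype_columns.values()))
    return title_id

add_title(conn, "Example Film", KIND_FILM, "Film", aspect_ratio="1.85 : 1")
add_title(conn, "Trials and Tribble-ations", KIND_EPISODE, "TVShow",
          episode_of_id=1291895, season_nr=5, episode_nr=6)

# Subtype attributes for every title: one LEFT JOIN per subtype table.
ALL_TITLES = """
    SELECT t.id, t.title, f.aspect_ratio, s.season_nr, s.episode_nr
    FROM Title AS t
    LEFT JOIN Film AS f ON f.id = t.id
    LEFT JOIN TVShow AS s ON s.id = t.id
    ORDER BY t.id"""
for row in conn.execute(ALL_TITLES):
    print(row)
# (1, 'Example Film', '1.85 : 1', None, None)
# (2, 'Trials and Tribble-ations', None, 5, 6)
```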
Adding a New Subtype
•  Create a new table—without locking existing tables.
CREATE TABLE VideoGames (
  id int(11) NOT NULL PRIMARY KEY,
  platforms varchar(100) NOT NULL,
  FOREIGN KEY (id)
    REFERENCES Title(id)
);
Pros and Cons
•  Good solution:
–  Best to support a finite set of subtypes, which are likely
unchanging after creation.
–  Data types and constraints work normally.
–  Easy to create or drop subtype tables.
–  Easy to query attributes common to all subtypes.
–  Subtype tables are shorter, indexes are smaller.
Pros and Cons
•  Bad solution:
–  Adding one entry takes two INSERT statements.
–  Querying attributes of subtypes requires a join.
–  Querying all types with subtype attributes requires
multiple joins (as many as subtypes).
–  Adding a common attribute locks a large table.
–  Adding an attribute to a populated subtype locks a
smaller table.
SERIALIZED LOB
What is Serializing?
•  Objects in your applications can be represented in
serialized form—i.e., convert the object to a scalar
string that you can save and load back as an object.
–  Java objects implementing Serializable and processed with writeObject()
–  PHP variables processed with serialize()
–  Python objects processed with pickle.dump()
–  Data encoded with XML, JSON, YAML, etc.
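In Python, for example, the standard json and pickle modules produce such scalars. Note that pickle.dump(), named in the list above, writes to a file object, while pickle.dumps(), used below, returns bytes; and pickled data should only be loaded from trusted sources, since unpickling can run arbitrary code. The episode dictionary is sample data matching values used later in this deck.

```python
import json
import pickle

episode = {"episode_of_id": 1291895, "season_nr": 5, "episode_nr": 6}

as_json = json.dumps(episode)      # str: fits a TEXT column
as_pickle = pickle.dumps(episode)  # bytes: fits a BLOB column

print(as_json)  # {"episode_of_id": 1291895, "season_nr": 5, "episode_nr": 6}

# Loading the scalar back reconstructs an equal object.
assert json.loads(as_json) == episode
assert pickle.loads(as_pickle) == episode
```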
What Is a LOB?
•  The BLOB or TEXT datatypes can store long
sequences of bytes or characters, such as a string.
•  You can store the string representing your object
into a single BLOB or TEXT column.
–  You don’t need to define SQL columns for each field of
your object.
Title Table with Serialized LOB
CREATE TABLE Title (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title text NOT NULL,
  imdb_index varchar(12) DEFAULT NULL,
  kind_id int(11) NOT NULL,
  production_year int(11) DEFAULT NULL,
  imdb_id int(11) DEFAULT NULL,
  phonetic_code varchar(5) DEFAULT NULL,
  title_crc32 int(10) unsigned DEFAULT NULL,
  extra_info TEXT
);
extra_info holds everything else, plus anything we didn’t think of.
Adding a New Attribute
UPDATE Title
  SET extra_info =
    '{
      "episode_of_id": 1291895,
      "season_nr": 5,
      "episode_nr": 6
    }'
  WHERE id = 1292057;
JSON example
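This UPDATE replaces the entire extra_info value, so changing a single field means reading the whole LOB, editing it, and writing it all back. A sketch of that read-modify-write cycle on SQLite via Python's sqlite3 and json; set_extra_field is a hypothetical helper. With concurrent writers, the read and the write would need to happen under one lock (for example SELECT ... FOR UPDATE inside an InnoDB transaction), which this single-connection sketch omits.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (id INTEGER PRIMARY KEY, title TEXT NOT NULL, extra_info TEXT)")
conn.execute("INSERT INTO Title (id, title) VALUES (1292057, 'Trials and Tribble-ations')")
conn.commit()

def set_extra_field(conn, title_id, field, value):
    """Hypothetical helper: changing one field still rewrites the whole LOB."""
    row = conn.execute("SELECT extra_info FROM Title WHERE id = ?", (title_id,)).fetchone()
    info = json.loads(row[0]) if row and row[0] else {}  # read and parse everything
    info[field] = value                                    # change one field
    with conn:                                             # write everything back
        conn.execute("UPDATE Title SET extra_info = ? WHERE id = ?",
                     (json.dumps(info), title_id))

set_extra_field(conn, 1292057, "episode_of_id", 1291895)
set_extra_field(conn, 1292057, "season_nr", 5)
set_extra_field(conn, 1292057, "episode_nr", 6)
print(conn.execute("SELECT extra_info FROM Title WHERE id = 1292057").fetchone()[0])
# {"episode_of_id": 1291895, "season_nr": 5, "episode_nr": 6}
```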
Using XML in MySQL
•  MySQL has limited support for XML (this example assumes extra_info holds XML rather than JSON).
SELECT id, title,
  ExtractValue(extra_info, '/episode_nr')
    AS episode_nr
FROM Title
WHERE ExtractValue(extra_info,
  '/episode_of_id') = 1291895;
•  Forces table scans; indexes cannot be used.
http://dev.mysql.com/doc/refman/5.6/en/xml-functions.html
Dynamic Columns in MariaDB
CREATE TABLE Title (
  id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title text NOT NULL,
  ...
  extra_info BLOB
);
INSERT INTO Title (title, extra_info)
  VALUES ('Trials and Tribble-ations',
    COLUMN_CREATE('episode_of_id', '1291895',
      'season_nr', '5',
      'episode_nr', '6'));
https://kb.askmonty.org/en/dynamic-columns/
Pros and Cons
•  Good solution:
–  Store any object and add new custom fields at any time.
–  No need to do ALTER TABLE to add custom fields.
Pros and Cons
•  Bad solution:
–  Not indexable.
–  Must return the whole object, not an individual field.
–  Must write the whole object to update a single field.
–  Hard to use a custom field in a WHERE clause, GROUP BY
or ORDER BY.
–  No support in the database for data types or
constraints, e.g. NOT NULL, UNIQUE, FOREIGN KEY.
INVERTED INDEXES
Use This with Serialized LOB
•  Helps to mitigate some of the weaknesses.
How This Works
•  Create a new table for each field of the LOB that
you want to address individually:
CREATE TABLE Title_EpisodeOf (
  episode_of_id INT NOT NULL,
  id INT NOT NULL,
  PRIMARY KEY (episode_of_id, id),
  FOREIGN KEY (id)
    REFERENCES Title (id)
);
Here’s where you get the index support.
How This Works
•  For each LOB containing an “episode_of_id” field, insert a row into the attribute table with its value.
INSERT INTO Title_EpisodeOf (episode_of_id, id)
  VALUES (1291895, 1292057);
•  If another title doesn’t have this field, then you don’t create a referencing row.
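Keeping the attribute table in sync is the application's job unless you use triggers. A hedged sketch of doing it manually on SQLite through Python's sqlite3: the hypothetical save_title helper rewrites the LOB and, in the same transaction, replaces that title's Title_EpisodeOf row, creating one only when the LOB has an episode_of_id field.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Title (id INTEGER PRIMARY KEY, title TEXT NOT NULL, extra_info TEXT);
    CREATE TABLE Title_EpisodeOf (
        episode_of_id INT NOT NULL,
        id INT NOT NULL REFERENCES Title (id),
        PRIMARY KEY (episode_of_id, id));
""")

def save_title(conn, title_id, title, extra):
    """Hypothetical helper: write the LOB and keep Title_EpisodeOf in sync with it."""
    with conn:
        # Drop any stale index row first, so replacing the Title row never orphans one.
        conn.execute("DELETE FROM Title_EpisodeOf WHERE id = ?", (title_id,))
        conn.execute("INSERT OR REPLACE INTO Title (id, title, extra_info) VALUES (?, ?, ?)",
                     (title_id, title, json.dumps(extra)))
        if "episode_of_id" in extra:  # no field in the LOB, no index row
            conn.execute("INSERT INTO Title_EpisodeOf (episode_of_id, id) VALUES (?, ?)",
                         (extra["episode_of_id"], title_id))

save_title(conn, 1292057, "Trials and Tribble-ations",
           {"episode_of_id": 1291895, "season_nr": 5, "episode_nr": 6})
save_title(conn, 207468, "Goldfinger", {"rating": "PG-13"})
print(conn.execute("SELECT * FROM Title_EpisodeOf").fetchall())  # [(1291895, 1292057)]
```

Saving a title again without the field removes its index row, which is the same bookkeeping a trigger-based design would have to perform.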
Query for Episodes of a Series
SELECT t.*
  FROM Title_EpisodeOf AS e
  JOIN Title AS t USING (id)
  WHERE e.episode_of_id = 1291895;
This is a lookup on the leading column of the primary key (episode_of_id). It matches only titles that have such a field and whose value matches the condition.
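The same query can be tried on SQLite from Python's sqlite3 module. In this sketch the Goldfinger row has no episode_of_id field and therefore no index row, so only the episode comes back. The EXPLAIN QUERY PLAN loop lets you check that the lookup goes through the Title_EpisodeOf primary key; the plan's wording varies across SQLite versions, so no output is shown for it.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Title (id INTEGER PRIMARY KEY, title TEXT NOT NULL, extra_info TEXT);
    CREATE TABLE Title_EpisodeOf (
        episode_of_id INT NOT NULL,
        id INT NOT NULL REFERENCES Title (id),
        PRIMARY KEY (episode_of_id, id));
""")
episode = {"episode_of_id": 1291895, "season_nr": 5, "episode_nr": 6}
conn.execute("INSERT INTO Title VALUES (1292057, 'Trials and Tribble-ations', ?)",
             (json.dumps(episode),))
conn.execute("INSERT INTO Title VALUES (207468, 'Goldfinger', NULL)")  # no such field
conn.execute("INSERT INTO Title_EpisodeOf VALUES (1291895, 1292057)")

EPISODES_OF = """
    SELECT t.id, t.title, t.extra_info
    FROM Title_EpisodeOf AS e
    JOIN Title AS t ON t.id = e.id
    WHERE e.episode_of_id = ?"""

for title_id, title, extra_info in conn.execute(EPISODES_OF, (1291895,)):
    print(title_id, title, json.loads(extra_info)["episode_nr"])
# 1292057 Trials and Tribble-ations 6

# Ask SQLite how it runs the query.
for step in conn.execute("EXPLAIN QUERY PLAN" + EPISODES_OF, (1291895,)):
    print(step)
```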
Pros and Cons
•  Good solution:
–  Preserves the advantage of Serialized LOB.
–  Adds support for SQL data types, and UNIQUE and
FOREIGN KEY constraints.
–  You can index any custom field—without locking the
master table.
Pros and Cons
•  Bad solution:
–  Redundant storage.
–  It’s up to you to keep attribute tables in sync manually
(or with triggers).
–  Requires JOIN to fetch the master row.
–  You must plan which columns you want to be indexed
(but this is true of conventional columns too).
–  Still no support for NOT NULL constraint.
ONLINE SCHEMA CHANGES
pt-online-schema-change
•  Performs online, non-blocking ALTER TABLE.
–  Captures concurrent updates to a table while
restructuring.
–  Some risks and caveats exist; please read the manual
and test carefully.
•  Free tool—part of Percona Toolkit.
–  http://www.percona.com/doc/percona-toolkit/pt-online-schema-change.html
How MySQL Does ALTER TABLE
1.  Lock the table.
2.  Make a new, empty table like the original.
3.  Modify the columns of the new empty table.
4.  Copy all rows of data from the original to the new table.
5.  Swap the old and new tables.
6.  Unlock the tables and drop the original.
How pt-osc Does ALTER TABLE
(No lock is held on the original table while rows are copied.)
1.  Make a new, empty table like the original.
2.  Modify the columns of the new empty table.
3.  Copy all rows of data from the original to the new table.
a.  Iterate over the table in chunks, in primary key order.
b.  Use triggers to capture ongoing changes in the original, and apply them to the new table.
4.  Swap the tables, then drop the original.
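The chunk-and-trigger idea can be modelled in a few dozen lines. This sketch uses SQLite through Python's sqlite3 and is not pt-online-schema-change's code: the cast_info columns (id, note), the osc_* trigger names, and the DEFAULT 0 on source (SQLite will not add a NOT NULL column without one) are assumptions. A callback runs after the first chunk to imitate application writes arriving during the copy.

```python
import sqlite3

def online_add_column(conn, chunk_size=3, after_first_chunk=None):
    """Hypothetical pt-osc-style ALTER: add `source` to cast_info without a long lock."""
    # 1-2. New, empty table with the altered structure.
    conn.execute("""CREATE TABLE _cast_info_new (
        id INTEGER PRIMARY KEY, note TEXT, source INT NOT NULL DEFAULT 0)""")
    # 3b. Triggers forward ongoing changes on the original to the new table.
    conn.executescript("""
        CREATE TRIGGER osc_ins AFTER INSERT ON cast_info BEGIN
            INSERT OR REPLACE INTO _cast_info_new (id, note) VALUES (NEW.id, NEW.note);
        END;
        CREATE TRIGGER osc_upd AFTER UPDATE ON cast_info BEGIN
            DELETE FROM _cast_info_new WHERE id = OLD.id;
            INSERT OR REPLACE INTO _cast_info_new (id, note) VALUES (NEW.id, NEW.note);
        END;
        CREATE TRIGGER osc_del AFTER DELETE ON cast_info BEGIN
            DELETE FROM _cast_info_new WHERE id = OLD.id;
        END;
    """)
    # 3a. Copy in primary-key chunks; IGNORE keeps rows the triggers already wrote.
    last_id, chunks = 0, 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM cast_info WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size))]
        if not ids:
            break
        conn.execute("""INSERT OR IGNORE INTO _cast_info_new (id, note)
                        SELECT id, note FROM cast_info WHERE id > ? AND id <= ?""",
                     (last_id, ids[-1]))
        conn.commit()  # each chunk is its own short transaction
        last_id, chunks = ids[-1], chunks + 1
        if chunks == 1 and after_first_chunk:
            after_first_chunk(conn)  # simulate application writes mid-copy
    # 4. Swap and clean up in one transaction (pt-osc drops its triggers after the swap).
    conn.executescript("""
        BEGIN;
        DROP TRIGGER osc_ins;
        DROP TRIGGER osc_upd;
        DROP TRIGGER osc_del;
        ALTER TABLE cast_info RENAME TO _cast_info_old;
        ALTER TABLE _cast_info_new RENAME TO cast_info;
        DROP TABLE _cast_info_old;
        COMMIT;
    """)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cast_info (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO cast_info (id, note) VALUES (?, ?)",
                 [(i, f"note {i}") for i in range(1, 11)])
conn.commit()

def app_writes(c):
    c.execute("UPDATE cast_info SET note = 'edited' WHERE id = 1")  # already copied
    c.execute("DELETE FROM cast_info WHERE id = 9")                 # not yet copied
    c.execute("INSERT INTO cast_info (id, note) VALUES (11, 'late')")

online_add_column(conn, chunk_size=3, after_first_chunk=app_writes)
print(conn.execute("SELECT id, note, source FROM cast_info "
                   "WHERE id IN (1, 11) ORDER BY id").fetchall())
# [(1, 'edited', 0), (11, 'late', 0)]
```

The update to an already-copied row reaches the new table through the trigger, the deleted row is never copied, and the late insert is kept rather than duplicated because the chunk copy uses INSERT OR IGNORE.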
Visualize This (1-6)
(Diagram sequence: rows are copied from cast_info into cast_info new while an after trigger forwards ongoing changes; in the final panel the new table has become cast_info and cast_info old is dropped.)
Adding a New Attribute
•  Design the ALTER TABLE statement, but don’t execute it yet.
mysql> ALTER TABLE cast_info
  ADD COLUMN source INT NOT NULL;
•  Equivalent pt-online-schema-change command:
$ pt-online-schema-change \
  h=localhost,D=imdb,t=cast_info \
  --alter "ADD COLUMN source INT NOT NULL"
Execute
$ pt-online-schema-change h=localhost,D=imdb,t=cast_info \
  --alter "ADD COLUMN source INT NOT NULL" --execute

Altering `imdb`.`cast_info`...
Creating new table...
Created new table imdb._cast_info_new OK.
Altering new table...
Altered `imdb`.`_cast_info_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 22545051 rows...
Copying `imdb`.`cast_info`: 10% 04:05 remain
Copying `imdb`.`cast_info`: 19% 04:07 remain
Copying `imdb`.`cast_info`: 28% 03:44 remain
Copying `imdb`.`cast_info`: 37% 03:16 remain
Copying `imdb`.`cast_info`: 47% 02:47 remain
Copying `imdb`.`cast_info`: 56% 02:18 remain
Copying `imdb`.`cast_info`: 64% 01:53 remain
Copying `imdb`.`cast_info`: 73% 01:28 remain
Copying `imdb`.`cast_info`: 82% 00:55 remain
Copying `imdb`.`cast_info`: 91% 00:26 remain
Copied rows OK.
Swapping tables...
Swapped original and new tables OK.
Dropping old table...
Dropped old table `imdb`.`_cast_info_old` OK.
Dropping triggers...
Dropped triggers OK.
Successfully altered `imdb`.`cast_info`.
Self-Adjusting
•  Copies rows in chunks which the tool sizes
dynamically.
•  The tool throttles back if it increases load too much
or if it causes any replication slaves to lag.
•  The tool tries to set its lock timeouts to let
applications be more likely to succeed.
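The slides don't say how the chunk size is chosen. One simple policy, sketched below as a hypothetical helper, is to measure how long the last chunk took and scale the next one toward a target duration. pt-online-schema-change exposes such a target as its --chunk-time option (0.5 seconds by default); its actual adjustment algorithm is not reproduced here.

```python
def next_chunk_size(rows_copied, seconds_taken, target_seconds=0.5,
                    min_rows=1, max_rows=100_000):
    """Hypothetical sketch: size the next chunk so it should take about target_seconds."""
    if seconds_taken <= 0:
        return min(max_rows, rows_copied * 2)  # too fast to measure: just grow
    rows_per_second = rows_copied / seconds_taken
    return max(min_rows, min(max_rows, int(rows_per_second * target_seconds)))

print(next_chunk_size(1000, 1.0))   # 500: the last chunk was too slow, so shrink
print(next_chunk_size(1000, 0.25))  # 2000: the last chunk was fast, so grow
```

The clamps keep a single slow or suspiciously fast measurement from producing an empty or enormous chunk.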
Why Shouldn’t I Use This?
•  Is your table small enough that ALTER is already
quick enough?
•  Is your change already very quick, for example
DROP KEY in InnoDB?
•  Will pt-online-schema-change take too long or
increase the load too much?
Pros and Cons
•  Good solution:
–  ALTER TABLE to add conventional columns without
the pain of locking.
Pros and Cons
•  Bad solution:
–  Can take up to 4× more time than ALTER TABLE.
–  Table must have a PRIMARY KEY (or a unique index).
–  Table must not have triggers.
–  No need if your table is small and ALTER TABLE
already runs quickly enough.
–  No need for some ALTER TABLE operations that don’t
restructure the table (e.g. dropping indexes, adding
comments).
NON-RELATIONAL DATABASES
No Rules to Break
•  To be relational, a table must have a fixed set of
columns on every row.
•  No such rule exists in a non-relational model; you
can store a distinct set of fields per record.
•  Having no fixed schema makes NoSQL more flexible.
Adding a New Attribute
•  Document-oriented databases are designed to
support defining distinct attributes per document.
•  But you lose advantages of relational databases:
–  Data types
–  Constraints
–  Uniform structure of records
SUMMARY
Summary
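+----------------+-----------+----------+--------+--------+---------+------------+-------------+
| Solution       | Lock-free | Flexible | Select | Filter | Indexed | Data Types | Constraints |
+----------------+-----------+----------+--------+--------+---------+------------+-------------+
| Extra Columns  | no*       | no       | yes    | yes    | yes*    | no         | no          |
| EAV            | yes       | yes      | yes*   | yes    | yes*    | no*        | no          |
| CTI            | no*       | no       | yes    | yes    | yes     | yes        | yes         |
| LOB            | yes       | yes      | no     | no     | no      | no         | no          |
| Inverted Index | yes       | yes      | yes    | yes    | yes     | yes        | yes         |
| OSC            | yes       | no       | yes    | yes    | yes     | yes        | yes         |
| NoSQL          | yes       | yes      | yes    | yes    | yes     | no*        | no          |
+----------------+-----------+----------+--------+--------+---------+------------+-------------+
* Conditions or exceptions apply.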
Senior Industry Experts	

In-Person and Online Classes	

Custom Onsite Training	

http://percona.com/training
“Rate This Session”
http://www.percona.com/live/mysql-conference-2013/sessions/extensible-data-modeling-mysql
SQL Antipatterns: Avoiding the Pitfalls of Database Programming
by Bill Karwin
Available in print, epub, mobi, pdf. Delivery options for Kindle or Dropbox.
http://pragprog.com/book/bksqla/
Extensible Data Modeling

More Related Content

Extensible Data Modeling

  • 1. ©  2011  –  2013  PERCONA   Extensible Data Modeling with MySQL Bill Karwin Percona Live MySQL Conference & ExpoBill Karwin Percona Live 2013
  • 2. ©  2011  –  2013  PERCONA   “I need to add a new column— but I don’t want ALTER TABLE to lock the application for a long time.”
  • 3. ©  2011  –  2013  PERCONA   How MySQL Does ALTER TABLE 1.  Lock the table. 2.  Make a new, empty the table like the original. 3.  Modify the columns of the new empty table. 4.  Copy all rows of data from original to new table… no matter how long it takes. 5.  Swap the old and new tables. 6.  Unlock the tables drop the original.
  • 4. ©  2011  –  2013  PERCONA   Extensibility •  How can we add new attributes without the pain of schema changes? –  Object-oriented modeling –  Sparse columns •  Especially to support user-defined attributes at runtime or after deployment: –  Content management systems –  E-commerce frameworks –  Games
  • 5. ©  2011  –  2013  PERCONA   Solutions •  “Extra Columns” •  Entity-Attribute-Value •  Class Table Inheritance •  Serialized LOB Inverted Indexes •  Online Schema Changes •  Non-Relational Databases
  • 6. ©  2011  –  2013  PERCONA   EXTRA COLUMNS
  • 7. ©  2011  –  2013  PERCONA   Table with Fixed Columns CREATE TABLE Title (! id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,! title text NOT NULL,! imdb_index varchar(12) DEFAULT NULL,! kind_id int(11) NOT NULL,! production_year int(11) DEFAULT NULL,! imdb_id int(11) DEFAULT NULL,! phonetic_code varchar(5) DEFAULT NULL,! episode_of_id int(11) DEFAULT NULL,! season_nr int(11) DEFAULT NULL,! episode_nr int(11) DEFAULT NULL,! series_years varchar(49) DEFAULT NULL,! title_crc32 int(10) unsigned DEFAULT NULL! );!
  • 8. ©  2011  –  2013  PERCONA   Table with Extra Columns CREATE TABLE Title (! id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,! title text NOT NULL,! imdb_index varchar(12) DEFAULT NULL,! kind_id int(11) NOT NULL,! production_year int(11) DEFAULT NULL,! imdb_id int(11) DEFAULT NULL,! phonetic_code varchar(5) DEFAULT NULL,! extra_data1 TEXT DEFAULT NULL,! extra_data2 TEXT DEFAULT NULL, ! extra_data3 TEXT DEFAULT NULL,! extra_data4 TEXT DEFAULT NULL,! extra_data5 TEXT DEFAULT NULL,! extra_data6 TEXT DEFAULT NULL,! );! use for whatever comes up that we didn’t think of at the start of the project
  • 9. ©  2011  –  2013  PERCONA   Adding a New Attribute UPDATE Title 
 SET extra_data3 = 'PG-13'
 WHERE id = 207468;! remember which column you used for each new attribute!
  • 10. ©  2011  –  2013  PERCONA   Pros and Cons •  Good solution: –  No ALTER TABLE necessary to use a column for a new attribute—only a project decision is needed. –  Related to Single Table Inheritance (STI) http://martinfowler.com/eaaCatalog/singleTableInheritance.html
  • 11. ©  2011  –  2013  PERCONA   Pros and Cons •  Bad solution: –  If you run out of extra columns, then you’re back to ALTER TABLE. –  Anyone can put any data in the columns—you can’t assume consistent usage on every row. –  Columns lack descriptive names or the right data type.
  • 12. ©  2011  –  2013  PERCONA   ENTITY-ATTRIBUTE-VALUE
  • 13. ©  2011  –  2013  PERCONA   EAV •  Store each attribute in a row instead of a column. CREATE TABLE Attributes (
 entity INT NOT NULL,
 attribute VARCHAR(20) NOT NULL,
 value TEXT,
 FOREIGN KEY (entity) 
 REFERENCES Title (id)
 );!
  • 14. ©  2011  –  2013  PERCONA   Example EAV Data SELECT * FROM Attributes;! +--------+-----------------+---------------------+! | entity | attribute | value |! +--------+-----------------+---------------------+! | 207468 | title | Goldfinger |! | 207468 | production_year | 1964 |! | 207468 | rating | 7.8 |! | 207468 | length | 110 min |! +--------+-----------------+---------------------+! ! ! !
  • 15. ©  2011  –  2013  PERCONA   Adding a New Attribute •  Simply use INSERT with a new attribute name. INSERT INTO Attributes (entity, attribute, value)
 VALUES (207468, 'budget', '$3,000,000');!
  • 16. ©  2011  –  2013  PERCONA   Query EAV as a Pivot SELECT a.entity AS id,! a.value AS title,! y.value AS production_year,! r.value AS rating,
 b.value AS budget! FROM Attributes AS a! JOIN Attributes AS y USING (entity)! JOIN Attributes AS r USING (entity)! JOIN Attributes AS b USING (entity)! WHERE a.attribute = 'title'! AND y.attribute = 'production_year'! AND r.attribute = 'rating'
 AND b.attribute = 'budget';! +--------+------------+-----------------+--------+------------+! | id | title | production_year | rating | budget |! +--------+------------+-----------------+--------+------------+! | 207468 | Goldfinger | 1964 | 7.8 | $3,000,000 |! +--------+------------+-----------------+--------+------------+! another join required for each additional attribute
  • 17. ©  2011  –  2013  PERCONA  
  • 18. ©  2011  –  2013  PERCONA   Sounds Simple Enough, But… •  NOT NULL doesn’t work •  FOREIGN KEY doesn’t work •  UNIQUE KEY doesn’t work •  Data types don’t work •  Searches don’t scale •  Indexes and storage are inefficient
  • 19. ©  2011  –  2013  PERCONA   Constraints Don’t Work CREATE TABLE Attributes (
 entity INT NOT NULL,
 attribute VARCHAR(20) NOT NULL,
 value TEXT NOT NULL,
 FOREIGN KEY (entity) 
 REFERENCES Title (id)
 FOREIGN KEY (value) 
 REFERENCES Ratings (rating)
 );! constraints apply to all rows, not just rows for a specific attribute type
  • 20. ©  2011  –  2013  PERCONA   Data Types Don’t Work INSERT INTO Attributes (entity, attribute, value)
 VALUES (207468, 'budget', 'banana');! database cannot prevent application storing nonsensical data
  • 21. ©  2011  –  2013  PERCONA   Add Typed Value Columns? CREATE TABLE Attributes (
 entity INT NOT NULL,
 attribute VARCHAR(20) NOT NULL,
 intvalue BIGINT,
 floatvalue FLOAT,
 textvalue TEXT,
 datevalue DATE,
 datetimevalue DATETIME, 
 FOREIGN KEY (entity) 
 REFERENCES Title (id)
 );! now my application needs to know which data type column to use for each attribute when inserting and querying
  • 22. ©  2011  –  2013  PERCONA   Searches Don’t Scale •  You must hard-code each attribute name, –  One JOIN per attribute! •  Alternatively, you can query all attributes, but the result is one attribute per row: SELECT attribute, value 
 FROM Attributes 
 WHERE entity = 207468;! –  …and sort it out in your application code.
  • 23. ©  2011  –  2013  PERCONA   Indexes and Storage Are Inefficient •  Many rows, with few distinct attribute names. –  Poor index cardinality. •  The entity and attribute columns use extra space for every attribute of every “row.” –  In a conventional table, the entity is the primary key, so it’s stored only once per row. –  The attribute name is in the table definition, so it’s stored only once per table.
  • 24. ©  2011  –  2013  PERCONA   Pros and Cons •  Good solution: –  No ALTER TABLE needed again—ever! –  Supports ultimate flexibility, potentially any “row” can have its own distinct set of attributes.
  • 25. ©  2011  –  2013  PERCONA   Pros and Cons •  Bad solution: –  SQL operations become more complex. –  Lots of application code required to reinvent features that an RDBMS already provides. –  Doesn’t scale well—pivots required.
  • 26. ©  2011  –  2013  PERCONA   CLASS TABLE INHERITANCE
  • 27. ©  2011  –  2013  PERCONA   Subtypes •  Titles includes: –  Films –  TV shows –  TV episodes –  Video games •  Some attributes apply to all, other attributes apply to one subtype or the other.
  • 28. ©  2011  –  2013  PERCONA   Title Table CREATE TABLE Title (! id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,! title text NOT NULL,! imdb_index varchar(12) DEFAULT NULL,! kind_id int(11) NOT NULL,! production_year int(11) DEFAULT NULL,! imdb_id int(11) DEFAULT NULL,! phonetic_code varchar(5) DEFAULT NULL,! episode_of_id int(11) DEFAULT NULL,! season_nr int(11) DEFAULT NULL,! episode_nr int(11) DEFAULT NULL,! series_years varchar(49) DEFAULT NULL,! title_crc32 int(10) unsigned DEFAULT NULL! );! only for tv shows
  • 29. ©  2011  –  2013  PERCONA   Title Table with Subtype Tables CREATE TABLE Title (! id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,! title text NOT NULL,! imdb_index varchar(12) DEFAULT NULL,! kind_id int(11) NOT NULL,! production_year int(11) DEFAULT NULL,! imdb_id int(11) DEFAULT NULL,! phonetic_code varchar(5) DEFAULT NULL,! title_crc32 int(10) unsigned DEFAULT NULL,! PRIMARY KEY (id)! );! ! CREATE TABLE Film (! id int(11) NOT NULL PRIMARY KEY,
 aspect_ratio varchar(20),! FOREIGN KEY (id) REFERENCES Title(id)! );! ! CREATE TABLE TVShow (! id int(11) NOT NULL PRIMARY KEY,! episode_of_id int(11) DEFAULT NULL,! season_nr int(11) DEFAULT NULL,! episode_nr int(11) DEFAULT NULL,! series_years varchar(49) DEFAULT NULL,! FOREIGN KEY (id) REFERENCES Title(id)! );! Title   Film   TVShow   1:1 1:1
  • 30. ©  2011  –  2013  PERCONA   Adding a New Subtype •  Create a new table—without locking existing tables. CREATE TABLE VideoGames (
 id int(11) NOT NULL PRIMARY KEY,
 platforms varchar(100) NOT NULL,
 FOREIGN KEY (id) 
 REFERENCES Title(id)
 );!
  • 31. ©  2011  –  2013  PERCONA   Pros and Cons •  Good solution: –  Best to support a finite set of subtypes, which are likely unchanging after creation. –  Data types and constraints work normally. –  Easy to create or drop subtype tables. –  Easy to query attributes common to all subtypes. –  Subtype tables are shorter, indexes are smaller.
  • 32. ©  2011  –  2013  PERCONA   Pros and Cons •  Bad solution: –  Adding one entry takes two INSERT statements. –  Querying attributes of subtypes requires a join. –  Querying all types with subtype attributes requires multiple joins (as many as subtypes). –  Adding a common attribute locks a large table. –  Adding an attribute to a populated subtype locks a smaller table.
  • 33. ©  2011  –  2013  PERCONA   SERIALIZED LOB
  • 34. ©  2011  –  2013  PERCONA   What is Serializing? •  Objects in your applications can be represented in serialized form—i.e., convert the object to a scalar string that you can save and load back as an object. –  Java objects implementing Serializable and processed with writeObject()! –  PHP variables processed with serialize()! –  Python objects processed with pickle.dump()! –  Data encoded with XML, JSON, YAML, etc.
  • 35. ©  2011  –  2013  PERCONA   What Is a LOB? •  The BLOB or TEXT datatypes can store long sequences of bytes or characters, such as a string. •  You can store the string representing your object into a single BLOB or TEXT column. –  You don’t need to define SQL columns for each field of your object.
  • 36. ©  2011  –  2013  PERCONA   Title Table with Serialized LOB CREATE TABLE Title (
 id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
 title text NOT NULL,
 imdb_index varchar(12) DEFAULT NULL,
 kind_id int(11) NOT NULL,
 production_year int(11) DEFAULT NULL,
 imdb_id int(11) DEFAULT NULL,
 phonetic_code varchar(5) DEFAULT NULL,
 title_crc32 int(10) unsigned DEFAULT NULL
 extra_info TEXT 
 );! holds everything else, plus anything we didn’t think of
  • 37. ©  2011  –  2013  PERCONA   Adding a New Attribute UPDATE Title
 SET extra_info = 
 '{
 episode_of_id: 1291895, 
 season_nr: 5, 
 episode_nr: 6
 }'
 WHERE id = 1292057;! JSON example
  • 38. ©  2011  –  2013  PERCONA   Using XML in MySQL •  MySQL has limited support for XML. SELECT id, title, 
 ExtractValue(extra_info, '/episode_nr') 
 AS episode_nr 
 FROM Title
 WHERE ExtractValue(extra_info, 
 '/episode_of_id') = 1292057;! •  Forces table-scans, not possible to use indexes. http://dev.mysql.com/doc/refman/5.6/en/xml-functions.html
  • 39. Dynamic Columns in MariaDB CREATE TABLE Title (
 id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
 title text NOT NULL,
 ...
 extra_info BLOB
 ); INSERT INTO Title (title, extra_info)
 VALUES ('Trials and Tribble-ations',
 COLUMN_CREATE('episode_of_id', 1291895,
 'season_nr', 5,
 'episode_nr', 6)); https://kb.askmonty.org/en/dynamic-columns/
  • 40. Pros and Cons • Good solution: – Store any object and add new custom fields at any time. – No need to do ALTER TABLE to add custom fields.
  • 41. Pros and Cons • Bad solution: – Not indexable. – Must return the whole object, not an individual field. – Must write the whole object to update a single field. – Hard to use a custom field in a WHERE clause, GROUP BY or ORDER BY. – No support in the database for data types or constraints, e.g. NOT NULL, UNIQUE, FOREIGN KEY.
  • 42. INVERTED INDEXES
  • 43. Use This with Serialized LOB • Helps to mitigate some of the weaknesses.
  • 44. How This Works • Create a new table for each field of the LOB that you want to address individually: CREATE TABLE Title_EpisodeOf (
 episode_of_id INT NOT NULL,
 id INT NOT NULL,
 PRIMARY KEY (episode_of_id, id), -- here’s where you get the index support
 FOREIGN KEY (id)
 REFERENCES Title (id)
 );
  • 45. How This Works • For each LOB containing an “episode_of_id” field, insert a row into the attribute table with its value. INSERT INTO Title_EpisodeOf (episode_of_id, id)
 VALUES (1291895, 1292057); • If another title doesn’t have this field, then you don’t create a referencing row.
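Keeping the attribute table in step with the LOBs is the application's job. Here is a minimal sketch that assumes JSON-encoded `extra_info` values and uses hypothetical data. It derives the `(episode_of_id, id)` rows from each document and skips titles that lack the field.

```python
import json

# Hypothetical Title rows: id -> extra_info serialized as JSON text.
titles = {
    1292057: '{"episode_of_id": 1291895, "season_nr": 5, "episode_nr": 6}',
    1291895: "{}",  # the series itself: no episode_of_id field
}


def episode_of_rows(titles):
    """Return (episode_of_id, id) rows for the Title_EpisodeOf table.

    A title whose document lacks "episode_of_id" contributes no row.
    """
    rows = []
    for title_id, doc in sorted(titles.items()):
        info = json.loads(doc)
        if "episode_of_id" in info:
            rows.append((info["episode_of_id"], title_id))
    return rows


print(episode_of_rows(titles))  # [(1291895, 1292057)]
```

In practice these rows would be written in the same transaction as the LOB itself, and rewritten whenever the document changes.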
  • 46. Query for Episodes of a Series SELECT t.*
 FROM Title_EpisodeOf AS e
 JOIN Title AS t USING (id)
 WHERE e.episode_of_id = 1291895; • This is a lookup on the leftmost column of the primary key. It matches only titles that have such a field and whose value matches the condition.
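The table, the INSERT, and the query above combine into one runnable sketch. It uses SQLite in place of MySQL/InnoDB, which is an assumption. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The LOB and its attribute row are written in one transaction. The query then filters on the attribute table's primary key before joining to the master rows.

```python
import json
import sqlite3

# Assumption: SQLite stands in for MySQL/InnoDB.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript(
    """
    CREATE TABLE Title (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        extra_info TEXT
    );
    CREATE TABLE Title_EpisodeOf (
        episode_of_id INT NOT NULL,
        id INT NOT NULL,
        PRIMARY KEY (episode_of_id, id),
        FOREIGN KEY (id) REFERENCES Title (id)
    );
    """
)

info = {"episode_of_id": 1291895, "season_nr": 5, "episode_nr": 6}
with conn:  # write the LOB and its attribute row in one transaction
    conn.execute(
        "INSERT INTO Title (id, title, extra_info) VALUES (?, ?, ?)",
        (1292057, "Trials and Tribble-ations", json.dumps(info)),
    )
    conn.execute(
        "INSERT INTO Title_EpisodeOf (episode_of_id, id) VALUES (?, ?)",
        (info["episode_of_id"], 1292057),
    )

# The filter is answered from the attribute table's primary key; the join
# then fetches the matching master rows.
rows = conn.execute(
    """SELECT t.id, t.title
       FROM Title_EpisodeOf AS e
       JOIN Title AS t USING (id)
       WHERE e.episode_of_id = ?""",
    (1291895,),
).fetchall()
print(rows)  # [(1292057, 'Trials and Tribble-ations')]
```

The foreign key rejects attribute rows that point at a missing title. Nothing here forces every episode's LOB to have a matching attribute row, however. That is the synchronization burden listed on the next slide.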
  • 47. Pros and Cons • Good solution: – Preserves the advantage of Serialized LOB. – Adds support for SQL data types, and UNIQUE and FOREIGN KEY constraints. – You can index any custom field, without locking the master table.
  • 48. Pros and Cons • Bad solution: – Redundant storage. – It’s up to you to keep attribute tables in sync manually (or with triggers). – Requires JOIN to fetch the master row. – You must plan which columns you want to be indexed (but this is true of conventional columns too). – Still no support for NOT NULL constraint.
  • 49. ONLINE SCHEMA CHANGES
  • 50. pt-online-schema-change • Performs online, non-blocking ALTER TABLE. – Captures concurrent updates to a table while restructuring. – Some risks and caveats exist; please read the manual and test carefully. • Free tool, part of Percona Toolkit. – http://www.percona.com/doc/percona-toolkit/pt-online-schema-change.html
  • 51. How MySQL Does ALTER TABLE 1. Lock the table. 2. Make a new, empty table like the original. 3. Modify the columns of the new empty table. 4. Copy all rows of data from original to new table. 5. Swap the old and new tables. 6. Unlock the tables and drop the original.
  • 52. How pt-osc Does ALTER TABLE (no table lock held during the copy) 1. Make a new, empty table like the original. 2. Modify the columns of the new empty table. 3. Copy all rows of data from original to new table. a. Iterate over the table in chunks, in primary key order. b. Use triggers to capture ongoing changes in the original, and apply them to the new table. 4. Swap the tables, then drop the original.
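These steps can be modeled in a few dozen lines. This is a toy sketch, not pt-online-schema-change itself. SQLite stands in for MySQL, and the table, column, and trigger names (such as `osc_ins`) are illustrative assumptions. The sketch copies four rows per chunk in primary-key order and applies simulated application writes between chunks through triggers. Finally it swaps the tables.

```python
import sqlite3

# Toy model of the online-schema-change technique. Assumptions: SQLite
# stands in for MySQL; all names are illustrative, not the tool's own.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE cast_info (id INTEGER PRIMARY KEY, role TEXT)")
conn.executemany(
    "INSERT INTO cast_info (id, role) VALUES (?, ?)",
    [(i, f"role {i}") for i in range(1, 11)],
)

# Steps 1-2: make a new, empty table like the original, then alter it.
conn.execute("CREATE TABLE _cast_info_new (id INTEGER PRIMARY KEY, role TEXT)")
conn.execute("ALTER TABLE _cast_info_new ADD COLUMN source INT NOT NULL DEFAULT 0")

# Step 3b: triggers apply ongoing changes in the original to the new table.
conn.executescript(
    """
    CREATE TRIGGER osc_ins AFTER INSERT ON cast_info BEGIN
        INSERT OR REPLACE INTO _cast_info_new (id, role) VALUES (NEW.id, NEW.role);
    END;
    CREATE TRIGGER osc_upd AFTER UPDATE ON cast_info BEGIN
        DELETE FROM _cast_info_new WHERE id = OLD.id;
        INSERT OR REPLACE INTO _cast_info_new (id, role) VALUES (NEW.id, NEW.role);
    END;
    CREATE TRIGGER osc_del AFTER DELETE ON cast_info BEGIN
        DELETE FROM _cast_info_new WHERE id = OLD.id;
    END;
    """
)


def copy_in_chunks(conn, chunk_size):
    """Step 3a: copy rows in primary-key order, yielding after each chunk.

    INSERT OR IGNORE skips rows the triggers already wrote, so a newer
    trigger-written row is never overwritten by the older bulk copy.
    """
    last_id = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM cast_info WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        )]
        if not ids:
            return
        conn.execute(
            "INSERT OR IGNORE INTO _cast_info_new (id, role) "
            "SELECT id, role FROM cast_info WHERE id BETWEEN ? AND ?",
            (ids[0], ids[-1]),
        )
        last_id = ids[-1]
        yield last_id


for n, _ in enumerate(copy_in_chunks(conn, chunk_size=4)):
    if n == 0:
        # Simulated application writes arriving while the copy runs.
        conn.execute("UPDATE cast_info SET role = 'changed' WHERE id = 7")
        conn.execute("DELETE FROM cast_info WHERE id = 2")
        conn.execute("INSERT INTO cast_info (id, role) VALUES (11, 'new row')")

# Step 4: swap the tables, then drop the original. (This sketch drops the
# triggers before renaming, to keep SQLite's trigger handling simple.)
conn.executescript(
    """
    BEGIN;
    DROP TRIGGER osc_ins;
    DROP TRIGGER osc_upd;
    DROP TRIGGER osc_del;
    ALTER TABLE cast_info RENAME TO _cast_info_old;
    ALTER TABLE _cast_info_new RENAME TO cast_info;
    COMMIT;
    DROP TABLE _cast_info_old;
    """
)

print(conn.execute("SELECT id, role, source FROM cast_info ORDER BY id").fetchall())
```

The design choice that makes this work is the pairing of "replace" in the triggers with "insert, ignoring duplicates" in the bulk copy. Whichever path writes a row last with current data wins, and deleted rows stay deleted.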
  • 53-58. Visualize This (1)-(6) [Diagram sequence. Frames 1-5: cast_info, after trigger, cast_info new. Frame 6: cast_info, cast_info old, DROP.]
  • 59. Adding a New Attribute • Design the ALTER TABLE statement, but don’t execute it yet. mysql> ALTER TABLE cast_info
 ADD COLUMN source INT NOT NULL; • Equivalent pt-online-schema-change command: $ pt-online-schema-change \
 h=localhost,D=imdb,t=cast_info \
 --alter "ADD COLUMN source INT NOT NULL"
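`--alter` takes the whole clause as a single argument, without the `ALTER TABLE <table>` prefix, so it must be quoted on a shell command line. A hypothetical Python helper makes that explicit by building the argument list. It passes `--dry-run`, the tool's option for checking the plan without changing anything, unless `execute=True`. The helper name and its parameters are assumptions for illustration.

```python
import shlex


def pt_osc_command(host, database, table, alter_clause, execute=False):
    """Build a pt-online-schema-change argument list (hypothetical helper)."""
    dsn = f"h={host},D={database},t={table}"
    mode = "--execute" if execute else "--dry-run"
    # The ALTER clause stays one list element, i.e. one argument.
    return ["pt-online-schema-change", dsn, "--alter", alter_clause, mode]


cmd = pt_osc_command("localhost", "imdb", "cast_info", "ADD COLUMN source INT NOT NULL")
print(shlex.join(cmd))
# pt-online-schema-change h=localhost,D=imdb,t=cast_info --alter 'ADD COLUMN source INT NOT NULL' --dry-run
```

Passing the list directly to `subprocess.run(cmd, check=True)` avoids shell quoting altogether. Switching to `execute=True` only after a clean dry run follows the slide's advice to test carefully.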
  • 60. Execute $ pt-online-schema-change h=localhost,D=imdb,t=cast_info \
 --alter "ADD COLUMN source INT NOT NULL" --execute
 Altering `imdb`.`cast_info`...
 Creating new table...
 Created new table imdb._cast_info_new OK.
 Altering new table...
 Altered `imdb`.`_cast_info_new` OK.
 Creating triggers...
 Created triggers OK.
 Copying approximately 22545051 rows...
 Copying `imdb`.`cast_info`: 10% 04:05 remain
 Copying `imdb`.`cast_info`: 19% 04:07 remain
 Copying `imdb`.`cast_info`: 28% 03:44 remain
 Copying `imdb`.`cast_info`: 37% 03:16 remain
 Copying `imdb`.`cast_info`: 47% 02:47 remain
 Copying `imdb`.`cast_info`: 56% 02:18 remain
 Copying `imdb`.`cast_info`: 64% 01:53 remain
 Copying `imdb`.`cast_info`: 73% 01:28 remain
 Copying `imdb`.`cast_info`: 82% 00:55 remain
 Copying `imdb`.`cast_info`: 91% 00:26 remain
 Copied rows OK.
 Swapping tables...
 Swapped original and new tables OK.
 Dropping old table...
 Dropped old table `imdb`.`_cast_info_old` OK.
 Dropping triggers...
 Dropped triggers OK.
 Successfully altered `imdb`.`cast_info`.
  • 61. Self-Adjusting • Copies rows in chunks, which the tool sizes dynamically. • The tool throttles back if it increases load too much or causes any replication slaves to lag. • The tool sets short lock wait timeouts for its own session, so that when it conflicts with application queries, the tool’s query is the more likely one to fail, not the application’s.
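Dynamic chunk sizing can be sketched as a feedback loop: measure how fast the last chunk copied, then size the next one so it takes roughly a target time. The class below is a hypothetical simplification, not the tool's code. The 0.5-second target reflects our understanding of the tool's `--chunk-time` default. The exponential smoothing and its 0.75 weight are assumptions.

```python
class ChunkSizer:
    """Hypothetical sketch of time-targeted chunk sizing.

    Assumption: keep an exponentially weighted average of copy throughput
    (rows per second) and size the next chunk to take about target_seconds.
    The real tool's formula may differ.
    """

    def __init__(self, initial_size=1000, target_seconds=0.5, weight=0.75):
        self.size = initial_size
        self.target_seconds = target_seconds
        self.weight = weight  # share of the estimate kept from history
        self.avg_rate = None  # rows per second; unknown until first chunk

    def update(self, rows_copied, seconds):
        """Record one finished chunk and return the size for the next one."""
        rate = rows_copied / seconds
        if self.avg_rate is None:
            self.avg_rate = rate
        else:
            self.avg_rate = self.weight * self.avg_rate + (1 - self.weight) * rate
        self.size = max(1, int(self.avg_rate * self.target_seconds))
        return self.size


sizer = ChunkSizer()
print(sizer.update(1000, 0.25))  # 2000: the chunk was fast, so grow
print(sizer.update(2000, 2.0))   # 1625: the server slowed down, so shrink
```

Smoothing keeps one unusually slow or fast chunk from swinging the size wildly. Sizing by time rather than by a fixed row count keeps each copy query short even when row widths or server load vary.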
  • 62. Why Shouldn’t I Use This? • Is your table small enough that ALTER is already quick enough? • Is your change already very quick, for example DROP KEY in InnoDB? • Will pt-online-schema-change take too long or increase the load too much?
  • 63. Pros and Cons • Good solution: – ALTER TABLE to add conventional columns without the pain of locking.
  • 64. Pros and Cons • Bad solution: – Can take up to 4× more time than ALTER TABLE. – Table must have a PRIMARY KEY. – Table must not have triggers. – No need if your table is small and ALTER TABLE already runs quickly enough. – No need for some ALTER TABLE operations that don’t restructure the table (e.g. dropping indexes, adding comments).
  • 65. NON-RELATIONAL DATABASES
  • 66. No Rules to Break • To be relational, a table must have a fixed set of columns on every row. • No such rule exists in a non-relational model; you can store a distinct set of fields per record. • Having no fixed schema is what makes NoSQL more flexible.
  • 67. Adding a New Attribute • Document-oriented databases are designed to support defining distinct attributes per document. • But you lose advantages of relational databases: – Data types – Constraints – Uniform structure of records
  • 68. SUMMARY
  • 69. Summary
 Solution        Lock-free  Flexible  Select  Filter  Indexed  Data Types  Constraints
 Extra Columns   no*        no        yes     yes     yes*     no          no
 EAV             yes        yes       yes*    yes     yes*     no*         no
 CTI             no*        no        yes     yes     yes      yes         yes
 LOB             yes        yes       no      no      no       no          no
 Inverted Index  yes        yes       yes     yes     yes      yes         yes
 OSC             yes        no        yes     yes     yes      yes         yes
 NoSQL           yes        yes       yes     yes     yes      no*         no
 * conditions or exceptions apply.
  • 71. “Rate This Session” http://www.percona.com/live/mysql-conference-2013/sessions/extensible-data-modeling-mysql
  • 72. SQL Antipatterns: Avoiding the Pitfalls of Database Programming by Bill Karwin Available in print, epub, mobi, pdf. Delivery options for Kindle or Dropbox. http://pragprog.com/book/bksqla/