Lessons Learned Building a Web 2.0 Application Using MySQL
Who are we?
• We’re missing a vowl (must be web
2.0!)
• 7 full-time employees
• Several ex-Andover.net (Slashdot) guys
including our CEO
• Boston based but distributed dev
• Been in existence about 2 years
What do we do?
• We’re all about
• Create and share reading lists
• Merge and filter feeds
• Publish widgets
• Future: Do more advanced stuff with
feeds
Why “Grazr”
“I’m actually coming to the conclusion that the
whole subscriptions mindset is a problem and that
in future we’ll ‘graze’ for the most part instead of
subscribing. As Zigbee sensors, RFID chips and
GPS trackers proliferate we’ll be drowning in an
RSS-everywhere world if we don’t change our
approach.”
James Corbett “Eirepreneur”
http://eirepreneur.blogs.com
January 2006
http://grazr.com
http://vibemetrix.com
The heart of our
system:
Lots of Mistakes
• Warning: Some of these lessons will
seem obvious
• Hindsight is 20/20
• We *did* have reasons for many of
these at the time (good ones?)
• Tried a lot of experimental
configurations
Lesson 1:
Beware architectural momentum
• Early system decisions affected later
architecture, even when abandoned
• Be careful about exotic setups without good reason
• Example: db reads on our feed processing web
boxes
• The traditional setup, splitting MySQL
from Apache = system much happier
Lesson 2:
Scaling
• “Don’t worry about scaling” - 37signals
• “You must build for scaling or die!” -
Friendster
• The truth? Somewhere in the middle
• Understand growth and scaling patterns
but don’t build up front
• Your scaling plan: wrong in some way
Overemphasis on scaling
• Two hosted data centers + one
test/backup data center
• Geographically separated
• Over provisioned
• 18 servers, architecture mirrored
• Traffic could have been served from
two or three machines
“Skynet Jr.”
Lesson 3:
Limits of Testing
• Startup reality: not enough time for
thorough testing
• Replication testing and simulation
• Speed good enough, even cross
country
• Problem: real world system behaved
differently
Lesson 4:
Replication is fast, until it isn’t
• We knew better, but empirical testing
seemed OK
• The asynchronous nature of replication is
sometimes hard to code for; be careful with
state
• Some of our code treated it as
synchronous (Fail!)
• Smarter code was slow (retries, polls)
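The trade-off in this lesson can be sketched with a toy model. The LaggingReplica class and read_with_retry function below are hypothetical names invented for illustration, not Grazr code; the replica is simulated in memory, and each read stands in for one tick of replication delay.

```python
import time


class LaggingReplica:
    """Toy replica: a write becomes visible only after `lag_reads` read calls."""

    def __init__(self, lag_reads):
        self.lag_reads = lag_reads
        self.data = {}
        self.pending = []  # entries of [reads_remaining, key, value]

    def replicate(self, key, value):
        # The primary has accepted the write; the replica will apply it later.
        self.pending.append([self.lag_reads, key, value])

    def read(self, key):
        # Each read advances simulated time by one tick.
        still_pending = []
        for entry in self.pending:
            entry[0] -= 1
            if entry[0] <= 0:
                self.data[entry[1]] = entry[2]
            else:
                still_pending.append(entry)
        self.pending = still_pending
        return self.data.get(key)


def read_with_retry(replica, key, attempts=5, delay=0.0):
    """Poll the replica until the key appears or attempts run out.

    Returns (value, attempts_used). The polling is what makes the
    "smarter" code slower than a single naive read.
    """
    for attempt in range(1, attempts + 1):
        value = replica.read(key)
        if value is not None:
            return value, attempt
        time.sleep(delay)
    return None, attempts
```

Code that treats replication as synchronous corresponds to a single read call right after the write, which returns nothing while the replica is still behind. The retrying version eventually gets the value, at the cost of extra round trips.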
Lesson 5:
Memcached is your friend
• Excellent tool in the scaling toolbox
• Classic cache: limiting how often you touch
your database is good!
• Added benefit: acts as a synchronizer on top
of replication
• Good temporary storage for asynchronous
processing
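One way to read the "synchronizer" idea is that a write goes to the primary database and to memcached at the same time, so readers see the new value even before the replica catches up. The sketch below assumes that reading. Plain dicts stand in for memcached, the primary, and the replica, and WriteThroughStore is a hypothetical name.

```python
class WriteThroughStore:
    """Toy store: writes hit the primary and the cache; reads prefer the cache."""

    def __init__(self, cache, primary, replica):
        self.cache = cache      # stand-in for memcached
        self.primary = primary  # stand-in for the MySQL master
        self.replica = replica  # stand-in for a lagging read slave

    def write(self, key, value):
        self.primary[key] = value  # the replica will receive this later
        self.cache[key] = value    # visible to readers immediately

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.replica.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later readers
        return value
```

In a real deployment the cached entry would need an expiry and the write path would need error handling. Both are left out here to keep the idea visible.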
Lesson 6:
Sphinx is your other friend
• FULLTEXT was too slow for our amount of
data
• Sphinx works together with MySQL for
complete search solution
• Use Sphinx to obtain the primary keys
of what you are searching for!
• Use Sphinx for ordering
• Made it possible to switch to InnoDB!
• Joins with Sphinx storage engine
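The "get primary keys from search, then fetch rows" pattern might look roughly like the sketch below. It makes several assumptions: sqlite3 stands in for MySQL so the example is self-contained, the ordered list of ids stands in for whatever a Sphinx query returns, and the items table and fetch_in_search_order function are hypothetical.

```python
import sqlite3


def fetch_in_search_order(conn, ids):
    """Fetch rows by primary key, preserving the order the search engine chose."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM items WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Re-apply the search engine's ordering; skip ids no longer in the table.
    return [by_id[i] for i in ids if i in by_id]


def make_demo_db():
    """Build a small in-memory table for the example (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO items (id, title) VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "gamma")],
    )
    conn.commit()
    return conn
```

The database only does primary key lookups, which are cheap on any storage engine. Relevance ranking and ordering stay with the search engine.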
Lesson 7:
Bulk insert / lazy write
• Obvious: If you don’t need it now, do it
later
• Disconnected / async good in these
cases
• If you can do it later, glom many
together (bulk)
• Much better write performance
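A minimal sketch of the "glom many together" idea follows. It assumes sqlite3 as a stand-in for MySQL and a hypothetical items table. BulkWriter is an illustrative name, not part of any library.

```python
import sqlite3


class BulkWriter:
    """Buffer rows in memory and write them in one batch per transaction."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        self.flushes = 0

    def add(self, item_id, title):
        self.pending.append((item_id, title))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One transaction and one executemany call for the whole batch.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO items (id, title) VALUES (?, ?)", self.pending
            )
        self.pending = []
        self.flushes += 1
```

The caller is responsible for a final flush() at shutdown or on a timer. Otherwise rows still sitting in the buffer are lost, which is the price of deferring writes.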
Lesson 8:
User experience vs. Scaling
• Emphasis on scaling hurt user
experience
• Characterize lazy vs. user affecting
transactions
• Fast, data-correct transactions = single
data store (no read/write I/O split) or a
sync buffer (memcached)
Lesson 9:
Instrumentation
• Visibility into system good
• More data = better
• Non live testing != reality
• Scaling is iterative process, requires
feedback loop
• Nagios, Cacti, SHOW GLOBAL STATUS
Lesson 10:
Try new things
• Best practices are good, but new ideas
sometimes better!
• Memcached as write/bulk buffer, good
results
• UDF, clever replication uses, triggers,
virtual servers, background async.
daemons
• MogileFS (?)
Lesson 11:
Everyone has the same problems
• If you can, re-use!
• Obvious: MySQL, Apache (not rewriting
databases, web servers)
• Search Engines
• Less obvious: building batch job
processor, others have done this better!
(Gearman)
Lesson 12:
Accept change
Vibemetrix
[Diagram labels: item_id-s, items_dist, items_dist, item_id]
If feed/outline is in memcached…
Patrick Galbraith
Senior Programmer - Grazr, Inc.
Mike Kowalchik
CTO - Grazr, Inc.
Jimmy Guerrero
Sr Product Marketing Manager - Sun Microsystems, Database Group
Copyright 2008 MySQL AB The World’s Most Popular Open Source Database 1
• Sun – MySQL Overview
• Brief Introduction to memcached
• “Lessons Learned” - Grazr
• Memcached Solutions from Sun
• Next Steps plus Q & A
Established & Emerging Companies
Web 2.0
Enterprise 2.0
craigslist
SaaS
Telecom
Introduction to memcached
What is Memcached?
* http://www.socialtext.net/memcached/index.cgi?faq
“Cache is King”
• Browser Cache
• Memcached
• Disk Storage
What is Memcached?
Why Use Memcached With MySQL?
• Enables massive scale-out of dynamic web-sites
• Faster page loads
• Allows for more efficient use of existing database resources
• Can easily utilize idle computing resources
• Dozens to hundreds of nodes can be supported in a
memcached cluster
• No interconnect or proprietary networking required
• Extensible and customizable
Who Uses Memcached?
Memcached Basics
• Community Driven
• Open Source
• Memcached Server released under BSD license
• http://www.danga.com/memcached/download.bml
Memcached Basics
• Runs wherever RAM is available
– Application, Web, Database or dedicated memcached servers
What Memcached Isn’t
• Not Reliable/Durable storage
• Not Highly Available
• Not Secure
• Not a database
• Not a database cache
How Does Memcached Work?
Memcached
• Two-stage hash
• Similar to a giant hash table for looking up key = value pairs
• Client hashes the key against a list of servers
• When the server is identified, the client sends its request
• Server performs a hash key lookup for the actual data
Hash Function
• A hash is a procedure for turning data into a small integer
that serves as an index into an array
• Speeds up table lookup or data comparison tasks
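The two-stage lookup can be sketched in a few lines. This is a toy model under stated assumptions: pick_server and ToyCluster are hypothetical names, CRC32 modulo the server count is just one simple choice of client-side hash (real client libraries may use other schemes), and each "server" is a plain dict rather than a network daemon.

```python
import zlib


def pick_server(key, servers):
    """Stage 1: the client hashes the key against its list of servers."""
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]


class ToyCluster:
    """In-memory stand-in for a set of memcached servers."""

    def __init__(self, names):
        self.names = list(names)
        self.stores = {name: {} for name in self.names}

    def set(self, key, value):
        server = pick_server(key, self.names)
        # Stage 2: the chosen server does its own hash lookup (a dict here).
        self.stores[server][key] = value

    def get(self, key):
        server = pick_server(key, self.names)
        return self.stores[server].get(key)
```

Because every client runs the same hash over the same server list, any client can find a key that another client stored, and the servers never need to talk to each other. With plain modulo hashing, changing the server list remaps most keys, which is a known weakness of this simple scheme.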
Memcached Server
• Written in C
• libevent based
– Asynchronous event notification library
Typical Use Case: Read/Pass-Through
• Modify the application so information is read from
memcached
• In the event of a cache miss…
– data is loaded from the database
– written into memcached
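The read path described above can be sketched as follows. The assumptions are that a dict stands in for the memcached client and that load_from_db is a hypothetical callable wrapping whatever database query the application already runs.

```python
class ReadThroughCache:
    """Read from the cache first; on a miss, load from the database and store."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # stand-in for a memcached client
        self.load_from_db = load_from_db  # hypothetical loader function
        self.misses = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is None:
            self.misses += 1
            value = self.load_from_db(key)   # data is loaded from the database
            if value is not None:
                self.cache[key] = value      # and written into the cache
        return value
```

Only the first request for a key reaches the database. Later requests are served from memory until the entry is evicted or expires.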
Basic Memcached Example
[Diagram: clients X, Y, Z, each labeled "mc"; servers A, B, C, each labeled "ms"]
Client X
1) set key “1” with value “abc”
2) hashes the key against server list
3) Server B is selected
4) connects to Server B and sets key
Client Z
1) get key “1”
2) connects to Server B
3) requests “1” and gets value “abc”
Server B
key = value
1 = abc
Solutions
Jimmy Guerrero
Sr Product Marketing Manager
Sun Microsystems - Database Group
jimmy@mysql.com
Memcached for MySQL
• Support is built into your MySQL Enterprise subscription
http://www.mysql.com/products/enterprise/memcached.html
• MySQL Enterprise
– 24x7 Production Support
– Enterprise Monitor
– MySQL Enterprise Server
– Additional Add-ons Available
Why Memcached with MySQL?
• Enables massive scale-out of dynamic web-sites
• Faster page loads
• Allows for more efficient use of existing database resources
• Can easily utilize idle computing resources
• Dozens to hundreds of nodes can be supported in a
memcached cluster
• No interconnect or proprietary networking required
• Extensible and customizable
Next Steps
Memcached for MySQL - http://www.mysql.com/products/enterprise/memcached.html
http://mysql.com/news-and-events/on-demand-webinars/display-od-158.html
Whitepapers - http://www.mysql.com/why-mysql/white-papers/
Documentation - http://dev.mysql.com/doc/refman/6.0/en/ha-memcached.html
Questions?
Patrick Galbraith
Senior Programmer - Grazr, Inc.
Mike Kowalchik
CTO - Grazr, Inc.
Jimmy Guerrero
Sr Product Marketing Manager - Sun Microsystems, Database Group