
Scaling MySQL and Java in High Write Throughput Environments Presentation

We present the backend architecture behind Spinn3r – our scalable web and blog crawler. Most existing work in scaling MySQL has been around high read throughput environments similar to web applications. In contrast, at Spinn3r we needed to complete thousands of write transactions per second in order to index the blogosphere at full speed. We have achieved this through our ground-up development of a fault-tolerant distributed database and compute infrastructure, all built on top of cheap commodity hardware. We’ve built out a number of technologies on top of MySQL that help us easily scale operations. We’ve implemented an Open Source load-balancing JDBC driver named lbpool (http://code.tailrank.com/lbpool). Lbpool allows us to loosely couple our MySQL slaves, which allows us to gracefully handle system failures. It also supports load balancing, reprovisioning, slave lag, and other advanced features not available in the stock MySQL JDBC driver. We’ve also built out a sharded database similar to infrastructure built at other companies such as Google (AdWords) and Yahoo (Flickr). Our sharded DB has a number of interesting properties, including ultra-high throughput requirements (we process 52TB per month), distributed sequence generation, and query plan execution. - Kevin Burton (Tailrank), Jonathan Moore (Tailrank/Spinn3r)

Copyright: © Attribution Non-Commercial (BY-NC)

Scaling MySQL and Java in High Write Throughput Environments
How we built Spinn3r

1
What is Spinn3r?
• Licensed weblog crawler
• 500k posts per hour (RSS+HTML)
• 3.5TB of content
• 10 months of blog archives
• 3B documents
• 80 Mbit/s, 24/7

2
Hardware
• ~40 servers
– Quad Core
– 8GB memory
– Gigabit Ethernet
– Dual SATA (software RAID 0)
• Moving to SSD

3
Write Throughput
• 90% write, 10% read
• MyISAM didn’t scale
– Too many seeks under a high write load
• InnoDB with write-ahead log
– 1/5th of effective disk bandwidth
– Improve the fuzzy checkpointing logic
– Just continually write memory images (log structured)
– 1.5 minutes to write an 8GB image

4
Database Sharding
• Split data across shards based on PK
– hashcode of URL
• Range routing (see the routing sketch below)
• Limitations
– No triggers
– No foreign keys
– No transactions
• Similar philosophy to Bigtable, S3, Dynamo, etc.
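
A minimal sketch of the routing idea on this slide, with illustrative names rather than Spinn3r's actual code: the shard key is derived from the URL, and a range table maps each key range to the shard that owns it.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class ShardRouter {
    // Lower bound of each key range -> JDBC URL of the shard that owns it.
    // A range starting at 0 is expected so every key falls into some range.
    private final NavigableMap<Long, String> ranges = new TreeMap<>();

    public void addRange(long lowerBound, String jdbcUrl) {
        ranges.put(lowerBound, jdbcUrl);
    }

    // Shard key: hashcode of the URL (as named on this slide), kept non-negative.
    // Range routing: pick the shard whose range covers that key.
    public String shardFor(String url) {
        long key = url.hashCode() & 0xffffffffL;
        return ranges.floorEntry(key).getValue();
    }
}
```

For example, registering two ranges with addRange(0L, ...) and addRange(1L << 31, ...) splits the key space across two shards.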

5
Shard Architecture

6
Query Limitations
• No functions in WHERE clauses
• LIMIT required
• Query should be deterministic
– ORDER BY
– ID = N
• Must order by some column to page (see the paging sketch below)
• No offset
• No aggregate functions
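
A minimal sketch of the paging style these limitations imply, assuming a hypothetical `posts` table: order by an indexed column, carry the last key seen, and use LIMIT with no OFFSET so the query stays deterministic on every shard.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class KeysetPager {
    // Fetch the next page of rows with id greater than lastSeenId and
    // return the last id seen, which becomes the cursor for the next call.
    public static long nextPage(Connection conn, long lastSeenId, int pageSize)
            throws SQLException {
        String sql = "SELECT id, url FROM posts "
                   + "WHERE id > ? ORDER BY id LIMIT ?";   // deterministic, no OFFSET
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastSeenId);
            ps.setInt(2, pageSize);
            long last = lastSeenId;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    last = rs.getLong("id");   // remember the paging cursor
                }
            }
            return last;
        }
    }
}
```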

7
Shard Insertion
• Bulk insert data
– Custom API
– Operate on lists, commit every N records or T minutes.
– INSERT … ON DUPLICATE KEY UPDATE (see the bulk-insert sketch below)
• Parallel dispatch architecture
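
A minimal sketch of the bulk-insert pattern described above, assuming a hypothetical `posts` table (the time-based flush and the parallel dispatch are omitted): batch rows, flush every N records, and use INSERT … ON DUPLICATE KEY UPDATE so retries are idempotent.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BulkInserter {
    private static final int BATCH_SIZE = 1000;   // "commit every N records"

    public static void insertAll(Connection conn, List<String[]> rows)
            throws SQLException {
        String sql = "INSERT INTO posts (id, url, content) VALUES (?, ?, ?) "
                   + "ON DUPLICATE KEY UPDATE content = VALUES(content)";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.addBatch();
                if (++pending % BATCH_SIZE == 0) {
                    ps.executeBatch();
                    conn.commit();          // flush every N records
                }
            }
            ps.executeBatch();
            conn.commit();                  // flush the remainder
        }
    }
}
```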

8
In-memory Storage
• Metadata
– queue
– graph
• Deprecated memcached
• Allows InnoDB to execute at speed
• WAL allows disk to write at about 40MB/s

9
On-disk Storage
• 2.5 TB of content (full HTML and RSS)
• Numerous backup copies
• RAID caching controllers with BBU
• InnoDB blobs in append-only and ‘eventually immutable’ tables
• Gzip compressed (3x savings)
– Reduces the # of IOs by trading CPU for disk I/O (see the sketch below)
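
A minimal sketch of that trade, assuming the raw HTML/RSS is stored as a gzip-compressed blob (class name illustrative); decompression on read is symmetric.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ContentCompressor {
    // Gzip the content before it is written as an InnoDB blob,
    // trading a little CPU for roughly 3x fewer bytes on disk.
    public static byte[] compress(String content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(content.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }
}
```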

10
Resource/Primary Key
• Key is truncate(SHA1(resource+secret)) (see the sketch below)
• Deterministic mechanism for key generation
– works across robots
• Works well with shards
• Routable
• Decentralized
• Avoid clustered indexes
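
A minimal sketch of the key scheme on this slide: the primary key is a truncated SHA-1 of the resource plus a secret, so every robot derives the same key for the same resource without coordination. The truncation length and the secret value here are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ResourceKey {
    private static final String SECRET = "changeme";   // illustrative secret
    private static final int KEY_BYTES = 8;            // illustrative truncation

    public static String keyFor(String resource) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest((resource + SECRET).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < KEY_BYTES; i++) {
            hex.append(String.format("%02x", digest[i]));   // keep only the first bytes
        }
        return hex.toString();
    }
}
```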

11
Distributed Lock Manager
• acquire( lock )
• renew( lock )
• Similar to Google’s Chubby
• See the Paxos algorithm for distributed consensus
• Good for master servers, failover, etc.
• We use this for master queue promotion (interface sketch below)
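
A hypothetical interface sketch matching the two operations named above; the real lock manager's API and lease semantics are not shown in the slides.

```java
public interface DistributedLockManager {
    /** Acquire the named lock, blocking until it is granted; returns a lease handle. */
    Lease acquire(String lockName) throws InterruptedException;

    /** Renew an existing lease before it expires so the holder keeps the lock. */
    void renew(Lease lease);

    /** Opaque handle for a held lock (illustrative). */
    interface Lease {
        String lockName();
        long expiresAtMillis();
    }
}
```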

12
Sequence Generation
• Need monotonically increasing sequences
– Paging through results
• Settled on global prefix + local suffix with a distributed lock manager (see the sketch below)
• Used in shards to page across results.
– paging on time is hard/impossible due to collisions
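
A minimal sketch of the "global prefix + local suffix" scheme, under assumptions: a node obtains a unique prefix once (coordinated through the distributed lock manager, not shown here) and then generates IDs locally by appending a monotonically increasing suffix. The bit split is illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceGenerator {
    private static final int SUFFIX_BITS = 40;     // illustrative prefix/suffix split
    private final long prefix;                     // globally unique per node
    private final AtomicLong suffix = new AtomicLong();

    public SequenceGenerator(long globallyUniquePrefix) {
        this.prefix = globallyUniquePrefix;
    }

    // IDs from one node are strictly increasing; distinct prefixes keep nodes
    // from colliding. Rolls over only if the suffix exceeds 2^SUFFIX_BITS.
    public long next() {
        return (prefix << SUFFIX_BITS) | suffix.incrementAndGet();
    }
}
```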

13
Task/Queue
• Similar to MapReduce
• Central queue
– Fault tolerant
– Sharded for scale
• Distributed tasks
• Executes robot jobs over 30 machines
• Supports heterogeneous machines

14
JDBC Load Balancing
• Created lbpool
– Licensed to MySQL (Open Source)
• Load-balanced connection pool (see the sketch below)
• Replication aware
• Handles runtime rebalancing
– slave lag
– broken slaves
• Fault tolerant
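
A conceptual sketch of the routing idea only, not lbpool's actual API: writes go to the master, reads go to a healthy slave whose replication lag is within a threshold. All class names and the lag threshold are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

public class ReplicationAwareRouter {
    private static final long MAX_SLAVE_LAG_SECONDS = 30;   // illustrative threshold

    private final DataSource master;
    private final List<SlaveEntry> slaves;

    public ReplicationAwareRouter(DataSource master, List<SlaveEntry> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    public Connection forWrite() throws SQLException {
        return master.getConnection();
    }

    // Prefer the least-lagged healthy slave; fall back to the master.
    public Connection forRead() throws SQLException {
        SlaveEntry best = null;
        for (SlaveEntry s : slaves) {
            if (s.healthy && s.lagSeconds <= MAX_SLAVE_LAG_SECONDS
                    && (best == null || s.lagSeconds < best.lagSeconds)) {
                best = s;
            }
        }
        return (best != null ? best.dataSource : master).getConnection();
    }

    public static class SlaveEntry {
        final DataSource dataSource;
        volatile boolean healthy = true;
        volatile long lagSeconds;   // e.g. refreshed from replication monitoring

        public SlaveEntry(DataSource dataSource) {
            this.dataSource = dataSource;
        }
    }
}
```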

15
User Defined Functions
• Necessary for distributed databases
• Row level locks to avoid race conditions
• Increment
• Bloom filters
• Zeta codes
• Histograms

16
Solid State Storage
• NAND-based flash devices
• SUPER fast reads
– 15k 4KB reads per second
– ~250/s for HDDs
• Regular performance writes
– Small InnoDB buffer pool
• Historically avoided due to MTBF concerns

17
Current SSD state
• $30 / GB
• 16/32/64 GB capacity
• Mtron
• Memoright
• STEC
• ~ 100MB/s sequential write
• ~ 120MB/s sequential read

18
The Future of DB Storage
• SSD for in-memory data
• 10x performance boost for a 20% cost increase
– $30/GB now -> $15/GB in Q2-Q3
• Mainstream in 2009
• MUCH more data per node
• Log structured databases
• See benchmarks

19
Questions
• Further reading:
– feedblog.org
– spinn3r.com
– feedblog.org/category/ssd/
– code.google.com/p/mysql-lbpool/
– Paxos algorithm
– Chubby

20
