
Scaling MySQL and Java in High Write Throughput Environments Presentation

We present the backend architecture behind Spinn3r – our scalable web and blog crawler. Most existing work in scaling MySQL has been around high read throughput environments similar to web applications. In contrast, at Spinn3r we needed to complete thousands of write transactions per second in order to index the blogosphere at full speed. We have achieved this through our ground-up development of a fault-tolerant distributed database and compute infrastructure, all built on top of cheap commodity hardware. We’ve built out a number of technologies on top of MySQL that help us easily scale operations. We’ve implemented an Open Source load-balancing JDBC driver named lbpool (http://code.tailrank.com/lbpool). Lbpool allows us to loosely couple our MySQL slaves, which allows us to gracefully handle system failures. It also supports load balancing, reprovisioning, slave lag, and other advanced features not available in the stock MySQL JDBC driver. We’ve also built out a sharded database similar to infrastructure built at other companies such as Google (AdWords) and Yahoo (Flickr). Our sharded DB has a number of interesting properties, including ultra-high throughput requirements (we process 52TB per month), distributed sequence generation, and query plan execution. - Kevin Burton (Tailrank), Jonathan Moore (Tailrank/Spinn3r)

Copyright: © Attribution Non-Commercial (BY-NC)

Scaling MySQL and Java in High Write Throughput Environments
How we built Spinn3r

1
What is Spinn3r?
• Licensed weblog crawler
• 500k posts per hour (RSS+HTML)
• 3.5TB of content
• 10 months of blog archives
• 3B documents
• 80 Mbit/s, 24/7

2
Hardware
• ~40 servers
– Quad Core
– 8GB memory
– Gigabit Ethernet
– Dual SATA (software RAID 0)
• Moving to SSD

3
Write Throughput
• 90% write, 10% read
• MyISAM didn’t scale
– Too many seeks under a high write load
• InnoDB with write-ahead log
– 1/5th of effective disk bandwidth
– Improve the fuzzy checkpointing logic
– Just continually write memory images (log structured)
– 1.5 minutes to write an 8GB image

4
Database Sharding
• Split data across shards based on PK
– hashcode of URL
• Range routing (see the routing sketch below)
• Limitations
– No triggers
– No foreign keys
– No transactions
• Similar philosophy to Bigtable, S3, Dynamo, etc.
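
A minimal sketch of the routing idea on this slide, with illustrative names rather than Spinn3r's actual code: the shard key is derived from the URL, and a range table maps each key range to the shard that owns it.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class ShardRouter {
    // Lower bound of each key range -> JDBC URL of the shard that owns it.
    // A range starting at 0 is expected so every key falls into some range.
    private final NavigableMap<Long, String> ranges = new TreeMap<>();

    public void addRange(long lowerBound, String jdbcUrl) {
        ranges.put(lowerBound, jdbcUrl);
    }

    // Shard key: hashcode of the URL (as named on this slide), kept non-negative.
    // Range routing: pick the shard whose range covers that key.
    public String shardFor(String url) {
        long key = url.hashCode() & 0xffffffffL;
        return ranges.floorEntry(key).getValue();
    }
}
```

For example, registering two ranges with addRange(0L, ...) and addRange(1L << 31, ...) splits the key space across two shards.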

5
Shard Architecture

6
Query Limitations
• No functions in WHERE clauses
• LIMIT required
• Query should be deterministic
– ORDER BY
– ID = N
• Must order by some column to page (see the paging sketch below)
• No offset
• No aggregate functions
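
A minimal sketch of the paging style these limitations imply, assuming a hypothetical `posts` table: order by an indexed column, carry the last key seen, and use LIMIT with no OFFSET so the query stays deterministic on every shard.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class KeysetPager {
    // Fetch the next page of rows with id greater than lastSeenId and
    // return the last id seen, which becomes the cursor for the next call.
    public static long nextPage(Connection conn, long lastSeenId, int pageSize)
            throws SQLException {
        String sql = "SELECT id, url FROM posts "
                   + "WHERE id > ? ORDER BY id LIMIT ?";   // deterministic, no OFFSET
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastSeenId);
            ps.setInt(2, pageSize);
            long last = lastSeenId;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    last = rs.getLong("id");   // remember the paging cursor
                }
            }
            return last;
        }
    }
}
```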

7
Shard Insertion
• Bulk insert data
– Custom API
– Operate on lists, commit every N records or T minutes.
– INSERT … ON DUPLICATE KEY UPDATE (see the bulk-insert sketch below)
• Parallel dispatch architecture
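
A minimal sketch of the bulk-insert pattern described above, assuming a hypothetical `posts` table (the time-based flush and the parallel dispatch are omitted): batch rows, flush every N records, and use INSERT … ON DUPLICATE KEY UPDATE so retries are idempotent.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BulkInserter {
    private static final int BATCH_SIZE = 1000;   // "commit every N records"

    public static void insertAll(Connection conn, List<String[]> rows)
            throws SQLException {
        String sql = "INSERT INTO posts (id, url, content) VALUES (?, ?, ?) "
                   + "ON DUPLICATE KEY UPDATE content = VALUES(content)";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.addBatch();
                if (++pending % BATCH_SIZE == 0) {
                    ps.executeBatch();
                    conn.commit();          // flush every N records
                }
            }
            ps.executeBatch();
            conn.commit();                  // flush the remainder
        }
    }
}
```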

8
In-memory Storage
• Metadata
– queue
– graph
• Deprecated memcached
• Allows InnoDB to execute at speed
• WAL allows disk to write at about 40MB/s

9
On-disk Storage
• 2.5 TB of content (full HTML and RSS)
• Numerous backup copies
• RAID caching controllers with BBU
• InnoDB blobs in append-only and ‘eventually immutable’ tables
• Gzip compressed (3x savings)
– Reduces the # of IOs by trading CPU for disk I/O (see the sketch below)
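
A minimal sketch of that trade, assuming the raw HTML/RSS is stored as a gzip-compressed blob (class name illustrative); decompression on read is symmetric.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ContentCompressor {
    // Gzip the content before it is written as an InnoDB blob,
    // trading a little CPU for roughly 3x fewer bytes on disk.
    public static byte[] compress(String content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(content.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }
}
```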

10
Resource/Primary Key
• Key is truncate(SHA1(resource+secret)) (see the sketch below)
• Deterministic mechanism for key generation
– works across robots
• Works well with shards
• Routable
• Decentralized
• Avoid clustered indexes
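
A minimal sketch of the key scheme on this slide: the primary key is a truncated SHA-1 of the resource plus a secret, so every robot derives the same key for the same resource without coordination. The truncation length and the secret value here are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ResourceKey {
    private static final String SECRET = "changeme";   // illustrative secret
    private static final int KEY_BYTES = 8;            // illustrative truncation

    public static String keyFor(String resource) throws NoSuchAlgorithmException {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest((resource + SECRET).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < KEY_BYTES; i++) {
            hex.append(String.format("%02x", digest[i]));   // keep only the first bytes
        }
        return hex.toString();
    }
}
```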

11
Distributed Lock Manager
• acquire( lock )
• renew( lock )
• Similar to Google’s Chubby
• See the Paxos algorithm for distributed consensus
• Good for master servers, failover, etc.
• We use this for master queue promotion (interface sketch below)
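
A hypothetical interface sketch matching the two operations named above; the real lock manager's API and lease semantics are not shown in the slides.

```java
public interface DistributedLockManager {
    /** Acquire the named lock, blocking until it is granted; returns a lease handle. */
    Lease acquire(String lockName) throws InterruptedException;

    /** Renew an existing lease before it expires so the holder keeps the lock. */
    void renew(Lease lease);

    /** Opaque handle for a held lock (illustrative). */
    interface Lease {
        String lockName();
        long expiresAtMillis();
    }
}
```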

12
Sequence Generation
• Need monotonically increasing sequences
– Paging through results
• Settled on global prefix + local suffix with a distributed lock manager (see the sketch below)
• Used in shards to page across results.
– paging on time is hard/impossible due to collisions
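
A minimal sketch of the "global prefix + local suffix" scheme, under assumptions: a node obtains a unique prefix once (coordinated through the distributed lock manager, not shown here) and then generates IDs locally by appending a monotonically increasing suffix. The bit split is illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceGenerator {
    private static final int SUFFIX_BITS = 40;     // illustrative prefix/suffix split
    private final long prefix;                     // globally unique per node
    private final AtomicLong suffix = new AtomicLong();

    public SequenceGenerator(long globallyUniquePrefix) {
        this.prefix = globallyUniquePrefix;
    }

    // IDs from one node are strictly increasing; distinct prefixes keep nodes
    // from colliding. Rolls over only if the suffix exceeds 2^SUFFIX_BITS.
    public long next() {
        return (prefix << SUFFIX_BITS) | suffix.incrementAndGet();
    }
}
```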

13
Task/Queue
• Similar to MapReduce
• Central queue
– Fault tolerant
– Sharded for scale
• Distributed tasks
• Executes robot jobs over 30 machines
• Supports heterogeneous machines

14
JDBC Load Balancing
• Created lbpool
– Licensed to MySQL (Open Source)
• Load-balanced connection pool (see the sketch below)
• Replication aware
• Handles runtime rebalancing
– slave lag
– broken slaves
• Fault tolerant
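
A conceptual sketch of the routing idea only, not lbpool's actual API: writes go to the master, reads go to a healthy slave whose replication lag is within a threshold. All class names and the lag threshold are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

public class ReplicationAwareRouter {
    private static final long MAX_SLAVE_LAG_SECONDS = 30;   // illustrative threshold

    private final DataSource master;
    private final List<SlaveEntry> slaves;

    public ReplicationAwareRouter(DataSource master, List<SlaveEntry> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    public Connection forWrite() throws SQLException {
        return master.getConnection();
    }

    // Prefer the least-lagged healthy slave; fall back to the master.
    public Connection forRead() throws SQLException {
        SlaveEntry best = null;
        for (SlaveEntry s : slaves) {
            if (s.healthy && s.lagSeconds <= MAX_SLAVE_LAG_SECONDS
                    && (best == null || s.lagSeconds < best.lagSeconds)) {
                best = s;
            }
        }
        return (best != null ? best.dataSource : master).getConnection();
    }

    public static class SlaveEntry {
        final DataSource dataSource;
        volatile boolean healthy = true;
        volatile long lagSeconds;   // e.g. refreshed from replication monitoring

        public SlaveEntry(DataSource dataSource) {
            this.dataSource = dataSource;
        }
    }
}
```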

15
User Defined Functions
• Necessary for distributed databases
• Row level locks to avoid race conditions
• Increment
• Bloom filters
• Zeta codes
• Histograms

16
Solid State Storage
• NAND-based flash devices
• SUPER fast reads
– 15k 4KB reads per second
– ~250/s for HDDs
• Regular performance writes
– Small InnoDB buffer pool
• Historically avoided due to MTBF concerns

17
Current SSD state
• $30 / GB
• 16/32/64 GB capacity
• Mtron
• Memoright
• STEC
• ~ 100MB/s sequential write
• ~ 120MB/s sequential read

18
The Future of DB Storage
• SSD for in-memory data
• 10x performance boost for a 20% cost increase
– $30/GB now -> $15/GB in Q2-Q3
• Mainstream in 2009
• MUCH more data per node
• Log structured databases
• See benchmarks

19
Questions
• Further reading:
– feedblog.org
– spinn3r.com
– feedblog.org/category/ssd/
– code.google.com/p/mysql-lbpool/
– Paxos algorithm
– Chubby

20
