MySQL Cluster Sometimes SQL UC2011
Clients
NDB API
• synchronous replication locally
• async replication geographically
• master-slave or multi-master
• automated conflict detection and resolution
Out of the box scalability
distributed hash & data partitioning
Application
Authid (PK)  Fname      Lname      Country
1            Albert     Camus      France
2            Ernest     Hemingway  USA
3            Johann     Goethe     Germany
4            Junichiro  Tanizaki   Japan

Authid (PK)  Fname      Lname      Country
1            Albert     Camus      France
3            Johann     Goethe     Germany

Authid (PK)  Fname      Lname      Country
2            Ernest     Hemingway  USA
4            Junichiro  Tanizaki   Japan
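A minimal sketch of how rows of the author table could be spread over two partitions by primary key. The class and method names are hypothetical, and a plain modulo stands in for whatever hash the cluster really applies to the key, so treat this as an illustration of the idea rather than MySQL Cluster's own algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not MySQL Cluster code.
final class PartitionSketch {
    // Maps a primary key to a partition number.
    // Assumption: a simple modulo replaces the real hash function.
    static int partitionFor(int authId, int partitions) {
        return Math.floorMod(Integer.hashCode(authId), partitions);
    }

    // Groups primary keys by the partition they land in.
    static Map<Integer, List<Integer>> distribute(int[] authIds, int partitions) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (int id : authIds) {
            result.computeIfAbsent(partitionFor(id, partitions), p -> new ArrayList<>()).add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {0=[2, 4], 1=[1, 3]}
        System.out.println(distribute(new int[] {1, 2, 3, 4}, 2));
    }
}
```

With two partitions, authors 1 and 3 land together and authors 2 and 4 land together, the same split as in the tables above. The application only needs the primary key to know where a row lives.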
Availability with sub-second failover
Application
Authid (PK)  Fname      Lname      Country
1            Albert     Camus      France
2            Ernest     Hemingway  USA
3            Johann     Goethe     Germany
4            Junichiro  Tanizaki   Japan

Authid (PK)  Fname      Lname      Country
1            Albert     Camus      France
3            Johann     Goethe     Germany

Authid (PK)  Fname      Lname      Country
2            Ernest     Hemingway  USA
4            Junichiro  Tanizaki   Japan
Key features
Dynamically scalable
• Incrementally scale up, out and on-line with application demands
• Linearly scale with distribution awareness
Ericsson Network DataBase (NDB):
• Real-Time
• 99.99% availability
• Auto-failover
• In-memory
• Scale-out
• On-line backups
• NDB API direct
MySQL 4.1.7:
• NDB/Cluster integrated with MySQL
MySQL 5.1:
• Disk data
• Geo-Replication
• User-defined partitioning
MySQL Cluster 6.1:
• 255 nodes
MySQL Cluster 7.0:
• Scale-up (multi-threaded data nodes)
• On-line add-node
• 4x performance
• Back-NDB for LDAP
MySQL Cluster 7.1:
• MySQL Cluster Manager
• NDBINFO – real-time monitoring
• MySQL Java Connector (ClusterJ)
C++ Access SQL Access
NDB/J direct Java access
GRANT mrfoo
[Diagram: App connecting to several memcached servers, each loading ndb_engine.so]
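// Primary-key read through the NDB API (C++); ndb, myTable, code and name are set up elsewhere.
NdbTransaction *tx = ndb->startTransaction();
NdbOperation *op = tx->getNdbOperation(myTable);
op->readTuple(NdbOperation::LM_CommittedRead);  // read the committed row without locking
op->equal("code", code);                        // select the row whose key "code" matches
op->getValue("name", name);                     // fetch column "name" into the name buffer
tx->execute(NdbTransaction::Commit);            // send the operation and wait for the result
cout << "name = " << name << endl;
ndb->closeTransaction(tx);                      // release the transaction object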
Throughput benchmark 2011
Note: last minute benchmark before the conference to show some new numbers.
[Chart: throughput in ops/s for updates; axis from 1M to 4M ops/s]
[Diagram: Java access paths: JPA, ClusterJPA, JDBC, ClusterJ, MySQL, JNI]
@PersistenceCapable(table="t_fish_food")
public interface Fish {
    @PrimaryKey
    int getId();
    void setId(int id);
    @Column(name = "Fishname")
    @Index(name="idx_unq_hash_name")
    String getName();
    void setName(String name);
    ...
}

CREATE TABLE t_fish_food(
    id int primary key,
    Fishname varchar(255)
        unique index using hash,
    ...
) ENGINE = ndbcluster;
ClusterJ find() data
Employee theEmployee =
session.find(Employee.class, 988);
ClusterJ persist()
• Community feature
• Apache module allowing native access to cluster
• REST API speaking JSON or XML
{iata : SFO,
Long : "San Francisco Intl Airport",
Loc : US}
mod_ndb and AJAX
$.get('http://localhost/demo/iata',
{ "code": value },
function(data) {
var n = JSON.parse(data);
output(n.name);
});
SQL, memcached, ClusterJ, C++,
REST/JSON … which to choose?
• Answer: Use either or both, whatever best fits your application
– Different APIs into MySQL Cluster
– Alternative data stores
• Factors to consider:
– Performance & Scalability
– Developer skills & Familiarity with APIs
– Levels of support
– Access patterns (joins needed? Key/value sufficient?)
– Schema changes (online or schema-less)
• Mix & Match!
– MySQL Cluster allows the same data to be accessed simultaneously
through SQL & NoSQL interfaces
SQL
• Using a standard
• Joins & complex queries
• Relational model

mod_ndb
• REST/JSON
• HTML
• using apache

Memcached
• simple to use API
• key/value
• driver for many languages
• ideal as e.g. PHP proxy

ClusterJ
• simple to use Java API
• Web & telco
• Object Relational Mapping
• native & fast access to cluster

C++
• needs knowledgeable developers
• super low latency / real-time
Sometimes SQL!
Traditionally
• Very fast primary key operations
• Certain optimized clauses (e.g. IN(....))
• Distributed joins were hard to scale
New in 7.2
• Pushing join execution down into the storage layer greatly reduces network trips
• Makes joins scale and run up to 40x faster
• Expands the use of MySQL Cluster into a broader range of services and applications
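Why pushing a join down saves network trips can be sketched with a toy model. Everything here is an assumption made for illustration: the `DataNodes` class, the hypothetical books table, and the rule that each call into the storage layer costs exactly one round trip. It does not reproduce how MySQL Cluster batches or executes joins, and it says nothing about the 40x figure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not MySQL Cluster code: every call on DataNodes counts as one network round trip.
final class PushdownSketch {
    static final class DataNodes {
        final Map<Integer, String> countryByAuthId = new HashMap<>();
        final List<int[]> books = new ArrayList<>(); // hypothetical table: {bookId, authId}
        int roundTrips = 0;

        List<int[]> scanBooks() {
            roundTrips++;
            return books;
        }

        String lookupCountry(int authId) {
            roundTrips++;
            return countryByAuthId.get(authId);
        }

        // Pushed-down join: the per-row lookups run inside the storage layer.
        List<String> joinBooksWithCountry() {
            roundTrips++;
            List<String> rows = new ArrayList<>();
            for (int[] book : books) {
                rows.add(book[0] + ":" + countryByAuthId.get(book[1]));
            }
            return rows;
        }
    }

    // Join driven from the SQL layer: one scan, then one lookup per outer row.
    static List<String> joinInSqlLayer(DataNodes nodes) {
        List<String> rows = new ArrayList<>();
        for (int[] book : nodes.scanBooks()) {
            rows.add(book[0] + ":" + nodes.lookupCountry(book[1]));
        }
        return rows;
    }

    static DataNodes sampleData() {
        DataNodes nodes = new DataNodes();
        nodes.countryByAuthId.put(1, "France");
        nodes.countryByAuthId.put(2, "USA");
        nodes.books.add(new int[] {10, 1});
        nodes.books.add(new int[] {11, 2});
        nodes.books.add(new int[] {12, 1});
        return nodes;
    }

    public static void main(String[] args) {
        DataNodes a = sampleData();
        List<String> viaSqlLayer = joinInSqlLayer(a);
        DataNodes b = sampleData();
        List<String> pushedDown = b.joinBooksWithCountry();
        System.out.println(viaSqlLayer.equals(pushedDown));             // true: same rows either way
        System.out.println(a.roundTrips + " trips vs " + b.roundTrips); // 4 trips vs 1
    }
}
```

In this model the SQL-layer join costs one trip plus one per outer row, so the gap grows with the size of the outer table, while the pushed-down join stays at a single request.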
COMPANY OVERVIEW
• Division of Docudesk
• Deliver Document Management SaaS

CHALLENGES / OPPORTUNITIES
• Provide a single repository for customers to manage, archive, and distribute documents
• Implement scalable, fault tolerant, real time data management back-end
• PHP session state cached for in-service personalization
• Store document meta-data, text (as BLOBs), ACL, job queues and billing data
• Data volumes growing at 2% per day

SOLUTION
• MySQL Cluster deployed on EC2

USER PERSPECTIVE
"MySQL Cluster exceeds our requirements for low latency, high throughput performance with continuous availability, in a single solution that minimizes complexity and overall cost."
-- Casey Brown, Manager of Dev & DBA Services, Docudesk

RESULTS
• Successfully deployed document management solution, eliminating paper trails from legal processes
• Integrate caching and database into one layer, reducing complexity & cost
• Support workload with 50:50 read/write ratio
• Low latency for real-time user experience and document time-stamping
• Continuous database availability
Star Schema Q1.1 with distributed joins