Mashing the Data
Real-Time replication from
MySQL to Google Cloud Datastore
Ingredients
● MySQL
● NodeJS
● ZongJi
● Google Cloud Datastore
There are two types of DBAs:
1) DBAs that do backups
2) DBAs that will do backups
MySQL
● Most used open source DB - second place overall after Oracle (but almost equal)*
● Since 1995
● Currently at version 5.7 (5.7.16 in Oct’16)
● Several forks - MariaDB, Percona
● Several storage engines, most used is InnoDB
● NDB Cluster and Master-Master Replication for HA
* According to http://db-engines.com/en/ranking
A SQL query walks into a bar and sees two tables.
He walks up to them and asks, "Can I join you?"
MySQL replication
● Master - Slave(s)
● Slaves can be Masters in their turn (Master->Slave->Slave->...->Slave)
○ log_slave_updates
● Only data modifying queries are logged (Create, Update, Delete; not Reads)
● 2 ½ types of replication
○ Statement Based (SBR) -> binary log records queries (UPDATE … SET ..) which are then
replayed on slave
○ Row Based (RBR) -> binary log records directly the values of the affected row before and
after the change is applied
○ Mixed -> binary log records a mix of SBR and RBR (default is SBR, but for certain
statements + storage engine used, the log is automatically switched to row-based)
Q: Why do you never ask SQL people to help you
move your furniture?
A: They sometimes drop the table
MySQL replication (cont’d)
● SBR is good when changes affect lots of rows (e.g. for 1k modified rows
we only send a few bytes across the wire)
● SBR has problems when there are inconsistencies between master and
slave or when queries are not deterministic (e.g. UPDATE … SET … LIMIT 100)
● RBR is good at maintaining better consistency (as every changed row is replicated)
● RBR can be problematic when many rows are changed with a single
statement (lots of traffic over the network)
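The size trade-off between the two formats can be sketched with a toy model. This is not MySQL's binary log format: the table, the log entries and the function names below are all made up for illustration.

```javascript
// Toy model: a "table" is an array of row objects.
function makeTable(n) {
  return Array.from({ length: n }, (_, i) => ({ id: i + 1, status: 'new' }));
}

// Statement-based: the log holds one entry describing the change itself.
function logStatementBased(table, predicate, patch, description) {
  table.filter(predicate).forEach((row) => Object.assign(row, patch));
  return [{ type: 'statement', sql: description }];
}

// Row-based: the log holds a before/after image for every changed row.
function logRowBased(table, predicate, patch) {
  const log = [];
  for (const row of table) {
    if (!predicate(row)) continue;
    const before = { ...row };
    Object.assign(row, patch);
    log.push({ type: 'row', before, after: { ...row } });
  }
  return log;
}

const sbrLog = logStatementBased(
  makeTable(1000), () => true, { status: 'done' }, "UPDATE t SET status = 'done'");
const rbrLog = logRowBased(makeTable(1000), () => true, { status: 'done' });

// One short entry versus one entry per modified row.
console.log('SBR entries:', sbrLog.length, 'bytes:', JSON.stringify(sbrLog).length);
console.log('RBR entries:', rbrLog.length, 'bytes:', JSON.stringify(rbrLog).length);
```

The row-based log is larger, but each entry says exactly what the row looked like before and after, so a replica does not have to re-evaluate the statement.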
Mashing the data
Google Cloud
What is GCD
● NoSQL document database
● Automatic scaling
● High performance
● Flexible storage
GCD (cont’d)
● Balance of strong and eventual consistency
○ entity lookups by key and ancestor queries always receive strongly consistent data
○ Other queries are eventually consistent
● Encryption at rest
○ encrypts all data before it is written to disk
● Querying of data through GQL
○ Similar to “classic” SQL; e.g. SELECT * FROM myKind WHERE myProp >= 100 AND
myProp < 200 or SELECT * FROM myKind ORDER BY myProp DESC LIMIT 100
● By default all properties are indexed, supports composite indexes (a bit
more work to enable them though)
Our Setup
[Diagram: MySQL Master, MySQL Slave, NodeJS App, Google Cloud Datastore; connections labeled SBR, RBR and Google Cloud Node modules]
Details about NodeJS App
● Uses ZongJi (https://github.com/nevill/zongji - MySQL binlog listener)
var ZongJi = require('zongji');
var zongji = new ZongJi(config.database);
zongji.on('binlog',function (evt) {doSomething('binlog',evt)})
zongji.on('query', function(evt) {doSomething('query',evt)})
zongji.on('writerows',function(evt) {doSomething('insert',evt)})
zongji.on('updaterows', function(evt) {doSomething('update',evt)})
zongji.on('deleterows', function(evt) {doSomething('delete',evt)})
NodeJS (cont’d)
zongji.start({
  startAtEnd: true,
  includeSchema: {"yourDBhere": true, "yourOtherDBHere": true}, //config.monitor
  includeEvents: ['tablemap', 'writerows', 'updaterows', 'deleterows', 'query', 'rotate']
});
var doSomething = function(type, event) {
  //event has a rows attribute containing every modified row
  //it also has a tableMap containing table metadata (most important - table name)
};
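A minimal sketch of what the body of doSomething could look like, assuming the event shape described in the comments above. The helper name describeRowEvent, the tableId lookup and the before/after layout of update rows are assumptions for illustration; the sample event is a hand-built object, not real ZongJi output.

```javascript
// Assumed event shape (hypothetical, modeled on the comments in the slide):
//   event.tableId                 -> id of the affected table
//   event.tableMap[event.tableId] -> { parentSchema, tableName }
//   event.rows                    -> one entry per modified row
function describeRowEvent(type, event) {
  const table = event.tableMap[event.tableId];
  return event.rows.map((row) => ({
    op: type,
    schema: table.parentSchema,
    table: table.tableName,
    // Update events are assumed to carry { before, after } pairs;
    // inserts and deletes are assumed to carry the row itself.
    data: type === 'update' ? row.after : row,
  }));
}

// Hand-built sample event, for illustration only.
const sampleEvent = {
  tableId: 42,
  tableMap: { 42: { parentSchema: 'yourDBhere', tableName: 'users' } },
  rows: [{ before: { id: 7, name: 'Ann' }, after: { id: 7, name: 'Anna' } }],
};

const changes = describeRowEvent('update', sampleEvent);
console.log(changes);
```

Returning plain change objects keeps the binlog handling separate from whatever writes to the target store.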
NodeJS (last one, I promise)
var sendToDataStore = function(namespace, idfldname, row) {
  //the first element of the key path is used by Datastore as the entity kind
  var k = datastore.key([namespace, row[idfldname]]);
  datastore.save({key: k, data: row}, function(err, res) {
    if (err) console.log("ERROR", err);
    else console.log("OK", JSON.stringify(res));
  });
};
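The function above only saves rows. A possible extension that also handles deletes and rows without an id is sketched below. The names applyChange and makeFakeStore are hypothetical, and the store is an in-memory stand-in so the example runs without credentials. It exposes only key, save and delete; using delete on the real client is an assumption here, since the slides show only key and save.

```javascript
// `store` is any object with key(path), save(entity, cb) and delete(key, cb).
function applyChange(store, kind, idField, op, row, done) {
  const id = row[idField];
  if (id === undefined || id === null) {
    return done(new Error('row has no value for ' + idField));
  }
  const key = store.key([kind, id]);
  if (op === 'delete') return store.delete(key, done);
  // Inserts and updates both become a save of the full row.
  return store.save({ key: key, data: row }, done);
}

// In-memory stand-in for the Datastore client (illustration only).
function makeFakeStore() {
  const entities = new Map();
  return {
    entities,
    key: (path) => path.join('/'),
    save: (entity, cb) => { entities.set(entity.key, entity.data); cb(null); },
    delete: (key, cb) => { entities.delete(key); cb(null); },
  };
}

const fake = makeFakeStore();
const errors = [];
const record = (err) => { if (err) errors.push(err.message); };

applyChange(fake, 'users', 'id', 'insert', { id: 7, name: 'Ann' }, record);
applyChange(fake, 'users', 'id', 'update', { id: 7, name: 'Anna' }, record);
applyChange(fake, 'users', 'id', 'insert', { name: 'no id' }, record);
const afterUpdate = fake.entities.get('users/7');
applyChange(fake, 'users', 'id', 'delete', { id: 7 }, record);

console.log(afterUpdate, errors, fake.entities.size);
```

Passing the store in as a parameter makes the replication logic testable without touching a real Datastore project.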
Demo Time
In case the demo does not work
Thank you!
