Automated, Non-Stop MySQL Operations and Failover Presentation
Yoshinori Matsunobu
master
slave1-> New Master
Objective:
Automate master failover: pick an appropriate slave as the new master, make
applications send write traffic to the new master, then restart replication.
Failure Example (1)
All slaves have received all binlog events from the crashed master.
Any slave can be a new master, without recovering any data.
[Diagram: writer IP on master; master, slave1, slave2 and slave3 all hold id=99, id=100, id=101]
Example: picking slave 1 as a new master
On slave 1:
Get current binlog position (file1,pos1)
Grant write access
Activate writer IP address
Slave 2 and 3 should execute
CHANGE MASTER TO MASTER_HOST='slave1', MASTER_LOG_FILE='file1', MASTER_LOG_POS=pos1;
START SLAVE;
This is the easiest scenario, but we are not always this lucky.
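The statements that slave 2 and slave 3 run in this scenario can be generated from the new master's host name and binlog position. The following is a minimal sketch: the function name `promotion_statements` and the position value 107 are hypothetical, and a real procedure would likely also need replication credentials and a stopped slave before issuing CHANGE MASTER.

```python
def promotion_statements(new_master_host, log_file, log_pos):
    """Build the SQL each remaining slave runs to follow the new master.

    Hypothetical helper for illustration; credentials are deliberately omitted.
    """
    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{new_master_host}', "
        f"MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={int(log_pos)};"
    )
    return [change_master, "START SLAVE;"]


# (file1, 107) stands in for the (file1, pos1) pair read on slave 1.
statements = promotion_statements("slave1", "file1", 107)
for statement in statements:
    print(statement)
```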
Failure Example (2)
All slaves have received the same binlog events from the crashed master,
but the crashed master has some events that have not been sent to slaves yet.
id=102 will be lost if you promote one of the slaves to a new master.
If the crashed master is reachable (via SSH) and the binlog file is readable, you should
save the binlog (id=102) before promoting a slave to a new master.
Using Semi-Synchronous replication greatly reduces the risk of this scenario.
[Diagram: master holds id=99 to id=102; slave1, slave2 and slave3 hold id=99 to id=101 and receive id=102 during recovery]
Steps shown in the diagram:
Copy and apply events (id=102)
Start Master
CHANGE MASTER
Failure Example (3)
Some slaves have events which other slaves have not received yet.
You need to pick events from the latest slave (slave 2), and apply them to
other slaves so that all slaves will be consistent.
(Sending id=101 to slave 1, sending id=100 and 101 to slave 3)
[Diagram: master and slave2 hold id=99, id=100, id=101; slave1 holds id=99, id=100; slave3 holds id=99]
Steps shown in the diagram:
Identify which events are not sent
Apply lost events
Start Master
CHANGE MASTER
The issues are:
- How can we identify which binlog events are not sent?
- How can we make all slaves eventually consistent?
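The catch-up plan in this example can be worked out mechanically. The sketch below uses the illustrative event ids from the slide (real binlogs are addressed by file and position, not by row id), and the name `plan_catch_up` is hypothetical.

```python
def plan_catch_up(received):
    """Return the latest slave and, per slave, the events it still needs.

    `received` maps a slave name to the ordered list of event ids it holds.
    Assumes every slave holds at least one event.
    """
    latest = max(received, key=lambda name: max(received[name]))
    latest_events = received[latest]
    plan = {
        name: [event for event in latest_events if event not in events]
        for name, events in received.items()
    }
    return latest, plan


received = {
    "slave1": [99, 100],
    "slave2": [99, 100, 101],
    "slave3": [99],
}
latest, plan = plan_catch_up(received)
print(latest, plan)
```

With these inputs the plan sends id=101 to slave 1 and id=100 and id=101 to slave 3, matching the example.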
Master Failover: What makes it difficult?
MySQL replication is asynchronous.
MySQL Cluster
MySQL Cluster is truly highly available, but unfortunately we use InnoDB.
Global Transaction ID
On the MySQL side, it is not supported yet. Adding a global transaction ID to binary logs
requires a binlog format change, which can’t be done in 5.1/5.5.
– Check Google’s Global Transaction ID patch if you’re interested
There are ways to implement global transaction IDs on the application side, but not
without accepting complexity, performance, data loss, and/or consistency problems.
More concrete objective
Make master failover and slave promotion work
Saving binary log events from the crashed master (if possible)
– Semi-synchronous replication helps too
Identifying the latest slave
Applying differential relay log events to other slaves
Applying saved binary log events from master
Promoting one of the slaves to a new master
Making other slaves replicate from the new master
Do the above
Without introducing too much complexity on application side
With 5.0/5.1 InnoDB
Without losing performance significantly
Without spending too much money
Saving binlog events from (crashed) master
Dead Master Latest Slave Other Slaves
If the dead master is reachable via SSH and its binary logs are
accessible (the failure is not a H/W failure but, for example, InnoDB data file
corruption on the master), binlog events can be saved.
Lost events can be identified by checking {Master_Log_File,
Read_Master_Log_Pos} on the latest slave and using mysqlbinlog.
Using Semi-Synchronous replication greatly reduces the risk of
event loss.
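One way to picture this step: take the latest slave's {Master_Log_File, Read_Master_Log_Pos} and ask mysqlbinlog on the dead master for everything after that position. The sketch below only builds the command line. The helper name and the binlog directory `/var/lib/mysql` are assumptions, and it ignores the case where the master had already rotated to a newer binlog file.

```python
import posixpath


def save_binlog_command(latest_slave_status, binlog_dir="/var/lib/mysql"):
    """Build a mysqlbinlog command that dumps events the slaves never received.

    `latest_slave_status` is a dict of SHOW SLAVE STATUS fields from the
    latest slave. `binlog_dir` is a hypothetical location on the dead master.
    """
    log_file = latest_slave_status["Master_Log_File"]
    start = int(latest_slave_status["Read_Master_Log_Pos"])
    return [
        "mysqlbinlog",
        f"--start-position={start}",
        posixpath.join(binlog_dir, log_file),
    ]


command = save_binlog_command(
    {"Master_Log_File": "mysqld-bin.000980", "Read_Master_Log_Pos": "629290122"}
)
print(" ".join(command))
```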
Understanding SHOW SLAVE STATUS
mysql> show slave status\G
Slave_IO_State: Waiting for master to send event
Master_Host: master_host
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysqld-bin.000980
Read_Master_Log_Pos: 629290122
Relay_Log_File: mysqld-relay-bin.000005
Relay_Log_Pos: 26087338
Relay_Master_Log_File: mysqld-bin.000980
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB: db1
…
Last_Errno: 0
Last_Error:
Exec_Master_Log_Pos: 629290122
Seconds_Behind_Master: 0
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0

{Master_Log_File, Read_Master_Log_Pos} :
The position in the current master binary log file up to which the I/O thread has read.
{Relay_Master_Log_File, Exec_Master_Log_Pos} :
The position in the current master binary log file up to which the SQL thread has read and executed.
{Relay_Log_File, Relay_Log_Pos} :
The position in the current relay log file up to which the SQL thread has read and executed.
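The three position pairs are easier to compare once the vertical output is parsed into a dictionary. This is a minimal sketch, assuming the output is available as text; the function names are hypothetical and the sample contains only a few of the fields.

```python
def parse_slave_status(text):
    """Parse vertical SHOW SLAVE STATUS output into a dict of strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            status[key.strip()] = value.strip()
    return status


def io_position(status):
    """Master binlog position up to which the I/O thread has read."""
    return status["Master_Log_File"], int(status["Read_Master_Log_Pos"])


def sql_position(status):
    """Master binlog position up to which the SQL thread has executed."""
    return status["Relay_Master_Log_File"], int(status["Exec_Master_Log_Pos"])


SAMPLE = """\
Slave_IO_State: Waiting for master to send event
Master_Log_File: mysqld-bin.000980
Read_Master_Log_Pos: 629290122
Relay_Master_Log_File: mysqld-bin.000980
Exec_Master_Log_Pos: 629290122
Last_Error:
"""

status = parse_slave_status(SAMPLE)
caught_up = io_position(status) == sql_position(status)
print(caught_up)
```

When the two pairs are equal, the SQL thread has executed everything the I/O thread has received.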
Identifying the latest slave
[Diagram: relay log file on each slave. Slave 1: slave1-relay.003300; Slave 2: slave2-relay.003123; Slave 3: slave3-relay.001234]
[Diagram: the master's binary log holds BEGIN; UPDATE…; INSERT…; UPDATE…; COMMIT; (EOF). The slave's relay log holds only BEGIN; UPDATE…; INSERT…; (EOF), which is where {Master_Log_File, Read_Master_Log_Pos} points. These events are never executed.]
A live slave I/O thread writes only valid relay log events, so invalid (unreadable)
events should not be written to the relay log.
But if the master crashes while sending binary logs, it is likely that only part
of the events are sent and written on slaves.
In this case, the slave does not execute the last (incomplete) transaction.
{Master_Log_File, Read_Master_Log_Pos} points to the end of the relay log,
but {Relay_Master_Log_File, Exec_Master_Log_Pos} points to the last
transaction commit.
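Because relay log names and positions differ per slave, the latest slave has to be chosen by comparing {Master_Log_File, Read_Master_Log_Pos}, which refer to the master's binary log. The sketch below assumes binlog file names end in a numeric suffix; the function names are hypothetical and the slave3 values are invented for illustration.

```python
def read_position(status):
    """Return a sortable (binlog sequence number, offset) pair."""
    # "mysqld-bin.001221" -> 1221; assumes a numeric suffix after the last dot.
    sequence = int(status["Master_Log_File"].rsplit(".", 1)[1])
    return sequence, int(status["Read_Master_Log_Pos"])


def latest_slave(statuses):
    """Pick the slave whose I/O thread has read furthest into the master binlog."""
    return max(statuses, key=lambda name: read_position(statuses[name]))


statuses = {
    "slave1": {"Master_Log_File": "mysqld-bin.001221", "Read_Master_Log_Pos": "102067"},
    "slave2": {"Master_Log_File": "mysqld-bin.001221", "Read_Master_Log_Pos": "102238"},
    # Hypothetical values for a slave that is further behind.
    "slave3": {"Master_Log_File": "mysqld-bin.001220", "Read_Master_Log_Pos": "999999"},
}
print(latest_slave(statuses))
```

Comparing the numeric suffix before the offset matters: slave3 has the largest offset here but is reading an older binlog file.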
Lost transactions
[user@slave1] mysqlbinlog mysqld-relay-bin.003300
# at 91807
#110207 15:43:42 server id 1384 end_log_pos 101719
Xid = 12951490655
COMMIT/*!*/;
# at 91835    <- Relay_Log_Pos, Exec_Master_Log_Pos (current slave1's data)
#110207 15:43:42 server id 1384 end_log_pos 101764
Query thread_id=1784 exec_time=0 error_code=0
SET TIMESTAMP=1297061022/*!*/;
BEGIN
/*!*/;
# at 91910
#110207 15:43:42 server id 1384 end_log_pos 102067
Query thread_id=1784 exec_time=0 error_code=0
SET TIMESTAMP=1297061022/*!*/;
update …………………..
/*!*/;
(EOF)    <- Read_Master_Log_Pos

In some unusual cases, relay logs do not end with transaction commits,
e.g. when running very long transactions.
Read_Master_Log_Pos always points to the end of the relay log's end_log_pos.
Exec_Master_Log_Pos points to the end of the transaction's end_log_pos (COMMIT).
In the case above, Exec_Master_Log_Pos == Read_Master_Log_Pos is never true.
Slave 1's SQL thread will never execute the BEGIN and UPDATE statements.
Unapplied events can be generated by mysqlbinlog --start-position=91835
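As a cross-check on Relay_Log_Pos, the start of the unapplied events can also be read off the mysqlbinlog text output: it is the first "# at" offset after the last COMMIT. The sketch below is an illustration under assumptions, not how any particular tool does it. The function name is hypothetical, and it assumes transactional events that end in a COMMIT line and that no statement text starts a line with "COMMIT".

```python
import re

AT_LINE = re.compile(r"^# at (\d+)$")


def first_offset_after_last_commit(dump):
    """Return the '# at' offset following the last COMMIT, or None.

    None means the dump ends with a COMMIT (nothing left unapplied) or
    contains no COMMIT at all.
    """
    result = None
    pending = False
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("COMMIT"):
            # A later COMMIT supersedes any offset recorded so far.
            result = None
            pending = True
            continue
        match = AT_LINE.match(line)
        if match and pending:
            result = int(match.group(1))
            pending = False
    return result


DUMP = """\
# at 91807
#110207 15:43:42 server id 1384 end_log_pos 101719
Xid = 12951490655
COMMIT/*!*/;
# at 91835
#110207 15:43:42 server id 1384 end_log_pos 101764
BEGIN
/*!*/;
# at 91910
#110207 15:43:42 server id 1384 end_log_pos 102067
update ...
/*!*/;
"""

offset = first_offset_after_last_commit(DUMP)
print(offset)
```

For the relay log excerpt in this example the result agrees with the --start-position value of 91835.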
Recovering lost transactions
[user@slave2] mysqlbinlog mysqld-relay-bin.003123
# at 106
#101210 4:19:03 server id 1384 end_log_pos 0
Rotate to mysqld-bin.001221 pos: 4
…
# at 101807
#110207 15:43:42 server id 1384 end_log_pos 101719
Xid = 12951490655
COMMIT/*!*/;
# at 101835
#110207 15:43:42 server id 1384 end_log_pos 101764
Query thread_id=1784 exec_time=0 error_code=0
SET TIMESTAMP=1297061022/*!*/;
BEGIN
/*!*/;
# at 101910
#110207 15:43:42 server id 1384 end_log_pos 102067
Query thread_id=1784 exec_time=0 error_code=0
SET TIMESTAMP=1297061022/*!*/;
update 1…………………..
/*!*/;
# at 102213
#110207 15:43:42 server id 1384 end_log_pos 102211
Query thread_id=1784 exec_time=0 error_code=0
SET TIMESTAMP=1297061022/*!*/;
update 2…………………..
/*!*/;    (B)
# at 102357
#110207 15:43:42 server id 1384 end_log_pos 102238
Xid = 12951490691
COMMIT/*!*/;
(EOF)

[user@slave1] mysqlbinlog mysqld-relay-bin.003300
# at 106
#101210 4:19:03 server id 1384 end_log_pos 0
Rotate to mysqld-bin.001221 pos: 4
…
# at 91807
#110207 15:43:42 server id 1384 end_log_pos 101719
Xid = 12951490655
COMMIT/*!*/;
# at 91835    <- Relay_Log_Pos (current slave1's pos)
#110207 15:43:42 server id 1384 end_log_pos 101764
Query thread_id=1784 exec_time=0 error_code=0
SET TIMESTAMP=1297061022/*!*/;
BEGIN
/*!*/;    (A)
# at 91910
#110207 15:43:42 server id 1384 end_log_pos 102067
Query thread_id=1784 exec_time=0 error_code=0
SET TIMESTAMP=1297061022/*!*/;
update 1…………………..
/*!*/;
(EOF)

The second update event is lost on slave 1, which can be sent from slave 2.
The first update event is not executed on slave 1's SQL thread.
(A) + (B) should be applied on slave 1, within the same transaction.
Steps for recovery
Dead Master Latest Slave Slave(i)
Final Relay_Log_File,
Relay_Log_Pos
On slave(i),
Wait until the SQL thread executes events
Apply i1 -> i2 -> X
– On the latest slave, i2 is empty
Design notes
BINLOG '
OKqiTRMaBQAALQAAABlTKAAAABAAAAAAAAEABWdhbWVfAAJ0MQADAwP8
AQIG
OKqiTRcaBQAAMwAAAExTKAAAABAAAAAAAAEAA//4CmgAAApoAAALAGFhY
WFhYTI2NjM0
'/*!*/;
Multiple “#at” entries + same number of “end_log_pos” entries (when parsed by
mysqlbinlog)
“Table_map” event + “Write_rows (or others)” event + STMT_END
There can be many Write_rows events when using LOAD DATA, bulk INSERT, etc.
mysqlbinlog prints them out when valid “Table Map .. STMT End” events are written.
If slave A has only partial events, complete “Table Map ..
STMT End” events need to be sent from the latest slave.
Automating failover
Common HA tasks
Detecting master failure
Node Fencing (Power off the dead master, to avoid split brain)
Updating writer IP address
Notification/Operation
Sending mails
Disabling scheduled backup jobs on the new master
Updating internal administration tool status, master/slave ip address mappings, etc
Tool: Master High Availability Toolkit
[Diagram: a Manager host monitoring master, slave1, slave2 and slave3]
MySQL-MasterHA-Manager
- master_monitor
- master_switch
- masterha_manager
MySQL-MasterHA-Node
- save_binary_logs
- apply_diff_relay_logs
- filter_mysqlbinlog
- purge_relay_logs
Manager
master_monitor: Detecting master failure
master_switch: Doing failover (manual, or automatic failover invoked by masterha_manager)
Node : Deploying on all MySQL servers
save_binary_logs: Copying master’s binary logs if accessible
apply_diff_relay_logs: Generating differential relay logs from the latest slave, and applying all
differential binlog events
filter_mysqlbinlog: Trimming unnecessary ROLLBACK events
purge_relay_logs: Deleting relay logs without stopping SQL thread
We have started using this tool internally. Will publish as OSS soon
One Manager per Datacenter
Each Manager monitors multiple
MySQL masters within the same
datacenter
Master recovery
Finished in less than 1 second
Adding/Changing shards
Can be done without stopping service, if designed well
Hash based sharding makes it difficult to re-shard without stopping
services
Mapping table based sharding makes it much easier
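A small worked example shows why. With modulo hashing, changing the shard count reassigns most keys at once, while a mapping table lets you move only the keys you choose. This is a toy sketch: the function names, the 12 user ids and the shard counts are all hypothetical.

```python
def hash_shard(user_id, shard_count):
    """Hash-based placement: the shard is a pure function of the key."""
    return user_id % shard_count


def moved_by_rehash(user_ids, old_count, new_count):
    """Keys whose shard changes when the shard count changes."""
    return [
        user_id
        for user_id in user_ids
        if hash_shard(user_id, old_count) != hash_shard(user_id, new_count)
    ]


user_ids = list(range(1, 13))

# Hash-based: going from 3 to 4 shards relocates most keys at once.
moved_hash = moved_by_rehash(user_ids, 3, 4)

# Mapping-table based: placement is stored data, so adding shard 3
# only moves the keys that are explicitly reassigned.
mapping = {user_id: hash_shard(user_id, 3) for user_id in user_ids}
new_mapping = dict(mapping)
for user_id in (4, 8):
    new_mapping[user_id] = 3
moved_mapping = [u for u in user_ids if mapping[u] != new_mapping[u]]

print(len(moved_hash), moved_mapping)
```

In this toy case the hash approach relocates 9 of 12 keys, while the mapping table relocates only the 2 that were chosen, which can be migrated one at a time without stopping service.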
Tentative three-tier replication
Writer App Writer App