Here are the steps to deploy a local 3-node MySQL InnoDB Cluster sandbox:
1. Start 3 local MySQL instances on ports 3310, 3320, 3330
2. Connect to the first instance using MySQL Shell
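3. Run the following commands in MySQL Shell to create the cluster on the first node and join the other two:
```js
// Create the cluster on the first node (the Shell session is connected to port 3310).
// The cluster name 'sandboxCluster' and the 'root' user are assumptions for a local sandbox.
var cluster = dba.createCluster('sandboxCluster');
// Join second node
cluster.addInstance({
"host": "127.0.0.1",
"port": 3320,
"user": "root"
});
// Join third node
cluster.addInstance({
"host": "127.0.0.1",
"port": 3330,
"user": "root"
});
```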
MySQL Day Paris 2018 - MySQL InnoDB Cluster; A complete High Availability solution for MySQL
1. MySQL InnoDB Cluster
Copyright 2018, Oracle and/or its affiliates. All rights reserved
A complete High Availability solution for MySQL
Olivier Dasini
MySQL Principal Solutions Architect EMEA
olivier.dasini@oracle.com
Twitter : @freshdaz
Blog : http://dasini.net/blog
5. Virtually all organizations require their most critical systems to be highly available
100% High Availability
7. High Availability: Factors
• Environment
– Redundant servers in different datacenters and geographical areas will protect you
against regional issues—power grid failures, hurricanes, earthquakes, etc.
• Hardware
– Each part of your hardware stack—networking, storage, servers—should be
redundant
• Software
– Every layer of the software stack needs to be duplicated and distributed across
separate hardware and environments
• Data
– Data loss and inconsistency/corruption must be prevented by having multiple copies
of each piece of data, with consistency checks and guarantees for each change
8. High Availability: The Causes of Downtime
• Software/Application: 40%
• Human Error: 40%
• Hardware: 20%
* Source: Gartner Group 1998 survey
• A study by the Gartner Group projected that through 2015, 80% of downtime will be due to people and process issues
9. High Availability: The Business Cost of Downtime
• Calculate a cost per minute of downtime
– Average revenue generated per-minute over a year
– Cost of not meeting any customer SLAs
– Factor in costs that are harder to quantify
1. Revenue
2. Reputation
3. Customer sentiment
4. Stock price
5. Service’s success
6. Company’s very existence
THIS is why HA matters!
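The per-minute calculation above can be sketched in a few lines of Python. This is only an illustration: the function names, the `sla_penalty` parameter and the revenue figures are assumptions introduced here, and the harder-to-quantify factors (reputation, customer sentiment, stock price) are deliberately left out.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def revenue_per_minute(annual_revenue: float) -> float:
    """Average revenue generated per minute over a year."""
    return annual_revenue / MINUTES_PER_YEAR


def downtime_cost(annual_revenue: float, minutes_down: float,
                  sla_penalty: float = 0.0) -> float:
    """Lost revenue for an outage plus any SLA penalty (hypothetical inputs)."""
    return revenue_per_minute(annual_revenue) * minutes_down + sla_penalty


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    cost = downtime_cost(annual_revenue=52_560_000, minutes_down=30,
                         sla_penalty=5_000)
    print(f"Estimated cost of a 30-minute outage: {cost:,.2f}")
```

With these made-up inputs the average revenue is 100 per minute, so a 30-minute outage costs 3,000 in lost revenue plus the 5,000 penalty.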
17. Group Replication: What Is It?
• Group Replication library
– Implementation of Replicated Database State Machine theory
• MySQL GCS is based on Paxos (variant of Mencius)
– Provides virtually synchronous replication for MySQL 5.7+
– Supported on all MySQL platforms
• Linux, Windows, Solaris, OSX, FreeBSD
“Single/Multi-master update everywhere replication plugin for MySQL with built-in automatic distributed recovery, conflict detection and group membership.”
http://dasini.net/blog/2016/11/08/deployer-un-cluster-mysql-group-replication/
18. Group Replication: What Does It Provide?
• A Highly Available distributed MySQL database service
– Clustering eliminates single points of failure (No SPOF)
• Allows for online maintenance
– Removes the need for handling server fail-over
– Provides fault tolerance
– Enables update everywhere setups
– Automates group reconfiguration (handling of crashes, failures, re-connects)
– Provides a highly available replicated database
– Automatically ensures data consistency
• Detects and handles conflicts
• Prevents data loss
• Prevents data corruption
19. Group Replication: What Sets It Apart?
• Built by the MySQL Engineering Team
– Natively integrated into Server: InnoDB, Replication, GTIDs, Performance
Schema, SYS
– Built-in, no need for separate downloads
– Available on all platforms [Linux, Windows, Solaris, FreeBSD, etc]
• Better performance than similar offerings
– MySQL GCS has optimized network protocol that reduces the impact on latency
• Easier monitoring
– Simple Performance Schema tables for group and node status/stats
– Native support for Group Replication in MySQL Enterprise Monitor
• Modern full stack MySQL HA being built around it
20. Group Replication: Architecture
Node Types
R: Traffic routers/proxies: mysqlrouter, ProxySQL, HAProxy...
M: mysqld nodes participating in Group Replication
23. Single Primary Mode
• Configuration mode that makes a single member act as a writeable master (PRIMARY) and the rest of the members act as hot-standbys (SECONDARIES)
– The group itself coordinates automatically to figure out which member will act as the PRIMARY, through an automatic primary election mechanism
– Secondaries are automatically set to read-only
• Single-primary mode is the default mode
– Closer to classic asynchronous replication setups, simpler to reason about at the beginning
– Avoids some limitations of multi-primary mode by default
• The current PRIMARY member UUID can be found by executing the following SQL statement:
SQL> SELECT * FROM performance_schema.replication_group_members WHERE MEMBER_ROLE='PRIMARY'\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: da6e3c8e-a3cb-11e8-84dc-0242ac13000b
MEMBER_HOST: mysql_8.0_node1
MEMBER_PORT: 3306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.12
24. Multi-primary Mode
• Configuration mode that makes all members writeable
– Enabled by setting option --group_replication_single_primary_mode to OFF
• Any two transactions on different servers can write to the same tuple
• Conflicts will be detected and dealt with
– First committer wins rule
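The first committer wins rule can be illustrated with a toy model in Python. This is a simplified sketch, not the certification code used by Group Replication: the `Transaction` and `Certifier` classes, the per-row version counters and the row key format are all assumptions made for the example. The only idea taken from the slide is that when two transactions write the same tuple, the one ordered first commits and the other is rolled back.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    name: str
    # Maps each written row key to the row version seen when the
    # transaction executed on its originating server.
    write_set: dict


@dataclass
class Certifier:
    # Latest committed version of each row key (0 if never written).
    versions: dict = field(default_factory=dict)

    def certify(self, txn: Transaction) -> bool:
        """Return True to commit or False to roll back.

        Transactions must be passed in the order agreed by the group.
        """
        for key, seen_version in txn.write_set.items():
            if self.versions.get(key, 0) != seen_version:
                return False  # another transaction committed this row first
        for key in txn.write_set:
            self.versions[key] = self.versions.get(key, 0) + 1
        return True


if __name__ == "__main__":
    certifier = Certifier()
    # Both transactions updated the same row starting from version 0.
    t1 = Transaction("T1 on node1", {"db.t1:id=1": 0})
    t2 = Transaction("T2 on node2", {"db.t1:id=1": 0})
    for txn in (t1, t2):
        outcome = "commit" if certifier.certify(txn) else "rollback"
        print(f"{txn.name}: {outcome}")
```

In this model T1 is delivered first and commits; T2 then finds that the row has moved past the version it saw and is rolled back.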
25. Full stack secure connections
• Following the industry standards, Group Replication supports secure
connections along the complete stack
– Client connections
– Distributed recovery connections
– Connections between members
• IP Whitelisting
– Restrict which hosts are allowed to connect to the group
– By default it is set to the value AUTOMATIC, which allows connections from private subnetworks active on the host
http://mysqlhighavailability.com/mysql-group-replication-securing-the-perimeter/
26. Prioritize member for the Primary Member Election
• group_replication_member_weight
– allows users to influence primary member election
– takes integer value between 0 and 100
– default value = 50
• The first primary member is still the member which bootstrapped the group, irrespective of the group_replication_member_weight value.
http://mysqlhighavailability.com/group-replication-prioritise-member-for-the-primary-member-election/
node1> SET GLOBAL group_replication_member_weight= 90;
node2> SET GLOBAL group_replication_member_weight= 70;
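How the weight could influence an election after the primary fails can be sketched as follows. This is a simplified model under stated assumptions: the `Member` type and `elect_primary` function are hypothetical, server version checks are ignored, and breaking ties by the lowest server UUID is an assumption about the real election rather than something stated on the slide.

```python
from typing import Iterable, NamedTuple


class Member(NamedTuple):
    server_uuid: str
    weight: int = 50  # default value of group_replication_member_weight


def elect_primary(members: Iterable[Member]) -> Member:
    """Pick the candidate with the highest weight.

    Ties are broken by the lowest server_uuid (an assumption for this sketch).
    """
    candidates = list(members)
    if not candidates:
        raise ValueError("no members available for election")
    for member in candidates:
        if not 0 <= member.weight <= 100:
            raise ValueError(f"weight out of range 0..100: {member.weight}")
    return min(candidates, key=lambda m: (-m.weight, m.server_uuid))


if __name__ == "__main__":
    # Weights 90 and 70 mirror the SET GLOBAL statements above; node3 keeps
    # the default of 50. The UUIDs are made up for illustration.
    node1 = Member("00014001-1111-1111-1111-111111111111", 90)
    node2 = Member("00014002-2222-2222-2222-222222222222", 70)
    node3 = Member("00014003-3333-3333-3333-333333333333")
    print(elect_primary([node1, node2, node3]).server_uuid)
    # If node1 leaves the group, node2 has the highest remaining weight.
    print(elect_primary([node2, node3]).server_uuid)
```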
27. Parallel applier support
• Group Replication now also takes full advantage of parallel binary log applier infrastructure
– Reduces applier lag and improves replication performance considerably
– Configured in the same way as asynchronous replication
slave_parallel_workers=<NUMBER>
slave_parallel_type=logical_clock
slave_preserve_commit_order=ON
28. Flow Control
• Optimize the system as a whole
– Sometimes it is beneficial to delay some parts of a distributed system to improve the throughput of the system as a whole
– In MySQL Group Replication, it is used to
● keep writers operating below the sustained capacity of the system;
● reduce buffering stress on the replication pipeline;
● protect the correct execution of the system.
• Designed as a safety measure
– Throttling will never be active while the system is operating below its sustained capacity.
• To better support unbalanced systems and unfriendly workloads
– Keep members closer for faster failover
– Keep members closer for reduced replication lag
– Reduce the number of transaction aborts in multi-master
• Make sure new members can always join write-intensive groups
– Nodes entering the group need to catch up on previous work while also storing current work to apply later
– In case excess capacity is not available, the cluster will need to be put at lower throughput for new members to catch up
29. Flow Control options
• MySQL 5.7 - Basic configuration options
– group-replication-flow-control-mode = QUOTA | DISABLED
– group-replication-flow-control-certifier-threshold = 0..n
– group-replication-flow-control-applier-threshold = 0..n
• Certifier/applier thresholds
– The thresholds are the point at which the flow-control system will delay the writes at the master
– The default is set to 25000 and should be kept larger than one second of sustained commit rate
– But some members will be up to 25000 transactions delayed if, and only if, they are unable to keep up with the writer members
• MySQL 8.0.3 introduced additional options to fine-tune the heuristics
– group_replication_flow_control_min_quota = X commits/s
– group_replication_flow_control_min_recovery_quota = X commits/s
– group_replication_flow_control_max_commit_quota = X commits/s
– group_replication_flow_control_member_quota_percent = Y %
– group_replication_flow_control_period = Z seconds
– group_replication_flow_control_hold_percent = Y %
– group_replication_flow_control_release_percent = Y %
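The role of the two thresholds can be sketched as a simple check in Python. This is a minimal illustration, not the plugin's logic: the function name and parameters are hypothetical, and the real flow control also computes a per-period commit quota, which is not modeled here. It only captures the idea that writes are delayed once the certifier or applier queue of a member exceeds its threshold, and never while flow control is disabled.

```python
DEFAULT_THRESHOLD = 25000  # default for both thresholds, as stated above


def should_throttle(certifier_queue: int, applier_queue: int,
                    mode: str = "QUOTA",
                    certifier_threshold: int = DEFAULT_THRESHOLD,
                    applier_threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True when writers should be delayed (simplified model)."""
    if mode not in ("QUOTA", "DISABLED"):
        raise ValueError(f"unknown flow control mode: {mode}")
    if mode == "DISABLED":
        return False
    return (certifier_queue > certifier_threshold
            or applier_queue > applier_threshold)


if __name__ == "__main__":
    # A member that keeps up with the writers is never throttled.
    print(should_throttle(certifier_queue=120, applier_queue=300))
    # A member whose applier queue has fallen far behind triggers throttling.
    print(should_throttle(certifier_queue=120, applier_queue=40000))
```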
51. Enterprise Monitor
• Native holistic support for Group Replication clusters
– Intelligent monitoring and alerting
– Topology views
– Detailed metrics and graphs
– Best Practice advice
• Monitoring of MySQL Group Replication
52. Enterprise Monitor
• Group Replication with 3 online nodes
• Group Replication with 3 nodes :
– 2 online
– 1 unreachable
53. Enterprise Monitor
• Asynchronous replication between
– Group Replication cluster : master
– Standalone instances : slaves
54. Enterprise Monitor
• Asynchronous replication between
– Group Replication cluster 1 : master
– Group Replication cluster 2 : slave
55. Monitoring: replication_group_member_stats
• Useful to understand :
– how the applier queue is growing
– how many conflicts have been found
– how many transactions were checked
– which transactions are committed everywhere
● Important for monitoring the performance of the members connected in the group
node1> SELECT * FROM performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
VIEW_ID: 14845735801161197:3
MEMBER_ID: 00014001-1111-1111-1111-111111111111
COUNT_TRANSACTIONS_IN_QUEUE: 0
COUNT_TRANSACTIONS_CHECKED: 0
COUNT_CONFLICTS_DETECTED: 0
COUNT_TRANSACTIONS_ROWS_VALIDATING: 0
TRANSACTIONS_COMMITTED_ALL_MEMBERS: 4e0f05b7-d9d0-11e6-87cf-002710cccc64:1-2
LAST_CONFLICT_FREE_TRANSACTION:
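One way to turn these counters into the indicators listed above is to compare two successive samples of the table. The sketch below is hypothetical: the `summarize_stats` helper and the sample values are invented for illustration, and it assumes the rows have already been fetched into dictionaries keyed by column name.

```python
def summarize_stats(previous: dict, current: dict) -> dict:
    """Derive simple indicators from two samples of one member's stats row."""
    checked = current["COUNT_TRANSACTIONS_CHECKED"]
    conflicts = current["COUNT_CONFLICTS_DETECTED"]
    return {
        # Positive growth means the applier queue is filling up.
        "queue_growth": (current["COUNT_TRANSACTIONS_IN_QUEUE"]
                         - previous["COUNT_TRANSACTIONS_IN_QUEUE"]),
        # Share of certified transactions that hit a conflict.
        "conflict_ratio": conflicts / checked if checked else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample values; only the column names come from the table.
    earlier = {"COUNT_TRANSACTIONS_IN_QUEUE": 10,
               "COUNT_TRANSACTIONS_CHECKED": 100,
               "COUNT_CONFLICTS_DETECTED": 1}
    later = {"COUNT_TRANSACTIONS_IN_QUEUE": 250,
             "COUNT_TRANSACTIONS_CHECKED": 200,
             "COUNT_CONFLICTS_DETECTED": 5}
    print(summarize_stats(earlier, later))
```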
56. Monitoring: replication_group_members 1/2
• Used for monitoring the status of the different server instances that are tracked in the current view
node1> SELECT * FROM performance_schema.replication_group_members\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 00014001-1111-1111-1111-111111111111
MEMBER_HOST: localhost
MEMBER_PORT: 14001
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.12
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 00014002-2222-2222-2222-222222222222
MEMBER_HOST: localhost
MEMBER_PORT: 14002
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.12
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 00014003-3333-3333-3333-333333333333
MEMBER_HOST: localhost
MEMBER_PORT: 14003
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.12
57. Monitoring: replication_group_members 2/2
• replication_group_members table is updated whenever there is a view change
• There are various states that a server instance can be in
• If servers are communicating properly, all report the same states for all servers
• If there is a network partition, or a server leaves the group, then different information
may be reported, depending on which server is queried
Field | Description | Group Synchronized
ONLINE | The member is ready to serve as a fully functional group member, meaning that the client can connect and start executing transactions | Yes
RECOVERING | The member is in the process of becoming an active member of the group and is currently going through the recovery process, receiving state information from a donor | No
OFFLINE | The plugin is loaded but the member does not belong to any group | No
ERROR | The state of the local node. Whenever there is an error on the recovery phase or while applying changes, the server enters this state | No
UNREACHABLE | Whenever the local failure detector suspects that a given server is not reachable, because maybe it has crashed or was disconnected involuntarily, it shows that server's state as 'unreachable' | No
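The state table lends itself to a small lookup when checking a group from a script. The sketch below is an illustration under assumptions: `summarize_members` is a hypothetical helper, and the member list stands in for rows read from replication_group_members on one server, so it reflects only that server's view.

```python
from collections import Counter

# "Group Synchronized" column of the table above, keyed by MEMBER_STATE.
GROUP_SYNCHRONIZED = {
    "ONLINE": True,
    "RECOVERING": False,
    "OFFLINE": False,
    "ERROR": False,
    "UNREACHABLE": False,
}


def summarize_members(members):
    """Count members per state and list those not synchronized with the group.

    members: iterable of (host, port, state) tuples.
    """
    members = list(members)
    counts = Counter(state for _, _, state in members)
    not_synchronized = [(host, port) for host, port, state in members
                        if not GROUP_SYNCHRONIZED[state]]
    return counts, not_synchronized


if __name__ == "__main__":
    # Hypothetical view: 3 nodes, 2 online and 1 unreachable.
    view = [("localhost", 14001, "ONLINE"),
            ("localhost", 14002, "ONLINE"),
            ("localhost", 14003, "UNREACHABLE")]
    counts, lagging = summarize_members(view)
    print(dict(counts), lagging)
```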
58. Monitoring: replication_connection_status
• Show information regarding Group Replication :
– transactions that have been received from the group and queued in the applier queue (the relay log)
– Recovery
node1> SELECT * FROM performance_schema.replication_connection_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
GROUP_NAME: 4e0f05b7-d9d0-11e6-87cf-002710cccc64
SOURCE_UUID: 4e0f05b7-d9d0-11e6-87cf-002710cccc64
THREAD_ID: NULL
SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 0
LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00
RECEIVED_TRANSACTION_SET: 4e0f05b7-d9d0-11e6-87cf-002710cccc64:1-2
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
59. Monitoring: replication_applier_status
• The state of the Group Replication related channels and thread
• If there are many different worker threads applying transactions then the worker tables can also be used to monitor what each worker thread is doing
node1> SELECT * FROM performance_schema.replication_applier_status\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
SERVICE_STATE: ON
REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0