Getting started with MySQL Group Replication

The multi master plugin for MySQL is here. MySQL Group Replication ensures virtual synchronous updates on any member in a group of MySQL servers, with conflict handling and failure detection. Distributed recovery is also in the package to ease the process of adding new servers to your server group.

How do you start? Just sit back, download MySQL Group Replication from http://labs.mysql.com/ and then let us begin this journey into the world of multi master MySQL.

Pre requisites

Under its hood, the group replication plugin is powered by a group communication toolkit. This is what decides which servers belong to the server group, performs failure detection and orders server messages. This last being the magic thing that allows the data to be consistent across all servers.

In its latest version, the plugin relies by default on XCom, a MySQL implementation of a variation of the Paxos algorithm. Among other advantages, XCom is bundled into the plugin and it’s cross platform as described in here.

Other option that is still available is the Corosync Cluster Engine. Corosync comes however with the limitation of only working in Linux and it also requires previous installation and configuration on every machine of a server group. To get the basic instructions on how to install Corosync and the recommended configurations, please see my blog post on the subject.

Plugin’s required configurations

As any new feature, the group replication plugin includes some limitations and requirements that emerge from its underlying characteristics. When configuring a server and your data for a multi master scenario you will need:

1) Have the binlog active and its logging format set to row.

Like on standard replication, multi master replication is based on the transmission of log
events. Its inner mechanism are however based on the write sets generated during row based logging  so row based replication is a requirement.

2) Have GTIDs enabled.

MySQL Group Replication depends on GTIDs, used to identify what transactions were
executed in the group, and for that reason vital to the certification and distributed recovery processes.

3) Use the InnoDB engine.

Synchronous multi master is dependent on transactional tables so only InnoDB is supported. Since this is now the default engine, you only have to be careful when creating individual tables.

4) Every table must have a primary key.

Multi master concurrency control is based on primary keys. Every change made to a table line is indexed to its primary key so this is a fundamental requirement.

5) Table based repositories

The relay log info and master info repositories must have their type set to TABLE.
Since group replication relies on channels for its applier and recovery mechanism, table repositories are needed to isolate their execution info.

6) Set the transaction write set extraction algorithm

The process of extracting what writes were made during a transaction is crucial for conflict detection on all group servers. This extracted information is then hashed using a specified algorithm that must be chosen upfront. Currently the only available algorithm is MURMUR32

Other current limitations:

1) Binlog Event checksum use must be OFF. 

Due to needed changes in the checksum mechanism, group replication is incompatible with this feature for now.

2) No concurrent DDL

As it currently stands, DDL statements can’t be concurrently executed with other DDL queries or even DML.

Hands on with MySQL Group Replication

First of all set-up a group of servers of the required version and then grab the provided plugin binaries that come with the release or compile them yourself following the instruction in the multi platform blog post. If you are still using Corosync, you must have the its development libraries not only the daemon installed when compiling.

You can test this on your desktop, use a group of physical computers or even test it on several virtual machines. In this example, three machines are spawn from the same machine with different data folders.

  • Configure a server on a standalone folder and create a replication user

In case you are not used to MySQL and the creation of data directories, we show you here the basic commands to get you running. We also include basic instructions to create a replication user. This last is used to establish master slave connections between members for recovery purposes. Please follow the steps used for server1 and repeat them for the other test servers using different data folders.

On the base directory of your MySQL server execute:

Start the server

You maybe noticed that a password was generated in the initialize command that can be used for root server connections. You can for this simple example avoid the use of passwords using the –initialize-insecure option.
If you instead, want to keep the password, you can change it executing:

Create a replication user (should be done for all nodes)

  •  Note: we are creating the users in a server with its binary log disabled, otherwise that would probably lead to replication conflicts during recovery. To do this process in a replicated way, create a user in the first member of your group when already configured, and let it be replicated through recovery to the other members when they join.

Shut down the server

  • Start the servers with the plugin and all the necessary options

Note: The below instruction assume that you are using a Unix environment. For Windows, you will need to replace the uses of group_replication.so with group_replication.dll

Server 1

Server 2

Server 3

Alternatively, if you have a running server without the plugin loaded you can install it on run-time. This implies that you have GTID mode ON, row based logging and all the above requirements correctly configured.

  • Configure

The first step on configuring a MySQL server group is to define a unique name that identifies the group and allows its members to join.
This name must be defined on every member, and since it works also as the group UUID, it must be a valid UUID.

Besides this, you should also configure the access credentials for recovery.
These settings are used by joining servers to establish a slave connection to a donor when entering the group, allowing them to receive missing data. Ignored on the first member that forms the group, you should always configure it on every server anyway, as they may fail and be reinstated at any moment in time.

By default, recovery tries to connect to other servers using connections credentials set to “root” with no associated password. To change these values just set the following variables.

Associated to these fields there is the also recovery’s retry count and reconnect interval.
These field tells recovery how many times it should try to connect to the available donors and how much time to wait when every attempt to connect to all group donors fails. By default, retry count is equal to 86400 and the reconnect interval is set to 60 seconds.

If you want to modify them, just use a variation of the example commands:

XCom settings

If you are running Group replication with XCom, you need to set some options so it can find the other group members.  For each member you will then have to set:

  • group_replication_local_address: The member local address, i.e., host:port where that member will expose itself to be contacted by the group.
  • group_replication_peer_addresses: The list of peers that also belong to the group. This list is comma separated: host1:port1,host2:port2. If the server is not configured to bootstrap a group it will sequentially contact the peers in the group in order to be added or removed from it. If the list contains its member local address, it will be ignored.

Important: Xcom ports must be different from the configured MySQL ports. Xcom is an internal service that must have its own dedicated port.

If you want to change the group communication system and use Corosync, you can still do it with:

  • Start multi master replication

To bootstrap a new group, you must explicitly state that with:

 

This flag shall be set on the first member of each group and should be set again to 0 when the member is online. This reset is necessary in order to it to be able to leave and rejoin, if needed, the *same* group instead of starting a new one.
You should also configure the member contact info when using XCom.

The member can now be started.

  • Check the member status

Here you can see the information about group replication trough the status of
its main channel. Besides the name you know the group replication applier is
running and that no error was detected.

  • Check the group members

  • Test query execution

Start server 2:

Insert some data on server 1:

Alternate between servers and check the data flow:

  • See distributed recovery in action

When you start a new server, it will try to get all the data it is missing from the other group members. It will use the configured access credentials and connect to another member fetching the missing group transactions.
During this period its state will be shown as ‘RECOVERING’, and you should not preform any action on this server during this phase.

Wait for it to be online. Truth is that here, with such a small amount of data, this state is hard to spot. However, you should be aware of this when dealing with real data sets.

Check that the data is there:

If you want to read more on how this is achieved, please check my blog post about Distributed Recovery.

  • Be aware of failures on concurrency scenarios

Due to the distributed nature of MySQL groups, concurrent updates can result on query failure if the queries are found to be conflicting. Lets perform a concurrent update to the same line in the example table

On server 1

On server 2

Execute in parallel

Note that, the scenario where the second update succeeds and the first one fails is also equally possible and only depends on the order the queries were ordered and certified inside the plugin.

Let’s check the tables.

The failed query rollbacks and no server is affected by this.

  • Check the execution stats

Check your GTID stats on each group member

Note that in all servers the GTID executed set is the same and belongs to the group.
You are maybe asking yourself why the set contains 8 transactions when we only executed 5 successful queries  in this tutorial. The reason is that whenever a member joins or leaves the group a transaction is logged to mark in every member this moment for recovery reasons.

The member execution stats are also available on the performance schema tables.

Where it can be seen that, from the 6 executed queries on this tutorial, 1 was found to be conflicting. We can also see that the transaction in the queue are 0, so it means no transaction are waiting validation.

On the last fields, the transaction validating field is 0 because all transaction that were executed are already considered to be stable on all members, as seen in the second last field. In other words, every member knows all the data, so the number of possible conflicting transactions is 0.

However if you execute this query on server 3, the result will be different.

Here we can see only two certified transactions as the remaining were all transmitted during recovery, but you can still see that the certification data is there as in the other members.

For more information on performance  tables check our blog post on the subject.

  • Stop group replication

To stop the plugin, you just need to execute:

On the server shutdown, the plugin stops automatically.

  • Reset group replication channels

If after using group replication you want to remove the associated channel and files you can execute.

Note that the “group_replication_applier” channel is not a normal slave channel and will not respond to generic commands like “RESET SLAVE ALL”.

  • How to start multi master replication at server boot

To enable the automatic start of multi master replication at server start two options are always needed.

The group name

The start on boot flag

Besides these two, and when not using the default parameters, you will also need to configure the access options that allow the members to connect to one another during recovery.

If using XCom, you need also the contact information for other members.

Try it now and send us your feedback

On his first steps, MySQL Group Replication is still in development.
Feel free to try it and get back at us so we can make it even better for the community!

About Pedro Gomes

Who am I? I'm a replication developer @ MySQL since 2013, and a fan of all things distributed so it's hard not to love my job. Raised on the distributed lab of Minho's University, home of great academic research on the field, I joined Oracle following this same passion and here I am!

11 thoughts on “Getting started with MySQL Group Replication

  1. I have the exception when starting group replication – START GROUP_REPLICATION. It tries to read from the global_variables table. By grant SELECT on performance_schema.global_variables, I am able to start the replication. Just want to get your confirmation on this

    1. Hi Ivan,

      Thanks for the feedback!
      About your issue, it sounds strange, as you have permissions to start group replication but not to access the performance schema table.
      What permissions does your user have?

  2. It will be great to have ‘group_replication.so’ for Ubuntu 14.04, especially for testing with MySQL Sandbox. It is much easier to install 3 MySQL servers and then just move plugin to related folder and start , rather than, compiling MySQL with plugin then installing MySQLs from source using MySQL Sandbox.

  3. I am getting grant error for replication user.

    2015-10-08T07:21:03.353445Z 8 [ERROR] Slave I/O for channel ‘group_replication_recovery’: The slave I/O thread stops because a fatal error is encountered when it try to get the value of SERVER_ID variable from master. Error: SELECT command denied to user ‘rpl_user’@’localhost’ for table ‘global_variables’, Error_code: 1142

  4. Hi Ivan, Shahriyar

    Your complains about lack of permissions should be related to
    https://bugs.mysql.com/bug.php?id=77732

    This is solved in 5.7.9 so it should not affect future releases.
    Sorry for the inconvenience, for now please grant SELECT permissions for the replication user on performance_schema.global_variables.

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter. *