Automatic provisioning in Group Replication

With the release of MySQL 8.0.17, we bring you the brand new Clone plugin. By letting you take physical snapshots of the data stored in InnoDB, this new plugin is an ideal addition to your Group Replication setup.

Until now, joining a new server to a running replication group required, in most cases, a manual offline provisioning procedure, as distributed recovery lacked a tool to execute this step. With this new plugin, Group Replication distributed recovery can handle that step online and automatically for the user.
Provisioning the new member with a physical snapshot of the data taken directly from a running donor, and seeing that member come ONLINE with no further external operations, is a major step forward in setup automation.

In this post we delve into how we went the extra mile to combine these two solutions into a faster and easier provisioning mechanism for your highly available group of MySQL servers.

Group Replication + Clone

One of the key features of Group Replication is that a new member can automatically catch up to the group by transferring data from a donor member. Until now, this process was based solely on standard asynchronous replication, i.e., a slave channel.
With the Clone plugin, we add to the distributed recovery process the capability to execute a full state transfer by taking a physical InnoDB snapshot from a remote server. This leaves only a much shorter incremental state transfer to be done through the existing binary log replication.

Following our tradition of ease of use and automation, very few changes to your Group Replication setup are needed to use this new feature.

First of all, just remember the current limitations of Clone: your group servers can't diverge in terms of the operating system they run on, the MySQL version, and similar restrictions described here.

So, once you install the Clone plugin (or make it available at server start), your main point of focus becomes the new option:
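As a reminder, a minimal sketch of making the plugin available, assuming a Linux build where the shared library is named mysql_clone.so:

```sql
-- Load the Clone plugin at runtime (must be done on every group member):
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- Alternatively, make it available at server start via my.cnf:
-- [mysqld]
-- plugin-load-add=mysql_clone.so
```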

  • group_replication_clone_threshold

This new variable establishes the number of transactions missing on the joining member that Group Replication uses as a threshold to decide whether a first provisioning step using Clone will be executed.
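As a sketch, setting and inspecting the threshold could look like this (the value of one million is purely illustrative; pick one suited to your workload):

```sql
-- Trigger a remote clone whenever the joiner is missing more than
-- one million transactions relative to the group:
SET GLOBAL group_replication_clone_threshold = 1000000;

-- Inspect the current value:
SELECT @@GLOBAL.group_replication_clone_threshold;
```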

When a member joins, a calculation is made based on the joiner's and the group members' GTID sets to discover how many transactions it is missing in relation to the group.
If this value is above your defined threshold the joining member will then execute a remote clone request to one of the existing online members.
When this process ends, the MySQL server restarts and, if configured to do so, Group Replication starts and joins the group.
When Group Replication starts after a clone operation ends, distributed recovery is executed again, evaluating whether another clone request is needed or whether the member should be brought ONLINE by recovering through binary logs.
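While the remote clone runs, you can observe its progress on the joiner; a sketch using the Performance Schema clone instrumentation tables:

```sql
-- Overall state of the current (or last) clone operation on the joiner:
SELECT state, source, error_no
  FROM performance_schema.clone_status;

-- Per-stage progress of the ongoing clone operation:
SELECT stage, state, estimate, data
  FROM performance_schema.clone_progress;
```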

By default this value is set to the maximum number of GTIDs that can be associated with a server UUID; in other words, cloning is disabled.
The reason is that, besides cloning being an intrusive operation that makes the joiner discard all its local data, the clone threshold is a setting you should adjust to your workload. Small values can be dangerous: for example, the group might execute that number of transactions while the server is cloning, meaning the server will find itself above the threshold again after restart, repeating clone operations until this no longer holds true.

Purged Data

One more scenario that Clone covers is when members no longer hold, in their binary logs, all the data needed for recovery. This is common on systems where, to save disk space, you do not maintain a long binary log history on all members, and hence periodic purges are configured.

When Group Replication starts, besides the threshold calculation, it also evaluates whether group members can donate the transactions missing for recovery.
Even if the number of missing transactions is below the threshold, cloning is always attempted in these cases where no member can donate the missing data.
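You can run a version of this check by hand; a sketch comparing GTID sets:

```sql
-- On a prospective donor: transactions already purged from its binary logs.
SELECT @@GLOBAL.gtid_purged;

-- On the joiner: transactions it already has applied.
SELECT @@GLOBAL.gtid_executed;

-- If any transaction the joiner is missing falls inside the donor's
-- purged set, binary log recovery from that donor is impossible and
-- a clone operation is attempted instead.
```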

Credentials and security

The other main point of configuration when using Clone with Group Replication is how authentication and connection encryption are handled.

To contact donors, the Group Replication plugin uses the user credentials set up for distributed recovery, as clone and binary log replication use similar authentication mechanisms.
To be able to clone, the user that connects to the donor server requires the BACKUP_ADMIN privilege in addition to the REPLICATION SLAVE privilege.

As for configuration, you set up the recovery channel as usual.
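A minimal sketch of that setup, assuming a hypothetical recovery user named rpl_user (substitute your own account name and a strong password):

```sql
-- On every member: a recovery user usable by both clone and
-- binary log replication.
CREATE USER 'rpl_user'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'rpl_user'@'%';
GRANT BACKUP_ADMIN ON *.* TO 'rpl_user'@'%';

-- On the joiner: point the recovery channel at those credentials.
CHANGE MASTER TO
  MASTER_USER = 'rpl_user',
  MASTER_PASSWORD = 'password'
  FOR CHANNEL 'group_replication_recovery';
```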
One detail is that Group Replication automatically configures the settings for the clone SSL options (clone_ssl_ca, clone_ssl_cert, and clone_ssl_key) to match your settings for the corresponding Group Replication distributed recovery options (group_replication_recovery_ssl_ca, group_replication_recovery_ssl_cert, and group_replication_recovery_ssl_key).

As for the clone_valid_donor_list, Group Replication configures this setting automatically for you after it selects a donor from the existing group members.

A last, but critical, point pertains to the fact that all data on the joiner is overwritten. One implication is that, while settings stored in the configuration file or persisted are preserved, your local recovery user settings are lost (they live in the master info table) and are replaced with the credentials from the donor. Remember that binary log recovery is still needed on joining members when they come back from cloning.
Hence, if you want Group Replication to start at boot when the server restarts after cloning, make sure the credentials copied from the donor are also valid on the joiner.

Conclusion

The new Clone plugin brings an interesting addition to Group Replication operations.
You should, however, read about all its repercussions and limitations, as well as its advantages, in order to make an informed decision about its usage.

For ease of use and a more automated approach to this process, we recommend using MySQL InnoDB Cluster. You can read all about how we brought these tools together, and how they help you make the best decision when adding a member, here!

Please try it out and give us your feedback!


About Pedro Gomes

Who am I? I'm a replication developer @ MySQL since 2013, and a fan of all things distributed so it's hard not to love my job. Raised on the distributed lab of Minho's University, home of great academic research on the field, I joined Oracle following this same passion and here I am!
