On-demand primary election

Having a server acting as a primary with multiple secondaries is the most common setup when using Group Replication. Until now though, there was no way to change the current primary member while the group was running without causing some sort of disruption. Now, those times are over.

On this new release of the MySQL 8.0.13 server and Group Replication we now give you, the user, new tools to choose which server should be the primary member but also in what mode the group is operating without any server downtime.

Online primary election

Starting with MySQL 8.0.13 we introduced some new user functions that are automatically installed alongside Group Replication and among them you can now find:

group_replication_set_as_primary(UUID)

At any given point in time, you as a DBA can now use this function to change the current primary. See the example below:

In terms of database writes, as the new primary is declared ready for writes, the old primary will migrate to a secondary role eventually entering a read only state.
As a user this might mean you will see a small window where a member A is marked as primary but the member is still in read only mode, and member B (the old primary) is marked as secondary but still accepting writes.

Note: As only a  primary member can do writes in the group, if a asynchronous channel is running on that member, no switch will be allowed until the channel is stopped.

Primary Mode migrations

Alongside this new utility we also added new similar functions to change the mode currently configured in the group. You might be running a group in single primary mode, but during low times you want to execute some data processing tasks that could benefit from the resources of all the machines in the group. For these cases, being the group in single or multi primary the DBA can now switch the mode without any kind of restart.

To migrate a group into single primary mode we added the function

group_replication_switch_to_single_primary_mode([UUID])

If given a server UUID this new user function will migrate the group into single primary mode electing the proposed member as the new group primary. If no member identification is proposed the group will migrate electing a new primary based on the member weights or UUID lexicographic order.

See the example below:

At all moments the new primary member is always writable during the process.

Note: Again, as only a  primary member can do writes in the group, if an asynchronous channel is running on a member, only a migration electing that said member as primary will be allowed. If more than one member have running channels, no migrations will be allowed until the channels are stopped in at least part of the members.

On the other direction, if you want to change to multi primary mode you can use

group_replication_switch_to_multi_primary_mode()

On this operation, after some coordinated operations to ensure the safety and consistency of your data, eventually all members will become writable.

See the example below:

Note: All migrations operations will persist the settings into your configurations using SET PERSIST feature. This means if you start a member configured to run in single-primary mode, change the group mode to multi-primary and then restart the member, the member will come back online  with a multi-primary configuration.

Monitoring

One key aspect of every feature we implement is a concern about the end user can monitor the status and progress of the tasks being executed. For this purpose every configuration operation presented in this blog post can be monitored using MySQL stages.

Querying the performance_schema.events_stages_current table the DBA can observe the current status of the operation in progress in the invoking member or in any other group member.
On this table under its many columns, the event_name will give you the action being currently executed, being the work_estimated  for example the amount transactions we are currently waiting for or the number of members we are expecting a reply from and work_completed the already completed steps under this task.
You can check some examples below.

For primary switch :

For multi-primary migrations :

Not only the new configuration operations, but also the primary election process can now be monitored even on failover scenarios:

General considerations

This work of introducing new user functions for enhanced usability is inserted in a general effort to introduce coordinated management operations into Group Replication. As a whole these operations interact with many components and routines of the plugin so we now present some of their general aspects in the plugin context.

  • You can invoke configuration operations everywhere

If you want to make member A primary you don’t need to invoke the operation on member A. All operations are sent and executed in a coordinated way in all group members.
Also, this distributed execution of an operation also has a different ramification: when the invoking member dies, if already running the configuration process will continue on other members. See above how to check your configuration progress to know it ended on other members.

  • All members must be online

To simplify the migration/election process and guarantee it is fastest as possible, the group must not contain any member in recovery or the configuration action will be rejected by the member where you are executing it.

  • No members can join during a configuration change

For the same reasons stated above, any new member that will join the group during a coordinated configuration change will leave the group and cancel its join process.

  • No more than one configuration at once

This one is pretty straightforward in reason: concurrent configuration operations could lead to member divergence.

  • You can’t use configuration functions on mixed version groups

Due to the distributed nature of the these configuration actions, all members must recognize them in order to execute them. As such no server of an older version must be present in the server, otherwise the operation will be rejected.

  • Primary elections are now safer

As a natural consequence of introducing appointed primary elections, we refactored  some of the overall election process. On the course of this task we made a decision towards data safety that affects users on failover processes.
Now when a server dies or stops, and a new primary is elected, the new primary will only be writable when it processes all the transactions that came from the old primary. This avoids possible concurrency issues between old transactions from the old primary and the new ones being executed on this member.

Conclusion

We are happy we can now deliver to you these new features that we hope will make Group Replication easier to use and a better product overall.

Thanks for using MySQL Group Replication!

4,047 total views, 12 views today

About Pedro Gomes

Who am I? I'm a replication developer @ MySQL since 2013, and a fan of all things distributed so it's hard not to love my job. Raised on the distributed lab of Minho's University, home of great academic research on the field, I joined Oracle following this same passion and here I am!

Leave a Reply