Enabling Global Transaction Identifiers Without Downtime in MySQL 5.7.6

A much requested feature has made it to MySQL 5.7.6: Global Transaction Identifiers (GTIDs) can now be enabled online, without stopping writes, synchronizing servers, or restarting servers.

We introduced GTIDs in MySQL 5.6.6. GTIDs allow, among other things, seamless fail-over after a master has crashed. This enables highly available applications, since service can continue without interruption even if the master crashes. For new deployments, you could easily enable GTIDs right away – just set gtid-mode=ON in the configuration files. For deployments that accept a certain amount of downtime, you could switch from off to on too. However, for big deployments with strict limitations on downtime, this was more problematic, since you had to stop updates, sync all servers, and restart all servers simultaneously with GTIDs enabled, and this would lead to several minutes of downtime.

In MySQL 5.7.6, we have now made it possible to enable GTIDs and keep the service up at all times. In fact, we removed the need to stop updates, we removed the need to sync the cluster, and we removed the need to restart any servers. At the same time, we enabled steps to pre-check that the workload is GTID-compatible before forcing it to be so, so as to eliminate the risk for surprises. We also simplified and reinforced parts of the codebase, especially around GTID execution. You can now turn on GTIDs with no interruption at all in application service.

Offline procedure

It is still possible to use the old, offline procedure. The procedure is as follows:

  1. Disable all write operations.
  2. Wait for all transactions to propagate from the master(s) to all slaves.
  3. Stop all servers.
  4. On each server, set gtid-mode=ON in the configuration file.
  5. Start all servers.
  6. Enable write operations.

Let us now see how the new features in 5.7.6 allow you to enable GTIDs with no offline time, and with safety checks to eliminate the risk for surprises when the workload has to be GTID-consistent.

Prerequisities for enabling GTIDs online

The GTID online procedure goes through a number of steps, and each step must be completed on all servers before beginning the next step on any server. Therefore, before you can begin the procedure on any single server, all servers in your replication topology must use MySQL 5.7.6 or later. If any server uses an old version, you must use the offline procedure on every server.

Online procedure to enable GTIDs

The procedure has multiple steps. To give a better overview of the procedure, we divide the steps into three ”checkpoints” (the procedure is also documented in http://dev.mysql.com/doc/refman/5.7/en/replication-mode-change-online-enable-gtids.html):

Checkpoint 1. Ensure that your workload is compatible with GTIDs.

This checkpoint is needed because GTIDs do put (small) restrictions on the SQL you can execute. The following statements are disallowed: CREATE  TABLE … SELECT; CREATE TEMPORARY or DROP TEMPORARY inside a transaction context; and mixing transactional tables with nontransactional tables in the same statement or in the same transaction context. Such statements must be eliminated from the workload (this is a good practice anyways, because the non-transactional nature of these statements makes them unsuitable for replication).

We do not expect these to be severe limitations for most users. But, to be sure that enabling GTIDs does not cause any surprises, you can now pre-check your workload by enabling a mode that generates warnings for the GTID-violating statements.

To complete checkpoint 1, do the following:

1. On each server, execute: SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN, and set enforce-gtid-consistency=WARN in my.cnf.

2. Let it run for a while with your normal workload. Any GTID-violating statement will execute correctly, but generate a warning in the server log, like:

or

or

Look in the log for these warnings. If there are any warnings , adjust your SQL so that it only uses GTID-compatible statements. E.g., replace CREATE TABLE … SELECT by a CREATE TABLE statement followed by an INSERT statement, move CREATE/DROP TEMPORARY out of transactional context, or switch to InnoDB for any table that needs to be updated in the same transaction or in the same statement as other InnoDB tables. Notice that these statements are risky to use together with replication, so all this is good to do anyways. Repeat until it does not generate any warning.

3. On each server, execute: SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON, and set enforce-gtid-consistency=ON in my.cnf so that the value is preserved next time you need to restart (you don’t need to restart due to this procedure, but there could always be some other reason).

Checkpoint 2. Make all servers generate GTIDs.

This procedure is online as of 5.7.6:

4. On each server, execute: SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE, and set gtid-mode=OFF_PERMISSIVE in my.cnf.

It is important that you complete step 4 entirely on all servers, before you begin step 5 on any server.

5. On each server, execute: SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE, and set gtid-mode=ON_PERMISSIVE in my.cnf.

Checkpoint 3. Make all servers GTID-only.

During the previous steps to enable GTIDs online, we have to allow a server to execute a mixture of GTID-transactions and transactions that do not have a GTID. When a binary log contains such a mixture of transactions, GTIDs naturally cannot be safely used as the only way to address transactions. Therefore, before taking the last step to make the server GTID-only, we have to wait for all old binary logs to become obsolete.

The steps to complete checkpoint 3 are:

6. Wait until there are no more ongoing client transactions that need to be GTID-free. Normally this should happen within a second. To verify that this is completed, do the following. On each server, wait until the status variable ONGOING_ANONYMOUS_TRANSACTION_COUNT is zero. This can be checked using the statement SHOW STATUS LIKE ‘ONGOING_ANONYMOUS_TRANSACTION_COUNT’;

7. Wait until all transactions that existed after step 6 are replicated to all servers. A simple way is to wait for a bit longer than your slave lag (see http://dev.mysql.com/doc/refman/5.7/en/replication-mode-change-online-verify-transactions.html for other methods).

8. All binary logs existing up to now may contain transactions that do not have a GTID. In case you use binary logs for anything else than replication, such as point-in-time backup and recovery, these logs will not be usable after you make the server GTID-only. Therefore, you must now wait for the logs to expire so that you don’t depend on the old logs in any way.

If you configured mysqld setting –expire-log-days to a nonzero value, then the logs will be automatically purged after that many days. As an alternative, many users have cronjobs or other periodically executed scripts which issue PURGE BINARY LOGS to remove old binary logs. If your backup routine is correctly configured, it should take this into account, so that the newest backup never relies on binary logs that have been removed. So the easiest way to be sure that you don’t depend on the old binary logs, is to wait for these old binary logs to be removed.

Start by issuing FLUSH LOGS followed by SHOW MASTER STATUS on each server. Note down the filename reported by SHOW MASTER STATUS on each server. Then wait until all binary logs older than this have been removed or expired. Depending on your configuration, this can take days or even weeks. The database will be fully online all the time, so while this can potentially take long time, it does not harm your application.

9. On each server, execute: SET @@GLOBAL.GTID_MODE = ON, and set gtid-mode=ON in my.cnf

10. If you want to use the auto-positioning protocol so that the slave uses GTIDs to keep track of the replication position, you can do so now:

This has to be executed on each slave server that should use the auto-positioning protocol.

In MySQL 5.7.6 we also introduced multi-source replication. If you use multi-source replication, you have to repeat step 10 once for each replication channel, specifying the channel name:

That is all!

Fault-tolerant procedure

The procedure has been designed so that it is fault-tolerant. In other words, it works gracefully even in the presence of a crash or power failure, or if anything else unexpected interrupts the procedure at any point. If anything like this happens, you can leave the GTID online procedure in whatever state it is, go and handle the error, and later resume the procedure where it was stopped.

Many steps, many safe-guards

We realize that the many steps can look a bit overwhelming. However, note that this is something you do once in a lifetime and then forget about. When you bootstrap new servers in the future, you should of course set gtid-mode=ON directly in the configuration file, before starting them for the first time, and will not have to go through the procedure. Also, the server is fully operational during each step, so there is no harm if you pause in the middle of the procedure.

Note that this procedure includes all steps to ensure your workload is safe for GTIDs. There would almost always be some kind of verification steps involved when changing any configuration parameter. The difference for GTIDs is that we do provide tools that allow you to pre-check the configuration, and this is explicitly included in the upgrade procedure.

That said, let us go into more detail on the new values for GTID_MODE to explain why it has to go through the two intermediate steps.

The idea of permissive GTID_MODEs

In 5.6, GTID_MODE could take two values:

GTID_MODE=ON: The server will generate GTIDs for all client transactions, and if the server is a replication slave it will only accept to replicate transactions that have a GTID. The point of this mode is that it guarantees that every transaction has a GTID. Because of this guarantee, GTIDs can be safely used as the only way to address transactions.

GTID_MODE=OFF: The server will not generate any GTIDs for client transactions, and if the server is a replication slave it will only accept to replicate transactions that do not have any GTID. The point of this mode is that it guarantees that every transaction is GTID-free, so it is backward compatible with earlier server versions.

In 5.7, we added two new modes: ON_PERMISSIVE and OFF_PERMISSIVE:

GTID_MODE=ON_PERMISSIVE: Just like when GTID_MODE=ON, the server generates GTIDs for all client transactions. However, if the server is a replication slave, it is more forgiving, and accepts replicated transactions to either have a GTID or have no GTID. Thus, while new transactions have a GTID, the guarantee that every transaction has a GTID does not hold.

GTID_MODE=OFF_PERMISSIVE: Just like when GTID_MODE=OFF, the server does not generate any GTID for client transactions. But on a replication slave, the server is forgiving and accepts both transactions with and without a GTID. So even if new transactions do not have a GTID, the guarantee that every transaction is GTID-free does not hold.

So neither of the two permissive modes has the advantages of ON (GTID-addressable transactions) or OFF (backward compatibility), but their advantage is that they allow replicating all transactions. This is what makes them suitable during the online procedure, since in the absence of full synchronization between all servers there will necessarily be a period of time when any server can expect both transactions that have a GTID and transactions that don’t have a GTID.

This makes the modes suitable as intermediate steps during the procedure to enable GTIDs. Note that these modes do not guarantee the full feature sets of either GTID_MODE=ON (you can’t address every transaction using a GTID since there may be transactions without a GTID) or GTID_MODE=OFF (the server is backward compatible with older server versions since it allows GTID-transactions).

Online procedure to disable GTIDs

While we recommend users to switch to GTID_MODE=ON, of course we provide ways to revert to GTID_MODE=OFF. The new values for GTID_MODE allow you to do this online as well. The procedure is mostly a reverse of the upgrade procedure: see http://dev.mysql.com/doc/refman/5.7/en/replication-mode-change-online-disable-gtids.html for details.

Summary

Starting in MySQL 5.7.6, you can enable GTIDs without any downtime. The procedure has multiple steps, so as to ensure that the server is fully compatible with the next step before issuing the next step. This removes the largest obstacle for enabling GTIDs and we hope many users will now go and enable GTIDs on their deployments!

About Sven Sandberg

Sven Sandberg is a Principal Software Engineer and leads a part of the replication team at Oracle. He has a PhD in software verification from Uppsala University, Sweden, on the subject of games and probabilistic infinite-state systems.

One thought on “Enabling Global Transaction Identifiers Without Downtime in MySQL 5.7.6

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter. * Time limit is exhausted. Please reload CAPTCHA.