No new options, no new commands… Just faster at full load, where it counts!

Starting with MySQL 8.0.1, a replica that is nearly caught up with its master will be more efficient (and probably faster) than in previous MySQL versions, because of improvements in how the replication threads interact. Preliminary testing showed a benefit of up to 65% on Sysbench Update Index.

What was the issue

The core of MySQL replication on the replica side is composed of two threads (at least): one responsible for the connection with the master, retrieving the events and queuing them on the relay log; another (the applier) responsible for reading the queued events from the relay log and applying them on the replica.

The connection and applier threads were tightly coupled on a replica that was nearly caught up (when both threads were working on the same relay log file). Access to the “hot” relay log file required mutual exclusion between the replication threads: while the connection was writing to it, the applier could not read content to apply and had to wait. Likewise, while the applier was reading from the “hot” relay log file, the connection could not write newly received content to it and had to wait.

WL8599 - Arbitration and IO operations - Before

The arbitration was necessary to prevent the applier from sending to workers events that are only partially written to the relay log.
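As a rough illustration of this arbitration, here is a toy Python model, not the server's actual code. The class `LockedRelayLog` and its methods are hypothetical names invented for this sketch. A single lock guards the whole “hot” file, so a reader can never observe a half-written event, but the writer and the reader also can never make progress at the same time.

```python
import threading


class LockedRelayLog:
    """Toy model (assumption, not MySQL code): one lock guards the hot file."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buf = bytearray()   # bytes of the "hot" relay log file
        self._event_ends = []     # end offset of each fully written event

    def queue_event(self, parts):
        # Connection thread: hold the lock for the whole write so the
        # applier can never see an event that is only partially written.
        with self._lock:
            for part in parts:
                self._buf += part
            self._event_ends.append(len(self._buf))

    def read_event(self, index):
        # Applier thread: takes the same lock, so it blocks the writer
        # while reading and is blocked by the writer while it writes.
        with self._lock:
            if index >= len(self._event_ends):
                return None  # nothing queued at this index yet
            start = self._event_ends[index - 1] if index > 0 else 0
            return bytes(self._buf[start:self._event_ends[index]])
```

In this model every call to `queue_event` and `read_event` is serialized through the same lock, which is the contention described above.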

This arbitration might be beneficial on slaves with limited resources, but it was also limiting the scalability of the multi-threaded slave (MTS) applier.

What has changed

MySQL 8.0.1 made the relationship between the replication threads more efficient. The applier should no longer block the connection (the only exception is when the relay log has exceeded the relay log space limit). Conversely, the connection will not block the applier from reading transaction parts that are already fully queued.

Arbitration and IO operations - After

The solution relies on the connection thread keeping up-to-date information about the position, in the “hot” relay log file, of the last fully queued event. The applier now reads from the “hot” log up to this position and waits for a notification from the connection thread when there is nothing else to apply.
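The idea can be sketched in Python as follows. This is a minimal model under stated assumptions, not the server's implementation: `PublishedPositionRelayLog`, `queue_event` and `read_available` are hypothetical names, and CPython's global interpreter lock stands in for the atomic position the real server would need. The writer appends without taking the lock and only publishes `last_pos` once an event is complete; the reader checks `last_pos` without locking and takes the lock only when it has to wait.

```python
import threading


class PublishedPositionRelayLog:
    """Toy model (assumption, not MySQL code): readers follow a published position."""

    def __init__(self):
        self._buf = bytearray()             # bytes of the "hot" relay log file
        self._last_pos = 0                  # end of the last fully queued event
        self._cond = threading.Condition()  # used only to wait and notify

    def queue_event(self, parts):
        # Connection thread (single writer): append without the lock.
        # Bytes beyond _last_pos are never read by the applier.
        for part in parts:
            self._buf += part
        # Publish the new position and wake up a waiting applier.
        with self._cond:
            self._last_pos = len(self._buf)
            self._cond.notify_all()

    def read_available(self, pos, timeout=None):
        # Fast path: lock-free check (relies on the GIL in this sketch).
        if pos >= self._last_pos:
            with self._cond:
                # Re-check under the lock before waiting, so an update
                # published between the fast-path check and this point
                # is not missed.
                if not self._cond.wait_for(lambda: pos < self._last_pos, timeout):
                    return pos, b""  # timed out, nothing new to apply
        end = self._last_pos
        return end, bytes(self._buf[pos:end])
```

A caller would loop on `pos, data = log.read_available(pos)` from the applier side while another thread calls `queue_event`. The re-check inside the lock is what prevents a lost wake-up when the position is published just before the applier starts waiting.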

Some numbers about the improvement

For the release of MySQL 8.0.1 we performed a set of evaluations covering the replication changes between versions 8.0.0 and 8.0.1, including the changes introduced by this new feature.

The numbers were obtained on a single host based on a Xeon E5-2699-V3 processor, with 16 cores bound to the master server, 16 cores bound to the replica server and the remaining 4 cores bound to the Sysbench threads. The workloads were executed on the master, with the replication threads stopped on the replica, until 1 million transactions had been generated. Then the replica threads were started, and the numbers were collected once the replica had applied those 1 million transactions, while the master kept generating more transactions to replicate. The database had 16 tables and 8 million rows, stored on a local SSD.

In order to make our testing comparable with MySQL 8.0.0, we used the replica configured with --slave-parallel-type=LOGICAL_CLOCK.

WL8599 - Sysbench Applier Throughput

Comparing the numbers obtained with durable settings (--sync-binlog=1 on both master and replica and --log-slave-updates on replica) for Sysbench OLTP RW (improvement of 25%) and Sysbench Update Index (improvement of 42%) just confirmed our expectations: workloads with many small transactions should notice more improvements.

WL8599 - Sysbench UI - Applier Throughput - Durability and Threads

Pushing the performance further by using non-durable settings (--sync-binlog=0 on master and replica), we obtained an improvement of up to 60% over MySQL 8.0.0 for Sysbench Update Index.

WL8599 - Sysbench UI - Applier Throughput - Binlog Format

Pushing even further, we were able to measure an improvement of up to 65% in Sysbench Update Index with non-durable settings and Statement-based Replication (SBR).

Summary

MySQL 8.0.1 made the relationship between the connection and applier replication threads more efficient, with improvements of up to 25% (on Sysbench OLTP RW) and of up to 65% (on Sysbench Update Index, with non-durable settings and SBR). The improvements are expected to be more noticeable for workloads with many small transactions that use the multi-threaded slave capabilities.

About João Gramacho

João Gramacho is a Software Developer for the MySQL Replication Core team at Oracle. Before joining Oracle, he worked for more than ten years in IT infrastructure support (server hardware, operating systems, networks, management and security) and completed a PhD in high-performance computing.

4 thoughts on “No new options, no new commands… Just faster at full load, where it counts!”

  1. hi,
    I have a question: what’s the meaning of “updated before locking” in the graph “Arbitration on 8.0.1”, in the arbitration zone?
    I suppose it means that last_pos is updated before locking, but the applier compares pos and last_pos outside of the arbitration zone.
    I’m confused.

    1. Hi Wang!

      The "pos < last_pos" evaluation outside the arbitration zone relies on atomics to avoid blocking, as it should not happen often in a replica server with constant workload.

      The applier does some activities after "pos < last_pos" evaluation and before entering the arbitration zone.

      Suppose last_pos is updated by the I/O thread after the applier evaluated "pos < last_pos" and before the applier entered the arbitration zone. If the applier doesn't evaluate "pos < last_pos" again (checking if it was updated before the applier entered the arbitration zone) before entering "wait notification" state, the applier would miss the last I/O thread notification about last_pos being updated.
