An overview of the Group Replication performance

In this blog post we will present a first look at the performance of Group Replication (GR), now that the RC is out. The goal is to provide some insight into the throughput, latency and scalability one can expect in a modern computing infrastructure, using GR in single- and multi-master configurations, but mostly focused on single-master.

→ If you’re in a hurry, here’s the summary: the performance of GR is great!

1. The basics of how it works

GR introduces the ability to make MySQL servers collaborate and provide a true multi-master architecture where clients can write to the database through any of the members of a GR group. The master-slave model of traditional replication is replaced with a peer-to-peer model in which all servers are masters of their clients' workloads.

Servers in a distributed database system must unequivocally agree on which database changes are to be performed and which are to be refused, regardless of which server a client connects to or when it does so. When a transaction is ready to commit, its changes are sent by the client-facing member to all other members in a totally ordered broadcast message, using a group communication system based on Paxos. After the changes are received and certified by the members, and if no incompatibilities arise, they are committed on the client-facing member and asynchronously applied on the remaining members.
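Certification activity can be observed through SQL on each member. As a minimal sketch (assuming the GR plugin is installed and the group is running), the following query shows how many transactions the local member has certified, how many were refused as conflicting, and how many are still queued for certification:

  -- Certification statistics for the local member
  SELECT MEMBER_ID,
         COUNT_TRANSACTIONS_IN_QUEUE,   -- transactions waiting for certification
         COUNT_TRANSACTIONS_CHECKED,    -- transactions already certified
         COUNT_CONFLICTS_DETECTED       -- transactions refused by certification
  FROM performance_schema.replication_group_member_stats;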

For more information regarding what is going on behind the scenes, from a performance point of view, please check this post: http://mysqlhighavailability.com/tuning-mysql-group-replication-for-fun-and-profit/

2. Throughput and scalability

The following charts present the throughput achieved by Sysbench RW and Update Index using GR with several group sizes, comparing it to a standalone MySQL server (with the binary log active). This allows us to show GR's behaviour on a well-known workload but, like any synthetic benchmark, it is no replacement for testing with the actual workloads and infrastructure of each deployment.

We measured both the peak throughput – the best that the clients can get from the client-facing servers (with flow-control disabled) – and the sustained throughput – the best the system as a whole can withstand in the long run without having some of the members lagging behind the others. For these first charts we use the best throughput achieved in any thread combination.
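Flow control is what keeps the sustained throughput within what all members can apply; for the peak-throughput runs it was turned off. A minimal sketch of how that can be done (the same variable appears in a reply in the comments below); note that letting the writers run unthrottled allows the other members to lag and apply later:

  -- Stop throttling the client-facing members (peak-throughput style runs)
  SET GLOBAL group_replication_flow_control_mode = 'DISABLED';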

Sysbench Read-Write throughput

Chart 1. Sysbench RW maximum throughput

This chart shows several points that make us quite proud:

  • When all write requests are directed to a single member (single-master) GR achieves around 80% of the throughput of a standalone MySQL server, dropping only slightly (3%) when going from groups of 2 members up to 9 members (that’s the largest group supported for now);

  • When write requests are distributed between multiple members (multi-master), the sustained throughput can even be slightly larger than that of a standalone MySQL server, and even with 9-member groups we can still get 85% of that;

  • We can also reach peak throughputs that almost double the capacity of a standalone server when using multi-master, if we disable flow control and allow transactions to buffer and be applied later;

  • Finally, on Sysbench RW the MTS applier on the members allows them to keep up with the master (in single-master mode) even at the maximum throughput it can deliver (see the sketch below).
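The multi-threaded applier mentioned in the last point is the regular MTS applier configured on every member. A minimal sketch of the settings involved (the same ones suggested in a reply in the comments below); the worker count is an example value, and the settings are assumed to be applied before Group Replication is started:

  -- Enable the multi-threaded applier so members can keep up with the writer
  SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';  -- parallelize independent transactions
  SET GLOBAL slave_parallel_workers = 16;            -- example value, tune to the workload
  SET GLOBAL slave_preserve_commit_order = ON;       -- commit in the same order as the writer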

Sysbench Update Index throughput

Chart 2. Sysbench Update Index maximum throughput

The Sysbench Update Index benchmark shows what happens with GR in workloads with very small transactions (more demanding for replication, with a single update per transaction in this case). With smaller transactions the number of operations that GR must handle is higher and the overhead on GR itself becomes proportionally larger.

Even in this scenario GR performs at very competitive levels. In particular, the scalability on the network is still great, and going from 2 to 9 members brings a very small degradation in both single-master and multi-master scenarios. Nevertheless, there is work to do to bring the sustained throughput closer to the peak and both closer to the standalone server.

The next sections expand on these numbers for the single-master case. A later blog post will focus on the performance of multi-master configurations, including the balance/fairness between members, the impact of conflicting workloads, etc.

3. Throughput by client (single-master)

Group Replication takes over a transaction once it is ready to commit and only returns to the client once certification is done, which happens after a majority of the members in the group agree on a common sequence of transactions. This means that Group Replication is expected to increase the transaction latency and to reduce the number of transactions executed per second for the same thread count, until the server starts reaching its full capacity. Once the number of clients is high enough to keep the server busy the added latency may be hidden entirely, but then scheduling many threads introduces its own overhead that limits that effect.

Chart 3. Sysbench RW sustained throughput by client thread

On the chart above one can see that, at the same number of client threads, there is a small gap between the number of transactions per second that GR and a standalone server can deliver. That gap is on average 12%-19% from 2 to 9 members, but with 70 client threads the TPS gap between GR and the standalone server is very small (4%-11%). At their maximum throughput the difference between GR and a standalone server grows to 18%-21%, again from 2- to 9-member groups.

So, we are very pleased to see that GR is able to achieve around 80% of the standalone server performance in single-master configurations, even using the largest groups supported.

Chart 4. Sysbench Update Index throughput by client thread

Even on very small transactions we are still able to achieve around 60% of the throughput of a standalone server. The latency gets hidden as we increase the number of clients, as on Sysbench RW, but the number of transactions that need to go through certification and be applied is much larger and becomes a limit in itself.

4. Transaction latency (single-master)

Chart 5. Sysbench RW average transaction latency

As mentioned above, the transaction latency was expected to grow with GR, but the chart above shows that in the tested system the Sysbench RW transaction latency increase is rather contained (16% to 30%, from 2 to 9 members).

Chart 6. Latency added by Group Replication in Sysbench RW and Update Index

That latency increase depends mostly on the network, so the added latency changes only slightly between Sysbench RW and Sysbench Update Index, even though the payload sent is several times larger in Sysbench RW. The chart above shows that for this kind of workload the latency increase is between 0.6 ms and 1.2 ms from 2 to 7 members, going up to 1.7 ms with 9 members.

5. Benchmarking setup

In order to show GR throughput without interference from the durability settings we used sync_binlog=0 and innodb_flush_log_at_trx_commit=2 during the tests. In multi-master we used group_replication_gtid_assignment_block_size=1000000, and the results presented are the best obtained when using either 2 or all members of the group as writers of non-conflicting workloads.
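As a minimal sketch, these are the durability-related settings described above; the GTID assignment block size only matters for the multi-master runs and is assumed to be set on every member before the group is started:

  -- Relaxed durability used for the benchmarks (not a recommendation for production)
  SET GLOBAL sync_binlog = 0;
  SET GLOBAL innodb_flush_log_at_trx_commit = 2;
  -- Multi-master runs only; same value on every member, set before the group starts
  SET GLOBAL group_replication_gtid_assignment_block_size = 1000000;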

The tests were executed using Sysbench 0.5 in a database with 10 million rows (the database fits in the buffer pool) and the median result of 5 runs is used.

The benchmarks were run on an infrastructure consisting of 9 physical servers (Oracle Enterprise Linux 7, 20-core, dual Intel Xeon E5-2660-v3, with database and binary log placed on enterprise SSDs), with another server acting as a client (36-core, dual Intel Xeon E5-2699-v3), all interconnected by a 10Gbps Ethernet network.

6. Summary

We have spent quite a bit of effort optimizing the performance of Group Replication and this post presents a brief overview of the results now that the RC is being released.

Other performance-related posts will follow soon to focus specifically on the multi-threaded applier, on the group communication system itself and on multi-master scenarios.

Stay tuned!

About Vitor Oliveira

Vitor Oliveira is a Senior Performance Engineer on the MySQL team at Oracle, working on Replication and other components. He has a background in high-performance computing systems, application profiling and optimization.

11 thoughts on “An overview of the Group Replication performance”

    1. Hi,
      Sorry for the delay, somehow I didn’t see your comment!

      The sysbench queries used to execute the benchmark are:
      sysbench --test=oltp.lua --oltp-table-size=10000000 --mysql-db=db --oltp-tables-count=1 --num-threads= --mysql-engine-trx=yes --oltp-read-only=off --oltp-test-mode=complex --oltp-dist-type=uniform --init-rng=on --max-time=120 run
      and
      sysbench --test=update_index.lua --oltp-table-size=10000000 --mysql-db=db --oltp-tables-count=1 --num-threads= --mysql-engine-trx=no --oltp-nontrx-mode=update_nokey --oltp-read-only=off --oltp-test-mode=nontrx --oltp-dist-type=uniform --init-rng=on --max-time=300 run

      In the multi-master case a sysbench process is run per member server under test, each using its own database (prepared as above).

      Regards
      Vitor

  1. Hi, I just downloaded the mysql-5.7.15-labs-gr090 version and ran sysbench on a 3-node single-master setup. The insert performance is only 4000/s, and with pstack I can see that most threads are dealing with group_replication_trans_before_commit. However, a standard MySQL server with the same config can reach 80000/s. I want to know if I set wrong arguments in the test; can you share your configuration?

    1. Hi,

      Threads wait on group_replication_trans_before_commit while they are waiting for consensus to be reached on the network (certification_latch->waitTicket()) or when flow-control is throttling the writer to allow the slaves to keep up (applier_module->get_flow_control_module()->do_wait()).

      For the first case, please check the network and if bandwidth is a problem maybe compress the messages.
      For the second case, please check that your slaves are able to keep up with the master, when writing to the relay log, binlog and database.

      In this post you have further info about this:
      http://mysqlhighavailability.com/tuning-mysql-group-replication-for-fun-and-profit/

      Regarding the configuration parameters, the tests shown in this post don’t enforce durability on storage to try to check the limits of GR itself.
      The most relevant variables are:
      innodb_doublewrite=0
      innodb_flush_log_at_trx_commit=2
      innodb_flush_method=O_DIRECT_NO_FSYNC
      innodb_thread_concurrency=64
      sync-binlog=0
      performance_schema=OFF

      I hope this helps.

      Regards
      Vitor Oliveira

      1. Hi, I used the sysbench cmds given above, but failed to get the expected performance (only 5300 qps for insert & 24000 qps for update).

        My test machine:
        Intel Xeon CPU E5-2650 v2 @ 2.60GHz, 32-core
        10Gbps Ethernet network
        enterprise SSDs

        Here is my config file.
        [mysqld]
        datadir=
        basedir=
        log_error=
        port=
        socket=
        #
        # Replication configuration parameters
        #
        server_id=
        gtid_mode=ON
        enforce_gtid_consistency=ON
        master_info_repository=TABLE
        relay_log_info_repository=TABLE
        binlog_checksum=NONE
        log_slave_updates=ON
        log_bin=binlog
        binlog_format=ROW
        #
        # Group Replication configuration
        #
        transaction_write_set_extraction=XXHASH64
        loose-group_replication_group_name=
        loose-group_replication_start_on_boot=off
        loose-group_replication_local_address=
        loose-group_replication_group_seeds=
        loose-group_replication_bootstrap_group=off
        loose-group_replication_ip_whitelist=

        max_connections=8500
        max_user_connections=8000
        innodb_flush_log_at_trx_commit=2
        sync_binlog=0
        performance_schema=OFF

        innodb_doublewrite=0
        innodb_flush_method=O_DIRECT_NO_FSYNC
        innodb_thread_concurrency=64
        innodb_spin_wait_delay=6
        innodb_buffer_pool_size=35G
        innodb_log_file_size=4G
        innodb_buffer_pool_instances=4
        innodb_log_files_in_group=4
        innodb_log_buffer_size=200M
        innodb_flush_log_at_trx_commit=0
        innodb_max_dirty_pages_pct=60
        innodb_io_capacity_max=6000
        innodb_io_capacity=1000
        innodb_read_io_threads=8
        innodb_write_io_threads=8
        innodb_open_files=615350
        innodb_file_format=Barracuda
        innodb_file_per_table=1
        innodb_change_buffering=inserts
        innodb_adaptive_flushing=1
        innodb_old_blocks_time=1000
        innodb_stats_on_metadata=0
        innodb_use_native_aio=1
        innodb_lock_wait_timeout=5
        innodb_rollback_on_timeout=0
        innodb_purge_threads=4
        innodb_strict_mode=1
        transaction-isolation=READ-COMMITTED
        innodb_disable_sort_file_cache=ON
        innodb_lru_scan_depth=2048
        innodb_flush_neighbors=0
        innodb_sync_array_size=16
        innodb_print_all_deadlocks
        innodb_checksum_algorithm=CRC32
        innodb_max_dirty_pages_pct_lwm=10

        binlog_cache_size=32K
        max_binlog_cache_size=2G
        max_binlog_size=500M
        binlog-format=ROW
        binlog_checksum=none
        log_bin_use_v1_row_events=on
        explicit_defaults_for_timestamp=OFF
        binlog_row_image=FULL
        binlog_rows_query_log_events=OFF
        binlog_stmt_cache_size=32768

        #server
        default-storage-engine=INNODB
        character-set-server=utf8mb4
        lower_case_table_names=1
        skip-external-locking
        open_files_limit=615350
        safe-user-create
        local-infile=1
        log_slow_admin_statements=1
        log_warnings=1
        long_query_time=1
        slow_query_log=1
        general_log=0
        query_cache_type=0
        query_cache_limit=1M
        query_cache_min_res_unit=1K
        table_definition_cache=65536
        metadata_locks_hash_instances=256
        metadata_locks_cache_size=32768
        eq_range_index_dive_limit=200
        table_open_cache_instances=16
        table_open_cache=65535
        thread_stack=512K
        thread_cache_size=256
        read_rnd_buffer_size=128K
        sort_buffer_size=256K
        join_buffer_size=128K
        read_buffer_size=128K
        skip-ssl
        max_connections=8500
        max_user_connections=8000
        max_connect_errors=65536
        max_allowed_packet=128M
        connect_timeout=8
        net_read_timeout=30
        net_write_timeout=60
        back_log=1024

        1. Hi,

          By default Group Replication uses a flow-control scheme that does not allow the member that writes to the group to go faster than the other nodes can apply the workload, so that slaves don’t start lagging behind.

          In your case, it seems you haven’t activated MTS with a high number of threads to allow the members to keep up with the changes.

          Please try something like:
          slave_parallel_type=LOGICAL_CLOCK
          slave_parallel_workers=16
          slave_preserve_commit_order=on

          To test if that is the reason you can also disable flow control with:
          group_replication_flow_control_mode=DISABLED

          Regards
          Vitor

  2. Hi, Vitor
    Recently I tested the insert performance of MySQL 5.7.15 and found that GTID mode greatly affects the result.
    With GTIDs, the result is 35,000 qps. If I turn them off, the result can reach 90,000 qps.
    I think that the requirement of GTIDs decreases the performance of group replication.

  3. Hi,
    GTIDs are fundamental for the GR certification process to figure out the state each node had prior to executing a transaction and so check for conflicts; there is no way around them.

    But their impact on performance should not be that large, so please let me ask a few questions:
    – how many clients are you using?
    – is gtid-mode the only change in the configuration?
    – what cpu and storage type are you using?
    – is sync_binlog=1 and innodb_flush_log_at_trx_commit=1?

    Anyway, please try 5.7.17 when it comes out. 😉

    Regards
    Vitor

    1. Here is my test environment:
      Intel Xeon CPU E5-2650 v2 @ 2.60GHz, 16 core
      enterprise SSD
      innodb_flush_log_at_trx_commit=2
      sync_binlog=0

      To test no gtid mode, I just comment the following two lines:
      #gtid_mode=ON
      #enforce_gtid_consistency=ON

      In addition, I found that the performance degradation of GTID mode does not appear in the latest version of MySQL 5.6.

      1. Hi,
        Thanks for the config, good for us to know.
        GTIDs take a bit of time to generate per transaction, so for very small transactions it’s noticeable. But since it is a per-transaction overhead, it’s possible to reduce it by grouping several auto-committed inserts into larger transactions.
        That is also a good thing to do for GR, as replication also becomes more efficient when transactions are not very small (like a single update or insert).
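        For illustration, a minimal sketch with a hypothetical table t1:
        -- Instead of many auto-committed single-row inserts, group them into one transaction
        START TRANSACTION;
        INSERT INTO t1 (id, val) VALUES (1, 'a');
        INSERT INTO t1 (id, val) VALUES (2, 'b');
        INSERT INTO t1 (id, val) VALUES (3, 'c');
        COMMIT;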
        Regards
        Vitor
