Extending replication instrumentation: account for memory used in XCom

Version 8.0.12 added many great new features to MySQL. One of the new features included is the memory instrumentation of the XCom cache, which allows users to view and monitor the memory utilization of the cache by querying Performance Schema (PS). This blog post will tell you all about this latest addition.

The XCom Cache

The XCom cache is the message cache used by the Group Communication System (GCS) of Group Replication (GR) to hold the messages exchanged between GR nodes as part of the consensus protocol. It is the single most important consumer of memory in GCS and one of its most essential components. It takes its name from the eXtended Communications (XCom) submodule of GCS, which is in charge of running the consensus protocol. If you would like to know more about these components and the consensus protocol, take a look at the many great blog posts (this one is a good entry point).

Currently, the cache is composed of 50k slots and its maximum size is approximately 1GB. This means that the cache can store up to 50k messages or close to 1GB of data before it has to start deleting anything. When either the size limit or the slot limit is reached (one of which will, inevitably, happen), the cache removes some of the old entries to make space for new ones. Which limit is reached first depends on the workload of the system, specifically on the average message/transaction size: in a system with small messages the slot limit will be reached first; in an environment with large transactions, the size limit will be reached first.
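
Taking the approximate figures above at face value, the break-even point sits at roughly 1GB / 50k ≈ 20KB per message: below that average the slot limit is hit first, above it the size limit. The toy Python model below illustrates the two limits on a scaled-down cache. It is only a sketch: the class, its names and the strict oldest-first eviction are assumptions made for illustration, not the actual XCom implementation.

```python
from collections import deque


class BoundedMessageCache:
    """Toy cache bounded by both a slot count and a byte budget (hypothetical)."""

    def __init__(self, max_slots, max_bytes):
        self.max_slots = max_slots
        self.max_bytes = max_bytes
        self.entries = deque()      # cached message sizes, oldest first
        self.bytes_used = 0
        self.evicted_for_slots = 0  # removals caused by the slot limit
        self.evicted_for_bytes = 0  # removals caused by the size limit

    def add(self, size):
        # Assumption: the oldest entry is always the one removed.
        while self.entries and len(self.entries) >= self.max_slots:
            self.bytes_used -= self.entries.popleft()
            self.evicted_for_slots += 1
        while self.entries and self.bytes_used + size > self.max_bytes:
            self.bytes_used -= self.entries.popleft()
            self.evicted_for_bytes += 1
        self.entries.append(size)
        self.bytes_used += size


def simulate(message_size, count, max_slots=50, max_bytes=1000):
    """Cache `count` messages of equal size in a scaled-down cache."""
    cache = BoundedMessageCache(max_slots, max_bytes)
    for _ in range(count):
        cache.add(message_size)
    return cache


# Scaled-down limits: 50 slots, 1000 bytes, so the break-even size is 20 bytes.
small = simulate(message_size=5, count=200)    # only the slot limit is ever hit
large = simulate(message_size=100, count=200)  # only the size limit is ever hit
```

With small messages all removals are attributed to the slot limit; with large messages all of them are attributed to the size limit, which mirrors the behaviour described above.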

Monitoring the cache

Let’s now see the new feature in action. Information about the memory usage of the XCom cache is shown in the memory_summary_global_by_event_name table of PS under the key GCS_XCom::xcom_cache. Here is an example of what this table contains immediately after starting GR:
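
For readers who want to pull this row from a client program, here is a minimal sketch. It assumes a Python DB-API 2.0 cursor (from whichever MySQL driver you use) that returns rows as tuples; the helper name is hypothetical. The event name is matched by suffix because only the key GCS_XCom::xcom_cache is given here, not the full instrument path.

```python
# Suffix match: the instrument prefix is deliberately left as a wildcard.
XCOM_CACHE_QUERY = (
    "SELECT * FROM performance_schema.memory_summary_global_by_event_name "
    "WHERE EVENT_NAME LIKE '%GCS_XCom::xcom_cache'"
)


def fetch_xcom_cache_row(cursor):
    """Return the XCom cache row as a {column: value} dict, or None.

    `cursor` is assumed to be a DB-API 2.0 cursor yielding tuple rows.
    """
    cursor.execute(XCOM_CACHE_QUERY)  # no parameters are passed
    row = cursor.fetchone()
    if row is None:
        return None  # instrument not present, e.g. GR was never started
    columns = [col[0] for col in cursor.description]
    return dict(zip(columns, row))
```

The same SELECT statement can, of course, be run directly from the mysql client.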

One interesting aspect of the instrumentation of the XCom cache is that each allocation corresponds to a cached entry, meaning that there is a one-to-one correspondence between allocations/deallocations and cached/removed messages. The columns of the memory_summary_global_by_event_name table thus contain the following information:

  • COUNT_ALLOC, COUNT_FREE: Aggregated number of allocations and deallocations, which correspond, respectively, to the aggregated number of messages added to and removed from the cache.
  • SUM_NUMBER_OF_BYTES_ALLOC, SUM_NUMBER_OF_BYTES_FREE: Aggregated sizes of allocated and deallocated memory blocks. These correspond to the aggregated sizes of cached and removed messages, respectively.
  • CURRENT_COUNT_USED: Aggregated number of currently allocated blocks that have not been freed yet; its value is equal to COUNT_ALLOC - COUNT_FREE, which corresponds to the number of currently cached entries.
  • CURRENT_NUMBER_OF_BYTES_USED: Aggregated size of currently allocated memory blocks that have not been freed yet; its value is equal to SUM_NUMBER_OF_BYTES_ALLOC - SUM_NUMBER_OF_BYTES_FREE, which corresponds to the aggregated size of all the currently cached messages.
  • HIGH_COUNT_USED, HIGH_NUMBER_OF_BYTES_USED: The highest values seen in the CURRENT_COUNT_USED and CURRENT_NUMBER_OF_BYTES_USED columns, respectively.
  • LOW_COUNT_USED, LOW_NUMBER_OF_BYTES_USED: The lowest values seen in the CURRENT_COUNT_USED and CURRENT_NUMBER_OF_BYTES_USED columns, respectively. These columns will always have 0 as their value.
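
The relationships between these columns can be written down as a short Python sketch. It assumes the row is available as a dict keyed by column name; the function names and the sample numbers are made up for illustration and are not real server output.

```python
def current_values_are_consistent(row):
    """Check the two 'current' columns against the aggregated counters."""
    expected_count = row["COUNT_ALLOC"] - row["COUNT_FREE"]
    expected_bytes = (
        row["SUM_NUMBER_OF_BYTES_ALLOC"] - row["SUM_NUMBER_OF_BYTES_FREE"]
    )
    return (
        row["CURRENT_COUNT_USED"] == expected_count
        and row["CURRENT_NUMBER_OF_BYTES_USED"] == expected_bytes
    )


def average_cached_message_size(row):
    """Average size in bytes of the cached messages, or None if the cache is empty.

    Relies on the one-to-one mapping between allocations and cached messages.
    """
    if row["CURRENT_COUNT_USED"] == 0:
        return None
    return row["CURRENT_NUMBER_OF_BYTES_USED"] / row["CURRENT_COUNT_USED"]


# Illustrative values only.
sample_row = {
    "COUNT_ALLOC": 1200,
    "COUNT_FREE": 200,
    "SUM_NUMBER_OF_BYTES_ALLOC": 600000,
    "SUM_NUMBER_OF_BYTES_FREE": 100000,
    "CURRENT_COUNT_USED": 1000,
    "CURRENT_NUMBER_OF_BYTES_USED": 500000,
}
```

The average message size derived this way gives a hint about which of the two cache limits a given workload is likely to reach first.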

The cache simply continues to allocate data as long as it has free space and free slots available. For example, after leaving the server running for several minutes under a light load, we have the following output, which shows that neither limit has been reached and, thus, no message has been removed:

Here is an example of the output that can be seen after the slots limit has been reached:

Note that from this moment on, the CURRENT_COUNT_USED column will remain at 49999 or 50000 (depending on the timing of the query to PS), because the cache will need to remove an older entry every time something new needs to be cached.
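
A monitoring script could use this observation to tell when the cache has started recycling slots. The sketch below assumes rows are dicts keyed by column name and uses the 50k slot limit quoted earlier; the function names are hypothetical.

```python
SLOT_LIMIT = 50_000  # slot count quoted in this post


def slot_limit_reached(row, slot_limit=SLOT_LIMIT):
    """True once the entry count sits at the limit (or one below it,
    depending on when the sample was taken)."""
    return row["CURRENT_COUNT_USED"] >= slot_limit - 1


def messages_removed_between(earlier_row, later_row):
    """Messages removed from the cache between two samples.

    Each deallocation corresponds to one removed message.
    """
    return later_row["COUNT_FREE"] - earlier_row["COUNT_FREE"]
```

Sampling the row periodically and comparing COUNT_FREE between samples gives a rough idea of how quickly old messages are being pushed out.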

Conclusion

Since MySQL 8.0.12, users can monitor the memory consumption of the XCom cache through queries to MySQL’s Performance Schema. Make sure to try this new feature and let us know about any feedback you have!
