Group Replication: Preventing stale reads on primary fail-over!

This is a follow up post about the new consistency option of Group Replication group_replication_consistency. We recommend that you first read this post to get an overview of scenarios that can be addressed with this solution.

A group replication cluster will automatically detect failures and adjust the view of the members that are active, i.e., the membership configuration. Furthermore, if a group is deployed in single-primary mode, when the membership changes there will be a check performed to figure out if there is still a primary member in the group. If there is none, a new one shall be selected from the list of secondary members. Typically, this is known as the secondary promotion.

Given the fact that the system detects failures and reconfigures itself automatically, the user may also expect that once the promotion takes place, the new primary is in the exact state, data-wise, as that of the old one. In other words, the user may expect that there is no backlog of replicated transactions to be applied on the new primary once he is able to read from and write to it. In practical terms, the user may expect that once his application fails-over to the new primary, there would be no chance, even if temporarily, to read old data or write into old data records.

Now… Whenever the user has flow control activated and properly tuned, the chances of transiently reading stale data from the new primary right after the promotion are pretty slim, as there should not be a backlog, or if there is one it should be small. Moreover, some group replication users have proxy/middleware layers that govern application accesses to the primary after a promotion and enforce the consistency criteria at that level.

Nonetheless, for those who do not, there was no real mechanism in place preventing the application from racing with the backlog applier and thus reading or writing into stale data. Not until MySQL 8.0.14.

Consistency Guarantees on Primary Fail-over

MySQL 8.0.14 onwards, the user can specify the behavior of the new primary once it is promoted. There is a new option named group_replication_consistency, which controls whether the new primary blocks both reads and writes until after the backlog is fully applied or if it operates in the pre MySQL 8.0.14 behavior. If the option is set to BEFORE_ON_PRIMARY_FAILOVER, then the new behavior described above is activated. In detail, if transactions are issued against a new primary which has the group_replication_consistency=’BEFORE_ON_PRIMARY_FAILOVER’, while it is still  applying the replication backlog, these will be blocked until  the backlog is fully applied. Thus, the following anomalies are prevented:

  •  No stale reads for read-only and read-write transactions. This prevents stale reads from being externalized to the application by the new primary.
  • No spurious rolling backs for read-write transactions, due to write-write conflicts with replicated read-write transactions still in the backlog to be applied.
  • No read skew on read-write transactions, such as:

    This query will most likely not cause a conflict but will write outdated
    values.

Note that with this option ON the user is stating that he favors consistency over availability. That is the type of trade-off the user has to consider.

How to use it

To enable hold read and writes when a primary election is occurring you need to set  group_replication_consistency=’BEFORE_ON_PRIMARY_FAILOVER’. By default the value is EVENTUAL that won’t hold any reads and writes will error out due to super_read_only being enabled.

group_replication_consistency is a system variable, and so you can inspect current value through the regular Perfomance_Schema, SELECT or SHOW commands.

Non-Blocked Queries

Not all reads are blocked whenever group_replication_consistency=’BEFORE_ON_PRIMARY_FAILOVER’. The user must still be able to inspect the server while it is applying backlog after a promotion took place. This is useful for debugging, monitoring, observability and troubleshooting.

Basically, queries that do not modify data are allowed, such as the following:

  • SHOW commands
  • SET option
  • DO
  • EMPTY
  • USE
  • SELECTing from performance_schema database
  • SELECTing from table PROCESSLIST on database infoschema
  • SELECTing from sys database
  • SELECT command that don’t use tables
  • SELECT command that don’t execute user defined functions
  • STOP GROUP_REPLICATION command
  • SHUTDOWN command
  • RESET PERSIST

Note that a transaction cannot be on-hold forever. If it exceeds wait_timeout it shall return ER_GR_HOLD_WAIT_TIMEOUT.

To guarantee that the group do not answer with stale data all members of the group should have group_replication_consistency=’BEFORE_ON_PRIMARY_FAILOVER’  persisted on its configuration , so when a secondary is being promoted will put on hold  transactions until primary applies all backlog.

Conclusion

Group Replication continues its development and this solution improves the guarantees of consistency on your data.

Please give it a try to this new feature, available on MySQL 8.0.14 and give us your feedback !!

10,448 total views, 26 views today

About Aníbal Pinto

I am a group replication developer since 2016, just in time of first release of Group Replication on a GA version, 5.7.17. My main focus is improving Group Replication be it by fixing details or adding new features to simplify users life when setting up High Availability solution.