Group Replication: coping with unreliable failure detection

Failure detection is a cornerstone of distributed systems: it determines which components are working properly and which are not, allowing the system to cope with both network and host instabilities and failures. Like many other distributed systems, Group Replication (GR) requires a majority of correctly operating group members to work properly. Each GR member includes a failure detector that determines the state of the other group members by analyzing all exchanged messages. If it doesn’t receive any message from another member for some time, it creates a suspicion about that member. When the suspicion times out, the member is expelled, i.e., removed from the group’s membership. Expelling failed members decreases the size of the group, which increases the chance that the remaining, correctly working members still form a majority and can keep processing clients’ requests. This blog post details how this expelling mechanism works and how it can be configured to defer the eviction of group members.
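The majority argument can be illustrated with a simplified model. The sketch below is an illustration, not GR's actual logic: it assumes members fail one at a time and, when expel_between_failures is True, that each failed member is expelled before the next failure happens. The helper names has_majority and survives are hypothetical.

```python
def has_majority(reachable: int, group_size: int) -> bool:
    """True if `reachable` members form a strict majority of `group_size`."""
    return reachable > group_size // 2


def survives(group_size: int, failures: int, expel_between_failures: bool) -> bool:
    """Simplified model: members fail one at a time.

    If `expel_between_failures` is True, each failed member is assumed to be
    expelled (removed from the group) before the next failure happens.
    Returns True if a majority of the current group stays reachable after every failure.
    """
    size, failed = group_size, 0
    for _ in range(failures):
        failed += 1
        if not has_majority(size - failed, size):
            return False
        if expel_between_failures:
            # Expelling shrinks the group, so the majority threshold drops too.
            size -= failed
            failed = 0
    return True


if __name__ == "__main__":
    print(survives(5, 3, expel_between_failures=False))  # False: 2 of 5 reachable
    print(survives(5, 3, expel_between_failures=True))   # True: 2 of 3 reachable
```

With 5 members and 3 sequential failures, keeping the failed members in the group leaves 2 of 5 reachable (no majority), whereas expelling them as they fail leaves 2 of 3 (still a majority).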

Any GR member takes 5 seconds to detect that another group member is not responding. After this detection period, the non-responding member is suspected of having failed and, by default, is immediately expelled from the group. Due to transient network failures or machine slowdowns, a member might stop communicating for a few seconds, so a slight degradation in performance or communication can be enough to get it expelled.
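The detection step can be pictured as a table of "last heard from" timestamps. The sketch below is a simplified model under stated assumptions, not GR's internal code: any message counts as a sign of life, a peer becomes a suspect once it has been silent for at least the 5-second detection period, and the FailureDetector class with its on_message and suspects methods is a hypothetical name used only for illustration.

```python
from dataclasses import dataclass, field

DETECTION_PERIOD = 5.0  # seconds of silence before a member becomes a suspect


@dataclass
class FailureDetector:
    """Tracks when each peer was last heard from (simplified model, not GR's code)."""
    detection_period: float = DETECTION_PERIOD
    last_heard: dict[str, float] = field(default_factory=dict)

    def on_message(self, member: str, now: float) -> None:
        # Any message from a peer counts as evidence that it is alive.
        self.last_heard[member] = now

    def suspects(self, now: float) -> set[str]:
        # A peer is suspected once it has been silent for at least the detection period.
        return {m for m, t in self.last_heard.items()
                if now - t >= self.detection_period}


if __name__ == "__main__":
    fd = FailureDetector()
    fd.on_message("node2", 0.0)
    fd.on_message("node3", 0.0)
    fd.on_message("node2", 4.0)   # node2 keeps talking, node3 goes silent
    print(fd.suspects(6.0))       # {'node3'}
```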

To avoid this issue, MySQL 8.0.13 introduces a new GR parameter, group_replication_member_expel_timeout, which determines the interval, in seconds, between the moment a member becomes a suspect and the moment it is actually expelled from the group, i.e., when the corresponding suspicion times out.
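Combining the 5-second detection period with the new parameter gives a simple timeline for a member that stops communicating. The sketch below is a simplified model, not GR's implementation: it ignores the periodic checking of suspicions and network jitter, and EXPELLED is only a label used here (in GR, an expelled member simply no longer appears in performance_schema.replication_group_members). The helper state_seen_by_majority is hypothetical.

```python
DETECTION_PERIOD = 5  # seconds of silence before a suspicion is created


def state_seen_by_majority(silent_for: float, expel_timeout: float) -> str:
    """State of a silent member as seen by the rest of the group (simplified model).

    ONLINE      -> not suspected yet
    UNREACHABLE -> suspected, but the suspicion has not timed out
    EXPELLED    -> suspicion timed out; the member is removed from the group
    """
    if silent_for < DETECTION_PERIOD:
        return "ONLINE"
    if silent_for < DETECTION_PERIOD + expel_timeout:
        return "UNREACHABLE"
    return "EXPELLED"


if __name__ == "__main__":
    # Default behavior (timeout 0): expelled right after the detection period.
    print(state_seen_by_majority(1, 0))     # ONLINE
    print(state_seen_by_majority(6, 0))     # EXPELLED
    # Timeout of 300 s: a member silent for 40 s is only UNREACHABLE.
    print(state_seen_by_majority(40, 300))  # UNREACHABLE
```

In this model, a disconnected member can still rejoin as long as it reconnects within DETECTION_PERIOD + expel_timeout seconds of going silent.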

Let’s analyze two different scenarios to assess the impact of varying the value of group_replication_member_expel_timeout, specifically how it affects the group’s behavior when faced with a non-responding member.

  • Scenario 1 – This scenario depicts the default GR behavior regarding suspect members. Suspicions are created, but since their lifetime is 0 seconds, suspect members are immediately expelled from the group.
  • Scenario 2 – This scenario demonstrates how one can defer the expulsion of a member from the group by increasing the value of the new parameter, and hence the lifetime of suspicions.

In both scenarios, we use a group with 3 members (node1, node2 and node3) connected through the group1 network, from which we disconnect and later reconnect node3. We used Docker containers for simplicity, but the commands can be adapted to other platforms. See the Setting up MySQL Group Replication with MySQL Docker images blog post for more details on this setup.
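The exact commands are not reproduced here. As a hedged sketch, assuming the Docker CLI and the container and network names given above, disconnecting and reconnecting node3 maps to docker network disconnect and docker network connect. The helpers network_cmd and run are hypothetical; run only prints the command unless dry_run is False.

```python
import subprocess

NETWORK = "group1"  # network name used in the scenarios


def network_cmd(action: str, container: str, network: str = NETWORK) -> list[str]:
    """Build a 'docker network connect|disconnect NETWORK CONTAINER' command."""
    if action not in ("connect", "disconnect"):
        raise ValueError(f"unsupported action: {action}")
    return ["docker", "network", action, network, container]


def run(cmd: list[str], dry_run: bool = True) -> None:
    # Always print the command; only execute it when dry_run is False.
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(network_cmd("disconnect", "node3"))  # isolate node3 from the group
    run(network_cmd("connect", "node3"))     # bring node3 back
```

If your setup relies on fixed IP addresses or network aliases, docker network connect accepts --ip and --alias to restore them on reconnection.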

Figure 1. Group with 3 members connected through the group1 network.
Figure 2. Group after disconnecting node3 from the group1 network.

Scenario 1: Default behavior

This scenario depicts the default behavior of the expelling mechanism, i.e., when group_replication_member_expel_timeout keeps its default value. Therefore, when a member becomes suspected of having failed, it is immediately expelled from the group, since the lifetime of its suspicion is 0 seconds.

The value of group_replication_member_expel_timeout is set to 0.

To trigger node3’s expulsion from the group, we disconnect it from the network.

One second after the disconnection, node3 is still considered ONLINE by the remaining group members.

More than 5 seconds after the disconnection, node3 has been expelled from the group and no longer shows up in the membership seen by the remaining members.

Notice that node3 still considers itself ONLINE, since it doesn’t know that it was expelled from the group.

Next, we reconnect node3 to the network to see what happens.

After reconnecting, node3 can communicate with node1 and node2 again, learns that it was expelled, and shuts itself down, as dictated by the default value of the group_replication_exit_state_action parameter, ABORT_SERVER.

Scenario 2: How to defer the expulsion of a member

In this scenario, we verify that an unreachable member is not expelled as long as its suspicion has not timed out, i.e., as long as it has not been unreachable for longer than the value of group_replication_member_expel_timeout. node3 is disconnected and, after some time, reconnected to the network, which allows it to rejoin the group.

group_replication_member_expel_timeout is set to 300 seconds on all nodes.
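One way to apply the same value on every node of the Docker setup is sketched below. It is hedged on several assumptions: the mysql client is available inside each container, the root account can log in non-interactively (adapt user and credentials to your setup), and set_expel_timeout_cmd and apply_to_all are hypothetical helper names. The SQL itself uses the standard SET GLOBAL syntax for a dynamic system variable and reads the value back with @@GLOBAL.

```python
import subprocess

NODES = ["node1", "node2", "node3"]  # container names used in the scenarios


def set_expel_timeout_cmd(node: str, seconds: int) -> list[str]:
    """Build a docker exec command that sets the timeout and reads it back.

    Assumption: the mysql client exists in the container and root can log in
    without a password prompt; adapt user and credentials to your setup.
    """
    sql = (
        f"SET GLOBAL group_replication_member_expel_timeout = {int(seconds)}; "
        "SELECT @@GLOBAL.group_replication_member_expel_timeout;"
    )
    return ["docker", "exec", node, "mysql", "-uroot", "-e", sql]


def apply_to_all(seconds: int, dry_run: bool = True) -> None:
    # Use the same value on every member to avoid surprising expulsions.
    for node in NODES:
        cmd = set_expel_timeout_cmd(node, seconds)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_to_all(300)  # prints the commands only; pass dry_run=False to run them
```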

node3 is disconnected from the network.

30 seconds after the disconnection, node3 is seen as UNREACHABLE by the remaining group members, and node3, in turn, sees node1 and node2 as UNREACHABLE.

To verify that node3 is able to rejoin the group, it is reconnected to the network 40 seconds after the disconnection.

One second after the reconnection, node3 is again considered ONLINE by the group, since we set group_replication_member_expel_timeout to 300 seconds to defer the expulsion of suspect members. However, node3 still considers node1 and node2 UNREACHABLE.

Let’s verify how node3’s view of the remaining members evolves.

Finally, 17 seconds after the reconnection, node3 sees node1 and node2 as ONLINE again. This delay is expected, since node3 first has to process the messages it missed while disconnected in order to catch up with the group.

Remember that if node3 had not been reconnected before its suspicion timed out, it would have been expelled from the group and would not be able to rejoin automatically.

Perks and Limitations

Any change to the value of group_replication_member_expel_timeout applies to existing suspicions as well as to future ones, which makes it possible to unblock suspicions that are already pending. For instance, if a suspicion was created 60 seconds ago and the parameter’s value is 3000 seconds, the suspicion has not yet timed out. However, if the value is changed to 30 seconds, the suspicion times out (60 seconds is longer than 30) and the corresponding member is expelled from the group.
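The key point is that the timeout is evaluated when a suspicion is checked, not fixed when the suspicion is created. The sketch below models that under this assumption and reproduces the 60-second example; ExpelSettings and Suspicion are hypothetical classes, not part of GR.

```python
from dataclasses import dataclass


@dataclass
class ExpelSettings:
    """Holds the current expel timeout; it can be changed at any time."""
    expel_timeout: float  # seconds


@dataclass
class Suspicion:
    member: str
    created_at: float  # time at which the suspicion was created, in seconds

    def timed_out(self, now: float, settings: ExpelSettings) -> bool:
        # The timeout is read when the suspicion is checked, not when it was
        # created, so a new value also applies to suspicions that already exist.
        return now - self.created_at >= settings.expel_timeout


if __name__ == "__main__":
    settings = ExpelSettings(expel_timeout=3000)
    suspicion = Suspicion("node3", created_at=0.0)
    print(suspicion.timed_out(60.0, settings))  # False: 60 s < 3000 s
    settings.expel_timeout = 30
    print(suspicion.timed_out(60.0, settings))  # True: 60 s >= 30 s
```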

Although it is not mandatory for all group members to use the same value for this parameter, it is recommended: given the distributed nature of the system, different values on different members can lead to unexpected behavior, including members being expelled unexpectedly.

Group membership reconfigurations are not allowed while there are UNREACHABLE members in the group. Even if most members are ONLINE, which allows the group to keep processing clients’ requests, you won’t be able to add or remove members until the UNREACHABLE ones either become reachable again or are expelled. If you must change the group’s membership, you can force pending suspicions to time out by lowering the value of group_replication_member_expel_timeout.
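The rule and the workaround can be sketched together. This is a simplified model, assuming that lowering the timeout immediately expels every UNREACHABLE member whose suspicion is at least that old; can_reconfigure and apply_expel_timeout are hypothetical helpers.

```python
def can_reconfigure(states: dict[str, str]) -> bool:
    """Membership changes are refused while any member is UNREACHABLE."""
    return all(state != "UNREACHABLE" for state in states.values())


def apply_expel_timeout(states: dict[str, str],
                        suspicion_age: dict[str, float],
                        expel_timeout: float) -> dict[str, str]:
    """Return the membership after expelling timed-out suspects.

    suspicion_age maps each suspected member to how long ago (in seconds) its
    suspicion was created; members without an entry are treated as age 0.
    """
    return {
        member: state
        for member, state in states.items()
        if not (state == "UNREACHABLE"
                and suspicion_age.get(member, 0.0) >= expel_timeout)
    }


if __name__ == "__main__":
    states = {"node1": "ONLINE", "node2": "ONLINE", "node3": "UNREACHABLE"}
    age = {"node3": 60.0}
    print(can_reconfigure(apply_expel_timeout(states, age, 300)))  # False
    print(can_reconfigure(apply_expel_timeout(states, age, 30)))   # True
```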

One limitation of the expelling mechanism is that a suspect member might not be expelled at the exact moment its suspicion times out: suspicions are checked periodically, so the expulsion may take a few more seconds.
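The effect of periodic checking can be quantified with a small model: the expulsion happens at the first check at or after the suspicion's deadline. The check interval used below (4 seconds) is a hypothetical value chosen for illustration, not GR's actual period, and expulsion_time is a hypothetical helper.

```python
import math


def expulsion_time(deadline: float, check_interval: float,
                   first_check: float = 0.0) -> float:
    """Time of the first periodic check at or after `deadline`.

    Models suspicions being evaluated only every `check_interval` seconds,
    starting at `first_check`.
    """
    if deadline <= first_check:
        return first_check
    ticks = math.ceil((deadline - first_check) / check_interval)
    return first_check + ticks * check_interval


if __name__ == "__main__":
    # Suspicion created at t=5 s with a 300 s timeout -> deadline at t=305 s.
    print(expulsion_time(305, 4))  # 308: expelled 3 s after the deadline
```

In this model the extra delay is always less than one check interval.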

Conclusion

This blog post describes the newly introduced group_replication_member_expel_timeout parameter and demonstrates how it allows GR users to defer the expulsion of group members that are suspected of having failed. This is most useful when dealing with slow or overloaded machines and networks, which could otherwise cause the remaining group members to expel members that are operating correctly, albeit slowly.

We hope this new GR parameter, introduced in MySQL 8.0.13, helps you improve the availability of your group. Please share your experience, feedback or questions with us through comments on this blog or MySQL bugs.

