MySQL Fabric – adding High Availability to MySQL

MySQL Fabric - High Availability and Scalability for MySQL

MySQL Fabric is a new framework that adds High Availability (HA) and/or scaling-out for MySQL. MySQL Fabric achieves scale-out by managing the sharding of table data between multiple MySQL Servers and then having Fabric-aware connectors route queries and transactions to the correct locations – scaling-out will be the subject of a future post and the rest of this article is focused on using MySQL Fabric for HA. It starts with an introduction to HA and how MySQL Fabric delivers it before going on to work through a full example of configuring a HA farm of MySQL Servers together with the code that the application developer needs to write in order to exploit it. Note that at the time of writing, MySQL Fabric is not yet GA but is available as a public alpha.

High Availability – Introduction

High Availability (HA) refers to the ability for a system to provide continuous service – a system is available while that service can be utilized. The level of availability is often expressed in terms of the “number of nines” – for example, a HA level of 99.999% means that the service can be used for 99.999% of the time, in other words, on average, the service is only unavailable for 5.25 minutes per year (and that includes all scheduled as well as unscheduled down-time).

Layers in architecture where High Availability is needed

The figure shows the different layers in the system that need to be available for service to be provided.

At the bottom is the data that the service relies on. Obviously, if that data is lost then the service cannot function correctly and so it’s important to make sure that there is at least one extra copy of that data. This data can be duplicated at the storage layer itself but with MySQL, it’s most commonly replicated by the layer above – the MySQL Server using MySQL Replication. The MySQL Server provides access to the data – there is no point in the data being there if you can’t get at it! It’s a common misconception that having redundancy at these two levels is enough to have a HA system but you also need to look at the system from the top-down.

To have a HA service, there needs to be redundancy at the application layer; in itself this is very straight-forward, just load balance all of the service requests over a pool of application servers which are all running the same application logic. If the service were something as simple as a random number generator then this would be fine but most useful applications need to access data and as soon as you move beyond a single database server (for example because it needs to be HA) then a way is needed to connect the application server to the correct data source. In a HA system, the routing isn’t a static function, if one database server should fail (or be taken down for maintenance) the application should be directed instead to an alternate database. Some HA systems implement this routing function by introducing a proxy process between the application and the database servers; others use a virtual IP address which can be migrated to the correct server. When using MySQL Fabric, this routing function is implemented within the Fabric-aware MySQL connector library that’s used by the application server processes.

MySQL Fabric delivers HA by adding a management and monitoring layer on top of MySQL Replication together with a set of Fabric-aware MySQL Connectors that route writes (and consistent reads) to the current master.

MySQL Fabric has the concept of a HA group which is a pool of two or more MySQL Servers; at any point in time, one of those servers is the Primary (MySQL Replication master) and the others are Secondaries (MySQL Replication slaves). The role of a HA group is to ensure that access to the data held within that group is always available.

Example MySQL Fabric HA Group

While MySQL Replication allows the data to be made safe by duplicating it, for a HA solution two extra components are needed and MySQL Fabric provides these:

  • Failure detection and promotion – the MySQL Fabric process monitors the Primary within the HA group and should that server fail then it selects one of the Secondaries and promotes it to be the Primary (with all of the other slaves in the HA group then receiving updates from the new master). Note that the connectors can inform MySQL Fabric when they observe a problem with the Primary and the MySQL Fabric process uses that information as part of its decision making process surrounding the state of the servers in the farm.
  • Routing of database requests – When MySQL Fabric promotes the new Primary, it updates the state store and notifies the connectors so that they can refresh their caches with the updated routing information. In this way, the application does not need to be aware that the topology has changed and that writes need to be sent to a different destination.

Worked Example

The following steps set up the HA MySQL configuration shown here before running some (Python) code against it and then finally the killing the Primary (replication Master) and observing that one of the slaves is automatically promoted.

Note that this configuration isn’t really HA as all of the MySQL Servers in the HA Group are actually running on the same machine; this configuration has been chosen for this post to illustrate that you can experiment with MySQL Fabric using a small number of machines (in fact, the MySQL Fabric process and its state store (another MySQL Server) could have been run on that same machine). Later posts will use more machines to demonstrate more realistic deployment topologies.

Building the HA MySQL Server Farm

The machines being used already have MySQL 5.6 installed (though in a custom location) and so the only software pre-requisite is to install the MySQL connector for Python from the “Development Releases” tab from the connector download page and MySQL Fabric (part of MySQL Utilities) from the “Development Releases” tab on the MySQL Utilties download page:

MySQL Fabric needs access to a MySQL Database to store state and routing information for the farm of servers; if there isn’t already a running MySQL Server instance that can be used for this then it’s simple to set one up:

MySQL Fabric needs to be able to access this state store and so a dedicated user is created (note that the fabric database hasn’t yet been created – that will be done soon using the mysqlfabric command):

All of the management requests that we make for MySQL Fabric will be issued via the mysqlfabric command. This command is documented in the MySQL Fabric User Guide but sub-commands can be viewed from the terminal using the list-commands option:

MySQL Fabric has its own configuration file (note that it’s location can vary depending on your platform and how MySQL Utilities were installed). The contents of this configuration file should be reviewed before starting the MySQL Fabric process (in this case, the mysqldump_program and mysqldump_program settings needed to be changed as MySQL was installed in a user’s directory):

The final step before starting the MySQL Fabric process is to create the MySQL Fabric schema within the state store:

An optional step is then to check for yourself that the schema is indeed there:

The MySQL Fabric process can now be started; in this case the process will run from the terminal from which it’s started but the --daemonize option can be used to make it run as a daemon.

If the process had been run as a daemon then it’s useful to be able to check if it’s actually running:

At this point, MySQL Fabric is up and running but it has no MySQL Servers to manage. As shown in the earlier diagram, three MySQL Servers will run on a single machine. Each of those MySQL Servers will need their own configuration settings to make sure that there are no resource conflicts – the steps are shown here but without any detailed commentary as this is standard MySQL stuff:

Now that the MySQL Servers are configured and up and running it’s possible to create the new HA Group (my_group and add the three new MySQL Server instances to it):

The mysqlfabric command can then be used to confirm that the HA group now contains the three servers but that they’re all still tagged as being Secondaries (in other words there is no MySQL Replication master):

mysqlfabric group promote is used to promote one of the servers within the my_group HA group to be the Primary/master:

Note that it would have been possible to include the uuid of the specific MySQL Server that should be promoted but as none was specified, the best way to know which was selected is to query the state information:

As an extra step, we can confirm that one of the other servers is indeed acting as a replication slave to the master:

The final step in configuring the HA system is to have MySQL Fabric start monitoring the servers so that it can promote a new Primary (and send new routing information to the connectors) in the event that the current Primary should fail:

Example Application Code

Now it’s the turn of the application developer to start using the new HA database server. Note that while they must make some minor changes to work with the HA group, they don’t need to care about what servers are part of the group or which of them is currently the Primary – this is all handled transparently by MySQL Fabric and the Fabric-aware connectors.

The first piece of example code will create the subscribers table within the test database and add a single subscriber record. Most of this is fairly standard and so only the MySQL Fabric-specific pieces will be commented on:

  • The fabric module from mysql.com is included
  • The application connects to MySQL Fabric rather than any of the MySQL Servers ({"host" : "localhost", "port" : 8080})
  • The mode property for the connection is set to fabric.MODE_READWRITE – in that way the operations are sent to the Primary server by the connector so that the changes will be replicated to all servers in the HA group.

This code can then be run:

To check that everything has worked as expected, one of the slave servers can be checked to confirm that the table and data is there:

The next piece of code can then be run to read the record back. The main thing to note in this sample is the connection’s mode property is set to fabric.MODE_READONLY which means that the connector is free to send the query to one of the slaves (optionally, you can configure MySQL Fabric to include the master in the connector’s round-robin algorithm).

This script can then be executed to retrieve the data:

Testing Automatic Failover

The final stage is to check that things work as planned when the Primary server stops; in other words:

  • MySQL Fabric will make one of the slaves be the new master
  • MySQL Fabric will update the state/routing data to reflect the new Primary server
  • The Fabric-aware connector is informed of the change, updates its cache and routes to the correct Primary

Before stopping a MySQL Server, we can confirm which one is currently the Primary before shutting it down:

The mysqlfabric command can then be used to confirm the state change (promotion of a new Primary server):

The following code reads the data but because it sets the mode property to fabric.MODE_READWRITE the connector will send the query to the Primary (this is how you can ensure that reads are not accessing stale data from a slave):

Typically, the failed MySQL Server would be recovered and then it makes sense to add it back into the HA group:

MySQL Fabric Architecture & Extensibility

MySQL Fabric has been architected for extensibility at a number of levels. For example, in the first release the only option for implementing HA is based on MySQL Replication but in future releases we hope to add further options (for example, MySQL Cluster). We also hope to see completely new applications around the managing of farms of MySQL Servers – both from Oracle and the wider MySQL community.

The following diagram illustrates how new applications and protocols can be added using the pluggable framework.

MySQL Fabric - Extensible Architecture

Next Steps

We really hope that people try out MySQL Fabric and let us know how you get on; one way is to comment on this post, another is to post to the MySQL Fabric forum or if you think you’ve found a bug then raise a bug report.

9 thoughts on “MySQL Fabric – adding High Availability to MySQL

  1. Hi,

    i am stuck at this point: What is reason of this error

    [ahsan@mysqls ~]$ mysqlfabric group add group-1 192.168.0.213:3306
    Procedure :
    { uuid = cb9bb4ca-718e-455f-8c67-69e78d6ef16f,
    finished = True,
    success = False,
    return = ServerError: Error accessing server (192.168.0.213:3306).,
    activities =
    }

    Reagrds
    MWM

  2. Hi,
    I have the same issue.
    > mysqlfabric group add group_sh1 172.16.154.120:6612

    return = ServerError: Error accessing server (172.16.154.120:6612).,

    I tried with:

    [servers]
    user = fabric
    password = 123456

    but still get the same error
    The connection using mysql client works fine

    mysql -ufabric -h172.16.154.120 -P6612 -p123456
    mysql>

    Thanks in advance
    Greg

    1. Greg & MWM,

      Your client is likely failing because it does not have sufficient privileges for the ‘fabric’ user. This appears to be a signifigant omission in the documentation above.

      The MySQL Fabric Whitepaper ( mysql_wp_fabric.pdf ) asks you to do run the following grants on all nodes.

      mysql -h 127.0.0.1 -P3306 -u root -e ‘GRANT ALL ON *.* TO fabric@”%”‘

      This is also mentioned at http://www.clusterdb.com/mysql-fabric/mysql-fabric-adding-high-availability-and-scaling-to-mysql

      However, this is a horrible example because it grants global privileges to everything on your database, which is highly insecure; not to mention it doesn’t tell us what tables we need to grant access to.

      I don’t understand why so many examples from mysql.com gloss over privileges like this. It’s very frustrating when key details like this are skipped, and it demonstrates poor security practices. I mean, sure they say “Don’t do this because it’s bad”, but then what privileges do we actually need to be granting?

      1. Hi Stefan,

        I can certainly see your point. I’ve taken the decision to go for the low security approach in the examples I use as that allows me to focus on what’s unique to the functionality that I’m blogging about rather than fine tuning firewall settings, user privileges etc. I realise that this may be setting a bad example but hope that people will recognise that they need to apply their normal security procedures.

        Andrew.

  3. Hi,

    I met a problem when I split a existing shard. Here is the detail:

    I create 3 groups, global-group, group_id-1 and group_id-2, then built a shard.
    Till now, everything is OK. But when I split the shard with a new group named group_id-3, there is an error occurred.
    The message is as follows:

    # mysqlfabric sharding split_shard 2 group_id-3 –split_value=100000
    Password for admin:
    Procedure :
    { uuid = a0eaff93-f6fe-4a7d-9c44-e7e53882fbfe,
    finished = True,
    success = False,
    return = BackupError: (‘Error while restoring the backup using the mysql clientn, %s’, “ERROR 1840 (HY000) at line 24 in file: ‘MySQL_132.228.239.19_3316.sql’: @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.n”),
    activities =
    }
    I run the “reset master” command first, but the error is still there.what is this error?

    Regards
    Longxing

    1. Hi, Shen;

      This may be a little late, but it might also help others:

      gtid_purged can in fact only be set on a slave right after running “reset master;” What’s more, some slave information is carried over that will prevent this from working correctly in some circumstances (when you use an xtrabackup snapshot taken from a live slave, for example) – in that case, you will also need to run “reset slave all;” (on the slave) before you finish preparing the slave to join the fabric group by issuing the “set global gtid_purged” command.

      Most importantly: I have noticed in some circumstances that removing a server from a fabric group does not correctly stop replication. Depending on your configuration, replication may start automatically on server restart as well. If replication was running while you tried to set the global gtid_purged id, the command is not applied – just try issuing a “stop slave;” before the “reset slave all; reset master; set global gtid_purged=’x’;”.

  4. I am so tempted to try this. I was setting up my home based development server in OpenSUSE 13.2 within virtualbox on a win7 machine. I stumbled on mysql-fabric-doctrine-1.4.0.zip inside /etc/mysql folder. Opened it up and asked myself, what the heck is this? Did it come with mysql-utilities package? Googled MySQLFabric and this is the first page I found.

    I have always setup 4 MySQL server when on windows. 3306-3309 I found myself very disappointed 3 years ago when I started playing with linux os and setting up my own LAMP servers. Simply because even till today I did not know how to create more than one instance of MySQL on a single linux machine. So fricken easy to do on windows. But the whole Linux rpm yast2 concept just does not allow multiple instances.

    So anyway, once I a lil reading of what MySQLFabric is about I instantly thought there may be something here about setting up more than one instance. So, not only wass this true, but it was even better. Turns out its about having multiple instances and how to make them work together. Even better. Better yet, how to work together when physically on different machines.

    I am seriously thinking about giving this a try. Anyway, very nice work goes out to the developer responsible for fabric. Awesome job. Hope this goes somewhere for you.

    Good Job, Cory in Bradenton

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter. * Time limit is exhausted. Please reload CAPTCHA.