MySQL Fabric – adding Scaling to MySQL


MySQL Fabric is a new framework that adds High Availability (HA) and/or scale-out to MySQL. This is the second in a series of posts on the new MySQL Fabric framework; the first article (MySQL Fabric – adding High Availability to MySQL) explained how MySQL Fabric delivers HA and then stepped through configuring and using it.

This post focuses on using MySQL Fabric to scale out both reads and writes across multiple MySQL Servers. It starts with an introduction to scaling out (by partitioning/sharding data) and how MySQL Fabric achieves it before going on to work through a full example of configuring sharding across a farm of MySQL Servers together with the code that the application developer needs to write in order to exploit it. Note that at the time of writing, MySQL Fabric is not yet GA but is available as a public alpha.

Scaling Out – Sharding

When nearing the capacity or write-performance limit of a single MySQL Server (or HA group), MySQL Fabric can be used to scale out the database by partitioning the data across multiple MySQL Server “groups”. Note that a group could contain a single MySQL Server or it could be an HA group.

[Figure: MySQL Fabric cluster]

The administrator defines how data should be partitioned/sharded between these servers by creating shard mappings. A shard mapping applies to a set of tables; for each table, the administrator specifies the column to be used as the shard key (MySQL Fabric will later use that key to calculate which shard a given row belongs to). Because all of these tables use the same shard key and mapping, rows that share a key value end up in the same shard – allowing a single transaction to access all of them. For example, if the subscriber-id column is used across multiple tables, then all of the data for a specific subscriber will be in the same shard. The administrator then defines how that shard key should be used to calculate the shard number:

  • HASH: A hash function is run on the shard key to generate the shard number. Provided the values in the sharding-key column are not heavily repeated, this should result in an even partitioning of rows across the shards.
  • RANGE: The administrator defines an explicit mapping between ranges of shard-key values and shards. This gives the user maximum control over how data is partitioned and which rows are co-located.
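
For illustration, this is how the two mapping types are declared with the mysqlfabric command; the global group name used here is a placeholder and the exact syntax may vary between releases (a RANGE mapping is created as part of the worked example later in this post):

```
# Declare how the shard key should be mapped to shards for a set of tables;
# "my-global-group" is a placeholder for the name of the global group
mysqlfabric sharding define HASH my-global-group
mysqlfabric sharding define RANGE my-global-group
```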

When the application needs to access the sharded database, it sets a property for the connection that specifies the sharding key – the Fabric-aware connector will then apply the correct range or hash mapping and route the transaction to the correct shard.

If further shards/groups are needed then MySQL Fabric can split an existing shard into two and then update the state-store and the caches of routing data held by the connectors. Similarly, a shard can be moved from one HA group to another.
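
For illustration only – the sub-command names below match later MySQL Fabric releases and may differ in the alpha, and the shard ID and target group are placeholders:

```
# Split shard 1, placing half of its rows in a new shard stored in group_id-3
mysqlfabric sharding split_shard 1 group_id-3

# Or move an existing shard to a different group
mysqlfabric sharding move_shard 1 group_id-3
```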

Note that a single transaction or query can only access a single shard and so it is important to select shard keys based on an understanding of the data and the application’s access patterns. It doesn’t always make sense to shard all tables as some may be relatively small and having their full contents available in each group can be beneficial given the rule about no cross-shard queries. These global tables are written to a ‘global group’ and any additions or changes to data in those tables are automatically replicated to all of the other groups. Schema changes are also made to the global group and replicated to all of the others to ensure consistency.

To get the best mapping, it may also be necessary to modify the schema if there isn’t already a ‘natural choice’ for the sharding keys.

Worked Example

The following steps set up the sharded MySQL configuration shown here before running some (Python) code against it – with queries and transactions routed to the correct MySQL Server.

Building the Sharded MySQL Server Farm

[Figure: Sharding using MySQL Fabric]

The machines being used already have MySQL 5.6 installed (though in a custom location), so the only software pre-requisites are the MySQL Connector for Python, from the “Development Releases” tab of the connector download page, and MySQL Fabric (part of MySQL Utilities), from the “Development Releases” tab of the MySQL Utilities download page:
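
For example, on an RPM-based system the downloaded packages might be installed like this (the file names are illustrative – use whichever versions you downloaded):

```
sudo rpm -ivh mysql-connector-python-*.rpm
sudo rpm -ivh mysql-utilities-*.rpm
```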

MySQL Fabric needs access to a MySQL Database to store state and routing information for the farm of servers; if there isn’t already a running MySQL Server instance that can be used for this then it’s simple to set one up:
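
A minimal sketch of bringing up a dedicated instance for the state store – the basedir, datadir, socket and port below are illustrative and should be adapted to your installation:

```
# Initialise a data directory and start a MySQL 5.6 instance to act as the
# MySQL Fabric state store (paths and port are examples only)
/home/user/mysql/scripts/mysql_install_db \
    --basedir=/home/user/mysql --datadir=/home/user/fabric_data
/home/user/mysql/bin/mysqld --datadir=/home/user/fabric_data --port=3306 \
    --socket=/home/user/fabric_data/mysql.sock &
```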

MySQL Fabric needs to be able to access this state store and so a dedicated user is created (note that the fabric database hasn’t yet been created – that will be done soon using the mysqlfabric command):
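
For example (the user name and password are your choice – they just need to match what goes into the MySQL Fabric configuration file below):

```
mysql -h 127.0.0.1 -P 3306 -u root -p

mysql> CREATE USER 'fabric'@'localhost' IDENTIFIED BY 'secret';
mysql> GRANT ALL ON fabric.* TO 'fabric'@'localhost';
```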

All of the management requests that we make for MySQL Fabric will be issued via the mysqlfabric command. This command is documented in the MySQL Fabric User Guide but sub-commands can be viewed from the terminal using the list-commands option:
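
For example:

```
# List the available mysqlfabric sub-commands
mysqlfabric list-commands
```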

MySQL Fabric has its own configuration file (note that its location can vary depending on your platform and how MySQL Utilities were installed). The contents of this configuration file should be reviewed before starting the MySQL Fabric process (in this case, the mysqldump_program and mysqlclient_program settings needed to be changed as MySQL was installed in a user’s directory):
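
The fragments that mattered for this setup look something like the following – all of the values are illustrative, and in particular the two program paths point at the user’s private MySQL installation:

```
[storage]
address  = localhost:3306
user     = fabric
password = secret
database = fabric

[sharding]
mysqldump_program   = /home/user/mysql/bin/mysqldump
mysqlclient_program = /home/user/mysql/bin/mysql
```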

The final step before starting the MySQL Fabric process is to create the MySQL Fabric schema within the state store:
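
The schema is created with the manage setup sub-command, which reads the state-store address and credentials from the configuration file:

```
mysqlfabric manage setup
```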

An optional step is then to check for yourself that the schema is indeed there:
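
For example, by connecting to the state store and listing the tables in the fabric database (credentials and port as configured above):

```
mysql -h 127.0.0.1 -P 3306 -u root -p -e "SHOW TABLES IN fabric"
```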

The MySQL Fabric process can now be started; in this case the process will run from the terminal from which it’s started but the --daemonize option can be used to make it run as a daemon.
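
For example:

```
# Run in the foreground (Ctrl-C to stop it) ...
mysqlfabric manage start

# ... or detach it and run it in the background
mysqlfabric manage start --daemonize
```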

If the process had been run as a daemon then it’s useful to be able to check if it’s actually running:
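
One quick check is to look for the process itself; if your release includes it, the manage ping sub-command asks the MySQL Fabric node directly:

```
ps -ef | grep mysqlfabric | grep -v grep
mysqlfabric manage ping
```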

At this point, MySQL Fabric is up and running but it has no MySQL Servers to manage. As shown in the earlier diagram, three MySQL Servers will run on a single machine. Each of those MySQL Servers needs its own configuration settings to make sure that there are no resource conflicts – the steps are shown here without detailed commentary as this is standard MySQL stuff:
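
As a sketch, each server gets its own configuration file with a unique port, datadir, socket and server-id, plus the binary-logging and GTID settings that MySQL Fabric’s replication-based groups rely on. The ports 3316, 3317 and 3318 are simply the illustrative values used for the rest of this walkthrough:

```
# my1.cnf - repeated as my2.cnf/my3.cnf with ports 3317/3318, their own
# datadir/socket and a unique server-id
[mysqld]
datadir                  = /home/user/data1
port                     = 3316
socket                   = /home/user/data1/mysql.sock
server-id                = 1
log-bin                  = mysql-bin
gtid-mode                = ON
enforce-gtid-consistency = true
log-slave-updates        = true
```

Each instance is then started with its own file (mysqld --defaults-file=my1.cnf & and so on).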

At this point, the MySQL Fabric process (and its associated state store) is up and running, as are the MySQL Servers that will become part of the Fabric server farm. The next step is to define the groups (and assign a server to each one), the mappings that will be used to map from shard keys to shards, and finally the shards themselves.

The first group that’s created is the global group group_id-global which is where all changes to the data schema or to the global tables (those tables that are duplicated in every group rather than being sharded) are sent and then replicated to all of the other groups.

After that, the two groups that will contain the sharded table data are created – group_id-1 and group_id-2.
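
For example:

```
mysqlfabric group create group_id-1
mysqlfabric group create group_id-2
```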

The three groups have now been created but they’re all empty and so the next step is to assign a single MySQL Server to each one.
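
Each server is added with the group add sub-command. The host:port values are the illustrative ports from the configuration step above; in the alpha the command also took the MySQL user and password to use for each server (later releases moved those into the configuration file):

```
mysqlfabric group add group_id-global 127.0.0.1:3316 root ""
mysqlfabric group add group_id-1      127.0.0.1:3317 root ""
mysqlfabric group add group_id-2      127.0.0.1:3318 root ""
```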

Optionally, the mysqlfabric command can then be used to confirm this configuration.
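
For example:

```
# List all of the groups known to MySQL Fabric ...
mysqlfabric group lookup_groups

# ... and the servers within a particular group
mysqlfabric group lookup_servers group_id-1
```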

Even though each of these groups contains a single server, it’s still necessary to promote those servers to be Primaries so that the Fabric-aware connectors will send writes to them.
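
For example:

```
mysqlfabric group promote group_id-global
mysqlfabric group promote group_id-1
mysqlfabric group promote group_id-2
```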

Shard mappings are used to map shard keys to shards and they can be based on ranges or on a hash of the shard key – in this example, a single mapping will be created and it will be based on ranges. When creating the shard mapping, the name of the global group must be supplied – in this case group_id-global.
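
For example:

```
# Create a RANGE-based shard mapping, naming the global group
mysqlfabric sharding define RANGE group_id-global
```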

From the command’s return value, you can observe that the ID given to the shard mapping is 1 – that can optionally be confirmed by checking the metadata in the state store:
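
For example, by querying the mapping tables in the fabric schema directly (the table name shown here is from the alpha state-store schema and may differ in other releases):

```
mysql -h 127.0.0.1 -P 3306 -u root -p -e "SELECT * FROM fabric.shard_maps"
```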

The next step is to define the sharding key for each of the tables that we want to be partitioned as part of this mapping. In fact, this example is only sharding one table test.subscribers but the command can be repeated for multiple tables. The name of the column to be used as the sharding key must also be supplied (in this case sub_no) as must the ID for the shard mapping (which we’ve just confirmed is 1).
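
For example (the sub-command is shown as add_mapping, as in the alpha; later releases renamed it):

```
# Shard test.subscribers on its sub_no column, as part of shard mapping 1
mysqlfabric sharding add_mapping 1 test.subscribers sub_no
```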

Again, the state store can be checked to confirm that this has been set up correctly.

The next step is to define the shards themselves. In this example, all values of sub_no from 1-9999 will be mapped to the first shard (which will be associated with group_id-1) and from 10000 and up to the second shard (group_id-2). Again the shard mapping ID (1) must also be provided.
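
For example (argument order follows the alpha; the final value on each line is the lower bound of that shard’s range):

```
# sub_no 1-9999 -> group_id-1, sub_no 10000 and up -> group_id-2
mysqlfabric sharding add_shard 1 group_id-1 ENABLED 1
mysqlfabric sharding add_shard 1 group_id-2 ENABLED 10000
```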

Example Application Code

Now that the MySQL Fabric farm is up and running we can start running some code against it. There are currently Fabric-aware connectors for PHP, Python and Java – for this post, Python is used.

Note that while applications must make some minor changes to work with the sharded database, they don’t need to care about which servers are part of the farm, what shards exist or where they’re located – this is all handled by MySQL Fabric and the Fabric-aware connectors. What they do need to do is provide the hints the connector needs to figure out where to send each query or transaction.

The first piece of example code will create the subscribers table within the test database. Most of this is fairly standard and so only the MySQL Fabric-specific pieces will be commented on:

  • The fabric module from mysql.connector is imported
  • The application connects to MySQL Fabric rather than any of the MySQL Servers ({"host" : "localhost", "port" : 8080})
  • The scope property for the connection is set to fabric.SCOPE_GLOBAL – in that way the connector sends the operations to the global group so that the schema changes are replicated to all of the other groups (the same would be true when writing to a non-sharded, global table).
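
A minimal sketch of that script is shown below; the connection arguments and property names follow the description above and the alpha Connector/Python API, so treat them as indicative rather than definitive:

```python
# Sketch of the table-creation script described above. The fabric={...}
# connection arguments and the property names (scope, mode, tables) follow
# the text; details may differ between Connector/Python releases.
import mysql.connector
from mysql.connector import fabric

# Connect to MySQL Fabric itself, not to any individual MySQL Server
conn = mysql.connector.connect(
    fabric={"host": "localhost", "port": 8080},
    user="root", password="", autocommit=True
)

# SCOPE_GLOBAL sends the DDL to the global group so that it is replicated to
# every other group; the tables list tells the connector which shard mapping
# (and hence which global group) is involved
conn.set_property(tables=["test.subscribers"],
                  scope=fabric.SCOPE_GLOBAL,
                  mode=fabric.MODE_READWRITE)

cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS test")
cur.execute("CREATE TABLE IF NOT EXISTS test.subscribers ("
            "  sub_no INT, "
            "  first_name CHAR(40), "
            "  last_name CHAR(40)"
            ")")
conn.close()
```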

The code can then be executed:

The next piece of application code adds some records to the test.subscribers table. To ensure that the connector can route each transaction to the correct group, the following properties are set for the connection:

  • scope is set to fabric.SCOPE_LOCAL (i.e. not global);
  • tables is set to "test.subscribers", which allows the connector to select the correct mapping;
  • key is set to the value of sub_no being used for the row in the current transaction, so that the connector can perform a range test on it to find the correct shard;
  • mode is set to fabric.MODE_READWRITE (if [HA groups](http://www.mysqlhighavailability.com/mysql-fabric/mysql-fabric-adding-high-availability-to-mysql "MySQL Fabric – adding High Availability to MySQL") were being used then this would tell the connector to send the transaction to the Primary).
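
A sketch of that script (the subscriber rows are purely illustrative; the same API caveats as the previous example apply):

```python
# Sketch of the insert script described above; the rows added are purely
# illustrative data.
import mysql.connector
from mysql.connector import fabric

def add_subscriber(conn, sub_no, first_name, last_name):
    # Route this transaction to the shard that owns this sub_no value
    conn.set_property(scope=fabric.SCOPE_LOCAL,
                      tables=["test.subscribers"],
                      key=sub_no,
                      mode=fabric.MODE_READWRITE)
    cur = conn.cursor()
    cur.execute("INSERT INTO subscribers (sub_no, first_name, last_name) "
                "VALUES (%s, %s, %s)",
                (sub_no, first_name, last_name))

conn = mysql.connector.connect(
    fabric={"host": "localhost", "port": 8080},
    user="root", password="", database="test", autocommit=True
)

# Values below and above 10000 should land in different shards
add_subscriber(conn, 72,    "Billy",  "Fish")
add_subscriber(conn, 500,   "Billy",  "Joel")
add_subscriber(conn, 20000, "Arthur", "Askey")
add_subscriber(conn, 21000, "Jimmy",  "White")
conn.close()
```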

This application code is then run, and then the servers from each of the groups are queried to confirm that the data has been sharded as expected (based on the 1-9999 and 10000+ range definition).
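
For example, querying each shard’s server directly (3317 and 3318 are the illustrative ports used for group_id-1 and group_id-2 in this walkthrough):

```
mysql -h 127.0.0.1 -P 3317 -u root -e "SELECT * FROM test.subscribers"
mysql -h 127.0.0.1 -P 3318 -u root -e "SELECT * FROM test.subscribers"
```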

The final piece of application code reads back the rows. Note that in this case the connection’s mode property is set to fabric.MODE_READONLY, which tells the connector that if [HA groups](http://www.mysqlhighavailability.com/mysql-fabric/mysql-fabric-adding-high-availability-to-mysql "MySQL Fabric – adding High Availability to MySQL") were being used then the queries could be sent to any of the Secondaries.
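
A sketch of the read-back script (same API caveats as before):

```python
# Sketch of the read-back script described above; MODE_READONLY allows the
# connector to use a Secondary when HA groups are in place.
import mysql.connector
from mysql.connector import fabric

def find_subscriber(conn, sub_no):
    # Route the query to the shard that owns this sub_no value
    conn.set_property(scope=fabric.SCOPE_LOCAL,
                      tables=["test.subscribers"],
                      key=sub_no,
                      mode=fabric.MODE_READONLY)
    cur = conn.cursor()
    cur.execute("SELECT first_name, last_name FROM subscribers "
                "WHERE sub_no = %s", (sub_no,))
    for (first_name, last_name) in cur:
        print("%s: %s %s" % (sub_no, first_name, last_name))

conn = mysql.connector.connect(
    fabric={"host": "localhost", "port": 8080},
    user="root", password="", database="test", autocommit=True
)

for sub_no in (72, 500, 20000, 21000):
    find_subscriber(conn, sub_no)
conn.close()
```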

MySQL Fabric Architecture & Extensibility

MySQL Fabric has been architected for extensibility at a number of levels. For example, in the first release the only option for implementing HA is based on MySQL Replication, but in future releases we hope to add further options (for example, MySQL Cluster). We also hope to see completely new applications built around the management of farms of MySQL Servers – both from Oracle and the wider MySQL community.

The following diagram illustrates how new applications and protocols can be added using the pluggable framework.

[Figure: MySQL Fabric - Extensible Architecture]

Next Steps

We really hope that you try out MySQL Fabric and let us know how you get on; one way is to comment on this post, another is to post to the MySQL Fabric forum, and if you think you’ve found a bug then please raise a bug report.
