Efficient JSON Replication in MySQL 8.0

MySQL is not only a relational database, but can also be used as a schwyriwyuemaless/NOSQL document store, or a mix of both. This is realized by the JSON datatype, and is useful for any data that is hard to fit in the ”tabular” format of a traditional table. For instance, the inventory of a shop for gardening equipment might contain rows for various types of gloves, as well as rows for wheelbarrows. The glove rows might have size, material, and waterproofness properties, whereas rows for wheelbarrows might have capacity properties measured in both litres and kilograms. So not only the values but also the mere existence of those metadata fields differs between rows in the table. Such fields are conveniently stored inside a JSON column.

wheelbarrow-legend
In MySQL 8.0.3, we made the replication of JSON much more efficient. The new feature, called Partial JSON Replication, ensures that an UPDATE statement that only modifies parts of a JSON document will not write the full document to the binary log, but rather only those parts that were actually changed. Thus, the bigger your JSON document is, and the smaller part you modify, the more space will you save by using this feature. This reduces the disk usage for binary logs and relay logs, as well as the amount of data that needs to be sent over the network. Therefore, the throughput of the entire system can improve so that you get more transactions per second for JSON-heavy workloads.

Requirements

The feature works when the UPDATE for the JSON column consists of the following JSON functions: JSON_SET, JSON_REPLACE, JSON_REMOVE. It also works for combinations of these functions. Here is an example:

These JSON functions operate on a JSON document. The JSON document must be taken from the same column and row that is being updated. In other words, this does not work for copying values between different columns.

This is implemented for row-based replication (binlog_format=row); for statement-based replication such statements were already replicated efficiently.

Enabling the feature

The feature is disabled by default, in order to keep backward compatibility with any application that relies on having the full document in the binary log. To enable the feature, set this configuration option:

You can of course set this dynamically, too, using SET GLOBAL binlog_row_value_options = PARTIAL_JSON (or SET SESSION or SET PERSIST depending on your needs). In all cases it requires the SYSTEM_VARIABLES_ADMIN or SUPER privilege.

In addition, note that by default all columns are included in row updates. An update consists of both a before_image (the row before the update) and an after_image (the row after the update). Partial JSON only applies to the after image, so this option alone can only remove half the full documents. To get rid of the other half, consider using this option together with --binlog-row-image=MINIMAL (or SET GLOBAL binlog_row_image = MINIMAL, or SET SESSION, or SET PERSIST). This ensures that full documents are not sent even in the before_image.

More human-readable mysqlbinlog

As part of this work, we also made mysqlbinlog print both full JSON documents and partial JSON updates correctly, when using the -v option to decode row events. This makes the output more human-readable (applying the binlog on a server always worked). Here is an example:

Low-level format

If you are into the bits and bytes of the binary log, here is a quick overview of the format (most users do not need to know this):

  • When binlog_row_value_options=PARTIAL_JSON, we use a new event type code, PARTIAL_UPDATE_ROWS_EVENT, with the numeric value 39.
  • At the heart of any row update in the binary log are the before_image and the after_image. The before_image contains values for columns that the applier can use to find the row in the table (e.g. the primary key), and the after_image contains new values that will be written to the row. For events with the type code PARTIAL_UPDATE_ROWS_EVENT, we additionally store a so-called shared_image, between the before_image and the after_image. The shared_image has one of the following structures:
    value_options0
    or
    operation2
    where value_options is a single byte with the value 0 or 1, and partial_columns is a bitmap having one bit for each JSON column. When value_options=0, all JSON columns are stored in full format. When value_options=1, all JSON columns included in the before_image are still stored in full format, whereas JSON columns included in the after_image may be stored in either partial format or full format. The nth bit of partial_columns is 1 if the nth JSON column in the after_image is stored in partial format, and 0 if stored in full format. (We count bits starting at the least significant bit of the first byte.)
  • The format for JSON columns stored in full format is the same as in 5.7 and earlier:
    length_data
    where length is the length of data in bytes in 32 bit little endian format, and data is a binary encoding of JSON.
  • The format for JSON columns in partial format is:
    length-diff1-diff2-diffn
    where length is total the length of all the diffi fields, in bytes in 32 bit little endian format, and each diffi field has one of the following forms:
    operation01
    or
    operation2
    where:

    • operation is a 1 byte constant. It is 0 when the object at path will be replaced by data; 1 when data will be inserted at path without replacing existing objects; and 2 when the object at path will be removed.
    • path_length is the length in bytes of path, and data_length is the length in bytes of data. These use a variable-length integer encoding which occupies 1, 3, 4, or 9 bytes (see the source code in mysys/pack.cc).
    • path is the JSON path, as a string in utf8mb4 format (no null termination).
    • data is the JSON document that will be inserted, in binary JSON format.

In case the partial format would take more space than the full document, it will fall back on storing the full document instead.

See also WL#2955 for details.

Summary

When you update only a small part of a JSON document using JSON_SET, JSON_REMOVE, or JSON_REPLACE, we now replicate only that small part, and not the remaining, unchanged parts of the document, if you have previously set binlog_row_value_options=PARTIAL_JSON. This saves disk space.

About Sven Sandberg

Sven Sandberg is a Principal Software Engineer and leads a part of the replication team at Oracle. He has a PhD in software verification from Uppsala University, Sweden, on the subject of games and probabilistic infinite-state systems.

Leave a Reply

Your email address will not be published. Required fields are marked *

Please enter. * Time limit is exhausted. Please reload CAPTCHA.