Indexing JSON Data in MySQL

Storing JSON in a relational database is something developers have done for quite some time. There are a variety of reasons why you would want to store JSON in a database table - user preferences and configuration data are two good examples. The JSON Data Type was introduced to MySQL in version 5.7.8. This data type allows us to store valid JSON in a database column and run queries based on the values in the JSON.
The Potential Issue
When storing JSON data in MySQL, we can query the database based on values within that JSON. We can accomplish this using a variety of JSON specific MySQL functions. The potential issue is that query performance can degrade over time as the number of rows increases. When this happens with other data types, one solution to help query performance is to add an index to one or more columns. Since MySQL 8.0.13, we have had the ability to create functional indexes. Functional indexes allow us to create indexes based on expressions, not column data. We can leverage this feature to create an index based on JSON values.
Getting Started
Before creating an index, let’s create a simple table with a column containing JSON data.
Once we have the table created, let’s add some data.
Here is a more readable example of the JSON we are storing.
Running a Query
Here is a query we can use to filter our data based on the vehicle manufacturer.
Note: The ->>
is shorthand for JSON_EXTRACT()
within JSON_UNQUOTE()
While there are no performance issues with this query for our small dataset, there could be when more data is inserted into our table.
Let’s see what the explain plan for this query looks like by running the following:
Note: We use \G
at the end of the command to get more readable output in the command line interface.
Here are the results:
The portions of the results we are interested in are the values of possible_keys
and key
. As we can see, these values are NULL
, meaning no indexes exist that can be used for this query.
Adding the Index
To address the potential performance issues with this query, we can create an index based on the values of our JSON. Here is what this SQL command would look like:
After we have run this command, we can look at the index definition by using this command:
The results of this command should list two indexes - the index for our primary key and the index we just created.
When we look at the expression
property of our new index, we can see that the text vehicle_data->>"$.manufacturer"
has been replaced with json_unquote(json_extract(vehicle_data,_utf8mb4\'$.manufacturer\'))
.
You may wonder why we used CAST()
and COLLATE
to create this index.
First, the function JSON_UNQUOTE()
returns a data type of LONGTEXT
. The data type LONGTEXT
cannot be used in an index, so we need to CAST()
the results to a data type that can be indexed. In this example, CHAR(255)
.
Next, we use COLLATE
because the functions used to extract data (used in the WHERE
statement in our query) are collated to utf8mb4_bin
. However, when we cast a string without using COLLATE,
it is cast to utf8mb4_0900_ai_ci
. When the collation of what is stored in the index does not match the collation of the string in our WHERE
clause, the index will not get used.
Checking Our Work
After we create the index, let’s check on our explain plan by rerunning this command:
The result should look similar to this:
We can see that possible_key
and key
now have a value indicating that our new index is being used to execute this query.
Wrap up
Storing JSON data in a relational database is something developers have been doing long before a JSON
data type existed. The JSON data type allows us to store valid JSON data and run queries based on the values in the JSON object. By using functional indexes, we can help the performance of those queries in the same way an index on other data types helps boost performance.
Photo by Maksym Kaharlytskyi on Unsplash