Using SQL Server to manipulate JSON data structures, including a boolean gotcha
Photo by Alain Pham on Unsplash
The problem I was solving
I recently did some work to update the structure of some JSON stored in a SQL Server database. The JSON started out as a flat list of properties, and I wanted to add some structure by moving the properties into nested JSON objects.
The original structure was like this:
{
"key1": "value1",
"key2": "value2",
"key3": "value3"
}
I wanted to update it to be like this:
{
"group1": {
"key1": "value1",
"key2": "value2"
},
"group2": {
"key3": "value3"
}
}
I was using Entity Framework 6, so I wrote a migration which builds up a string of SQL within the C# migration file, then executes it by passing it to the built-in Sql
function which EF6 exposes.
Initial solution
Initially this seemed quite straightforward, and I wrote the following helper function which I could call for each property.
private void CopyDatumIntoJsonGroup(string groupName, string key)
{
Sql($@"
UPDATE dbo.JsonTable
SET JsonColumn = JSON_MODIFY(JsonColumn, '$.{groupName}.{key}', JSON_VALUE(JsonColumn, '$.{key}'));");
}
This uses JSON_VALUE
to get the current value, and JSON_MODIFY
to write it into the new structure. (I then had a separate helper function which deleted the original value once it had been copied.)
At first glance, this function worked fine. I tested with various values, and moved on. However, I later discovered a scenario where this doesn't work as expected.
Discovering a gotcha
Some of my JSON data contained booleans. The JSON looked like this:
{
"value1": true,
"value2": false
}
Once I'd used my helper function to move these into a group, I was left with JSON like this:
{
"group1": {
"value1": "true",
"value2": "false"
}
}
The key point here is that the booleans have been converted to strings - i.e. true
and false
have been replaced by "true"
and "false"
, and those quotes change the data type which the JSON data represents.
Improving the solution
My next attempt was to create a second helper function which checks for boolean values and handles them accordingly. If the boolean value is cast as a BIT
then JSON_MODIFY
will treat it as a boolean, rather than a string.
This is what I came up with:
private void MoveDatumIntoJsonGroup(string groupName, string key)
{
var value = GetPossibleBoolean($"JSON_VALUE(JsonColumn, '$.{key}')");
Sql($@"
UPDATE dbo.JsonTable
SET JsonColumn = JSON_MODIFY(JsonColumn, '$.{groupName}.{key}', {value});");
}
/// <summary>
/// When writing a value using JSON_MODIFY, a JSON boolean (which is represented as
/// the word true/false without any quotes) will be interpreted by SQL as the string "true"/"false".
/// To handle this, we first check for those values and cast them as bits, which SQL Server uses
/// to represent booleans. JSON_MODIFY will then write these as JSON booleans, not JSON strings.
/// </summary>
/// <param name="input">SQL which will get a JSON value which could be a boolean.</param>
/// <returns>SQL which will return the input cast to a BIT if it's a boolean, or the original input if not.</returns>
private static string GetPossibleBoolean(string input) => $@"
CASE
WHEN {input} IN ('true', 'false')
THEN TRY_CAST({input} AS BIT)
ELSE {input}
END";
However, this has the problem that the CASE
statement can return two different data types (a BIT
for a boolean, or whatever data type the input SQL would return if not) which is not permitted.
Final solution
The solution to problem this is to include the whole of JSON_MODIFY
within the CASE
.
private void MoveDatumIntoJsonGroup(string groupName, string key)
{
var updatedSql = GetJsonModify("JsonColumn", $"'$.{groupName}.{key}'", $"JSON_VALUE(JsonColumn, '$.{key}')");
Sql($@"
UPDATE dbo.JsonTable
SET JsonColumn = {updatedSql};"
}
/// <summary>
/// When writing a value using JSON_MODIFY, a JSON boolean (which is represented as
/// the word true/false without any quotes) will be interpreted by SQL as the string "true"/"false".
/// To handle this, we first check for those values and cast them as bits, which SQL Server uses
/// to represent booleans. JSON_MODIFY will then write these as JSON booleans, not JSON strings.
/// </summary>
/// <param name="expression">A SQL expression which returns JSON - typically the name of a JSON column.</param>
/// <param name="path">The path to the JSON which should be modified.</param>
/// <param name="newValueSql">SQL which returns the new value which should be written to the <paramref name="path"/>.</param>
/// <returns>The updated value of <paramref name="expression"/> as properly formatted JSON text.</returns>
private string GetJsonModify(string expression, string path, string newValueSql)
{
// The only part of the JSON_MODIFY which is different is the 3rd parameter
// but a SQL CASE statement cannot return different data types from the different cases,
// so we cannot have a single JSON_MODIFY and use CASE within it
// (because this would mean the CASE could return a BIT or the data type returned by newValueSql).
// We have the JSON_MODIFY repeated so the CASE always returns a JSON string.
return $@"
CASE
WHEN {newValueSql} IN ('true', 'false')
THEN JSON_MODIFY({expression}, {path}, TRY_CAST({newValueSql} AS BIT))
ELSE JSON_MODIFY({expression}, {path}, {newValueSql})
END";
}
Conclusion
The result is that the generated SQL is longer and more complex, but this is a suitable trade-off because it's only run as a one-off migration. This solution might be overkill if the SQL needed running frequently (e.g. in a loop) or if we knew in advance which keys did/didn't contain boolean values, but in my use case, it was a suitable trade-off.