Using SQL Server to manipulate JSON data structures, including a boolean gotcha

Using SQL Server to manipulate JSON data structures, including a boolean gotcha

Photo by Alain Pham on Unsplash

The problem I was solving

I recently did some work to update the structure of some JSON stored in a SQL Server database. The JSON started out as a flat list of properties, and I wanted to add some structure by moving the properties into nested JSON objects.

The original structure was like this:

{
    "key1": "value1",
    "key2": "value2",
    "key3": "value3"
}

I wanted to update it to be like this:

{
  "group1": {
    "key1": "value1",
    "key2": "value2"
  },
  "group2": {
    "key3": "value3"
  }
}

I was using Entity Framework 6, so I wrote a migration which builds up a string of SQL within the C# migration file, then executes it by passing it to the built-in Sql function which EF6 exposes.

Initial solution

Initially this seemed quite straightforward, and I wrote the following helper function which I could call for each property.

private void CopyDatumIntoJsonGroup(string groupName, string key)
{
    Sql($@"
UPDATE dbo.JsonTable
SET JsonColumn = JSON_MODIFY(JsonColumn, '$.{groupName}.{key}', JSON_VALUE(JsonColumn, '$.{key}'));");
}

This uses JSON_VALUE to get the current value, and JSON_MODIFY to write it into the new structure. (I then had a separate helper function which deleted the original value once it had been copied.)

At first glance, this function worked fine. I tested with various values, and moved on. However, I later discovered a scenario where this doesn't work as expected.

Discovering a gotcha

Some of my JSON data contained booleans. The JSON looked like this:

{
  "value1": true,
  "value2": false
}

Once I'd used my helper function to move these into a group, I was left with JSON like this:

{
  "group1": {
    "value1": "true",
    "value2": "false"
  }
}

The key point here is that the booleans have been converted to strings - i.e. true and false have been replaced by "true" and "false", and those quotes change the data type which the JSON data represents.

Improving the solution

My next attempt was to create a second helper function which checks for boolean values and handles them accordingly. If the boolean value is cast as a BIT then JSON_MODIFY will treat it as a boolean, rather than a string.

This is what I came up with:

private void MoveDatumIntoJsonGroup(string groupName, string key)
{
    var value = GetPossibleBoolean($"JSON_VALUE(JsonColumn, '$.{key}')");

    Sql($@"
UPDATE dbo.JsonTable
SET JsonColumn = JSON_MODIFY(JsonColumn, '$.{groupName}.{key}', {value});");
}

/// <summary>
/// When writing a value using JSON_MODIFY, a JSON boolean (which is represented as
/// the word true/false without any quotes) will be interpreted by SQL as the string "true"/"false".
/// To handle this, we first check for those values and cast them as bits, which SQL Server uses
/// to represent booleans. JSON_MODIFY will then write these as JSON booleans, not JSON strings.
/// </summary>
/// <param name="input">SQL which will get a JSON value which could be a boolean.</param>
/// <returns>SQL which will return the input cast to a BIT if it's a boolean, or the original input if not.</returns>
private static string GetPossibleBoolean(string input) => $@"
CASE 
    WHEN {input} IN ('true', 'false') 
        THEN TRY_CAST({input} AS BIT)
    ELSE {input}
END";

However, this has the problem that the CASE statement can return two different data types (a BIT for a boolean, or whatever data type the input SQL would return if not) which is not permitted.

Final solution

The solution to problem this is to include the whole of JSON_MODIFY within the CASE.

private void MoveDatumIntoJsonGroup(string groupName, string key)
{
    var updatedSql = GetJsonModify("JsonColumn", $"'$.{groupName}.{key}'", $"JSON_VALUE(JsonColumn, '$.{key}')");

    Sql($@"
UPDATE dbo.JsonTable
SET JsonColumn = {updatedSql};"
}


/// <summary>
/// When writing a value using JSON_MODIFY, a JSON boolean (which is represented as
/// the word true/false without any quotes) will be interpreted by SQL as the string "true"/"false".
/// To handle this, we first check for those values and cast them as bits, which SQL Server uses
/// to represent booleans. JSON_MODIFY will then write these as JSON booleans, not JSON strings.
/// </summary>
/// <param name="expression">A SQL expression which returns JSON - typically the name of a JSON column.</param>
/// <param name="path">The path to the JSON which should be modified.</param>
/// <param name="newValueSql">SQL which returns the new value which should be written to the <paramref name="path"/>.</param>
/// <returns>The updated value of <paramref name="expression"/> as properly formatted JSON text.</returns>
private string GetJsonModify(string expression, string path, string newValueSql)
{
    // The only part of the JSON_MODIFY which is different is the 3rd parameter
    // but a SQL CASE statement cannot return different data types from the different cases,
    // so we cannot have a single JSON_MODIFY and use CASE within it
    // (because this would mean the CASE could return a BIT or the data type returned by newValueSql).
    // We have the JSON_MODIFY repeated so the CASE always returns a JSON string.
    return $@"
CASE 
    WHEN {newValueSql} IN ('true', 'false') 
        THEN JSON_MODIFY({expression}, {path}, TRY_CAST({newValueSql} AS BIT))
    ELSE JSON_MODIFY({expression}, {path}, {newValueSql})
END";
}

Conclusion

The result is that the generated SQL is longer and more complex, but this is a suitable trade-off because it's only run as a one-off migration. This solution might be overkill if the SQL needed running frequently (e.g. in a loop) or if we knew in advance which keys did/didn't contain boolean values, but in my use case, it was a suitable trade-off.