Working with JSON in an RDBMS – Part 1, MS SQL Server

I have to begin by admitting that I’m no expert with NoSQL databases. I understand them conceptually and have worked hands-on with MongoDB and AWS DynamoDB in some learning and exploratory environments. But I have yet to build anything substantial using this particular technology, so it’s on my list of things to dive deeper into in the coming year. In the meantime, I became curious about working with JSON in a relational database, like SQL Server.

The concept of being able to store data as a document is a powerful idea in an age where it seems that everything is becoming more connected through the power of REST APIs. At the same time, many companies are still bound to their traditional relational SQL databases. As we know, these have their own pros and cons.

I have seen some companies take a hybrid approach – deploying an RDBMS like PostgreSQL to handle certain functions while other data sets are stored in MongoDB. The problem, though, is that to support this you’re increasing both the complexity of your environment and the costs that go with it. You need people who can administer both and people who can develop with both. You’re also going to need to deploy instances of each in your non-production and production environments. And then there’s the matter of sharing data between the two.

As a result of the increasing market share that NoSQL databases have captured, most SQL database platforms have responded by adding built-in functionality for storing and interacting with JSON documents. In SQL Server, PostgreSQL, Oracle, and MySQL, you can now store and interact with JSON documents in a table, right there among your other columns. That means you can contemplate using a single database platform for scenarios where JSON documents are needed as well as places where a relational structure makes more sense.

One caution that I will add at this point is that you should be very intentional in making decisions like these. Just because you can doesn’t mean that you should. Give some clear thought to how you will utilize the data you’re storing and under what conditions you may need to access it. It’s important to use the right tool for the job at hand; otherwise you may come to regret it later and find yourself migrating a bunch of data from an RDBMS table over to a MongoDB instance anyway.

I’m going to do a three-part series on using JSON in relational databases, covering MS SQL Server, PostgreSQL, and MySQL. I decided to do this because I recently ran into a situation where I was contemplating using MongoDB as a data store for an application that already runs on PostgreSQL. I was dreading having to dive in, configure a new server, and complicate my deployment footprint. Then I read about PostgreSQL adding the json and jsonb data types, and in my particular situation it made perfect sense to utilize these structures. I figured that others might have similar situations, so hopefully you find this series useful.

I’m actually going to start with Microsoft SQL Server because the work I’m doing right this moment is more focused there, so I decided to start with what’s top of mind.

Use Case for JSON in an RDBMS

First, I’ll start by describing the use case that I’m going to leverage for this series. I do a substantial amount of work in healthcare and insurance systems. These systems tend to be highly configurable, and it always struck me that a lot of the configuration work we do fits better in document form than in relational tables. For example, an insurance policy tends to have a large number of attributes, and as policies and regulations change, you find yourself tacking new columns onto the ends of tables. This creates a lot of change in what should ideally be pretty stable data structures, and change can lead to issues in refreshing data warehouses, updating reports, and sharing data.

But the document collection structure that NoSQL databases pioneered makes perfect sense to me for things like policy attributes, benefit plans, and provider contracts. A big part of the reason is that these structures often come from real-life documents like an Explanation of Benefits / Summary of Benefits, an insurance policy, or an actual contract between a provider and a health plan. These documents are often a combination of statements that might easily be represented as an attribute or set of attributes, combined with tables that might outline things like fee schedules for specific services. All of that translates really well to a document structure like what we find in a JSON document.

So for this series we’re going to use the example of an (extremely simplified) provider contract. We’ll have a providers table that holds a provider name and NPI, and then we’ll have a provider_contracts table that relates to the providers table through a foreign key. The provider_contracts table will store contract terms as a JSON document, a contract effective date, and a contract termination date. (Note that we could store those dates in the JSON document itself, but because we always have these values on every contract, we’re putting them in as separate fields. This is unlike attributes we’ll put in the JSON document to indicate Out of Area and Out of Network contracts.) So the structure will look like this (with a DDL sketch after the list):

  • providers
    • id (bigint, pk)
    • provider_name (varchar(100))
    • npi (varchar(10))
  • provider_contracts
    • id (bigint, pk)
    • provider_id (bigint, fk to providers.id)
    • contract_terms (nvarchar(max)) – this is where our JSON document will go
    • contract_eff_date (datetime)
    • contract_term_date (datetime)
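
Here’s a minimal T-SQL sketch of that schema. The identity columns and the ISJSON check constraint are my own additions, not requirements:

CREATE TABLE providers (
    id BIGINT IDENTITY(1,1) PRIMARY KEY,
    provider_name VARCHAR(100) NOT NULL,
    npi VARCHAR(10) NOT NULL  -- National Provider Identifier
);

CREATE TABLE provider_contracts (
    id BIGINT IDENTITY(1,1) PRIMARY KEY,
    provider_id BIGINT NOT NULL REFERENCES providers (id),
    contract_terms NVARCHAR(MAX) CHECK (ISJSON(contract_terms) = 1),  -- reject invalid JSON
    contract_eff_date DATETIME NOT NULL,
    contract_term_date DATETIME NOT NULL
);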

We’re going to keep the JSON document itself fairly simple: it will capture an array of contracted services. A contracted service can have a range of service codes and a fee schedule, which could be either a flat rate or a percentage of the billed amount. (We’re not building a full-fledged real-world claim system, so we’re drastically simplifying.) The JSON documents that we’ll work with will be the following:

Princeton Plainsboro Teaching Hospital Contract:

{
    "out_of_area": "true",
    "service_rates": [
        {"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"},
        {"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"},
        {"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}
    ]
}

Sacred Heart Hospital Contract:

{
    "out_of_network": "true",
    "service_rates": [
        {"service_description": "Misc Services", "service_code_range_start": "81200", "service_code_range_end": "82000", "billed_percentage": "0.6132"}
    ]
}

Overview of Working With JSON in Microsoft SQL Server

Microsoft added support for JSON in SQL Server 2016. This came in the form of a set of functions that are used to parse, modify, query, and select from JSON documents stored in NVARCHAR fields.

There are five functions used for working with JSON documents in SQL Server:

  • ISJSON(column/variable) – this will return 1 if the contents of the column or variable you provide are a valid JSON document, 0 if the contents are not valid JSON, and NULL if the contents are NULL.
  • JSON_VALUE(column/variable, JSON path) – this will return a scalar value from the path you provide if that path has a valid scalar value. Note that if you try to use JSON_VALUE to return an array within the JSON document, you’re going to get a NULL.
  • JSON_QUERY(column/variable, JSON path) – this will return a JSON object from the path you provide. You can use this to return an array from within the JSON document, for example.
  • OPENJSON(column/variable, optional JSON path) – this function allows you to essentially convert a portion of a JSON document to a table value. We’ll use examples where we combine this with a WITH clause to map columns in the table to specific JSON paths.
  • JSON_MODIFY(column/variable, JSON path, new value) – this allows you to modify a value within a path. We won’t spend much time on this beyond the short sketch after this list, but there is plenty of documentation available to assist if you find that you need it. At least in my view, solutions using these functions should be restricted to situations where you’re manipulating the JSON document within the application and using SQL primarily for storage and potentially some querying.
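
Since we won’t come back to JSON_MODIFY in this post, here’s a minimal sketch of it, using the out_of_area flag from the contract we’ll insert below. The scenario is purely illustrative:

UPDATE provider_contracts
SET contract_terms = JSON_MODIFY(contract_terms, '$.out_of_area', 'false')  -- replaces the string value at this path
WHERE id = 1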

Let’s start by inserting our providers:

INSERT INTO providers (provider_name, npi) VALUES ('Princeton Plainsboro Teaching Hospital', '0123456789');
INSERT INTO providers (provider_name, npi) VALUES ('Sacred Heart Hospital', '1234560987');

Our providers table now has PPTH as id = 1 and Sacred Heart as id = 2. Let’s insert contracts for each of them now. Note that I’m using the JSON documents we defined above in our “contract_terms” column:

INSERT INTO provider_contracts (provider_id, contract_terms, contract_eff_date, contract_term_date)
VALUES (
    1,
    '{ "out_of_area": "true", "service_rates": [ {"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"}, {"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"}, {"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}] }',
    '20190101 00:00:00',
    '20201231 23:59:59'
);
INSERT INTO provider_contracts (provider_id, contract_terms, contract_eff_date, contract_term_date)
VALUES (
    2,
    '{ "out_of_network": "true", "service_rates": [{"service_description": "Misc Services", "service_code_range_start": "81200", "service_code_range_end": "82000", "billed_percentage": "0.6132"}] }',
    '20200101 00:00:00',
    '20201231 23:59:59'
);

Now, let’s run a query using one of our new JSON functions to make sure that everything looks good.

SELECT id, provider_id, ISJSON(contract_terms) AS is_valid_json
FROM provider_contracts

If we did everything right, then we have two rows and the is_valid_json column in both rows is equal to 1, meaning that we have valid JSON documents in those fields.

Now let’s run a query to find any providers that are out of the area. We’re going to do this by using the JSON_VALUE function to select our out-of-area and out-of-network fields from the JSON document and we’re going to use the same function in the WHERE clause to filter for out-of-area = true.

SELECT id, JSON_VALUE(contract_terms, '$.out_of_area') AS out_of_area,
    JSON_VALUE(contract_terms, '$.out_of_network') AS out_of_network 
FROM provider_contracts
WHERE JSON_VALUE(contract_terms, '$.out_of_area') = 'true'

In this case, we’re going to get one record (id = 1 for me, this is the PPTH contract) and the out_of_area will be equal to ‘true’. The out_of_network column is null because that value doesn’t exist in the PPTH contract document. This is exactly what we would expect to happen – the JSON_VALUE function doesn’t return an error, it simply tells us that the value we’re asking for based on that path just doesn’t exist.

One thing that I will note here is that if I want to get something specific from our service_rates array, I’m going to have to specify all the way down to the specific value within the array that I want. For example, if I want the flat rate from the first service rate (“Office Visits 1”) then I’m going to have to do something like this:

SELECT p.id, pc.id AS contract_id,
   p.provider_name,
   JSON_VALUE(pc.contract_terms, '$.service_rates[0].flat_rate') AS office_visit_rate
FROM provider_contracts pc
INNER JOIN providers p ON p.id = pc.provider_id
WHERE JSON_VALUE(pc.contract_terms, '$.service_rates[0].service_description') = 'Office Visits 1' 

If I try to get index [3] from the array, I get a NULL because the indexes only go up to 2 (there are three service rates in the contract, indexed from 0). The problem here is that this requires me to know exactly where in the array my Office Visits 1 service definition is. That isn’t very practical and makes the data difficult to work with. We’ll see how to work around this when we get into the OPENJSON function.

Next, let’s say that I want to return a provider JSON document in my query that includes the provider details and the contract details as well. This is a good place for us to use JSON_QUERY. We start by creating what looks like a perfectly normal select statement with an inner join between providers and provider_contracts. But then, we wrap our contract_terms field in JSON_QUERY and give it an alias (contract_terms). And at the very end, we tack on FOR JSON AUTO.

By doing this, we’re telling SQL Server that we want it to return a JSON object and that it should automatically formulate the structure based on the order of our select statement and the table structure. If we use AUTO, we don’t have any further control over what SQL Server gives us. We can use PATH in its place if we want more control. (Note that with PATH we can’t have two columns with the same name, so we’ll have to alias our [contracts].id field; I’ve included a PATH sketch after the AUTO example below.) Finally, note that we can use the ROOT option to specify what we want the top level/root of the document to be. Here we’re going to use providers. If we don’t use this, then we get an array of JSON objects back. By using ROOT('providers') we’re going to get a providers object that contains an array of providers.

SELECT p.id, 
  p.provider_name, 
  p.npi, 
  [contracts].id, 
  JSON_QUERY([contracts].contract_terms, '$') AS contract_terms,
  [contracts].contract_eff_date,
  [contracts].contract_term_date 
FROM providers p 
INNER JOIN provider_contracts [contracts] ON p.id = [contracts].provider_id 
FOR JSON AUTO, ROOT('providers') 
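
And here’s that PATH version. The dotted aliases, which control the nesting, are my own choices; the contract id gets a distinct name to avoid the duplicate column:

SELECT p.id,
  p.provider_name,
  p.npi,
  [contracts].id AS [contract.contract_id],
  JSON_QUERY([contracts].contract_terms, '$') AS [contract.contract_terms],
  [contracts].contract_eff_date AS [contract.contract_eff_date],
  [contracts].contract_term_date AS [contract.contract_term_date]
FROM providers p
INNER JOIN provider_contracts [contracts] ON p.id = [contracts].provider_id
FOR JSON PATH, ROOT('providers')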

Finally, let’s say that we have a service code and we want to retrieve the service rate to be used for calculating a payment on that service. Let’s say we get service code 99215 for PPTH. We know we should get the Office Visits 1 service rate which pays a flat rate of $45.00 for services in the range of 99211 to 99221. Remember how we said that using JSON_VALUE requires us to know exactly where in the document our data is? In order to query in the way we need, we need to use OPENJSON to essentially convert our JSON document into a table. Then we can filter that table to get the values we need just like we would any other query.

The first thing I’m going to have to do is declare a variable as an NVARCHAR(MAX). Then, I select my contract terms field (using JSON_QUERY to wrap it and retrieve the service rates structure from the contract_terms). I’m assuming here that I already know which contract ID I need to retrieve (1).

DECLARE @json NVARCHAR(MAX)
SELECT @json = JSON_QUERY(contract_terms, '$.service_rates') FROM provider_contracts WHERE id = 1
PRINT @json 

I’m printing out the value of the variable just to check that I’ve got what I expected.

[ 
    {"service_description": "Office Visits 1", "service_code_range_start": "99211", "service_code_range_end": "99221", "flat_rate": "45.00"}, 
    {"service_description": "Office Visits 2", "service_code_range_start": "99222", "service_code_range_end": "99251", "flat_rate": "56.00"}, 
    {"service_description": "Anesthesia", "service_code_range_start": "00100", "service_code_range_end": "00952", "billed_percentage": "0.7401"}
]

Now, I can use this @json variable in a query. I’m going to pass @json to the OPENJSON function, which will essentially convert that document to a table for me. I will be using the WITH clause to explicitly specify the scalar value fields I want back and the path where those scalar value fields can be found. I’m aliasing this table that’s returned from OPENJSON as “js”.

I then use CROSS APPLY to get other fields from my provider_contracts table (making sure to specify the provider contract ID in my WHERE clause, the same as I did when I retrieved the JSON). After that, I do an INNER JOIN to my providers table so I can get my provider name and NPI fields. I then add conditions to my WHERE clause to get the service rates where my service code falls within the range. (Try changing the code in both of the WHERE conditions from 99215 to 00102 and see the difference in results.)

DECLARE @json NVARCHAR(MAX)
SELECT @json = JSON_QUERY(contract_terms, '$.service_rates') FROM provider_contracts WHERE id = 1
PRINT @json

SELECT p.provider_name, p.npi, pc.id, pc.provider_id,
    pc.contract_eff_date, pc.contract_term_date, js.*
FROM OPENJSON(@json)
WITH (
    svc_description nvarchar(100) '$.service_description',
    svc_range_start nvarchar(100) '$.service_code_range_start',
    svc_range_end nvarchar(100) '$.service_code_range_end',
    flat_rate decimal(12, 2) '$.flat_rate',
    billed_percentage decimal(12, 4) '$.billed_percentage'
) AS js
CROSS APPLY provider_contracts AS pc
INNER JOIN providers p ON p.id = pc.provider_id
WHERE pc.id = 1
    AND js.svc_range_start <= '99215'
    AND js.svc_range_end >= '99215'

You can see here how we can convert a JSON document stored in an NVARCHAR field into a table and then use it the way we would any other table structure. (There’s probably a better way than the above as well, so I’d love to hear alternative approaches.) We could wrap the code above in a stored procedure that takes the service code and the contract ID (or, with a few tweaks, the provider ID and the service date). The system could then return the contract details we need to calculate the payment for the selected service in a table structure.
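
Here’s a rough sketch of what that stored procedure might look like; the procedure and parameter names are my own placeholders:

CREATE PROCEDURE dbo.get_contracted_service_rate
    @contract_id BIGINT,
    @service_code NVARCHAR(10)
AS
BEGIN
    -- Pull the service_rates array out of the contract's JSON document
    DECLARE @json NVARCHAR(MAX)
    SELECT @json = JSON_QUERY(contract_terms, '$.service_rates')
    FROM provider_contracts WHERE id = @contract_id

    -- Shred the array into rows and filter for the matching code range
    SELECT p.provider_name, p.npi, pc.id, pc.provider_id,
        pc.contract_eff_date, pc.contract_term_date, js.*
    FROM OPENJSON(@json)
    WITH (
        svc_description nvarchar(100) '$.service_description',
        svc_range_start nvarchar(100) '$.service_code_range_start',
        svc_range_end nvarchar(100) '$.service_code_range_end',
        flat_rate decimal(12, 2) '$.flat_rate',
        billed_percentage decimal(12, 4) '$.billed_percentage'
    ) AS js
    CROSS APPLY provider_contracts AS pc
    INNER JOIN providers p ON p.id = pc.provider_id
    WHERE pc.id = @contract_id
        AND js.svc_range_start <= @service_code
        AND js.svc_range_end >= @service_code
END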

There’s much more to learn about using JSON within MS SQL Server. I encourage you to play around with these functions and see what more you can get out of them.

I will say that this approach in general feels very kludgey to me. It seems like we’re writing a lot of code here to accomplish something that should be easier. However, the JSON document gives us a lot of flexibility within this contract structure that might be useful. I tend to look at the functionality that SQL Server provides as convenient if you need it. But I doubt that I would intentionally build something on top of this.

The primary reason I’d store JSON in a SQL Server database would be to have an application retrieve it and use it from there. There’s always the possibility that I’d want or need some backend process to operate on the data. (e.g. data migrations, cleanup, updates, etc.) In this case, SQL Server gives me some mechanism for doing that.

In the next part of the series, I’m going to cover how to query JSON in PostgreSQL. Unlike Microsoft, the PostgreSQL community introduced dedicated data types to handle this information, so I think we’re going to see a lot more flexibility and functionality with that approach. In the final part of the series, we’ll cover how to query JSON in MySQL. Until next time, happy coding!

2 Comments

  • Tim Bui June 2, 2021

    Hi Dave, I am a new student of SQL. I started to watch your youtube video and subscribed to your blog to learn more about using JSON in SQL Server. This subject is new to me. If you have time, could you please show me how to convert a regular table into a JSON format? Say, I have a table of 4 columns (ticker, balancesheet_items, fy and amount). How can I convert this into JSON so that the new table would have rows like these:
    {'ticker':'IBM', {'balancesheet_item': 'cash', 'amount':{'fy2020':100, 'fy2019':110, 'fy2018':120}}}
    ,{'ticker':'IBM', {'balancesheet_item': 'account_receivables', 'amount':{'fy2020':200, 'fy2019':210, 'fy2018':220}}}
    ,{'ticker':'AAPL', {'balancesheet_item': 'cash', 'amount':{'fy2020':300, 'fy2019':310, 'fy2018':320}}}
    ,{'ticker':'AAPL', {'balancesheet_item': 'account_receivables', 'amount':{'fy2020':400, 'fy2019':410, 'fy2018':420}}}

    I am thinking that I need to use a JSON structure to store data because there are 4000+ companies in the US. Each company reports a balance sheet, income statement, and cash flow statement. There are some 50 financial items on each financial statement, and each item has about 10 years going back. An individual table would have way too many columns, or I would have to have way too many tables.
    If you think I need to use a different structure, I would be very appreciative hearing as well.
    Thank you in advance for your help and much appreciate your teaching!
    Tim

    • David Medlock September 8, 2021

      Hi Tim, I’m so sorry for the delay in getting back to you. I’m sure that you’ve solved for this by now, but wanted to respond anyway.

      I think this depends on what you need to do with the data. I don’t necessarily see a problem with capturing those columns in a normal table structure where you keep track of the ticker, the balance sheet item, the amount, and the fiscal year. But again, it depends on what you plan on doing with the data down the road. You could end up with hundreds of thousands of rows possibly. (Let’s say ~4000 companies, about 10 different balance sheet items, over 10 years – you’d have 4 columns and 4000 * 10 * 10 rows = 400k rows total. Not a large table.)

      I’d recommend adding a primary key with an identity field so you have auto-incrementing values and can easily identify rows by an ID. You may also want an index on the field(s) you plan on searching, such as the ticker field.
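
      For example, a rough sketch (the table and index names are just placeholders):

      CREATE TABLE balance_sheet_items (
          id BIGINT IDENTITY(1,1) PRIMARY KEY,  -- auto-incrementing surrogate key
          ticker VARCHAR(10) NOT NULL,
          balancesheet_item VARCHAR(100) NOT NULL,
          fy INT NOT NULL,  -- fiscal year
          amount DECIMAL(18, 2) NOT NULL
      );

      CREATE INDEX ix_balance_sheet_items_ticker ON balance_sheet_items (ticker);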

      From there, you could use something like PowerBI or Tableau to create charts and graphs for different balance sheet items over time, add multiple tickers, etc. if you wanted to compare, say, cash over time between AAPL and AMZN, for example.

      If it’s something you’re still working on and want to run other things by me, I’m happy to help. I like the idea of combining JSON with relational databases, but I also think that it has very specific use cases and isn’t always the most efficient. Also, SQL Server – as much as I love it – is probably the worst of the major database platforms in terms of its support for JSON fields because it essentially stores JSON in a large text field. So I’d have concerns about performance as the data volume grows.
