Latest White Paper | "Cyral for Data Access Governance"· Learn More
Cyral
Free Trial
Blog

How to Mask JSON and JSONB in PostgreSQL

Introduction

We’ve previously talked about our masking feature but this is focused on masking the contents of an entire column. We’ve had some recent customer questions about masking json data in PostgreSQL. The request was to be able to mask elements within JSON or JSONB data types in PostgreSQL. 

Postgres does have an extension called PostgreSQL Anonymizer that can provide masking/anonymizing capabilities. This extension does not come installed by default so would need to be installed into your Postgres server. This extension is also not yet on the Extension versions for Amazon RDS for PostgreSQL for those running in AWS RDS.

In addition to these potential limitations, we found it burdensome to add masking to multiple databases, schemas, tables, or fields. The PostgreSQL Anonymizer requires you to either create dynamic masking policies or static masking function calls. Creating exceptions to the masking is possible but again a burden. Once we scaled our masking to multiple fields, the task of keeping track of the masking policies and our exceptions grew too large to handle.

For these reasons, we decided that this might not be the perfect solution for our customers.

We decided to create a new function that could be used to handle this request.

Masking JSON Data in Postgres Using Custom UDF

In order to tackle this problem, we decided that it made sense to create our own custom UDF to test this out. For our example, we only needed to find a single path and change its value to a fixed string. In this example, the string will be REDACTED

We wrote our code in PL/SQL so that it didn’t have any extension requirements from the Postgres server. This code leverages two functions. The first function takes advantage of Postgres allowing users to overload their functions. The second function does the json path searching and masking.

The Function Code

Here is the code that we created as a simple demonstration of searching a json path for a particular key and masking it with a constant value.

-- This function is used to capture any regular json as input
-- It sends it as jsonb so that we can have a singular function for searching
-- Function 1
create or replace function testing.mask_json(data json, json_path jsonpath) returns json as $$
BEGIN
        return to_json(testing.mask_json(to_jsonb(data), json_path));
END;
$$ LANGUAGE plpgsql;

-- This is the core function that searches for the provided JSON path
-- Function 2
create or replace function testing.mask_json(data jsonb, json_path jsonpath) returns jsonb as $$
begin
        case
                when jsonb_path_exists(data, json_path) then
                	return jsonb_set(data, string_to_array(REGEXP_REPLACE(substring(cast(json_path as varchar), 4), '"','', 'g'), '.'), '"REDACTED"');
                else
                    RETURN data;
        end case;
END;
$$ LANGUAGE plpgsql;

Explaining the Function Code

We can start with Function 1. As noted above, this function is just an overload so that we can process both json and jsonb data types. This function takes a json data type as data and a jsonpath as json_path. From there, it casts the data to jsonb and calls Function 2 and returns its result.

Why did we do this? This is a very simple question to answer! We wanted to be able to first determine if the json_path existed in the data before attempting to mask it. If you check the Postgres documentation, you’ll notice that Postgres has a jsonb_path_exists but it does not have a json_path_exists function. This is used in a case statement to return the original data if we don’t find the path. If we do find the path, then we just do a little magic to convert the json_path to a format the jsonb_set function expects.  We replace the value of the path with our REDACTED string and return the resulting jsonb.

Testing the Code

With the sample function built, we can do some testing. Let’s start with our sample table called orders.

CREATE TABLE orders (
	id serial NOT NULL PRIMARY KEY,
	info json NOT NULL
);
INSERT INTO orders (info)
VALUES('{ "customer": "Lily Bush", "items": {"product": "Diaper","qty": 24}}'),
      ('{ "customer": "John Doe", "items": {"product": "Beer","qty": 6}}'),
      ('{ "customer": "Josh William", "items": {"product": "Toy Car","qty": 1}}'),
      ('{ "customer": "No Product", "items": {"qty": 1}}'),
      ('{ "items": {"product": "No Customer","qty": 1}}'),
      ('{ "customer": "Mary Clark", "items": {"product": "Toy Train","qty": 2}}');

We can look for an embedded json item like items.product and redact it.

SELECT testing.mask_json(info, '$.items.product'::jsonpath) as info FROM orders;

info                                                                    |
------------------------------------------------------------------------+
{"items": {"qty": 24, "product": "REDACTED"}, "customer": "Lily Bush"}  |
{"items": {"qty": 6, "product": "REDACTED"}, "customer": "John Doe"}    |
{"items": {"qty": 1, "product": "REDACTED"}, "customer": "Josh William"}|
{"items": {"qty": 1}, "customer": "No Product"}                         |
{"items": {"qty": 1, "product": "REDACTED"}}                            |
{"items": {"qty": 2, "product": "REDACTED"}, "customer": "Mary Clark"}  |

While this is a neat little trick, we have now successfully brought ourselves to the problem we discussed in our post Replacing Stored Procedures with Policy as Code for Your Favorite Databases The good news is that we have a new feature to announce that will get us over the hurdle.

New Feature Release: Custom UDFs in Policies

In order to help customers continue to leverage policy as code for your favorite database, we’re releasing a new feature in our 4.9 release. This new Custom UDF feature will allow users to create their own functions on the server like above, and have them executed via policy.

Using Our JSON Masking Function as a Custom UDF in a Policy

In order to leverage this feature in a policy, we’ll first need to create a data map and policy to leverage the custom UDF. We’ll use these next sections to create these and test. We are going to assume that you already configured the repository for masking. These steps are outside the scope of this article but you are welcome to review them in the Cyral documentation:

  1. Install a sidecar
  2. Track a repository
  3. Bind a repository to a sidecar
  4. Enable masking on your repository

Creating the Data Map

We’re going to make the label JSON_INFO for ease of identification. Our data map entry for this repository would look something like this

JSON_INFO:
    attributes:
        - testing.orders.info

Creating the Policy

With the data map entry created, we need to configure the policy. You can reference the documentation for additional details regarding the syntax of this new custom type. For this example, we’re going to implement masking for anyone. This will be a simple policy like the below.

data:
  - JSON_INFO
rules:
  - reads:
    - data:
        - custom:testing.mask_json(JSON_INFO, "$.items.product")
      rows: any
      severity: low

Running a Test Query

Now, we can run our previous query without needing to call our function. We can simply query the info column.

SELECT info FROM orders;

info                                                                    |
------------------------------------------------------------------------+
{"items": {"qty": 24, "product": "REDACTED"}, "customer": "Lily Bush"}  |
{"items": {"qty": 6, "product": "REDACTED"}, "customer": "John Doe"}    |
{"items": {"qty": 1, "product": "REDACTED"}, "customer": "Josh William"}|
{"items": {"qty": 1}, "customer": "No Product"}                         |
{"items": {"qty": 1, "product": "REDACTED"}}                            |
{"items": {"qty": 2, "product": "REDACTED"}, "customer": "Mary Clark"}  |

Due to our policy as code concepts, we are able to produce the same results without having to adjust any queries or policies on the server itself.

Conclusion

This is a simple example of how you can accomplish masking json data in postgres using our new custom UDF feature. This feature is available in all of our repositories that support masking and can allow you to do even more.

Subscribe to our Blog

Get stories about data security delivered directly to your inbox

Try Cyral

Get Started in Minutes with our Free Trial