Version: v4.11

Data Map for S3

The Data Map is an inventory of your data locations that allows you to establish a short name, called a LABEL, for each data location (like a table, collection, or S3 bucket) that you want to protect. When you write a policy rule, you'll use LABELs rather than specific table and column names to specify which data the rule protects.

Each LABEL maps to a specific location (for example, a specific column in a specific database). Because a single LABEL can refer to many locations in many repositories, the Data Map gives you the ability to write a policy that treats your data consistently, even when that data is spread across many data repositories.

Structure

The Data Map follows this structure:

{ LABEL }:
  attributes: [{ ATTRIBUTE_LOCATION }, ...]

The fields are defined as follows:

  • {LABEL} (string): label given to the data locations listed under it.
  • the value assigned to each label is an object with one field:
    • attributes ([string]): contains the specific locations of the data within the repo, following the pattern {BUCKET}.{KEY}, where KEY is optional

For example, the following Data Map entries are valid for S3:

EXAMPLE_BUCKET:
  attributes: [my_bucket_name]

EXAMPLE_KEY:
  attributes: [my_bucket_name.key]

Example

FUNDING_BUCKET_ALL:
  attributes: [finance-funding]

FUNDING_2022_EVENTS:
  attributes: [finance-funding.2022.event]

In the above example, for an S3 repository that you've tracked in Cyral:

  • The label FUNDING_BUCKET_ALL can be used to write policies that govern access to an entire S3 bucket, meaning it covers all keys (files and folders) inside the designated bucket.
  • The label FUNDING_2022_EVENTS can be used to write policies that govern access to a specific S3 key, which could be a single file or a folder. In this example, 2022.event designates a specific folder in the finance-funding bucket.

Policy examples

The following policy examples use the two labels we established above.

Case 1: No file access

The user should not be able to read any file from the finance-funding bucket:

data:
  - FUNDING_BUCKET_ALL
rules:
  - identities:
      users:
        - frank.hardy@hhiu.us

By adding FUNDING_BUCKET_ALL to the top data field, we instruct the sidecars that this label is associated with sensitive data that needs to be governed by this policy. Since the rules block contains no rule declaring access permissions for this label, user Frank has no access.

Case 2: The right to read files only

The user should be able to read any file from the finance-funding.2022.event folder, but should not be able to list other folders or read files from any other folders inside that bucket.

data:
  - FUNDING_BUCKET_ALL
  - FUNDING_2022_EVENTS
rules:
  - identities:
      users:
        - frank.hardy@hhiu.us
    reads:
      - data:
          - FUNDING_2022_EVENTS
        rows: any
        severity: low

By adding FUNDING_BUCKET_ALL to the top data field, we instruct the sidecars that this label is associated with sensitive data that needs to be governed by this policy. Since the rules block contains no rule providing the access permissions for this label, user Frank has no access to the bucket as a whole.

By adding FUNDING_2022_EVENTS to the top data field, we instruct the sidecars that this label is associated with sensitive data that needs to be governed by this policy. This label also shows up in the rules.reads.data entry, meaning that the read access is governed by that specific rule.
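As a rough illustration of the reasoning above, the following Python sketch models the policy as a plain dictionary and checks whether a user may read data under a given label. The function name `read_decision` and the three result strings are hypothetical, and the logic is a simplified model for explanation only, not the sidecar's actual evaluation code.

```python
# Simplified, hypothetical model of the Case 2 policy shown above.
POLICY = {
    "data": ["FUNDING_BUCKET_ALL", "FUNDING_2022_EVENTS"],
    "rules": [
        {
            "identities": {"users": ["frank.hardy@hhiu.us"]},
            "reads": [
                {"data": ["FUNDING_2022_EVENTS"], "rows": "any", "severity": "low"}
            ],
        }
    ],
}


def read_decision(policy, user, label):
    """Return 'not governed', 'allowed', or 'denied' for a read under a label."""
    # A label missing from the top-level data field is not governed by this policy.
    if label not in policy["data"]:
        return "not governed"
    for rule in policy["rules"]:
        # Skip rules that do not name this user.
        if user not in rule.get("identities", {}).get("users", []):
            continue
        for read in rule.get("reads", []):
            if label in read["data"]:
                return "allowed"
    # Governed label with no matching reads rule: no access.
    return "denied"


print(read_decision(POLICY, "frank.hardy@hhiu.us", "FUNDING_2022_EVENTS"))
print(read_decision(POLICY, "frank.hardy@hhiu.us", "FUNDING_BUCKET_ALL"))
```

Under this model, a label that is listed in the top data field but has no reads rule is denied, which mirrors the explanation for FUNDING_BUCKET_ALL.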

Within this policy, we have two labels covering the same data:

  • FUNDING_2022_EVENTS: covers only the folder finance-funding.2022.event
  • FUNDING_BUCKET_ALL: covers all folders in this bucket, including finance-funding.2022.event.

When Cyral encounters a case like this, the most specific label is used to evaluate policies.

In this example, this means that even though FUNDING_BUCKET_ALL would prohibit Frank from reading data from finance-funding.2022.event, the more specific label, FUNDING_2022_EVENTS, overrides the broader label and allows the read to proceed.
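One way to picture "most specific" is as the longest matching dot-delimited prefix. The Python sketch below works under that assumption; the source does not spell out how the sidecar measures specificity, so treat the function `most_specific_label` and its matching rule as hypothetical. For brevity it splits on every dot and does not handle quoted names.

```python
# Hypothetical in-memory copy of the Data Map from the example above.
DATA_MAP = {
    "FUNDING_BUCKET_ALL": ["finance-funding"],
    "FUNDING_2022_EVENTS": ["finance-funding.2022.event"],
}


def most_specific_label(data_map, location):
    """Return the label whose attribute is the longest prefix of location."""
    target = location.split(".")
    best_label, best_len = None, 0
    for label, attributes in data_map.items():
        for attribute in attributes:
            parts = attribute.split(".")
            # The attribute matches when all of its segments lead the target.
            if target[:len(parts)] == parts and len(parts) > best_len:
                best_label, best_len = label, len(parts)
    return best_label


print(most_specific_label(DATA_MAP, "finance-funding.2022.event.report"))
print(most_specific_label(DATA_MAP, "finance-funding.2021"))
```

With this rule, a location inside finance-funding.2022.event resolves to FUNDING_2022_EVENTS, while any other location in the bucket falls back to FUNDING_BUCKET_ALL.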

Based on the policy above, Frank's attempt to list the bucket will fail because the policy does not contain a reads rule for the FUNDING_BUCKET_ALL label. At the command line, Frank would see this:

aws s3 ls s3://finance-funding
Using S3 proxy: http://edge-sidecar-a01.example.cyral.com:453

An error occurred (Forbidden) when calling the ListObjectsV2 operation: Request blocked as user
[frank.hardy@hhiu.us] does not have permission to access the required resource

On the other hand, Frank can successfully download a file from the finance-funding.2022.event folder because the policy for him contains a reads rule for the FUNDING_2022_EVENTS label. Here's what Frank will see:

aws s3 cp s3://finance-funding/2022/event/output.txt /tmp
Using S3 proxy: http://edge-sidecar-a01.example.cyral.com:453

download: s3://finance-funding/2022/event/output.txt to ../../../../tmp/output.txt

S3 object key names containing dots

The quoting rules described in this section are a characteristic of the Data Map, not of S3 specifically, so they also apply to any other data repo.

When an S3 object key name contains dots

Let’s use the file downloaded in the previous use case as an example. This file resides in the S3 bucket finance-funding under the following S3 object key name:

finance-funding/2022/event/output.txt

When adding this location to the Cyral Data Map, the administrator needs to convert it to the format used by Cyral, which consists of converting the delimiters from slashes (/) to dots (.).

When naively doing this conversion, we might end up with the following attribute entry:

SAMPLE_LABEL:
  attributes:
    - finance-funding.2022.event.output.txt

The sidecar will misinterpret the above entry, because the dot in output.txt is read as another delimiter. To avoid this, names containing dots must be wrapped in double quotes. The correct way to write the above object key name in the Cyral Data Map is:

SAMPLE_LABEL:
  attributes:
    - finance-funding.2022.event."output.txt"
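The conversion described above can be sketched in Python. The helper name `to_data_map_attribute` is hypothetical and not part of any Cyral tooling; it simply joins the bucket and the slash-delimited key segments with dots and wraps any segment that itself contains a dot in double quotes.

```python
def to_data_map_attribute(bucket, key=""):
    """Build a dot-delimited Data Map attribute from a bucket and an S3 key."""
    # Split the key on slashes and drop empty segments (for example, a trailing slash).
    segments = [bucket] + [part for part in key.split("/") if part]
    # Wrap any segment that contains a dot in double quotes.
    quoted = ['"' + part + '"' if "." in part else part for part in segments]
    return ".".join(quoted)


print(to_data_map_attribute("finance-funding", "2022/event/output.txt"))
```

The result is the attribute string only. Placing it into a YAML file can require additional quoting when the string begins with a double quote.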

When an S3 bucket name contains dots

If your S3 bucket name contains dots (.), you must:

  • wrap the bucket name first in double quotes
  • wrap the entire bucket and object name string in single quotes.

For a sample S3 bucket called financefunding.euro, this would look like:

SAMPLE_LABEL:
  attributes:
    - '"financefunding.euro"'

For a sample object key 2022.event inside the bucket financefunding.euro, this would look like:

SAMPLE_LABEL:
  attributes:
    - '"financefunding.euro".2022.event'

Note

Why is this needed? Wrapping the bucket name in double quotes and then the whole string in single quotes works around a YAML limitation. In the Cyral management console UI, Data Maps are managed through YAML files, and YAML gives special meaning to a string that starts with a quotation mark. Wrapping the whole string in single quotes preserves the double quotes around the bucket name, even when it's used with a dot-delimited S3 object key name like '"financefunding.euro".2022.event'.
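The double-wrapping rule can be expressed as a small Python helper that renders one attribute as a YAML list item. The name `yaml_attribute_item` is hypothetical, and the sketch only covers the case discussed here: an attribute that begins with a double quote.

```python
def yaml_attribute_item(attribute):
    """Render one attribute as a YAML list item, single-quoting it when needed."""
    if attribute.startswith('"'):
        # Inside a YAML single-quoted string, a literal single quote is doubled.
        escaped = attribute.replace("'", "''")
        return "- '" + escaped + "'"
    # Attributes that do not start with a quote can be written as-is.
    return "- " + attribute


print(yaml_attribute_item('"financefunding.euro".2022.event'))
print(yaml_attribute_item('finance-funding.2022.event."output.txt"'))
```

An attribute whose quoted segment appears later in the string, such as finance-funding.2022.event."output.txt", is left unwrapped, which matches the earlier object key example.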

Next steps