Policy guide

With Cyral, you create policies that limit how your organization's data can be acted on by people and applications. With a policy in place, you can use it to block access (preventing users from violating your policy) and/or generate a log entry when a user violates the policy.

Your Cyral policies consist of:

  • a data map that specifies data fields to be protected
  • one or more policies that contain the rules specifying how the data can be accessed.

The data map and policies are expressed in YAML, as shown in the samples below.

Sample data map:

CCN:
  - repo: claims
    attributes: [finance.customers.ccn]
  - repo: loans
    attributes: [applications.customers.credit_card_number]
EMAIL:
  - repo: claims
    attributes: [finance.customers.email]
  - repo: loans
    attributes: [applications.customers.email]
SSN:
  - repo: claims
    attributes: [finance.customers.ssn]
  - repo: loans
    attributes: [applications.customers.social_security_number]

Sample policy:

data:
  - EMAIL
  - CCN
  - SSN
rules:
  - identities:
      groups: [analyst]
    reads:
      - data: any
        rows: 10
    updates:
      - data: [EMAIL, CCN]
        rows: 1
        severity: medium
    deletes:
      - data: any
        rows: 1
        severity: medium
  - identities:
      users: [bob]
    hosts: [192.0.2.22, 203.0.113.16/28]
    reads:
      - data: any
        rows: any
    updates:
      - data: [EMAIL, CCN]
        rows: any
    deletes:
      - data: any
        rows: any
  - reads:
      - data: [EMAIL]
        rows: 1

Below, we explain the data map and policy structures and their fields, and we finish with a full interpretation of the sample policy.

Data map

The data map is an inventory of your data locations that allows you to establish a short name, called a LABEL, for each data location (like a table, collection, or S3 bucket) that you want to protect. When you write a policy rule, you'll use LABELs rather than specific table and column names to specify which data the rule protects.

Each LABEL maps to a specific location (for example, a specific column in a specific database). Because a single LABEL can refer to many locations in many repositories, the data map gives you the ability to write a policy that treats your data consistently, even when that data is spread across many data repositories.

The data map follows this structure:

{ LABEL }:
  - repo: { REPOSITORY_NAME }
    attributes: [{ ATTRIBUTE_LOCATION }, ...]

And the fields are defined as follows:

  • {LABEL} (string): label given to the data specified in the corresponding list. Important: See Limits on how you create and use labels, below.
  • each value in the list assigned to a label is an object made up of two fields:
    • repo (string): name of the repository containing the data as specified through the Cyral management console
    • attributes ([string]): contains the specific locations of the data within the repo, following the pattern {SCHEMA}.{TABLE}.{ATTRIBUTE}
      • When referencing data in a Dremio repository, include the complete location, with each nested Dremio space separated by a period (.). For example, an attribute my_attr contained by table my_tbl within space inner_space within space outer_space would be referenced as outer_space.inner_space.my_tbl.my_attr.
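
As a rough illustration of the location pattern described above, the following Python sketch splits a dotted location into its parts. The helper name parse_attribute_location and the AttributeLocation type are hypothetical and not part of Cyral; the sketch also assumes that individual names never contain a period.

```python
from typing import List, NamedTuple


class AttributeLocation(NamedTuple):
    # Schema name, or the nested Dremio spaces from outermost to innermost.
    containers: List[str]
    table: str
    attribute: str


def parse_attribute_location(location: str) -> AttributeLocation:
    """Split a dotted data map location into its parts (hypothetical helper)."""
    parts = location.split(".")
    if len(parts) < 3 or any(not part for part in parts):
        raise ValueError(f"expected SCHEMA.TABLE.ATTRIBUTE, got {location!r}")
    # The last two segments are the table and the attribute; everything
    # before them is the schema or the chain of nested spaces.
    return AttributeLocation(parts[:-2], parts[-2], parts[-1])
```

With this sketch, finance.customers.ccn yields one container (finance), while the Dremio location outer_space.inner_space.my_tbl.my_attr yields two (outer_space and inner_space).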

Limits on how you create and use labels

When creating and using a LABEL, please observe these limits:

  • Each LABEL must be defined in only one data map.
  • A LABEL can refer to one or many attributes (for example, tables, fields, or columns) in one or many repositories.
  • A given repository location (a table, collection, field, column, or bucket) must be included in only one LABEL.
  • Each LABEL must be used in only one policy. You may use the LABEL in one or many rules in the policy.

If your data maps and policies violate any of these limits, the policy update will fail. These limits prevent users from writing conflicting rules about the same data.
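
To make the third limit concrete, here is a minimal Python sketch that looks for repository locations claimed by more than one LABEL. It assumes the data map has already been loaded into a dictionary with the same shape as the YAML; the function name find_shared_locations is hypothetical, and this is not how Cyral itself validates a data map.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed in-memory shape: {LABEL: [{"repo": ..., "attributes": [...]}, ...]}
DataMap = Dict[str, List[dict]]


def find_shared_locations(data_map: DataMap) -> Dict[Tuple[str, str], List[str]]:
    """Return every (repo, attribute) location that more than one label claims."""
    owners = defaultdict(set)
    for label, entries in data_map.items():
        for entry in entries:
            for attribute in entry["attributes"]:
                owners[(entry["repo"], attribute)].add(label)
    # A location is a problem only if two or more distinct labels include it.
    return {loc: sorted(labels) for loc, labels in owners.items() if len(labels) > 1}
```

An empty result means that no location is shared between labels; any entry in the result names the conflicting labels for that location.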

Data map example

In the example below, we assign labels to data in two repos, claims and loans. The label CCN is assigned to the attribute ccn in the table customers in the finance schema of the claims repository, as well as to the attribute credit_card_number in the table customers in the applications schema of the loans repository. Following the same pattern, the labels EMAIL and SSN are assigned to the email and social security number data in each repo.

CCN:
  - repo: claims
    attributes: [finance.customers.ccn]
  - repo: loans
    attributes: [applications.customers.credit_card_number]
EMAIL:
  - repo: claims
    attributes: [finance.customers.email]
  - repo: loans
    attributes: [applications.customers.email]
SSN:
  - repo: claims
    attributes: [finance.customers.ssn]
  - repo: loans
    attributes: [applications.customers.social_security_number]

In the next section, we'll show a sample policy that sets access rules for each of these data labels (CCN, EMAIL, and SSN). The policy applies to all repositories included in the data map.

Policy

Your policy consists of these main parts:

  • The data block lists the data locations (schemas, tables, columns, and so on) that this policy covers, using the LABELs you've established in your data map.
  • The rules block holds your data access rules that govern who can perform which operations on which data.
  • For policies managed in GitHub, a meta block sets the policy's name.

info

For a guided tour of a working policy, skip forward to Interpreting the sample policy now.

To understand policy structure, let's look at our sample policy again:

data:
  - EMAIL
  - CCN
  - SSN
rules:
  - identities:
      groups: [analyst]
    reads:
      - data: any
        rows: 10
    updates:
      - data: [EMAIL, CCN]
        rows: 1
        severity: medium
    deletes:
      - data: any
        rows: 1
        severity: medium
  - identities:
      users: [bob]
    hosts: [192.0.2.22, 203.0.113.16/28]
    reads:
      - data: any
        rows: any
    updates:
      - data: [EMAIL, CCN]
        rows: any
    deletes:
      - data: any
        rows: any
  - reads:
      - data: [EMAIL]
        rows: 1

The meta block of a policy

The meta block is where you give your policy a name and optional tags. This section is only required if you're automating your policy publishing via GitHub integration.

caution

If you create a policy here and then move it to the version control system (like GitHub) for automated policy management, you must leave the policy and its name unchanged in the Cyral UI.

The data block of a policy

In your policy, you use the data block to specify which data fields this policy manages. In the data block, you list each field using the LABEL you established for it in your data map. The actual location of that data (the names of fields, columns, or databases that hold it) is listed in the data map.

tip

As mentioned earlier, each LABEL must be used in only one policy. In other words, no two policies may overlap in terms of the fields they protect. Within a single policy, you may use a single LABEL many times in many rules.

Rules

The rules block of a policy

Rules specify who can interact with which data, and what actions they can take on that data. Inside the rules block:

  • Every rule except your default rule has an identities specification that specifies the people, applications, or groups this rule applies to.
  • Every rule consists of a set of contexted rules, one for each type of access: reads, updates, and/or deletes. Each contexted rule applies only in the context of its specified operation type. For example, the reads rule applies only when someone tries to retrieve data. A rule does not need to include all three operation types; operation types you omit are disallowed.
  • A rule may optionally contain a hosts specification that limits access to only those users connecting from a certain network location.

Unless you create a default rule, users and groups only have the rights you explicitly grant them.

The default rule

A default rule is an optional rule without an identity specification (identities field). It applies to any user whose username or group affiliation failed to match any other rule. Without a default rule, the policy only allows those actions explicitly granted in the identities-based rules.

The following default rule from the sample policy specifies that any person who failed to match the other rules will be allowed to read only 1 row of EMAIL at a time. Updates and deletes are disallowed for such users, since the default rule contains no updates or deletes permissions.

reads:
  - data: [EMAIL]
    rows: 1

The identities specification in a rule

For each rule, you can specify the set of identities (people, applications, and groups) to which the rule applies. If you omit the identity specification, this rule becomes the default rule.

  • users ([string]): individual users

  • services ([string]): applications

    • for users going through Looker, use the service name looker
    • for custom services use the application name provided in the connection URL when connecting to the database
  • groups ([string]): user groups defined in your enterprise SSO service, such as GSuite or Okta

For example, the following identity specification indicates that the rule will apply to users bob and sara, any users going through the service looker, and any users belonging to the user group analyst.

identities:
  users: [bob, sara]
  services: [looker]
  groups: [analyst]

In a policy, a limit of one rule per user or group

Within a given policy, make sure you only create one rule per user or group. In other words, no two rules in a single policy can contain the same user/group/service. In our example, this means that the user bob can only appear in one rule for a given policy.

Specifically, the following limits apply in order to prevent conflicts within a policy:

  • Each person must have only one rule that specifically applies to that person by username

  • Each group must have only one rule that specifically applies to that group by name

  • A person may have both a rule applied to them by username, and one or more rules that apply to them based on group affiliation. In this case, the rule that applies to them by username takes precedence.

    • Looking at the sample policy, we can see that one rule applies to the user bob and another applies to the user group analyst. If bob happens to be a member of the group analyst, then when bob attempts to perform a data operation, we will apply the rule specified for the user bob and ignore the rule specified for the group analyst. In overlap cases like this, Cyral enforces a single rule with the following precedence: user > group > service.
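
The precedence described above can be sketched in Python as follows. This is only an illustration under stated assumptions: rules are plain dictionaries shaped like the YAML, the function name select_rule is hypothetical, and when a user belongs to several groups that each have a rule, the sketch simply takes the first match in policy order, which the text above does not specify.

```python
from typing import List, Optional


def select_rule(rules: List[dict], user: str, groups: List[str],
                service: Optional[str] = None) -> Optional[dict]:
    """Pick the single rule to enforce, using the precedence user > group > service."""

    def first_match(kind: str, names: List[str]) -> Optional[dict]:
        for rule in rules:
            listed = rule.get("identities", {}).get(kind, [])
            if any(name in listed for name in names):
                return rule
        return None

    candidates = (
        ("users", [user]),
        ("groups", groups),
        ("services", [service] if service else []),
    )
    for kind, names in candidates:
        rule = first_match(kind, names)
        if rule is not None:
            return rule
    # No identity matched: fall back to the default rule (no identities field), if any.
    return next((rule for rule in rules if "identities" not in rule), None)
```

Applied to the sample policy, a request from bob selects the rule for bob even when bob is also in the analyst group, and a user who matches nothing falls through to the default rule.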

The hosts specification in a rule

The hosts specification is optional. It lists the host addresses that are allowed to connect to the data locations governed by this rule. If you do not include a hosts block, Cyral does not enforce limits based on the connecting client's host address.

To specify a hosts block, provide addresses as a comma-separated list of IP addresses and network blocks in CIDR notation. When a user tries to perform a data operation while connected from any host other than those you list here, the rule blocks the action.

For example, the hosts specification shown below ensures that data locations in this rule can be accessed only while connected from a host at 192.0.2.22 or one of the hosts in the 203.0.113.16/28 block.

hosts: [192.0.2.22, 203.0.113.16/28]
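
To see which client addresses such a list admits, the following Python sketch uses the standard ipaddress module. The function name host_allowed is hypothetical, and the sketch only illustrates the matching logic; it is not Cyral's implementation.

```python
import ipaddress
from typing import List


def host_allowed(client_ip: str, hosts: List[str]) -> bool:
    """Return True if client_ip equals a listed address or falls in a listed CIDR block."""
    address = ipaddress.ip_address(client_ip)
    # ip_network accepts a bare address (treated as a single-host network)
    # as well as a block in CIDR notation.
    return any(address in ipaddress.ip_network(entry) for entry in hosts)
```

The block 203.0.113.16/28 spans the 16 addresses from 203.0.113.16 through 203.0.113.31, so a client at 203.0.113.31 matches the list shown above while one at 203.0.113.32 does not.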

Contexted rules

Each contexted rule comprises these fields describing the allowed access for a given access type:

  • data ([string]): the data locations protected by this rule.
    • Specify locations using LABELs you've established in your data map.
    • Specify a value of any to grant access to all the data locations protected by the current policy.
  • rows (int): the number of records (for example, rows or documents) that can be accessed/affected in a single statement.
    • Specify a value of any to allow an unlimited number of records to be accessed/affected in a single statement.
  • other optional fields, like additional checks and request rewriting.
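
The way the data and rows fields bound a request can be sketched in Python as below. The sketch rests on assumptions that go beyond the text: a rule is a dictionary shaped like the YAML, a request is summarized by its operation type, the LABELs it touches, and the number of rows affected, and a request passes only if a single contexted rule covers all of its LABELs. The function name is_allowed is hypothetical, and the optional fields are ignored.

```python
from typing import List


def is_allowed(rule: dict, operation: str, labels: List[str], rows: int) -> bool:
    """Check one request against the contexted rules for its operation type."""
    # An operation type the rule omits (for example, no "deletes" key) is disallowed.
    for contexted in rule.get(operation, []):
        covers_data = contexted["data"] == "any" or set(labels) <= set(contexted["data"])
        within_rows = contexted["rows"] == "any" or rows <= contexted["rows"]
        if covers_data and within_rows:
            return True
    return False
```

Under these assumptions, a rule that allows reads of any data at 10 rows accepts a 10-row read of SSN and rejects an 11-row read.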

For example, the following rule from the sample policy specifies that individuals belonging to the user group analyst can read 10 rows at a time from any of the tracked data locations (EMAIL, CCN, and SSN). They can also update 1 row at a time in the locations EMAIL and CCN, and they can delete 1 row at a time from any of the tracked locations.

identities:
  groups: [analyst]
reads:
  - data: any
    rows: 10
updates:
  - data: [EMAIL, CCN]
    rows: 1
    severity: medium
deletes:
  - data: any
    rows: 1
    severity: medium

Optional fields in a contexted rule

Users can also specify the following optional fields in a contexted rule:

  • additionalChecks (string): constraints on the data access specified in Rego. See Additional checks.
  • datasetRewrites ([object]): defines how requests should be rewritten in the case of policy violations. See Request rewriting.
  • severity (string): severity level that's recorded when someone violates this rule. This is an informational value. Settings: (low | medium | high). If not specified, the severity is considered to be low.

Example with optional fields

For example, the following rule, adapted from the sample policy, specifies that individuals belonging to the user group analyst can read 10 rows at a time from any of the data locations covered by this policy (EMAIL, CCN, and SSN). They can update 1 row at a time in the locations EMAIL and CCN. Finally, they can delete 1 row at a time from any of the data locations covered by this policy, provided they are using the psql application to do it.

identities:
  groups: [analyst]
reads:
  - data: any
    rows: 10
updates:
  - data: [EMAIL, CCN]
    rows: 1
    severity: medium
deletes:
  - data: any
    rows: 1
    severity: medium
    additionalChecks: |
      is_valid_request {
        client.applicationName == "psql"
      }

Additional checks

Beyond specifying which and how much data can be accessed in the data and rows fields, you can impose more sophisticated constraints by adding the additionalChecks field to a contexted rule.

The additionalChecks field contains a rule you'll write in the Rego language. The checks you specify in this field will be evaluated each time the contexted rule applies to an access request. Specify each check in the form of a Rego rule named is_valid_request, which needs to evaluate to true for the access attempt to be considered an allowed request. Otherwise the request will be considered a policy violation.

Each rule can evaluate attributes of the access request that are made available in the activity log. This information is exposed in the context of the Rego rule through the following variables that represent top-level fields in the activity log:

  • identity: information about the entity performing the observed data access
  • client: information about the client application from which the data is accessed
  • repo: information about the repository being accessed
  • request: information about the request itself
  • tags: values provided in the request comment via the pass-through CyralTags

Attributes nested inside these top-level fields can be accessed using dot notation (e.g. identity.endUser, client.applicationName, repo.type, and so on).

As an example, the following additional check specifies that the access it is attached to is valid only when made through a psql client. A more sophisticated example is provided in the Examples section at the end of this document (Example 6: Only allow users to see data pertaining to themselves).

additionalChecks: |
  is_valid_request {
    client.applicationName == "psql"
  }

In the example above, we use the | indicator, which introduces a multiline string in YAML. See the YAML documentation for more information on specifying multiline strings.

note

The Rego language defines a Rego module as comprising a Package declaration, a set of Import statements for declaring data dependencies, and a set of Rules. In this context, users need only specify Rules, omitting Package declaration and Import statements.

Request rewriting

You can specify how a read request should be rewritten when that request would otherwise violate your policy. This allows you to place constraints on what the data user can retrieve.

Specify this by adding the datasetRewrites field in your contexted rule. The datasetRewrites field contains an array of objects with the following structure:

  • repo (string): the name of the repository that the rewrite applies to
  • dataset (string): the dataset that should be rewritten
    • in the case of Snowflake, this denotes a fully qualified table name in the form <database>.<schema>.<table>
  • parameters ([string]): the set of parameters used in the substitution request. These are references to fields in the activity log, as described in the Additional checks section above.
  • substitution (string): the request used to substitute references to the dataset

For example, the following contexted rule specifies a rewrite that is triggered in the event a request which reads EMAIL data would produce a policy violation. As a result, in this case, any references to the fully qualified table myDb.finance.customers will be replaced with the subquery SELECT * FROM myDb.finance.customers WHERE email=:identity.endUser:, where :identity.endUser: would be replaced with the value in the identity.endUser field in the activity log.

reads:
  - data: [EMAIL]
    rows: 10
    datasetRewrites:
      - repo: claims
        dataset: myDb.finance.customers
        parameters: [identity.endUser]
        substitution: "SELECT * FROM myDb.finance.customers WHERE email=:identity.endUser:"

As a more specific example, suppose an individual makes the following query, which would cause a policy violation by reading more than the 10-row limit specified above. Suppose also that the individual has accessed the repository using SSO authentication, and is identified as the end user nancy.drew@hhiu.us.

SELECT * FROM myDb.finance.customers;

Given the dataset rewrite specification in the example above, the Cyral sidecar would rewrite the query such that the receiving database sees the following query.

SELECT * FROM (SELECT * FROM myDb.finance.customers WHERE email='nancy.drew@hhiu.us');

note

Currently, parameter substitutions take place even within string literals. For example, the substitution "SELECT * FROM myDb.finance.customers WHERE greeting = 'Hello, :identity.endUser:'" contains the string literal 'Hello, :identity.endUser:'. During request rewriting, the sidecar will substitute :identity.endUser: with whatever value is in the identity.endUser field in the activity log associated with the data access.
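
The rewrite walked through in this section can be approximated with a purely textual Python sketch. It is not how the sidecar works internally: the function name rewrite_query is hypothetical, the quoting of the substituted value is an assumption inferred from the rewritten query shown above, and a real rewriter would parse the SQL rather than replace text.

```python
from typing import Dict


def rewrite_query(query: str, dataset: str, substitution: str,
                  values: Dict[str, str]) -> str:
    """Replace references to dataset with a parameterized subquery (naive sketch)."""
    subquery = substitution
    for name, value in values.items():
        # Assumption: render the value as a SQL string literal, doubling single quotes.
        literal = "'" + value.replace("'", "''") + "'"
        subquery = subquery.replace(f":{name}:", literal)
    # Wrap the subquery in parentheses wherever the query references the dataset.
    return query.replace(dataset, f"({subquery})")
```

Called with the query, dataset, and substitution from the example above and the value nancy.drew@hhiu.us for identity.endUser, this sketch returns the same rewritten query that the example shows.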

Interpreting the sample policy

Here we show the same sample policy presented at the beginning of this document. This policy manages the data locations EMAIL, CCN, and SSN, which map to email, credit card number, and social security number data located in repositories claims and loans as defined in the Data Map.

Based on the rules specified in this policy,

  • Users belonging to the user group analyst are allowed to read up to 10 rows at a time from any of the data locations covered by this policy, update 1 row at a time of EMAIL or CCN data, and delete 1 row at a time from any of the data locations covered by this policy.

  • As an exception, the user bob can read any number of rows from any of the covered data locations, update any number of rows in EMAIL or CCN, and delete any number of records from any of the locations covered by this policy, but he can do this only when connected from a machine with the address 192.0.2.22 or with an address in the range of the subnet 203.0.113.16/28.

  • All other users (those who are neither bob nor members of the group analyst) can read 1 row of EMAIL at a time. Any other access to the data locations EMAIL, CCN, and SSN is disallowed.

data:
  - EMAIL
  - CCN
  - SSN
rules:
  - identities:
      groups: [analyst]
    reads:
      - data: any
        rows: 10
    updates:
      - data: [EMAIL, CCN]
        rows: 1
        severity: medium
    deletes:
      - data: any
        rows: 1
        severity: medium
  - identities:
      users: [bob]
    hosts: [192.0.2.22, 203.0.113.16/28]
    reads:
      - data: any
        rows: any
    updates:
      - data: [EMAIL, CCN]
        rows: any
    deletes:
      - data: any
        rows: any
  - reads:
      - data: [EMAIL]
        rows: 1

Sample policy use cases

Example 1: Read-only access

Here, the user bob has unlimited read access to the EMAIL, CCN, and SSN data. Because the rule omits updates and deletes, bob cannot update or delete that data.

data: [EMAIL, CCN, SSN]
rules:
  - identities:
      users: [bob]
    reads:
      - data: any
        rows: any

Example 2: Add a default rule

This policy includes a default rule (a rule with no identities specified) that allows reads of EMAIL, up to 1 row at a time, when the accessing user is neither bob nor alice. In the previous example, which has no default rule, no access (reads, updates, or deletes) is allowed for users who do not match an identities-based rule.

data: [EMAIL, CCN, SSN]
rules:
  - identities:
      users: [bob]
    reads:
      - data: any
        rows: any
    updates:
      - data: [CCN]
        rows: 1
    deletes:
      - data: any
        rows: 1
  - identities:
      users: [alice]
    reads:
      - data: [EMAIL, CCN]
        rows: 1
    updates:
      - data: [EMAIL]
        rows: 1
  - reads:
      - data: [EMAIL]
        rows: 1

Example 3: Apply the same rules to a group of users

The following policy specifies bob, tom, and alex have unlimited read access and limited update and delete access, whereas alice and jeff have limited read and update access and no delete access.

data: [EMAIL, CCN, SSN]
rules:
  - identities:
      users: [bob, tom, alex]
    reads:
      - data: any
        rows: any
    updates:
      - data: [CCN]
        rows: 1
    deletes:
      - data: any
        rows: 1
  - identities:
      users: [alice, jeff]
    reads:
      - data: [EMAIL, CCN]
        rows: 1
    updates:
      - data: [EMAIL]
        rows: 1

Example 4: Apply rule to a service identified by its name

The following policy specifies that the service operatingCostPredictor can read an unlimited number of rows from the data location CCN but cannot update or delete.

data: [EMAIL, CCN, SSN]
rules:
  - identities:
      services: [operatingCostPredictor]
    reads:
      - data: [CCN]
        rows: any

Example 5: Apply rule to users accessing data through Looker

The following policy specifies that users accessing data through Looker have unlimited read access and no update or delete privileges.

data: [EMAIL, CCN, SSN]
rules:
  - identities:
      services: [looker]
    reads:
      - data: any
        rows: any

Example 6: Only allow users to see data pertaining to themselves

The following policy specifies that any number of EMAIL, CCN, and SSN records can be read, so long as the individual accessing the data was authenticated through SSO and is accessing only records that contain their own email address.

Specifically, this rule checks that the request satisfies a filter allowing retrieval of only those records whose userInfo.email value matches the requesting user's email address. The rule is able to read the user's email address from the identity.endUser field of the activity log because the user has authenticated through SSO (ensured by requiring the existence of the identity.group field).

data: [EMAIL, CCN, SSN]
rules:
  - reads:
      - data: any
        rows: any
        additionalChecks: |
          is_valid_request {
            identity.group
            filter := request.filters[_]
            filter.field == "userInfo.email"
            filter.op == "="
            filter.value == identity.endUser
          }
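
For readers less familiar with Rego, the logic of this check can be paraphrased in Python. The sketch below assumes that identity and request arrive as plain dictionaries mirroring the activity log fields, and it simplifies Rego's definedness check on identity.group to a key lookup; it is an approximation for illustration, not the evaluation Cyral performs.

```python
def is_valid_request(identity: dict, request: dict) -> bool:
    """Python paraphrase of the Rego rule in Example 6 (illustrative only)."""
    # `identity.group` on its own line: the field must be present.
    if "group" not in identity:
        return False
    end_user = identity.get("endUser")
    if end_user is None:
        return False
    # `request.filters[_]`: the rule holds if at least one filter
    # satisfies all three comparisons at once.
    return any(
        f.get("field") == "userInfo.email"
        and f.get("op") == "="
        and f.get("value") == end_user
        for f in request.get("filters", [])
    )
```

All statements in a Rego rule body must hold together, which is why the sketch combines the comparisons with and; the iteration over request.filters succeeds as soon as one filter satisfies them.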

Next step

See Manage policies in GitHub to see how you can manage your policies with an audit trail and the ability to roll back unwanted changes.