
Set up Automatic Data Map

Cyral's Automatic Data Map capability is a way to have Cyral find and secure the data you care about. Automatic Data Map relies on a Cyral service called the Repo Crawler. When enabled, the Repo Crawler scans your specified repositories to find data locations (for example, database columns) that contain certain types of data. When the crawler finds a data location you might want to protect, it gives you the option to protect that location by including it in a Data Map protected by your policies.

To set up and use Automatic Data Map, the main steps are: satisfy the prerequisites, install and run the Repo Crawler, and then specify the data patterns to match.

Prerequisites

Cyral API client credentials for crawler

Set up Cyral API client credentials for the crawler. This is an account that the Repo Crawler will use to connect to the Cyral control plane.

  1. In the Cyral control plane UI, click API Client Credentials in the lower left.

  2. Click the ➕ button. Give the account a name, and give it the following permissions:

    • View Datamaps permission
    • Modify Policies permission
    • Repo Crawler permission

    Click Create.

  3. Copy or store the Client ID and Client Secret so that you can use them later in this procedure.

    Cyral recommends that you store them in AWS Secrets Manager. Note the ARN of the secret; you will pass this ARN later as the CyralSecretARN. A scripted example of this step appears after the note below.

    Store the credentials in the format shown here:

    {
      "client-id": "...",
      "client-secret": "..."
    }
    Note: If you lose these values, you can generate new ones using the Rotate Client Secret button in the Edit API Client window. In the Cyral control plane UI, click API Client Credentials, click the name of your API client credentials, and click Rotate Client Secret.
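    If you prefer to script this step, the following is a minimal sketch that stores the credentials with boto3 and prints the ARN to pass as the CyralSecretARN. The secret name and region shown here are placeholder assumptions:

      # Sketch: store the Cyral API client credentials in AWS Secrets Manager.
      # The secret name and region below are placeholders; the JSON keys must
      # match the format shown above.
      import json

      import boto3

      secretsmanager = boto3.client("secretsmanager", region_name="us-east-1")

      response = secretsmanager.create_secret(
          Name="cyral/repo-crawler/api-credentials",  # placeholder name
          SecretString=json.dumps({
              "client-id": "...",      # your Client ID
              "client-secret": "...",  # your Client Secret
          }),
      )

      # Pass this ARN as the CyralSecretARN when you deploy the crawler.
      print(response["ARN"])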

Database connection setup for crawler

  1. Make sure the database service is connected to and accessible through Cyral, as explained in Track a repository.

  2. For each database service to be scanned, find or create an account on the database service. This account must have read permissions on all tables and columns to be scanned. We call this the local database account.

    • Store the local database account credentials in AWS Secrets Manager (see format below) and provide the secret's ARN as the RepoSecretARN in the Repo Crawler deployment template.

      {
        "username": "...",
        "password": "..."
      }
    • Alternatively, you can provide the local database account username and password directly as the RepoUsername and RepoPassword during crawler deployment later in this procedure. Note that when you use this option, an AWS Secrets Manager secret containing the provided credentials is created automatically, in the format described above.

  3. Have ready the following connection details for your database (a scripted connection check follows this list):

    • RepoName: Name of the data repository, as saved in the Cyral control plane.
    • RepoType: Type of repository (for example, PostgreSQL).
    • RepoHost, RepoPort: Cyral sidecar host and port where the crawler will connect to the repository.
    • RepoDatabase: Name of the database the Repo Crawler will connect to.
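To sanity-check these details before deploying, you can store the local account credentials and try a test connection through the sidecar. The sketch below assumes a PostgreSQL repository and the psycopg2 driver; every hostname, name, and credential shown is a placeholder:

    # Sketch: store the local database account credentials (format above) and
    # verify that the crawler's connection details work. All values below are
    # placeholders; psycopg2 is used only for this PostgreSQL test connection.
    import json

    import boto3
    import psycopg2

    REPO_HOST = "sidecar.example.com"  # RepoHost (placeholder)
    REPO_PORT = 5432                   # RepoPort
    REPO_DATABASE = "inventory"        # RepoDatabase (placeholder)
    REPO_USERNAME = "crawler_reader"   # local database account (placeholder)
    REPO_PASSWORD = "..."

    # Store the credentials; pass the returned ARN as the RepoSecretARN.
    secret = boto3.client("secretsmanager").create_secret(
        Name="cyral/repo-crawler/db-credentials",  # placeholder name
        SecretString=json.dumps(
            {"username": REPO_USERNAME, "password": REPO_PASSWORD}
        ),
    )
    print("RepoSecretARN:", secret["ARN"])

    # Confirm the local account can connect and read through the sidecar.
    conn = psycopg2.connect(
        host=REPO_HOST,
        port=REPO_PORT,
        dbname=REPO_DATABASE,
        user=REPO_USERNAME,
        password=REPO_PASSWORD,
    )
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print("connection OK:", cur.fetchone())
    conn.close()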

Install and run the Repo Crawler

  1. Plan your CloudFormation stack deployment in AWS. The VPC and subnet where you deploy must provide:

    • Access to the internet
    • Network access to the Cyral control plane
    • Network access to the repositories you will monitor
    • Network access to the AWS S3 APIs
  2. Create the CloudFormation stack in AWS.

    • For Template source, choose Amazon S3 URL

    • For Amazon S3 URL, specify the Cyral Repo Crawler template download URL as follows:

      https://cyral-public-assets-<region>.s3.<region>.amazonaws.com/cyral-repo-crawler/cyral-repo-crawler-cft-latest.yaml

      where <region> is one of us-east-1, us-east-2, us-west-1, or us-west-2.

      Note: You also have the option to use a versioned path for the crawler template. Form the versioned URL according to the following general format:

      https://cyral-public-assets-<region>.s3.<region>.amazonaws.com/cyral-repo-crawler/<version>/cyral-repo-crawler-cft-<version>.yaml

      where <version> is your desired version, as discussed with Cyral support (for example, v0.2.6).

      For example, to get version v0.2.6 for running in us-east-2, you would use the URL:

      https://cyral-public-assets-us-east-2.s3.us-east-2.amazonaws.com/cyral-repo-crawler/v0.2.6/cyral-repo-crawler-cft-v0.2.6.yaml
  3. In the Specify stack details page, provide the following information:

    • Stack name: Give the crawler Lambda function a recognizable name, such as "Cyral-crawler".

    • ControlPlane: Hostname of your Cyral control plane

    • ControlPlaneRestPort: Keep the default value unless Cyral support advises otherwise. This is the REST API port number of your Cyral control plane.

    • ControlPlaneGrpcPort: Keep the default value unless Cyral support advises otherwise. This is the gRPC port number of your Cyral control plane.

    • CyralSecretARN, CyralClientId, CyralClientSecret: These fields provide the Cyral service user credentials for the crawler. There are two ways to set this up:

      • Store the credentials in AWS Secrets Manager and provide the secret's ARN here as the CyralSecretARN. See the earlier "Prerequisites" section to learn how to format the secret.

        or

      • Leave CyralSecretARN blank, and provide the Cyral API client ID and client secret in the CyralClientId and CyralClientSecret fields. Get these values from the API Client Credentials screen in Cyral, as shown in Step 1.

  4. In Repository Configuration, provide the information the crawler will use to connect to your repository:

    • RepoName: Name of the data repository, as saved in the Cyral control plane

    • RepoType: Type of repository. For example, PostgreSQL.

    • RepoHost, RepoPort: Host and port where the crawler will connect to the repository.

    • RepoSecretARN, RepoUsername, and RepoPassword: These fields provide the repository login credentials for the crawler. There are two ways to set this up:

      • Store the credentials in AWS Secrets Manager and provide the secret's ARN here as the RepoSecretARN. See the earlier "Prerequisites" section to learn how to format the secret.

        or

      • Leave RepoSecretARN blank, and provide the username and password in the RepoUsername and RepoPassword fields.

    • RepoDatabase: Name of the database the Repo Crawler will connect to.

  5. Snowflake configuration: If you'll scan a Snowflake repo, provide its connection details here. Otherwise, leave this section blank.

  6. Oracle configuration: If you'll scan an Oracle database, provide its connection details here. Otherwise, leave this section blank.

  7. Connection String configuration, ConnectionOpts: Any additional parameters that your repository requires when the crawler connects.

  8. Networking and Lambda configuration:

    • ScheduleExpression: How frequently the crawler will run, expressed in cron notation. For example, the EventBridge-style expression cron(0 2 * * ? *) runs the crawler daily at 02:00 UTC.
    • VPC and Subnets: You must deploy the crawler to a VPC and subnet that have access to the internet.
    • RepoCrawlerCodeS3Bucket: Leave blank. This is only used for custom crawler deployments.
  9. Create the stack. The Cyral requirements checker Lambda will verify that the VPC provides all needed resources. If the checker fails, the deployment attempt will be rolled back.
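If you script your deployments, the stack can also be created with boto3. This is a sketch only; every parameter value below is a placeholder, and your template may require additional parameters (for example, VPC and subnet IDs):

    # Sketch: create the Repo Crawler CloudFormation stack programmatically.
    # All parameter values below are placeholders.
    import boto3

    cloudformation = boto3.client("cloudformation", region_name="us-east-2")

    cloudformation.create_stack(
        StackName="Cyral-crawler",
        TemplateURL=(
            "https://cyral-public-assets-us-east-2.s3.us-east-2.amazonaws.com"
            "/cyral-repo-crawler/cyral-repo-crawler-cft-latest.yaml"
        ),
        Parameters=[
            {"ParameterKey": "ControlPlane", "ParameterValue": "example.app.cyral.com"},
            {"ParameterKey": "CyralSecretARN", "ParameterValue": "arn:aws:secretsmanager:..."},
            {"ParameterKey": "RepoName", "ParameterValue": "my-postgres-repo"},
            {"ParameterKey": "RepoType", "ParameterValue": "postgresql"},  # placeholder
            {"ParameterKey": "RepoHost", "ParameterValue": "sidecar.example.com"},
            {"ParameterKey": "RepoPort", "ParameterValue": "5432"},
            {"ParameterKey": "RepoDatabase", "ParameterValue": "inventory"},
            {"ParameterKey": "RepoSecretARN", "ParameterValue": "arn:aws:secretsmanager:..."},
            {"ParameterKey": "ScheduleExpression", "ParameterValue": "cron(0 2 * * ? *)"},
            # ...plus VPC, subnet, and any other parameters your template requires.
        ],
        # The template creates IAM resources for the Lambda, so acknowledge that here.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )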

Once deployed, the crawler will run automatically, based on the cron schedule you set in the ScheduleExpression field of the template.

To test your crawler, you can execute a manual run in AWS and examine its logs in CloudWatch. To do this in the AWS console, navigate to your Repo Crawler Lambda, open the Test tab, and click Test to run the crawler.
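The same manual run can be scripted. This sketch invokes the function once and prints recent log events; the function name is a placeholder for the name of your deployed Repo Crawler Lambda:

    # Sketch: trigger a manual crawler run and read its CloudWatch logs.
    # FUNCTION_NAME is a placeholder for your deployed crawler Lambda's name.
    import boto3

    FUNCTION_NAME = "Cyral-crawler-function"  # placeholder

    # Invoke the crawler once, synchronously, with an empty test event
    # (as in the console Test panel).
    result = boto3.client("lambda").invoke(FunctionName=FUNCTION_NAME, Payload=b"{}")
    print("invocation status:", result["StatusCode"])

    # Fetch recent events from the function's CloudWatch log group.
    logs = boto3.client("logs")
    events = logs.filter_log_events(
        logGroupName=f"/aws/lambda/{FUNCTION_NAME}",
        limit=50,
    )
    for event in events["events"]:
        print(event["message"], end="")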

Next step

To start scanning, you must specify data patterns to match for Automatic Data Map.