Version: v4.7

Metrics Specification

Cyral publishes the metrics described here to track data activity and system health. Cyral metrics use a standard set of labels, listed at the end of this page.

Metrics format and exposure

The Cyral sidecar exposes a port that responds with metrics that conform to the OpenMetrics specification.

The metrics port defaults to 9000, but can be changed through configuration. Refer to your sidecar deployment option documentation for configuration options and service discovery snippets.

Metric filtering

By default, all Cyral metrics are exposed through the metrics endpoint, but you can filter metrics by name in two ways. At deployment time, you can set a default regex that filters the metrics returned by name for every request to the endpoint. You can also override that default by setting the name_regex query parameter when scraping the metrics endpoint from your metrics scraper.

For example, to filter out any metrics whose names do not start with cyral_, you would configure your scraper to hit the following endpoint (shown with the default port):

cyral-sidecar:9000/metrics?name_regex="^cyral_.*"

Up metric

An up metric is exposed for each service inside the Cyral sidecar, along with a sidecar_service_name label. This metric indicates whether the given service responded to the latest metric poll.
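As a sketch of how this metric can be consumed, the snippet below scans a scraped payload for up samples and reports services that did not respond at the latest poll (the sample payload and service names are illustrative, not an official label set):

```python
import re

# Illustrative scrape output; real payloads contain many more metrics.
PAYLOAD = """\
up{sidecar_service_name="dispatcher"} 1
up{sidecar_service_name="pg-wire"} 0
"""

def down_services(payload: str) -> list[str]:
    """Return sidecar services whose `up` sample was 0 at the latest poll."""
    pattern = r'up\{sidecar_service_name="([^"]+)"\}\s+(\d+)'
    return [name for name, value in re.findall(pattern, payload) if value == "0"]

print(down_services(PAYLOAD))  # ['pg-wire']
```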

Health status metric

Each sidecar instance exposes a metric that represents its health status.

The metric has values and labels as follows:

Status      Metric Value   Labels
unknown     0              status="unknown"
healthy     1              status="healthy"
degraded    2              status="degraded", failed_components="component1;component2..."
unhealthy   3              status="unhealthy", failed_components="component1;component2..."
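The value-to-status mapping and the semicolon-separated failed_components label format shown above can be decoded with a small helper. This is a sketch for illustration, not an official client:

```python
# Status values per the health status metric table.
HEALTH_STATUS = {0: "unknown", 1: "healthy", 2: "degraded", 3: "unhealthy"}

def parse_health(value: int, failed_components: str = "") -> tuple[str, list[str]]:
    """Map the metric value to a status name and split the failed components."""
    status = HEALTH_STATUS.get(value, "unknown")
    # failed_components uses ';' as a separator, e.g. "component1;component2"
    components = [c for c in failed_components.split(";") if c]
    return status, components

print(parse_health(2, "wire;dispatcher"))  # ('degraded', ['wire', 'dispatcher'])
```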

Sidecar System Metrics

CPU

This metric tracks CPU utilization of the sidecar compute node. Short spikes are acceptable, but sustained use above 80% may cause performance degradation in the sidecar. This should be monitored at an individual sidecar instance level, and not aggregated across a group of sidecar instances.

Recommendation: Increase the size of the autoscaling group to provision additional nodes or increase the capacity of each node in the autoscaling group.

Memory

This metric tracks memory utilization of the sidecar compute node. Short spikes are acceptable, but sustained use above 80% may increase the risk that the sidecar instance will have to restart. This should be monitored at an individual sidecar instance level, and not aggregated across a group of sidecar instances.

Recommendations:

  • Increase the size of the autoscaling group to provision additional nodes or increase the capacity of each node in the autoscaling group.

  • Consider applying a sidecar memory budget. For deployments that need to keep sidecar memory usage under a certain threshold, Cyral provides an optional mechanism to enforce memory budgets. This feature is disabled by default to ensure every query is analyzed.

    If enabled, the sidecar memory budget limits the amount of memory the sidecar uses while parsing and analyzing queries and responses. In particular, the budget sets an upper bound on the maximum query/response size that will be analyzed. Queries/responses larger than what the current memory budget allows will not be analyzed.

    Tip: Please contact Cyral support for help changing the memory budget or setting the optimal budget for your use case.

Disk

The sidecar does not store any persistent data other than logs, which should be automatically rotated as they are forwarded to a log collector service. Sustained sidecar disk utilization above 50% should be investigated to ensure that log rotation is behaving correctly. Reaching 100% disk utilization may cause the sidecar compute node to restart.

Recommendation: Investigate the cause for increasing disk consumption by connecting to the sidecar instance and looking at the volume. Resolve by ensuring that log rotation is correctly configured.

Cyral Counters

System Health Metrics

Metric Name                     Description
cyral_open_client_conns_count   Number of client connections established
cyral_closed_connection_count   Number of monitored client connections closed
cyral_query_duration_sum        Cumulative sum of query execution duration
cyral_wire_dial_errors_count    Number of times the wire was unreachable
cyral_repo_dial_errors_count    Number of times the repository was unreachable
go_memstats_heap_inuse_bytes    Memory used by sidecar applications
go_goroutines                   Number of goroutines the sidecar is using
cyral_bypass_wire_count         Number of connections that went through bypass mode due to the unavailability of the wire

See below for more detailed descriptions of these metrics.

Open connections

Calculated as: cyral_open_client_conns_count - cyral_closed_connection_count

This metric can be used to count the number of concurrent connections to the sidecar/repo. This can be used to alert if the number of connections falls outside an expected range such as:

  • Connections = 0 — May indicate a problem if an app is expected to maintain a persistent connection
  • Connections < x — A deviation from normal, based on expected use of the data, may indicate an issue.

Recommendation: If the number of connections falls outside the expected bounds, investigate the access logs to understand the behavior change. If connections have increased, the logs will reveal which client is driving the additional traffic. If connections have dropped, investigate the application for an outage, failed authentication, or similar issues.

Average query duration

Calculated as: increase(cyral_query_duration_sum[1m]) / increase(cyral_query_duration_count[1m])

This metric records the average time taken for a query. This can be used as an indicator of degraded application performance. An increase may indicate an issue with either the sidecar or the application. Note that this is an average over many queries and may not be indicative if queries are run on an ad hoc basis.

Recommendation: If the average query duration increases, check that the sidecar CPU/memory and repository CPU/memory are not reaching their limits. Using the access logs, determine whether all queries are taking longer than previously, or only a subset. Investigate whether the nature of some or all queries has become more complex, resulting in longer repository processing time.

Dial errors

  • cyral_wire_dial_errors_count measures errors in internal sidecar communication between services.
  • cyral_repo_dial_errors_count measures errors in external communication with the repo.

These metrics indicate an error communicating internally among sidecar services or externally with the repository, respectively. A single, infrequent event may not be of concern, but a large number of events or an increase in frequency may indicate a connectivity or authentication issue.

Recommendation: Wire dial errors should be reported to Cyral. Repo dial errors indicate that the sidecar is unable to reach the configured repository. Check that the repository endpoint is correctly configured in the Cyral console, and that any security groups on the repository allow traffic from the sidecar on the configured port.

Golang memory usage

The go_memstats_heap_inuse_bytes metric reports how much memory the Cyral sidecar applications are using. A constant increase of this value, or usage reaching 80% of the node's capacity, could indicate a memory leak and may result in the sidecar restarting.

Recommendation: Report to Cyral for investigation.

Goroutines

The go_goroutines metric represents how many goroutines the sidecar is using. Like memory, a constant increase may indicate a leak and should be investigated.

Recommendation: Report to Cyral for investigation.

Unanalyzed queries

The cyral_bypass_wire_count metric can alert you when some database traffic is not being fully monitored by Cyral.

The sidecar supports a mode ("Enter passthrough mode on failure") which prioritizes data access over monitoring. This means that if an internal component of the sidecar has an issue, the sidecar will attempt to ensure traffic is still directed to the repository, even if analysis and monitoring cannot occur. Small increases in this metric may indicate a complex query that is not being analyzed correctly and should be reported to Cyral. Large increases in this metric (in line with the increase in queries) suggest that the sidecar has a partial failure and should be investigated, or restarted if the problem persists.

Recommendation: Report to Cyral for investigation.

Application Health Metrics

Metric Name                          Description
cyral_authentication_failure_count   Number of authentication failures
cyral_portscan_count                 Number of port scans
cyral_policy_violation_count         Number of queries that have resulted in policy violations
cyral_blocked_queries_count          Number of queries blocked by Cyral due to policy violations
cyral_queries_with_errors            Number of queries that resulted in database errors

Authentication failures

The cyral_authentication_failure_count metric counts the number of authentication failures. A small increase in this metric may be due to someone mistyping a password. A moderate increase in this metric may indicate an incorrect or changed password in the repository. A large increase may indicate an attacker or attempted breach of the repository and should be investigated.

Port scans

The cyral_portscan_count metric indicates that a client connected to the sidecar but did not progress the connection or provide any authentication before terminating it. This technique is typically used by an attacker to scan a network and discover which addresses/ports are open before attempting to connect. In a private/restricted network, these events should not be expected and should be investigated.

Policy violations

  • The cyral_policy_violation_count metric indicates how many policy violations have occurred. This can be used for verifying a policy before blocking mode is enabled, monitoring for malicious users, or detecting queries from applications that may not behave as expected.

The cyral_blocked_queries_count metric indicates how many queries are blocked due to policy violations. Increases may indicate a malicious user, or an application unable to complete its function (due to misconfigured code or a misconfigured policy).

Query errors

The cyral_queries_with_errors metric indicates an error occurred at the repository while processing the request.

Labels

Label               Type      Description
repo_id             string    Repository ID
repo_name           string    Repository name
repo_type           string    Repository type (MySQL, PostgreSQL, and so on)
client_host         string    Client IP address
client_tls          boolean   Whether the client connected to the sidecar using TLS
repo_tls            boolean   Whether the sidecar connected to the repository using TLS
sensitive_request   boolean   Whether the request accessed sensitive data
end_user            string    The user (SSO user or native data repository user) who connected to the repository
service_name        string    The service that connected to the repository