A New Approach to On-Call Access Management
Imagine the following scenario: an SRE is paged in the middle of the night about an incident that requires them to log into the production database to debug a performance issue affecting one of the company’s critical applications. This scenario typically plays out in one of three ways, each with its own trade-offs.
Scenario 1: SRE and DevOps teams often grant each member permanent standing access to production systems. For example, Frank and Nancy may have individual accounts on the production database, created when they joined the group, that let them log in whenever they want. This is problematic because manually provisioning and deprovisioning individual accounts is error-prone and leaves critical systems exposed to attacks through unused accounts and privileges. More importantly, there is no way to vet who should or shouldn’t be accessing a system at a particular time of day or week. For instance, should Frank be logging into the production database on a weekend when he’s supposedly not on call for support incidents? Perhaps not!
Scenario 2: Another common but highly insecure practice is to give team members permanent standing access through shared accounts, i.e., accounts whose credentials are shared with everyone. This leads to poor security hygiene for obvious reasons: credentials become hard to rotate, individuals get phished for them, and they may end up leaking inadvertently in code or configuration files.
Scenario 3: To counter these problems, some teams overcompensate by removing all standing access to production systems from everyone but a couple of trusted administrators. Anyone who needs access must make an explicit request, which is manually triaged for necessity and urgency before access is granted. This approval process typically runs through tools such as ServiceNow, JIRA, or Slack. Though this improves security and monitoring, it also slows down incident response and hurts business SLAs: an SRE may have to wait in the middle of the night while an administrator is notified, triages the request, and grants access.
Thus, teams often end up making a trade-off between being more agile and having strong security hygiene. It’s time for a new approach. In this blog post, you will learn how Cyral helps SRE and DevOps teams overcome this false dilemma by allowing them to be both agile and secure at the same time.
What’s required to be both agile and secure?
The ideal solution to this problem ensures that only authenticated on-call engineers are authorized to access specific production data resources, and only during their on-call shifts. It should provide the following five capabilities:
- Enforcing centralized authentication using an existing identity provider (IdP)
- Automatically limiting access to production data services to on-call engineers only
- Letting on-call engineers grant limited access to others for troubleshooting through self-service
- Providing a strong audit trail of all access approvals and their durations
- Generating a detailed activity log of what data was exposed to whom during the process
How Cyral solves this problem
Cyral enables administrators to remove all standing access to production databases, and automates the process of granting time-bound access to only those individuals that are on-call at a particular time of the day or week.
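To make "time-bound" concrete, here is a minimal Python sketch of how a grant could be tied to an on-call shift. It assumes shifts have already been fetched from the incident response tool into a simple list; the `Shift` type and `access_expiry` function are hypothetical illustrations, not part of Cyral or any scheduling product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Shift:
    """One on-call shift (hypothetical shape, e.g. exported from a scheduling tool)."""
    user: str
    start: datetime
    end: datetime  # exclusive: the user is no longer on call at this instant


def access_expiry(user: str, shifts: list[Shift], now: datetime) -> Optional[datetime]:
    """Return when a grant for `user` should expire, or None if they are not on call at `now`."""
    active_ends = [s.end for s in shifts if s.user == user and s.start <= now < s.end]
    # With overlapping shifts, the grant lasts until the latest active shift ends.
    return max(active_ends) if active_ends else None


utc = timezone.utc
shifts = [Shift("nancy@example.com", datetime(2024, 1, 6, tzinfo=utc), datetime(2024, 1, 7, tzinfo=utc))]
print(access_expiry("nancy@example.com", shifts, datetime(2024, 1, 6, 12, tzinfo=utc)))  # 2024-01-07 00:00:00+00:00
print(access_expiry("frank.hardy@hhiu.us", shifts, datetime(2024, 1, 6, 12, tzinfo=utc)))  # None
```

The design choice worth noting is that the expiry comes from the schedule itself, so access lapses when the shift ends without anyone having to deprovision an account.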
On-call access management benefits
Using Cyral for automated access management in this manner has the following benefits:
- Strong Security: Administrators can implement strong security hygiene by eliminating shared-account access to databases and removing the need to provision and deprovision individual accounts
- Agility: Team members who are on the hook during production incidents can be more agile by not having to go through elaborate approval processes and waiting on administrators to review and grant them access
- Strong Audit Trail: Security teams get an audit trail of all database accesses, including rich session recordings of all activity, for audit and compliance purposes
How it works
The following architectural diagram illustrates how Cyral accomplishes this.
At the center is the Cyral sidecar, a transparent, stateless interception mechanism that brokers all connection requests to your databases. The sidecar integrates with identity and MFA providers such as Auth0, Okta, Active Directory, and Duo Security to authenticate your users and determine their group memberships in the organization.
Upon successful authentication, the sidecar checks the user’s on-call schedule in your incident response service, such as PagerDuty, VictorOps, or OpsGenie.
If the user is on call at that time, the connection is allowed through to the database; otherwise, access is denied. In either case, an audit record of the connection request is generated and sent to your SIEM, such as ELK, Datadog, Sumo Logic, or Splunk. When access is granted, a complete session recording of the user’s activity is also generated and sent to the SIEM for audit and compliance purposes.
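The connection flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cyral's implementation: the user is assumed to have already been authenticated by the identity provider, `is_on_call` stands in for a lookup against PagerDuty, VictorOps, or OpsGenie, and a plain list stands in for the SIEM. All names here are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class Identity:
    """A user already authenticated by the identity provider (hypothetical shape)."""
    email: str
    groups: frozenset[str]


def broker_connection(
    identity: Identity,
    repo: str,
    now: datetime,
    is_on_call: Callable[[str, datetime], bool],
    audit_sink: list[str],
) -> bool:
    """Allow the connection only if the user is on call; audit the decision either way."""
    allowed = is_on_call(identity.email, now)
    # An audit record is emitted for both allowed and denied requests.
    audit_sink.append(json.dumps({
        "time": now.isoformat(),
        "user": identity.email,
        "groups": sorted(identity.groups),
        "repo": repo,
        "decision": "allow" if allowed else "deny",
    }))
    return allowed


events: list[str] = []  # stand-in for the SIEM
frank = Identity("frank.hardy@hhiu.us", frozenset({"SRE"}))
now = datetime(2024, 1, 6, 3, 0, tzinfo=timezone.utc)
ok = broker_connection(frank, "patient-data-prod", now, lambda user, t: False, events)
print(ok, events[0])
```

Because the audit record is written before the function returns, denied attempts are as visible to the security team as successful ones.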
Cyral also supports authorization rules in the form of policies that let administrators constrain what team members can or cannot do once they’re logged into the database. Can an SRE drop a table or add a new database user? Can someone run a full table scan on customer data? You can configure all this and more using easy-to-use security policies, many of which are available out of the box.
Next, we’ll explore a few policies that are relevant to the on-call incident response use case.
Out-of-the-box policies for on-call access management
Enable on-call access management
Let’s start with a simple out-of-the-box policy that enables on-call access management for a couple of repositories secured by Cyral. This policy is autogenerated and kept up to date as new repositories are added. It requires members of the SRE group (as determined by group membership in the identity provider) to be on call in order to access the patient-data-prod and insurance-claims repositories.
Given this policy, the following screenshot shows what happens if Frank Hardy, an SRE, attempts to log into the patient-data-prod database at a time when PagerDuty shows he has no scheduled on-call shifts. His connection request gets denied with the following message:
Access to repo ‘patient-data-prod’ is denied because user ‘frank.hardy@hhiu.us’ is not on call in PagerDuty at this time.
However, Frank is allowed to log in successfully when PagerDuty shows he is actually on-call.
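The logic of this policy can be expressed as a small check. The sketch below is hypothetical: `OnCallPolicy` and `check_on_call` are illustrative names, not Cyral's policy format, and the on-call status is passed in as a boolean rather than fetched from PagerDuty. It reproduces the denial message shown above, using straight quotes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OnCallPolicy:
    """Hypothetical in-memory form of the policy; not Cyral's actual policy syntax."""
    group: str
    repos: frozenset[str]
    provider: str = "PagerDuty"


def check_on_call(policy: OnCallPolicy, repo: str, user: str,
                  groups: set[str], on_call: bool) -> Optional[str]:
    """Return a denial message if this policy blocks the request, else None."""
    if repo not in policy.repos or policy.group not in groups:
        return None  # This policy does not cover the request; other rules decide.
    if on_call:
        return None
    return (f"Access to repo '{repo}' is denied because user '{user}' "
            f"is not on call in {policy.provider} at this time.")


policy = OnCallPolicy("SRE", frozenset({"patient-data-prod", "insurance-claims"}))
print(check_on_call(policy, "patient-data-prod", "frank.hardy@hhiu.us", {"SRE"}, on_call=False))
print(check_on_call(policy, "patient-data-prod", "frank.hardy@hhiu.us", {"SRE"}, on_call=True))  # None
```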
Turn on MFA for repositories protected by on-call access
This policy shows how on-call access to the patient-data-prod and insurance-claims repositories can be configured to require multi-factor authentication (MFA) after a successful identity provider authentication. Cyral integrates with services such as Duo Security to provide this functionality. The policy also requires clients accessing the databases to use TLS.
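Here is a hypothetical sketch of the checks this policy adds on top of on-call status. The `ConnectionContext` fields and `unmet_requirements` function are assumptions for illustration; in practice the MFA result would come from a provider such as Duo Security and the TLS flag from the client connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionContext:
    """Hypothetical facts about an incoming connection."""
    idp_authenticated: bool  # identity provider login succeeded
    mfa_verified: bool       # second factor approved (e.g. via Duo Security)
    client_uses_tls: bool    # client connected over TLS


def unmet_requirements(ctx: ConnectionContext) -> list[str]:
    """Return the requirements that are not satisfied; an empty list means proceed."""
    if not ctx.idp_authenticated:
        # MFA is only prompted after IdP success, so nothing else is evaluated.
        return ["identity provider authentication"]
    unmet = []
    if not ctx.mfa_verified:
        unmet.append("multi-factor authentication")
    if not ctx.client_uses_tls:
        unmet.append("TLS")
    return unmet


print(unmet_requirements(ConnectionContext(True, False, True)))  # ['multi-factor authentication']
```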
Limit privileged commands while connected to the database
The last policy example shows how an administrator can restrict which commands users can run while connected to the database. In the screenshot below, an SRE may not run the following:
- A “drop table” command
- A “create user” command
- A command that will result in a full table scan (such as SQL statements with neither a WHERE clause nor a LIMIT clause)
- A command that will result in a cross join (also known as a Cartesian product)
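The four restrictions above can be approximated with a naive Python classifier. This is a sketch only: it uses regular expressions rather than a real SQL parser, so comments or string literals can fool it, and it only detects explicit CROSS JOIN syntax, not implicit comma joins. The `violations` function is a hypothetical name, not a Cyral API.

```python
import re


def violations(sql: str) -> list[str]:
    """Return which of the four restricted command types a statement appears to contain."""
    s = " ".join(sql.lower().split())  # normalize case and whitespace
    found = []
    if re.search(r"\bdrop\s+table\b", s):
        found.append("drop table")
    if re.search(r"\bcreate\s+user\b", s):
        found.append("create user")
    # Heuristic: row-touching statements with neither WHERE nor LIMIT may scan the whole table.
    touches_rows = bool(
        (s.startswith("select") and re.search(r"\bfrom\b", s))
        or s.startswith(("update", "delete"))
    )
    if touches_rows and not re.search(r"\b(where|limit)\b", s):
        found.append("full table scan")
    if re.search(r"\bcross\s+join\b", s):
        found.append("cross join")
    return found


for stmt in ["DROP TABLE patients", "SELECT * FROM claims", "SELECT * FROM claims WHERE id = 7"]:
    print(stmt, "->", violations(stmt))
```

A production implementation would parse the statement and inspect the query plan; the regex version only shows the shape of the rules.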
Summary
With this new approach to on-call access management, companies can help engineers reduce MTTR and give cloud infrastructure teams a simpler way to unblock investigations, all while limiting direct access to production data. At the same time, a strong audit trail records what was done, when, and by whom.
Interested in trying it for your organization? Sign up for our Free Trial!
Learn more about our On-Call Access Management with PagerDuty Free Trial.
The trial lets you spin up our stateless interception service (the sidecar) locally, use it to broker production database access for PagerDuty on-call engineers, and federate authentication with Okta.