Validating Deployment Requirements with CloudFormation Custom Resources

Here at Cyral, we have many microservices that depend upon external requirements. We automate those services’ deployment using Infrastructure as Code (IaC) templates and, ideally, we want to know if all the service’s requirements are met at deployment time. Some of the items we usually check include:

  • Secrets from AWS Secrets Manager are accessible. If this verification fails, it might be due to a connectivity problem (e.g., missing VPC endpoint or firewall rules) or a permission problem (e.g., the IAM policy does not allow the service to read the secret).
  • Database connectivity. We usually try to connect to database engines and run a simple query like SELECT 1. If this step fails, it can point to either connectivity issues (e.g., misconfigured DNS records, wrong firewall rules, etc.) or authentication problems.
  • Connectivity with external services. This includes, for instance, using an OIDC client-id and secret-key pair to obtain a temporary access token.
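
Checks like the ones listed above can be organized as small functions that raise an exception on failure. The following is a minimal sketch, not Cyral's implementation: `check_tcp` and `run_checks` are hypothetical names chosen for illustration, and `check_tcp` only opens a plain TCP connection rather than running a real database query.

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Raise OSError if a TCP connection to host:port cannot be opened."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def run_checks(checks):
    """Run each named check and return a dict of failures.

    `checks` maps a human-readable name to a zero-argument callable that
    raises an exception when the requirement is not met.
    """
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # report every failure, not just the first
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

A caller could pass something like `{"database": lambda: check_tcp(db_host, 5432)}`, where `db_host` is whatever hostname the service is configured with, and treat a non-empty result as "requirements not met".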

Since we deploy most of our services to AWS using CloudFormation, we share some ideas in this post on how to validate deployment requirements using this service.

Background

CloudFormation is the AWS service for Infrastructure as Code (IaC). It automates cloud deployments by using templates to create infrastructure resources such as VPCs, subnets, and EC2 instances. The service groups these resources into a collection called a “stack”, in which resources establish dependency relationships. CloudFormation manages the state of the entire stack, which makes it easy to know when your infrastructure is deployed.

Although the state helps to keep your infrastructure consistent, it has some limitations. For instance, suppose you write a CloudFormation template to deploy a Lambda function that depends on an external service at runtime. Also imagine that the external service is not ready to serve requests. By default, the CloudFormation deployment would finish with a successful state, even though the Lambda will not be able to function properly, since the service it depends on is unavailable.

Instead, what if we make stack deployment complete only when all the external requirements are met?

CloudFormation custom resources

In addition to native resources like AWS::EC2::Instance, CloudFormation also offers a special resource type named Custom, also known as a custom resource. With this feature, we can extend CloudFormation capabilities and implement our own resource type through Lambda functions.

RequirementsCheckerFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.9
      Handler: index.lambda_handler
      # An execution role (the Role property) is also required; omitted here for brevity.
      Code:
        ZipFile: |
          def lambda_handler(event, context):
            # check if all the requirements are met
            pass

RequirementsChecker:
    Type: Custom::RequirementsChecker
    Properties:
      ServiceToken: !GetAtt RequirementsCheckerFunction.Arn

In the above example, we declare a new resource named RequirementsChecker. Note that the resource type starts with Custom::, which is how CloudFormation recognizes a custom resource. We also specify the ServiceToken property, which refers to a Lambda function called RequirementsCheckerFunction. This function implements the custom resource.

Checking if the requirements of a service are met

In the above example, you could fill the body of lambda_handler with code that checks whether your CloudFormation deployment's requirements are met. For instance, you could try to read credentials from AWS Secrets Manager, use these credentials to connect to a remote API, and call a ping operation on that API. If any of those steps fails, the requirements checker Lambda can send a signal to CloudFormation indicating that deployment of the custom resource failed, which makes the whole stack deployment fail and triggers a rollback.

Writing all those checks in your template might be cumbersome. To simplify things, a good approach is to implement a dry-run mode in your application. If your application is also a Lambda function, you can read a custom input parameter from the lambda event payload to control whether it should behave like a dry-run or not.
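
A dry-run switch of this kind can be sketched as follows. This is an illustration under assumptions: the `dry-run` key matches the payload used later in this post, while `check_requirements` and `process_event` are hypothetical placeholders for application code. Letting a failed check raise, rather than returning an error object, means the Lambda service marks the invocation as failed (the invoke response carries a `FunctionError` field), which the caller can detect.

```python
def check_requirements():
    """Hypothetical placeholder: raise an exception if a requirement is unmet."""


def process_event(event, context):
    """Hypothetical placeholder for the application's real work."""
    return {"statusCode": 200, "body": {"message": "processed"}}


def lambda_handler(event, context):
    # Always validate requirements first; an exception propagates to Lambda
    # and marks the invocation as failed.
    check_requirements()
    if event.get("dry-run"):
        # Dry-run: report success without doing any real work.
        return {"statusCode": 200, "body": {"message": "OK"}}
    return process_event(event, context)
```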

With this approach, the requirements checker Lambda code could look like the following.

import json
import os

import boto3
import cfnresponse

def lambda_handler(event, context):
  try:
    # Nothing to check when the custom resource is being deleted.
    if event['RequestType'] == 'Delete':
      cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
      return
    # Synchronously invoke the service Lambda in dry-run mode.
    lambda_client = boto3.client('lambda')
    response = lambda_client.invoke(
      FunctionName=os.getenv('SERVICE_LAMBDA_ARN'),
      Payload='{"dry-run":true}'
    )
    status_code = response.get('StatusCode')
    function_error = response.get('FunctionError')
    payload = response.get('Payload')
    if payload:
      payload = payload.read().decode('utf-8')
    # StatusCode and FunctionError describe the invocation itself. A service
    # that returns normally reports failed checks in its own statusCode field.
    result = json.loads(payload) if payload else None
    service_status = result.get('statusCode', 200) if isinstance(result, dict) else 200
    if 200 <= status_code <= 299 and not function_error and 200 <= service_status <= 299:
      cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    else:
      cfnresponse.send(
        event, context, cfnresponse.FAILED, {},
        reason=f"Service invocation terminated with error. StatusCode: {status_code}, Response: {payload}"
      )
  except Exception as e:
    cfnresponse.send(event, context, cfnresponse.FAILED, {}, reason=f"Unable to deploy the service: {e}")

The key points in the above code snippet are:

  • The event input parameter contains metadata passed by CloudFormation. For instance, we use RequestType to determine whether the resource is being created, updated, or deleted. In this example, we don’t want to run the requirements checker when resources are being deleted.
  • The RequirementsChecker Lambda synchronously invokes another Lambda function referenced by the SERVICE_LAMBDA_ARN environment variable, in dry-run mode. The response of the dry-run execution is used to determine the status of the CloudFormation stack deployment.
  • We use the cfnresponse module to send signals to CloudFormation so it knows the status of the custom resource. Sending FAILED makes the CloudFormation operation fail, which is what we want at creation time when the requirements are not met.
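
For context, the signal that cfnresponse sends is a JSON document uploaded with an HTTP PUT to the pre-signed URL that CloudFormation passes in the event's ResponseURL field. The sketch below only builds that document so the fields are visible; `build_response_body` is a hypothetical helper name, not part of cfnresponse, and the upload step is left out.

```python
import json


def build_response_body(event, status, physical_resource_id, reason="", data=None):
    """Build the JSON body CloudFormation expects from a custom resource.

    `status` must be "SUCCESS" or "FAILED". StackId, RequestId, and
    LogicalResourceId are copied back from the incoming event.
    """
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError(f"invalid status: {status}")
    return json.dumps({
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": physical_resource_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    })
```

If CloudFormation never receives such a response, the stack operation waits until the custom resource times out, which is why the handler in this post reports FAILED from its exception handler instead of letting the error escape.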

Putting it all together

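Now that we have a Lambda function that can process and respond to CloudFormation events, we can put it all together to complete our requirements checker. In the template below, the checker function is guarded by a DoDryRun condition, which is assumed to be defined in the template's Conditions section (not shown). The functions' execution roles are also omitted for brevity; the checker's role must allow lambda:InvokeFunction on the service function.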

ServiceFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.9
      Handler: index.lambda_handler
      Code:
        ZipFile: |
          # db and process_event stand in for the application's own code.
          def lambda_handler(event, context):
            try:
              db.connect()
              db.ping()
            except Exception as e:
              return {'statusCode': 400, 'body': {'message': str(e)}}
            if event.get('dry-run'):
              return {'statusCode': 200, 'body': {'message': 'OK'}}
            return process_event(event, context)

RequirementsCheckerFunction:
    Condition: DoDryRun
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.9
      Handler: index.lambda_handler
      Environment:
        Variables:
          SERVICE_LAMBDA_ARN: !GetAtt ServiceFunction.Arn
      Code:
        ZipFile: |
          import json
          import os

          import boto3
          import cfnresponse

          def lambda_handler(event, context):
            try:
              # Nothing to check when the custom resource is being deleted.
              if event['RequestType'] == 'Delete':
                cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
                return
              # Synchronously invoke the service Lambda in dry-run mode.
              lambda_client = boto3.client('lambda')
              response = lambda_client.invoke(
                FunctionName=os.getenv('SERVICE_LAMBDA_ARN'),
                Payload='{"dry-run":true}'
              )
              status_code = response.get('StatusCode')
              function_error = response.get('FunctionError')
              payload = response.get('Payload')
              if payload:
                payload = payload.read().decode('utf-8')
              # StatusCode and FunctionError describe the invocation itself. A service
              # that returns normally reports failed checks in its own statusCode field.
              result = json.loads(payload) if payload else None
              service_status = result.get('statusCode', 200) if isinstance(result, dict) else 200
              if 200 <= status_code <= 299 and not function_error and 200 <= service_status <= 299:
                cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
              else:
                cfnresponse.send(
                  event, context, cfnresponse.FAILED, {},
                  reason=f"Service invocation terminated with error. StatusCode: {status_code}, Response: {payload}"
                )
            except Exception as e:
              cfnresponse.send(event, context, cfnresponse.FAILED, {}, reason=f"Unable to deploy the service: {e}")

RequirementsChecker:
    Condition: DoDryRun
    Type: Custom::RequirementsChecker
    Properties:
      ServiceToken: !GetAtt RequirementsCheckerFunction.Arn
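
One detail of the Lambda invoke API is worth keeping in mind here: the StatusCode and FunctionError fields of the invoke response describe the invocation itself, so a service function that returns normally with an error object such as {'statusCode': 400, ...} still yields StatusCode 200 and no FunctionError. A checker therefore also needs to look inside the returned payload. The helper below is a hypothetical sketch of that decision as a pure function; `dry_run_succeeded` is an illustrative name and not part of any AWS library.

```python
import json


def dry_run_succeeded(status_code, function_error, payload_text):
    """Decide whether a dry-run invocation passed.

    status_code and function_error come from the Lambda invoke response;
    payload_text is the decoded response payload (a JSON document).
    """
    if not 200 <= status_code <= 299 or function_error:
        return False
    try:
        result = json.loads(payload_text) if payload_text else None
    except ValueError:
        return False  # a payload that is not valid JSON is treated as a failure
    if isinstance(result, dict) and "statusCode" in result:
        return 200 <= result["statusCode"] <= 299
    return True
```

Keeping the decision in a pure function like this makes it easy to unit test without calling AWS.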

Conclusion

Custom resources provide a way to extend native CloudFormation capabilities and add new functionality to the AWS template engine. In this article, we demonstrated an approach to checking deployment prerequisites using custom resources. The approach consists of declaring a custom resource that runs the service in dry-run mode at deployment time; if the dry run fails, the whole stack deployment fails as well.
