In preparation for compliance certification later this year, we disabled write access for all AWS console users (except the primary; more on that later) to streamline infrastructure change management. This post describes how we created a self-enforcing, self-documenting infrastructure change management system using Git, CodePipeline and CloudFormation.
AWS CodePipeline is a service that allows you to model and visualise the steps within your software delivery pipeline. For building and testing, it integrates with AWS CodeBuild – another service that provides a customisable execution environment where you can run arbitrary commands. For deployment, it integrates with AWS CodeDeploy, Amazon Elastic Container Service (ECS) and AWS CloudFormation. For almost any task beyond the built-in integrations, there is support for Lambda.
CloudFormation provides a way to describe AWS resources in simple text. It’s used for provisioning and managing all the AWS resources (VPC, routing, subnets, Launch Configurations, Auto Scaling Groups, etc.) required for running your application/workload.
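As a flavour of that “simple text”, here is a minimal, hypothetical template for a network stack, with illustrative resource names and CIDR ranges. The stack templates referenced later in this post are JSON; CloudFormation accepts YAML as well, which is used here for brevity:

AWSTemplateFormatVersion: '2010-09-09'
Description: Hypothetical network stack, used only to illustrate the template format
Parameters:
  VpcCidr:
    Type: String
    Default: 10.0.0.0/16
Resources:
  Vpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCidr
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref Vpc
      CidrBlock: 10.0.0.0/24
Outputs:
  VpcId:
    Value: !Ref Vpc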
We chose CodePipeline over Jenkins (which has served us well for a very long time now) to model our CloudFormation deployment pipelines because:
- A Jenkinsfile has too much knowledge of the operating environment. The ability to configure pipelines in CodePipeline based on the current operating environment (stack) by passing variables around using CloudFormation makes it easier to adapt to environment specifics. This makes it easier to provision environments on the fly.
- Any decent CloudFormation codebase is broken into multiple stacks, with each stack being responsible for one specific function (network, database, service A, service B) and represented by a single template file. We modelled one CodePipeline pipeline per CloudFormation stack. This gives us the ability to execute updates to CloudFormation stacks in any order we deem necessary.
Each pipeline was initially configured to watch our CloudFormation repository in GitHub. The downside to this is that a commit to a single file causes all the pipelines to begin executing in unison, because each one detects a change in the repository. To get around this, we created a pipeline that synchronises files from GitHub to S3. Each pipeline is then configured to watch specific files (the stack’s template and configuration) in S3 instead of the whole repository in GitHub. This way, only the pipeline mapped to the specific stack executes when its template or configuration file changes. Here is what the GitHub to S3 sync pipeline looks like visually in CodePipeline.
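In template form, a trimmed-down sketch of such a sync pipeline might look like the following. This isn’t our exact template; the role, artifact bucket, CodeBuild project and GitHub parameters are placeholders:

SyncPipeline:
  Type: AWS::CodePipeline::Pipeline
  Properties:
    RoleArn: !GetAtt PipelineRole.Arn        # assumed pipeline service role
    ArtifactStore:
      Type: S3
      Location: !Ref ArtifactBucket          # assumed artifact bucket
    Stages:
      - Name: Source
        Actions:
          - Name: GitHub
            ActionTypeId: {Category: Source, Owner: ThirdParty, Provider: GitHub, Version: '1'}
            Configuration:
              Owner: !Ref GitHubOwner        # placeholder parameters
              Repo: !Ref GitHubRepo
              Branch: master
              OAuthToken: !Ref GitHubToken
            OutputArtifacts: [{Name: SourceOutput}]
      - Name: SyncToS3
        Actions:
          - Name: CodeBuild
            ActionTypeId: {Category: Build, Owner: AWS, Provider: CodeBuild, Version: '1'}
            Configuration:
              ProjectName: !Ref SyncProject  # CodeBuild project that runs the buildspec below
            InputArtifacts: [{Name: SourceOutput}]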
You can’t have a CodePipeline pipeline watch a plain file in S3 (the source object has to be a ZIP archive), which is why we zip each file into an archive that holds just that file to make CodePipeline happy. Here is what the buildspec.yml used by CodeBuild in the above pipeline job looks like:
version: 0.2
phases:
  build:
    commands:
      - echo Build started at `date`
      - export BUILD_ROOT=${PWD}
      - echo "Fetch existing template files from s3"
      - mkdir -p /tmp/${SOURCE_S3_BUCKET}
      - aws s3 sync --quiet --delete s3://${SOURCE_S3_BUCKET}/ /tmp/${SOURCE_S3_BUCKET}/
      - echo "Comparing and pushing updated files"
      - |
        cd ${BUILD_ROOT}
        for file in $(find stacks -name "*.json")
        do
          if [ -f /tmp/${SOURCE_S3_BUCKET}/TemplateSource/${file} ]
          then
            # Re-upload only the files that differ from the copy already in S3
            if ! cmp ${file} /tmp/${SOURCE_S3_BUCKET}/TemplateSource/${file}
            then
              zip --junk-paths ${file}.zip ${file}
              aws s3 cp ${file} s3://${SOURCE_S3_BUCKET}/TemplateSource/${file}
              aws s3 cp ${file}.zip s3://${SOURCE_S3_BUCKET}/TemplateSource/${file}.zip
            fi
          else
            # New file: upload both the raw file and its zipped copy
            zip --junk-paths ${file}.zip ${file}
            aws s3 cp ${file} s3://${SOURCE_S3_BUCKET}/TemplateSource/${file}
            aws s3 cp ${file}.zip s3://${SOURCE_S3_BUCKET}/TemplateSource/${file}.zip
          fi
        done
      - echo "Cleaning up deleted files using reverse lookup"
      - |
        cd /tmp/${SOURCE_S3_BUCKET}/TemplateSource/
        for file in $(find stacks -name "*.json")
        do
          # Remove objects whose source file no longer exists in the repository
          if [ ! -f ${BUILD_ROOT}/${file} ]
          then
            aws s3 rm s3://${SOURCE_S3_BUCKET}/TemplateSource/${file}
            aws s3 rm s3://${SOURCE_S3_BUCKET}/TemplateSource/${file}.zip
          fi
        done
      - echo "S3 file sync completed"
  post_build:
    commands:
      - echo Build completed at `date`
Each CodePipeline pipeline is mapped to one single stack. There are two sources for the pipeline: the stack’s template and its configuration. When a change in either source is detected, the pipeline begins executing. It creates a CloudFormation changeset and notifies everyone that a changeset is awaiting review. Once the changeset has been reviewed manually, a user with the IAM permission codepipeline:PutApprovalResult approves it, and the changeset is then executed.
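For illustration, here is roughly what one such per-stack pipeline looks like as CloudFormation. This is a hypothetical sketch rather than our exact template; the stack name, S3 keys, roles and SNS topic are placeholders:

StackPipeline:
  Type: AWS::CodePipeline::Pipeline
  Properties:
    RoleArn: !GetAtt PipelineRole.Arn              # assumed pipeline service role
    ArtifactStore:
      Type: S3
      Location: !Ref ArtifactBucket
    Stages:
      - Name: Source
        Actions:
          - Name: Template                          # zipped template produced by the sync job
            ActionTypeId: {Category: Source, Owner: AWS, Provider: S3, Version: '1'}
            Configuration:
              S3Bucket: !Ref SourceBucket
              S3ObjectKey: TemplateSource/stacks/network.json.zip         # hypothetical key
            OutputArtifacts: [{Name: Template}]
          - Name: Configuration                     # the stack's configuration file
            ActionTypeId: {Category: Source, Owner: AWS, Provider: S3, Version: '1'}
            Configuration:
              S3Bucket: !Ref SourceBucket
              S3ObjectKey: TemplateSource/stacks/network-config.json.zip  # hypothetical key
            OutputArtifacts: [{Name: Config}]
      - Name: Deploy
        Actions:
          - Name: CreateChangeSet
            RunOrder: 1
            ActionTypeId: {Category: Deploy, Owner: AWS, Provider: CloudFormation, Version: '1'}
            Configuration:
              ActionMode: CHANGE_SET_REPLACE
              StackName: network                    # hypothetical stack name
              ChangeSetName: pipeline-changeset
              RoleArn: !GetAtt CloudFormationRole.Arn   # assumed CloudFormation service role
              TemplatePath: Template::network.json
              TemplateConfiguration: Config::network-config.json
            InputArtifacts: [{Name: Template}, {Name: Config}]
          - Name: ApproveChangeSet
            RunOrder: 2
            ActionTypeId: {Category: Approval, Owner: AWS, Provider: Manual, Version: '1'}
            Configuration:
              NotificationArn: !Ref ApprovalTopic   # SNS topic that emails the team
          - Name: ExecuteChangeSet
            RunOrder: 3
            ActionTypeId: {Category: Deploy, Owner: AWS, Provider: CloudFormation, Version: '1'}
            Configuration:
              ActionMode: CHANGE_SET_EXECUTE
              StackName: network
              ChangeSetName: pipeline-changeset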
We have a stack that provisions CodePipeline pipelines. These pipelines provision and manage other stacks. Yes, that sounds twisted – read it once again to wrap your head around it. This is our version of infrastructure init. Once provisioned by hand, it’s capable of provisioning all the CloudFormation stacks we need to run and manage our platform. We have multiple, 100% isolated, independent production environments deployed across various AWS regions.
The change management process begins with a branch being created for a change. This branch is tested independently by the person working on it, in their own development AWS account owned by CleverTap. Once the change is verified in development, a pull request to merge it into develop is raised. The develop branch is locked using GitHub’s branch protection rules: a merge into develop requires at least one review and can only be performed by the repository owners. This is where a peer review happens and is signed off by the repository owner(s). From develop, the change is merged into master by the repository owner(s). Once in master, the CodePipeline pipeline watching the repository detects the change and begins executing, which updates the changed files in S3. When files in S3 are updated, the pipeline(s) watching them begin executing, creating a changeset and notifying everyone via email that a change needs approval before being executed. In our case, the changeset is once again reviewed by the repository owner and executed by any team member during environment maintenance hours.
Assuming no one other than the root AWS account has write access, this change management process is self-enforcing. There is no way anyone can change the state of infrastructure without following the process.
Although we don’t have a use case for it, another interesting possibility is to require two different people to approve before the pipeline continues and applies the changeset.
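If we ever did need it, the Deploy stage in the sketch above could carry two manual approval actions in sequence, and codepipeline:PutApprovalResult can be restricted in IAM to a specific action’s ARN so that each approval has to come from a different person. A hypothetical fragment of the stage’s Actions list:

- Name: FirstApproval                     # approved by, say, the repository owner
  RunOrder: 2
  ActionTypeId: {Category: Approval, Owner: AWS, Provider: Manual, Version: '1'}
- Name: SecondApproval                    # approved by a second, different person
  RunOrder: 3
  ActionTypeId: {Category: Approval, Owner: AWS, Provider: Manual, Version: '1'}
- Name: ExecuteChangeSet
  RunOrder: 4
  ActionTypeId: {Category: Deploy, Owner: AWS, Provider: CloudFormation, Version: '1'}
  Configuration:
    ActionMode: CHANGE_SET_EXECUTE
    StackName: network                    # hypothetical, as above
    ChangeSetName: pipeline-changeset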
In the opening statement, I said that we disabled write access for everyone, with the root account being the exception. That’s not 100% true. Operations and engineering IAM accounts have the ability to stop and start instances, invalidate CloudFront caches, and execute/approve build pipelines. These write actions are operational requirements; they do not change the configuration of deployed infrastructure.
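For illustration only, that operational allow-list amounts to something like the following managed policy. This is not our actual policy; in practice you would scope each statement to specific resource ARNs:

OperationsPolicy:
  Type: AWS::IAM::ManagedPolicy
  Properties:
    Description: Hypothetical operational write permissions; everything else stays read-only
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Sid: OperationalWrites
          Effect: Allow
          Action:
            - ec2:StartInstances
            - ec2:StopInstances
            - cloudfront:CreateInvalidation
            - codepipeline:StartPipelineExecution
            - codepipeline:PutApprovalResult
          Resource: '*'                   # tighten to specific ARNs in practice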
In my head, CodePipeline is the way forward for us. CloudFormation delivery pipelines are just the beginning; we will begin moving all our software delivery pipelines over shortly. Having said that, based on our experience with CodePipeline over the past couple of months, here’s our wish list:
- RestartExecutionOnUpdate applies to updates to an existing pipeline. I understand it’s neat to automatically begin execution on provisioning, because, after all, the pipeline is meant to magically deploy software to its destination. But there are cases, especially when deploying to CloudFormation, where you want to control the first-time execution of a pipeline.
- An IAM action ListAllMyPipelines, similar to the S3 action ListAllMyBuckets. This would limit the pipeline listing to only the pipelines that a user has access to.
- Support for AWS::Include in TemplateConfiguration. Access to the repository that holds template configuration is available to a larger audience. The update policy, which controls stack protection information, should be protected separately.
- The ability to Stop or Cancel a pipeline execution. This can currently be achieved by rejecting an approval stage, but there is a difference between Reject and Stopped or Cancelled.
- The More info and What’s new sections of the pipeline listing page don’t add value. I would rather have the information listed above so that the status of all my pipelines is visible at a glance.

We will lobby for our wish list with AWS and hopefully some, if not all, of the items on the list will see the light of day.