Automatic CloudFront Invalidation for S3 Origins

Dave North | Last updated on November 24, 2021 | 5 minute read

Welcome to the first post on the Rewind engineering blog.  We’re hoping to share some of the interesting technology problems we’re solving and solutions we’ve come up with.

To kick things off, here’s a small solution to a common problem – automatically forcing a refresh of content on the edge nodes of AWS CloudFront.

CloudWhat?

CloudFront is the AWS Content Delivery Network (CDN), used when you want to enable faster access to content in remote locations.  With hundreds of edge nodes serving your content, CloudFront is perfect when you need fast access to content from various geographic locations.

CloudFront “distributions” are pointed to a content origin (where the master version of the content exists).  The content is expired from the cache on the edge following a periodic schedule – say every 24 hours. If you need to force a refresh of the cache on the edge (say you’ve just updated some content and want it visible right away), CloudFront allows you to “invalidate” the content on the edge by submitting an invalidation request.

We use CloudFront for a few applications at Rewind but our latest use is for hosting the configuration for our feature toggle tool, 8ball.  We’ll write more about 8ball in a future blog but this is one way we can get features into the hands of Rewind customers faster, release some features to a limited set of people before final release or just “dark launch” functionality.  8ball reads its configuration information from CloudFront so being able to have it update automatically is one less thing for our team to think about.

Automatic Invalidation

One way to handle invalidation is to run a side process whenever you upload new content to the origin.  But what if content gets into the origin by other means? If you're using S3 as the content origin, you can use S3 event notifications to trigger the invalidation automatically.  That's the solution we created, and we hit a couple of hurdles along the way which we'll cover below.

We used AWS SAM to package up a solution for the automatic invalidation.  If you're not familiar with SAM, have a read of this article on using it to create a Slack DJ.  SAM lets you develop and test Lambda-based solutions locally with ease and then deploy them as an application.  In our example here, the solution needs to:

  • Create a Lambda function that:
    • Finds the CloudFront distribution associated with a given S3 bucket
    • Submits an invalidation request for any changed files in this bucket
  • Subscribe the Lambda function to any file modification events on the S3 bucket
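For orientation, the Lambda half of that list can be described in a minimal SAM template along these lines (the resource name, handler path, and runtime here are illustrative, not the exact values from our repo):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  CloudFrontInvalidator:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.9
      CodeUri: ./src
      Timeout: 60
      Policies:
        # The function needs to list distributions to find the right
        # one, then create invalidations against it
        - Statement:
            - Effect: Allow
              Action:
                - cloudfront:ListDistributions
                - cloudfront:CreateInvalidation
              Resource: '*'
```

Notice the template defines only the function and its permissions; the S3 event subscription is deliberately absent, for reasons covered later in this post.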

You can find the full source for this solution in our GitHub repo, but let’s look at a couple of pieces and one major gotcha.

Finding the CloudFront distribution

This has to check all of the CloudFront distributions looking for the one that uses our S3 bucket as the origin.  I put this code together using Python/Boto:

import boto3

cloudfront_client = boto3.client('cloudfront')

def get_cloudfront_distribution_id(bucket):

    bucket_origin = bucket + '.s3.amazonaws.com'
    cf_distro_id = None

    # Create a reusable Paginator
    paginator = cloudfront_client.get_paginator('list_distributions')

    # Create a PageIterator from the Paginator
    page_iterator = paginator.paginate()

    for page in page_iterator:
        # 'Items' is absent when the account has no distributions
        for distribution in page['DistributionList'].get('Items', []):
            for cf_origin in distribution['Origins']['Items']:
                print("Origin found {}".format(cf_origin['DomainName']))
                if bucket_origin == cf_origin['DomainName']:
                    cf_distro_id = distribution['Id']
                    print("The CF distribution ID for {} is {}".format(bucket, cf_distro_id))

    return cf_distro_id

The key thing to note here is the use of Boto's paginators.  These are a fantastic timesaver because they spare you from manually handling paged API results (tracking markers and issuing repeated calls yourself).  The paginator does the paging for you and hands back each page of results in turn.
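A nice side effect of the paginator handing back plain dictionaries is that the matching logic is easy to unit-test against canned pages, without touching AWS at all. A small refactor along these lines (hypothetical, not the code from our repo) pulls the search out into a pure function:

```python
def match_origin(pages, bucket_origin):
    """Scan pages of CloudFront distributions for one whose origin
    domain matches bucket_origin; return its Id, or None if absent."""
    for page in pages:
        # 'Items' is absent when a page carries no distributions
        for distribution in page['DistributionList'].get('Items', []):
            for cf_origin in distribution['Origins']['Items']:
                if cf_origin['DomainName'] == bucket_origin:
                    return distribution['Id']
    return None
```

The caller then just feeds it `paginator.paginate()`, and tests feed it hand-built page dictionaries.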

Creating the Invalidation

Once we have the CloudFront distribution ID, we can send an invalidation request.

invalidation = cloudfront_client.create_invalidation(
    DistributionId=cf_distro_id,
    InvalidationBatch={
        'Paths': {
            'Quantity': 1,
            'Items': ['/' + key]  # CloudFront paths must begin with '/'
        },
        'CallerReference': str(time.time())
    })

Two things of note here are the items to invalidate and the CallerReference.  The list of items to invalidate uses the object key from the event that triggered our Lambda function (i.e. a file has changed in S3); CloudFront requires invalidation paths to begin with a leading "/".  The CallerReference can be any unique value, so we just use a simple timestamp here.
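For completeness, here's a sketch of how that key arrives from the S3 event (the helper name is illustrative). One subtlety worth knowing: S3 URL-encodes object keys in event records, so they need decoding before being handed to CloudFront:

```python
from urllib.parse import unquote_plus

def parse_s3_event(event):
    """Extract (bucket, key) pairs from an S3 notification event.

    S3 URL-encodes object keys in event records (a space arrives as
    '%20' or '+'), so decode them before building invalidation paths.
    """
    changes = []
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        changes.append((bucket, key))
    return changes
```

Inside the Lambda handler, each decoded key (with the leading "/" prepended) becomes one invalidation path.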

Subscribing the Lambda to S3 Events (Gotcha!)

SAM does support events in its templates, but here's the kicker: you cannot attach events to existing S3 buckets.  SAM has to own the bucket itself. This is very inconvenient if you're creating the S3 buckets in some other way (e.g. Terraform) or if they are pre-existing buckets from long ago.  That was the case for us, so I had to find another way to add the event. I came up with the following sequence, which is long-winded but does work. It is captured in deploy.sh in the GitHub repo:

  • Do not define the S3 bucket or event in the SAM template.  Define only the Lambda function
  • Deploy the Lambda using standard SAM commands (sam package, sam deploy)
  • Use the AWS CLI to add permission to the Lambda, allowing it to be called from an S3 event (aws lambda add-permission…)
  • Also using the CLI, create the S3 event (aws s3api put-bucket-notification-configuration…)
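Roughly, those last two CLI steps map onto the following API calls. This is a sketch in Boto rather than the shell commands from deploy.sh, and the function name, statement ID, and event types are placeholder choices, not the exact values from our repo:

```python
def build_notification_config(lambda_arn):
    """Notification config subscribing a Lambda to object changes."""
    return {
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': lambda_arn,
            'Events': ['s3:ObjectCreated:*', 's3:ObjectRemoved:*'],
        }]
    }

def wire_s3_event(bucket, lambda_arn, account_id):
    """Allow S3 to invoke the Lambda, then subscribe it to the bucket."""
    import boto3  # imported here so the config builder stays dependency-free

    # Equivalent of: aws lambda add-permission ...
    boto3.client('lambda').add_permission(
        FunctionName=lambda_arn,
        StatementId='s3-invoke',
        Action='lambda:InvokeFunction',
        Principal='s3.amazonaws.com',
        SourceArn='arn:aws:s3:::{}'.format(bucket),
        SourceAccount=account_id,  # guards against bucket-name squatting
    )

    # Equivalent of: aws s3api put-bucket-notification-configuration ...
    boto3.client('s3').put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(lambda_arn),
    )
```

The add-permission step must come first; S3 validates that it is allowed to invoke the function when the notification configuration is saved.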

All Done!

At this point, you should have a solution which will automatically send an invalidation request to CloudFront whenever new content is added or existing content is changed in your S3 origin.  In our case, we’re serving an application configuration file using an internal CloudFront distribution and this is just one less step to remember when the file gets updated. Reducing error-prone steps is always good!

Next time I’ll show you how you can create a full working internal combustion engine using only a toothpick, a soda can, and some pine sap.

The full source for this solution (including instructions how to test this locally) is available on GitHub: https://github.com/rewindio/aws-cloudfront-auto-invalidator

We’re hiring. Come work with us!

We’re always looking for passionate and talented engineers to join our growing team.


Dave North
Dave North has been a versatile member of the Ottawa technology sector for more than 25 years. Dave is currently working at Rewind, leading the technical operations group. Prior to Rewind, Dave was a long time member of Signiant, holding many roles in the organization including sales engineer, pro services, technical support manager, product owner, and devops director. A proven leader and innovator, Dave holds 5 US patents and helped drive Signiant's move to a cloud SaaS business model with the award-winning Media Shuttle project. Prior to Signiant, Dave held several roles at Nortel, Bay Networks, and ISOTRO Network Management working on the NetID product suite. Dave is fanatical about cloud computing, automation, gadgets and Formula 1 racing.