Multi-cloud deployment with OIDC and rollback support

actions, oidc, container-app, azure, aws, gcp, release, reusable-workflows, rollback

That’s it ninjas, spring has arrived, birds are singing and beautiful clouds adorn the sky. It reminds me of those old stories in which mountain monks could touch several of them at once.

Today, we’re going to do the same thing, but with different clouds. Let’s use our GitHub Actions katas to deploy an application to Azure, AWS and GCP! ☁️☁️☁️

1 - General View

We’ll deploy the Spring Petclinic sample application as a container application in the cloud, and this is the fork repository we’ll work in. The “🚀 Multi-cloud deployment” workflow is the main one, from which the deployment starts. The overall process is as follows:

  1. Build the application using Maven to generate a JAR file (link to code);
  2. Build and push a container image ready to run this JAR file (link to code);
  3. Run the container image on each cloud provider (link to code).

This dojo session will mainly focus on the last step and some of its specificities. If you’ve followed its link, you’ll have noticed that three reusable workflows - one per cloud provider - are called to this end. This is because I will likely use them in future projects, as deploying container applications to the cloud is a common task. Also, this is a nice way to keep the main workflow as simple as possible.
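To make the idea of a reusable workflow more concrete, here is a minimal sketch of what one of them could look like. The file name, the input names (`region`, `image`) and the placeholder step are assumptions made for illustration; the actual workflows in the repository may be structured differently.

```yaml
# .github/workflows/deploy-gcp.yml (hypothetical file name)
name: Deploy to GCP (sketch)

on:
  workflow_call:          # makes this workflow callable from another workflow
    inputs:
      region:             # hypothetical input: target region
        type: string
        required: true
      image:              # hypothetical input: container image reference
        type: string
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: gcp      # binds the job to the protected "gcp" environment
    steps:
      - name: Placeholder for the provider-specific deployment steps
        env:
          IMAGE: ${{ inputs.image }}
          REGION: ${{ inputs.region }}
        run: echo "Would deploy $IMAGE to $REGION"
```

Passing the inputs through `env` rather than interpolating them directly into the script is a small habit that avoids shell injection through input values.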

Overall, the deployment will have the following properties:

| Property | How |
|---|---|
| Automated | The deployment will be performed with GitHub Actions. |
| Versioned | The workflow will be triggered when a new release is published, or manually on specific tags. But not on every tag 🥷👇 |
| Protected | The protection is threefold: (1) Who can deploy? Only repository admins; (2) What can be deployed? Only tags matching the pattern v* (and associated releases, since releases are based on tags); (3) Where to deploy? Only to three specific, protected GitHub environments (named aws, azure and gcp after the cloud providers they respectively target). |
| Passwordless | Authentication to the cloud providers is done using the OpenID Connect standard, hence no need to manage any password. |
| Geo-distributed | The application will be deployed to three different geographical locations: ap-southeast-1 for AWS, East US for Azure and europe-west1 for GCP. |

2 - Shaping the clouds

Each of the cloud providers has its own way(s) to deploy container applications, and provides the associated GitHub Actions to do it. Here is a quick recap of the ones I’ve used:

| Cloud provider | Container service | GitHub Action |
|---|---|---|
| Amazon Web Services | Elastic Container Service | |
| Microsoft Azure | Azure Container Instances | Azure/aci-deploy |
| Google Cloud Platform | Cloud Run | google-github-actions/deploy-cloudrun |

Before using these actions in our workflows, we have to set up the cloud infrastructure first. To keep this scroll as short as possible, I’ve detailed this process in the doc/infra-setup folder, where all the CLI commands are provided. You can also have a look at the quickstart to get an overall picture.

💡 Bonus! If you want to try it, but don’t want to install the CLI tools on your own machine, there is a codespaces setup ready to be used for that ✨

Once it’s done, we’ll use these actions in our reusable workflows to deploy the container application:

You can observe that these workflows are called with a different region [1] [2] [3] for each cloud provider, which makes the deployment geo-distributed. This is a good practice to ensure high availability and to reduce the risk of downtime. It is, by the way, just one of the benefits of the multi-cloud approach.
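As a rough sketch of how a main workflow can call one reusable workflow per provider with its own region, consider the following. The workflow file names, the `region` input, the job names and the stub build job are all hypothetical; only the three region values come from this post.

```yaml
name: Multi-cloud deployment (sketch)

on:
  release:
    types: [published]

# A called workflow cannot get more permissions than its caller,
# so the OIDC token permission has to be granted here.
permissions:
  id-token: write
  contents: read

jobs:
  build_and_push:                       # hypothetical stand-in for the build steps
    runs-on: ubuntu-latest
    steps:
      - run: echo "Build the JAR, then build and push the container image"

  deploy_aws:
    needs: build_and_push
    uses: ./.github/workflows/deploy-aws.yml     # hypothetical file name
    with:
      region: ap-southeast-1
    secrets: inherit                    # forwards the caller's secrets

  deploy_azure:
    needs: build_and_push
    uses: ./.github/workflows/deploy-azure.yml   # hypothetical file name
    with:
      region: eastus
    secrets: inherit

  deploy_gcp:
    needs: build_and_push
    uses: ./.github/workflows/deploy-gcp.yml     # hypothetical file name
    with:
      region: europe-west1
    secrets: inherit
```

Since the three deployment jobs only depend on the build job, they can run in parallel, and a failure on one provider does not prevent the others from completing.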

3 - OIDC in Actions

So the clouds are ready and we know exactly how we’ll deploy our application. That’s neat, but we still have to authenticate beforehand in order to proceed. When it comes to authenticating to cloud providers from GitHub Actions, two good practices pop up:

And guess what? This is exactly what we’re going to do here: we’ll use a service account for each cloud provider, and we’ll configure each of them to authenticate using the OpenID Connect standard.

Similarly to what we did for the cloud infrastructures, I’ve detailed this process in the doc/oidc-setup folder and the following table summarizes what has been used to this end:

| Cloud provider | Service account construct | OIDC implementation | GitHub Action |
|---|---|---|---|
| Amazon Web Services | IAM role | OIDC identity provider | aws-actions/configure-aws-credentials |
| Microsoft Azure | Service principal | Workload identity federation | Azure/login |
| Google Cloud Platform | Service account | Workload identity federation | google-github-actions/auth |

Once everything is set up, we can authenticate to the cloud providers using OIDC in our reusable workflows:
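Here is a minimal sketch of what such authentication steps can look like, with one job per provider. The secret names (`AWS_ROLE_ARN`, `AZURE_CLIENT_ID` and so on), the action versions and the single-workflow layout are assumptions for illustration; the real reusable workflows may be organized differently.

```yaml
name: OIDC authentication (sketch)

on: workflow_dispatch

permissions:
  id-token: write   # allows the jobs to request an OIDC token from GitHub
  contents: read

jobs:
  aws:
    runs-on: ubuntu-latest
    environment: aws
    steps:
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}       # hypothetical secret name
          aws-region: ap-southeast-1

  azure:
    runs-on: ubuntu-latest
    environment: azure
    steps:
      - uses: Azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}             # hypothetical secret names
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

  gcp:
    runs-on: ubuntu-latest
    environment: gcp
    steps:
      - uses: google-github-actions/auth@v1
        with:
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}  # hypothetical
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}                        # hypothetical
```

None of these values is a password: they only identify which role, application or service account the job asks to impersonate, and the provider decides whether the presented token is trusted.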

You might have noticed that we are still using Actions secrets to pass the OIDC information to the cloud providers. This is just to add an extra layer of security: for instance, if that information were leaked and the OIDC trust relationship were too permissive, an attacker might be able to access the cloud resources from their own repository.

This should not be the case here (but you never know!), since during the setup I restricted the trust relationship to the following OIDC identities:

During the authentication process, the identity checks are performed by the cloud providers based on the token passed by the GitHub OIDC provider.
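To illustrate what such a restriction can look like on the provider side, here is a sketch of an AWS IAM role trust policy, written as a CloudFormation-style YAML fragment purely for readability (the setup in this post relies on CLI commands instead). It assumes that the trusted identity is a job running in the `aws` environment of a repository; `OWNER/REPO` and the resource name are placeholders, and the identities actually configured for this project may differ.

```yaml
Resources:
  GitHubDeployRole:                      # hypothetical resource name
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              # The GitHub OIDC identity provider registered in the AWS account
              Federated: !Sub "arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com"
            Action: sts:AssumeRoleWithWebIdentity
            Condition:
              StringEquals:
                # Audience requested by aws-actions/configure-aws-credentials by default
                "token.actions.githubusercontent.com:aud": sts.amazonaws.com
                # Subject claim of a job that targets the "aws" environment
                "token.actions.githubusercontent.com:sub": "repo:OWNER/REPO:environment:aws"
```

With a condition like this one, a token issued for another repository, or for a job that does not target the `aws` environment, carries a different `sub` claim and is rejected.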

All right! We have implemented some mechanisms to make the deployment geo-distributed and passwordless, now let’s make it protected and versioned.

4 - Controlling the version to deploy

The ones with eyes like a hawk will have observed that there is actually an additional step in the main workflow, which I have not listed in the general view above.

To get a bit more context, let’s have a look at the very beginning of the workflow:

name: "🚀 Multi-cloud PetClinic deployment demo"

on:
  workflow_dispatch:
  release:
    types: [published]

The workflow will be triggered manually or when a release is published. I’m working alone on this project, but let’s imagine a more complex scenario where multiple people collaborate. As a repository administrator, I would probably want to restrict i) who can trigger or re-run the workflow (e.g. only admins), and ii) which tags (and thus associated releases) are allowed to be deployed (e.g. only production tags).

That’s why I’ve added a few checks in the first job of the workflow to ensure that:

  prereq_checks:
    name: Ensure this workflow has been triggered by an admin on a protected release tag (v*)
    runs-on: ubuntu-latest
    env:
      TRIGGERING_ACTOR: ${{ github.triggering_actor }}
    steps:
      - name: Fail if the workflow has been triggered manually with a non-release tag (i.e. not matching the 'v*' pattern)
        run: |
          # GITHUB_REF holds the full ref (e.g. refs/tags/v1.2.0); the right-hand side is a glob pattern.
          if [[ "$GITHUB_REF" != refs/tags/v* ]]; then
            echo "Only 'v*' tags and associated releases can be deployed, exiting."
            exit 1
          else
            echo "The workflow has been triggered on a release tag, continuing."
          fi
      - name: Fail if the workflow has been triggered by a non-admin user
        uses: actions/github-script@v6
        with:
          script: |
            // Look up the permission level of the user who triggered (or re-ran) this run.
            const { data } = await github.rest.repos.getCollaboratorPermissionLevel({
              owner: context.repo.owner,
              repo: context.repo.repo,
              username: process.env.TRIGGERING_ACTOR
            })
            if (data.permission === 'admin') {
              core.info('The triggering actor has admin permission on this repository. Continuing.')
            } else {
              // setFailed logs the message as an error and marks the step as failed.
              core.setFailed('Only repository admins can trigger this workflow. Exiting.')
            }

As you can see, only members with the admin permission can trigger the workflow, and only tags matching the pattern v* are allowed to be deployed. In parallel, I’ve set up a tag protection rule to restrict the creation of v* tags to privileged members only. This rule also applies when creating a release (i.e. a non-privileged member won’t be authorized to create a release with a tag matching v*).

If you combine this with the use of the three dedicated GitHub environments, which are themselves bound to the branches matching v* (i.e. our protected tags) and protected to require a manual approval before deployment, the overall level of security is fairly satisfactory.
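For checks like these to actually protect the deployment, the later jobs have to wait for them. Here is a minimal sketch of that gating using `needs`, assuming a hypothetical `build` job and a simplified tag check written as a step condition (a variant of the shell test, not the check used in this post).

```yaml
name: Gated deployment (sketch)

on:
  workflow_dispatch:
  release:
    types: [published]

jobs:
  prereq_checks:
    runs-on: ubuntu-latest
    steps:
      - name: Reject refs that are not v* tags
        # The step only runs, and fails the job, when the ref is not a v* tag.
        if: ${{ !startsWith(github.ref, 'refs/tags/v') }}
        run: |
          echo "Only 'v*' tags and associated releases can be deployed."
          exit 1

  build:                       # hypothetical job name
    needs: prereq_checks       # starts only if prereq_checks succeeded
    runs-on: ubuntu-latest
    steps:
      - run: echo "Building from $GITHUB_REF_NAME"
```

When `prereq_checks` fails, every job that depends on it, directly or through another job, is skipped, so nothing gets built or deployed.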

5 - The pieces together a.k.a. time to trigger

Congrats, you’ve made it this far, and now comes the time to enjoy and trigger 🔫

Let’s say that the developers have been working hard on the application to add new features, and they are ready to deploy the new version to production.

A member with the admin permission on the repository will create a new release with a tag matching the v* pattern:

[Figure: release creation]

This will trigger the main workflow, and after the checks have passed, the image is built and pushed to the GitHub Container Registry. Then the three reusable workflows are called, and since they target protected environments, a manual approval is required (e.g. by the SRE team):

[Figure: deployments approval]

Once the approval is granted, the workflows will continue and the new version of the application will be deployed to our three cloud providers:

[Figure: deployment succeeded]

Is that all? Could be, but you know, sometimes accidents happen. Let’s imagine that’s the case, and that the new version of the application is not working as expected. Mayday, mayday, we need to roll back! Nothing could be simpler: the administrator will just trigger the workflow again, but this time manually, and on the tag matching the previous release:

[Figure: rollback]

Phew, we’re back in business! 🎉

6 - Next steps

That was a lot of fun, and I hope you enjoyed it as much as I did. As usual, this is not for production use, and there is room for improvement. I’m sure you have some ideas on how to make this even better!

We could for instance:

But the sun is shining, and the flowers are blooming, so let’s leave it here for now.

Gasshō 🙏