I’m not always the most patient person. In truth, I hate waiting – especially when it comes to long builds. In Using the Docker Cache, I discussed a few of the options available for caching layers to improve performance. Today, let’s look at how to implement this with GitHub Actions and the GitHub Container Registry (GHCR).
Baseline Action
Let’s start with a simple GitHub Action workflow that will build a Dockerfile. This workflow is very similar to the default one that is provided for you by GitHub for publishing Docker containers. I explore the pattern more in this post. Essentially, it publishes an image with the name {owner}/{repo}:{branch-or-tag} to the GitHub Container Registry associated with {owner}.
The code:
name: Docker Build

on:
  workflow_dispatch:
  push:
    branches: [ "main" ]
    tags: [ 'v*.*.*' ]
  pull_request:
    branches: [ "main" ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-container:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Setup Docker buildx
        uses: docker/setup-buildx-action@v2

      - name: Log into registry ${{ env.REGISTRY }}
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract Docker metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
Using a Cache
To select and use a cache type, we modify the step for building and pushing the image. We just need to add two additional inputs, cache-to and cache-from. The configuration for each is a set of comma-separated name/value pairs.
The basic structure:
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from:
          cache-to:
GHA
The first cache type we’ll explore is GHA (GitHub Actions cache). This saves the metadata and blobs for the cache to the GitHub Actions cache service. The cache is limited to 10GB per repo, so it’s not a good fit for large images or repos that need to cache a large number of layers. The caches are scoped by branch, with the default branch cache being available to every branch. The details about this cache are documented here.
It has two modes for cache-to:
- min: Only export layers for the resulting image (default)
- max: Export all layers, including the intermediate steps
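An additional parameter, scope, is also available for cache-to. This provides a scoping name for the cache (default: buildkit). Giving each image its own scope avoids potential cache collisions, for example when one repository builds several images. The same scope should be passed to cache-from so the importer reads the cache that was written.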
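As a rough sketch of how scope might be used, the step below writes to and reads from a named scope instead of the default. The scope name docker-build is a made-up value for illustration; any name that is unique to the image should work, as long as cache-from and cache-to agree on it.

```yaml
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          # "docker-build" is a hypothetical scope name; use the same value in both inputs
          cache-from: type=gha,scope=docker-build
          cache-to: type=gha,mode=max,scope=docker-build
```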
The configured step:
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
After running a build, the cache contents will be pushed to the Actions cache service, where they appear in the list of caches for the repository.
Inline
The next level of enhancement is the inline cache exporter. This one embeds the cache directly into the images themselves, enabling the registry to store the cache data. This has a few limitations:
- It adds the cache data to your registry along with the images
- It only supports min caching mode, so it only exports layers for the resulting image
- The cache importer (cache-from) must be type=registry,ref=IMAGE_NAME
The registry ref is worth diving into a bit further.
What happens if we use type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}? If no specific version tag is provided for the ref, the importer will use latest. So, that ref is equivalent to type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest. This is usually not what we want!
We often want to reference the caches associated with the same tag. An easy solution is to use the first tag returned from the step that gathers the Docker metadata. Because we gave that step an identifier (id: meta), we can directly reference the JSON output using the expression ${{ fromJSON(steps.meta.outputs.json).tags[0] }}. This would be written like this:
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ fromJSON(steps.meta.outputs.json).tags[0] }}
          # The exporter type must be named explicitly; a bare value is read as a registry ref
          cache-to: type=inline
There may be multiple tags. As an example, a scheduled trigger will add a nightly tag in addition to the branch-specific tag. This results in:
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:nightly
${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:main
You can apply additional logic if you want to target a specific tag in the list. A better alternative is to understand that cache-from is actually defined as a newline-delimited list. Each line represents a unique cache importer. You could use a run step to format the tags for use with cache-from and output the values. The command line app jq can be used to process the JSON data and convert it to a set of line-delimited strings.
This approach could be implemented like this:
      - name: Format tags as registry refs
        id: registry_refs
        env:
          TAGS: ${{ steps.meta.outputs.json }}
        run: |
          # Emit one "type=registry,ref=<tag>" line per tag as a multi-line step output.
          # jq -r prints raw strings (no JSON quotes); the EOF delimiter preserves the newlines.
          {
            echo 'tags<<EOF'
            echo "$TAGS" | jq -r '.tags[] | "type=registry,ref=" + .'
            echo 'EOF'
          } >> "$GITHUB_OUTPUT"

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: ${{ steps.registry_refs.outputs.tags }}
          cache-to: type=inline
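If you would rather not depend on jq, a similar result can probably be had from the tags output of the metadata step, which is already a newline-delimited list. The sketch below is one possible variation, not the only way to do it; the output name refs is my own choice, and it assumes the metadata step produced at least one tag.

```yaml
      - name: Format tags as registry refs
        id: registry_refs
        env:
          # Newline-delimited list of full image references from the metadata step
          TAGS: ${{ steps.meta.outputs.tags }}
        run: |
          # Prefix every line with the importer type and write a multi-line output
          {
            echo 'refs<<EOF'
            printf '%s\n' "$TAGS" | sed 's/^/type=registry,ref=/'
            echo 'EOF'
          } >> "$GITHUB_OUTPUT"
```

The build step would then read the list with cache-from: ${{ steps.registry_refs.outputs.refs }}.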
Registry
The registry cache exporter is the last one we’ll examine today. It gives the greatest control, including the ability to:
- Separate the cache data from the main image data
- Change the compression algorithm to gzip, estargz, or zstd (compression=zstd)
- Set the compression level to a value from 0 to 22 for zstd (compression-level=11)
- Use OCI media types in the manifest (oci-mediatypes=true)
- Support both min and max modes
Just like with inline, a ref must be provided. In this case, it is required for both the cache-to and cache-from. All of the details about creating and handling the ref targets apply here. There is one major difference. Because the cache is not included inline, a dedicated tag can be used. This can be a fixed image name (such as buildcache) or it can be a dynamic name (such as cache-main or cache-nightly).
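One way to build a dynamic cache tag is to derive it from the branch or tag name. The sketch below is an illustration under a few assumptions: the step id cache_tag and the output name tag are made up for this example, and the tr call replaces characters that are not valid in an image tag (a branch such as feature/login contains a slash) with a hyphen.

```yaml
      - name: Compute cache tag
        id: cache_tag
        env:
          REF_NAME: ${{ github.ref_name }}
        run: |
          # Keep letters, digits, "_", "." and "-"; replace anything else with "-"
          safe_ref=$(printf '%s' "$REF_NAME" | tr -c 'A-Za-z0-9_.-' '-')
          echo "tag=cache-${safe_ref}" >> "$GITHUB_OUTPUT"

      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.cache_tag.outputs.tag }}
          cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.cache_tag.outputs.tag }},mode=max
```

For a push to main, this should resolve to a tag of cache-main.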
Until recently, BuildKit only supported a single cache exporter (cache-to) and did not support multiple values. That has since changed, although there are still some open issues. The Action supports a list for cache-to to allow you to use this feature. In the past, it was common to use a single cache or a cache for the build event type (branch, nightly, pr, etc.). A single-cache configuration looks like this:
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache
          cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache,mode=max
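The compression and media type options described earlier can be appended to the same cache-to value. The following is a sketch rather than a recommendation: the parameter names follow the BuildKit registry cache documentation as I understand it (note the hyphen in compression-level), and the level of 11 is just the example value used above.

```yaml
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache
          # Export all layers, compressed with zstd, using OCI media types in the manifest
          cache-to: type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache,mode=max,compression=zstd,compression-level=11,oci-mediatypes=true
```

Only cache-to takes these options; the importer detects the format on its own, so cache-from stays the same.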
If you’re using multiple branches, you can define the caches in multiple ways. You can use code (similar to what we did for the inline cache above) to create dynamic values as outputs from a step. You can also use GitHub context variables to get the event type, branch name, or other values.
Using a hard-coded list with some dynamic variables might look like this:
      - name: Build and push Docker image
        id: build-and-push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: |
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache-${{ github.event_name }}
          cache-to: |
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache,mode=max
            type=registry,ref=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:cache-${{ github.event_name }},mode=max
That covers the main options. I’d encourage you to explore these and see which approaches offer you the best performance gains.
Happy DevOp’ing!