This week you get a bonus – the conclusion to the discussion of Kubernetes native sidecars! In the last post, we discussed the problems that native sidecars are designed to solve and how they work. In this post, we’ll look at a practical example that can benefit from this functionality.
As you probably guessed from the title, this example will explore how to use native sidecars to improve how GitHub Actions Runner Controller (ARC) supports Docker-in-Docker (DinD) containers. As a reminder, the native sidecar functionality is currently still considered to be in Beta. That means that it is on by default and generally stable, well-tested, and considered safe. Support for the feature will not be dropped, but details or semantics could change in subsequent releases. Native sidecar support is available on any cluster running Kubernetes 1.29 or later. It is expected to become generally available in 2025 as part of Kubernetes 1.33.
Why we want a native sidecar
In the case of ARC, a sidecar is used to support the Docker-in-Docker container mode. This mode supports workflows that need to use Docker commands to build and test images or run Dockerfile-based Actions. It uses the docker:dind image as a sidecar to provide this functionality. The dind container uses a shared volume to provide the Docker socket to the main application container. This allows the runner to use Docker commands as if it were running in a local Docker environment. It also isolates the higher-privilege Docker host from the main application container.
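For context, a typical workflow that depends on this mode might look like the sketch below. The runs-on label and image names are hypothetical; the point is simply that the job invokes the docker CLI, which only works if the runner pod has a functioning Docker daemon.

# Hypothetical workflow that requires a working Docker daemon on the runner
name: build-image
on: push
jobs:
  build:
    runs-on: arc-runner-set      # the ARC runner scale set name (assumption)
    steps:
      - uses: actions/checkout@v4
      - name: Build and test the image
        run: |
          docker build -t example/app:ci .
          docker run --rm example/app:ci ./run-tests.sh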
As we discussed previously, the current sidecar behaviors are not ideal. The sidecar typically starts at the same time as the runner, but there is a small chance that its start is delayed. In that case, it won’t be available in time to handle requests. The dind container can also get stuck in a state where it is running a process and unable to shut down gracefully. This prevents the pod from terminating, delaying the start of further workflow jobs. And of course, if the dind container fails (or fails to start), the runner will likely fail to run workflows successfully; the Docker tasks will fail. This situation can happen if the dind container is killed due to memory pressure. Since it may have a higher OOM score than the runner, it may be killed first when the node is under pressure.
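One partial mitigation is to give the dind container explicit resource requests and limits so that the scheduler accounts for its memory and it is less likely to be singled out by the OOM killer. The values below are purely illustrative:

# Illustrative only: reserving memory for the dind container makes it less
# likely to be OOM-killed before the runner, but does not eliminate the risk.
- name: dind
  image: docker:dind
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"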
While there are ways to work around these problems, there is only one solution that can handle all of the issues: native sidecars.
Implementing a solution
When ARC is configured with containerMode: dind, it uses this spec for deploying new runners. Most teams will leave containerMode unconfigured and use this specification as a starting point for their customizations (the built-in setting itself is shown after the template below). That means they will have something like this in the values.yaml file:
1spec:
2 initContainers:
3 - name: init-dind-externals
4 image: ghcr.io/actions/actions-runner:latest
5 command: ["cp", "-r", "-v", "/home/runner/externals/.",e/runner/tmpDir/"]
6 volumeMounts:
7 - name: dind-externals
8 mountPath: /home/runner/tmpDir
9 containers:
10 - name: runner
11 image: ghcr.io/actions/actions-runner:latest
12 command: ["/home/runner/run.sh"]
13 env:
14 - name: DOCKER_HOST
15 value: unix:///var/run/docker.sock
16 volumeMounts:
17 - name: work
18 mountPath: /home/runner/_work
19 - name: dind-sock
20 mountPath: /var/run
21 - name: dind
22 image: docker:dind
23 args:
24 - dockerd
25 - --host=unix:///var/run/docker.sock
26 - --group=$(DOCKER_GROUP_GID)
27 env:
28 - name: DOCKER_GROUP_GID
29 value: "123"
30 securityContext:
31 privileged: true
32 volumeMounts:
33 - name: work
34 mountPath: /home/runner/_work
35 - name: dind-sock
36 mountPath: /var/run
37 - name: dind-externals
38 mountPath: /home/runner/externals
39 volumes:
40 - name: work
41 emptyDir: {}
42 - name: dind-sock
43 emptyDir: {}
44 - name: dind-externals
45 emptyDir: {}
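As an aside, teams that do want the stock behavior can opt in with just the containerMode setting in the chart’s values.yaml. Based on the gha-runner-scale-set chart documentation, that looks like:

containerMode:
  type: "dind"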
Notice that ARC uses an initContainer to stage some files into the mount used by the dind container. Once that completes, the runner and dind containers start in parallel. The runner will not have access to the shared Docker socket until the dind container is running. Since both are configured to run as application containers, the dind container can continue to run after the runner has terminated, consuming resources. Until it terminates, ARC cannot reclaim those resources. Since Kubernetes has no way to know that the pod only relied on the runner container, it won’t automatically terminate the dind container.
In short, it’s a perfect example of a situation where native sidecars can help.
Solving the problem
This is a modified version of the ARC spec that takes advantage of native sidecars:
1spec:
2 initContainers:
3 - name: init-dind-externals
4 image: ghcr.io/actions/actions-runner:latest
5 command: ["cp", "-r", "-v", "/home/runner/externals/.",e/runner/tmpDir/"]
6 volumeMounts:
7 - name: dind-externals
8 mountPath: /home/runner/tmpDir
9 - name: dind
10 image: docker:dind
11 restartPolicy: Always
12 args:
13 - dockerd
14 - --host=unix:///var/run/docker.sock
15 - --group=$(DOCKER_GROUP_GID)
16 env:
17 - name: DOCKER_GROUP_GID
18 value: "123"
19 startupProbe:
20 exec:
21 command:
22 - docker
23 - info
24 initialDelaySeconds: 0
25 failureThreshold: 24
26 periodSeconds: 5
27 securityContext:
28 privileged: true
29 volumeMounts:
30 - name: work
31 mountPath: /home/runner/_work
32 - name: dind-sock
33 mountPath: /var/run
34 - name: dind-externals
35 mountPath: /home/runner/externals
36 containers:
37 - name: runner
38 image: ghcr.io/actions/actions-runner:latest
39 command: ["/home/runner/run.sh"]
40 env:
41 - name: DOCKER_HOST
42 value: unix:///var/run/docker.sock
43 volumeMounts:
44 - name: work
45 mountPath: /home/runner/_work
46 - name: dind-sock
47 mountPath: /var/run
48 volumes:
49 - name: work
50 emptyDir: {}
51 - name: dind-sock
52 emptyDir: {}
53 - name: dind-externals
54 emptyDir: {}
The biggest difference in this template is that the dind container is now part of the initContainers instead of the application containers. Its definition also includes a new setting, restartPolicy: Always. This is what turns it into a native sidecar. As a result, the runner container will not start until the dind container is ready, and the dind container will not be terminated until after the runner container has terminated. Because the runner container is the only application container, Kubernetes knows that it can terminate the pod once the runner has completed. That provides a further improvement.
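Distilled to its essentials, the pattern looks like this. This is a minimal, hypothetical pod (not the ARC template) meant only to highlight the two settings that matter:

# Minimal sketch of the native sidecar pattern (hypothetical pod)
apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-demo
spec:
  restartPolicy: Never          # pod-level: the pod completes when "app" exits
  initContainers:
    - name: sidecar
      image: docker:dind
      restartPolicy: Always     # container-level override: this init container
                                # becomes a native sidecar that starts before
                                # "app" and is stopped only after "app" exits
      securityContext:
        privileged: true
  containers:
    - name: app                 # the only application container
      image: busybox
      command: ["sh", "-c", "sleep 30"]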
Notice that dind is listed second in the initContainers. That means it won’t start until after the init-dind-externals container has completed. This behavior is consistent with the original implementation. It ensures that the externals folder will be properly configured before dind or runner can start.
This configuration also addresses one additional issue. Using a native sidecar guarantees that dind has started before runner. It doesn’t, however, guarantee that the dockerd running inside the container is ready to accept commands. The runner could still try to execute a workflow without having a working Docker daemon. This is addressed by the changes on lines 19-26.
Probing the daemons
When the dind container starts, it begins executing a script that configures dockerd. It then starts listening for commands on /var/run/docker.sock. This is called a Unix Domain Socket (UDS); it allows two processes on the same host to communicate using a lightweight file rather than a network connection. Since these containers are in the same pod, this approach allows the runner to use a fast, dedicated local connection.
The implementation relies on sharing the parent directory, /var/run, with the runner using the dind-sock volume mount. The runner sets the DOCKER_HOST environment variable to point at the socket, and dockerd is launched with a matching --host argument, so both containers agree on using the UDS. While the runner can use the shared socket, it doesn’t have a way to know that dockerd is actually running in the dind container.
To resolve this, the dind container declares a startupProbe. The container won’t be considered started until the probe returns success. The implementation takes advantage of the docker CLI that is included in that image. The docker info command can be used to check whether dockerd is ready: it returns a zero exit code if it successfully connects to dockerd and a non-zero code on failure. Once the probe succeeds, dockerd is accepting commands. Probing then stops and the runner will be started.
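Annotated, the probe from the template gives dockerd up to two minutes to come up (24 attempts at 5-second intervals):

startupProbe:
  exec:
    command: ["docker", "info"]   # exit code 0 once dockerd answers on the socket
  initialDelaySeconds: 0          # begin probing immediately
  periodSeconds: 5                # retry every 5 seconds...
  failureThreshold: 24            # ...up to 24 times (about 120s) before giving up

If the threshold is exhausted, the kubelet kills the container and, because of restartPolicy: Always, starts it again.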
It’s worth observing that dockerd is actually running as the entrypoint for the container. As a result, the container will terminate if dockerd exits or fails. A liveness or readiness probe is not strictly necessary in this particular case, since restartPolicy: Always means the kubelet will automatically restart the container if dockerd fails. While there are some edge cases, this is enough for our situation.
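That said, if you wanted to guard against one of those edge cases, such as a daemon that hangs without exiting, a liveness probe using the same command is a reasonable sketch. The timing values here are illustrative, not prescriptive:

livenessProbe:
  exec:
    command: ["docker", "info"]   # fails if dockerd stops answering
  periodSeconds: 30               # check every 30 seconds
  timeoutSeconds: 10              # treat a slow daemon as unhealthy
  failureThreshold: 3             # restart the container after ~90s of failures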
As you can see, it’s quite simple to adopt a native sidecar. In addition, it provides a number of benefits that can improve the reliability of the Pod and the ARC runner. This creates a more resilient deployment that behaves as expected across a wider set of scenarios.