This week you get a bonus – the conclusion to the discussion of Kubernetes native sidecars! In the last post, we discussed the problems that native sidecars are designed to solve and how they work. In this post, we’ll look at a practical example that can benefit from this functionality.
As you probably guessed from the title, this example will explore how to use native sidecars to improve how GitHub Actions Runner Controller (ARC) supports Docker-in-Docker (DinD) containers. As a reminder, the native sidecar functionality is currently still considered to be in Beta. That means that it is on by default and generally stable, well-tested, and considered safe. Support for the feature will not be dropped, but details or semantics could change in subsequent releases. Native sidecar support is available on any cluster running Kubernetes 1.29 or later. It is expected to become generally available in 2025 as part of Kubernetes 1.33.
Why we want a native sidecar
In the case of ARC, a sidecar is used to support the Docker-in-Docker container mode. This mode supports workflows that need to use Docker commands to build and test images or run Dockerfile-based Actions. It uses the docker:dind image as a sidecar to provide this functionality. The dind container uses a shared volume to provide the Docker socket to the main application container. This allows the runner to use Docker commands as if it were running in a local Docker environment. It also isolates the higher-privilege Docker host from the main application container.
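For context, a typical workflow that depends on this mode might look like the sketch below. The runs-on label and image names are hypothetical; the point is simply that the job invokes the docker CLI, which only works if the runner pod has a functioning Docker daemon.

# Hypothetical workflow that requires a working Docker daemon on the runner
name: build-image
on: push
jobs:
  build:
    runs-on: arc-runner-set      # the ARC runner scale set name (assumption)
    steps:
      - uses: actions/checkout@v4
      - name: Build and test the image
        run: |
          docker build -t example/app:ci .
          docker run --rm example/app:ci ./run-tests.sh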
As we discussed previously, the current sidecar behaviors are not ideal. The sidecar typically starts at the same time as the runner, but there is a small chance that its start is delayed. In that case, it won’t be available in time to handle requests. The dind container can also get stuck in a state where it is running a process and unable to shut down gracefully. This prevents the pod from terminating, delaying the start of further workflow jobs. And of course, if the dind container fails (or fails to start), the runner will likely fail to run workflows successfully; the Docker tasks will fail. This situation can happen if the dind container is killed due to memory pressure. Since it may have a higher OOM score than the runner, it may be killed first when the node is under pressure.
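One partial mitigation is to give the dind container explicit resource requests and limits so that the scheduler accounts for its memory and it is less likely to be singled out by the OOM killer. The values below are purely illustrative:

# Illustrative only: reserving memory for the dind container makes it less
# likely to be OOM-killed before the runner, but does not eliminate the risk.
- name: dind
  image: docker:dind
  resources:
    requests:
      memory: "1Gi"
      cpu: "500m"
    limits:
      memory: "2Gi"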
While there are ways to work around these problems, there is only one solution that can handle all of the issues: native sidecars.
Implementing a solution
When ARC is configured with containerMode: dind, it uses this spec for deploying new runners. Most teams will leave containerMode unconfigured and use this specification as a starting point for their customizations (the built-in setting itself is shown after the template below). That means they will have something like this in the values.yaml file:
1spec:
2 initContainers:
3 - name: init-dind-externals
4 image: ghcr.io/actions/actions-runner:latest
5 command: ["cp", "-r", "-v", "/home/runner/externals/.",e/runner/tmpDir/"]
6 volumeMounts:
7 - name: dind-externals
8 mountPath: /home/runner/tmpDir
9 containers:
10 - name: runner
11 image: ghcr.io/actions/actions-runner:latest
12 command: ["/home/runner/run.sh"]
13 env:
14 - name: DOCKER_HOST
15 value: unix:///var/run/docker.sock
16 volumeMounts:
17 - name: work
18 mountPath: /home/runner/_work
19 - name: dind-sock
20 mountPath: /var/run
21 - name: dind
22 image: docker:dind
23 args:
24 - dockerd
25 - --host=unix:///var/run/docker.sock
26 - --group=$(DOCKER_GROUP_GID)
27 env:
28 - name: DOCKER_GROUP_GID
29 value: "123"
30 securityContext:
31 privileged: true
32 volumeMounts:
33 - name: work
34 mountPath: /home/runner/_work
35 - name: dind-sock
36 mountPath: /var/run
37 - name: dind-externals
38 mountPath: /home/runner/externals
39 volumes:
40 - name: work
41 emptyDir: {}
42 - name: dind-sock
43 emptyDir: {}
44 - name: dind-externals
45 emptyDir: {}
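As an aside, teams that do want the stock behavior can opt in with just the containerMode setting in the chart’s values.yaml. Based on the gha-runner-scale-set chart documentation, that looks like:

containerMode:
  type: "dind"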
Notice that ARC uses an initContainer to stage some files into the mount used by the dind container. Once that completes, the runner and dind containers start in parallel. The runner will not have access to the shared Docker socket until the dind container is running. Since both are configured to run as application containers, the dind container can continue to run after the runner has terminated, consuming resources. Until it terminates, ARC cannot reclaim those resources. Since Kubernetes has no way to know that the pod only relied on the runner container, it won’t automatically terminate the dind container.
In short, it’s a perfect example of a situation where native sidecars can help.
Solving the problem
This is a modified version of the ARC spec that takes advantage of native sidecars:
1spec:
2 initContainers:
3 - name: init-dind-externals
4 image: ghcr.io/actions/actions-runner:latest
5 command: ["cp", "-r", "-v", "/home/runner/externals/.",e/runner/tmpDir/"]
6 volumeMounts:
7 - name: dind-externals
8 mountPath: /home/runner/tmpDir
9 - name: dind
10 image: docker:dind
11 restartPolicy: Always
12 args:
13 - dockerd
14 - --host=unix:///var/run/docker.sock
15 - --group=$(DOCKER_GROUP_GID)
16 env:
17 - name: DOCKER_GROUP_GID
18 value: "123"
19 startupProbe:
20 exec:
21 command:
22 - docker
23 - info
24 initialDelaySeconds: 0
25 failureThreshold: 24
26 periodSeconds: 5
27 securityContext:
28 privileged: true
29 volumeMounts:
30 - name: work
31 mountPath: /home/runner/_work
32 - name: dind-sock
33 mountPath: /var/run
34 - name: dind-externals
35 mountPath: /home/runner/externals
36 containers:
37 - name: runner
38 image: ghcr.io/actions/actions-runner:latest
39 command: ["/home/runner/run.sh"]
40 env:
41 - name: DOCKER_HOST
42 value: unix:///var/run/docker.sock
43 volumeMounts:
44 - name: work
45 mountPath: /home/runner/_work
46 - name: dind-sock
47 mountPath: /var/run
48 volumes:
49 - name: work
50 emptyDir: {}
51 - name: dind-sock
52 emptyDir: {}
53 - name: dind-externals
54 emptyDir: {}
The biggest difference in this template is that the dind container is now part of the initContainers instead of the application containers. Its definition also includes a new setting, restartPolicy: Always. This is what turns it into a native sidecar. As a result, the runner container will not start until the dind container is ready, and the dind container will not be terminated until after the runner container has terminated. Because the runner container is the only application container, Kubernetes knows that it can terminate the pod once the runner has completed. That provides a further improvement.
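Distilled to its essentials, the pattern looks like this. This is a minimal, hypothetical pod (not the ARC template) meant only to highlight the two settings that matter:

# Minimal sketch of the native sidecar pattern (hypothetical pod)
apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-demo
spec:
  restartPolicy: Never          # pod-level: the pod completes when "app" exits
  initContainers:
    - name: sidecar
      image: docker:dind
      restartPolicy: Always     # container-level override: this init container
                                # becomes a native sidecar that starts before
                                # "app" and is stopped only after "app" exits
      securityContext:
        privileged: true
  containers:
    - name: app                 # the only application container
      image: busybox
      command: ["sh", "-c", "sleep 30"]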
Notice that dind is listed second in the initContainers. That means it won’t start until after the init-dind-externals container has completed. This behavior is consistent with the original implementation. It ensures that the externals folder will be properly configured before dind or runner can start.
This configuration also addresses one additional issue. Using a native sidecar guarantees that dind has started before runner. It doesn’t, however, guarantee that the dockerd running inside the container is ready to accept commands. The runner could still try to execute a workflow without having a working Docker daemon. This is addressed by the changes on lines 19-26.
Probing the daemons
When the dind container starts, it begins executing a script that configures dockerd. It then starts listening for commands on /var/run/docker.sock. This is called a Unix Domain Socket (UDS); it allows two processes on the same host to communicate using a lightweight file rather than a network connection. Since these containers are in the same pod, this approach allows the runner to use a fast, dedicated local connection.
The implementation relies on sharing the parent directory, /var/run, with the runner using the dind-sock volume mount. The runner sets the DOCKER_HOST environment variable to point at the socket, and dockerd is launched with a matching --host argument, so both containers agree on using the UDS. While the runner can use the shared socket, it doesn’t have a way to know that dockerd is actually running in the dind container.
To resolve this, the dind container declares a startupProbe. The container won’t be considered started until the probe returns success. The implementation takes advantage of the docker CLI that is included in that image. The docker info command can be used to check whether dockerd is ready: it returns a zero exit code if it successfully connects to dockerd and a non-zero code on failure. Once the probe succeeds, dockerd is accepting commands. Probing then stops and the runner will be started.
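Annotated, the probe from the template gives dockerd up to two minutes to come up (24 attempts at 5-second intervals):

startupProbe:
  exec:
    command: ["docker", "info"]   # exit code 0 once dockerd answers on the socket
  initialDelaySeconds: 0          # begin probing immediately
  periodSeconds: 5                # retry every 5 seconds...
  failureThreshold: 24            # ...up to 24 times (about 120s) before giving up

If the threshold is exhausted, the kubelet kills the container and, because of restartPolicy: Always, starts it again.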
It’s worth observing that dockerd is actually running as the entrypoint for the container. As a result, the container will terminate if dockerd exits or fails. A liveness or readiness probe is not strictly necessary in this particular case, since restartPolicy: Always means the kubelet will automatically restart the container if dockerd fails. While there are some edge cases, this is enough for our situation.
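That said, if you wanted to guard against one of those edge cases, such as a daemon that hangs without exiting, a liveness probe using the same command is a reasonable sketch. The timing values here are illustrative, not prescriptive:

livenessProbe:
  exec:
    command: ["docker", "info"]   # fails if dockerd stops answering
  periodSeconds: 30               # check every 30 seconds
  timeoutSeconds: 10              # treat a slow daemon as unhealthy
  failureThreshold: 3             # restart the container after ~90s of failures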
As you can see, it’s quite simple to adopt a native sidecar. In addition, it provides a number of benefits that can improve the reliability of the Pod and the ARC runner. This creates a more resilient deployment that behaves as expected across a wider set of scenarios.