Scaling ARC on a Schedule

Category:

#GitHub

#DevOps

Tags:

#GitHub

#DevOps

#Containers

#ARC

Published: July 3, 2024 Reading Time: 8 min

One of the things I appreciate most about the newest version of Actions Runner Controller (ARC) is that it focuses on a single responsibility. It doesn’t try to do everything in a single codebase. Instead, it lets Kubernetes experts manage their resources in alignment with Kubernetes best practices. That makes the code easier to maintain and keeps the responsibilities clear.

The original version of ARC was powerful, but it often tried to support numerous features, leading to images and controllers that were more complex, needed webhooks, and required more permissions. This is not to disparage the team that created the code. They did a fantastic job of creating a solution that worked for many people. As the project grew, it became clear that the complexity was a barrier to adoption (and development), so they decided to reexamine the approach.

One of the most requested features for the modern version of ARC is probably a port of the ScheduledOverrides feature. In the legacy community version, this feature allowed you to change the minimum runner count on a schedule. Unfortunately, it had two big drawbacks:

You could only configure the minimum runner count (at the time, minReplicas)
Calculating the current minimum runner count added a surprising amount of complexity to the code

Oddly, people who want this feature are often unaware that Kubernetes natively supports configuring resources on a schedule. They just need to deploy a CronJob resource.

CronJob resources

A CronJob resource is simply a Kubernetes resource that runs a job on a repeating schedule written in cron syntax. If you’re not familiar with the cron syntax, I recommend https://crontab.guru/. It’s a great tool for helping create cron schedules. A cron expression is a series of five fields in the form minute hour day-of-month month day-of-week. An asterisk (*) means “every value” for that field. For example, to run a scheduled task every day at 9:00 AM, you would use 0 9 * * *. The syntax has more options than that, but that’s the basic idea.
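To see how the five fields line up, here is a small POSIX shell sketch. The describe_cron function is a hypothetical helper (not part of Kubernetes or ARC) that labels each field of an expression; it assumes exactly five whitespace-separated fields and does not handle macros such as @daily.

```shell
# describe_cron EXPR: print each field of a five-field cron expression with its name.
# Hypothetical helper; assumes exactly five fields and no @daily-style macros.
describe_cron() {
  set -f        # disable globbing so "*" is not expanded to file names
  set -- $1     # split the expression on whitespace into $1..$5
  set +f        # re-enable globbing
  if [ "$#" -ne 5 ]; then
    echo "describe_cron: expected 5 fields, got $#" >&2
    return 1
  fi
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```

For example, describe_cron "0 9 * * *" prints minute=0 hour=9 day-of-month=* month=* day-of-week=*.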

Each time the schedule fires, the CronJob resource creates a Kubernetes Job instance. The Job, in turn, spins up a Pod to run some task. That task can be an executable, a containerized application, or even a shell script.

The AutoscalingRunnerSet resource

To change minRunners, we just need to update the AutoscalingRunnerSet custom resource. This is the resource used to track the scale set’s configuration. Each instance lives in the same namespace as the ARC runners it creates. All of the ARC custom resource definitions (CRDs) are in the actions.github.com group and use the version v1alpha1.
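To find the name of the scale set you want to change and check its current minRunners, kubectl can read the resource directly. The sketch below wraps that in a hypothetical current_min_runners function; it assumes kubectl is already configured for your cluster and that you pass the namespace and the AutoscalingRunnerSet name. The jsonpath output is empty if minRunners has never been set.

```shell
# current_min_runners NAMESPACE SCALESET
# Prints spec.minRunners of one AutoscalingRunnerSet (empty output if the field is unset).
# Hypothetical helper; assumes kubectl can reach the cluster.
current_min_runners() {
  kubectl get autoscalingrunnersets.actions.github.com "$2" \
    --namespace "$1" \
    --output jsonpath='{.spec.minRunners}'
}
```

Running kubectl get autoscalingrunnersets.actions.github.com -n arc-runners lists the scale sets in a namespace; current_min_runners arc-runners arc-runner then prints the value for one of them.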

Updating resources

To update resources from the command line, we would normally use kubectl. While that could work inside a cluster, Kubernetes has a better way: the API Server exposes RESTful endpoints for modifying resources. To update a resource, send an HTTP PATCH request to the resource’s endpoint. The general format for the API endpoint is:

https://${APISERVER}/apis/${RESOURCE_GROUP}/${VERSION}/namespaces/${NAMESPACE}/${RESOURCE_TYPE}/${RESOURCE_INSTANCE_NAME}

For example, to update the AutoscalingRunnerSet named my-runners in the happy namespace, you would use:

https://kubernetes.default.svc/apis/actions.github.com/v1alpha1/namespaces/happy/autoscalingrunnersets/my-runners
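Because every request here uses the same URL shape, a small helper can assemble it from its parts. This is a minimal sketch; resource_url is a hypothetical name, and it only covers namespaced resources.

```shell
# resource_url APISERVER GROUP VERSION NAMESPACE RESOURCE_TYPE NAME
# Prints the REST endpoint for one namespaced custom resource instance.
# Hypothetical helper; cluster-scoped resources use a different path.
resource_url() {
  if [ "$#" -ne 6 ]; then
    echo "usage: resource_url APISERVER GROUP VERSION NAMESPACE RESOURCE_TYPE NAME" >&2
    return 1
  fi
  printf 'https://%s/apis/%s/%s/namespaces/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}
```

Calling resource_url kubernetes.default.svc actions.github.com v1alpha1 happy autoscalingrunnersets my-runners reproduces the example URL above.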

It’s worth mentioning that the API Server is generally exposed to Pods as kubernetes.default.svc. However, that may not be the case in all clusters. Kubernetes provides special environment variables to every Pod that can be used to find the API Server’s host and port. The generalized pattern for the server endpoint is https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS}.
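If a script needs to cope with both situations, one option is to read the injected variables and fall back to kubernetes.default.svc when they are missing. The sketch below (a hypothetical api_server function) also wraps an IPv6 address in brackets, which URLs require. The fallback order (KUBERNETES_SERVICE_PORT_HTTPS, then KUBERNETES_SERVICE_PORT, then 443) is an assumption of this sketch, not something ARC does.

```shell
# api_server: print HOST:PORT for the Kubernetes API Server.
# Hypothetical helper; the fallbacks below are assumptions, not Kubernetes defaults.
api_server() (
  # Prefer the HTTPS port variable, then the generic port variable, then 443
  port=${KUBERNETES_SERVICE_PORT_HTTPS:-${KUBERNETES_SERVICE_PORT:-443}}
  # Fall back to the conventional in-cluster DNS name if the host variable is absent
  host=${KUBERNETES_SERVICE_HOST:-kubernetes.default.svc}
  case "$host" in
    # An IPv6 literal (contains ':') must be wrapped in brackets inside a URL
    *:*) host="[$host]" ;;
  esac
  printf '%s:%s\n' "$host" "$port"
)
```

Inside a Pod, APISERVER=$(api_server) can replace the one-line assignment used in the scripts in this post.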

Special permissions

Pods don’t automatically get access to any resource they want. They need to have the appropriate permissions. To grant them, create a Service Account and bind it to a Role. The Role defines the permissions to be granted, and the Pod runs as the service account. These resources will exist in the same namespace as the runners. The service account is primarily just a name:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: scheduled-scaler-service-account

The Role is where the permissions are defined. Since we only need to PATCH the AutoscalingRunnerSet, the Role needs very few permissions:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: scheduler-scaler-role
rules:
- apiGroups: ["actions.github.com"]
  resources: ["autoscalingrunnersets"]
  verbs: [ "patch" ]

Now we just need to bind the two together. The RoleBinding grants the Role’s permissions to the service account.

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: scheduled-scaler-role-binding
subjects:
- kind: ServiceAccount
  name: scheduled-scaler-service-account
roleRef:
  kind: Role
  name: scheduler-scaler-role
  apiGroup: rbac.authorization.k8s.io

Now we have a service account with a minimal set of permissions for updating the AutoscalingRunnerSet resource. We can use this service account in the Pod that will run the scheduled task. If you put these resources in the same file, remember to separate each one with ---.
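Before starting any Pods, you can also check the binding from your own workstation with kubectl auth can-i, impersonating the service account. This sketch (can_patch_scale_sets is a hypothetical helper) assumes the service account keeps the name used above unless you pass another one, and that your own credentials are allowed to impersonate service accounts.

```shell
# can_patch_scale_sets NAMESPACE [SERVICE_ACCOUNT]
# Asks the API Server whether the service account may PATCH AutoscalingRunnerSets.
# Hypothetical helper; requires permission to impersonate service accounts.
can_patch_scale_sets() {
  kubectl auth can-i patch autoscalingrunnersets.actions.github.com \
    --namespace "$1" \
    --as "system:serviceaccount:$1:${2:-scheduled-scaler-service-account}"
}
```

With the RoleBinding in place, can_patch_scale_sets arc-runners should print yes.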

Testing permissions

Want to try out the permissions? You can use kubectl to create a Pod that uses the service account, then use curl to send a PATCH request to the API Server. For example, to create a Pod in the arc-runners namespace and open a shell within it:

kubectl run permtest --image=curlimages/curl:latest -n arc-runners --rm -it --overrides='{ "spec": { "serviceAccountName": "scheduled-scaler-service-account" } }' -- /bin/sh

To call the API with curl, you need two things. First, the call needs a token for accessing the API Server as the service account. Second, curl needs the API Server’s CA certificate so that it can trust the endpoint. Both are available from a known path within the container: /var/run/secrets/kubernetes.io/serviceaccount. The token is in a file called token, and the CA certificate is in ca.crt. There’s also a file, namespace, that contains the namespace the Pod is running in.

# Get the API Server
APISERVER=${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS}

# The folder containing the service account credentials
SERVICEACCOUNT_DATA_PATH=/var/run/secrets/kubernetes.io/serviceaccount

# The Pod's namespace
NAMESPACE=$(cat ${SERVICEACCOUNT_DATA_PATH}/namespace)

# The service account token contents
TOKEN=$(cat ${SERVICEACCOUNT_DATA_PATH}/token)

# The CA certificate path
CACERT=${SERVICEACCOUNT_DATA_PATH}/ca.crt

# The ARC custom resource details and the scale set to update (replace arc-runner with your scale set's name)
ARC_CUSTOM_RESOURCE_GROUP=actions.github.com
ARC_CRD_VERSION=v1alpha1
ARC_SCALESET_RESOURCE=autoscalingrunnersets
SCALESET=arc-runner
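Those assignments fail quietly if a file is missing (for example, when the Pod or service account sets automountServiceAccountToken: false), leaving an empty token and a confusing 401 later. A slightly more defensive sketch checks the files first. load_sa_credentials is a hypothetical helper that sets the same NAMESPACE, TOKEN, and CACERT variables as global shell variables.

```shell
# load_sa_credentials [DIR]: set NAMESPACE, TOKEN, and CACERT from the mounted
# service account files, failing early if any of them is missing.
# Hypothetical helper; DIR defaults to the standard mount path.
load_sa_credentials() {
  sa_dir=${1:-/var/run/secrets/kubernetes.io/serviceaccount}
  for f in token ca.crt namespace; do
    if [ ! -r "$sa_dir/$f" ]; then
      echo "load_sa_credentials: cannot read $sa_dir/$f (is the token mounted?)" >&2
      return 1
    fi
  done
  NAMESPACE=$(cat "$sa_dir/namespace")
  TOKEN=$(cat "$sa_dir/token")
  CACERT="$sa_dir/ca.crt"
}
```

In a script, load_sa_credentials || exit 1 stops before curl is ever called with an empty token.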

With these variables set, curl can call the API Server. When patching a resource, you must specify a patch strategy using the Content-Type header. In this case, we’ll use application/merge-patch+json (JSON Merge Patch). This strategy updates a resource using only the fields you send; all other fields remain unchanged. The request body just needs to include the fields that are changing:

SPEC='{ "spec": { "minRunners": 30  } }'
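Hand-written JSON is easy to get wrong in a shell script, especially one embedded in YAML. A small helper can build the merge patch from a number and reject anything that is not a valid non-negative integer. min_runners_patch is a hypothetical name, and this sketch only handles minRunners.

```shell
# min_runners_patch N: print a JSON Merge Patch body that sets spec.minRunners to N.
# Hypothetical helper. Rejects empty input, non-digits (including a minus sign),
# and leading zeros such as 030, which JSON does not allow in numbers.
min_runners_patch() {
  case "$1" in
    ''|*[!0-9]*|0[0-9]*)
      echo "min_runners_patch: expected a non-negative integer, got '$1'" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"minRunners":%s}}\n' "$1"
}
```

For example, SPEC=$(min_runners_patch 30) produces {"spec":{"minRunners":30}}, which is equivalent to the hand-written SPEC above.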

Putting everything together, the curl call will look like this:

curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X PATCH "https://${APISERVER}/apis/${ARC_CUSTOM_RESOURCE_GROUP}/${ARC_CRD_VERSION}/namespaces/${NAMESPACE}/${ARC_SCALESET_RESOURCE}/${SCALESET}" -H "Content-Type: application/merge-patch+json" -H 'Accept: application/json' --data "$SPEC"
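By default, curl exits successfully even when the API Server answers with an error such as 403 Forbidden, so a failed patch can look like a success. The sketch below wraps the same request in a hypothetical patch_scale_set function that adds --fail (plus --silent --show-error to keep the output readable). It assumes APISERVER, NAMESPACE, TOKEN, and CACERT are already set as in the snippet above, and it hardcodes the actions.github.com group and v1alpha1 version.

```shell
# patch_scale_set SCALESET PATCH_JSON
# Sends a JSON Merge Patch to one AutoscalingRunnerSet; returns non-zero on HTTP errors.
# Hypothetical helper; expects APISERVER, NAMESPACE, TOKEN, and CACERT to be set.
patch_scale_set() {
  curl --silent --show-error --fail \
    --cacert "${CACERT}" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/merge-patch+json" \
    --header "Accept: application/json" \
    --request PATCH \
    --data "$2" \
    "https://${APISERVER}/apis/actions.github.com/v1alpha1/namespaces/${NAMESPACE}/autoscalingrunnersets/$1"
}
```

Usage: patch_scale_set arc-runner "$SPEC" || echo "patch failed" >&2.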

Defining the CronJob

With a role and a plan, it’s time to define the CronJob. As an example, we’ll create a CronJob that runs a script every morning at 9am UTC to update the minRunners to 30.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-arc-runners
spec:
  # The schedule for the job (required) - every day at 9:00 AM
  schedule: "0 9 * * *"

  # Preserve the history for the last two successful runs (optional, defaults to 3)
  successfulJobsHistoryLimit: 2

  # Preserve the history for the last two failed runs (optional, defaults to 1)
  failedJobsHistoryLimit: 2

  # Don't schedule a new job if one is already running (optional)
  concurrencyPolicy: Forbid

  # The job must start within 60 seconds of the scheduled time (optional)
  startingDeadlineSeconds: 60

  # The time zone to use for the schedule (optional; defaults to the kube-controller-manager's local time zone)
  timeZone: Etc/UTC
  jobTemplate:
    spec:
      # The job must finish within 60 seconds (optional)
      activeDeadlineSeconds: 60

      # The Pod template is required; a Job's restartPolicy must be Never or OnFailure
      template:
        spec:
          # The service account to use for the job. It needs permissions for the resource.
          serviceAccountName: scheduled-scaler-service-account
          containers:
          - name: scaler
            # A simple image with curl and a shell (from the curl team)
            image: curlimages/curl:latest
            imagePullPolicy: IfNotPresent

            # Run a shell script
            command:
              - sh

            # Inline script to run. Of course, this could be baked into the image!
            # Using -c to execute a command using a non-login, non-interactive shell
            args:
              - "-c"
              - |
                # Get the details needed for calling the API Server
                APISERVER=${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS}
                SERVICEACCOUNT_DATA_PATH=/var/run/secrets/kubernetes.io/serviceaccount
                NAMESPACE=$(cat ${SERVICEACCOUNT_DATA_PATH}/namespace)
                TOKEN=$(cat ${SERVICEACCOUNT_DATA_PATH}/token)
                CACERT=${SERVICEACCOUNT_DATA_PATH}/ca.crt

                # The ARC custom resource details
                ARC_CUSTOM_RESOURCE_GROUP=actions.github.com
                ARC_CRD_VERSION=v1alpha1
                ARC_SCALESET_RESOURCE=autoscalingrunnersets

                # The name of the ARC scale set to update
                SCALESET=arc-runner

                # The data to update. In this case, just `minRunners`.
                SPEC="{ \"spec\": { \"minRunners\": 30  } }"

                # Call the API Server. --fail makes curl exit with an error on an HTTP error
                # (such as 403 Forbidden), so the Job is reported as failed instead of succeeding.
                curl --fail --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X PATCH "https://${APISERVER}/apis/${ARC_CUSTOM_RESOURCE_GROUP}/${ARC_CRD_VERSION}/namespaces/${NAMESPACE}/${ARC_SCALESET_RESOURCE}/${SCALESET}" -H "Content-Type: application/merge-patch+json" -H 'Accept: application/json' --data "$SPEC"
          # Don't automatically restart the container. Only run when it is scheduled.
          restartPolicy: Never

To deploy the CronJob, put the configurations in a YAML file (deploy.yml). After that, kubectl can be used to manage the deployment. Assuming the runners are in the namespace ${NAMESPACE}:

Deploy the resources: kubectl apply -f deploy.yml -n ${NAMESPACE}
Show the CronJobs: kubectl get cronjobs -n ${NAMESPACE}
Clean up the resources: kubectl delete -f deploy.yml -n ${NAMESPACE}
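Waiting until 9:00 AM to find out whether the CronJob works is slow. kubectl can create a one-off Job from a CronJob’s template, which makes testing immediate. This sketch (run_cronjob_now is a hypothetical helper) creates the Job, waits up to two minutes for it to complete, and prints its logs. The -manual-<timestamp> suffix is an arbitrary choice to keep Job names unique.

```shell
# run_cronjob_now NAMESPACE CRONJOB
# Creates a Job from the CronJob's template, waits for it to complete, then shows its logs.
# Hypothetical helper; assumes kubectl is configured for the cluster.
run_cronjob_now() (
  ns=$1
  cronjob=$2
  job="${cronjob}-manual-$(date +%s)"
  kubectl create job "$job" --from="cronjob/${cronjob}" --namespace "$ns" &&
    kubectl wait --for=condition=complete "job/${job}" --namespace "$ns" --timeout=120s &&
    kubectl logs "job/${job}" --namespace "$ns"
)
```

Usage: run_cronjob_now ${NAMESPACE} scale-up-arc-runners. Because kubectl wait only watches for the complete condition, a failing Job keeps it waiting until the timeout; kubectl get jobs -n ${NAMESPACE} shows the actual status.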

You can download the complete YML file here. Don’t forget to update the SCALESET name on line 94. You’ll also want to update the SPEC on line 97 to match your requirements. Optionally, consider making a Helm chart that lets you configure the values.

You can create multiple CronJobs (sharing the same service account) if you need to change the settings at different times. If you have multiple namespaces (for multiple scale sets), you will need to deploy these resources in each namespace.
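As a lighter-weight alternative to copying the file by hand (or building the Helm chart mentioned above), a small shell function can render one CronJob per schedule. This is a hypothetical, trimmed-down sketch of the CronJob above: it drops the history limits and comments, hardcodes the service account name and the UTC time zone, and fills in the name, schedule, minRunners, and scale set you pass in.

```shell
# render_scaler_cronjob NAME SCHEDULE MIN_RUNNERS SCALESET
# Prints a CronJob manifest that patches minRunners on one AutoscalingRunnerSet.
# Hypothetical helper; a simplified version of the CronJob shown earlier.
render_scaler_cronjob() (
  name=$1 schedule=$2 min_runners=$3 scale_set=$4
  # Reject anything that is not a plain non-negative integer
  case "$min_runners" in
    ''|*[!0-9]*) echo "render_scaler_cronjob: invalid minRunners '$min_runners'" >&2; exit 1 ;;
  esac
  # Unquoted heredoc: ${name} etc. expand now; \$ keeps a literal $ for the Pod's shell
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ${name}
spec:
  schedule: "${schedule}"
  timeZone: Etc/UTC
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 60
  jobTemplate:
    spec:
      activeDeadlineSeconds: 60
      template:
        spec:
          serviceAccountName: scheduled-scaler-service-account
          restartPolicy: Never
          containers:
          - name: scaler
            image: curlimages/curl:latest
            command: ["sh", "-c"]
            args:
              - |
                SA=/var/run/secrets/kubernetes.io/serviceaccount
                curl --fail --cacert "\${SA}/ca.crt" \\
                  --header "Authorization: Bearer \$(cat "\${SA}/token")" \\
                  --header "Content-Type: application/merge-patch+json" \\
                  --request PATCH \\
                  --data '{"spec":{"minRunners":${min_runners}}}' \\
                  "https://\${KUBERNETES_SERVICE_HOST}:\${KUBERNETES_SERVICE_PORT_HTTPS}/apis/actions.github.com/v1alpha1/namespaces/\$(cat "\${SA}/namespace")/autoscalingrunnersets/${scale_set}"
EOF
)
```

For example, a weekday scale-down to zero at 18:00 UTC (an illustrative schedule) could be rendered with render_scaler_cronjob scale-down-arc-runners "0 18 * * 1-5" 0 arc-runner > scale-down.yml and deployed with kubectl apply -f scale-down.yml -n ${NAMESPACE}.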

Conclusion

ARC is focused on doing one task: scaling. That means the other tasks that the legacy version performed, including scheduled resource changes, are not part of the core code. Instead, they are handled using Kubernetes-native approaches. Rather than learning ARC-specific ways to configure and manage resources, focus on understanding Kubernetes (or rely on a Kubernetes administrator). Working this way provides more flexibility and control over how the cluster and its resources are managed.