Self-Hosted GitHub Actions Runners with EKS

20 May 2023


Setup with Helm

We set up ARC (Actions Runner Controller) via Helm, following the instructions at actions-runner-controller.github.io. This sets up the certificate and the actions runner controller.

GitHub Repo/Organization Access

We ended up generating a Personal Access Token (PAT) with organization access.

Required Scopes for Organization Runners

repo (Full control)
admin:org (Full control)
admin:public_key (read:public_key)
admin:repo_hook (read:repo_hook)
admin:org_hook (Full control)
notifications (Full control)
workflow (Full control)
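
One way to double-check that a classic token really carries these scopes is to compare the list above with the X-OAuth-Scopes response header that the GitHub REST API returns for classic tokens. The sketch below is only an illustration: the helper names (parse_scopes, missing_scopes, fetch_granted_scopes) are made up for this post, and it assumes that an admin: scope also covers its read: and write: sub-scopes.

```python
import urllib.request

# Scopes listed in this post; the read: entries are the sub-scopes we ticked.
REQUIRED_SCOPES = [
    "repo",
    "admin:org",
    "read:public_key",
    "read:repo_hook",
    "admin:org_hook",
    "notifications",
    "workflow",
]


def parse_scopes(header_value):
    """Split an X-OAuth-Scopes value such as 'repo, workflow' into a set."""
    return {s.strip() for s in header_value.split(",") if s.strip()}


def is_covered(needed, granted):
    """True if `needed` is granted directly or via its admin: parent (assumption)."""
    if needed in granted:
        return True
    prefix, _, name = needed.partition(":")
    return prefix in ("read", "write") and f"admin:{name}" in granted


def missing_scopes(header_value, required=REQUIRED_SCOPES):
    """Return the required scopes that the token does not appear to have."""
    granted = parse_scopes(header_value)
    return [s for s in required if not is_covered(s, granted)]


def fetch_granted_scopes(token):
    """Ask the GitHub API which scopes a classic token has (network call)."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes", "")
```

Calling missing_scopes(fetch_granted_scopes(token)) should return an empty list when nothing is missing.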

EKS Node Group

Since everyone has their own way of spinning up an EKS cluster and nodegroup, I will only describe our nodegroup setup.

The main takeaways are:

capacityType - We’re using Spot instances to save money.
scalingConfig - We set the minSize of the nodegroup to 0 so that autoscaling can scale it down to zero when no runners are needed.
taints - We taint this nodegroup because the EKS cluster is shared and we only want github-runners to run on these instances.

$ aws eks describe-nodegroup --cluster-name preprod-eks-cluster --nodegroup-name preprod-eks-github-runner
{
    "nodegroup": {
        "nodegroupName": "preprod-eks-github-runner",
        "clusterName": "preprod-eks-cluster",
        "status": "ACTIVE",
        "capacityType": "SPOT",
        "scalingConfig": {
            "minSize": 0,
            "maxSize": 5,
            "desiredSize": 0
        },
        "instanceTypes": [
            "m6i.2xlarge",
            "m6a.2xlarge",
            "m6in.2xlarge",
            "r6a.2xlarge",
            "r6i.2xlarge",
            "r6in.2xlarge",
            "m7i.2xlarge"
        ],
        "taints": [
            {
                "key": "github-runner",
                "value": "true",
                "effect": "NO_SCHEDULE"
            }
        ],
        ...
    }
}
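
The three takeaways can be checked mechanically against the describe-nodegroup output. The following sketch assumes the JSON above has already been parsed into a dict (for example with json.load); the function names are illustrative only. It also derives the Kubernetes toleration that matches each taint, since the EKS API spells the effect as NO_SCHEDULE while pod specs use NoSchedule.

```python
# Map EKS API taint effects to the spelling used in Kubernetes pod specs.
EKS_TO_K8S_EFFECT = {
    "NO_SCHEDULE": "NoSchedule",
    "NO_EXECUTE": "NoExecute",
    "PREFER_NO_SCHEDULE": "PreferNoSchedule",
}


def check_nodegroup(description):
    """Return a list of problems found in a parsed describe-nodegroup response."""
    nodegroup = description["nodegroup"]
    problems = []
    if nodegroup.get("capacityType") != "SPOT":
        problems.append("capacityType is not SPOT")
    if nodegroup.get("scalingConfig", {}).get("minSize") != 0:
        problems.append("minSize is not 0, so the nodegroup cannot scale to zero")
    if not nodegroup.get("taints"):
        problems.append("no taints, so other workloads may land on these nodes")
    return problems


def tolerations_for(description):
    """Build the pod tolerations that match the nodegroup's taints."""
    return [
        {
            "key": taint["key"],
            "operator": "Equal",
            "value": taint["value"],
            "effect": EKS_TO_K8S_EFFECT[taint["effect"]],
        }
        for taint in description["nodegroup"].get("taints", [])
    ]


# Trimmed copy of the output shown above.
example = {
    "nodegroup": {
        "capacityType": "SPOT",
        "scalingConfig": {"minSize": 0, "maxSize": 5, "desiredSize": 0},
        "taints": [{"key": "github-runner", "value": "true", "effect": "NO_SCHEDULE"}],
    }
}
print(check_nodegroup(example))  # prints []
print(tolerations_for(example))
```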

Actions Runner Controller

By default, we tell the autoscaler to keep a minimum of 1 runner replica.

scheduledOverrides - During non-working hours, we set the minimum replicas to 0, which lets our EKS nodegroup scale down to 0.
metrics - Replicas scale up based on the queued and in-progress workflow runs in the listed repositories.

---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: k8s-runners-autoscaler
spec:
  scaleTargetRef:
    name: k8s-runners
  scaleDownDelaySecondsAfterScaleOut: 300
  minReplicas: 1
  maxReplicas: 10
  scheduledOverrides:
    # Override minReplicas to 0 every night from 9pm to 6am (UTC-07:00)
    - startTime: "2023-01-01T21:00:00-07:00"
      endTime: "2023-01-02T06:00:00-07:00"
      recurrenceRule:
        frequency: Daily
      minReplicas: 0
    # Override minReplicas to 0 only during the weekend
    - startTime: "2023-01-07T00:00:00-07:00"
      endTime: "2023-01-08T23:59:59-07:00"
      recurrenceRule:
        frequency: Weekly
      minReplicas: 0
  metrics:
    - type: TotalNumberOfQueuedAndInProgressWorkflowRuns
      repositoryNames:
        - backend-app
        - frontend-app
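
To make the interaction between minReplicas, scheduledOverrides and the metric concrete, here is a rough model of the replica count. It is a simplification and not ARC's actual code. It assumes that a recurring override repeats its first window every day or week, that each window is shorter than its period, that the first matching override in the list wins, and that the desired count is the number of queued plus in-progress runs clamped between the effective minimum and maxReplicas. All function names are hypothetical.

```python
from datetime import datetime, timedelta

# Recurrence frequencies used in the manifest above.
PERIODS = {"Daily": timedelta(days=1), "Weekly": timedelta(weeks=1)}


def in_window(now, start, end, frequency):
    """True if `now` falls inside any recurrence of the [start, end) window."""
    if now < start:
        return False
    period = PERIODS[frequency]
    # Shift the first window forward by the number of whole periods elapsed.
    occurrence_start = start + ((now - start) // period) * period
    return now < occurrence_start + (end - start)


def effective_min_replicas(now, default_min, overrides):
    """Return minReplicas from the first active override, else the default."""
    for override in overrides:
        if in_window(now, override["start"], override["end"], override["frequency"]):
            return override["min_replicas"]
    return default_min


def desired_replicas(queued, in_progress, min_replicas, max_replicas):
    """Clamp the number of pending runs between the replica bounds."""
    return max(min_replicas, min(max_replicas, queued + in_progress))


# The two overrides from the manifest above.
OVERRIDES = [
    {
        "start": datetime.fromisoformat("2023-01-01T21:00:00-07:00"),
        "end": datetime.fromisoformat("2023-01-02T06:00:00-07:00"),
        "frequency": "Daily",
        "min_replicas": 0,
    },
    {
        "start": datetime.fromisoformat("2023-01-07T00:00:00-07:00"),
        "end": datetime.fromisoformat("2023-01-08T23:59:59-07:00"),
        "frequency": "Weekly",
        "min_replicas": 0,
    },
]

tuesday_noon = datetime.fromisoformat("2023-11-21T12:00:00-07:00")
tuesday_night = datetime.fromisoformat("2023-11-21T22:00:00-07:00")
print(effective_min_replicas(tuesday_noon, 1, OVERRIDES))   # 1
print(effective_min_replicas(tuesday_night, 1, OVERRIDES))  # 0
```

With a minimum of 0 and nothing queued, desired_replicas(0, 0, 0, 10) is 0, which is what allows the nodegroup to drain overnight and on weekends.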

Runner Deployment

We tell the runner pods to schedule only on nodes in the given nodegroup, with tolerations that match the taint set on that nodegroup.

---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: k8s-runners
spec:
  template:
    spec:
      organization: ourgithuborgname
      labels:
        - self-hosted
      resources:
        limits:
          cpu: "3.3" # DaemonSets take 1.4 CPU from each node. To fit 2 pods on an 8-CPU machine, the max is (8 - 1.4) / 2 = 3.3
          memory: "13Gi"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/nodegroup
                    operator: In
                    values:
                      - preprod-eks-github-runner
      tolerations:
        - effect: NoSchedule
          key: github-runner
          operator: Equal
          value: "true"

GitHub Workflow

Under each GitHub job, simply change the runs-on: value to self-hosted.

jobs:
  my-first-job:
    # if the self-hosted runners are unavailable, switch this back to `ubuntu-latest`
    runs-on: self-hosted

Verify

We’ve since moved our deployment jobs to self-hosted runners as well, which is why the output below shows separate deploy and test runner deployments.

$ kubectl get runnerdeployments.actions.summerwind.dev
NAME               ENTERPRISE   ORGANIZATION      REPOSITORY   GROUP   LABELS                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
k8s-runners-deploy              companyinc                             ["self-hosted-deploy"] 1         1         1            1           3d8h
k8s-runners-test                companyinc                             ["self-hosted-test"]   4         4         4            4           11h

$ kubectl get horizontalrunnerautoscalers.actions.summerwind.dev
NAME                          MIN   MAX   DESIRED   SCHEDULE
k8s-runners-deploy-autoscaler 1     10    1
k8s-runners-test-autoscaler   1     10    5         min=0 time=2023-11-22 04:00:00 +0000 UTC

$ kubectl get pod
NAME                                         READY   STATUS    RESTARTS   AGE
actions-runner-controller-76b748c4d8-rpf52   2/2     Running   0          2d21h
k8s-runners-test-5nzzw-cvcgw                 2/2     Running   0          5m8s
k8s-runners-test-5nzzw-cw9pt                 2/2     Running   0          6m13s
k8s-runners-test-5nzzw-f5nnn                 2/2     Running   0          3m23s
k8s-runners-deploy-zmb4w-rpsnr               2/2     Running   0          15m
