I was having a conversation with a colleague the other day and realized that it may be valuable to write something about GitOps. For those not familiar with it, GitOps is simply the concept of using Git as the source of truth for deploying applications. It’s a great way to ensure that you understand the state of your application deployment and can recreate it at any time. When most teams discuss GitOps, they are discussing it as it relates to Kubernetes.
It often surprises teams to learn that there are three flavors of GitOps: push, pull, and push/pull. Each has its own advantages and disadvantages. In this post, I’ll explore these options.
The push model
The push model is the most common and familiar. In fact, it’s the model we typically use for most infrastructure-as-code solutions. In this approach, the environment configuration is stored in Git. Whenever the files change, a workflow or pipeline is triggered and deploys the changes. For example, creating (or updating) a Bicep file in Git could trigger a workflow that deploys the changes to Azure. This also works with Terraform, CloudFormation, ZIP-published packages, Kubernetes manifests, and more. In this model, it’s not uncommon to use OIDC or secrets stored in GitHub to provide the necessary credentials for a deployment.
For Kubernetes, this often involves two separate flows. The first flow has the pod or template specifications, deploying those to update one or more Kubernetes environments as those files change. The second flow is for the images. Each image may have a corresponding workflow to build and push it to a container registry. Developers (or automation) can choose when these updated images are applied by updating the pod/template deployment files to reference the new version of the image. This approach supports blue/green deployments, canary deployments, A/B testing, and more. It also allows teams or processes to choose when to move to a specific image version.
This model does not always have additional logic to support drift detection and correction. Drift detection handles situations in which the environment has been changed outside of Git or in an unexpected way. This can include manual changes in the environment, additional resources that co-exist in the environment, or resources that were removed from the Git repository but still exist in the environment. Some environments (like Azure) have built-in support for drift correction. Others, such as Kubernetes, may require additional logic to handle these situations. This is especially true if the target environment does not prevent changes from being made manually.
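To make the three kinds of drift concrete, here is a minimal sketch of a comparison between the state declared in Git and the state observed in the environment. The function name, the resource names, and the dictionary layout are all hypothetical; a real tool would read manifests from the repository and query the environment's API rather than use hard-coded dictionaries.

```python
from typing import Any, Dict, List

Resource = Dict[str, Any]


def detect_drift(
    desired: Dict[str, Resource], actual: Dict[str, Resource]
) -> Dict[str, List[str]]:
    """Compare the state declared in Git with the state observed in the environment."""
    desired_names = set(desired)
    actual_names = set(actual)
    return {
        # Declared in Git but not present in the environment.
        "missing": sorted(desired_names - actual_names),
        # Present in the environment but not declared in Git
        # (for example, removed from the repo but never cleaned up).
        "unmanaged": sorted(actual_names - desired_names),
        # Present in both places, but the settings differ (for example, a manual change).
        "changed": sorted(
            name
            for name in desired_names & actual_names
            if desired[name] != actual[name]
        ),
    }


# Hypothetical data: what Git declares versus what is actually running.
desired_state = {
    "web": {"replicas": 3, "image": "web:1.4.0"},
    "api": {"replicas": 2, "image": "api:2.0.1"},
}
actual_state = {
    "web": {"replicas": 5, "image": "web:1.4.0"},  # scaled by hand
    "legacy-job": {"replicas": 1, "image": "job:0.3.0"},  # no longer in Git
}

report = detect_drift(desired_state, actual_state)
print(report)
```

Detection is the easy part. Deciding what to do with each category (redeploy, revert, delete, or just alert) is where the governance decisions come in.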
The pull model
The pull model still relies on Git as the system of record, but it uses an agent running in the environment to monitor the Git repo for changes. When changes are detected, the agent pulls the repository and then applies the changes to the environment. This approach is often used with Argo CD or Flux in Kubernetes. In this model, the secrets for the Git repository are stored in the environment, providing the agent with the necessary credentials.
This model often has additional logic to support drift detection and correction. Drift detection handles situations in which the environment has been changed outside of Git or in an unexpected way. It also handles situations where one or more resources exist in the environment but are not defined in Git. By understanding how to handle these issues, the agent can determine how to remedy the situation and deploy the resources. This approach is very common with teams that don’t have strong governance practices to help avoid (or eliminate) drift. That said, it can create a risk that the environment will be perpetually out-of-sync with the Git repo, limiting a core benefit of GitOps. In extreme cases, improper automation rules can lead to the environment either perpetually trying to correct drift or accidentally deleting important resources. The drift detection and reconciliation can also create an additional ongoing load on the Kubernetes environment.
One thing unique to this approach is that the agent itself may require its configuration to also be managed and stored in Git. In this case, changes to the agent are managed by deploying updates to a repository which the agent then uses to update its own definition. This also allows the agent to avoid being updated while it’s applying changes to other resources.
The need to pull changes from Git also requires one of two approaches. The agent may expose a public webhook to receive notifications about changes to the Git environment. This means there is a need to support additional ingress into the environment. That may create security concerns. Alternatively, the agent may rely on a polling approach to periodically check for changes. This approach does not require additional ingress, but does create additional network traffic and an additional load on the Git server.
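The polling option can be sketched in a few lines. This is only an illustration of the idea, not how any particular agent is implemented: the `get_head` and `apply` callables are hypothetical stand-ins for "ask the Git server for the latest commit" and "deploy that commit", and the loop is bounded so the example terminates.

```python
import time
from typing import Callable, List, Optional


def poll_for_changes(
    get_head: Callable[[], str],
    apply: Callable[[str], None],
    interval_seconds: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check the repository on a fixed interval and apply each new revision once."""
    last_applied: Optional[str] = None
    applied_count = 0
    for check in range(max_checks):
        head = get_head()  # every check costs a request to the Git server
        if head != last_applied:
            apply(head)
            last_applied = head
            applied_count += 1
        if check < max_checks - 1:
            sleep(interval_seconds)
    return applied_count


# Simulated Git server responses (hypothetical commit IDs) across four polls.
observed_heads = iter(["a1b2c3", "a1b2c3", "d4e5f6", "d4e5f6"])
applied_revisions: List[str] = []

applied_count = poll_for_changes(
    get_head=lambda: next(observed_heads),
    apply=applied_revisions.append,
    interval_seconds=60,
    max_checks=4,
    sleep=lambda seconds: None,  # skip real waiting in the example
)
print(applied_count, applied_revisions)
```

Notice that all four checks reach the Git server even though only two of them find something new. That is the tradeoff with polling: a shorter interval means faster deployments but more traffic that accomplishes nothing.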
The push/pull model
This is a model that I see teams implementing when they are early in their GitOps journey. This model may use Git as the primary source of truth, but not in all cases. It builds on the pull model by adding the ability to monitor one or more parts of the environment. When it detects a change, it updates the Git repo to reflect the new state. In some cases, it serves as a way of backing up the environment. In other cases, it serves as a way of memorializing manual changes that may be happening outside of Git. For example, it may capture the current settings after a Kubernetes administrator modifies the cluster to handle a spike in traffic.
A common use for this approach is to monitor a container registry for changes to one or more images. When an image is updated, the agent will update the deployment files to reference the newer version of the image. It may then create a pull request to allow the changes to be reviewed and approved before being applied. Once the pull request is merged, the agent will apply the changes to the environment, upgrading the running version of the image.
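The file update at the heart of that flow is small. As a rough sketch, assuming the deployment file references images as `image: repository:tag` on a single line, the change might look like this. The registry name, image names, and tags are made up, and the steps of watching the registry and opening the pull request are left out.

```python
import re


def bump_image(manifest: str, repository: str, new_tag: str) -> str:
    """Point every `image: <repository>:<tag>` line at new_tag, leaving other images alone."""
    pattern = re.compile(
        r"^([ \t]*(?:-[ \t]*)?image:[ \t]*" + re.escape(repository) + r"):[^\s@]+[ \t]*$",
        re.MULTILINE,
    )
    # A function is used for the replacement so the tag is inserted literally.
    return pattern.sub(lambda match: f"{match.group(1)}:{new_tag}", manifest)


# Hypothetical deployment file with two containers.
MANIFEST = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.0
        - name: sidecar
          image: registry.example.com/proxy:0.9.2
"""

updated = bump_image(MANIFEST, "registry.example.com/web", "1.5.0")
print(updated)
```

Only the `web` image line changes, which keeps the resulting pull request small and easy to review. A text substitution like this is deliberately naive. Images pinned by digest or templated through Helm or Kustomize would need a different approach.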
Another variation of this approach monitors changes across the cluster. For each change, it serializes the Kubernetes resources, adds them to the Git repo, and pushes the commit directly to the default branch of the repository. This ensures that the Git repo always reflects the current state in Kubernetes. This approach can include encrypted or Base64-encoded data, creating pseudo-binary content which causes the Git repository to grow faster than expected. This approach does not handle dynamic or rapidly scaling environments well. In those cases, the changes that are occurring are related to supporting a specific level of traffic or workloads, creating lots of change records that have little practical value. The same thing can happen with resources that are primarily controlled via custom resource definitions (CRDs), like GitHub Actions Runner Controller (ARC). In that case, it’s important to filter the resources that the CRDs manage. Otherwise, the captured state will create additional, unnecessary resources any time it is applied.
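One way to do that filtering is to skip anything that a controller created on behalf of another resource. The sketch below assumes the controller follows the usual Kubernetes convention of setting `metadata.ownerReferences` on the objects it creates, which is worth verifying for the controller you are using. It also drops `status` and a few server-assigned metadata fields, since those change constantly and are not meant to be reapplied. The `ExampleRunnerSet` kind and its API group are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Resource = Dict[str, Any]

# Metadata fields assigned by the API server; they change often and should not be committed.
VOLATILE_METADATA = (
    "resourceVersion",
    "uid",
    "creationTimestamp",
    "generation",
    "managedFields",
)


def capturable(resources: Iterable[Resource]) -> List[Resource]:
    """Keep only top-level resources, stripped of status and server-assigned metadata."""
    kept: List[Resource] = []
    for resource in resources:
        metadata = resource.get("metadata", {})
        # Objects with an owner are recreated by their controller; capturing them
        # would produce duplicates when the captured state is applied.
        if metadata.get("ownerReferences"):
            continue
        cleaned = {key: value for key, value in resource.items() if key != "status"}
        cleaned["metadata"] = {
            key: value for key, value in metadata.items() if key not in VOLATILE_METADATA
        }
        kept.append(cleaned)
    return kept


# Hypothetical cluster contents: one custom resource and a pod its controller created.
cluster_resources = [
    {
        "apiVersion": "example.com/v1",
        "kind": "ExampleRunnerSet",
        "metadata": {"name": "runners", "uid": "1111", "resourceVersion": "42"},
        "spec": {"maxRunners": 10},
        "status": {"currentRunners": 4},
    },
    {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "runners-abcde",
            "ownerReferences": [{"kind": "ExampleRunnerSet", "name": "runners"}],
        },
        "spec": {"containers": [{"name": "runner", "image": "runner:latest"}]},
    },
]

captured = capturable(cluster_resources)
print(captured)
```

Here only the custom resource is kept. The pod is left for the controller to recreate, so applying the captured state does not produce a second copy.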
The monitoring and capturing process in this model tends to scale poorly with growing loads. In fact, as the demands on the environment grow, this process can create significant additional overhead on the API server and the cluster resources. I’ve seen cases where the monitoring process itself creates enough load that it impacts the ability of the cluster to handle the actual workloads. This is in addition to the resource impact created by the agent and the pull model.
Using this approach to capture manual changes is generally considered an anti-pattern. It’s better to avoid the manual changes if possible. Otherwise, Git is not the system of record. Instead, it’s merely a record of the last known state. It’s better to handle these changes via a pull request to the Git repo. This ensures that the changes are reviewed and approved before being applied. Allowing direct, manual updates also raises a significant compliance concern. A single individual can make changes to an environment without review. This violates the principles of least privilege and separation of duties.
Conclusion
GitOps is a powerful approach to managing environments. It’s important to understand the different flavors of GitOps and their tradeoffs so that you can choose the right approach for your needs. When in doubt, I strongly recommend the push model. It requires the fewest touch points (minimizing security concerns) and supports best practices for governance and compliance. It typically scales better than the other approaches, and it requires you to carefully consider the changes you are making to the environment. That said, the other approaches can be useful in specific situations, so it’s important to understand them as well.