Ken Muse

Defining an Infrastructure-as-Code Maturity Model


Infrastructure-as-code (IaC) is one of the most challenging aspects of development for many people. In fact, that’s one of the main reasons so many teams don’t think about it until the end of a project. In truth, however, it is a powerful part of the development lifecycle. Ideally, it helps bring together both the Development and Operations aspects of a project. After all, it’s DevOps not Dev+Ops, right? This requires an evolving mindset and a willingness to continuously improve. To do that, we need to understand where we are and where we want to go. To that end, it’s helpful to have a maturity model.

The IaC maturity model

Infrastructure-as-code is software with a defined lifecycle process. Just like any other software process, we can use models to try to understand our current and desired state. Models help teams understand where they are in their journey and identify opportunities to improve. Maturity models are often defined using five standard levels: Initial, Managed (also called Consistent or Repeatable), Defined, Measured, and Optimizing. In the context of IaC, these levels could be defined as follows:

Maturity model for infrastructure as code

  1. Initial
    This is the level where most teams start. There is no real standardization, and the infrastructure is created and updated manually. As a result, the system is not easily replaceable and relies on the knowledge of the team members. The processes are not defined, and it can take significant effort to recreate an environment.
  2. Managed
    The infrastructure is managed with a defined process for changes and a testing strategy. There may be some automation and infrastructure-as-code in place, but some components may still be managed manually. Infrastructure code is generally tightly coupled to the project, making it difficult to reuse or share. The processes and test strategy are often documented in a wiki or in Markdown files.
  3. Defined
    The infrastructure is defined in code and version controlled. It is usually stored in the same repository as the application, but not always. There may be some manual management of the infrastructure, but the code is the source of truth. Changes are planned and tested, and key processes may be automated. There may still be some activities such as debugging or logging that rely on direct access to the system. At this level of maturity, teams can often create new environments quickly and consistently.
  4. Measured
    Infrastructure is defined and managed in source code and peer-reviewed before deployment. The infrastructure may have associated test automation to ensure that it is deployed and operating as expected. The infrastructure code generally includes monitoring and alerts to notify the team of any issues. The team has a process for responding to alerts and incidents, and the infrastructure is easily replaceable. Access to the systems is typically restricted and audited, although there may be a “break glass” account for emergency incidents. The code may be stored in a separate repository. The code may also be used as the basis for shared templates that teams can adopt and integrate into their projects. Operations is integrated into the development lifecycle. Teams can often deploy to production with little or no downtime, and ad-hoc environments can be easily created and destroyed. At this level, teams may begin to use modules to create reusable components.
  5. Optimizing
    The core infrastructure is defined in a separate repository, follows best practices, and has its own tests in place. The core infrastructure code is typically versioned, packaged, and consumed as a module. Teams compose solutions for their applications from standardized modules; this composed definition may be stored alongside the application code or in a standalone repository. Deployments will generally include automated key rotation (where needed) and use passwordless connectivity (or managed identities) for communication between components. Alerts and logging are managed by the team to ensure that the data is actionable and noise is minimized. Best practices are shared and adopted by continuously improving the code. Production deployments generally have no required downtime, and dynamic environments are often created and removed automatically for automated testing.

The Path to Maturity

Maturing your infrastructure-as-code process is a journey, not a destination. It requires a commitment to continuous improvement and a willingness to learn from your mistakes. The first few steps are often the hardest, but they are also the most important. You need to make sure to define the environment as completely as possible. You don’t have to do everything immediately. You just have to commit to continuously improve. You want to work towards the goal of being able to delete the infrastructure and recreate it from scratch at any time. This is the ultimate test of your infrastructure-as-code process.
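To make the "delete and recreate" test concrete, here is a minimal in-memory sketch of the idea: the environment is described as data, a plan compares it with what exists, and applying the plan to an empty environment should rebuild everything. The Resource type, the plan and apply functions, and the resource names are all hypothetical stand-ins for whatever your IaC tool provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical resource definition: a name, a kind, and its settings."""
    name: str
    kind: str
    settings: tuple = ()  # (key, value) pairs


def plan(desired, actual):
    """Return the changes needed to make `actual` match `desired`."""
    desired_by_name = {r.name: r for r in desired}
    actual_by_name = {r.name: r for r in actual}
    create = [r for n, r in desired_by_name.items() if n not in actual_by_name]
    update = [r for n, r in desired_by_name.items()
              if n in actual_by_name and actual_by_name[n] != r]
    delete = [r for n, r in actual_by_name.items() if n not in desired_by_name]
    return {"create": create, "update": update, "delete": delete}


def apply(desired, actual):
    """Apply the plan in memory and return the new state, sorted by name."""
    changes = plan(desired, actual)
    removed = {r.name for r in changes["delete"]}
    state = {r.name: r for r in actual if r.name not in removed}
    for r in changes["create"] + changes["update"]:
        state[r.name] = r
    return sorted(state.values(), key=lambda r: r.name)


# Illustrative definition of the environment (names are assumptions).
DESIRED = [
    Resource("app-db", "database", (("tier", "small"),)),
    Resource("app-web", "web_server", (("instances", 2),)),
]

# Start from nothing (the environment was deleted) and rebuild it.
rebuilt = apply(DESIRED, actual=[])
# A second run should find nothing left to change.
second_plan = plan(DESIRED, rebuilt)
```

If the rebuilt state matches the definition and the second plan is empty, the definition is complete enough to recreate the environment. Anything that only exists because someone created it by hand shows up as a difference.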

Once you reach that point, make sure to start including auditing and logging. Without metrics, how will you know if it is performing correctly? And without logs, how will you know what went wrong? You can’t fix what you can’t see. By having visibility into what’s happening, you can then mature your alerting strategy so that you are responding to issues proactively instead of reactively. As this evolves, the development team will need to make sure that any data they need for debugging is properly logged (hopefully, using semantic logging to get structured, queryable data). The ultimate test of this process is being able to debug and fix an issue without having to access the system directly. This will allow you to fully secure the system, eliminating the need for direct access.
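As an illustration of semantic logging, the sketch below uses Python’s standard logging module to emit each event as a JSON object, so fields such as an order ID can be queried later instead of parsed out of free text. The JsonFormatter class, the "orders" logger name, the "context" key, and the field names are assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"context": {...}} become queryable keys.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream):
    """Create a logger that writes structured entries to `stream`."""
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)
log.error(
    "payment failed",
    extra={"context": {"order_id": "A-1001", "region": "eastus", "retry": 2}},
)
entry = json.loads(stream.getvalue())
```

In a real system the stream would be standard output or a log shipper rather than an in-memory buffer. The point is that the entry can be filtered by order_id or region without anyone logging in to the server.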

Composition and reusability

A well-known software principle states “favor composition over inheritance” (from the 1994 book Design Patterns: Elements of Reusable Object-Oriented Software by the “Gang of Four”: Gamma, Helm, Johnson, and Vlissides). Although originally documented for object-oriented code, the same principle can be applied to infrastructure-as-code. In fact, as you continue to mature your solutions, you will eventually reach the point where you need to package the infrastructure code and make it composable and reusable. You will need modules.

As an example, instead of defining a web server for your particular application, you define a web server component. The component will have input parameters that allow it to be customized for different environments and configurations. This component can then be versioned and published, allowing specific point-in-time implementation practices to be used. A new web application can then create a resource definition that uses a specific version of the web component and provides the specific parameters needed for that application. It can even compose multiple modules together (such as a web server and a database) to create a complete environment.
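The shape of this idea can be sketched in a few lines. This is not how any particular IaC tool implements modules; it only shows components that take input parameters and an application that composes them. The WebServer and Database classes, their parameters, and the "shop" application are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebServer:
    """Hypothetical web server module and its input parameters."""
    name: str
    instances: int = 1
    sku: str = "small"

    def render(self):
        return {"type": "web_server", "name": self.name,
                "instances": self.instances, "sku": self.sku}


@dataclass(frozen=True)
class Database:
    """Hypothetical database module and its input parameters."""
    name: str
    storage_gb: int = 32

    def render(self):
        return {"type": "database", "name": self.name,
                "storage_gb": self.storage_gb}


def compose_environment(app, env):
    """Compose the modules into one environment; only the parameters differ."""
    production = env == "prod"
    modules = [
        WebServer(name=f"{app}-{env}-web",
                  instances=3 if production else 1,
                  sku="large" if production else "small"),
        Database(name=f"{app}-{env}-db",
                 storage_gb=256 if production else 32),
    ]
    return [m.render() for m in modules]


dev = compose_environment("shop", "dev")
prod = compose_environment("shop", "prod")
```

Both environments are built from the same two components. The application owns only the parameter values, which keeps the component definitions reusable by other teams.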

The great thing about this approach is that the components can be independently versioned, tested, and released. As requirements change, applications can update the version of the component being referenced to use a newer version. For example, we might create a v2.0 of the web server component that supports a private networking configuration and automatically deploys a next-generation firewall (by adding its own dependency on a firewall module). When a web application is ready to use the new approach, it just changes its reference from v1.0 to v2.0.
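A small sketch can show why pinning a version matters. It assumes a hypothetical in-memory registry keyed by module name and version, where v2.0 of the web server declares its own dependency on a firewall module. The resource names are illustrative, not taken from any real provider.

```python
# Hypothetical registry: (module name, version) -> definition.
REGISTRY = {
    ("web_server", "1.0"): {"resources": ["vm", "public_ip"], "requires": []},
    ("web_server", "2.0"): {"resources": ["vm", "private_endpoint"],
                            "requires": [("firewall", "1.0")]},
    ("firewall", "1.0"): {"resources": ["firewall"], "requires": []},
}


def resolve(name, version, registry=REGISTRY):
    """List every resource a pinned module version deploys, dependencies first."""
    module = registry[(name, version)]
    resources = []
    for dep_name, dep_version in module["requires"]:
        resources.extend(resolve(dep_name, dep_version, registry))
    resources.extend(module["resources"])
    return resources


# The application pins an exact version; upgrading is a one-line change.
app_on_v1 = resolve("web_server", "1.0")
app_on_v2 = resolve("web_server", "2.0")
```

Because the application references an exact version, publishing v2.0 changes nothing for it until the reference is updated. The firewall arrives through the module’s dependency rather than through changes to the application’s own definition.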

Because modules rely on input parameters, we can also dynamically test and validate the resulting environment. The web server component could have a set of tests that creates a new server, provides it some content, and validates that the content is properly served once the module is deployed. When v2.0 of the web server is released, the tests can be run automatically to validate that the new version continues to work as expected. The new version can also add scripts and tests to validate the networking and firewall support. This makes the component less likely to fail or have unexpected issues when it is used in a real environment.
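The testing flow described above might look something like the following sketch. It uses an in-memory fake in place of a real deployment, so FakeWebServer, deploy_web_server, and the private_network parameter are assumptions. A real suite would call the IaC tool and make HTTP requests against the deployed server.

```python
import unittest


class FakeWebServer:
    """In-memory stand-in for a deployed web server module (hypothetical)."""

    def __init__(self, version, private_network=False):
        self.version = version
        self.private_network = private_network
        # In this sketch, private networking implies the firewall is deployed.
        self.firewall_enabled = private_network
        self._content = {}

    def upload(self, path, body):
        self._content[path] = body

    def get(self, path):
        if path not in self._content:
            return 404, ""
        return 200, self._content[path]


def deploy_web_server(version, **params):
    """Placeholder for invoking the real IaC tool with input parameters."""
    return FakeWebServer(version, **params)


class WebServerModuleTests(unittest.TestCase):
    def check_serves_content(self, server):
        # Shared check: provide content, then validate that it is served.
        server.upload("/index.html", "hello")
        self.assertEqual(server.get("/index.html"), (200, "hello"))
        self.assertEqual(server.get("/missing")[0], 404)

    def test_v1_serves_content(self):
        self.check_serves_content(deploy_web_server("1.0"))

    def test_v2_still_serves_content_and_adds_firewall(self):
        server = deploy_web_server("2.0", private_network=True)
        self.check_serves_content(server)
        self.assertTrue(server.firewall_enabled)


def run_module_tests():
    """Run the suite and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(WebServerModuleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The shared content check runs against both versions, which is the regression guarantee. The v2.0 test then adds its own assertion for the new firewall behavior.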

Most infrastructure-as-code technologies have a concept of modules. If you’re working with Azure, consider using Bicep modules. If you’re creating resources for VMware, AWS, GCP, or other environments using Terraform, consider publishing modules to a Terraform registry. Even AWS CloudFormation supports modules. If the tools you are using don’t yet support modules natively, then it is still possible to create your own reusable solution. Consider using ORAS (OCI Registry as Storage) to store versioned packages with your configurations. This uses the same approach that Helm (Charts) and Dev Containers (Features) use for storing their components in a reusable way.

Continuing the journey

Infrastructure-as-code is not a single solution, but a practice that can continue to grow and mature with your applications. By following the maturity model, you can create a roadmap for your solutions. Focusing on automation, visibility, and reusability, you can continue to improve your process and make it more efficient and effective. At the end of the day, the goal is not to reach a specific level of maturity. It is to continuously improve your practices and make it easier to create, manage, deploy, and secure your environments.