How GitLab can eliminate the massive value stream friction of developer environment provisioning and cleanup
Published on: November 17, 2022

It is important to have the complete picture of scaled effects in view when designing automation.

A strong DevOps value stream drives developer empowerment as far left as possible. In GitLab, this is embodied in per-feature-branch merge requests that are rich with automated code quality and defect information - including not only findings, but also automated remediation capabilities and collaboration. Some defects and code quality issues can only be found by analyzing a running copy of the application - using techniques such as DAST, IAST, and fuzzing. GitLab has built a fully automated, seamless developer environment lifecycle management approach right into the developer experience. In fact, it is so seamlessly built in that it can be easy to overlook how critical developer environment lifecycle management is. This article will highlight why and how GitLab adds value using developer environment automation. In addition, while GitLab provides out-of-the-box developer environment lifecycle management for Kubernetes, this article demonstrates an approach and a working example for extending that capability to other common cloud-based application framework PaaS offerings.

Provisioning of development environments is generally a negative feedback loop

In a prior job, I worked on a DevOps transformation team that supported multiple massive shared development environments in AWS. They were accessible to more than 4,000 developers working to build more than 100 SaaS applications and utility stacks. In the journey to the AWS Cloud, each development team took ownership of the automation required to deploy their applications. Since developers were able to self-service, over time this solved the problem of development friction generated by waiting for environments to be provisioned for testing, feature experiments, integration experiments, etc.

However, the other half of the problem then ballooned - environment sprawl - with an untold number of environments idling without management and with no knowledge of when they could be torn down. Over time the development environment cost became a significant multiple of production costs. The cloud solved the provisioning bottlenecks caused by hardware acquisition, but it can also inadvertently fuel the high cost of unmanaged sprawl. This problem understandably causes organizations to raise administrative barriers to new development environments.

In many organizations this becomes a vicious cycle - most especially if developer environments are operated by a different team, or worse, on an independent budget. Environment justification friction usually comes quickly after discovering the true cost of the current running environments. Developers then have to justify the need for new environment requests and they have to make the gravest of promises to disband the environment as soon as they are done. Another friction arises when a separate group is tasked with cost controls and environment provisioning and cleanup. This introduces friction in the form of administrative and work queueing delays. Coordination friction also crops up because an accurate understanding of exactly what is needed for an environment can be challenging to convey. When mistakes are made or key information is missing, developers must go back and forth on support requests to get the configuration completely correct.

Partial automation can worsen the problem

That’s the first half of the environment lifecycle, but as I mentioned, even if that half is fully automated and under the control of developers, the other half of the feedback loop comes into play. When a given development environment has fulfilled its original justification, the team does not want to destroy it because environments are so hard to justify and create. Then the sprawl starts and, of course, the barriers to new environments are raised even higher. This is a classic negative feedback loop.

Systems theory shows us that sometimes there are just a few key factors in stopping or even reversing a negative feedback loop. Let's take this specific problem apart and talk about how GitLab solves for it.

Treat developer environments as a complete lifecycle

In the prior example it is evident that by leaving out the last stage of the environment lifecycle - retirement or tear down - we still end up with a negative feedback loop. Removing provisioning friction actually makes the problem worse if retirement friction is not also addressed at the same time. Solutions to this problem need to address the entire lifecycle to avoid impacting value stream velocity. Neglecting or avoiding the retirement stage of a lifecycle is a common problem across all types of systems. In contrast, by addressing the entire lifecycle we can transform it from being a negative feedback loop to a managed lifecycle.

The problems of who and when

Buried inside this insidious friction loop are a couple of key coordination problems we’ll call “Who and When.” Basically, "Who" should create environments, and "When" should they be created to ensure reasonable cost optimization? And conversely, who should clean up environments, and when can you know with certainty that an environment is no longer needed? Even with highly collaborative teams working hard together for maximum business value, these questions present a difficulty that frequently results in environments running for a long time before they are used and after they are no longer needed. Knowledge of appropriate timing plays a critical role in gaining control over this source of friction.

The problem of non-immutable development environments

Friction in environment lifecycle management creates a substantial knock-on problem associated with long-lived environments. Long-lived environments that are updated multiple times for various independent projects start to accumulate configuration rot; they become snowflakes with small changes left over from abandoned experiments, incomplete software or configuration removals, and other irrelevant bits and pieces. Immutability is the practice of never doing “in place” updates to a computing element, but rather destroying it and replacing it with a fresh, built-from-scratch element. Docker has made this concept widely accepted and effective in production workloads, but development environments frequently lack this attribute because they were automated without the design constraint of immutability, so they are updated in place for reuse by various initiatives. If the environment lifecycle is not fully automated, it is impossible to make environments workable on a per-feature branch basis.

The problem of non-isolated development environments

When environments are manually provisioned, or when there is significant cost or administrative friction to setting them up, environment sharing becomes more commonplace. This creates sharing contention at many levels. Waiting for an environment to become available, pressure to complete work quickly so others can use the environment, and restrictions on the types of changes that can be made to shared environments are just some of the common contention problems that arise. If environments can be isolated, then sharing contention friction evaporates. Pushing this to the extreme of per-feature branch granularity brings many benefits, but is also difficult.

Effect on the development value stream

The effect that a friction-filled environment lifecycle has on the value stream can be immense - how many stories have you heard of projects waylaid for weeks or months while waiting on environment provisioning? What about defects shipped to production because a shared environment had leftover configuration during testing? Frequently this friction is tolerated in the value stream because no one will argue that unlimited environment sprawl is a wise use of company resources. We all turn off the lights at home when we are no longer using a room, and it is good business sense and good stewardship not to leave idle resources running at work.

The concept of good stewardship of planetary resources is actually becoming an architectural-level priority in the technology sector. This is evidenced in AWS’ introduction of the “Sustainability” pillar to the AWS Well-Architected Framework in 2021, along with many other green initiatives in the technology sector.

It’s imperative that efforts to improve the development value stream consider whether developer environment management friction is hampering the breadth, depth and velocity of product management and software development.

Seamless and fully automated review environment lifecycle management

What if this negative feedback loop could be stopped? What if new environments were seamless and automatically created right at the moment they were needed? What if developers were completely happy to immediately tear down an environment when they were done because it takes no justification nor effort on their part to create new one at will?

Enter GitLab Review Environments!

GitLab review apps are created by the developer action of creating a new branch. No humans are involved, as the environment is deployed while the developer is still mulling over their first code changes on the branch.

As the developer pushes code updates, the review app is automatically updated with the changes, and all quality checks and security scanning run so the developer learns whether they have introduced a vulnerability or quality defect - within the shortest possible time after the defect was introduced.

When the developer merges their code, the review app is automatically torn down.
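This whole lifecycle is declared in a handful of lines of GitLab CI YAML. The sketch below shows the minimal, illustrative shape of such a pipeline - the job names, the deploy/teardown scripts, and the URL pattern are assumptions for illustration, not GitLab's shipped template:

```yaml
# Minimal sketch of the review app lifecycle in .gitlab-ci.yml.
# deploy.sh, teardown.sh, and the URL pattern are hypothetical placeholders.
deploy_review:
  stage: deploy
  script:
    - ./deploy.sh "review-${CI_COMMIT_REF_SLUG}"    # deploy an isolated copy per branch
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    url: https://$CI_COMMIT_REF_SLUG.example.com    # placeholder URL convention
    on_stop: stop_review                            # wires up automatic teardown
  rules:
    - if: '$CI_COMMIT_BRANCH && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'

stop_review:
  stage: deploy
  script:
    - ./teardown.sh "review-${CI_COMMIT_REF_SLUG}"  # destroy the branch's environment
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  rules:
    - if: '$CI_COMMIT_BRANCH && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: manual
```

The `on_stop`/`action: stop` pairing is what lets GitLab run the teardown job automatically when the branch is merged or deleted.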

This seamless approach to developer environment provisioning and cleanup addresses enough of the critical factors in the negative feedback loop that it is effectively nullified.

Consider:

  • Developer environment provisioning and cleanup are fully automated, transparent, developer-initiated activities. They consume no people or human process resources, which are always far slower and more expensive than technology solutions.
  • Provisioning and cleanup timing are exactly synchronized with the developer’s need, preventing inefficiencies in idle time before or after environment usage.
  • They are immutable on a per-branch basis - a new branch always creates a new environment from a fresh copy of the latest code.
  • They are isolated - no sharing contention and no mixing of varying configuration.
  • They treat developer environments as a lifecycle.

It is so transparent that some developers may not even realize that their feature branch has an isolated environment associated with it.

Hard dollar costs are important and opportunity costs are paramount

GitLab environments positively contribute to the value stream in two critical ways. First, the actual waste of idle machines is dramatically reduced. However, more importantly, all the human processes that end up being applied to managing that waste also disappear. Machines running in the cloud are only lost money. Inefficient use of people’s time carries a high dollar cost but it also carries a higher opportunity cost. There are so many value-generating activities people can do when their time is unencumbered by cost-control administration.

Multiplying the value stream contributions of developer review environments

Developer environment friction is an industry-wide challenge and GitLab nearly eliminates the core problems of this feedback cycle. However, GitLab has also gone way beyond simply addressing this problem by creating a lot of additional value through seamless per-feature branch developer environments.

Here is a visualization of where dynamic review environments plug into the overall GitLab developer workflow.

Figure 1: Review environments with AWS Cloud Services

Figure 1 shows GitLab’s full development cycle support, with a little art of the possible thrown in around interfacing with AWS deployment services. The green dashed arrow indicates that GitLab deploys a review environment when the branch is first created. Since the green arrow is part of the developer's iteration loop, it also depicts that review app updates are done on each code push.

The light purple box shows that iterative development and CI checks all happen within the context of a merge request (MR), which provides a Single Pane of Glass (SPOG) for all quality checks, vulnerabilities and collaboration. Finally, when the merge is done, the review environment is cleaned up. The feature branch merge request is the furthest left that visibility and remediation can be shifted. GitLab’s shifting of this into the developer feature branch is what gives developers a semi-private opportunity to fix any quality or security findings in the specific code they have added or updated.

One other thing to note here is that when GitLab CD code is engineered to handle review environments, it is reused for all other preproduction and production environments. The set of AWS icons after the “Release” icon would be using the same deployment code. However, if the GitLab CD code is engineered only around deploying to a set of static environments, it is not automatically capable of review environments. Review environment support is a superset of static environment support.
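To make that reuse concrete, here is a hedged sketch of how one deploy template might be shared between review and static environments - the `.deploy` template and `deploy.sh` script are hypothetical placeholders, not GitLab's actual CD code:

```yaml
# Sketch: one shared deploy template reused for both review and production.
# The .deploy base job and deploy.sh are illustrative assumptions.
.deploy:
  stage: deploy
  script:
    - ./deploy.sh "$CI_ENVIRONMENT_SLUG"   # parameterized only by environment name

deploy_review:
  extends: .deploy
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    on_stop: stop_review
  rules:
    - if: '$CI_COMMIT_BRANCH && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'

deploy_production:
  extends: .deploy
  environment:
    name: production
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
```

Because the deploy logic takes the environment name as its only variable input, supporting a dynamic review environment automatically yields support for static environments as well - but not the other way around.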

Review environments enable a profound shift left of visibility and remediation

At GitLab “shift left” is not just about “problem visibility” but also about “full developer enablement to resolve problems” while in-context. GitLab merge requests provide critical elements that encourage developers to get into a habit of defect remediation:

  • Context - Defect and vulnerability reporting is only for code the developer changed in their branch and is tracked by the merge request (MR) for that branch.
  • Responsibility - Since MRs and branches are associated to an individual, it is evident to the developer (and the whole team) what defects were introduced or discovered by which developers.
  • Timing - Developers become aware of defects nearly as soon as they are introduced, not weeks or months after having integrated with other code. If they were working on a physical product, we can envision that all the parts are still on the assembly bench.
  • Visibility - Appropriately Local, Then Appropriately Global - Visibility of defects is context specific. While a developer has an open MR that is still a work in progress, they can be left alone to remedy accidentally-introduced defects with little concern from others because the visibility is local to the MR. However, once they seek approvals to merge their code, then the approval process for the MR will cause the visibility of any unresolved defects and vulnerabilities to come to the attention of everyone involved in the approval process. This ensures that oversight happens with just the right timing - not too early and not forgotten. This makes a large-scale contribution to human efficiency in the development value stream.
  • Advisement - As much as possible GitLab integrates tools and advice right into the feature branch MR context where the defects are visible. Developers are given full vulnerability details and can take just-in-time training on specific vulnerabilities.
  • Automated Remediation - Developers can choose to apply auto-remediations when they are available.
  • Collaboration - They can use MR comments and new issues to collaborate with team mates throughout the organization on resolving defects of all types.

Having seamless, effortless review environments at a per-feature branch granularity is a critical ingredient in GitLab’s ability to maximize the shift left of the above developer capabilities. This is most critical for the developer checks that require a running copy of the application, which is provided by the review environments. These checks include DAST, IAST, API fuzzing and accessibility testing. The industry is also continuing to multiply the types of defect scanners that require an actively running copy of the application.

Extending GitLab review environments to other cloud application framework PaaS

So you may be thinking, “I love GitLab review environments, but not all of our applications are targeting Kubernetes.” It is true that the out-of-the-box showcasing of GitLab review environments depends on Kubernetes. One of the key reasons for this is that Kubernetes provides an integrated declarative deployment capability known as deployment manifests. Its environment isolation capability, known as namespaces, provides another critical building block. GitLab wires these Kubernetes capabilities up to a few key pieces of GitLab CD to accomplish the magic of isolated, per-feature branch review environments.
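As a rough illustration of what gets wired together for Kubernetes, a per-branch namespace plus declarative manifests might look like the following (the kubectl commands and the `k8s/` manifest path are illustrative assumptions, not GitLab's internal template):

```yaml
# Sketch: per-branch isolation using Kubernetes namespaces.
deploy_review:
  stage: deploy
  script:
    # Each branch gets its own namespace - this is the isolation boundary
    - kubectl create namespace "review-${CI_COMMIT_REF_SLUG}" --dry-run=client -o yaml | kubectl apply -f -
    # Declarative manifests make the deployment repeatable and immutable
    - kubectl apply -n "review-${CI_COMMIT_REF_SLUG}" -f k8s/
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    on_stop: stop_review

stop_review:
  stage: deploy
  script:
    # Teardown is simply deleting the namespace and everything in it
    - kubectl delete namespace "review-${CI_COMMIT_REF_SLUG}"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```

The key insight is that any platform offering an equivalent pair of capabilities - declarative deployment plus a nameable isolation boundary - can be wired up the same way.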

As far as I know there is no formal or de facto industry term for what I’ll call “cloud application framework PaaS.” Cloud-provided PaaS can be targeted at various “levels” of the problem of building applications. For instance, primitive components such as AWS ELB address the problem of application load balancing by providing a variety of virtual, cloud-scaling and secured appliances that you can use as a component of building an application. Another example is AWS Cognito, which helps provide user login and profile services to an application build.

However, there are also cloud PaaS offerings that seek to solve the entire problem of rapid application building and maintenance. These are services like AWS Amplify and AWS AppRunner. These services frequently knit together primitive PaaS components (such as described above) into a composite that attempts to accelerate the entire process of building applications. Frequently these PaaS also include special CLIs or other developer tools that attempt to abstract the creation, maintenance and deployment of an Infrastructure as Code layer. They also tend to be GitOps-oriented by storing this IaC in the same repository as the application code, which enables full control over deployments via Git controls such as branches and merge requests.

This approach relieves developers of early stage applications from having to learn IaC or hire IaC operations professionals too early. Basically it allows avoidance of overly early optimization of onboarding IaC skills. If the application is indeed successful it is quite common to outgrow the integrated IaC support provided by these specialized PaaS, however, the evolution is very natural because the managed IaC can simply start to be developed by specialists.

The distinction of cloud application framework PaaS is important for understanding where GitLab can create compound value with dynamic review environments. I will use this term to refer to PaaS that tries to solve the entire “building applications” problem.

So we have a set of GitLab interfaces and conventions for implementing seamless developer review environments, and we have non-Kubernetes cloud application infrastructures that provide declarative deployment interfaces - and we can indeed make them work together! Interestingly, it is all done in GitLab CI YAML, which means that once you see the art of the possible, you can start implementing dynamic review environment lifecycle management for many custom environment types with existing GitLab features.

A working, non-Kubernetes example of dynamic review environments in action

Figure 2: Working CD example of review environments for AWS CloudFormation

Figure 2 shows the details of an actual non-Kubernetes working example called CloudFormation AutoDeploy With Dynamic Review Environments. This project enables any AWS CloudFormation template to be deployed. It specifically supports an isolated stack deployment whenever a review branch is created and then also destroys that environment when the branch is merged.

Here are some of the key design constraints and best practices that allow it to support automated review environments:

  • The code is implemented as an include. Notice that the main .gitlab-ci.yml file has only the variables applicable to this project, plus the inclusion of Deploy-AWSCloudFormation.gitlab-ci.yml. This allows you to treat the CloudFormation integration as a managed, shared include to be improved and updated. If the stress of maintaining backward compatibility in a shared dependency is too much, you can encourage developers to make a copy of this file to essentially version-peg it with their project.

  • Avoids Conflict with Auto DevOps CI Stage Names - The standard stages of Auto DevOps are here. This constraint allows the auto deploy template to be leveraged.

  • Creates and Sequences Custom Stages as Necessary - For instance, you can see we’ve added a create-changeset stage and its jobs.

  • The deploy-review job and its environment: section must have a very specific construction; let’s look at the important details:

      rules:
        - if: '$CI_COMMIT_BRANCH == "main"'
          when: never
        - if: '$REVIEW_DISABLED'
          when: never
        - if: '($CI_COMMIT_TAG || $CI_COMMIT_BRANCH) && $REQUIRE_CHANGESET_APPROVALS == "true"'
          when: manual
        - if: '($CI_COMMIT_TAG || $CI_COMMIT_BRANCH) && $REQUIRE_CHANGESET_APPROVALS != "true"'
      artifacts:
        reports:
          dotenv: envurl.env
      environment:
        name: review/$CI_COMMIT_REF_SLUG
        url: $DYNAMIC_ENVIRONMENT_URL
        on_stop: stop_review
    
    • rules: ensure this job only runs when we are not on the main branch. The main branch implements the long-lived staging and production environments.
    • artifacts:reports:dotenv allows variables populated during a CI job to become pipeline-level variables. Its most critical role in this job is to allow the URL retrieved from CloudFormation Outputs to be populated into the variable DYNAMIC_ENVIRONMENT_URL. The file envurl.env would have at least the line DYNAMIC_ENVIRONMENT_URL={url-from-cloudformation} in it. You can see this in the job code as echo "DYNAMIC_ENVIRONMENT_URL=${STACK_ENV_URL}" >> envurl.env
    • environment:name: uses the Auto Deploy convention of placing review apps under the top-level environment group called review. The reference $CI_COMMIT_REF_SLUG ensures that the branch (or tag) name is used, but with all illegal characters removed. By your development convention, the environment name should become part of the IaC constructs that ensure both uniqueness and identifiability by this pipeline. In GitLab's standard auto deploy for Kubernetes this is done by constructing a namespace that contains the name in this provided parameter. In CloudFormation we make it part of the stack name. The value here is exposed in the job as the variable ${ENVIRONMENT}.
    • environment:url: - what is not self-evident here is that the variable DYNAMIC_ENVIRONMENT_URL was populated by the deployment job and written to the file envurl.env so that it contains the right value at this point. This causes the GitLab “Environment” page to have a clickable link to visit the environment. It is also used by DAST and other live application scan engines to find and scan the isolated environment.
    • environment:on_stop: in the deploy-review job maps to the job named stop_review. This is the magic sauce behind automatic environment deletion when a feature branch is merged. stop_review must be written with the correct commands to accomplish the teardown.
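Putting the pieces above together, a condensed sketch of the deploy-review / stop_review pair might look like this. The stack naming convention, template file, and the "EnvironmentURL" output key are illustrative assumptions rather than the exact project code:

```yaml
# Condensed, illustrative sketch of the CloudFormation review lifecycle.
deploy-review:
  stage: review
  script:
    # Embed the environment name in the stack name for uniqueness and identifiability
    - STACK_NAME="${CI_PROJECT_NAME}-${CI_ENVIRONMENT_SLUG}"
    - aws cloudformation deploy --stack-name "$STACK_NAME" --template-file template.yml
    # Retrieve the environment URL from the stack's Outputs (output key is an assumption)
    - STACK_ENV_URL=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --query "Stacks[0].Outputs[?OutputKey=='EnvironmentURL'].OutputValue" --output text)
    # Promote the URL to a pipeline-level variable via the dotenv report
    - echo "DYNAMIC_ENVIRONMENT_URL=${STACK_ENV_URL}" >> envurl.env
  artifacts:
    reports:
      dotenv: envurl.env
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    url: $DYNAMIC_ENVIRONMENT_URL
    on_stop: stop_review

stop_review:
  stage: review
  script:
    # Teardown: deleting the stack removes every resource it created
    - aws cloudformation delete-stack --stack-name "${CI_PROJECT_NAME}-${CI_ENVIRONMENT_SLUG}"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  rules:
    - if: '$CI_COMMIT_BRANCH && $CI_COMMIT_BRANCH != $CI_DEFAULT_BRANCH'
      when: manual
```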

A reusable engineering pattern

This CloudFormation example serves as a higher-level pattern for how GitLab review environments can be adapted to any other cloud “Application Level PaaS.” This is a term I use to indicate a cloud PaaS that is abstracted highly enough that developers think of it as “a place to deploy applications.” Perhaps the best way to define it is by contrast with PaaS that does not claim to serve as an entire application platform: cloud-based load balancers are a good example of a PaaS that performs a utility function for applications but is not a place to build an entire cloud application.

Application PaaS for abstracting IaC concerns for developers

GitLab auto deploy combines well with cloud application framework PaaS offerings that have a disposition toward developer productivity by reducing or eliminating the IaC management required of developers. AWS Amplify has such productivity support in the form of a developer-specific CLI which allows IaC to be authored and updated in the same Git repository where the application code is stored. Adding an entire scaling database PaaS is as simple as running a single CLI command.

Generally such Application PaaS not only generate and help maintain IaC through highly abstracted CLI or UI actions, they also provide a single deploy command, which is easily combined with a GitLab Auto Deploy template for that particular Application PaaS.
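As a hypothetical illustration of this pairing, a review pipeline for such a PaaS could wrap its single deploy command like so (AWS Amplify commands shown for flavor; the exact environment-management workflow should be verified against the Amplify documentation):

```yaml
# Hypothetical sketch: the same review lifecycle pattern applied to an
# Application PaaS that exposes a single deploy/teardown command.
deploy_review:
  stage: deploy
  script:
    - amplify push --yes     # the PaaS's single deploy command
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    on_stop: stop_review

stop_review:
  stage: deploy
  script:
    - amplify delete --yes   # tear down the PaaS-managed resources
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```

The GitLab side (environment naming, on_stop wiring, rules) stays identical to the CloudFormation example; only the two script lines change per PaaS.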

Wrap up

Hopefully this article has helped you understand that:

  • GitLab already contains a super valuable feature that automates developer environment lifecycle management.
  • It is critical in addressing a key friction in the DevOps value chain.
  • It can be extended beyond Kubernetes to other cloud application framework PaaS offerings.

Photo by Sandeep Singh on Unsplash
