Published on: June 6, 2024

Developing GitLab Duo: Blending AI and Root Cause Analysis to fix CI/CD pipelines

Discover how we've infused Root Cause Analysis with AI to help remedy broken CI/CD pipelines, including example scenarios and take-away exercises.


Generative AI marks a monumental shift in the software development industry, making it easier to develop, secure, and operate software. Our new blog series, written by our product and engineering teams, gives you an inside look at how we create, test, and deploy the AI features you need integrated throughout the enterprise. Get to know new capabilities within GitLab Duo and how they will help DevSecOps teams deliver better results for customers.

Have you ever encountered a broken CI/CD pipeline and had to halt your DevSecOps workflow, or even delay software deployment, as you try to figure out the root cause? Traditionally, when something goes wrong in the process of creating software, developers have to troubleshoot, dig through log files, and often do a lot of trial and error development. GitLab Duo Root Cause Analysis, part of our suite of AI-powered features, removes the guesswork by determining the root cause for a failed CI/CD pipeline. In this article, you'll learn what Root Cause Analysis is and how to apply the AI-powered GitLab Duo feature to your DevSecOps workflow.

Discover the future of AI-driven software development with our GitLab 17 virtual launch event. Watch today!

What is Root Cause Analysis?

GitLab Duo Root Cause Analysis is an AI-powered feature that assists you in determining a root cause and suggesting a fix for a CI/CD job log failure by analyzing the logs.

While Root Cause Analysis is often seen in product incident management, its workflows and debugging practices can be found in any DevSecOps workflow. Ops teams, administrators, and platform engineers are challenged by infrastructure-as-code (IaC) deployment errors, Kubernetes and GitOps problems, and long stack traces while investigating pipeline failures.

GitLab Duo Root Cause Analysis keeps everyone in the same interface and uses AI-powered help to summarize, analyze, and propose fixes so that organizations can release secure software faster.

A pipeline can encounter failures for a variety of reasons, including syntax errors in the code, missing dependencies that the pipeline relies on, test failures during the build process, Kubernetes and IaC deployment timeouts, and numerous other potential issues. When such failures occur, it becomes the responsibility of everyone to meticulously review the logs generated by the pipeline. This job log review process involves scrutinizing the detailed output to identify the specific errors and pinpoint the root cause of the pipeline failure. For example, the following pipeline has multiple job failures that need to be investigated and fixed.

Image depicting multiple job failures

The duration required to fix these failures can vary significantly and is largely influenced by several factors such as:

  • the developer's familiarity with the project
  • their level of experience in dealing with similar issues
  • their overall skill level in troubleshooting and problem-solving within the context of the pipeline.

Manual analysis can be exceedingly challenging and time-consuming, given that log data consists of application logs and system messages with a wide variety of potential sources of failure. A typical pipeline fix can involve several iterations and plenty of context switching. The complexity and unstructured nature of the logs make this a perfect task to speed up with generative AI. Using AI can significantly reduce the time to identify and fix a pipeline error, and also lowers the bar of expertise needed to fix a pipeline such as the one above.

Watch GitLab Duo Root Cause Analysis in action:

How does Root Cause Analysis work?

Root Cause Analysis works by forwarding a portion of the CI/CD job log to the GitLab AI Gateway. GitLab ensures that the portion sent will fit inside the large language model (LLM) token limits alongside a prompt that has been pre-crafted to provide insights into why the job might have failed. The prompt also instructs the LLM to provide an example of how a user might fix a broken job.
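The mechanism can be sketched roughly as follows. This is an illustrative sketch only: the token budget, the four-characters-per-token heuristic, and the prompt wording are assumptions for the example, not GitLab's actual implementation.

```python
def tail_for_token_budget(log_text: str, max_tokens: int = 4000,
                          chars_per_token: int = 4) -> str:
    """Keep the tail of a job log (where the failure usually appears),
    sized to fit an approximate LLM token budget."""
    budget_chars = max_tokens * chars_per_token
    return log_text[-budget_chars:]


# Build a prompt from the truncated log (wording is illustrative).
job_log = "...\nModuleNotFoundError: No module named 'redis'\n"
prompt = (
    "The following GitLab CI/CD job log ends in a failure. "
    "Explain the likely root cause and suggest an example fix.\n\n"
    + tail_for_token_budget(job_log)
)
```

Truncating from the end is the key design choice here: job logs routinely exceed model context windows, and the decisive error message is almost always near the bottom.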

Here are two example scenarios where Root Cause Analysis can provide assistance.

1. Analyze a Python dependency error

A Python application can import package modules with functionality that is not provided in the standard library. The project Challenge - Root Cause Analysis - Python Config implements an application that parses configuration and initializes an SQLite database, which both work well without any dependencies. It uses best practices in CI/CD with a Python environment and caching. The latest feature implementation adds a Redis caching client, and now the CI/CD build is failing for some reason.

By using Root Cause Analysis, you can immediately learn that the ModuleNotFoundError text means that the module is not installed in the Python environment. GitLab Duo also suggests an example fix: installing the Redis module through the pip package manager.

Image depicting 'modulenotfounderror' and GL Duo suggested resolution

The failing pipeline can be viewed here.
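Before pushing a fix, you can also check locally which imports would fail in your environment. The following helper is hypothetical and not part of the challenge project; it only uses the standard library:

```python
import importlib.util


def missing_dependencies(required):
    """Return the required modules that cannot be imported in this environment."""
    return [name for name in required if importlib.util.find_spec(name) is None]


# sqlite3 and configparser ship with the standard library, so only the
# third-party redis client is reported when it has not been installed yet.
print(missing_dependencies(["sqlite3", "configparser", "redis"]))
```

Running this in a fresh virtual environment reproduces the same gap that the CI/CD job hit.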

The Root Cause Analysis prompt provides a summary of the problem, which points to a missing redis module. Let's fix it by installing the redis module. You can either call pip install redis in the CI/CD job script section, or use a more sophisticated approach with the requirements.txt file. The latter provides a single source of truth for dependencies installed in both the development environment and CI/CD pipelines.

  test:
    extends: [.python-req]
    stage: test
    script:
      # [🦊] hint: Root cause analysis.
      # Solution 1: Install redis using pip
      - pip install redis
      # Solution 2: Add redis to requirements.txt, use pip
      - pip install -r requirements.txt

      - python src/

After fixing the missing Python dependency, the CI/CD job fails again. Use Root Cause Analysis again to learn that no Redis service is running in the job. Switch to using GitLab Duo Chat and use the prompt How to start a Redis service in CI/CD to learn how to configure the services attribute in the CI/CD job.

Depicts the prompt for how to start a Redis service

Modify the test job in .gitlab-ci.yml to specify the redis service.

  test:
    extends: [.python-req]
    stage: test
    # [🦊] hint: Root cause analysis.
    # Solution 3: Run Redis as a CI/CD service
    services:
      - redis
    script:
      # Solution 1: Install redis using pip
      - pip install redis
      # Solution 2: Add redis to requirements.txt, use pip
      - pip install -r requirements.txt

      - python src/

Running the Redis server allows you to successfully execute the Python application and print its output into the CI/CD job log.

output of Python application

The solution is provided in the solution/ directory.

Tip: You can also ask GitLab Duo Chat to follow up on potential future problems:

  • How to lint Python code? Which tools are recommended for CI/CD?
  • How to pin a package version in a Python requirements file?
  • What are possible ways that this exception stack trace is triggered in the future?
  • Are there ways to prevent the application from failing?
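As an example of version pinning, a single requirements.txt entry is enough; the version number below is illustrative, not a recommendation:

```text
# requirements.txt — pin the third-party redis client to an exact version
redis==5.0.4
```

Pinned versions keep the development environment and the CI/CD pipeline installing the same dependency, which makes failures reproducible.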

The next example is more advanced and includes multiple failures.

2. Analyze missing Go runtime

CI/CD jobs can be executed in containers, spawned from the configured image attribute. If the container does not provide a programming language runtime, the executed script sections referencing the go binary fail. For example, the error message /bin/sh: eval: line 149: go: not found needs to be understood and fixed.

If the go command is not found in the container's runtime context, this can have multiple reasons:

  1. The job uses a minimal container image, for example alpine, and the Go language runtime was not installed.
  2. The job uses the wrong default container image, for example, specified at the top of the CI/CD configuration or through the default keyword.
  3. The job does not use a container image but the shell executor. The host operating system does not have the Go language runtime installed, or it is otherwise broken/not configured.
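A minimal sketch of a fix for the first two causes is to pin a container image that ships the Go runtime directly on the job. The job name, script, binary name, and image tag below are illustrative assumptions, not the challenge project's exact configuration:

```yaml
build:
  stage: build
  # Use an image that provides the Go toolchain instead of a minimal base image
  image: golang:1.22
  script:
    - go build -o release-fetcher .
```

Setting image on the job itself also overrides any wrong default image configured at the top of the CI/CD configuration.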

The project Challenge - Root Cause Analysis - Go GitLab Release Fetcher provides an exercise challenge to analyze and fix CI/CD problems with a GitLab release fetcher application, written in Go. The build and docker-build CI/CD jobs are failing. Fixing the problem requires work in two areas: understanding why the Go runtime is not installed, and learning about the Dockerfile syntax.

Screenshot showing Change Docker Label job failed

The solution/ directory provides two possible solutions after Root Cause Analysis.

Practice using Root Cause Analysis

Here are some scenarios for practicing with Root Cause Analysis:

  • When you are running into Kubernetes deployment errors or timeouts.

  • With OpenTofu or Terraform IaC pipelines failing to provision your cloud resources.

  • When the Ansible playbook fails with a cryptic permission error in CI/CD.

  • When the Java stack trace is 10 pages long.

  • With a shell script highlighting an execution error.

  • When a Perl script fails in a single line, which is the only line in the script.

  • When the CI/CD job times out and it is unclear which section would cause this.

  • When a network connection timeout is reached, and you think it cannot be DNS.

What is next for GitLab Duo Root Cause Analysis?

We want to help our users get their pipelines back to passing in fewer iterations. Root Cause Analysis will open and show its response in GitLab Duo Chat, our AI assistant. Users can build on the recommendation to generate a more precise fix by asking specific questions (e.g., programming language-specific fixes) or by asking for alternative fixes based on the root cause.

For example, here is the Root Cause Analysis for a failing job:

Root Cause Analysis response

Users can ask follow-up questions that build upon the AI-generated response.

  • I do not want to create my own Docker image. Please explain different ways to fix the problem.

  • I don't have access to the Docker image creation. It seems that the Go binary is missing. Are there alternative images you can suggest?

GitLab will also run quality benchmarks for the generated responses and ship usability improvements.

Please see our Root Cause Analysis GA epic for more details. We would also love your feedback on the feature. Please leave a comment on our Root Cause Analysis feedback issue.

Get started with Root Cause Analysis

Please see our documentation on how to enable the feature, which is available to GitLab Ultimate customers. GitLab Duo Root Cause Analysis will also soon be coming to GitLab self-managed and GitLab Dedicated.

Not a GitLab Ultimate customer? Start a 30-day free trial today.

Read more of our "Developing GitLab Duo" series

We want to hear from you

Enjoyed reading this blog post or have questions or feedback? Share your thoughts by creating a new topic in the GitLab community forum.
