Aug 3, 2020 - Jaime Martínez    

How GitLab Pages uses the GitLab API to serve content

GitLab Pages is changing the way it reads a project's configuration to speed up boot times and gradually remove its dependency on NFS.

This blog post is Unfiltered

GitLab Pages allows you to create and host GitLab project websites from a user account or group for free on GitLab.com or on your self-managed GitLab instance.

In this post, I will explain how the GitLab Pages daemon obtains a domain's configuration using the GitLab API, specifically on GitLab.com.

How does GitLab Pages know where to find your website files?

GitLab Pages is moving to object storage for the contents of your website. You can follow the development of this new feature here.

At the time of writing, GitLab Pages uses an NFS shared mount to store the contents of your website. You can set the path to this mount with pages_path in your /etc/gitlab/gitlab.rb file.

When you deploy a website using the pages: keyword in your .gitlab-ci.yml file, you must define an artifact with the path public that contains the files for your website. This public artifact eventually makes its way into the NFS shared mount.

When you deploy a website to GitLab Pages, a domain is created based on the Pages domain you have configured. For GitLab.com, the Pages domain is *.gitlab.io. If you create a project named myproject.gitlab.io and enable HTTPS, a wildcard SSL certificate will be used. You can also set up a custom domain for your project, for example myawesomedomain.com.

For every project (a.k.a. domain) that is served by the Pages daemon, there must be a directory in the NFS shared mount that matches your domain name and holds its contents. For example, if you had a project named myproject.gitlab.io, the Pages daemon would look for your .html files under the /path/to/shared/pages/myproject/myproject.gitlab.io/public directory. This is how GitLab Pages serves the content published by the pages: keyword in your CI configuration.
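As a rough illustration of the layout described above, the sketch below joins the shared-mount root, a namespace, and a project name into the public directory. The helper name public_dir is hypothetical, and treating the first path segment as the namespace (user or group) is an assumption based on the example path, not the daemon's actual code.

```python
from pathlib import PurePosixPath


def public_dir(pages_root: str, namespace: str, project: str) -> PurePosixPath:
    """Return the directory the daemon would read site files from.

    Hypothetical helper: it mirrors the <root>/<namespace>/<project>/public
    layout shown in the post, not the daemon's real implementation.
    """
    return PurePosixPath(pages_root) / namespace / project / "public"


print(public_dir("/path/to/shared/pages", "myproject", "myproject.gitlab.io"))
# prints /path/to/shared/pages/myproject/myproject.gitlab.io/public
```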

Before GitLab 12.10 was released on GitLab.com, the Pages daemon relied on a file named config.json located in your project's directory in the NFS shared mount, that is, /path/to/shared/pages/myproject/myproject.gitlab.io/config.json. This file contains metadata related to your project and any custom domain names you may have set up.

{
  "domains": [
    {
      "Domain": "myproject.gitlab.io"
    },
    {
      "Domain": "mycustomdomain.com",
      "Certificate": "--certificate contents--",
      "Key": "--key contents--"
    }
  ],
  "id": 123,
  "access_control": true,
  "https_only": true
}
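To make the structure of this file concrete, here is a minimal sketch that parses the config.json shown above and builds a per-domain lookup table. The function name and the shape of the resulting dictionary are illustrative assumptions; the daemon itself may organize this data differently.

```python
import json
from typing import Dict

# Sample taken from the config.json shown above.
CONFIG_JSON = """
{
  "domains": [
    {"Domain": "myproject.gitlab.io"},
    {"Domain": "mycustomdomain.com",
     "Certificate": "--certificate contents--",
     "Key": "--key contents--"}
  ],
  "id": 123,
  "access_control": true,
  "https_only": true
}
"""


def domains_from_config(raw: str) -> Dict[str, dict]:
    """Map each domain name to the settings needed to serve it."""
    config = json.loads(raw)
    result = {}
    for entry in config.get("domains", []):
        result[entry["Domain"]] = {
            "project_id": config["id"],
            "access_control": config.get("access_control", False),
            "https_only": config.get("https_only", False),
            # In the sample, only the custom domain carries a certificate and key.
            "has_certificate": "Certificate" in entry and "Key" in entry,
        }
    return result


for name, settings in domains_from_config(CONFIG_JSON).items():
    print(name, settings)
```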

GitLab Pages has been a very popular addition to GitLab, and over time the number of websites hosted on GitLab.com has grown considerably. On start-up, the Pages daemon would traverse all directories in the NFS shared mount and load the configuration of all the deployed Pages projects into memory. At one point, the Pages daemon took over 20 minutes to load per instance on GitLab.com!
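The start-up traversal can be pictured roughly as follows. This is a simplified sketch, not the daemon's code: load_all_configs is a hypothetical name, and the temporary directory stands in for the NFS shared mount. The point is that every project directory has to be visited before the first request can be served, so the work grows with the number of deployed sites.

```python
import json
import os
import tempfile


def load_all_configs(pages_root: str) -> dict:
    """Walk the whole tree and load every config.json into memory."""
    domains = {}
    for dirpath, _dirnames, filenames in os.walk(pages_root):
        if "config.json" not in filenames:
            continue
        with open(os.path.join(dirpath, "config.json"), encoding="utf-8") as fh:
            config = json.load(fh)
        # Remember which project directory serves each domain.
        for entry in config.get("domains", []):
            domains[entry["Domain"]] = dirpath
    return domains


# Build a tiny throwaway tree to exercise the function.
with tempfile.TemporaryDirectory() as root:
    project_dir = os.path.join(root, "myproject", "myproject.gitlab.io")
    os.makedirs(os.path.join(project_dir, "public"))
    with open(os.path.join(project_dir, "config.json"), "w", encoding="utf-8") as fh:
        json.dump({"domains": [{"Domain": "myproject.gitlab.io"}], "id": 123}, fh)
    loaded = load_all_configs(root)
    print(sorted(loaded))
```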

GitLab API-based configuration

Introduced in GitLab 12.10

On GitLab.com, the Pages daemon now sources the domain configuration via an internal API endpoint /api/v4/internal/pages?domain=myproject.gitlab.io. This is done on demand per domain and the configuration is cached in memory for a certain period of time to speed up serving content from that Pages node.
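The on-demand lookup with a time-limited in-memory cache could be sketched like this. The class name, the 30-second lifetime, and the fake_fetch stand-in for the API request are all assumptions made for illustration; the post does not state the real cache duration or how the daemon structures its cache.

```python
import time
from typing import Callable, Dict, Tuple


class DomainConfigCache:
    """Fetch a domain's configuration on first use and reuse it until it expires."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # domain -> (expiry time, configuration)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, domain: str) -> dict:
        now = self._clock()
        cached = self._entries.get(domain)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh, no API call needed
        config = self._fetch(domain)  # stand-in for the internal API request
        self._entries[domain] = (now + self._ttl, config)
        return config


calls = []


def fake_fetch(domain: str) -> dict:
    # Hypothetical stand-in for GET /api/v4/internal/pages?domain=<domain>.
    calls.append(domain)
    return {"lookup_paths": [{"prefix": "/", "project_id": 123}]}


cache = DomainConfigCache(fake_fetch, ttl_seconds=30.0)
cache.get("myproject.gitlab.io")
cache.get("myproject.gitlab.io")
print(len(calls))  # prints 1: the second lookup is served from memory
```

Compared with loading everything at start-up, this approach only pays for the domains that actually receive traffic on a given node.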

The response from the API is very similar to the contents of the config.json file:

{
    "certificate": "--cert-contents--",
    "key": "--key-contents--",
    "lookup_paths": [
        {
            "access_control": true,
            "https_only": true,
            "prefix": "/",
            "project_id": 123,
            "source": {
                "path": "myproject/myproject.gitlab.io/public/",
                "type": "file"
            }
        }
    ]
}
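One way to read this response is as a list of URL prefixes, each pointing at a source location. The sketch below picks a lookup path for a request and joins the remainder of the URL onto the source path. Choosing the longest matching prefix is an assumption made here for illustration, as is the function name resolve; the post does not describe how the daemon selects among several lookup paths, and the sketch does no path sanitizing.

```python
import json
import posixpath
from typing import Optional

# Sample taken from the API response shown above.
API_RESPONSE = """
{
    "certificate": "--cert-contents--",
    "key": "--key-contents--",
    "lookup_paths": [
        {
            "access_control": true,
            "https_only": true,
            "prefix": "/",
            "project_id": 123,
            "source": {
                "path": "myproject/myproject.gitlab.io/public/",
                "type": "file"
            }
        }
    ]
}
"""


def resolve(response: dict, request_path: str) -> Optional[str]:
    """Return the relative file path to serve, or None if no prefix matches."""
    matches = [lp for lp in response["lookup_paths"]
               if request_path.startswith(lp["prefix"])]
    if not matches:
        return None
    # Assumption: the most specific (longest) prefix wins.
    best = max(matches, key=lambda lp: len(lp["prefix"]))
    remainder = request_path[len(best["prefix"]):]
    return posixpath.join(best["source"]["path"], remainder)


print(resolve(json.loads(API_RESPONSE), "/index.html"))
# prints myproject/myproject.gitlab.io/public/index.html
```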

You can see that the source type is file. This means that the Pages daemon will still serve the contents from the NFS shared mount. We are actively working on removing the NFS dependency from GitLab Pages by updating the GitLab Pages architecture.

We are planning to transition GitLab Pages to object storage instead of NFS. This will enable GitLab Pages to run on Kubernetes in the future.

Self-managed GitLab instances

The changes to the GitLab Pages architecture were piloted on GitLab.com, which is possibly the largest GitLab Pages implementation. Once all the changes supporting the move to an API-based configuration are completed, they will be rolled out to self-managed customers. You can find more details and the issues we faced while rolling out API-based configuration in this issue.

If you can't wait to speed up the startup of your Pages nodes, this issue description contains a possible guide that explains how we enabled the API on GitLab.com. However, this method will be removed in the near future.

Domain source configuration and API status

In the meantime, we are working towards adding a new configuration flag for GitLab Pages which will allow you to choose the domain configuration source by specifying domain_config_source in your /etc/gitlab/gitlab.rb file. By default, GitLab Pages will use the disk source configuration, the same way it does today.

In the background, the Pages daemon will start checking the API status by calling the /api/v4/internal/pages/status endpoint. This will help you check if the Pages daemon is ready to talk to the GitLab API, especially when you are running Pages on a separate server.
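A background readiness check of this kind is often written as a small retry loop. The sketch below is generic and makes several assumptions: wait_until_ready is a hypothetical name, the probe callable stands in for a request to the status endpoint, and the attempt count and delay are arbitrary. The post does not describe the daemon's actual retry policy or the format of the status response.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], attempts: int = 5,
                     delay_seconds: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        # Wait between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Hypothetical probe: fails twice, then reports the API as reachable.
results = iter([False, False, True])
print(wait_until_ready(lambda: next(results), delay_seconds=0.0))  # prints True
```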

Please check the GitLab Pages administration guide for further troubleshooting.

Cover image by @RetroSupply on Unsplash
