The Quality Department focuses on measuring and improving the performance of GitLab, as well as creating and validating reference architectures that self-managed customers can rely on as performant configurations.
To ensure that self-managed customers have performant, reliable, and scalable on-premise configurations, the Quality Department has built and verified Reference Architectures. The goal is to provide customers with tested and verified examples that can be used to ensure good performance and to give insight into what changes are needed as organizations scale.
The Reference Architectures project is used to track all work related to GitLab Reference Architectures, and the #reference-architectures Slack channel is used for discussions related to the Reference Architectures.
| Users | Status | Link to more info |
|-------|--------|-------------------|
| 100k  | To Do (on demand) | Issue link |
We have created the GitLab Performance Tool (GPT), which measures the performance of various endpoints under load. The tool is used internally within GitLab, but it is also available for self-managed customers to set up and run in their own environments.
If you have a self-managed instance and would like to use the tool to test its performance, please refer to the documentation in the tool's README file.
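As a rough illustration of the kind of measurement GPT performs (a minimal sketch, not GPT itself, which is configured and run as described in its README; the URL, endpoint, and request counts below are placeholder assumptions):

```python
# Illustration only: a minimal load probe that measures endpoint latency
# under concurrent requests. The target URL and request volumes are placeholders.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

TARGET = "https://gitlab.example.com/api/v4/projects"  # placeholder endpoint
CONCURRENCY = 20
REQUESTS = 200

def timed_get(_):
    start = time.monotonic()
    response = requests.get(TARGET, timeout=30)
    return time.monotonic() - start, response.status_code

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    results = list(pool.map(timed_get, range(REQUESTS)))

durations = sorted(duration for duration, _ in results)
errors = sum(1 for _, code in results if code >= 400)
p90 = durations[int(len(durations) * 0.9)]
print(f"median: {statistics.median(durations):.3f}s  p90: {p90:.3f}s  errors: {errors}")
```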
More detailed information about the current list of tests run by GPT can be found on the Test Details wiki page.
The GitLab Performance Tool is run against the existing reference architectures using the latest Nightly release of GitLab. This allows us to catch and triage degradations early in the process so that we can try to implement fixes before a new release is created. If problems are found, issues are created for degraded endpoints and are then prioritized during the weekly Bug Refinement meeting.
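As a simplified sketch of how such a degradation check might look (the threshold, endpoint names, and result values below are assumptions for illustration, not GPT's actual output format):

```python
# Illustration only: flagging endpoints whose latest nightly results exceed
# a target threshold. The structure and numbers below are placeholder assumptions.
TARGET_TTFB_MS = 500  # assumed per-endpoint threshold

nightly_results = [
    {"endpoint": "GET /api/v4/projects/:id", "ttfb_p90_ms": 320},
    {"endpoint": "GET /api/v4/projects/:id/merge_requests", "ttfb_p90_ms": 710},
]

degraded = [r for r in nightly_results if r["ttfb_p90_ms"] > TARGET_TTFB_MS]
for result in degraded:
    # In the real process, a degraded endpoint gets an issue raised and
    # prioritized in the weekly Bug Refinement meeting.
    print(f"Degradation: {result['endpoint']} p90 TTFB "
          f"{result['ttfb_p90_ms']}ms > {TARGET_TTFB_MS}ms")
```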
High-level GPT pipeline overview:
Information on the testing results can be found in the Reference Architecture documentation.
Every month on the 23rd, a comparison pipeline is triggered that produces a performance comparison table across the last 5 GitLab versions. It builds a GitLab Docker container with the test data using the performance-images project, runs GPT against the last 5 GitLab versions simultaneously, and then generates a performance results summary.
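As a rough sketch of that kind of cross-version comparison (a minimal example with a simplified results structure and placeholder version numbers and latency values, not GPT's actual summary format):

```python
# Illustration only: comparing an endpoint's p90 latency across GitLab versions
# and reporting the relative change against the oldest version in the set.
# Versions and values are placeholders, not real measurements.
results_by_version = {
    "X.0": {"GET /api/v4/projects/:id": 310},
    "X.1": {"GET /api/v4/projects/:id": 305},
    "X.2": {"GET /api/v4/projects/:id": 355},
}

versions = list(results_by_version)
baseline = results_by_version[versions[0]]
for version in versions[1:]:
    for endpoint, p90 in results_by_version[version].items():
        change = (p90 - baseline[endpoint]) / baseline[endpoint] * 100
        print(f"{version} {endpoint}: {p90}ms ({change:+.1f}% vs {versions[0]})")
```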
To ensure consistent and reliable performance results we need to effectively control each part of the process, including the test environment setup and its test data, for the following reasons:
For the above reasons, we test against fully controlled environments and don't test against others such as Staging or Production.
The Quality Department aims to enhance GPT and its performance test coverage. One of the goals is to release GPT v3; you can track its progress in this epic. We plan to further increase test coverage, especially in more complex areas like CI/CD and Registry.
Additionally, we would like to define a process for conducting an endpoint coverage review on a regular cadence, whether that is after every release, once a quarter, or some other timing. Because GitLab is constantly expanding and evolving, we need to iterate on our coverage in tandem.
We've created an epic to track the initial expansion as well as the work of defining a recurring process for analyzing endpoints and verifying that our coverage is adequate.
Another area the Quality team would like to explore is shifting performance testing left.
We have created the GitLab Browser Performance Tool (GBPT) to specifically test the frontend performance of web pages in browsers. More detailed information about the current list of test pages can be found on the Test Details wiki page.
The testing process is similar to the GPT testing process. After the 10k environment is updated to the latest Nightly, GBPT is run against the environment, which is then shut down to save costs.
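As a rough illustration of browser-side page timing (a minimal sketch assuming Selenium and the browser's Navigation Timing API; this is not GBPT's actual implementation, and the URL is a placeholder):

```python
# Illustration only: measuring page load time in a real browser via the
# Navigation Timing API. GBPT's own approach and configuration live in its
# project; the page URL below is a placeholder.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://gitlab.example.com/explore")  # placeholder page
    timing = driver.execute_script(
        "const t = performance.timing;"
        "return {load: t.loadEventEnd - t.navigationStart,"
        "        ttfb: t.responseStart - t.navigationStart};"
    )
    print(f"TTFB: {timing['ttfb']}ms  full load: {timing['load']}ms")
finally:
    driver.quit()
```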
| Environment | GCP project | Schedule | Latest results and dashboards |
|-------------|-------------|----------|-------------------------------|
| 10k | 10k | Every weekday | 10k wiki |
We have developed a playbook of initial steps to investigate the problem when self-managed customers experience, or suspect they are experiencing, performance issues.
The first step is requesting logs. We use a tool called fast-stats in conjunction with the following log artifacts. The logs should be either rotated logs, or logs from a peak day captured after peak time.
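As a rough illustration of the kind of first-pass analysis fast-stats automates (a minimal sketch assuming a Rails production_json.log where each line is a JSON object with path and duration_s fields; the field names are assumptions and can vary between GitLab versions):

```python
# Illustration only: a first-pass scan of a GitLab Rails production_json.log
# to surface the slowest endpoints. Field names are assumptions and may differ
# between GitLab versions; fast-stats handles the real parsing and reporting.
import json
from collections import defaultdict

durations = defaultdict(list)

with open("production_json.log") as log:
    for line in log:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        path = entry.get("path")
        duration = entry.get("duration_s")
        if path is not None and duration is not None:
            durations[path].append(duration)

# Report the ten paths with the highest mean duration.
ranked = sorted(durations.items(),
                key=lambda item: sum(item[1]) / len(item[1]),
                reverse=True)
for path, values in ranked[:10]:
    print(f"{path}: {len(values)} requests, mean {sum(values) / len(values):.2f}s")
```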