We categorize performance into three facets:
Backend performance is scoped to the response times of the API, controllers, and command-line interfaces (e.g. git).
DRI: Christopher Lefelhocz, VP of Development.
Frontend performance is scoped to response time of the visible pages and UI components of GitLab.
DRI: Christopher Lefelhocz, VP of Development.
Infrastructure performance is scoped to the performance of GitLab SaaS Infrastructure.
DRI: Steve Loyd, VP of Infrastructure.
A meta issue tracking the various issues listed here is on the infrastructure tracker.
Performance of GitLab and GitLab.com is ultimately about the user experience. As also described in the product management handbook, "faster applications are better applications".
Our current focus is on two indicators:
In the mid term we aim to cover all of the Web Vitals, adding a stronger focus on First Input Delay (FID) and Cumulative Layout Shift (CLS). If routes already perform well on our main indicators, please extend optimizations to these metrics as well.
There are many other performance metrics that can be useful in analyzing and prioritizing work, some of which are discussed in the sections below. But the user-experienced LCP is the target for the site as a whole, and everything should tie back to it in the end.
Groups should closely monitor the user experience with regard to performance, and also improve perceived performance beyond the measured indicators, for example when an action taken after page load is noticeably slow.
We measure every end-user performance metric we can through sitespeed.io, with automatic runs every 4 hours. Any data we collect can help us analyze improvements or bottlenecks on specific routes. We send the data to a Graphite instance for continuous storage, which backs all Grafana dashboards. On top of that, we also save full reports (links are visible after activating the Runs toggle on a sitespeed dashboard) containing more detailed data, slow-motion recordings, HAR files, and full Lighthouse reports.
We currently measure every 4 hours with an empty cache, the connection throttled to the Cable profile, and a machine with a medium CPU located in us-central.
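Under these conditions, a scheduled run could be scripted roughly as below. This is only a sketch: the flag names should be verified against the sitespeed.io CLI documentation, and the Graphite host and iteration count are placeholders.

```ruby
# Sketch of a scheduled sitespeed.io run matching the conditions above:
# cold cache, "cable" connectivity profile, and results shipped to Graphite.
# Flag names and the Graphite host are illustrative; verify against the
# sitespeed.io docs before using them.
def sitespeed_command(urls, graphite_host: "graphite.example.com")
  [
    "sitespeed.io",
    "--browsertime.connectivity.profile", "cable", # throttle to Cable speed
    "--graphite.host", graphite_host,              # continuous data storage
    "-n", "3",                                     # iterations per URL
    *urls
  ].join(" ")
end

cmd = sitespeed_command(["https://gitlab.com/explore"])
# system(cmd) would be invoked by the scheduler every 4 hours
```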
The URLs from GitLab.com listed in the table below form the basis for measuring performance improvements - these are heavy use cases. The times indicate time passed from web request to "the average time at which visible parts of the page are displayed" (per the definition of Speed Index). Since the "user" of these URLs is a controlled entity in this case, it represents an external measure of our previous performance metric "Speed Index".
|Issue List: GitLab FOSS Issue List|2872|1197|-|N/A|
|Issue List: GitLab Issue List|1581|
|Issue: GitLab FOSS #4058|2414|1332|1954|
|Issue Boards: GitLab FOSS repo boards|3295|1773|-|N/A|
|Issue Boards: GitLab repo boards|2619|
|Merge request: GitLab FOSS !9546|27644|2450|1937|
|Pipelines: GitLab FOSS pipelines|1965|4098|-|N/A|
|Pipelines: GitLab pipelines|4289|
|Pipeline: GitLab FOSS pipeline 9360254|4131|2672|2546|
|Project: GitLab FOSS project|3909|1863|-|N/A|
|Project: GitLab project|1533|
|Repository: GitLab FOSS Repository|3149|1571|-|N/A|
|Repository: GitLab Repository|1867|
|Single File: GitLab FOSS Single File Repository|2000|1292|-|N/A|
|Single File: GitLab Single File Repository|2012|
|Explore: GitLab explore|2346|1354|1336|
|Snippet: GitLab Snippet 1662597|1681|1082|1378|
*To access the sitespeed Grafana dashboards you need to be logged into your Google account.
Note: Since this table spans time before and after single-codebase we kept GitLab FOSS pages close to GitLab ones to enable comparisons despite not being exactly the same project.
If you activate the runs toggle, you will see annotations with links to all full reports. Currently we run measurements every 2 hours.
All items that start with the tachometer symbol represent a step in the flow that we measure. Wherever possible, the tachometer icon links to the relevant dashboard in our monitoring. Each step in the listing below links back to its corresponding entry in the goals table.
Consider the scenario of a user opening their browser and navigating to their dashboard by typing `gitlab.com/dashboard`. Here is what happens:
HTTP queue time.
`RootController#index`. The round-trip time it takes for a request to enter and leave Unicorn is what we call `Transaction Timings`. Rails controller requests are sent to (and data is received from):
`gitlab.com/dashboard` example, the controller addresses all three.
Load) when this particular user hits `gitlab.com/dashboard/issues`. The number of SQL calls will depend on how many projects the person has, how much may already be in cache, etc.
view timings). In some controllers, data is gathered first, after which a view is constructed. In other controllers, data is gathered from within a view, so that the `view timing` in those cases includes the time it took to call NFS, PostgreSQL, Redis, etc. And in many cases, both are done.
`gitlab.com/dashboard/issues`, there are 56 nested / partial views rendered (search for
`First Byte - External` is measured for a hand-selected number of URLs using SiteSpeed.
`defer="true"`, so they are parsed and executed in the order they appear, but only after the HTML has been parsed.
`DOMContentLoaded` event. The new call is for a new URL, and such requests are routed through either the Web or API workers, invoke their respective Rails controllers on the backend, and return the requested files (HTML, JSON, etc.). For example, the calendar and activity feeds on a username page `gitlab.com/username` are two separate AJAX calls, triggered by
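The server-side portion of the flow above, the transaction timing between a request entering the app server and the response leaving it, can be sketched as a Rack-style middleware. This is an illustration of the idea only, not GitLab's actual instrumentation, and the sink and app objects are stand-ins:

```ruby
# Minimal sketch of measuring a "transaction timing": the round trip from a
# request entering the app server to the response leaving it. GitLab's real
# instrumentation is more elaborate; this just illustrates the principle.
class TransactionTiming
  def initialize(app, sink)
    @app  = app   # the next Rack app/middleware in the chain
    @sink = sink  # e.g. a proc shipping the timing to the metrics store
  end

  def call(env)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
    @sink.call(env["PATH_INFO"], elapsed_ms)
    [status, headers, body]
  end
end

timings = []
app = TransactionTiming.new(
  ->(env) { [200, {}, ["ok"]] },         # stand-in for the Rails controller
  ->(path, ms) { timings << [path, ms] } # stand-in for the metrics sink
)
status, = app.call("PATH_INFO" => "/dashboard")
```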
First read about the steps in a web request above, then pick up the thread here.
After pushing to a repository, e.g. from the web UI:
`git-receive-pack` process (on the workhorse machine) to save the new commit to NFS
`git-receive-pack` fires a git hook to trigger
`post-receive` hook, and the
`git-receive-pack` process passes along details of what was pushed to the repo to the
`post-receive` hook. More specifically, it passes a list of three items: old revision, new revision, and ref (e.g. tag or branch) name.
`post-receive` hook to Redis, which is the Sidekiq queue.
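The data handed to the `post-receive` hook can be illustrated with a small sketch: git writes one `old-rev new-rev ref` line per updated ref to the hook's standard input, and the hook's job here is to forward the push details to the Sidekiq queue in Redis. The payload shape below is illustrative, not GitLab's actual wire format:

```ruby
# Sketch of what a post-receive hook does with its input: git feeds it one
# "<old-rev> <new-rev> <ref>" line per updated ref on stdin, and the hook
# enqueues a job describing the push. The payload shape is illustrative.
def parse_post_receive(stdin_lines)
  stdin_lines.map do |line|
    oldrev, newrev, ref = line.split(" ", 3)
    { oldrev: oldrev, newrev: newrev, ref: ref.strip }
  end
end

def enqueue_push(changes, queue)
  # In reality this would be an LPUSH of a JSON job payload into Redis,
  # which is what Sidekiq polls for work; an array stands in here.
  changes.each { |change| queue << change }
end

queue = []
changes = parse_post_receive([
  "0000000000000000000000000000000000000000 d670460b4b4aece5915caf5c68d12f560a9fe3e4 refs/heads/main\n"
])
enqueue_push(changes, queue)
```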
Consider the scenario of a user opening their browser and navigating to their favorite URL on GitLab.com. The steps are described in the section on "web request" above. In this table, the steps are measured and goals for improvement are set.
Guide to this table:
# per request: average number of times this step occurs per request. For instance, an average "transaction" may require 0.2 SQL calls, 0.4 git calls, 1 call to cache, and 30 nested views to be built.
p99 Q2-17: the p99 timing (in milliseconds) at the end of Q2, 2017
p99 Now: link to the dashboard that displays the current p99 timing
p99 Q3-17: the target for the p99 timing by the end of Q3, 2017
| Step | # per request | p99 Q2-17 | p99 Now | p99 Q3-17 goal | Issue links and impact |
|---|---|---|---|---|---|
| Lookup IP in DNS | 1 | ~10 | ? | ~10 | Use a second DNS provider |
| Browser to Azure LB | 1 | ~10 | ? | ~10 | |
| BACKEND PROCESSES | | | | | Extend monitoring horizon |
| Azure LB to HAProxy | 1 | ~2 | ? | ~2 | |
| HAProxy SSL with Browser | 1 | ~10 | ? | ~10 | Speed up SSL |
| HAProxy to NGINX | 1 | ~2 | ? | ~2 | |
| NGINX buffers request | 1 | ~10 | ? | ~10 | |
| NGINX to Workhorse | 1 | ~2 | ? | ~2 | |
| Workhorse distributes request | 1 | | | | Adding monitoring to workhorse |
| Workhorse to Unicorn | 1 | 18 | | 10 | Adding Unicorns |
| Workhorse to Gitaly | ? | | | | |
| Workhorse to NFS | ? | | | | |
| Workhorse to Redis | ? | | | | |
| Unicorn calls services | 1 | 2500 | | 1000 | Allow more GitLab internals monitoring |
| Unicorn Postgres | | 250 | | 100 | Speed up slow queries |
| Unicorn NFS | | 460 | | 200 | Move to Gitaly - sample result |
| Unicorn constructs Views | | 1500 | | | |
| Unicorn makes HTML | | | | | |
| HTML to Browser | | | | | |
| Unicorn to Workhorse | 1 | ~2 | ? | ~2 | |
| Workhorse to NGINX | 1 | ~2 | ? | ~2 | |
| NGINX to HAProxy | 1 | ~2 | ? | ~2 | Compress HTML in NGINX |
| HAProxy to Azure LB | 1 | ~2 | ? | ~2 | |
| Azure LB to Browser | 1 | ~20 | ? | ~20 | |
| FIRST BYTE (see note 1) | | 1080 - 6347 | | 1000 | |
| SPEED INDEX (see note 2) | | 3230 - 14454 | | 2000 | Remove inline scripts, Defer script loading when possible, Lazy load images, Set up a CDN for faster asset loading, Use image resizing in CDN |
| Fully Loaded (see note) | | 6093 - 14003 | | not specified | Enable webpack code splitting |
Table to be built; merge requests welcome!
For any performance metric, the following modifiers can be applied:
- Internal: the time as measured from inside GitLab.com's infrastructure (the boundary is defined as being at the "network / Azure load balancer" interface).
Timing history for First Byte is listed in the table below (click the tachometer icons for current timings). All times are in milliseconds.
| Type | End of Q4-17 | Now |
|---|---|---|
| Issue: GitLab CE #4058 | 857 | |
| Merge request: GitLab CE !9546 | 18673 | |
| Pipeline: GitLab CE pipeline 9360254 | 1529 | |
| Repo: GitLab CE repo | 1076 | |
To go a little deeper and measure the performance of the application and infrastructure without consideration for frontend and network aspects, we look at "transaction timings" as recorded by Unicorn. These timings can be seen on the Rails Controller dashboard, per URL that is accessed.
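The p99 figures used throughout this page can be computed from raw timing samples as a simple percentile. A minimal sketch using the nearest-rank method (monitoring systems typically compute this over a sliding window; a plain array stands in here):

```ruby
# Minimal sketch of computing a p99 from raw transaction timings (in ms),
# using the nearest-rank method: sort the samples and take the value at
# the rank covering p percent of them.
def percentile(samples, p)
  raise ArgumentError, "no samples" if samples.empty?
  sorted = samples.sort
  rank = (p / 100.0 * sorted.length).ceil - 1
  sorted[[rank, 0].max]
end

timings_ms = (1..100).to_a  # pretend each request took 1..100 ms
p99 = percentile(timings_ms, 99)
# => 99
```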
This section has been moved to Availability severity.
To clarify the priority of issues that relate to GitLab.com's performance, you should add the `~performance` label as well as a "Severity" label. There are two factors that influence which severity label you should pick:
For strictly performance-related work you can use the Controller Timings Overview Grafana dashboard. This dashboard categorizes data into three categories, each with an associated severity label:
This means that if a controller (e.g. `UsersController#show`) is in the "Frequently Used" category you assign it the
For database related timings you can also use the SQL Timings Overview. This is the dashboard primarily used by the Database Team to determine the AP label to use for database related performance work.
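The mapping from a controller's request rate to its usage category can be sketched as below. Note that the thresholds and the two lower category names are hypothetical, invented purely for illustration; the real cut-offs live in the Grafana dashboards mentioned above.

```ruby
# Hypothetical sketch of bucketing a controller by request rate to pick a
# usage category (and from there a severity label). The thresholds and the
# lower two category names are invented for illustration only.
FREQUENT_RPM = 100  # assumption: >= 100 requests/minute is "Frequently Used"
COMMON_RPM   = 10   # assumption: >= 10 requests/minute is "Commonly Used"

def usage_category(requests_per_minute)
  if requests_per_minute >= FREQUENT_RPM
    "Frequently Used"
  elsif requests_per_minute >= COMMON_RPM
    "Commonly Used"
  else
    "Rarely Used"
  end
end
```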
Some general notes about parameters that affect database performance, at a very crude level.