We categorize performance into three facets:
Backend performance is scoped to the response times of the API, controllers, and command-line interfaces (e.g. git).
DRI: Christopher Lefelhocz, VP of Development.
Frontend performance is scoped to response time of the visible pages and UI components of GitLab.
DRI: Christopher Lefelhocz, VP of Development.
Infrastructure performance is scoped to the performance of GitLab SaaS Infrastructure.
DRI: Steve Loyd, VP of Infrastructure.
A meta issue tracking the various issues listed here is on the infrastructure tracker.
Performance of GitLab and GitLab.com is ultimately about the user experience. As also described in the product management handbook, "faster applications are better applications".
Our current focus is on two indicators:
In the mid term we aim to cover all of the Web Vitals, adding a stronger focus on First Input Delay (FID) and Cumulative Layout Shift (CLS). If routes already perform well on our main indicators, please extend optimizations to these metrics as well.
There are many other performance metrics that can be useful in analyzing and prioritizing work, some of which are discussed in the sections below. But the user-experienced LCP is the target for the site as a whole, and everything should tie back to it in the end.
Groups should closely monitor the user experience with regard to performance, and also improve perceived performance beyond the measured indicators, for example when an action taken after page load is noticeably slow.
We measure every end-user performance metric we can through sitespeed.io, with automatic runs every 4 hours. Any data we collect can help us analyze improvements or bottlenecks on specific routes. We send the data to a Graphite instance for continuous storage, which backs all Grafana dashboards. On top of that, we also save full reports (links are visible after activating the Runs toggle on a sitespeed dashboard) containing more detailed data, slow-motion recordings, HAR files, and full Lighthouse reports.
We currently measure every 4 hours with an empty cache, the connection throttled to the Cable profile, and a machine with a medium CPU located in us-central.
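Under these conditions, a scheduled run could be scripted roughly as below. This is only a sketch: the flag names should be verified against the sitespeed.io CLI documentation, and the Graphite host and iteration count are placeholders.

```ruby
# Sketch of a scheduled sitespeed.io run matching the conditions above:
# cold cache, "cable" connectivity profile, and results shipped to Graphite.
# Flag names and the Graphite host are illustrative; verify against the
# sitespeed.io docs before using them.
def sitespeed_command(urls, graphite_host: "graphite.example.com")
  [
    "sitespeed.io",
    "--browsertime.connectivity.profile", "cable", # throttle to Cable speed
    "--graphite.host", graphite_host,              # continuous data storage
    "-n", "3",                                     # iterations per URL
    *urls
  ].join(" ")
end

cmd = sitespeed_command(["https://gitlab.com/explore"])
# system(cmd) would be invoked by the scheduler every 4 hours
```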
The URLs from GitLab.com listed in the table below form the basis for measuring performance improvements - these are heavy use cases. The times indicate time passed from web request to "the average time at which visible parts of the page are displayed" (per the definition of Speed Index). Since the "user" of these URLs is a controlled entity in this case, it represents an external measure of our previous performance metric "Speed Index".
|Issue List: GitLab FOSS Issue List|2872|1197|-|N/A|
|Issue List: GitLab Issue List|1581|
|Issue: GitLab FOSS #4058|2414|1332|1954|
|Issue Boards: GitLab FOSS repo boards|3295|1773|-|N/A|
|Issue Boards: GitLab repo boards|2619|
|Merge request: GitLab FOSS !9546|27644|2450|1937|
|Pipelines: GitLab FOSS pipelines|1965|4098|-|N/A|
|Pipelines: GitLab pipelines|4289|
|Pipeline: GitLab FOSS pipeline 9360254|4131|2672|2546|
|Project: GitLab FOSS project|3909|1863|-|N/A|
|Project: GitLab project|1533|
|Repository: GitLab FOSS Repository|3149|1571|-|N/A|
|Repository: GitLab Repository|1867|
|Single File: GitLab FOSS Single File Repository|2000|1292|-|N/A|
|Single File: GitLab Single File Repository|2012|
|Explore: GitLab explore|2346|1354|1336|
|Snippet: GitLab Snippet 1662597|1681|1082|1378|
*To access the sitespeed Grafana dashboards you need to be logged into your Google account.
Note: Since this table spans time before and after single-codebase we kept GitLab FOSS pages close to GitLab ones to enable comparisons despite not being exactly the same project.
If you activate the runs toggle, you will see annotations with links to all full reports. Currently we run measurements every 2 hours.
All items that start with the tachometer symbol represent a step in the flow that we measure. Wherever possible, the tachometer icon links to the relevant dashboard in our monitoring. Each step in the listing below links back to its corresponding entry in the goals table.
Consider the scenario of a user opening their browser and navigating to their dashboard by typing `gitlab.com/dashboard`. Here is what happens:
HTTP queue time.
`RootController#index`. The round-trip time it takes for a request to enter and leave Unicorn is what we call `Transaction Timings`. Rails controller requests are sent to (and data is received from):
`gitlab.com/dashboard` example, the controller addresses all three.
Load) when this particular user hits `gitlab.com/dashboard/issues`. The number of SQL calls will depend on how many projects the person has, how much may already be in cache, etc.
view timings). In some controllers, data is gathered first, after which a view is constructed. In other controllers, data is gathered from within a view, so that the `view timing` in those cases includes the time it took to call NFS, PostgreSQL, Redis, etc. And in many cases, both are done.
`gitlab.com/dashboard/issues`, there are 56 nested / partial views rendered (search for
`First Byte - External` is measured for a hand-selected number of URLs using SiteSpeed.
`defer="true"`, so they are parsed and executed in the order they appear, but only after the HTML has been parsed.
`DOMContentLoaded` event. The new call is for a new URL, and such requests are routed through either the Web or API workers, invoke their respective Rails controllers on the backend, and return the requested files (HTML, JSON, etc.). For example, the calendar and activity feeds on a username page `gitlab.com/username` are two separate AJAX calls, triggered by
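The server-side portion of the flow above, the transaction timing between a request entering the app server and the response leaving it, can be sketched as a Rack-style middleware. This is an illustration of the idea only, not GitLab's actual instrumentation, and the sink and app objects are stand-ins:

```ruby
# Minimal sketch of measuring a "transaction timing": the round trip from a
# request entering the app server to the response leaving it. GitLab's real
# instrumentation is more elaborate; this just illustrates the principle.
class TransactionTiming
  def initialize(app, sink)
    @app  = app   # the next Rack app/middleware in the chain
    @sink = sink  # e.g. a proc shipping the timing to the metrics store
  end

  def call(env)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000
    @sink.call(env["PATH_INFO"], elapsed_ms)
    [status, headers, body]
  end
end

timings = []
app = TransactionTiming.new(
  ->(env) { [200, {}, ["ok"]] },         # stand-in for the Rails controller
  ->(path, ms) { timings << [path, ms] } # stand-in for the metrics sink
)
status, = app.call("PATH_INFO" => "/dashboard")
```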
First read about the steps in a web request above, then pick up the thread here.
After pushing to a repository, e.g. from the web UI:
`git-receive-pack` process (on the workhorse machine) to save the new commit to NFS
`git-receive-pack` fires a git hook to trigger
`post-receive` hook, and the
`git-receive-pack` process passes along details of what was pushed to the repo to the
`post-receive` hook. More specifically, it passes a list of three items: old revision, new revision, and ref (e.g. tag or branch) name.
`post-receive` hook to Redis, which is the Sidekiq queue.
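The data handed to the `post-receive` hook can be illustrated with a small sketch: git writes one `old-rev new-rev ref` line per updated ref to the hook's standard input, and the hook's job here is to forward the push details to the Sidekiq queue in Redis. The payload shape below is illustrative, not GitLab's actual wire format:

```ruby
# Sketch of what a post-receive hook does with its input: git feeds it one
# "<old-rev> <new-rev> <ref>" line per updated ref on stdin, and the hook
# enqueues a job describing the push. The payload shape is illustrative.
def parse_post_receive(stdin_lines)
  stdin_lines.map do |line|
    oldrev, newrev, ref = line.split(" ", 3)
    { oldrev: oldrev, newrev: newrev, ref: ref.strip }
  end
end

def enqueue_push(changes, queue)
  # In reality this would be an LPUSH of a JSON job payload into Redis,
  # which is what Sidekiq polls for work; an array stands in here.
  changes.each { |change| queue << change }
end

queue = []
changes = parse_post_receive([
  "0000000000000000000000000000000000000000 d670460b4b4aece5915caf5c68d12f560a9fe3e4 refs/heads/main\n"
])
enqueue_push(changes, queue)
```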
Consider the scenario of a user opening their browser and navigating to their favorite URL on GitLab.com. The steps are described in the section on "web request" above. In this table, the steps are measured and goals for improvement are set.
Guide to this table:
# per request: average number of times this step occurs per request. For instance, an average "transaction" may require 0.2 SQL calls, 0.4 git calls, 1 call to cache, and 30 nested views to be built.
p99 Q2-17: the p99 timing (in milliseconds) at the end of Q2, 2017
p99 Now: link to the dashboard that displays the current p99 timing
p99 Q3-17: the target for the p99 timing by the end of Q3, 2017
| Step | # per request | p99 Q2-17 | p99 Now | p99 Q3-17 goal | Issue links and impact |
|---|---|---|---|---|---|
| Lookup IP in DNS | 1 | ~10 | ? | ~10 | Use a second DNS provider |
| Browser to Azure LB | 1 | ~10 | ? | ~10 | |
| BACKEND PROCESSES | | | | | Extend monitoring horizon |
| Azure LB to HAProxy | 1 | ~2 | ? | ~2 | |
| HAProxy SSL with Browser | 1 | ~10 | ? | ~10 | Speed up SSL |
| HAProxy to NGINX | 1 | ~2 | ? | ~2 | |
| NGINX buffers request | 1 | ~10 | ? | ~10 | |
| NGINX to Workhorse | 1 | ~2 | ? | ~2 | |
| Workhorse distributes request | 1 | | | | Adding monitoring to workhorse |
| Workhorse to Unicorn | 1 | 18 | | 10 | Adding Unicorns |
| Workhorse to Gitaly | ? | | | | |
| Workhorse to NFS | ? | | | | |
| Workhorse to Redis | ? | | | | |
| Unicorn calls services | 1 | 2500 | | 1000 | Allow more GitLab internals monitoring |
| Unicorn Postgres | | 250 | | 100 | Speed up slow queries |
| Unicorn NFS | | 460 | | 200 | Move to Gitaly - sample result |
| Unicorn constructs Views | | 1500 | | | |
| Unicorn makes HTML | | | | | |
| HTML to Browser | | | | | |
| Unicorn to Workhorse | 1 | ~2 | ? | ~2 | |
| Workhorse to NGINX | 1 | ~2 | ? | ~2 | |
| NGINX to HAProxy | 1 | ~2 | ? | ~2 | Compress HTML in NGINX |
| HAProxy to Azure LB | 1 | ~2 | ? | ~2 | |
| Azure LB to Browser | 1 | ~20 | ? | ~20 | |
| FIRST BYTE (see note 1) | | 1080 - 6347 | | 1000 | |
| SPEED INDEX (see note 2) | | 3230 - 14454 | | 2000 | Remove inline scripts, Defer script loading when possible, Lazy load images, Set up a CDN for faster asset loading, Use image resizing in CDN |
| Fully Loaded (see note) | | 6093 - 14003 | | not specified | Enable webpack code splitting |
Table to be built; merge requests welcome!
For any performance metric, the following modifiers can be applied:
- Internal: the time as measured from inside GitLab.com's infrastructure (the boundary is defined as being at the "network / Azure load balancer" interface).
Timing history for First Byte is listed in the table below (click the tachometer icons for current timings). All times are in milliseconds.
| Type | End of Q4-17 | Now |
|---|---|---|
| Issue: GitLab CE #4058 | 857 | |
| Merge request: GitLab CE !9546 | 18673 | |
| Pipeline: GitLab CE pipeline 9360254 | 1529 | |
| Repo: GitLab CE repo | 1076 | |
To go a little deeper and measure the performance of the application and infrastructure without consideration for frontend and network aspects, we look at "transaction timings" as recorded by Unicorn. These timings can be seen on the Rails Controller dashboard, per URL that is accessed.
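The p99 figures used throughout this page can be computed from raw timing samples as a simple percentile. A minimal sketch using the nearest-rank method (monitoring systems typically compute this over a sliding window; a plain array stands in here):

```ruby
# Minimal sketch of computing a p99 from raw transaction timings (in ms),
# using the nearest-rank method: sort the samples and take the value at
# the rank covering p percent of them.
def percentile(samples, p)
  raise ArgumentError, "no samples" if samples.empty?
  sorted = samples.sort
  rank = (p / 100.0 * sorted.length).ceil - 1
  sorted[[rank, 0].max]
end

timings_ms = (1..100).to_a  # pretend each request took 1..100 ms
p99 = percentile(timings_ms, 99)
# => 99
```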
This section has been moved to Availability severity.
To clarify the priority of issues that relate to GitLab.com's performance, you should add the `~performance` label as well as a "Severity" label. There are two factors that influence which severity label you should pick:
For strictly performance-related work you can use the Controller Timings Overview Grafana dashboard. This dashboard categorizes data into three categories, each with an associated severity label:
This means that if a controller (e.g. `UsersController#show`) is in the "Frequently Used" category you assign it the
For database related timings you can also use the SQL Timings Overview. This is the dashboard primarily used by the Database Team to determine the AP label to use for database related performance work.
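The mapping from a controller's request rate to its usage category can be sketched as below. Note that the thresholds and the two lower category names are hypothetical, invented purely for illustration; the real cut-offs live in the Grafana dashboards mentioned above.

```ruby
# Hypothetical sketch of bucketing a controller by request rate to pick a
# usage category (and from there a severity label). The thresholds and the
# lower two category names are invented for illustration only.
FREQUENT_RPM = 100  # assumption: >= 100 requests/minute is "Frequently Used"
COMMON_RPM   = 10   # assumption: >= 10 requests/minute is "Commonly Used"

def usage_category(requests_per_minute)
  if requests_per_minute >= FREQUENT_RPM
    "Frequently Used"
  elsif requests_per_minute >= COMMON_RPM
    "Commonly Used"
  else
    "Rarely Used"
  end
end
```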
Some general notes about parameters that affect database performance, at a very crude level.