This guide provides resources for diagnosing 5XX errors on GitLab.com. Use it when a user contacts support stating they're receiving either a 500 or 503 error on GitLab.com.
If reports of slowness are received on GitLab.com, first take a look at the GitLab Grafana Monitor, especially:
- Worker CPU -> Git CPU Percent
- Worker Load -> Git Worker Load
If a customer reports a shared runner running slower than it normally does, there was likely degraded performance during the period the customer experienced slowness in their pipeline.
Check the CI Runners Overview graphs, where such degradation will show up in the queue apdex and latency.
Check on the #feed_alerts, #production, and #incident-management Slack channels to ensure this isn't an outage or infrastructure issue.
Before you post to #production or create an issue, here are some helpful ways to capture data that help narrow down the issue(s):
- Add the `performance_bar=flamegraph` query parameter to generate a CPU flamegraph (example URL below).
- Show the performance bar (type `p` `b`) in your browser window. Reload the page and grab the information from the server side.

Screenshots from any of these tools will greatly help any engineers looking into the problems.
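For example, the flamegraph parameter is appended to the URL of the page that is returning the error; the project path below is only a placeholder:

```plaintext
https://gitlab.com/<group>/<project>/-/issues/1?performance_bar=flamegraph
```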
If our customer is reporting problems connecting to GitLab.com, we should ask for the following:
```shell
# Network path from the customer's machine to GitLab.com
traceroute gitlab.com

# Cloudflare trace over HTTP and HTTPS (includes the client IP and the
# Cloudflare data center handling the request)
curl http://gitlab.com/cdn-cgi/trace
curl https://gitlab.com/cdn-cgi/trace

# Verbose connection attempt (DNS resolution, TLS handshake, response headers)
curl -svo /dev/null https://gitlab.com
```
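To make it easy for the customer to attach the results to the ticket, a minimal sketch like the following redirects each command's output to a file (the file names are arbitrary examples):

```shell
# Collect connectivity diagnostics into files to attach to the ticket.
traceroute gitlab.com > traceroute-gitlab.txt 2>&1
curl -s http://gitlab.com/cdn-cgi/trace > cdn-trace-http.txt
curl -s https://gitlab.com/cdn-cgi/trace > cdn-trace-https.txt
# curl's verbose output goes to stderr, so capture that stream.
curl -svo /dev/null https://gitlab.com 2> curl-verbose.txt
```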
A 503 error on a merge request page may also happen if the repository is corrupted. To confirm, a push to a corrupted repository may show the following:

```plaintext
data/repositories/@hashed/ee/98/ee98b34f343b4e48106fff666d12b61f23f.git/objects/f7/e7f4782) is corrupt
```
If the customer is reporting an error similar to the above, take the following steps to verify whether their file server was affected:
- Look the project up in the Admin Area, for example https://gitlab.com/admin/projects/user-namespace.
- Note the project's Gitaly storage name (an API alternative is sketched below).
- Check whether that file server was affected by the incident.
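If the admin UI is inconvenient, the Projects API should also return the `repository_storage` (Gitaly storage name) field when queried with an administrator token; the project ID and token below are placeholders:

```shell
# Look up the Gitaly storage name for a project via the API (admin only).
# <project_id> and <admin_token> are placeholders.
curl --silent --header "PRIVATE-TOKEN: <admin_token>" \
  "https://gitlab.com/api/v4/projects/<project_id>" | jq '.repository_storage'
```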
The following workflows will guide you on how to search Kibana and/or Sentry for the event in our logs that caused a particular 5XX error.
See the 500-specific section in the Kibana workflow.
See the Sentry workflow.
A video walkthrough of investigating 500 errors using Kibana and Sentry can be seen here (GitLab Unfiltered).
Once results have been found in either Kibana or Sentry, do the following.
In a Priority 1/Severity 1 situation, consider a dev escalation.
If the issue is known, it should have a corresponding issue in the GitLab issue tracker. If you found an entry in Sentry that has been converted into an issue, you should see the issue number in the header within Sentry:
Click the issue number to be taken directly to the issue where you can leave a comment to provide a link to the Zendesk ticket.
Then, respond to the user with information about the cause of the issue, provide a link to it, and invite them to subscribe to it for updates.
If there is no corresponding issue yet, create one, applying the `bug` label and any others if needed, such as `customer`, priority and severity, and the appropriate DevOps stage.

Note: If a 5XX error is found in Kibana, there is a high chance that there is also a Sentry issue for it. In those cases, add the `json.correlation_id` filter and search for its value in Sentry with `correlation_id:<value>`.
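For example (the correlation ID below is a made-up placeholder), the Kibana filter and the matching Sentry search would look like:

```plaintext
# Kibana (KQL) filter on the correlation ID
json.correlation_id : "01F9XYZEXAMPLE"

# Sentry search for the same value
correlation_id:"01F9XYZEXAMPLE"
```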