Published on June 4, 2015
A quick summary of the causes and solutions regarding the GitLab.com outage on 2015-05-29
GitLab.com suffered an outage from 2015-05-29 01:00 to 2015-05-29 02:34 (times in UTC). In this blog post we will discuss what happened, why it took so long to recover the service, and what we are doing to reduce the likelihood and impact of such incidents.
GitLab.com is provided and maintained by the team of GitLab B.V., the company behind GitLab. On 2015-05-02 we performed a major infrastructure upgrade, moving GitLab.com from a single server to a small cluster of servers consisting of a load balancer (running HAProxy), three workers (NGINX/Unicorn/Sidekiq/gitlab-shell) and a backend server (PostgreSQL/Redis/NFS). This new infrastructure configuration improved the responsiveness of GitLab.com, at the expense of having more moving parts.
GitLab.com is backed up using Amazon EBS snapshots.
To protect against inconsistent snapshots, our backup script 'freezes' the filesystem on the backend server with fsfreeze prior to making EBS snapshots, and 'unfreezes' the filesystem immediately after.
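The freeze, snapshot, unfreeze sequence can be sketched as follows. This is a minimal illustration, not our actual backup script: the function name `snapshot_with_freeze`, the `take_snapshot` callback and the injectable `run` parameter are all assumptions made for the example. The only real commands it relies on are `fsfreeze --freeze` and `fsfreeze --unfreeze`.

```python
import subprocess
from typing import Callable, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Run a command and raise CalledProcessError if it exits non-zero.
    subprocess.run(cmd, check=True)


def snapshot_with_freeze(
    mountpoint: str,
    take_snapshot: Callable[[], None],
    run: Callable[[Sequence[str]], None] = _run_checked,
) -> None:
    """Freeze `mountpoint`, call `take_snapshot`, then always unfreeze.

    `take_snapshot` is a hypothetical callback that would start the EBS
    snapshots; `run` is injectable so the sequence can be tested without
    touching a real filesystem.
    """
    # If the freeze itself fails, nothing is frozen, so we do not unfreeze.
    run(["fsfreeze", "--freeze", mountpoint])
    try:
        take_snapshot()
    finally:
        # Unfreeze even when the snapshot step raises, so a failed backup
        # does not leave the filesystem blocking writes.
        run(["fsfreeze", "--unfreeze", mountpoint])
```

The `try`/`finally` only guards against errors raised inside the script. It does not help if the script is killed or hangs between the two steps, which is one reason a frozen filesystem needs to be something on-call engineers know how to recognize.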
Italic comments below were written with the benefit of hindsight.
Although we cannot explain what went wrong with the backup script, it is hard to come to any conclusion other than that something did go wrong with it.
The length of the outage was caused by insufficient training and documentation for our on-call engineers following the infrastructure upgrade rolled out on May 2nd.
We have removed the freeze/unfreeze steps from our backup script. Because this (theoretically) increases the risk of occasional corrupt backups, we have added a second backup strategy for our SQL data. In the future we would like to have automatic validation of our GitLab.com backups.
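As a rough idea of what a first automatic validation step could look like, the sketch below checks that a backup file exists, is not empty, and is recent enough. Everything here is hypothetical: the function name `backup_problems`, its parameters and its thresholds are illustrative, and this post does not describe how the SQL backups are produced or stored.

```python
import os
import time
from typing import Callable, List, Optional


def backup_problems(
    path: str,
    max_age_seconds: float,
    min_bytes: int = 1,
    now: Optional[float] = None,
    extra_check: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    `extra_check` is an optional, caller-supplied function that inspects
    the file contents and returns True when the backup looks usable.
    """
    if not os.path.isfile(path):
        return [f"{path}: backup file is missing"]

    problems: List[str] = []
    info = os.stat(path)

    if info.st_size < min_bytes:
        problems.append(f"{path}: only {info.st_size} bytes")

    current = time.time() if now is None else now
    age = current - info.st_mtime
    if age > max_age_seconds:
        problems.append(f"{path}: last modified {int(age)} seconds ago")

    if extra_check is not None and not extra_check(path):
        problems.append(f"{path}: content check failed")

    return problems
```

Checks like these only catch missing, empty or stale files. For a PostgreSQL custom-format dump, `extra_check` could wrap something like `pg_restore --list`, and a more thorough validation would restore the backup into a scratch database.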
The day before this incident, we had decided that this training was our most important priority. We have started running regular operations drills in one-on-one sessions with all of our on-call engineers.