This strategy is a work in progress, and everyone can contribute. Please comment and contribute in the linked issues and epics. Sharing your feedback directly on GitLab.com is the best way to contribute to our strategy and vision.
Gitaly is responsible for the storage of Git repositories. It is an RPC service for handling all the Git calls made by GitLab. Praefect is a router and transaction manager for Gitaly. It sits between the GitLab application and Gitaly, routing requests to an available, up-to-date Gitaly replica in high availability configurations.
Before mid-2018, the GitLab application relied on direct disk access to Git repositories, performing Git operations either with Rugged (a libgit2 wrapper) or by shelling out to Git directly. At scale, this meant using NFS to make the repositories available to every application server. NFS adds latency and has opaque failure modes that are hard to debug in production. Furthermore, using multiple interfaces for Git makes instrumentation and caching difficult.
In late 2016, GitLab began building Gitaly, a gRPC service that would become the interface through which the GitLab application interacts with Git repositories. In mid-2018, GitLab completed this process for GitLab.com and unmounted NFS from GitLab.com application servers.
Systems Administrators directly interact with Gitaly when installing, configuring, and managing a GitLab server, particularly when high availability is a requirement. Today, systems administrators must create and manage an NFS cluster to configure a high availability GitLab instance, and manually manage the failover to new Gitaly nodes mounted on the same NFS cluster. Once HA Gitaly reaches minimal viability, it will be possible to eliminate the NFS cluster from the architecture and rely on Gitaly for replication. As HA Gitaly continues to mature, automatic failover, automatic Gitaly node rebalancing, and horizontal scaling of read access across replicas will deliver 99.999% uptime (five nines) and improved performance without regular intervention. Systems Administrators will have fewer applications to manage, because other version control systems can be retired once the last projects are migrated to GitLab.
Developers will benefit from increasing performance for repositories of all shapes and sizes, on the command line and in the GitLab application, as performance improvements continue. Once support for monolithic repositories reaches minimal viability and continues maturing, developers will no longer be split between Git and legacy version control systems, as projects increasingly consolidate on Git. Developers who heavily use binary assets, like Game Developers, will at long last be able to switch to Git and eliminate Git LFS by adopting native large file support in Git.
Gitaly is responsible for access to, and the availability of, Git repositories, and the performance of Gitaly directly influences the experience of using GitLab. This includes performing code reviews, browsing repositories, the speed of CI jobs, and the performance of push and fetch Git operations. The performance of Gitaly is reliably good in many situations, but poor disk performance, very large repositories, and poor Git access patterns are a problem (GitLab is working to address known performance regressions when using NFS, which are exacerbated by bad Git access patterns). Many exciting opportunities to significantly improve performance exist through improving how we use Git (configuration), improving Git itself, implementing features like deduplicated forks, caching, and improving Git access patterns. Performance improvements to Gitaly benefit both the Git interface and the GitLab application. Native support for high availability will also allow horizontally scaling Git read operations for better distributed CPU usage and further performance improvements.
The performance and availability of Gitaly is a matter of importance for GitLab Administrators, who are responsible to their organizations for the performance and availability of GitLab, of which Gitaly is a critical component. The inability to access Git repositories on a GitLab server is an outage event, and for a large instance would prevent thousands of people from doing their jobs. Today Gitaly depends on external systems, like NFS, to achieve high availability. In the future, Gitaly will be natively highly available, replicating repositories to many Gitaly nodes and recovering automatically from node and repository level failures, which will prevent extended outages caused by disk failures, server failures, or zone outages.
Git is the market leading Version Control System (VCS), but many organizations with extremely large projects continue to use centralized version control systems like CVS, SVN, and Perforce. Many of these same organizations also use Git for many of their projects, but have been unable to standardize on Git for these extremely large repositories. Gitaly and GitLab will make it possible to standardize on Git for extremely large repositories with native support for monolithic repositories and native large file support (eliminating the need for Git LFS), allowing organizations to consolidate on one VCS: Git.
Currently there is no way to run GitLab in an HA configuration without NFS. This is a point of frustration for instance administrators, and a performance problem.
In progress: High Availability Git storage (Strong Consistency)
When a developer pushes changes to GitLab, if a success signal is returned, GitLab should have more than one copy of this data to prevent data loss. Strong consistency is the highest priority after shipping the eventually consistent MVC.
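The guarantee described above (a push reports success only when more than one copy of the data exists) can be illustrated with a minimal sketch. This is not Gitaly's or Praefect's actual implementation: `Replica`, `replicated_push`, and `min_copies` are hypothetical names, and each replica is an in-memory stand-in rather than a real Git repository.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one Gitaly node."""
    name: str
    healthy: bool = True
    refs: dict = field(default_factory=dict)


def replicated_push(replicas, ref, sha, min_copies=2):
    """Apply a ref update only if at least `min_copies` replicas can take it.

    Returns True when the update was stored on enough replicas, and False
    (leaving every replica untouched) otherwise.
    """
    # Phase 1: find the replicas that are able to accept the write.
    voters = [replica for replica in replicas if replica.healthy]
    if len(voters) < min_copies:
        return False  # Refuse the push rather than keep a single copy.

    # Phase 2: apply the same update to every replica that voted.
    for replica in voters:
        replica.refs[ref] = sha
    return True
```

With three replicas of which one is down, `replicated_push(nodes, "refs/heads/main", "abc123")` still succeeds because two copies are written. With only one healthy replica the push is refused and no replica changes, so a success signal always implies more than one copy.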
Large instances, like GitLab.com, require multiple Gitaly shards. Balancing resource utilization and storage across shards improves performance, and makes it easier to scale.
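As a rough illustration of balancing storage across shards, the sketch below places a new repository on the least-utilized shard that still has room for it. The `Shard` type, the `pick_shard` function, and the capacity figures are assumptions made for this example only; they do not describe how GitLab assigns repositories to shards.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical description of one Gitaly storage shard."""
    name: str
    capacity_gb: float
    used_gb: float

    @property
    def utilization(self) -> float:
        # Fraction of the shard's capacity that is already in use.
        return self.used_gb / self.capacity_gb


def pick_shard(shards, repo_size_gb):
    """Return the least-utilized shard that can still fit the repository."""
    candidates = [
        shard for shard in shards
        if shard.used_gb + repo_size_gb <= shard.capacity_gb
    ]
    if not candidates:
        raise ValueError("no shard has room for this repository")
    return min(candidates, key=lambda shard: shard.utilization)
```

A real balancer would likely also weigh CPU and I/O load, not only disk usage; storage alone is used here to keep the sketch short.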
When running GitLab in an HA configuration, particularly once Strong Consistency is implemented, multiple Gitaly nodes will be able to service read requests with an up-to-date copy of the repository. Distributing read operations across up-to-date replicas allows better resource utilization and scaling patterns.
GitLab is supporting the direction of the Git project to address the performance problems of working with extremely large projects through partial clone and promisor packfiles. We also want to add native large file support to Git. We have been supporting this work in the Git project for quite a while and it is close to reaching a point where it can be used.
We do not want to split our attention between Microsoft's VFS for Git protocol and the native Git implementation, nor do we want to build support for a feature that is not in mainline Git and requires custom driver/kernel extensions. We prefer boring solutions, like using native Git and supporting its direction.
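One way to picture read distribution is a router that only considers replicas holding the latest version of a repository and rotates between them. The sketch below assumes a hypothetical per-replica "generation" counter that increases with every replicated write; `ReadRouter` and its methods are illustrative names, not Praefect's API.

```python
from itertools import count


class ReadRouter:
    """Hypothetical round-robin router over up-to-date replicas."""

    def __init__(self, generations):
        # Maps replica name -> last generation that replica has applied.
        self.generations = dict(generations)
        self._counter = count()

    def up_to_date(self):
        """Names of replicas that hold the newest generation, sorted."""
        latest = max(self.generations.values())
        return sorted(
            name for name, gen in self.generations.items() if gen == latest
        )

    def route_read(self):
        """Pick the next up-to-date replica in round-robin order."""
        candidates = self.up_to_date()
        return candidates[next(self._counter) % len(candidates)]
```

For `ReadRouter({"gitaly-1": 7, "gitaly-2": 7, "gitaly-3": 6})`, reads alternate between `gitaly-1` and `gitaly-2`, and `gitaly-3` is skipped until it catches up to generation 7. Stale replicas never serve reads in this sketch, which is why strong consistency makes read distribution more effective: more replicas qualify at any given moment.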
Gitaly is a non-marketable category, and is therefore not assigned a maturity level.
Customers and prospects evaluating GitLab (GitLab.com and self-hosted) benchmark GitLab's performance against GitHub.com, including Git performance. The Git performance of GitLab.com for easily benchmarked operations like cloning, fetching, and pushing shows that GitLab.com is similar to GitHub.com. When comparing GitHub Enterprise to a self-hosted GitLab instance, it is important to compare like-for-like configurations, particularly the use of NFS, because NFS is known to significantly reduce Git performance. Gitaly is planned to provide high availability without NFS in 2020, providing both high performance and high availability. GitHub Enterprise does not currently offer true high availability.
Perforce competes with GitLab primarily on its ability to support enormous repositories, either from binary files or monolithic repositories with extremely large numbers of files and history. This competitive advantage comes naturally from its centralized design, which means only the files immediately needed by the user are downloaded. Given sufficient support in Git for partial clone, and sufficient performance in GitLab for enormous repositories, existing customers are waiting to migrate to GitLab.
The version control systems market is expected to be valued at close to US$550mn in the year 2021 and is estimated to reach US$971.8mn by 2027 according to Future Market Insights, which is broadly consistent with revenue estimates of GitHub ($250mn ARR) and Perforce ($130mn ARR). The opportunity for GitLab to grow with the market, and to grow its share of the version control market, is significant.
Git is the market leading version control system, demonstrated by the 2018 Stack Overflow Developer Survey where over 88% of respondents use Git. Although there are alternatives to Git, Git remains dominant in open source software, usage by developers continues to grow, it is installed by default on macOS and Linux, and the project itself continues to adapt to meet the needs of larger projects and enterprise customers who are adopting Git, like the Microsoft Windows project.
According to a 2016 Bitrise survey of mobile app developers, 62% of apps hosted by a SaaS provider were hosted on GitHub, and 95% of apps were hosted by a SaaS provider. These numbers provide an incomplete view of the industry, but broadly represent the large opportunity for growth in SaaS hosting on GitLab.com, and in self-hosted, where GitLab is already very successful.
Users do not see Gitaly as a distinct feature or interface of GitLab. Git performance is the most significant user-facing area where improvements are frequently requested; however, the source of the performance problem can vary significantly.