- Science, Technology & Education
- Geneva, Switzerland
- 13,500+ Associated members
Increased visibility and access to research
Improved code quality and documentation
Sitting on the Franco-Swiss border since 1954, CERN, the European Organization for Nuclear Research, is the world’s largest particle physics laboratory. Its scientists and researchers explore the composition of the universe through the lens of fundamental particles. With 22 member states collaborating on this research, CERN’s study of the principal components of matter and how those particles interact is truly a global effort. Communication among participating scientists is critical to disseminating the organization’s work. In fact, the World Wide Web was born at CERN to address a pressing need for easy information sharing between contributing scientists at research and scholarly institutions across the globe.
With thousands of projects and just as many contributors, the CERN IT department sought a streamlined solution for code review, and GitLab met the criteria. In addition to the capacity to get a large number of projects and users up and running in short order, CERN needed a tool that would be easy to adopt for those less experienced with Git.
That ease of use would, in turn, allow researchers and collaborators to comment on restricted projects.
“It’s very useful to have access to the protected branches and the ability for others to just come in and say, ‘Hey guys, I think this is broken’ or ‘I think this could be improved this way, here’s maybe a snippet or a patch, could you please consider this change, etcetera?’ I think that is a good example on how GitLab allows us to collaborate and actually get things done,” said Nacho Barrientos, a CERN systems engineer in the IT department.
CERN chose to move to GitLab for its code hosting needs about three years ago. CERN has long been a strong advocate for open source software and for solutions that enable data sovereignty, so GitLab’s open-core, self-hosted model was attractive to the organization.
Now, CERN has more than 12,000 users and runs 120,000 CI jobs a month within GitLab.
“It’s clearly a powerful tool to do our operations, code collaboration and record discussions on our development and deployment process. We can do more because we can handle more complex projects. As an individual, I’m able to be involved with several large projects because I can rely on GitLab, and the other development tools that we have deployed around GitLab, to keep track of things. This is my perception as a GitLab user for three years: it’s not that I can do new things, but I can do more because of the efficiency of the tool,” said Alex Lossent, Version Control Systems Service Manager, CERN IT department.
CERN self-hosts its GitLab instance, and the organization’s scientists cite its on-premises capabilities as an advantage over other Git repository managers when managing their code.
“We have this main analysis code on GitLab with millions of lines of code. Each team of physicists also has their own repositories with their specific data analysis. And the on-premise nature of GitLab is really useful because we can access other CERN services, data storage and other information that we wouldn’t have on GitHub,” Lukas Heinrich, a partner physicist currently studying at New York University, explained.
CERN researchers and engineers report that GitLab has improved code quality and documentation on their projects. The application has “much improved” the way code is treated within the ATLAS experiment, Heinrich added. Merge requests in GitLab allow for discussion and code review. In turn, according to the NYU PhD candidate, “as reported in a recently published report and scientific article, the quality of code has increased a lot.”
“The tools have also enhanced our ability to release public results and the preserved analysis in parallel. For example, in a recent ATLAS search for supersymmetry I worked on, the analysis was preserved within a week of releasing the public results,” confirmed Giordon Stark, a physicist with the Santa Cruz Institute for Particle Physics.
Scientists regularly move in and out of CERN projects, which can make it difficult for a researcher to join a study that is already under way. Merge requests make that transition easier: new or less experienced researchers can get up to speed faster now that code changes are thoroughly documented in GitLab.
“The significant improvement that we’ve seen is a huge increase in the visibility of what is changing in the code. It’s become easier to track what’s being changed, by whom, and for what purpose. We now have this record of all of the questions that are asked about changes in the code, why changes are made and can more easily identify mistakes,” Lossent said.
“It allows us to build up the analysis code with the confidence that the work is reproducible and testable. Furthermore, the automation of certain tasks, such as the process of ATLAS publication, reduces the chance of human errors and frees up time for us to do more physics,” added Stark.
CERN’s technical team, which has begun delving into GitLab’s Auto DevOps offerings, also sees an opportunity for reusable research through their adoption of the application.
“Reusable research is a huge topic in all the sciences and revolves around the question of whether or not replicable studies are possible. The technological developments over the last couple of years have enabled this on a completely new level. What’s special about CERN is that our data set is unique. We collide sub-atomic particles, and essentially take pictures of them at a rate of 40 million times a second.
“Because of this we need to be very sure the results that we put out are well-tested, and that we make maximum use of that data. We not only want to reproduce a result that we published, but also reuse it on the code that we develop for new research,” Heinrich explained.
Source control, containerization and cloud technology have been the main facilitators of reusable research, according to Heinrich. This makes GitLab’s single application an appropriate and attractive environment to build and house CERN’s research projects.
“The entire infrastructure, with continuous integration and container support, makes it possible to have new scientific results based on code that was developed once before. Having an easily accessible record on how the original code was developed makes that much easier. This is why we are using GitLab CI, pipelines and starting with the Auto DevOps tools.
“Oftentimes, a few people leave in the middle of a project, but they produced valuable code and you can reuse their code to answer new scientific questions. In that instance, we use containers a lot,” Heinrich noted. “But we had to make it very user friendly for people to create these container images of their code. We are becoming successful in getting people to do that and GitLab facilitated part of that process.”