Published on April 15, 2019
Our Data and Analytics team manager reflects on how open source and radical transparency have benefited analytics work at GitLab.
One of the great parts of working for a company with such a strong open source ethos is that you're able to apply this philosophy to other parts of the company. We on the Data Team have worked hard to embody the values of GitLab, particularly collaboration and transparency.
It starts by defaulting to public for everything. Our primary code repository is public and MIT licensed, meaning anybody can contribute or just take what they find useful. Our code, issues, and documentation are public.
When we were migrating to Snowflake for our data warehouse, we needed to convert our PostgreSQL-specific SQL code to a Snowflake-compatible format. One of the models in our codebase generates a table of dates and related metadata such as day of year, week of year, and quarter. An external contributor, Matthias Wirtz, who had been following our project and the Meltano project, took it upon himself to make the update and create a merge request in our project. We went back and forth a bit with code review and testing, but eventually it was merged, and we still rely on this code today!
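For readers who haven't worked with a date table before, here is a minimal sketch of the idea in Python. It is not our actual model, which is written in SQL and run through dbt; the function name `build_date_table` and the column names are hypothetical, and the week number shown is the ISO week, which may not match the convention a given warehouse uses.

```python
from datetime import date, timedelta
from typing import Dict, List


def build_date_table(start: date, end: date) -> List[Dict[str, object]]:
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        # isocalendar() returns (ISO year, ISO week number, ISO weekday).
        _, iso_week, _ = current.isocalendar()
        rows.append({
            "date_day": current.isoformat(),
            "day_of_year": current.timetuple().tm_yday,
            "week_of_year": iso_week,            # ISO 8601 week number (assumption)
            "quarter": (current.month - 1) // 3 + 1,
        })
        current += timedelta(days=1)
    return rows


# Hypothetical usage: three days around this post's publication date.
rows = build_date_table(date(2019, 4, 14), date(2019, 4, 16))
for row in rows:
    print(row)
```

A table like this is typically joined to other tables on the date column, so that analysts don't have to recompute calendar attributes in every query.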
A key part of our data stack is data build tool, or dbt for short. This is a powerful open source project that makes version controlling and executing SQL code easy. The company behind the project, Fishtown Analytics, hosts a great community on Slack. Countless times, I've been able to answer basic questions about project structure, documentation, and testing just by linking to our codebase and dbt-generated docs, and the feedback is always positive. We see people who are shocked that we're so open but also appreciative that they can poke around a production codebase with ease.
It's one thing to say, "Here's what we're doing, but sorry you can't see the code." It's another to say, "Here's what we're doing, here's how we're doing it, and what are your ideas to make it better?" The latter invites people into the conversation to build upon ideas and others' creations.
Even if you knew exactly how we move, store, model, and analyze our data, that knowledge would mainly help a competitor get their own analytics off the ground. The real value is the data itself and the decisions people make from the results of the analyses. We, of course, protect our data and our customers' data, but there's no reason why people shouldn't be able to see how we use that data to make decisions. And, being a transparent company, we're very open about the decisions we make as well.
Overall, we're seeing the same transformation that software engineering underwent with the DevOps movement happen in the analytics world, only with about a five-year lag. More open source tools are being created for data teams every day, and more people are sharing how they build their stacks and analyze their data. At GitLab, we're betting that our core values can bring emergent positive benefits to every part of a company, including data teams! We look forward to collaborating with you as this industry changes and grows!