Apr 15, 2019 - Taylor Murphy    

4 Examples of the power of open source analytics

Our Data and Analytics team manager reflects on how open source and radical transparency have benefited analytics work at GitLab.

One of the great parts of working for a company with such a strong open source ethos is that you're able to apply this philosophy to other parts of the company. We on the Data Team have worked hard to embody the values of GitLab, particularly collaboration and transparency.

It starts by defaulting to public for everything. Our primary code repository is public and MIT licensed, meaning anybody can contribute or just take what they find useful. Our code, issues, and documentation are public.

This radical transparency has had several positive side effects.

The effect I'm most excited about is having people contribute to our codebase.

When we were migrating to Snowflake for our data warehouse, we needed to convert our PostgreSQL-specific SQL code to a Snowflake-compatible format. One of the models in our codebase generates a table of dates and related metadata such as day of year, week of year, quarter, etc. An external contributor, Matthias Wirtz, who had been following our project and the Meltano project, took it upon himself to make the update and create a merge request in our project. We went back and forth a bit with code review and testing, but eventually it was merged and we still rely on this code today!
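
To give a sense of what a date table like this contains, here is a minimal sketch. It is written in Python rather than SQL, and it is not the model from our repository or the code from the merge request: the function name `build_date_table` and the column names are assumptions made up for illustration, and the week number follows the ISO 8601 convention, which may differ from what a given warehouse uses.

```python
from datetime import date, timedelta


def build_date_table(start, end):
    """Return one row (a dict) per calendar day from start to end, inclusive.

    Hypothetical helper for illustration only; column names are assumptions.
    """
    rows = []
    current = start
    while current <= end:
        # isocalendar() yields (ISO year, ISO week number, ISO weekday)
        _iso_year, iso_week, _iso_weekday = current.isocalendar()
        rows.append({
            "date_day": current.isoformat(),
            "day_of_year": current.timetuple().tm_yday,
            "week_of_year": iso_week,
            "quarter": (current.month - 1) // 3 + 1,
        })
        current += timedelta(days=1)
    return rows


if __name__ == "__main__":
    # Print a few rows around the publication date of this post.
    for row in build_date_table(date(2019, 4, 14), date(2019, 4, 16)):
        print(row)
```

In a warehouse the same columns would normally come from the database's own date functions, which is exactly the kind of code that differs between PostgreSQL and Snowflake and had to be converted.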

Another great benefit is that it makes conversations easier within the analytics community.

A key part of our data stack is data build tool, or dbt for short. This is a powerful open source project that makes version controlling and executing SQL code easy. The company behind the project, Fishtown Analytics, hosts a great community on Slack. Countless times, I've been able to answer basic questions about project structure, documentation, and testing just by linking to our codebase and dbt-generated docs, and the feedback is always positive. We see people who are shocked that we're so open but also appreciative that they can poke around a production codebase with ease.

An additional benefit that we've seen is that by putting everything out in the open we're helping to drive the industry forward.

It's one thing to say "Here's what we're doing, but sorry you can't see the code." It's another to say "Here's what we're doing, here's how we're doing it, and what are your ideas to make it better?" The latter invites people into the conversation to build upon ideas and others' creations.

The last piece I want to highlight is the idea that the actual code that you use for analytics isn't your company's competitive advantage.

You could know exactly how we move, store, model, and analyze our data, and the most a competitor would gain from that knowledge is a head start on getting their own analytics off the ground. The real value is the data itself and the decisions people make from the results of your analyses. We, of course, protect our data and our customers' data, but there's no reason why people shouldn't be able to see how we use that data to make decisions. And, being a transparent company, we're very open about the decisions we make as well.

Overall, we're seeing the same transformation that software engineering underwent with the DevOps movement happen in the analytics world, only with about a five-year lag. More open source tools are being created for data teams every day, and more people are sharing how they build their stacks and analyze their data. At GitLab, we're betting that our core values can bring emergent positive benefits to every part of a company, including data teams! We look forward to collaborating with you as this industry changes and grows!
