GitLab as a company faces a challenge shared by many — we have lots of data for our engineering organization (via GitLab, our single data store for that part of the company), but there are key gaps in how we understand the effectiveness of business operations. Meltano was created to help fill the gaps by expanding the common data store to support Customer Success, Customer Support, Product teams, and Sales and Marketing.
What is Meltano?
Meltano aims to be a complete solution for data teams — the name stands for model, extract, load, transform, analyze, notebook, orchestrate — in other words, the data science lifecycle. While this might sound familiar if you're already a fan of GitLab, Meltano is a separate product. Rather than wrapping Meltano into GitLab, Meltano will be the complete package for data people, whereas GitLab is the complete package for software developers.
What problem does it solve?
The GitLab Data and Analytics team is charged with getting data from our external sources, presenting it in a usable format to business users across the company, and eventually making predictions from the data. As is the case with many data teams, we currently do this with a series of steps and separate tools, and we're not yet at the level of process and stability that is commonplace in software development. The idea of bringing best practices from software development to data analytics is a huge draw for the Data team at GitLab. Ideally, all of our work could be done in open source tools, and could be version controlled, and we’d be able to track the state of the analytics pipeline from raw data to visualization.
The endgame for Meltano involves making analytics accessible to everyone, not just professional data wranglers. GitLab Data Analyst Emilie Burke explains a common scenario: "There are whole swathes of small and medium size companies that don’t really have data and analytics because they don’t have engineers on their team. The reports that they get are through whatever tools they are using. When they’re dependent on these siloed data sources, you can’t track cross-functional efforts. For example, if you’re doing a giveaway, you might see a bunch of new email signups piping into Mailchimp. But you won't be able to see if those users are then buying things in Shopify. Unless there's a native integration, you can’t relate that data to any other data source."
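To make the giveaway scenario concrete, here is a minimal Python sketch of the kind of cross-source question Emilie describes. It is not Meltano code and does not use the Mailchimp or Shopify APIs. The sample records, the field names (`email`, `campaign`, `total`), and the `campaign_conversion` helper are all hypothetical, and matching on email address is an assumption about how the two sources could be related.

```python
# Hypothetical sample data standing in for exports from two siloed tools
# (an email platform and a storefront). All field names are assumptions.
signups = [
    {"email": "ana@example.com", "campaign": "giveaway"},
    {"email": "ben@example.com", "campaign": "giveaway"},
    {"email": "cy@example.com", "campaign": "newsletter"},
]
orders = [
    {"email": "ana@example.com", "total": 40.0},
    {"email": "ana@example.com", "total": 15.5},
    {"email": "cy@example.com", "total": 20.0},
]


def campaign_conversion(signups, orders, campaign):
    """Return (signup count, buyer count, revenue) for one campaign.

    Records are related by lower-cased email address, which is an
    assumption; real sources may need a different join key.
    """
    emails = {s["email"].lower() for s in signups if s["campaign"] == campaign}
    matched = [o for o in orders if o["email"].lower() in emails]
    buyers = {o["email"].lower() for o in matched}
    revenue = sum(o["total"] for o in matched)
    return len(emails), len(buyers), revenue


n_signups, n_buyers, revenue = campaign_conversion(signups, orders, "giveaway")
print(f"{n_buyers} of {n_signups} giveaway signups bought, revenue {revenue:.2f}")
```

The join itself is simple once both datasets sit in one place. The hard part in practice is getting them into a common store at all.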
Managing the integrations you currently have comes with its own challenges. Senior Product Manager Joshua Lambert shares, "The difficulty of hooking up Salesforce and Marketo to see if a marketing campaign was successful is non-trivial. Often money is spent and the question is, 'Was it worth it?'" Because Meltano is open source, we think it will make a big difference for teams without much money to invest in data analytics. It's a new field for many organizations, and we want to do everything we can to make it easier for teams and businesses to access their data and make better decisions. We talked more about this during a recent Q&A, which you can watch below.
How can I contribute?
Meltano is open source! You can check out the plan for an MVC here. There are many different areas where people can contribute, including Meltano Analysis (the UI), Extractors, and Loaders. Meltano currently only supports Postgres (with Snowflake on the way!), but will need to support many different database types, so any contributions writing Loaders for one of those would be very welcome. You can make requests by opening an issue and labeling it
Readers are also extremely welcome to check out the Data team's work and suggest ways we can improve! We know some aspects of how we do analytics and data science are not where they should be. If you think we’re not using the right primitives, or that we’re going about something the wrong way, we’re all ears!
How can I keep up with the Data Team and Meltano?
The best way to get in touch about Meltano or the Data team is to open an issue! We also publish all of our team calls and working sessions on our brand new YouTube channel, and you can learn more about the team, view our work in GitLab, and follow us on social:
- Jacob Schatz, Staff Developer, Meltano
- Yannis Roussos, Senior Developer, Meltano Specialist
- Alex Zamai, Developer, Meltano
- Micaël Bergeron, Developer, Meltano
- Joshua Lambert, Senior Product Manager, Package, Monitor, Distribution
- Taylor A. Murphy, PhD, Manager, Data and Analytics
- Emilie Schario, Data Analyst
- Thomas La Piana, Data Engineer
- Chase Wright, Finance Operations and Planning
Emily von Hoffmann contributed to this post.
Photo by Jefferson Santos on Unsplash
“Learn more about Meltano, an open source tool for the data science lifecycle from @gitlab” – Jacob Schatz and Taylor A. Murphy, PhD