If I had a nickel for every time I saw that Data Science Hierarchy of Needs visual in a presentation at a conference, I'd be a gazillionaire (technical term). The pyramid, a nod to Maslow's Hierarchy of Needs, lays out that data science, in its Machine Learning or Artificial Intelligence forms, has a series of "needs" or requirements that must be met in order to actually output AI.
This visual is great, but I've spent the last couple years working in data, and this visual doesn't capture what I do. ML and AI are attractive subjects to talk about, but the reality for most organizations is that their data teams are incredibly immature and spend the bulk of their time working on analyses. Data organization maturity is made up of many factors; it's not just the details of your machine learning models, the pedigree of your team members, or the headcount of your function. The maturity of your data organization is not something that can be solved by throwing people at the problem.
A mature data organization, first and foremost, is a mature analytics organization.
So, how do you know if you are a mature analytics organization?
There are three tiers of data analysis: reporting, insights, and prediction. As an organization matures in their data analyses, they move through the tiers. This data analysis framework is not focused on all the things your data team will produce, nor does the framework apply to anything outside of data analysis. Things like recommendation engines and predictive analytics are not data analyses; they're a different application of data entirely.
A mature analytics organization is only one part of a data function, but it is the foundation of a mature one. Investing in doing analytics right will pay dividends to your data function down the road.
The Briefest History of Data
Before evaluating where data analysis is today, it's important to consider how data got here. Once upon a time, data was impossible to get.
Years ago, SQL was the prerequisite for answering data questions, and those lucky enough to work in an organization that maintained a centralized data warehouse still had to navigate delicate databases easily waylaid by a bad query.
Data analysts were the gatekeepers of data. Anything that was needed, from a pretty chart for a stakeholder meeting to a spreadsheet that business or financial analysts could dig into further, had to go through a data analyst.
In a world where knowledge workers are making thousands of decisions a day, we cannot let data live behind the gates. Business leaders have recognized this and are investing in building out data teams whose responsibility it is to democratize data in their organizations. Data teams are investments in your organization, but they can only provide a return if they mature, and the first step is reporting.
Reporting
Reporting is the straightforward, simplistic asking and answering of questions. The answers to these simple questions give an idea of what data is needed, but they don’t allow for the standardization, collection, or tracking of data.
When you have no answers, you never get beyond looking for facts. Example reporting questions are:
- How many new users visited our e-commerce site last week?
- How many leads did we capture this month?
- How many MRs were merged this week?
Sometimes, there is no data to answer these questions. This can help identify gaps and drive conversations around the data being collected. When getting data is hard, you never move past reporting.
Today, getting data is easy, at least by comparison. With the rise of analytical data warehouses (at GitLab, we use Snowflake) optimized for columnar analyses and incredibly cheap storage, the barriers to analyses are changing, as are the kinds of questions we want to answer.
Most reporting questions can be answered directly in their system of record:
- You can build a Salesforce dashboard to show you your pipeline for the next quarter.
- You can build a Heap dashboard to show you user retention.
- Even bitmapist, an open-source Mixpanel alternative, comes with off-the-shelf user cohorting.
Data analysts who spend their time building analyses that are already available in the system of record aren’t adding value. They’re paying tolls: verifying data and getting buy-in from business stakeholders.
Today, the value in data analyses lies in producing insights.
Insights
While reporting analyses are about gathering facts to report on them, insights are about understanding relationships between facts. Insights come from combining systems of record and looking for relationships in the data. This is different from systems informing systems, such as piping account information from Salesforce into Zendesk to see if you’re meeting your Support SLAs; instead, it's about producing something new that can only be learned by combining two data sources.
The GitLab Data team’s net and gross retention analyses are a great example of insights. While subscription information comes from Zuora, our customer accounts, and how they do or don’t roll up into parent accounts, all come from Salesforce. Integrating these two data sources to build out our retention analysis helps inform our Sales and Product teams.
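To make the idea of combining two systems of record concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the account IDs, the MRR figures, the `retention` helper, and the retention definitions themselves (net retention counts expansion, gross retention caps each parent account at its starting revenue). It is not how the GitLab Data team actually models retention.

```python
from collections import defaultdict

# Hypothetical CRM export: child account -> parent account (None = top-level).
crm_parent = {"a1": None, "a2": "a1", "b1": None, "c1": None}

# Hypothetical billing export: (account_id, MRR a year ago, MRR today).
billing = [("a1", 100, 180), ("a2", 50, 0), ("b1", 200, 150), ("c1", 0, 80)]


def retention(billing_rows, parent_of):
    """Return (net, gross) retention after rolling accounts up to parents."""
    start = defaultdict(float)
    end = defaultdict(float)
    for account_id, mrr_then, mrr_now in billing_rows:
        # The roll-up only exists in the CRM; billing alone cannot do this.
        parent = parent_of.get(account_id) or account_id
        start[parent] += mrr_then
        end[parent] += mrr_now

    # Only parents that were paying at the start belong to the cohort.
    cohort = [p for p in start if start[p] > 0]
    base = sum(start[p] for p in cohort)
    if base == 0:
        raise ValueError("no starting revenue in the cohort")

    net = sum(end[p] for p in cohort) / base
    gross = sum(min(end[p], start[p]) for p in cohort) / base
    return net, gross


net, gross = retention(billing, crm_parent)
print(f"net: {net:.1%}, gross: {gross:.1%}")
```

In this toy data, looking at billing alone would count `a2` as churned and `a1` as a big expansion. Only the parent roll-up from the CRM shows they are the same customer, which is the kind of relationship neither source reveals on its own.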
A product manager who knows their engineering team's velocity can better estimate what features will make the next release. A sales team that understands what their inbound marketing pipeline looks like for next quarter is empowered to better plan their work. It's not enough to know that a particular performance indicator is up or down compared to its target; insights help you understand the why behind the fact.
Answering questions such as these will show the biggest impact and value to your business:
- Which landing pages have the lowest CAC?
- What is the average number of site visits before a user converts?
- What is the MoM user retention in our web application?
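As an illustration of the last question, here is a minimal Python sketch of month-over-month retention. The event log, the user IDs, and the `mom_retention` helper are hypothetical, and the definition used (the share of last month's active users who are active again this month) is one common choice rather than the only one. The sketch also assumes every month has at least one event, so adjacent entries in the sorted list really are consecutive months.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, month the user was active).
events = [
    ("u1", "2019-01"), ("u2", "2019-01"), ("u3", "2019-01"),
    ("u1", "2019-02"), ("u3", "2019-02"), ("u4", "2019-02"),
    ("u1", "2019-03"),
]


def mom_retention(event_rows):
    """Map each month to the share of the prior month's users seen again."""
    active = defaultdict(set)
    for user_id, month in event_rows:
        active[month].add(user_id)

    months = sorted(active)  # ISO "YYYY-MM" strings sort chronologically
    rates = {}
    for prev, cur in zip(months, months[1:]):
        returning = active[prev] & active[cur]
        rates[cur] = len(returning) / len(active[prev])
    return rates


for month, rate in mom_retention(events).items():
    print(month, f"{rate:.0%}")
```

The interesting step is the set intersection: a plain count of active users per month is reporting, while relating one month's users to the next month's is what turns the same data into an insight.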
Insights are where your data analysts need to be spending their time because insights are where data teams can start providing value. Analysts can only move on to providing insights if they’re not spending all their time building reporting, but accurate reporting is a prerequisite to insights.
A data team that spends all their time producing numbers that already exist for the sole purpose of getting stakeholder buy-in or data tool adoption will quickly find the organization frustrated, as they will not have added new value to the business. Being data-driven means you’ve crossed into a place where decisions are influenced by data, not one where you simply find data that matches a goal.
Predictions
Mature data analyses use predictions to help drive the business forward.
A product manager who can estimate the financial impact, both in cost and potential return, of developing a new feature can make a much stronger case for prioritization than a product manager who has a gut feeling and crossed fingers. The same is true throughout the organization. If the Financial Planning and Analysis team can predict revenue, then the Support team can predict the hiring required to support all customers, and the recruiting team can predict what hiring and onboarding timelines look like for those support engineers.
An organization that is empowered with the ability to predict performance through advanced analyses is a data-driven organization. And because it has reporting in place to track against those predictions, it has the mechanisms to react when reality differs from them and can adjust appropriately.
How do we mature data teams?
I see you nodding your head in agreement. Hopefully, by now, you've estimated where your team is in this framework, and you're wondering how you can help them move up to the next level.
Invest in your team
Data teams tend to be 2-8% of your organization, and data teams do scale with organization headcount. Your data team will fail if you set them up for failure through understaffing. The company will be frustrated with the team and default to the tools they've always known and loved (spreadsheets - and I hate spreadsheets).
Once you're appropriately staffed, make sure your team is using the right tools, technologies, and processes. At GitLab, we firmly believe in DataOps and that analytics is a subfield of software engineering. Many data analysts are coming from old models where version control, the command line, and checking logs are foreign ideas. Ensure your team is using modern technologies and leveling up along the way.
Empower everyone in your organization with data
Allow all team members to find and build the reporting they need to do their jobs. By empowering them to self-serve the reporting they need, they can gather their own facts and free up the data team to move into the next tier of analysis. Allowing your data team to grow and mature means putting other people in positions to access and analyze the data that they need daily.
Accept that the margin of error is larger on reporting when it's not produced by a member of the data team. It is more important for the data to be directionally correct and accessible than perfect and bottlenecked.
This does require trusting that reporting consists of facts. Data are not opinion-based. Reporting provides you with the answers, and the person or people analyzing them can formulate opinions, but reporting itself is not opinionated.
Speed to Value
The sooner there is confidence in data and your data organization through reporting, the sooner your team can start providing value through insights. Part of how we can achieve that speed is by leveraging open source analytics. Many data teams are working through the same or similar questions, and open sourcing and reusing things like dbt packages can help minimize the time spent reinventing the reporting wheel.
The best practices of software can help make sure a team maintains their velocity. With data quality and freshness testing, alerting, and documentation in a tool like dbt, data teams can be proactive rather than reactive, setting them up for better success.
Data is an incredible tool, but the road to maturity can be bumpy. With a strong team, you can create a data-driven organization and quickly find yourself seeing the team's value.
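To show what data quality and freshness testing means in practice, here is a minimal sketch in plain Python. It is not dbt's API: dbt expresses checks like these declaratively, while the `check_table` helper, the row layout, and the 12-hour threshold below are all assumptions made up for this example.

```python
from datetime import datetime, timedelta


def check_table(rows, key, loaded_at_field, now, max_age):
    """Return a list of failure messages; an empty list means all checks pass."""
    failures = []

    # Quality: the key column should have no missing and no repeated values.
    keys = [row.get(key) for row in rows]
    if any(k is None for k in keys):
        failures.append(f"{key} has nulls")
    non_null = [k for k in keys if k is not None]
    if len(non_null) != len(set(non_null)):
        failures.append(f"{key} has duplicates")

    # Freshness: the newest row should have been loaded recently enough.
    newest = max((row[loaded_at_field] for row in rows), default=None)
    if newest is None or now - newest > max_age:
        failures.append("data is stale")

    return failures


# Hypothetical rows from a hypothetical warehouse table.
now = datetime(2020, 1, 2, 12, 0)
rows = [
    {"id": 1, "loaded_at": datetime(2020, 1, 2, 6, 0)},
    {"id": 2, "loaded_at": datetime(2020, 1, 1, 6, 0)},
    {"id": 2, "loaded_at": datetime(2019, 12, 31, 6, 0)},
]
print(check_table(rows, "id", "loaded_at", now, timedelta(hours=12)))
```

Running checks like these on a schedule and alerting on any non-empty result is what lets a team hear about a broken load from its own tests instead of from a stakeholder.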
Special thanks to Taylor Murphy and Claire Carroll for helping me develop my thoughts on the subject and reading early drafts of this framework.