Data & Analytics

Primary project: https://gitlab.com/meltano/analytics/
Looker project: https://gitlab.com/meltano/looker/
dbt docs for CloudSQL dbt project: https://meltano.gitlab.io/analytics/dbt/cloudsql/#!/overview
dbt docs for Snowflake dbt project: https://meltano.gitlab.io/analytics/dbt/snowflake/#!/overview

The Data Analysis Process

Analysis usually begins with a question. A stakeholder asks a question of the data and analytics team by creating an issue in the Looker project using the appropriate template. The analyst assigned to the project may schedule a discussion with the stakeholder(s) to better understand the needs of the analysis. This meeting should be recorded; it lets analysts understand the overall goals of the analysis, not just the single question being asked. Analysts looking for a place to start the discussion can begin by asking:

An analyst will then update the issue to reflect their understanding of the project at hand. This may mean turning an existing issue into a meta issue or an epic. Stakeholders are encouraged to engage on the appropriate issues. The issue then becomes the single source of truth (SSOT) for the status of the project, indicating the milestone to which it has been assigned and the analyst working on it, among other things. Barring any confidentiality concerns, the issue is also where the final project will be delivered. On delivery, the data and analytics manager will be cc'ed and will provide feedback and/or request changes. When satisfied, s/he will close the issue. If the stakeholder would like to request a change after the issue has been closed, s/he should create a new issue and link to the closed issue.

The Data and Analytics team can be found in the #analytics channel on Slack.

Getting Work Done

The data team currently works in two-week intervals, called milestones. Milestones may be three weeks long if they cover a major holiday or if the majority of the team is on vacation. As work is assigned to a person and a milestone, it gets a weight assigned to it.
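
The milestone arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `milestone_end` and the `extended` flag are hypothetical, and the team's actual milestone dates are set in GitLab, not computed by code like this.

```python
from datetime import date, timedelta


def milestone_end(start: date, extended: bool = False) -> date:
    """Return the last day of a milestone that begins on `start`.

    Assumption: a milestone is two weeks long by default and three weeks
    long when `extended` is True (for example, a major holiday falls
    inside it or most of the team is on vacation).
    """
    weeks = 3 if extended else 2
    # Subtract one day so the result is the final day of the milestone,
    # not the first day of the next one.
    return start + timedelta(weeks=weeks) - timedelta(days=1)


if __name__ == "__main__":
    start = date(2019, 1, 7)
    print(milestone_end(start))                 # regular two-week milestone
    print(milestone_end(start, extended=True))  # three-week milestone
```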

Notes on Pointing process

Tips and Tricks about working in the Analytics Project

Ideally, your workflow should be as follows:

  1. Create an issue
  2. Open an MR from the issue using the "Create merge request" button. This automatically creates a unique branch based on the issue name and marks the issue for closure once the MR is merged.
  3. Push your work to the branch
  4. Have it reviewed by a peer
  5. Remove the WIP: prefix from the MR title and assign the MR to the project's maintainer once it's ready for merging and further review.
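
As a small illustration of the last step, the sketch below shows what "removing WIP:" means for an MR title. The helper names `is_wip` and `mark_ready` are hypothetical; in practice the prefix is edited in the GitLab UI, and this code does not call any GitLab API.

```python
WIP_PREFIX = "WIP:"


def is_wip(title: str) -> bool:
    """Return True if the MR title still carries the WIP: prefix."""
    return title.lstrip().upper().startswith(WIP_PREFIX)


def mark_ready(title: str) -> str:
    """Strip a leading WIP: prefix; titles without it are returned unchanged."""
    stripped = title.lstrip()
    if is_wip(stripped):
        return stripped[len(WIP_PREFIX):].lstrip()
    return title


if __name__ == "__main__":
    # Hypothetical MR title, in the shape GitLab generates from an issue.
    print(mark_ready('WIP: Resolve "Add new data source"'))
```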

Other tips:

Extract and Load

The data team is the first customer of Meltano. Wherever possible, we rely on Meltano for extractors and loaders.

Adding new Data Sources

Process for adding a new data source:

Using SheetLoad

SheetLoad is the process by which a Google Sheet can be ingested into the data warehouse. It is not an ideal way to get data into the warehouse, but it may be the appropriate solution at times.

How to use SheetLoad

  1. Add the file to the SheetLoad Google Drive folder, using the appropriate naming convention (described in detail below)
  2. Share the sheet with the SheetLoader runner; its email address is in a GitLab-internal doc
  3. Add the full file name to the extract-ci.yml file
  4. Create dbt base models
  5. Add the file to the data quality test that helps ensure these files are updated monthly.
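
The monthly-freshness idea behind the last step can be sketched as follows. This is not the team's actual data quality test (which lives in the analytics project); the function name `stale_sheets`, the 31-day threshold, and the sheet names are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List

# Assumption: "updated monthly" is approximated as "within the last 31 days".
MAX_AGE = timedelta(days=31)


def stale_sheets(last_updated: Dict[str, date], today: date) -> List[str]:
    """Return the names of sheets whose last update is older than MAX_AGE."""
    return sorted(
        name
        for name, updated in last_updated.items()
        if today - updated > MAX_AGE
    )


if __name__ == "__main__":
    # Hypothetical sheet names and update dates.
    sheets = {
        "sheetload.example_budget": date(2019, 2, 15),
        "sheetload.example_headcount": date(2019, 1, 1),
    }
    print(stale_sheets(sheets, today=date(2019, 3, 1)))
```

A check like this would flag any sheet that has gone more than a month without an update, so that its owner can be reminded.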

Warehouse Access

To gain access to the data warehouse:

Orchestration

We are in the process of moving from GitLab CI to Airflow.

Database

We are in the process of moving from CloudSQL to Snowflake.

Transformation- dbt

Tips and Tricks about Working with dbt

Visualization- Looker

Looker is GitLab's data visualization tool. Many modern data visualization tools require analysts to write SQL queries; Looker's unique advantage lies in its modeling layer, which allows non-technical end users to build their own data analysis in a drag-and-drop user interface. While this means that the initial configuration (e.g., setting up a new data source) takes longer than just querying a table would, once it is done you have a new data set (a Looker explore) available for all users to take advantage of. The data team aims for Looker to be the SSOT for all of GitLab.

Getting Looker Access

To get initial Looker access, please create a new access issue following Security's procedures in this project. There are multiple levels of user access: View-Only, Explorer, and Developer. View-Only users can consume all existing visualizations and reporting. Explorers can manipulate Looker explores (data sets) to build their own looks and dashboards (analyses). Developers can edit LookML, Looker's proprietary modeling language.
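
The three access levels can be pictured as an ordered scale. The sketch below is a mental model only, not Looker's permission system or API: the action names are hypothetical, and it assumes each level also includes the abilities of the levels below it, which the description above implies but does not state.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Access levels, ordered from least to most capable."""
    VIEW_ONLY = 1
    EXPLORER = 2
    DEVELOPER = 3


# Hypothetical action names mapped to the minimum level that allows them.
REQUIRED_LEVEL = {
    "view_dashboards": AccessLevel.VIEW_ONLY,
    "build_looks": AccessLevel.EXPLORER,
    "edit_lookml": AccessLevel.DEVELOPER,
}


def can(level: AccessLevel, action: str) -> bool:
    """Return True if `level` is high enough to perform `action`."""
    return level >= REQUIRED_LEVEL[action]


if __name__ == "__main__":
    print(can(AccessLevel.EXPLORER, "build_looks"))
    print(can(AccessLevel.VIEW_ONLY, "edit_lookml"))
```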

Getting Started with Looker- A special note for users coming from Redash

Users coming from Redash or another query-based data visualization tool, especially those with a strong familiarity with SQL, may find themselves uniquely frustrated at how long it can take to answer a "simple" question when doing so requires new data. Any new data source needs to be brought into the data warehouse, modeled in dbt following the team's dbt coding conventions, modeled in a LookML view, and added to a new or existing explore before an analysis can be built on top of it. While this initial configuration might seem like a bit of a slog, it moves all of the analysis configuration up front, making the explore you build usable not just for you, but for future users with related questions too.

Tips and Tricks about Working in Looker