Primary project: https://gitlab.com/meltano/analytics/
Looker project: https://gitlab.com/meltano/looker/
dbt docs for CloudSQL dbt project: https://meltano.gitlab.io/analytics/dbt/cloudsql/#!/overview
dbt docs for Snowflake dbt project: https://meltano.gitlab.io/analytics/dbt/snowflake/#!/overview
Analysis usually begins with a question. A stakeholder asks a question of the data and analytics team by creating an issue in the Looker project using the appropriate template. The analyst assigned to the project may schedule a discussion with the stakeholder(s) to further understand the needs of the analysis. This meeting, which should be recorded, allows analysts to understand the overall goals of the analysis, not just the singular question being asked. Analysts looking for a place to start the discussion can begin by asking:
An analyst will then update the issue to reflect their understanding of the project at hand. This may mean turning an existing issue into a meta issue or an epic. Stakeholders are encouraged to engage on the appropriate issues. The issue then becomes the SSOT for the status of the project, indicating, among other things, the milestone to which it has been assigned and the analyst working on it. Barring any confidentiality concerns, the issue is also where the final project will be delivered. On delivery, the data and analytics manager will be cc'ed and will provide feedback and/or request changes. When satisfied, the manager will close the issue. If the stakeholder would like to request a change after the issue has been closed, they should create a new issue and link to the closed issue.
The Data and Analytics team can be found in the #analytics channel on Slack.
The data team currently works in two-week intervals, called milestones. Milestones may be three weeks long if they cover a major holiday or if the majority of the team is on vacation. As work is assigned to a person and a milestone, it gets a weight assigned to it.
A weight of NULL (no points) belongs to a meta issue; its sub-issues get the points.
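To illustrate why a meta issue carries no points of its own, here is a minimal Python sketch of tallying a milestone's load. The Issue record, milestone_points, and the milestone names are hypothetical, and the sketch assumes a NULL weight is represented as None and that sub-issues appear in the same list as their meta issue. Skipping NULL weights means the meta issue's scope is counted only once, through its sub-issues.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Issue:
    """Hypothetical stand-in for an issue; not a GitLab API object."""
    title: str
    milestone: str
    weight: Optional[int] = None  # None plays the role of a NULL (meta issue) weight


def milestone_points(issues: Iterable[Issue], milestone: str) -> int:
    """Total the points assigned to one milestone.

    Meta issues have a NULL weight and are skipped, so their scope is counted
    only once, through the weights on their sub-issues.
    """
    return sum(
        issue.weight
        for issue in issues
        if issue.milestone == milestone and issue.weight is not None
    )
```

For example, a meta issue plus two sub-issues weighted 3 and 2 in the same milestone totals 5 points.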
Ideally, your workflow should be as follows:
WIP: and assign it to the project's maintainer once it's ready for merging and further review.
The data team is the first customer of Meltano. Wherever possible, we rely on Meltano for extractors and loaders.
Process for adding a new data source:
SheetLoad is the process by which a Google Sheet can be ingested into the data warehouse. This is not an ideal way to get data into the warehouse, but it may be the appropriate solution at times.
How to use SheetLoad
To gain access to the data warehouse:
We are in the process of moving from GitLab CI to Airflow.
We are in the process of moving from CloudSQL to Snowflake.
An _xf dbt model should be a BEAM* table, which means it follows the business event analysis & model structure and answers the who, what, where, when, how many, why, and how question combinations that measure the business.
source table (can also be called raw table): a table coming directly from the data source, as configured by the manifest. It is stored directly in a schema that indicates its original data source, e.g.
base models: the only dbt models that reference the source table. Base models have minimal transformational logic (usually limited to filtering out rows that have data integrity issues or are actively flagged as not for analysis, and renaming columns for easier analysis); they can be found in the analytics schema; they are used in
end-user models: dbt models used for analysis. The final version of a model will likely be indicated with an _xf suffix when its goal is to be a BEAM* table. It should follow the business event analysis & model structure and answer the who, what, where, when, how many, why, and how question combinations that measure the business.
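The layering conventions for source tables, base models, and end-user models can be checked mechanically. The following is a minimal Python sketch, not part of dbt or of the team's tooling: Model, layering_violations, the "base"/"end_user" layer labels, and the example model and table names are all hypothetical. It encodes two of the conventions described here: only base models read source tables directly, and a model with the _xf suffix should be an end-user model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Model:
    """Hypothetical summary of one dbt model; not part of dbt's own API."""
    name: str
    layer: str  # assumed labels for this sketch: "base" or "end_user"
    sources: List[str] = field(default_factory=list)  # source (raw) tables read directly


def layering_violations(models: Dict[str, Model]) -> List[str]:
    """Return one message per broken layering convention, in model order."""
    problems: List[str] = []
    for model in models.values():
        # Only base models may reference a source table directly.
        if model.sources and model.layer != "base":
            problems.append(f"{model.name}: only base models may read source tables")
        # The _xf suffix marks the final, analysis-ready (BEAM*) end-user model.
        if model.name.endswith("_xf") and model.layer != "end_user":
            problems.append(f"{model.name}: _xf models should be end-user models")
    return problems
```

Keeping the check on plain data, rather than parsing SQL, keeps the sketch self-contained; a real check would need to extract each model's sources from the project itself.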
Looker is GitLab's data visualization tool. Many modern data visualization tools require analysts to write SQL queries; Looker's unique advantage lies in its modeling layer, which allows non-technical end users to build their own data analysis in a drag-and-drop user interface. While this means that the initial configuration (e.g., setting up a new data source) takes longer than just querying a table would, once the initial configuration is done, you have a new data set (a Looker explore) available for all users to take advantage of. The data team aims for Looker to be the SSOT for all of GitLab.
To get initial Looker access, please create a new access issue following Security's procedures in this project. There are multiple levels of user access: View-Only, Explorer, and Developer. View-Only users can consume all existing visualizations and reporting. Explorers can manipulate Looker explores (data sets) to build their own looks and dashboards (analyses). Developers can edit LookML, Looker's proprietary modeling language.
Users coming from Redash or another query-based data visualization tool, especially those with a strong familiarity with SQL, may find it uniquely frustrating how long it can take to answer a "simple" question when answering it requires new data. Any new data source needs to be brought into the data warehouse, modeled in dbt following the team's dbt coding conventions, modeled in a LookML view, and added to a new or existing explore before an analysis can be built on top of it. While this initial configuration might seem like a slog, it moves all of the analysis configuration up front, making the explore you build usable not just for you, but for future users with related questions too.
When adding fields to an _xf view, be sure to use sets to limit the fields exposed in explores that already reference that view, so as not to accidentally clutter existing explores with new or irrelevant data.
Field labels should spell out the full name followed by the acronym in parentheses; for example, Technical Account Manager should be Technical Account Manager (TAM). This makes it easier for users to search for fields within Looker and avoids ambiguity.
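The Technical Account Manager (TAM) labeling example can be applied consistently with a small lookup. This is a minimal Python sketch under stated assumptions: KNOWN_ACRONYMS and field_label are hypothetical helpers, not part of Looker, and the acronym table is maintained by hand because acronyms are not always a name's initials.

```python
from typing import Dict, Optional

# Hypothetical, hand-maintained table of acronyms to surface in field labels.
KNOWN_ACRONYMS: Dict[str, str] = {
    "Technical Account Manager": "TAM",
}


def field_label(name: str, acronyms: Optional[Dict[str, str]] = None) -> str:
    """Spell out the full name and append its acronym in parentheses, if one is known."""
    table = KNOWN_ACRONYMS if acronyms is None else acronyms
    acronym = table.get(name)
    return f"{name} ({acronym})" if acronym else name
```

Names without a known acronym are returned unchanged, so the helper can be run over every field label without special-casing.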