This page is intended to be a cheat sheet or quick start guide to dbt for functional analysts. There is a wealth of documentation available in the Data Team dbt Guide, but it can be hard to sift through for folks who are newer to analytics engineering (ex: functional analysts).
This page aggregates existing resources in order to guide you through the basic steps required to create and test an MR to update a model in dbt.
First things first, you need to get set up with dbt on your local machine. Please read the Configuration section of this handbook page for in-depth instructions. (Pro tip: read the entire section first, then start following the directions). As the section mentions, much of what you need is handled by running the onboarding script.
Can't find your `profiles.yml` file? Go to your home directory and hit `command` + `shift` + `.` to expose your hidden files (ones that have a period as the first character) and reveal the file you need to update. In `profiles.yml`, you will set values like the following:

- `user`: your GitLab email address (ex: `JSMITH@GITLAB.COM`)
- `role`: your Snowflake role, usually first initial + last name (ex: `JSMITH`)
- `database`: your first initial + last name (ex: `JSMITH`)
- targets: the warehouse connections to use (ex: one for `DEV_XS` and one for `DEV_XL`)

To do testing locally, you have your own PROD and PREP databases available for development.
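For orientation, a dbt Snowflake profile with two targets generally follows the shape below. This is a generic sketch based on dbt's documented profile format, not the exact output of the onboarding script; the profile name, account, and schema values are placeholders.

```yaml
# Generic sketch of a dbt Snowflake profile -- the profile name, account,
# and schema are placeholders, not GitLab's actual configuration.
my_profile:
  target: dev_xs
  outputs:
    dev_xs:
      type: snowflake
      account: my_account         # placeholder
      user: JSMITH@GITLAB.COM     # your GitLab email
      role: JSMITH                # your Snowflake role
      database: JSMITH_PROD       # first initial + last name
      warehouse: DEV_XS           # extra-small warehouse
      schema: my_schema           # placeholder
    dev_xl:
      type: snowflake
      account: my_account
      user: JSMITH@GITLAB.COM
      role: JSMITH
      database: JSMITH_PROD
      warehouse: DEV_XL           # extra-large warehouse for big builds
      schema: my_schema
```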
Because of the large data volumes (especially the product models), we strongly recommend that you clone any required dependencies for the model(s) that you are building. DO NOT build dependencies using `dbt run`; you should only do that for models that you have changed. Building dependencies with `dbt run` takes a really long time and consumes a lot of resources because it completely rebuilds the models from scratch.
Here are a few options for cloning dependencies in Snowflake. In these examples, pretend that your role is `JSMITH` and your development databases are `JSMITH_PROD` and `JSMITH_PREP`.
This will clone a single model from the production database into your development database. Cloning a single model only takes a few seconds.
```sql
CREATE OR REPLACE TRANSIENT TABLE {DEV_DATABASE}.{SCHEMA_NAME}.{TABLE_NAME}
  CLONE {PROD_DATABASE}.{SCHEMA_NAME}.{TABLE_NAME};

-- Example
CREATE OR REPLACE TRANSIENT TABLE JSMITH_PROD.common.dim_ping_metric
  CLONE PROD.common.dim_ping_metric;
```
Note: the schema must already exist in your development database in order to clone the model.

```sql
CREATE SCHEMA IF NOT EXISTS {DEV_DATABASE}.{SCHEMA_NAME};

-- Example
CREATE SCHEMA IF NOT EXISTS JSMITH_PROD.COMMON;
```
This will clone all models in a production schema into a schema in your development database. Cloning an entire schema can take several minutes.
```sql
CREATE OR REPLACE SCHEMA {DEV_DATABASE}.{SCHEMA_NAME}
  CLONE {PROD_DATABASE}.{SCHEMA_NAME};

-- Example
CREATE OR REPLACE SCHEMA JSMITH_PROD.COMMON
  CLONE PROD.COMMON;
```
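After cloning, one quick sanity check is to confirm the clone exists and its row count matches production. The table below is just the example model used above:

```sql
-- Confirm the cloned table is present and row counts match production
SELECT COUNT(*) FROM JSMITH_PROD.common.dim_ping_metric;
SELECT COUNT(*) FROM PROD.common.dim_ping_metric;
```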
Once you make the desired changes to the model(s) you are working on, you need to build the model(s) locally to make sure the build succeeds and to do local testing. To do this, use the `dbt run` command. Here are some useful tips:

- Use `+` and `@` to refer to upstream or downstream models
- Use `--target` to specify a `target` (warehouse connection)

```shell
$ dbt run --select my_model                        # run my_model
$ dbt run --select my_model my_model_2             # run my_model and my_model_2
$ dbt run --select my_model+                       # run my_model and all children
$ dbt run --select +my_model                       # run my_model and all parents
$ dbt run --select +my_model+                      # run my_model, all of its parents, and all of its children
$ dbt run --select @my_model                       # run my_model, all parents, all children, AND all parents of all children
$ dbt run --select my_model+ --exclude my_model_2  # run my_model and all children EXCEPT my_model_2
$ dbt run --select my_model --target dev_xl        # run my_model using an XL warehouse (targets defined in profiles.yml)
```
Testing models in dbt just requires you to run `dbt test`. `dbt test` uses the same syntax as `dbt run` (ex: `--select`).

```shell
$ dbt test --select my_model  # run custom tests on my_model
```
Once you are running dbt, linting a model can be done with a single command. Please read the SQLFluff section of the SQL Style Guide for instructions on how to install SQLFluff on your local machine and for more details about the linter settings. When you run the linter, the results will be printed in your terminal. (SQLFluff can also apply automatic fixes via `sqlfluff fix`.)

```shell
$ sqlfluff lint models/path/to/file/file-to-lint.sql
```
Here is the agenda from a dbt working session with the Data team and functional analysts. You can view the recording here; the live demo of updating and testing a model starts at ~30:00.
The Data team has a well-documented MR workflow.
Make sure you use the appropriate MR template based on your use case. The templates also have detailed instructions on testing, pipelines, etc. For the vast majority of use cases, you will use one of the following templates:

- `workspace_product`

Please see the Data Team CI Jobs handbook page for the most up-to-date information and details about the different CI jobs.
There are 2 different ways to kick off a job: either from the MR pipelines page or on a specific pipeline page.
The first thing you will need to do is clone the dependencies required for your model. There are a couple of ways to do this, but the fastest, preferred method is to use ⚙️ dbt Run > 🔆⚡️clone_model_dbt_select.
⚙️ dbt Run > 🔆⚡️clone_model_dbt_select requires you to pass which models to clone using the `DBT_MODELS` variable.

| Variable | Value |
|----------|-------|
| `DBT_MODELS` | `+[changed_model_name]` |

This will clone the specified model and all of its parents. Other options for cloning models are:

- cloning an entire schema by passing the `SCHEMA_NAME` variable

Once the dependencies are cloned, you can actually build the model you are changing. To do this, use ⚙️ dbt Run > 🏗🛺️run_changed_models_sql.
⚙️ dbt Run > 🏗🛺️run_changed_models_sql will run all `.sql` models in the MR diff where the SQL has been edited. Please note that there are actually 3 versions of the `run_changed_models_sql` job, each on a different sized warehouse. Select the job that is best suited for your use case, keeping in mind the time it will take to run the models and the cost of running a larger warehouse.
While there are no required variables for this CI job, there are 1-2 that will be helpful in speeding up the job and testing the upstream/downstream models.

- `REFRESH` with a single space ` ` as the value. (This satisfies the vast majority of use cases; the only time you would not want to pass this variable is on an MR relating to an incremental model.)
- `DEPENDENT_TYPE` with a value of `+` will build the changed model in addition to all downstream dependencies. (This satisfies the vast majority of use cases where you need to test upstream/downstream dependencies.) See more options/details in the Data team documentation.

| Variable | Value |
|----------|-------|
| `REFRESH` | ` ` (a single space) |
| `DEPENDENT_TYPE` | `+` |
Sometimes you need to build a single model when re-running the pipeline or in testing MRs that do not change any SQL files (ex: updates to dbt docs). In this case, you want to use ⚙️ dbt Run > 🐭specify_model.
Like `run_changed_models_sql`, there are different versions of ⚙️ dbt Run > 🐭specify_model, each using a different sized warehouse. Again, you should select the job that is best suited for your use case, keeping in mind the time it will take to run the models and the cost of running a larger warehouse.

This job requires a single variable, `DBT_MODELS`, to specify which model you want to build.

| Variable | Value |
|----------|-------|
| `DBT_MODELS` | `[model_name]` |
Once the models are rebuilt, you will want to do testing against the new version of the models. The CI jobs above clone and build models in a database specific to the MR branch. Functional analysts do not automatically gain access to these databases. In order to grant access, you need to run ❄️ Snowflake > 🔑grant_clones.
**IMPORTANT**: This job requires a single variable, `GRANT_TO_ROLE`, to specify the Snowflake role you want to grant SELECT access to. Snowflake roles are usually first initial + last name (ex: Jane Smith's role would be `JSMITH`). There is an exhaustive list of roles in roles.yml.

| Variable | Value |
|----------|-------|
| `GRANT_TO_ROLE` | `[SNOWFLAKE_ROLE]` |
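For context, granting a role read access to a database in Snowflake conceptually looks like the statements below. This is only an illustration of standard Snowflake grant syntax, not the `grant_clones` job's actual implementation, and `MR_BRANCH_DATABASE` is a placeholder name.

```sql
-- Illustration only: standard Snowflake grants (the CI job's actual
-- implementation may differ). MR_BRANCH_DATABASE is a placeholder.
GRANT USAGE ON DATABASE MR_BRANCH_DATABASE TO ROLE JSMITH;
GRANT USAGE ON ALL SCHEMAS IN DATABASE MR_BRANCH_DATABASE TO ROLE JSMITH;
GRANT SELECT ON ALL TABLES IN DATABASE MR_BRANCH_DATABASE TO ROLE JSMITH;
```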
In order to facilitate self-service on requests to add a new table or column to the Postgres/GitLab.com pipeline, the Data team created a runbook and videos to outline the required steps:
Before assigning any reviewers, make sure that you go through the checklist in the MR templates. This will ensure that you completed all required steps before a member of the Data team starts reviewing the code.
In the `Auditing` section, make sure that all queries refer to the MR branch database. (See instructions on how to gain access to the MR database here.) Other team members (ex: Analytics Engineers) reviewing the MR will also have access to the MR database, and that is how they will review and validate the changes.
Once the MR is ready for review, tag code owners and assign as reviewers (you will see code owners listed in the approval section of the MR).
Once the MR is approved by the required number of code owners (as specified in the approval section of the MR), you can assign to a maintainer for final review and merge.