Gitlab hero border pattern left svg Gitlab hero border pattern right svg


This is to demo work currently done as part of data team's effort to develop EDW using dimensional modeling.

Dimensional modeling

DM is part of the Business Dimensional Lifecycle methodology developed by Ralph Kimball which includes a set of methods, techniques and concepts for use in data warehouse design.

a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance acces

Dimensional Modeling is business process oriented and can be built in 4 steps:

  1. Choose the business process e.g. track monthly revenue
  2. Declare the grain e.g. per customer
  3. Identify the dimensions
  4. Identify the fact

Fact and dimension tables

Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.

In the simpliest version fact table is a central table and is linked to dimensional tables with foreign keys creating a star schema. Star schema with dimensional tables linking to more dimensional tables are called snowflake schemas, multi fact tables schemas are called galaxies.

First iteration in GitLab DWH

OKR: Reporting on both ARR and Customer counts entirely supported in a Kimball Dimensional Warehouse

The graphical schema of dimensional model developed for calculating ARR/ Customer count

classDiagram fct_charges --|> dim_subscriptions fct_charges --|> dim_accounts fct_charges --|> dim_products fct_charges --|> dim_dates dim_accounts --|> dim_customers dim_subscriptions --|> dim_customers fct_invoice_items_agg <|-- fct_charges fct_charges : FK: invoice_item_id fct_charges : FK: account_id fct_charges: FK: product_id fct_charges: FK: subscription_id, fct_charges: FK: effective_start_date_id, fct_charges: FK: effective_end_date_id, fct_charges : PK: charge_id fct_charges: mrr() fct_charges: rate_plan_name() fct_charges: rate_plan_charge_name() fct_invoice_items_agg: FK: charge_id fct_invoice_items_agg: charge_amount_sum() class dim_accounts{ PK: account_id account_name() country() } class dim_accounts{ PK: account_id FK: crm_id account_name() country() } class dim_products{ PK: product_id product_category() } class dim_subscriptions{ PK: subscription_id FK: crm_id subscription_status() } class dim_dates { PK: date_id } class dim_customers { PK: crm_id customer_id }

More on use and conventions of Kimball methodology in Data Team Handbook here

How to interact with the dim/fct tables - building Data Marts

Fct_ and dim_ dbt models are created in COMMON schema indicating that they are not source related and create basis to build data marts.\

In Snowflake they can be found in ANALYTICS_STAGING schema.

The advantage of dimensional schema is ease of querying and building data marts.

Data mart is where the dimensions labels selection and aggregation can happen.

    COALESCE(dim_customers.merged_to_account_id , dim_customers.CRM_ID) AS customer_id,
    dateadd('month',-1,dim_dates.first_day_of_month) as mrr_month,
     INNER JOIN ANALYTICS.ANALYTICS_STAGING.fct_charges ON fct_charges.effective_start_date_id <= dim_dates.date_id
     AND fct_charges.effective_end_date_id > dim_dates.date_id  AND dim_dates.day_of_month=1
    INNER JOIN ANALYTICS.ANALYTICS_STAGING.fct_invoice_items_agg ON fct_charges.charge_id = fct_invoice_items_agg.charge_id
    INNER JOIN ANALYTICS.ANALYTICS_STAGING.dim_subscriptions ON dim_subscriptions.SUBSCRIPTION_ID = fct_charges.subscription_id
    INNER JOIN  ANALYTICS.ANALYTICS_STAGING.dim_customers ON dim_customers.crm_id = dim_subscriptions.crm_id;

Why is it worth using fact and dim tables

How dimensional modeling can improve weaknesses of what we are using now

GIT is a trademark of Software Freedom Conservancy and our use of 'GitLab' is under license