GitLab uses three distinct but interrelated approaches to build data solutions that help drive insights and business decisions. The approaches are complementary and focused on delivering results at the level of speed, quality, and reliability required by the business, the problem being solved, and the question being asked. They are also evolutionary in nature: a data solution developed at an earlier stage can be improved and enhanced into a later-stage solution if there is sufficient business need to do so. All analysis follows the well-established Data Analysis Process.
The three approaches are "Ad-Hoc", "Business Insights", and "Trusted Data".
| | Ad-hoc | Business Insights | Trusted Data |
|---|---|---|---|
| When To Use | Directional / Urgent Analysis | Routine / Operational Analysis | Mission Critical Analysis |
| Visualization using Sisense | optional | required | required |
| Built Using the Enterprise Dimensional Model | optional | optional | required |
| Registered in the Data Catalog | optional | required | required |
| Follows Trusted Data Development process | optional | optional | required |
| Tested using the Trusted Data Framework | optional | optional | required |
| Auditable w/linkage to source systems | optional | optional | required |
| 'Trusted Data' Branded | N/A | N/A | required |
| 'Business Insights' Branded | N/A | required | N/A |
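The matrix above can also be read as a small lookup structure. The following is a minimal sketch, not an official tool: the names `APPROACHES`, `REQUIREMENTS`, and `required_for` are hypothetical and exist only for illustration, while the values are transcribed from the table.

```python
from typing import Dict, List, Tuple

# Column order used in the table above.
APPROACHES: Tuple[str, ...] = ("Ad-hoc", "Business Insights", "Trusted Data")

# Requirement level per approach, transcribed row by row from the table.
# The "When To Use" row is descriptive rather than a requirement, so it is omitted.
REQUIREMENTS: Dict[str, Tuple[str, str, str]] = {
    "Visualization using Sisense": ("optional", "required", "required"),
    "Built Using the Enterprise Dimensional Model": ("optional", "optional", "required"),
    "Registered in the Data Catalog": ("optional", "required", "required"),
    "Follows Trusted Data Development process": ("optional", "optional", "required"),
    "Tested using the Trusted Data Framework": ("optional", "optional", "required"),
    "Auditable w/linkage to source systems": ("optional", "optional", "required"),
    "'Trusted Data' Branded": ("N/A", "N/A", "required"),
    "'Business Insights' Branded": ("N/A", "required", "N/A"),
}


def required_for(approach: str) -> List[str]:
    """Return the criteria marked 'required' for the given approach."""
    column = APPROACHES.index(approach)  # raises ValueError for an unknown approach
    return [name for name, levels in REQUIREMENTS.items() if levels[column] == "required"]


if __name__ == "__main__":
    for approach in APPROACHES:
        print(approach, "->", required_for(approach))
```

Read this way, Ad-hoc has no required criteria, Business Insights adds visualization, catalog registration, and branding, and Trusted Data requires every criterion except the 'Business Insights' branding.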
Ad-hoc is the typical first step of any analysis effort and results in the delivery of a report or dashboard for one-time or limited use. Ad-hoc development is performed when no existing data solution answers the questions being asked. Code developed for ad-hoc analysis is not written to be leveraged in a long-term solution; rather, it is meant to deliver results quickly. To complete ad-hoc analysis, analysts typically write and run SQL queries against the Enterprise Data Warehouse, extract data to analyze using tools like Sisense or Python, or leverage existing dashboards. At times, new data may need to be sourced from text files, spreadsheets, or other data sources.
Most of the time, the ad-hoc report solves the immediate business need and no further action is required. Sometimes, however, ad-hoc analysis yields results that require additional analysis. At times, the results are important enough to warrant a more reliable solution, at which point a decision is made to create a Business Insights solution or a Trusted Data solution.
Business Insights constitute the majority of solutions where stable and reliable reports are required, but a structured Enterprise Dimensional Model (EDM) is not yet available. Business Insights solutions serve as the single source of truth (SSOT) for their respective metrics and play an important role in the overall reporting landscape.
Business Insights solutions differ from ad-hoc reports because they include quality validations such as data testing, code review, and registration in the Data Catalog. Business Insights solutions may leverage portions of the EDM, but will not be based entirely on it. However, when compared to a Trusted Data Solution, a Business Insights solution lacks complete test coverage and EDM coverage.
Trusted Data delivers the most complete, reliable, and accurate analytics available to an enterprise. Over time, as an organization matures and the value of analytics increases, Trusted Data evolves and development rigor evolves with it, but the core steps remain consistent: requirements gathering, design, iterative wireframing, testing, and operational monitoring.
All Trusted Data solutions must meet the following criteria:
To make data available for Ad-Hoc Data Development, data is made available untransformed in the Snowflake PROD database as a one-to-one copy from the source. Depending on the source and extraction, data is sometimes deduplicated.
There are dedicated schemas available in the PROD database. The schemas for Ad-Hoc Data Development are prefixed with RESTRICTED_SAFE_WORKSPACE_ if they contain MNPI data. To make data available in the PROD database schemas, dbt models are created.
Trusted Data is only available in the PROD database and follows the EDM methodology. There are dedicated schemas available in the PROD database for Trusted Data. The schemas for Trusted Data Development are prefixed with RESTRICTED_SAFE_COMMON_ if they contain MNPI data. To make data available in the PROD database schemas, dbt models are created that transform the data into an Enterprise Data Model (fact and dimension tables).
Because Business Insights Data Development is a combination of Ad-Hoc Data Development and Trusted Data Development, it leverages the schemas of both development methodologies.
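The prefix convention described above lends itself to a simple check. This is an illustrative sketch only: the helper `mnpi_methodology` and the example schema names are hypothetical, and only the two prefixes come from the text. It assumes schema names are compared case-insensitively, as unquoted Snowflake identifiers are.

```python
from typing import Optional

# Prefixes named in the text for schemas that contain MNPI data.
# The mapped labels are the development methodologies described above.
MNPI_PREFIXES = {
    "RESTRICTED_SAFE_WORKSPACE_": "Ad-Hoc Data Development",
    "RESTRICTED_SAFE_COMMON_": "Trusted Data Development",
}


def mnpi_methodology(schema_name: str) -> Optional[str]:
    """Return the methodology whose MNPI prefix matches the schema name, else None.

    None only means that neither MNPI prefix matched; it says nothing more
    about the schema, since the text names only these two prefixes.
    """
    upper = schema_name.upper()  # assumption: case-insensitive comparison
    for prefix, methodology in MNPI_PREFIXES.items():
        if upper.startswith(prefix):
            return methodology
    return None


if __name__ == "__main__":
    # Hypothetical schema names, for illustration only.
    for name in ("RESTRICTED_SAFE_WORKSPACE_EXAMPLE", "RESTRICTED_SAFE_COMMON_EXAMPLE", "OTHER"):
        print(name, "->", mnpi_methodology(name))
```

A Business Insights solution, which draws on both methodologies, may read from schemas matching either prefix.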
In all cases, data is made available in the PROD database across multiple schemas. Data can be read from multiple schemas in both Snowflake and Sisense.