GitLab deploys two distinct but interrelated approaches to building data solutions that drive insights and business decisions. The approaches are complementary and evolutionary in nature: each delivers results at the level of speed, quality, and reliability required by the business, the problem being solved, and the question being asked, and a data solution developed at an earlier stage can be improved and enhanced into a later stage if there is sufficient business need to do so. All analysis follows the well-established Data Analysis Process.
The two approaches are:

- Ad-hoc development, done in the `WORKSPACE` schemas in Snowflake and in the Ad-hoc folders in Tableau.
- Trusted Data development, done in the `SPECIFIC` schemas in Snowflake and in the Trusted Data folders in Tableau.
| | Ad-Hoc Data Development | Trusted Data Development |
|---|---|---|
| When To Use | Prototyping / Directional / Urgent Analysis | Mission Critical Analysis / Operational Analysis |
| Manually adding data | optional | N/A |
| Creating own data structures | optional | N/A |
| Visualization using Sisense | optional | required |
| Built Using the Enterprise Dimensional Model | optional | optional |
| Built Using Data from the | | |
| Registered in the Data Catalog | N/A | required |
| Follows Trusted Data Development process | N/A | required |
| Tested using the Trusted Data Framework | N/A | required |
| Auditable w/ linkage to source systems | N/A | required |
Ad-hoc development delivers a report or dashboard for one-time or limited use; it can also deliver a prototype or first iteration of a data solution that is not yet mature enough to be a long-term Trusted Data Solution. Ad-hoc development is performed when no existing data solution answers the questions being asked. Code developed for one-time or limited-use ad-hoc analysis is not written to be leveraged in a long-term solution; rather, it is a means to deliver results quickly. Code developed for prototypes and first iterations of data solutions can be leveraged in a long-term Trusted Data Solution.
To complete ad-hoc analysis, Analysts typically write and run SQL queries against the Enterprise Data Warehouse and extract data to analyze using tools like Sisense or Python. Analysts and Analytics Engineers can also complete ad-hoc analysis using dbt, prototyping data solutions that can later be leveraged in long-term Trusted Data Solutions. At times, new data may need to be sourced from text files, spreadsheets, or other data sources. In many cases, the ad-hoc report solves the immediate business need and no further action is required. Sometimes, however, ad-hoc analysis yields results that require additional data modeling or dashboard development. In these cases, a more robust and trustworthy solution can be developed using Trusted Data Development.
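As an illustration, an ad-hoc analysis often amounts to a single aggregate query against the warehouse. The sketch below shows the general shape; the schema, table, and column names are hypothetical stand-ins, not actual EDM objects:

```sql
-- Hypothetical ad-hoc query: monthly active accounts over the last year.
-- All object and column names here are illustrative assumptions.
SELECT
    dim_date.first_day_of_month,
    COUNT(DISTINCT fct_usage.account_id) AS active_accounts
FROM prod.common.fct_usage
JOIN prod.common.dim_date
  ON fct_usage.date_id = dim_date.date_id
WHERE dim_date.first_day_of_month >= DATEADD(month, -12, CURRENT_DATE())
GROUP BY 1
ORDER BY 1;
```

The result set could then be pulled into Sisense or a Python notebook for charting and further exploration.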
Ad-hoc development gives ultimate flexibility to prototype data solutions. If a new data set needs to be explored and new transformations need to be built in a fast-paced, iterative way, the Ad-hoc Data Development approach can be used. Because of its flexible nature, not all ad-hoc data development is suitable for making mission-critical decisions; it is often a first step toward maturing a data solution. Care and caution should be taken when using ad-hoc data solutions to inform business decisions.
Ad-hoc development is done either in the `WORKSPACE` schemas or in the `EXPLORATIONAL` schemas. The main difference between the two is the permission set. In the `WORKSPACE` schemas, users do not have create/insert/update/drop permissions in Snowflake and must follow standard data team processes, using dbt to update tables in the data warehouse. In the `EXPLORATIONAL` schemas, users do have create/insert/update/drop permissions on the schema. The permissions in the `EXPLORATIONAL` schemas are non-standard and are available to a smaller audience for ad-hoc work.
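In Snowflake terms, the difference could look roughly like the grants below. This is a sketch only; the role and schema names are made up for illustration and do not reflect the actual role hierarchy:

```sql
-- WORKSPACE: read-only for analysts; tables are written only via dbt.
GRANT USAGE ON SCHEMA prod.workspace_marketing TO ROLE analyst_marketing;
GRANT SELECT ON ALL TABLES IN SCHEMA prod.workspace_marketing TO ROLE analyst_marketing;

-- EXPLORATIONAL: analysts can also create and manage their own objects.
GRANT USAGE, CREATE TABLE, CREATE VIEW
  ON SCHEMA prod.explorational_marketing TO ROLE analyst_marketing;
```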
Trusted Data delivers the most complete, reliable, and accurate analytics available to an enterprise. As an organization matures and the value of its analytics increases, Trusted Data and its development rigor evolve, but the core steps remain consistent: requirements gathering, design, iterative wireframing, testing, and operational monitoring. Trusted Data solutions differ from ad-hoc reports because they include quality validations such as data testing, code review, and registration in the Data Catalog. Trusted Data solutions can be built either in the Enterprise Dimensional Model (EDM) located in the `COMMON` schemas or in the `SPECIFIC` schemas that model application data. Both kinds of solutions are trusted because they have data testing, code review, and registration in the Data Catalog.
A `COMMON` solution is suitable for modeling business processes that are cross-application and cross-functional in nature, such as the Order to Cash and Release to Adoption business processes. In those cases, the Kimball methodology allows us to join data together and develop a robust, easy-to-use star schema for analysis. A `SPECIFIC` application data solution is suitable for business processes that are not cross-application or cross-functional, such as the NetSuite data source. Over time, a data model in the `SPECIFIC` schema could mature and require a dimensional model in the `COMMON` schema for cross-functional and cross-application purposes; however, the data in both schemas is trusted.
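A star schema in the `COMMON` layer makes cross-functional questions a matter of joining a fact to its dimensions. The following sketch uses hypothetical Kimball-style object names to show the shape of such a query:

```sql
-- Hypothetical star-schema query over the EDM (COMMON schema).
-- Fact and dimension names are illustrative, not actual EDM objects.
SELECT
    dim_crm_account.account_name,
    dim_date.fiscal_quarter_name,
    SUM(fct_orders.order_amount) AS total_order_amount
FROM prod.common.fct_orders
JOIN prod.common.dim_crm_account
  ON fct_orders.dim_crm_account_id = dim_crm_account.dim_crm_account_id
JOIN prod.common.dim_date
  ON fct_orders.order_date_id = dim_date.date_id
GROUP BY 1, 2;
```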
All Trusted Data solutions must meet the following criteria:
- (DRAFT: Under review with Monte Carlo Project) Trusted Data Tests are created and deployed
In both Ad-hoc and Trusted Data development, data is made available in the `PROD` database via multiple schemas, and it can be read from those schemas in Snowflake and Sisense.
To make data available for Ad-Hoc Data Development, data is transformed and made available in the Snowflake `PROD` database. This data is available in two different kinds of schemas: `WORKSPACE` and `EXPLORATIONAL`. The `WORKSPACE` schemas for Ad-Hoc Data Development are prefixed with `WORKSPACE_`, or with `RESTRICTED_SAFE_WORKSPACE_` if they contain MNPI data. In order to make data available in the `PROD` database schemas, dbt models are created.
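A WORKSPACE table is therefore just the output of a dbt model. A minimal sketch might look like the following, where the model path, schema name, and staging model `stg_campaigns` are all hypothetical:

```sql
-- models/workspace_marketing/wk_campaign_summary.sql (hypothetical)
-- Lands a table in a WORKSPACE schema via the standard dbt process.
{{ config(materialized='table', schema='workspace_marketing') }}

SELECT
    campaign_id,
    COUNT(*) AS touchpoint_count
FROM {{ ref('stg_campaigns') }}
GROUP BY 1
```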
The `EXPLORATIONAL` schemas for Ad-Hoc Data Development are prefixed with `EXPLORATIONAL_`, or with `RESTRICTED_SAFE_EXPLORATIONAL_` if they contain MNPI data. In these schemas, users have read and write permissions so they can create tables, add columns, and prototype data solutions. The schemas are set up at the departmental level, and access is not provisioned at a lower grain than the schema level. Functional ownership of a schema resides with the departmental VP (or equivalent). This means the VP needs to provide approval for an Access Request and carries responsibility for proper usage of the data in the schema (e.g., in the case of MNPI, PII, and other sensitive data).
The `EXPLORATIONAL` schemas are the least governed data schemas, which gives ultimate flexibility. We advise using these schemas on a need-to-use basis.
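Because users have write access, prototyping in an `EXPLORATIONAL` schema can be as direct as creating a table from a query, something the `WORKSPACE` permissions do not allow. The schema and table names below are illustrative:

```sql
-- One-off prototype table; possible in EXPLORATIONAL, not in WORKSPACE.
-- All names are hypothetical examples.
CREATE OR REPLACE TABLE prod.explorational_marketing.campaign_prototype AS
SELECT
    campaign_id,
    MIN(activity_date) AS first_touch_date
FROM prod.explorational_marketing.raw_campaign_upload
GROUP BY 1;
```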
Trusted Data is only available in the `PROD` database. It follows the EDM methodology or the Specific application methodology, and dedicated schemas are available in the `PROD` database for it. The schemas for Trusted Data Development are prefixed with `SPECIFIC`, or with `RESTRICTED_SAFE_SPECIFIC` if they contain MNPI data. In order to make data available in the `PROD` database schemas, dbt models are created that transform the data into the Enterprise Dimensional Model (fact and dimension tables) or into Specific application tables.
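For the EDM case, such a dbt model typically builds a fact or dimension table from upstream models. The sketch below is hypothetical: the model path, schema, and every referenced model and column name are assumptions for illustration:

```sql
-- models/common/fct_orders.sql (hypothetical dimensional dbt model)
-- Joins a staging model to conformed dimensions to produce a fact table.
{{ config(materialized='table', schema='common') }}

SELECT
    orders.order_id,
    dates.date_id       AS order_date_id,
    accounts.account_id AS dim_crm_account_id,
    orders.order_amount
FROM {{ ref('stg_orders') }} AS orders
LEFT JOIN {{ ref('dim_date') }} AS dates
  ON orders.order_date = dates.date_actual
LEFT JOIN {{ ref('dim_crm_account') }} AS accounts
  ON orders.account_id = accounts.source_account_id
```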