Sisense is our enterprise standard data visualization application and is the only application approved for connecting to our Enterprise Data Warehouse.
Everyone at GitLab has View-only access to Sisense. Log in using Okta. If you need elevated access, such as Editor permissions to create your own charts, create an access request. Also see the user roles section.
The goal of this section is to empower the reader to build their own Sisense dashboards that answer their questions about GitLab data. The examples given at the end are specific to the Product organization but are generalizable to any other team at GitLab.
The first step to building your own Sisense dashboard is checking that you have the correct permissions.
After logging in to Sisense using Okta, you should see a New Chart button in the top right corner. If you don’t see it, you only have View-only access and you should follow the instructions above to gain Editor access.
Once you can see New Chart, you can start creating your own dashboards! Find Dashboards on the left side nav bar and click the + icon to build one. Make sure to give it a name in order to keep our directory organized.
Once your dashboard is built and named, you can start adding charts by clicking New Chart in the top right. Now you’re ready to start writing queries.
If you need to copy an existing chart, but do not have permission to see the queries it includes, the owner of that dashboard will need to update the dashboard permissions to allow Edit Permissions under the Dashboard Preferences -> Permissions -> Editor section.
The next step to answering your data question is finding the relevant table(s) to query from. This requires knowing some background about our Snowflake data warehouse and the data sources which feed into it. There are 3 general types of data that we store in Snowflake: External, Internal Frontend, and Internal Backend.
External data is all of the data generated by third-party software that we use at GitLab but whose production data we don’t store ourselves. These sources include Salesforce, Zuora, Netsuite, Greenhouse, and BambooHR. We load this data into our data warehouse using APIs.
GitLab.com is a Ruby on Rails app using a Postgres database on the backend. Each time a user on GitLab.com creates a new MR, issue, comment, milestone, etc., they create a new row in the database. The data team has written custom ELT to sync these Postgres tables into our data warehouse where they’re scrubbed for PII and made available to analyze.
For self-managed instances, we try to get weekly anonymized summaries of these backend databases using usage ping.
Additionally, we’ve enabled a tool called Snowplow to track frontend interactions on gitlab.com. Snowplow has automatic page view tracking as well as form and link-click tracking. Snowplow sends metadata along with every event, including information about the user’s session and browser.
Note: Snowplow is also capable of capturing backend events but at the moment we’re primarily using it for JavaScript (frontend) tracking.
What’s the difference between frontend and backend data? Backend data is data that’s already being preserved in the application database because it serves some purpose for the application (MRs, issues, pipelines). In contrast, the primary purpose of frontend tracking is analytics.
dbt Documentation
Our dbt Docs site lists all of the tables available for querying in Snowflake. Many of these are documented at both the table and column level, making it a great starting point for writing a query.
Data discovery is a Sisense feature allowing users with limited SQL skills to create visualisations on specific data sets through a drag and drop interface.
We are currently testing this feature to understand its value. We have created some test discovery datasets that are accessible to everyone in the company and allow them to build their charts without our assistance.
This dataset is built on the model documented here. Users will find all events performed by a specific namespace_id with extra metadata about this namespace.
Which questions can you answer with this dataset?
We have created a dashboard that contains some examples of visualisations that can be achieved.
To access this dataset, you have mainly 2 options:
Click New Chart in the top right corner. Once the chart editor is open, click the Discovery button.
Find the discovery dataset called gitlab_dotcom_usage_data_events in the list menu.
**Question 1:** We can use internal frontend data to answer this question since we're asking about page views. We can query Snowplow page views like this:
Running this query in Sisense’s SQL editor will output a table in the chart section below the query. From there, Sisense offers you a variety of options for visualizing your data. A great way to learn about building charts is to watch this 10-minute Data Onboarding video from Sisense.
This shows that about 110K users create a merge request every month.
Question 3: How many "users" convert from one trial form to another in the last 30 days? (Conversion Funnel) We can use CTEs to query the two Snowplow steps separately, then join them together.
WITH first_trial_form AS (
  SELECT
    user_snowplow_domain_id AS user_id,
    min_tstamp::DATE AS day_of,
    min_tstamp AS sent_at
  FROM analytics.snowplow_page_views_30 -- Change `_30` to `_all` if needed. For a custom date range, filter on derived_tstamp: `WHERE derived_tstamp BETWEEN '2019-10-01' AND '2019-12-01'`
  WHERE page_url_path = '/-/trials/new'
),
second_trial_form AS (
  SELECT
    user_snowplow_domain_id AS user_id,
    derived_tstamp::DATE AS day_of,
    derived_tstamp AS sent_at
  FROM analytics.snowplow_unstructured_events_30
  WHERE event_name = 'submit_form'
    AND page_url = 'https://gitlab.com/-/trials/apply'
)
SELECT
  first_trial_form.day_of,
  COUNT(DISTINCT first_trial_form.user_id) AS "View first trial form",
  COUNT(DISTINCT second_trial_form.user_id) AS "Submit last trial form",
  COUNT(DISTINCT second_trial_form.user_id) * 100 / COUNT(DISTINCT first_trial_form.user_id) AS "Pct"
FROM first_trial_form
LEFT JOIN second_trial_form
  ON first_trial_form.user_id = second_trial_form.user_id
  AND first_trial_form.day_of = second_trial_form.day_of
  AND first_trial_form.sent_at <= second_trial_form.sent_at
GROUP BY 1
ORDER BY 1
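To see why the three join conditions matter, the same funnel logic can be sketched outside the warehouse. This is a minimal Python sketch on made-up events, not the query Sisense runs: the function name `daily_funnel` and the sample data are hypothetical. A viewer only counts as converted when the same user submits the second form on the same day, at or after the page view.

```python
from collections import defaultdict
from datetime import datetime


def daily_funnel(views, submits):
    """Return {day: (viewers, converters, pct)} from (user_id, timestamp) pairs."""
    # Index form submissions by (user, day) so the lookup mirrors the join keys.
    submits_by_key = defaultdict(list)
    for user_id, ts in submits:
        submits_by_key[(user_id, ts.date())].append(ts)

    viewers = defaultdict(set)
    converters = defaultdict(set)
    for user_id, ts in views:
        day = ts.date()
        viewers[day].add(user_id)
        # Same user, same day, and the submit happens at or after the view.
        if any(ts <= submitted for submitted in submits_by_key[(user_id, day)]):
            converters[day].add(user_id)

    return {
        day: (
            len(viewers[day]),
            len(converters[day]),
            100 * len(converters[day]) / len(viewers[day]),
        )
        for day in sorted(viewers)
    }


# Hypothetical sample events.
views = [
    ("u1", datetime(2019, 10, 1, 9, 0)),
    ("u2", datetime(2019, 10, 1, 10, 0)),
    ("u3", datetime(2019, 10, 2, 8, 0)),
]
submits = [
    ("u1", datetime(2019, 10, 1, 9, 5)),  # converts: same day, after the view
    ("u2", datetime(2019, 10, 1, 9, 0)),  # ignored: submitted before the view
    ("u3", datetime(2019, 10, 3, 8, 0)),  # ignored: submitted on a later day
]
funnel = daily_funnel(views, submits)
```

Only `u1` converts in this sample, which illustrates that a user who submits the second form on a later day is not counted by this kind of same-day funnel.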
Question 4: Which "status" tabs ('Open', 'Merged', etc) get clicked on the /merge_requests page? Similar to before, we can use Snowplow data to measure the clicks and page views on the merge_requests page.
WITH link_clicks AS (
  SELECT
    TRY_PARSE_JSON(unstruct_event):"data":"data":"elementId"::VARCHAR AS element_id
  FROM analytics.snowplow_unstructured_events_30
  WHERE event_name = 'link_click'
    AND element_id IN ('state-closed', 'state-all', 'state-opened', 'state-merged')
    AND page_url_path LIKE '%/merge_requests'
)
SELECT
  element_id AS "Tab Name",
  COUNT(*) AS "Total Clicks",
  COUNT(*) / (SELECT COUNT(*) FROM link_clicks) * 100 AS "Percent of Clicks"
FROM link_clicks
GROUP BY 1
ORDER BY 1
Tab Name | Total Clicks | Percent of Clicks |
---|---|---|
state-merged | 200923 | 65.304300 |
state-opened | 50291 | 16.345700 |
state-closed | 31957 | 10.386700 |
state-all | 24501 | 7.963400 |
Of all the clicks on these 4 tabs, 65% are to the "Merged" tab.
If we wanted to know how often these tabs are clicked as a percentage of total page views to the merge requests page, we would make a few modifications to the query:
WITH link_clicks AS (
  SELECT
    TRY_PARSE_JSON(unstruct_event):"data":"data":"elementId"::VARCHAR AS element_id,
    COUNT(*) AS total_clicks
  FROM analytics.snowplow_unnested_events_30
  WHERE event_name = 'link_click'
    AND element_id IN ('state-closed', 'state-all', 'state-opened', 'state-merged')
    AND page_urlpath LIKE '%/merge_requests'
  GROUP BY 1
),
page_views AS (
  SELECT
    COUNT(*) AS total_views
  FROM analytics.snowplow_unnested_events_30
  WHERE event = 'page_view'
    AND page_urlpath LIKE '%/merge_requests'
)
SELECT
  link_clicks.element_id AS "Tab Name",
  link_clicks.total_clicks / page_views.total_views * 100 AS "Percent of Page Views"
FROM link_clicks
INNER JOIN page_views
  ON 1=1
Tab Name | Percent of Page Views |
---|---|
state-merged | 5.423417 |
state-opened | 1.357480 |
state-closed | 0.862600 |
state-all | 0.661344 |
About 5 percent of page views on the merge requests page result in a click to the "Merged" tab.
If a Sisense chart has timed out or is taking a long time to run, the SQL query behind the chart most likely needs to be optimized. Please create an issue in the Data Team project for help.
Some dashboards in Sisense will include an Official Badge (similar to Twitter's Verified Checkmark).
That means these analyses have been reviewed by the data team for query accuracy. Dashboards without the verified checkmark are not necessarily inaccurate; they just haven't been reviewed by the data team. Only members of the Data role can add or remove the Official Badge.
We have two Sisense spaces: GitLab and GitLab sensitive.
They connect to the data warehouse with different users, periscope and periscope_sensitive respectively.
Most work is present in the GitLab space, though some extremely sensitive analyses will be limited to GitLab sensitive. Examples of this may include analyses involving contractor and employee compensation and unanonymized interviewing data.
Spaces are organized with tags. Tags should map to function (Product, Marketing, Sales, etc) and subfunction (Create, Secure, Field Marketing, EMEA). Tags should loosely match issue labels (no prioritization). Tags are free. Make it as easy as possible for people to find the information they're looking for. At this time, tags cannot be deleted or renamed.
Many folks will have some cadence on which they want to see dashboards; for example, Product wants an update on opportunities lost for product reasons every week.
Where it is best that this info is piped into Slack on a regular cadence, you can take advantage of Slack's native /remind to print the URL.
If it does not appear that the dashboard is autorefreshing, please ping a Sisense admin to update the refresh schedule.
There are three user roles (Access Levels) in Sisense: admin, SQL, and View Only.
The current status of Sisense licenses can be found in the analytics project.
Updating Users for Sisense
// Load jQuery into the current page.
let jq = document.createElement("script");
jq.src = "https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js";
jq.onload = function() {
  //your code here
};
document.body.appendChild(jq);
// Once jQuery has loaded, collect the text of every matching element (replace the class name).
$('div.list_class_namer_replace_this').map(function(i, el) {
  return $(el).text();
}).toArray();
These users have the ability to provision new users, change permissions, and edit database connections. (Typical admin things)
Resource: Onboarding Admins
These users have the ability to write SQL queries against the analytics and analytics_staging schemas of the analytics database that underlie charts and dashboards. They can also create or utilize SQL snippets to make this easier. There are a limited number of SQL access licenses, so at this time we aim to limit teams to one per Director-led team. It will be up to the Director to decide on the best candidate on their team to have SQL access.
These users can consume all existing dashboards. They can change filters on dashboards. Finally, they can take advantage of the Drill Down functionality to dig into dashboards.
We have additional roles for further subdividing the Editor role. Certain charts should not be able to be edited by anyone. For example, the Finance KPIs dashboard should only be able to be edited by members of the Data and Finance roles.
All users have View-only access privileges via Okta.
To upgrade a user, in the Sisense UI, navigate to the Roles and Policies section. Then add the user to the relevant group (Admin/Editor) and their Division (e.g. Marketing, Product, etc.) or Department (e.g. UX, Security, etc.).
Users will inherit the highest access from any group they are in. This is why all functions are by default View-only.
There are 2 reasons why you might not find the user in Sisense.
The user will always exist in the Google Group. So in order to get the account in Sisense, the user needs to perform an initial login to Sisense. With that action, the user is created in Sisense (in the GitLab Space) and you can change the account (add to another Space or grant Editor privileges).
This section details the workflow of how to push a dashboard to "production" in Sisense. Currently, there is no functionality to have an MR-first workflow. This workflow is intended to ensure a high level of quality of all dashboards within the company. A dashboard is ready for production when the visuals, SQL, Python, and UX of the dashboard have been peer reviewed by a member of the Data Team and meet the standards detailed in the handbook.
WIP: as the name and add it to the WIP topic
@gitlab-data.
WIP: label
WIP topic
Approved to the MR before closing it
This section details the workflow on how to make updates to existing dashboards that have already been through the Data Team Peer Review process. Once a dashboard is in production, incremental additions to the dashboard can be implemented by the Data Analyst and the DRI/Prioritization Owner without going through the entire New Dashboard Creation and Review Workflow. Please follow the below steps to update an existing dashboard.
Add WIP: to the title of the chart being updated. If a new chart is being added, add WIP: to the title.
Remove WIP: from the new or updated chart and close the MR.
You can also make modifications to charts, snippets, or views with a merge request (MR) to the GitLab Data - Periscope Project. Please see this example or follow the steps below:
The sync from the project repo is bi-directional.
You can request an automatic dashboard refresh by creating a Data team project issue. You can use the issue template to request an automatic refresh over a specified interval for one specific dashboard or a bulk list of dashboards.
The business unit, not the data team, is responsible for embedding these charts in the handbook. Sisense has great embed docs and chat support through the app. There are three main ways to embed charts or dashboards in our handbook.
You can always hardcode HTML in any file type that accepts it. .html files are the obvious example. But markdown (.md) and embedded ruby (.erb) files also allow fallback to regular HTML.
It is quite easy to embed a whole dashboard in the handbook.
To embed a dashboard, you must first make it an Externally Shared Dashboard.
Then, you can add ?embed=true to the URL string to make it an embed link.
Plug the URL into the following:
<iframe class="dashboard-embed" src="https://app.periscopedata.com/shared/string-of-numbers-here?embed=true" height="700"> </iframe>
Note when using dashboards from the SAFE space, making an Externally Shared Dashboard will make the dashboard available to anyone with the link and is not recommended.
We aim to make sure that the dashboard does not require scroll within the handbook, so you will need to adjust the height value appropriately.
There is no way to do that programmatically.
Embedded charts in the handbook should always be generated using the signed_periscope_url helper function. This function will generate a signed URL for you automatically without needing a member of the data team to help you. This is especially convenient when experimenting with passing different data options to the Sisense API. This helper function will return an error if attempting to embed a chart from the SAFE space. For charts in the SAFE space, it is recommended to share only the URL.
Note that your file must end in .erb. If you are working on a file named index.html.md, simply append .erb to create a filename of index.html.md.erb.
You simply pass the data as arguments to the function. It can take any data required by the Sisense API, including sub-arrays and objects. The Sisense documentation has a full list of options available for the embed API.
<embed width="100%" height="400px" src="<%= signed_periscope_url(chart: 6114177, dashboard: 463858, embed: 'v2') %>">
This method does not work in plain Markdown or HTML files because they do not execute code when rendering.
Tip: The embedded charts will not render locally, because the required PERISCOPE_EMBED_API_KEY is set up as a CI variable. To confirm the charts render correctly, launch the Review App within an MR.
data/performance_indicators.yml is the basis for a system that automatically generates handbook pages with performance indicator content on them, according to a convention. If you give an object the periscope_data property with sub-values, the template will automatically generate a signed URL and write the HTML for you. It uses the same signed_periscope_url helper function as above. This helper function will return an error if attempting to embed a chart from the SAFE space. For charts in the SAFE space, it is recommended to share only the URL.
- name: MR Rate
  description: MR Rate is a monthly evaluation of how many MRs on average a Development engineer performs.
  periscope_data:
    chart: 6114177
    dashboard: 463858
    embed: v2
  is_key: true
There are a couple ways to upload data into Sisense through CSVs. All of these methods are documented and can be requested through creating a data team project issue. Please click on the link to review your options.
If you still decide to upload your CSV through Sisense, ensure that you are using the Snowflake database in the New Chart or New Exploration window. Otherwise, you should always default to GitLab_(Use_this_one!) to access the internal GitLab data models. Please see image below to understand how to change databases in Sisense:
Please remember not to upload personal or sensitive data into Sisense through the Sisense CSV Upload functionality since this data will be publicly accessible by all GitLab team members.
There may be times in which you need to pull data out of Sisense.
If this is a one-off case, you can always download a CSV from the UI. Please note that the CSV download has a maximum size of 500MB and the query must complete within 4 minutes.
If you need to regularly pull this data into, for example, a sheet, you can expose the CSV's public URL by going to Edit > Chart Format > Advanced > Expose Public CSV URL.
Then in the sheet you can use =importdata("PUBLICURLHERE")
If the data you are trying to export is not sensitive, is larger than 500MB in size but less than a million rows, then create a data team project issue.
If you need a tool to manipulate text files, please consider this list of command line tools for manipulating structured text data or these great resources.
You may want a dashboard that only filters to the current fiscal quarter or the next fiscal quarter. Sisense's off-the-shelf date filters cannot accommodate custom fiscal years.
In your analysis, add the following: (update the [datevalue] with the date you're looking to have filtered)
LEFT JOIN analytics.date_details ON CURRENT_DATE = date_actual
-- Inclusive bounds keep dates that fall on the first or last day of the quarter.
WHERE [datevalue] <= last_day_of_fiscal_quarter
  AND [datevalue] >= first_day_of_fiscal_quarter
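The filter relies on the date table already knowing where each fiscal quarter starts and ends. As a rough illustration of that calculation, here is a minimal Python sketch. It assumes a fiscal year that starts on February 1 (an assumption made for the example, not something stated above), and the function names are hypothetical. The bounds are treated as inclusive so that the first and last day of the quarter are kept.

```python
from datetime import date, timedelta

# Assumption for this sketch: the fiscal year starts on February 1.
FISCAL_YEAR_START_MONTH = 2


def fiscal_quarter_bounds(d, start_month=FISCAL_YEAR_START_MONTH):
    """Return (first_day, last_day) of the fiscal quarter containing d."""
    # How many months d is past the first month of its fiscal quarter (0, 1 or 2).
    months_into_quarter = ((d.month - start_month) % 12) % 3
    # Work in "months since year 0" so year boundaries need no special casing.
    first_total = d.year * 12 + (d.month - 1) - months_into_quarter
    next_total = first_total + 3
    first_day = date(first_total // 12, first_total % 12 + 1, 1)
    next_first_day = date(next_total // 12, next_total % 12 + 1, 1)
    return first_day, next_first_day - timedelta(days=1)


def in_current_fiscal_quarter(value, today):
    """True when value falls inside today's fiscal quarter (inclusive bounds)."""
    first_day, last_day = fiscal_quarter_bounds(today)
    return first_day <= value <= last_day


bounds = fiscal_quarter_bounds(date(2019, 10, 4))
```

With these assumptions, January belongs to the quarter that began the previous November, which is exactly the case the off-the-shelf calendar filters miss.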
In most cases, you need to filter out the current month from your query and only report on completed months. The current month is incomplete and showing these numbers can be misleading. Please use the below statement in your dashboard query to filter out current month.
WHERE <month_column> < date_trunc('month', CURRENT_DATE)
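The effect of that filter can be illustrated with a small Python sketch. The rows and the function names are hypothetical; `month_start` plays the role of date_trunc('month', ...), and only rows strictly before the first day of the current month survive.

```python
from datetime import date


def month_start(d):
    """Python analogue of date_trunc('month', d)."""
    return date(d.year, d.month, 1)


def completed_months_only(rows, today):
    """Keep rows whose month is strictly before the current month."""
    cutoff = month_start(today)
    return [row for row in rows if row["month"] < cutoff]


# Hypothetical monthly totals; October is still in progress on 2019-10-04.
rows = [
    {"month": date(2019, 8, 1), "merge_requests": 120},
    {"month": date(2019, 9, 1), "merge_requests": 135},
    {"month": date(2019, 10, 1), "merge_requests": 14},
]
complete = completed_months_only(rows, today=date(2019, 10, 4))
```

The partial October row is dropped, so a trend line built from `complete` does not appear to fall off a cliff in the current month.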
All timestamps in Snowflake should be in UTC.
Sisense's display time is set to PT (Pacific Time). This is aligned with the communication guidelines.
When using [created_date=daterange] Sisense uses the current_timestamp and converts it to PT for the comparison. For example, if on October 4th at 13:00 PT (20:00 UTC), you request data from the past 3 days, then Sisense will make the filter from 2019-10-02 07:00:00.000 to 2019-10-05 07:00:00.000. These times are in UTC and correspond to midnight at the start of 2019-10-02 in PT and midnight at the end of 2019-10-04 in PT - i.e. this is 3 full days for PT. If the database stores the values in UTC (which we do), then the comparison is exactly what you want.
The main thing you should worry about as an end-user is formatting the date for display in the chart. This can be done simply by converting the timestamp to PT with this syntax: [created_at:pst]. You can also convert it to a date by appending :date like so: [created_at:pst:date]. This is necessary when comparing dates in a source system such as Salesforce to what you see in Sisense.
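The October 4th example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not how Sisense implements its filters: the function names are hypothetical, and a fixed UTC-7 offset stands in for Pacific Time, which is only valid while daylight saving time is in effect (as it is in the October example).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-7 offset (Pacific Daylight Time), valid for the October example.
PACIFIC = timezone(timedelta(hours=-7))


def daterange_bounds_utc(now_pt, days):
    """Return (start, end) in UTC covering `days` full Pacific days ending today."""
    today_start = now_pt.replace(hour=0, minute=0, second=0, microsecond=0)
    start = today_start - timedelta(days=days - 1)
    end = today_start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def display_date_pt(utc_ts):
    """Mimic the effect of [created_at:pst:date]: UTC timestamp to Pacific calendar date."""
    return utc_ts.astimezone(PACIFIC).date()


now_pt = datetime(2019, 10, 4, 13, 0, tzinfo=PACIFIC)
start_utc, end_utc = daterange_bounds_utc(now_pt, days=3)

# A row stored at 03:00 UTC on October 5th still belongs to October 4th in PT.
late_row = datetime(2019, 10, 5, 3, 0, tzinfo=timezone.utc)
late_row_date = display_date_pt(late_row)
```

The last two lines show why the display conversion matters: without it, a record created in the evening in PT would appear under the following day's date.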
The key things to remember are:
When you have an aggregated date that you want to use as a filter on a dashboard, you have to use the aggregated period as the date range start and one day less than the end of the aggregation as the date range end value.
Your date range start value can be mapped to your date period.
For the date range end, you need to create an additional column in your query to automatically calculate the end date based on the value selected in your aggregation filter. If we've been using sfdc_opportunity_xf.close_date as the date we care about, here is an example: dateadd(day,-1,dateadd([aggregation],1,[sfdc_opportunity_xf.close_date:aggregation])) as date_period_end
Then add the mapping for the date range end.
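The nested dateadd expression reads as "truncate to the period, add one period, subtract one day". A minimal Python sketch of the same arithmetic is shown here for checking expected values by hand. The function names are hypothetical, and weeks are assumed to start on Monday, which may differ from the warehouse setting.

```python
from datetime import date, timedelta


def add_months(first_of_month, months):
    """Shift a first-of-month date by a whole number of months."""
    total = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def period_start(d, aggregation):
    """Truncate d to the start of its day, week, month, quarter or year."""
    if aggregation == "day":
        return d
    if aggregation == "week":
        return d - timedelta(days=d.weekday())  # assumption: weeks start on Monday
    if aggregation == "month":
        return date(d.year, d.month, 1)
    if aggregation == "quarter":
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    if aggregation == "year":
        return date(d.year, 1, 1)
    raise ValueError(f"unsupported aggregation: {aggregation}")


def period_end(d, aggregation):
    """Start of the period, plus one period, minus one day."""
    start = period_start(d, aggregation)
    if aggregation == "day":
        next_start = start + timedelta(days=1)
    elif aggregation == "week":
        next_start = start + timedelta(days=7)
    else:
        months = {"month": 1, "quarter": 3, "year": 12}[aggregation]
        next_start = add_months(start, months)
    return next_start - timedelta(days=1)


december_end = period_end(date(2019, 12, 15), "month")
```

For a close date of 2019-12-15 with a monthly aggregation, the period starts on 2019-12-01 and `december_end` is the last day of that month, which is the value to map to the date range end.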
All queries used to generate charts and snippets can be found in the periscope/master branch of the Sisense project. Enter your model or keyword of interest in the search field to find relevant queries.
Pie charts are widely seen as a poor method of visualizing data. Read this blog post as a primer on why not to use pie charts.
When exporting static charts out of Sisense, use the built-in export functionality instead of taking a screenshot. Exporting produces a higher-quality image with a transparent background. To export an image out of Sisense, select More Options in the top-right corner of any chart and then select Download Image.
Sisense operates as our Business Intelligence Tool and our Single Source of Truth. As our SSOT, Sisense requires us to maintain a very high level of cleanliness, tidiness, and accuracy.
It also requires that the dashboards created and/or approved by the data team are accurate and informative, and it needs some periodic maintenance from the data team members.
Main Principles:
At the moment, for all dashboards, the auto-archival feature is enabled. That means that if a dashboard is not viewed for more than 45 days, it will be automatically archived. An archived dashboard is not deleted and can be unarchived.
Entropy is a natural but avoidable state of a Business Intelligence Tool. In order to act against this tendency, the Data Team operates periodic Maintenance Operations in our Sisense space.
Every month, a Data Team member will take care of the maintenance. This could be proactively claimed one week before the end of each month during one of the Data Ops meetings.
The maintenance task has to be completed in the first week of every month. To do so, you have to open a new issue in the data team project and select the Sisense Cleanup Issue template. This template will give a list of tasks to complete. Once all of them are completed, you can close the issue.
If for any reason the API key needs to be rotated, it must be rotated in the following places:
Team members who work on the performance indicators page generation code will also need it, since they need it to be able to build the pages locally.