We're happy to help you achieve your goals with Data. Most of our work is driven through our Data Fusion Teams, but we do reserve some capacity to work on requests not linked to these initiatives. Here's the process to follow to create a new Data issue:
New Issue
Request Type | Template To Choose |
---|---|
New Dashboard | 'Visualization or Dashboard - New Request' |
Add Data Source | 'New Data Source' |
Data Export | 'Data Export Request' |
New KPI | 'KPI Template' |
Data Quality Issue | Coming soon |
Not all data solutions require the same level of quality, scalability, and performance so we have defined a Data Development framework to help match required outcomes with level of investment. The Data Team works with all teams to build solutions appropriate to the need, but focuses on Trusted Data using Trusted Data Development.
Experimentation is a great approach to performing Explorational Data Development. Oftentimes, there can be multiple solutions to solve a business problem and we need a process to efficiently evaluate the different approaches and ultimately select a solution to promote to a Trusted Data solution. We use Design Spikes to facilitate experimenting. Design Spikes are particularly useful when the proposed solutions result in breaking changes or significant changes to the overall design, structure, and computing of the data tech stack.
The below steps should be followed when performing a Design Spike:
Must Have or Nice to Have.
Must Have or Nice to Have requirements. We should put this analysis into a document along with a recommended solution based on the results of the Design Spike. We can then submit this to the respective DRIs and Stakeholders on the project for review, feedback, and final decisions on how we will proceed with the use case.
The Data Team, like the rest of GitLab, works hard to document as much as possible. We believe this framework for types of documentation from Divio is quite valuable. For the most part, what's captured in the handbook are tutorials, how-to guides, and explanations, while reference documentation lives within the primary analytics project. We have aspirations to tag our documentation with the appropriate function as well as clearly articulate the assumed audiences for each piece of documentation.
As a central shared service with finite time and capacity and with a responsibility to operate and develop the company's central Enterprise Data Warehouse, the Data Team must focus its time and energy on initiatives that will yield the greatest positive impact to the overall global organization towards improving customer results.
The Data Team uses a Value Calculator to quantify the business value of new initiatives (issue, epic, OKR, strategic project) to enable prioritization and ranking of the Data Team development queue. The Value Calculator provides a uniform and transparent mechanism for ranking and enables all work to be evaluated on equal terms. The value calculator approach is similar to the RICE Scoring Model for Product Managers and the Demand Metric Prioritization Model for Marketing.
Every day in Data brings a new challenge or opportunity. However, the Data Team strives to spend the majority of its time developing and operating the Enterprise Data Warehouse and related systems, keeping fresh data flowing through the system, regularly expanding the breadth of data available for analysis, and delivering high-impact strategic projects. Our standing priorities are listed below.
Rank | Priority | Description | Target Allocation |
---|---|---|---|
1 | Production Operations | Activities required to maintain efficient and reliable data services, including triage, bug fixes, and patching to meet established Service Level Objectives. | 10-20%, though will fluctuate as driven by incident frequency and complexity |
2 | Data Team OKRs | The Data Team identifies 3-4 strategic-level OKRs per quarter, primarily focused on core infrastructure and data development that will be beneficial to the entire company. | 60-75%, though this will fluctuate as driven by larger Functional Team OKRs |
3 | Other | Other work, including Functional Team OKRs, as prioritized and ranked using the Value Calculator | 15-25% |
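As a rough illustration of how the target allocations in the table could be checked against actual time spent, here is a minimal Python sketch. The `Target` class, the `check_allocation` function, and the idea of feeding in percentages per priority are assumptions made for this example; the handbook does not describe such a tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    priority: str
    low: float   # lower bound of the target allocation, in percent
    high: float  # upper bound of the target allocation, in percent


# Ranges copied from the standing priorities table above.
TARGETS = [
    Target("Production Operations", 10, 20),
    Target("Data Team OKRs", 60, 75),
    Target("Other", 15, 25),
]


def check_allocation(actual: dict[str, float]) -> dict[str, str]:
    """Compare the actual share of capacity (in percent) with each target range."""
    report = {}
    for target in TARGETS:
        share = actual.get(target.priority, 0.0)
        if share < target.low:
            report[target.priority] = "below target"
        elif share > target.high:
            report[target.priority] = "above target"
        else:
            report[target.priority] = "within target"
    return report
```

Because the table notes that allocations fluctuate, a result of "above target" or "below target" is better read as a prompt for discussion than as a failure.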
We use scoped labels in GitLab to track our issues across these priorities.
In rare situations, established SLOs do not meet turnaround needs, and in these cases the Data Team provides an expedite response capability. The Data Team will provide a date estimate if an expedited request cannot be handled per the expedite response SLO.
The calculator below is based on the following Value Calculator spreadsheet. Please select the values below to define the value of new work.
Our planning process is called the Planning Drumbeat and it encompasses Quarterly Planning and Milestone Planning. The Planning Drumbeat is one of the most important activities the Data Team performs because it helps us align our work with the broader company, while remaining agile enough to manage shifting business priorities.
Business Technology developed Rolly Bot to automate the creation and dissemination of weekly status updates. By using their tool, the Data Team is able to send out regular updates about their OKRs via email and Slack with minimal time commitments from team members. One roll up is generated for the Data Fusion team and a second is created for the Data Platform team.
Process:
Data-Platform-weekly-rollup
Data-Fusion-weekly-rollup
Data-Science-weekly-rollup
Data-Collaboration-weekly-rollup
Note: The release date will often be the next milestone based on our milestone planning process, but it could be sooner or later than that, depending on the specific KR.
Current limits:
There are three general types of issues:
Not all issues will fall into one of these buckets but 85% should.
Some issues may need a discovery period to understand requirements, gather feedback, or explore the work that needs to be done. Discovery issues are usually 2 points.
Introducing a new data source requires a heavy lift of understanding that new data source, mapping field names to logic, documenting those, and understanding what issues are being delivered. Usually introducing a new data source is coupled with replicating an existing dashboard from the other data source. This helps verify that numbers are accurate and the original data source and the data team's analysis are using the same definitions.
This umbrella term helps capture:
It is the responsibility of the assignee to be clear on what the scope of their issue is. A well-defined issue has a clearly outlined problem statement. Complex or new issues may also include an outline (not an all-encompassing list) of what steps need to be taken. If an issue is not well-scoped when it is assigned, it is the responsibility of the assignee to understand how to scope that issue properly and approach the appropriate team members for guidance early in the milestone.
Incidents are times when a problem is discovered and some immediate action is required to fix the issue. When this happens, we make an Incident Issue in the Data Team Project. The process for working through incidents is as follows:
Data Team Incidents can be reviewed on the Incident Overview page within the main project.
Stage (Label) | Responsible | Description | Completion Criteria |
---|---|---|---|
workflow::1 - triage | Data | New issue, being assessed | Item has enough information to enter problem validation. |
workflow::2 - validation | Data, Business DRI | Clarifying issue scope and proposing solution | Solution defined with sign off from business owners on proposed solution that is valuable, usable, viable and feasible |
workflow::3 - scheduling | Data | Waiting for scheduling | Item has a numerical milestone label |
workflow::4 - scheduled | Data | Waiting for development | Data team has started development |
workflow::5 - development | Data | Solution is actively being developed | Initial engineering work is complete and review process has started |
workflow::6 - review | Data | Waiting for or in Review | MR(s) are merged. Issues had all conversations wrapped up. |
workflow::X - blocked | Data, Business DRI | Issue needs intervention that assignee can't perform | Work is no longer blocked |
Generally, issues should move through this process linearly. Some templated issues will skip from triage to scheduling or scheduled.
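The stage progression described above can be sketched as a small transition check. This is only an illustration: the function name `is_expected_transition`, the `templated` flag, and the rule that any stage may move to or from the blocked label are assumptions, not part of the documented process.

```python
# Stage labels in the order given in the workflow table.
STAGES = [
    "workflow::1 - triage",
    "workflow::2 - validation",
    "workflow::3 - scheduling",
    "workflow::4 - scheduled",
    "workflow::5 - development",
    "workflow::6 - review",
]
BLOCKED = "workflow::X - blocked"

# Templated issues may skip from triage straight to scheduling or scheduled.
TEMPLATED_SKIPS = {
    ("workflow::1 - triage", "workflow::3 - scheduling"),
    ("workflow::1 - triage", "workflow::4 - scheduled"),
}


def is_expected_transition(current: str, new: str, templated: bool = False) -> bool:
    """Return True if moving from `current` to `new` matches the documented flow."""
    for label in (current, new):
        if label != BLOCKED and label not in STAGES:
            raise ValueError(f"unknown workflow label: {label}")
    if BLOCKED in (current, new):
        # Assumption: an issue can become blocked or unblocked at any stage.
        return True
    if STAGES.index(new) == STAGES.index(current) + 1:
        return True  # the normal linear step to the next stage
    return templated and (current, new) in TEMPLATED_SKIPS
```

A check like this would only flag unusual moves for a human to look at, since the text says issues "generally" move linearly rather than always.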
Issue pointing captures the complexity of an issue, not the time it takes to complete an issue. That is why pointing is independent of who the issue assignee is.
Weight | Description |
---|---|
Null | Meta and Discussions that don't result in an MR |
0 | Should not be used. |
1 | The simplest possible change including documentation changes. We are confident there will be no side effects. |
2 | A simple change (minimal code changes), where we understand all of the requirements. |
3 | A typical change, with understood requirements but some complicating factors |
5 | A more complex change. Requirements are probably understood or there might be dependencies outside the data-team. |
8 | A complex change, that will involve much of the codebase or will require lots of input from others to determine the requirements. |
13 | It's unlikely we would commit to this in a milestone, and the preference would be to further clarify requirements and/or break into smaller Issues. |
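The weight table above can be expressed as a simple check, for example when reviewing issues before a milestone. This is a hedged sketch: the `review_weight` function and its three return values are hypothetical, and the decision to treat any weight outside the table as invalid is an assumption.

```python
from typing import Optional

# Weights listed in the table above; Null (None) and 0 are handled separately.
ALLOWED_WEIGHTS = {1, 2, 3, 5, 8, 13}


def review_weight(weight: Optional[int]) -> str:
    """Classify an issue weight as 'ok', 'invalid', or 'split'."""
    if weight is None:
        # Null is reserved for meta issues and discussions without an MR.
        return "ok"
    if weight == 0:
        return "invalid"  # the table says 0 should not be used
    if weight not in ALLOWED_WEIGHTS:
        return "invalid"  # assumption: values outside the table are rejected
    if weight == 13:
        # Unlikely to be committed to in a milestone: clarify or break down.
        return "split"
    return "ok"
```

For example, `review_weight(13)` returns `"split"`, signalling that the issue should be clarified or broken into smaller issues before it is scheduled.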
Think of each of these groups of labels as ways of bucketing the work done.
All issues should get the following classes of labels assigned to them:
Optional labels that are useful to communicate state or other priority
Ideally, your workflow should be as follows:
Update the MR with an appropriate template. Our current templates are:
Run any relevant jobs to the work being proposed
Remove the Draft: label, mark the branch for deletion, mark squash commits, and assign to the project's maintainer. Ensure that the attached issue is appropriately labeled and pointed.
If Draft: is still in the title of the MR, then the Maintainer will assign the MR back to the author to confirm that the MR is ready for merge.
Other tips:
The Merge Request Workflow provides clear expectations; however, there is some wiggle room and freedom around certain steps as follows.
/rebase into a comment and GitLab will make this happen automatically, barring any merge conflicts.
Ideally, your workflow should be as follows:
After some time, environments will contain software, code, or components that are no longer needed. It can feel risky to delete software and code, even when it is not being used, appears not to be used, or has been requested to be removed (e.g. user access).
There are multiple reasons to perform deletions, for example:
To address observations and requests, and ensure that deletion will take place in a controlled manner, open an issue with the Cleanup Old Tech template.
Write down what will be deleted and, where possible, link to existing issues.
The Risk Score is built on 2 variables.
Each variable is scored from 1 to 3.
Probability | Score |
---|---|
Low | 1 |
Medium | 2 |
High | 3 |
Impact | Score |
---|---|
Negligible | 1 |
Lenient | 2 |
Severe | 3 |
Probability * Impact = Risk Score
Probability \ Impact | Negligible | Lenient | Severe |
---|---|---|---|
Low | 1 | 2 | 3 |
Medium | 2 | 4 | 6 |
High | 3 | 6 | 9 |
Risk Score | Outcome |
---|---|
1 - 2 | Create an MR and have it reviewed by 2 code owners |
3 - 4 | Create an MR, tag @gitlab-data/engineers with a deadline to object and have it reviewed by 2 code owners |
6 - 9 | Create an MR, to be discussed in the DE-Team meeting and have it reviewed by 2 code owners |
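The risk scoring above can be sketched in a few lines of Python. The function names `risk_score` and `outcome` are hypothetical and chosen for this example; the scores and outcomes are taken from the tables above.

```python
# Scores copied from the Probability and Impact tables.
PROBABILITY = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Negligible": 1, "Lenient": 2, "Severe": 3}


def risk_score(probability: str, impact: str) -> int:
    """Risk Score = Probability * Impact, each scored from 1 to 3."""
    return PROBABILITY[probability] * IMPACT[impact]


def outcome(score: int) -> str:
    """Map a Risk Score to the review process from the outcome table."""
    if score in (1, 2):
        return "Create an MR and have it reviewed by 2 code owners"
    if score in (3, 4):
        return ("Create an MR, tag @gitlab-data/engineers with a deadline "
                "to object and have it reviewed by 2 code owners")
    if score in (6, 9):
        return ("Create an MR, to be discussed in the DE-Team meeting "
                "and have it reviewed by 2 code owners")
    # 5, 7 and 8 cannot be produced by multiplying two scores from 1 to 3.
    raise ValueError(f"not a valid Risk Score: {score}")
```

For instance, a deletion with Medium probability and Severe impact scores 2 * 3 = 6, so `outcome(risk_score("Medium", "Severe"))` points to the DE-Team meeting route.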
Requests to expedite responses, triage issues, or review MRs are rare. Given the Data Team's shared-service model, expediting an item is asking to de-prioritize other work. To request an expedited response:
We regularly measure how satisfied our internal customers are with the services and products we provide.
We encourage everyone to record videos and post to GitLab Unfiltered. The handbook page on YouTube does an excellent job of telling why we should be doing this. If you're uploading a video for the data team, be sure to do the following extra steps:
data as a video tag
bt-data-data-science-interview to help coordinate with your interviewers. Share the Interview Plan with your interviewers through Slack.