For an understanding of where this team is going, take a look at the product vision.
As a member of the Ops Sub-department, you may also like to understand our overall vision.
The following members of other functional teams are our stable counterparts:
Some dedicated Slack channels:
#f_agent_for_kubernetes
#f_terraform_backend
#terraform-provider
#f_autodevops
#f_environment_details_page
(Sisense↗) We also track our backlog of issues, including past due security and infradev issues, and total open System Usability Scale (SUS) impacting issues and bugs.
(Sisense↗) MR Type labels help us report what we're working on to industry analysts in a way that's consistent across the engineering department. The dashboard below shows the trend of MR Types over time and a list of merged MRs.
(Sisense↗) Flaky tests are problematic for many reasons.
We have one team meeting each week. The time alternates every week to accommodate APAC/EMEA and EMEA/AMER. The purpose of this meeting is to share information about the ongoing projects. It also contains general announcements that are important for collaboration.
Meeting format:
If the meeting for the week has already taken place and you would like to add a new item for discussion, create a new section for the next meeting date above the last one and add your item.
Every week the refinement bot assigns a team member as the refinement DRI, who is responsible for refining some issues from the top of the ~"workflow::refinement" list (the list is prioritised top to bottom) in the Milestone Board and another issue of their choice with a ping to the EM and/or PM explaining the reasoning.
The refinement process is described in the issue template.
Sometimes we will encounter issues that need the input of the whole team to be refined and then worked on. Such issues will be selected as a topic for a Technical Discovery meeting. We try to be conscious of sync time, so we expect a maximum of two of these meetings for each milestone. A technical discovery meeting consists of:
The goal of technical discovery meetings is to come up with a concrete technical proposal for the question at hand. We should not force a proposal, but aim to get there and write the conclusion accordingly with potential follow-ups.
Each week the Product Designer hosts a design pairing session with the team on Thursdays at 1:30pm UTC. The goal of the design pairing sessions is to give the team more insight into what Product Design is currently working on, share feedback and questions, as well as give us a space to brainstorm and work together through bigger problems. Anyone is encouraged to propose topics or existing user problems that could use some brainstorming together as a team. A design pairing session consists of:
If there are no topics, the meeting can be cancelled for the week.
The weights we use are:
Weight | Extra investigation | Surprises | Collaboration |
---|---|---|---|
1: Trivial | not expected | not expected | not required |
2: Small | possible | possible | possible |
3: Medium | likely | likely | likely |
5: Large | guaranteed | guaranteed | guaranteed |
Anything weighted 5 or larger should be broken down; such issues should not be marked as ready for development. We would likely turn a 5 into an epic, into a research and implementation issue, or into a technical discovery.
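The weight table and the breakdown rule can be written down as a small lookup. This is only an illustrative sketch: the `Weight` type, the `WEIGHTS` mapping, and the `needs_breakdown` helper are hypothetical names introduced here, not part of any GitLab tooling.

```python
from typing import NamedTuple


class Weight(NamedTuple):
    """One row of the weight table (hypothetical structure for illustration)."""
    name: str
    extra_investigation: str
    surprises: str
    collaboration: str


# Values copied from the weight table in the text.
WEIGHTS = {
    1: Weight("Trivial", "not expected", "not expected", "not required"),
    2: Weight("Small", "possible", "possible", "possible"),
    3: Weight("Medium", "likely", "likely", "likely"),
    5: Weight("Large", "guaranteed", "guaranteed", "guaranteed"),
}


def needs_breakdown(weight: int) -> bool:
    """Anything weighted 5 or larger should be broken down before development."""
    return weight >= 5


if __name__ == "__main__":
    for value, row in WEIGHTS.items():
        status = "break down first" if needs_breakdown(value) else "can be scheduled"
        print(f"{value}: {row.name} -> {status}")
```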
Occasionally, a proof-of-concept (POC) is necessary to determine a feasible technical path. When one is required, the engineer will create a POC issue that contains the context of the research to be conducted along with the goals of the POC. This issue will be scheduled for work before any further breakdown of tasks is performed. Once the technical path is clear, the engineer can proceed to weight the issue and/or break down the issue further to guide implementation. Every POC issue should contain a list of questions we want to answer; the definition of done should include the answers and suggested next steps.
Not all POCs will be successful, and that is OK! Some avenues of research may not be successful, and the POC will have saved us from investing significant time in a solution that will not meet our needs. The goal is early feedback and fast iteration.
We intentionally leave the term "velocity" undefined and do not use it in planning workload capacity for the team.
We leave the question of interpreting summed weights open to each unique situation.
When making decisions about how much work the team can take on for a milestone, we trust individual impressions and instincts reflected in the discussions that take place in the planning issue and the refinement process. The weighting system helps foster these discussions.
The GitLab Terraform Provider is managed by the Environments group.
The issues scheduled for a milestone can be tracked on the Milestone Board.
This board contains all the necessary columns to track the workflow of the team, in particular:
~"workflow::refinement": the list of issues that need to be refined before they can be assigned.
~"workflow::ready for development": the list of issues that are ready to be worked on, both assigned and not assigned to the milestone.
All the columns are prioritised top to bottom.
Once a team member self-assigns an issue on the Milestone Board, issue labels should follow the Engineering Workflow.
For Merge Requests, it's up to the author and the project they are contributing to, to decide if they want to use these ~workflow:: labels. It is not required to use them or keep them synced up with the Issue labels.
Our goal is to move towards a continuous delivery model so the team completes tasks regularly, and keeps working off of a prioritized backlog of issues. We default to team members self-scheduling their work:
workflow:ready for development column and has the current milestone.
~Deliverable issues take priority over any other work, as they are the main focus of each milestone and inform our say-do ratio.
workflow:ready for development issue.
bug or feature categorized issues.
In addition to the self-scheduling of feature development, the manager will from time to time assign bugs, or other work deemed important, directly to a team member.
Our team keeps track of its commitments with say-do ratios. Two metrics are important: say-do and reprioritized say-do.
~Deliverable issues.
~Deliverable label is applied to the upcoming milestone issues by the EM.
~Deliverable for each engineer, this may change milestone by milestone.
~Deliverable label at that point is considered as promised to be delivered and is part of our say-do ratio.
~Deliverable label is removed or the issue is removed from the milestone that issue does not count anymore in the reprioritized say-do metric, but still does count for say-do.
We aim to achieve 100% reprioritized say-do and at least 80% say-do.
~Deliverable issues labelled as such by the 17th of March 2023
~Deliverable issues will not make it, and reasonably before the end of the milestone, we move them to 16.0
Our say-do ratio would be 40% (4 out of 10)
Our reprioritized say-do would be 80% (4 out of 5)
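The arithmetic behind the two ratios can be sketched as follows. The counts used here (10 promised issues, 5 moved out of the milestone, 4 delivered) are inferred from the "4 out of 10" and "4 out of 5" figures above, and the `MilestoneCommitment` type and function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MilestoneCommitment:
    """Counts of ~Deliverable issues for one milestone (hypothetical structure)."""
    promised: int       # issues carrying ~Deliverable at the commitment date
    reprioritized: int  # issues that lost the label or left the milestone
    delivered: int      # promised issues that were actually delivered


def say_do(c: MilestoneCommitment) -> float:
    """Delivered issues over everything originally promised."""
    if c.promised == 0:
        raise ValueError("no promised issues, ratio is undefined")
    return c.delivered / c.promised


def reprioritized_say_do(c: MilestoneCommitment) -> float:
    """Delivered issues over the promised issues that were not reprioritized."""
    remaining = c.promised - c.reprioritized
    if remaining <= 0:
        raise ValueError("no remaining promised issues, ratio is undefined")
    return c.delivered / remaining


if __name__ == "__main__":
    example = MilestoneCommitment(promised=10, reprioritized=5, delivered=4)
    print(f"say-do: {say_do(example):.0%}")                              # 40%
    print(f"reprioritized say-do: {reprioritized_say_do(example):.0%}")  # 80%
```

Reprioritized issues shrink only the denominator of the second ratio, which is why moving work out of the milestone early improves reprioritized say-do but never say-do itself.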
Team members should use their best judgment to determine whether to assign the first review of an MR based on the DangerBot's suggestion or to someone else on the team. Some factors in making this decision may be:
Team members should make their best effort to resolve UX issues as they come up during MR reviews. However, there are times where the changes requested or feedback given would significantly slow down velocity. For the sake of efficiency and iteration, a UX debt issue must be opened to follow up on the feedback.
In these instances, the engineer who authored the original MR should assign themselves the issue and become the DRI to evaluate the UX feedback. This may mean reaching out to the team's Product Designer to ensure the feedback is actionable and resolving the debt is prioritized appropriately during the following milestone planning. For example, for UX debt issues opened in the 16.3 milestone, engineers should evaluate and ensure appropriate prioritization of the issue during the planning of the 16.4 milestone. This does not mean that the issue must be resolved during the 16.4 milestone, but that the issue is placed into the appropriate step of our product development flow, or closed if appropriate.
This helps to ensure that UX debt issues are resolved in a timely manner, keeping with the overall goals of the group and adherence to broader engineering workflows.
The Environments group uses epics to describe features or capabilities that will increase the maturity of the Environments categories over time.
Each epic should be owned by an engineer who is responsible for all technical aspects of that epic. The engineering DRI will work closely with the Product Manager and Product Designer to understand the requirements and create issues that encapsulate the technical work required during the design/solution validation phases and build track of the Product Development Flow. Each issue needs to be weighted and contain enough information in the description area for any other engineer on the team to be able to pick up that work.
For the duration of building the epic, the engineer does not need to be the only person implementing the issues. They should keep watch of the work that is done on the issues so that they can verify that the work is progressing correctly. If there are problems with the work, or lengthy delays, they need to make sure the Product Manager and Engineering Manager are aware.
When work is nearing completion, the engineer should make sure that any additional issues that may have come up during the build process are either addressed, or scheduled for work. Additional issues should be created and added to the epic. This will help to make sure that we do not build up technical debt while building.
Finally, they should also monitor any work that needs to occur while rolling out the Epic in production. If there are rake tasks, database migrations, or other tasks that need to be run, they need to see those through to being run on the production systems with the help of the Site Reliability counterpart.
This places a lot of responsibility with the DRI, but the PM and EM are always there to support them. This ownership removes bottlenecks and situations where only the PM or EM is able to advance an idea. In addition, the best people to decide on how to implement an issue are often the people who will actually perform the work.
To declare ownership, insert DRI: <your-gitlab-handle> at the top of the epic description. Example.
Maintaining a high standard of quality is a critical factor to delivering winning products.
Within the Environments group we use the following processes and best practices to ensure high quality.
The Environments group uses GitLab QA for End-to-End testing. We have guidelines for how our team is leveraging these tests.
In feed_alerts_configure we have a bot that runs tests at this project.
If this bot alerts us to a failed pipeline, we should treat it the same as a broken master branch.
Our target availability is 99.95%.
Each week we receive an Error Budget report in #cd-section on Slack if we are under our target availability.
An engineer might be assigned as a DRI to look into this.
The DRI is neither expected to determine a root cause nor propose a solution on their own.
The DRI should instead reach out to the Scalability:Projections team for support.
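To give a feel for what a 99.95% target leaves as budget, the allowed unavailability over a reporting window can be computed directly. This is a rough sketch: the window lengths used below are assumptions chosen for illustration, not values stated in this document, and `allowed_downtime_minutes` is a hypothetical helper.

```python
def allowed_downtime_minutes(target_availability: float, window_days: int) -> float:
    """Minutes of unavailability permitted by an availability target over a window."""
    if not 0.0 <= target_availability <= 1.0:
        raise ValueError("target_availability must be a fraction between 0 and 1")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - target_availability)


if __name__ == "__main__":
    target = 0.9995  # the 99.95% target from the text
    for days in (7, 28):  # window lengths are assumptions for illustration
        budget = allowed_downtime_minutes(target, days)
        print(f"{days}-day window: {budget:.2f} minutes of budget")
```

Under these assumptions, a 7-day window allows roughly 5 minutes of unavailability and a 28-day window roughly 20 minutes.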
In order to optimize async collaboration across a big team we use issue updates to share progress completed on a specific issue or epic.
Weekly updates on progress and status will be added to each issue by its assignee. A weekly update may be skipped if there was no progress. It's preferable to update the issue rather than the related merge requests, as those do not provide a view of the overall progress. This applies to issues with the labels workflow::in dev or workflow::in review.
The status comment should include what percentage complete the work is, how confident the person is that their estimate is correct, and a brief note on what was done. It's perfectly acceptable to have multiple updates if more than one DRI is working on the issue.
As part of the async update, it's important to verify that the workflow labels on the issue and its related MRs are correctly set.
## Async status update
- Complete: 80%
- Confidence: 90%
- Notes: expecting to go into review tomorrow
To simplify the work of adding and keeping track of async updates, TalTal can be used.
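For anyone writing updates by hand or from a script, the comment template above can be generated with a small helper. This is a minimal sketch and not how TalTal works; `format_status_update` is a hypothetical function, and the range check on the percentages is an added assumption.

```python
def format_status_update(complete: int, confidence: int, notes: str) -> str:
    """Render an async status update comment in the template used above."""
    for label, value in (("complete", complete), ("confidence", confidence)):
        if not 0 <= value <= 100:
            raise ValueError(f"{label} must be between 0 and 100, got {value}")
    return "\n".join(
        [
            "## Async status update",
            f"- Complete: {complete}%",
            f"- Confidence: {confidence}%",
            f"- Notes: {notes.strip()}",
        ]
    )


if __name__ == "__main__":
    print(format_status_update(80, 90, "expecting to go into review tomorrow"))
```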
We want every team member to be advancing in their Career Development.
We follow the Engineering Department Career Development Framework.
We're a highly distributed team. It's hard to find a synchronous call slot that works for everyone, so it's important that our main communication is asynchronous and well-optimized for our team dynamics.
For example, when you refine an issue, you would like to collect input from various team members, domain experts and stable counterparts. Typically, posting a comment that pings them is enough. However, if the topic is complicated, ambiguous or too broad, you are unlikely to get useful and relevant feedback. This frustrates both you and the participants, and should be avoided.
To maximize our asynchronous performance, we should follow the GitLab Communication guidelines. More specifically, the following points are important:
We participate in the OPS showcase initiative. To facilitate the selection of topics and the creation of the issues and content, we have a Showcase DRI who will:
Currently the showcase DRI for FY24Q3 is: @anna_vovchenko
Read our specific GDK instructions as well as our handbook entry on what existing testing does and how to develop features for Auto DevOps.
The Environments group has access to a shared GCP project which can be used for demos, experiments, or to host auxiliary services.
The project ID is deploy-stage-shared-i-e55e01cb and was created and provisioned using the following ARs:
If you need to create permanent infrastructure in that GCP project, it's encouraged to do it with Terraform to easily share and document the setup with the entire group. You can use this GitLab group to host the project.
If the infrastructure is temporary, you can manage it with whichever tools you prefer.
Currently hosted projects:
When you need to create an example project for demonstration, consider having it in the example group instead of your personal namespace.
This allows us to collect all of the knowledge in the same place. Also, this example group has an EEP license by default.