How to leverage distributed engineering teams for rapid response

I am an Engineering Manager on a distributed engineering team at GitLab. Our engineers work in India, Germany, Australia, New Zealand, and the United States, and I am located in the U.S. in Pacific Standard Time (PST). In coordination with other globally distributed engineering teams, we recently responded to an abuse issue that was disrupting legitimate GitLab.com users and required a rapid response.

Global distribution as an advantage

Many managers view global team distribution as a constraint (because synchronous communication becomes more difficult), but it is possible to embrace the constraint and turn it into an advantage. When teams are globally distributed, work can continue around the clock, uninterrupted, which decreases the overall delivery time of projects. I refer to this as "continuous development."

While we don't typically work this way, when problems are pressing, working continuously can be a strategy to shorten the delivery time frame. In this case, two engineers from our team worked on the problem 17 hours apart. This provided some overlap in the afternoon (PST), but for the most part, the engineers were working on the project at different times, which allowed work to progress continuously.
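To make the scheduling arithmetic concrete, here is a minimal sketch of how the shared window and the combined coverage for two engineers could be computed. The UTC offsets (UTC-8 and UTC+9, which are 17 hours apart), the 09:00 to 17:00 working hours, and the dates are assumptions chosen for illustration, not details from the incident.

```python
from datetime import date, datetime, timedelta, timezone


def workday_utc(day, utc_offset_hours, start_hour=9, end_hour=17):
    """Return one local workday as a (start, end) pair of UTC datetimes."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
    end = datetime(day.year, day.month, day.day, end_hour, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def overlap(a, b):
    """Length of time during which both (start, end) windows are open."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))


# Hypothetical engineers whose clocks are 17 hours apart (UTC-8 and UTC+9).
engineer_a = workday_utc(date(2024, 1, 15), -8)
engineer_b = workday_utc(date(2024, 1, 16), 9)

shared = overlap(engineer_a, engineer_b)
# Time during which at least one engineer is working: sum minus the overlap.
covered = (engineer_a[1] - engineer_a[0]) + (engineer_b[1] - engineer_b[0]) - shared
print(f"overlap: {shared}, combined coverage: {covered}")
```

Under these assumed hours, the two workdays share one hour late in the first engineer's afternoon, and together they cover 15 hours instead of 8.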

It requires some extra management compared to the typical workflow, but the effort may be worth the investment if time is critical.

Define clear handoffs

One risk of multiple engineers working continuously and asynchronously is duplicated work caused by unclear separation of work or unclear handoffs. If possible, it is best to separate work so engineers are working in different areas of code, but separating work might not always be feasible or practical. In either case, when an engineer finishes working for the day, they should provide an update describing the work that was completed, any problems impeding progress, and what is left to be done.
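One way to keep these end-of-day updates consistent is to give them a fixed shape. The sketch below is only an illustration: the `HandoffNote` class, its field names, and the sample values are hypothetical and not part of any GitLab tooling. It covers the three items described above: completed work, impediments, and remaining work.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffNote:
    """End-of-day update for the engineer who picks up the work next."""

    author: str
    branch: str
    completed: list = field(default_factory=list)
    blockers: list = field(default_factory=list)
    remaining: list = field(default_factory=list)

    def render(self):
        """Format the note as plain text suitable for an issue comment."""
        sections = [
            ("Completed", self.completed),
            ("Blockers", self.blockers),
            ("Remaining", self.remaining),
        ]
        lines = [f"Handoff from {self.author} (branch: {self.branch})"]
        for title, items in sections:
            lines.append(f"{title}:")
            # An empty section is stated explicitly so nothing looks forgotten.
            lines.extend(f"  - {item}" for item in (items or ["none"]))
        return "\n".join(lines)


# Hypothetical example values.
note = HandoffNote(
    author="engineer-a",
    branch="rapid-response-fix",
    completed=["Added rate limit check to the backend endpoint"],
    remaining=["Wire the new error state into the frontend"],
)
print(note.render())
```

Posting the rendered note in the issue or merge request keeps the handoff visible to everyone, including engineers who are offline when it is written.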

If engineers are working in the same area of code, it should be clearly defined whether they are working in the same branch or in separate branches. If they are working in the same branch, it might make sense for one engineer to maintain the branch and accept merges from other engineers before it is merged into the main development branch.

Agree on interfaces

When distributed engineering teams are working on a project, it is critical to define clear and documented interfaces between systems and components. System interfaces should be documented in a centrally maintained location. If there is a need to change the interface, then everyone affected by the change should be notified.

In retrospect, we lost nearly a day of testing because of confusion about an interface between the frontend and backend of the system. These types of problems tend to be amplified when not all engineers involved in the project are available at the same time, as it may take an entire 24-hour cycle to handle and communicate changes. When a discrepancy is found, the problem should be documented by the engineers currently working and, if possible, a solution proposed.
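As a rough illustration of what a documented interface can look like in practice, the sketch below keeps a contract in one shared place and lists every way a payload differs from it. The contract name, the field names, and the sample payload are hypothetical; they are not the interface from this project.

```python
# Hypothetical contract for a payload sent from the frontend to the backend.
# Keeping it in one shared module gives both sides a single source of truth.
REPORT_CONTRACT = {
    "user_id": int,
    "reason": str,
    "reported_at": str,
}


def find_discrepancies(payload, contract):
    """Return human-readable differences between a payload and a contract."""
    problems = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in payload:
        if name not in contract:
            problems.append(f"undocumented field: {name}")
    return problems


# A payload that has drifted from the contract (illustrative values).
payload = {"user_id": "42", "reason": "spam", "timestamp": "2024-01-15T17:00:00Z"}
for problem in find_discrepancies(payload, REPORT_CONTRACT):
    print(problem)
```

Running a check like this in tests on both sides of the interface surfaces a mismatch as soon as it is introduced, and the list of problems can be pasted directly into the note that documents the discrepancy.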

Place synchronous communication on management

When working concurrently, to help ensure all teams are on the same path, it can be helpful to discuss the project status synchronously. This can be difficult to arrange with distributed engineering teams. On this project, the technical teams met twice weekly for 15-30 minutes. It can be tempting to require team members to work off hours to attend synchronous meetings. I'd recommend fighting this tendency.

It's the responsibility of a manager to ensure effective communication across teams. During rapid-response actions, it's helpful to keep flexible working hours to synchronize with team members across different time zones. I accept working outside my typical hours (knowing I can adjust my hours at other times of the day) to communicate the status of my team synchronously. This also requires the manager to have a more detailed technical understanding of the implementation and status than is normally required, so they can speak on behalf of offline team members.

Instead of requiring synchronous meeting attendance, take good notes and record the meeting so team members in other time zones can review the status and decisions from synchronous meetings.

Trade-offs

In many ways, engineering is the art of balancing trade-offs. Operating in a continuous, globally distributed fashion takes more management and cognitive overhead than typical asynchronous workflows, but when time is a priority, it could decrease the release time on critical projects.

Operating continuously may come at the cost of other management tasks, because compressing the timeline increases the effort required to oversee the rapid-response project. At the end of the rapid-response issue, a retrospective should be held to determine whether the engineering strategy provided the expected results relative to the increased overhead. My recommendation is to be realistic about costs when planning continuous development, even when it provides short-term results.

Read more on leading engineering teams.
