Site Reliability Engineering


Workflow          How may we be of service?; Status
Issue Trackers    Infrastructure: Milestones, OnCall; Production: Incidents, Changes, Deltas; Delivery
Slack Channels    #sre-lounge, #database, #alerts, #production, #g_delivery
Operations        Runbooks (please contribute!); On-call: Handover Document, Reports


Site Reliability is, alongside Site Availability, a primary caretaker of the operational environment, focusing on its uptime through reliability considerations. Whereas Site Availability is focused on the here and now, Site Reliability works on a slightly longer time horizon: what is coming soon.


Site Reliability's guiding principles are efficiency, effectiveness, and frugality. In a sense, this is the team that will make both change management and delta management obsolete. In very colloquial terms, Site Reliability produces well-designed machine parts to replace the duct tape currently holding the environment together.

Key Metrics

Key metrics related to this group include:


Each member of the Site Reliability Team is part of this vision:

Team Members

The following people are members of the Site Reliability Team:

Person                  Role
Dave Smith              Engineering Manager, Production
John Northrup           Site Reliability Engineer
Alejandro Rodríguez     Site Reliability Engineer
Devin Sylva             Senior Site Reliability Engineer
Yun Guo                 Senior Database Engineer
Craig Barrett           Senior Site Reliability Engineer
John T Skarbek          Senior Site Reliability Engineer