Logging Working Group
||September 9, 2019
||#wg_log-aggregation (only accessible from within the company)
||Logging Working Group (only accessible from within the company)
- Monitor and Security teams will be responsible for triaging all logging issues
- Determine the long-term owner of the ultimate logging product and/or process
- Monitor and Security teams will be responsible for defining logging standards
- Clearly define how logging works at GitLab in the runbooks
- Triaging logging issues using the TriageBot
- Security team will triage security and compliance issues to either the security team or a dev team
- For dev teams, the Monitor team will triage issues and assign them to the appropriate dev team.
- TriageBot is modified by the Security team to send Slack notifications if higher-priority logging issues aren't resolved within a specified timeframe.
- Monitor: APM group takes ownership of the overall logging products/process
- Infrastructure will continue to own the system architecture and systems.
- Monitor will slowly build out a comprehensive logging product to meet security and dev needs.
- Logging Standards
- Monitor will own logging standards for dev teams
- Security will own logging standards for compliance purposes
- Monitor and Security teams will both release logging standards
- Transfer all current state of logging to the logging [runbook](https://gitlab.com/gitlab-com/runbooks/tree/master/logging/doc)
- Include all known details of the current system (DELKE remains 3rd-party-only for now)
- Describes what logs are logged where
- Describes current and proposed log standards
- Details how to hook up new analyzers
- Details about how TriageBot works
- Details about who owns what parts of the triaging process
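The escalation rule described above (Slack notifications when higher-priority logging issues stay unresolved past a specified timeframe) can be sketched as follows. This is a minimal illustration, not TriageBot's actual implementation: the priority labels, the deadlines, and the `LoggingIssue` type are all hypothetical, since the page does not specify them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical deadlines per priority label; the working group page
# does not state the actual timeframes or label names.
ESCALATION_DEADLINES = {
    "P1": timedelta(days=1),
    "P2": timedelta(days=7),
}


@dataclass
class LoggingIssue:
    title: str
    priority: str          # assumed label scheme, e.g. "P1"
    opened_at: datetime
    resolved: bool = False


def needs_slack_notification(issue: LoggingIssue, now: datetime) -> bool:
    """True when an unresolved issue with a tracked priority is past its deadline."""
    deadline = ESCALATION_DEADLINES.get(issue.priority)
    if issue.resolved or deadline is None:
        return False
    return now - issue.opened_at > deadline


def overdue_issues(issues, now):
    """Filter the issues that would trigger a notification."""
    return [issue for issue in issues if needs_slack_notification(issue, now)]
```

Priorities without an entry in the table are never escalated, which keeps lower-priority issues out of the notification channel.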
What do other companies do?
- Jobs and files in ParK in AWS (wife just started there)
- Look into Presto (Facebook Distributed SQL)
- Hot-warm Elasticsearch
LabKit is an application logging library that Andrew Newdigate created to help structure and standardize logging (similar to Graphite pings) throughout the Ruby and Go code bases.
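To illustrate what "structured and standardized" logging means in practice, here is a minimal sketch using Python's standard `logging` module. It does not show LabKit's API (LabKit targets Ruby and Go); the field names, including `correlation_id`, and the `build_logger` helper are assumptions chosen for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical field: set through the `extra` argument when logging.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("sidekiq").info("job finished", extra={"correlation_id": "abc123"})` emits one JSON line with the same keys every time, which is what makes logs from different services searchable with the same queries.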
Where do logs go today?
- Unstructured logs (Redis, etc.) sent to GCS and Stackdriver
- Structured logs sent to ELK
- Large structured logs sent to BigQuery (via Stackdriver)
- Missing logs (for systems/services)
- Incomplete logs
- Substandard logs
- Inconsistent formats
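The routing listed under "Where do logs go today?" can be summarized as a small decision function. This is only a sketch of the stated rules: the page does not define what counts as a "large" structured log, so the `LARGE_LOG_BYTES` threshold and the function name are hypothetical.

```python
from enum import Enum


class Sink(Enum):
    GCS_AND_STACKDRIVER = "GCS and Stackdriver"
    ELK = "ELK"
    BIGQUERY = "BigQuery (via Stackdriver)"


# Hypothetical cutoff; the source does not say what "large" means.
LARGE_LOG_BYTES = 1_000_000


def choose_sink(structured: bool, size_bytes: int) -> Sink:
    """Pick a destination following the three routing rules in the list."""
    if not structured:
        return Sink.GCS_AND_STACKDRIVER
    if size_bytes >= LARGE_LOG_BYTES:
        return Sink.BIGQUERY
    return Sink.ELK
```

Writing the rules down this way makes the gaps visible: any system whose logs match none of these paths falls under the missing or incomplete logs noted above.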
Roles and Responsibilities
|Working Group Role
||Security Software Engineer, Automation
||Interim Director of Security
||Staff Engineer, Infrastructure
||Senior Backend Engineer, Security
||Senior Threat Intelligence Engineer
||Backend Engineer, Verify
||Security Engineer, Security Operations
||Security Manager, Security Operations
||Senior Security Analyst, Compliance
Requirements and Considerations
- As a GitLab employee, I know there is a single team I can talk to about all things logging at GitLab.
- As a GitLab employee, I can find who is on the logging team by visiting a single handbook page.
- As a GitLab employee, I can contact the logging team via a dedicated Slack channel.
- As a GitLab employee, I can label GitLab issues for the logging team.
- As a logging team member, I only have one set of logging services/infrastructure to manage and maintain.
- As a paying, self-managed customer, I can easily set up a logging infrastructure for GitLab to log to.
- As a developer or support, I can easily find, search, view, and analyze logs up to 6 months old.
- As a developer or support, I can easily add a new logger without coordinating with any other team.
- As a developer or support, I can easily find one page in the handbook that details what services we log to for each type of data, and what the URLs are for their dashboards.
- As a developer or support, I can easily find one page in the handbook that clearly specifies how to write log messages.
- As a developer or support, I can easily add a log message with all of the required data.
- As a developer or support, I can be confident that no private data is accidentally logged.
- As a developer or support, I use the same logging library per language as every other developer at GitLab.
- As an infrastructure engineer, I can set up multiple sinks for different logs.
- As an infrastructure engineer, I can easily find, search, view, and analyze logs up to 1 week old.
- As an infrastructure engineer, I can easily specify the retention period for each data store and/or data type (for example, security-related data).
- As a security researcher/security engineer/compliance staff, I can easily find, search, view, and analyze logs up to 12 months old.
- As a security researcher, I can easily correlate logs between sources and between GitLab subsystems.
- As a security researcher, et al., I can easily use logs to train ML models with historic data.
- As a security researcher, et al., I can easily use our logs to detect potential security issues before they become problems.
- As a security researcher, et al., I can easily analyze the logs for sensitive or private data.
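The search windows in the requirements above (6 months, 1 week, 12 months) imply a minimum retention period for the underlying store. The sketch below works that out under stated assumptions: the role keys are hypothetical names, and a month is approximated as 30 days because the page does not define it.

```python
from datetime import date, timedelta

# Search windows taken from the user stories; months approximated
# as 30 days (an assumption, not stated in the requirements).
SEARCH_WINDOW_DAYS = {
    "developer_or_support": 6 * 30,
    "infrastructure_engineer": 7,
    "security_or_compliance": 12 * 30,
}


def required_retention_days() -> int:
    """The store must keep logs at least as long as the longest search window."""
    return max(SEARCH_WINDOW_DAYS.values())


def is_searchable(role: str, log_date: date, today: date) -> bool:
    """True when a log from `log_date` falls inside the role's search window."""
    return today - log_date <= timedelta(days=SEARCH_WINDOW_DAYS[role])
```

Under these assumptions the security and compliance requirement drives retention, so a single shared store would need to keep data for roughly a year even though infrastructure engineers only need the most recent week.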