PagerDuty can be integrated into different kinds of workflows. When deciding how to integrate PagerDuty into your workflow, consider two things:
- How PagerDuty incidents will be triggered
- How users should respond to PagerDuty incidents
Here we document a few sample workflows for different types of teams:
- Workflow for a DevOps team
- Workflow for a Central Ops organizational structure
- Workflow for a NOC
- Workflow for a support/helpdesk/MSP team
Workflow for a DevOps team
DevOps teams have direct ownership over the services that they build and deploy, meaning they are also responsible for being on-call should any issues arise.
- PagerDuty incidents are immediately assigned to the team who owns the impacted service
- PagerDuty incidents are escalated to back-up on-call responders and managers
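The escalation path described above can be pictured as a ladder of levels, each with a timeout. The following is a minimal sketch of that idea only, not PagerDuty's implementation: the `EscalationLevel` class, the `notified_level` function, the target names, and the timeout values are all hypothetical and chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EscalationLevel:
    """One rung of a hypothetical escalation ladder."""
    targets: list[str]      # who is notified at this level
    timeout_minutes: int    # how long to wait for an acknowledgement


def notified_level(levels: list[EscalationLevel], minutes_unacknowledged: int) -> int:
    """Return the index of the level being notified after the given wait."""
    elapsed = 0
    for index, level in enumerate(levels):
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return index
    # Past every timeout: this simplified model stays on the last level.
    return len(levels) - 1


# Assumed example ladder: owning team first, then back-up, then a manager.
policy = [
    EscalationLevel(["primary on-call"], 15),
    EscalationLevel(["back-up on-call"], 15),
    EscalationLevel(["engineering manager"], 30),
]

# An incident that has gone 20 minutes without acknowledgement.
print(policy[notified_level(policy, 20)].targets)
```

In this model an acknowledgement simply stops the clock, which is why acknowledging an incident halts further escalation.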
How PagerDuty incidents are triggered
For a DevOps team, PagerDuty incidents are mostly triggered automatically either via email or via the API.
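As a rough illustration of an API-based trigger, the sketch below builds a "trigger" event in the shape used by the PagerDuty Events API v2 and defines a helper that would post it. The routing key, summary, source, and dedup key are placeholder values, and the helper names `build_trigger_event` and `send_event` are hypothetical. Check the field names against the current Events API documentation before relying on them.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, summary, source, severity="critical", dedup_key=None):
    """Build the JSON body for a 'trigger' event."""
    event = {
        "routing_key": routing_key,      # integration key of the target service
        "event_action": "trigger",
        "payload": {
            "summary": summary,          # short description of the problem
            "source": source,            # the affected host or component
            "severity": severity,        # critical, error, warning, or info
        },
    }
    if dedup_key is not None:
        # A stable key lets later events refer to the same incident.
        event["dedup_key"] = dedup_key
    return event


def send_event(event):
    """POST the event to the Events API and return the decoded response."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values; a real call needs a valid integration key.
event = build_trigger_event(
    "YOUR_INTEGRATION_KEY",
    summary="High CPU on web-01",
    source="web-01",
    dedup_key="cpu-web-01",
)
print(json.dumps(event, indent=2))
# send_event(event)  # uncomment to actually send the event
```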
How users respond to PagerDuty incidents
Generally, users will acknowledge the PagerDuty notification and begin investigating the problem. Once the problem is resolved, the monitoring tool will send a “resolve” event back to PagerDuty and automatically resolve the incident. Depending on the complexity of the incident, a user may also start a conference bridge, add responders, or add stakeholders to the incident.
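To make the automatic "resolve" step concrete, here is a minimal sketch of the follow-up events a monitoring tool might send, assuming the Events API v2 event format. The function name `build_followup_event` and the key values are hypothetical. The important assumption is that the `dedup_key` matches the one used when the incident was triggered, since that is how the event is tied back to the open incident.

```python
import json


def build_followup_event(routing_key, action, dedup_key):
    """Build an 'acknowledge' or 'resolve' event for an existing incident."""
    if action not in ("acknowledge", "resolve"):
        raise ValueError(f"unsupported follow-up action: {action!r}")
    return {
        "routing_key": routing_key,   # same integration key as the trigger
        "event_action": action,
        "dedup_key": dedup_key,       # must match the triggering event
    }


# Placeholder values: the monitoring tool sees the check recover and resolves.
resolve_event = build_followup_event("YOUR_INTEGRATION_KEY", "resolve", "cpu-web-01")
print(json.dumps(resolve_event, indent=2))
```

The resulting body would be posted to the same Events API endpoint that received the original trigger.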
Workflow for a Central Ops organizational structure
In a Central Operations (Central Ops) organization, the Operations team owns uptime and reliability and is the first-level on-call for all incidents. They are the first line of defense, charged with fixing issues and escalating incidents to the engineering team when their own remediation efforts are not enough.
- PagerDuty incidents are immediately assigned to the Operations group, who will begin investigating and fixing the issue
- The Operations group has their own internal escalation process but also escalates incidents by reassigning them outside of their group to the appropriate engineering team as needed
How PagerDuty incidents are triggered
PagerDuty incidents are triggered by a monitoring tool, event manager tool, or ticketing tool.
How users respond to PagerDuty incidents
The Operations group will acknowledge the PagerDuty incident and begin remediation efforts. They will either resolve the PagerDuty incident manually or let the monitoring/ticketing tool automatically resolve it for them. If they are not able to resolve the incident, they will either reassign it to the engineering team that should own it or add the engineering teams as responders so that both groups can collaborate on the remediation.
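The three outcomes in this response flow can be summarized as a small decision function. This is only a sketch of the reasoning in the paragraph above. The `NextStep` enum, the `next_step` function, and its two boolean inputs are hypothetical names introduced for illustration, not part of any PagerDuty API.

```python
from enum import Enum


class NextStep(Enum):
    RESOLVE = "resolve the incident"
    ADD_RESPONDERS = "add engineering teams as responders"
    REASSIGN = "reassign to the owning engineering team"


def next_step(ops_fixed_issue: bool, ops_stays_involved: bool) -> NextStep:
    """Pick the Operations group's next action for an acknowledged incident."""
    if ops_fixed_issue:
        # Resolved manually or automatically by the monitoring/ticketing tool.
        return NextStep.RESOLVE
    if ops_stays_involved:
        # Operations keeps the incident and collaborates with engineering.
        return NextStep.ADD_RESPONDERS
    # Operations hands ownership to the engineering team.
    return NextStep.REASSIGN


print(next_step(ops_fixed_issue=False, ops_stays_involved=True).value)
```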
Workflow for a NOC
The NOC is responsible for monitoring, investigating, fixing, and triaging all events. If the NOC needs to notify an on-call team to help fix or investigate an issue, they will use PagerDuty to automate the notification and escalation process and to identify who is currently working on the issue.
- The NOC detects an issue that needs to be escalated
- The NOC opens an incident in PagerDuty, which is then assigned to the appropriate team
How PagerDuty incidents are triggered
PagerDuty incidents are generally triggered manually by the NOC via email, the website, the API, or chat:
- Email: Send an email to a PagerDuty email integration address to trigger an incident.
- Website: Click a button in the PagerDuty web app to manually open an incident. The NOC can either open an incident directly for a team or open an incident and add the team as a responder.
- API: Configure a button on your wiki page that, when clicked, triggers a PagerDuty incident.
- Chat: Type a command in Slack or HipChat to trigger an incident for the on-call team.
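As an illustration of the email option, the sketch below composes a message addressed to an email integration address using Python's standard library. The addresses, the SMTP host, and the function names `build_incident_email` and `send_incident_email` are placeholders assumed for this example. The real integration address comes from the service's email integration settings in PagerDuty.

```python
import smtplib
from email.message import EmailMessage


def build_incident_email(sender, integration_address, subject, body):
    """Compose the email that will open an incident when received."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = integration_address
    message["Subject"] = subject      # typically becomes the incident title
    message.set_content(body)         # details for the responding team
    return message


def send_incident_email(message, smtp_host):
    """Send the message through an SMTP relay (host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(message)


# Placeholder addresses for illustration only.
message = build_incident_email(
    sender="noc@example.com",
    integration_address="database-team@example.pagerduty.com",
    subject="Replication lag on db-03",
    body="Detected by the NOC. Replica is 20 minutes behind the primary.",
)
print(message["To"], "|", message["Subject"])
# send_incident_email(message, "smtp.example.com")  # uncomment to send
```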
How users respond to PagerDuty incidents
Since the NOC is usually triggering incidents manually, it is up to the teams being assigned to these incidents to either acknowledge the incident (if they are directly assigned to the incident) or join/decline the incident (if the teams are being added as responders).
If the team is directly assigned to the incident, then they will manually resolve it in PagerDuty. If the team is added as a responder to the incident, then the NOC will manually resolve the incident.
Read more about these NOC workflows
Workflow for a support/helpdesk/MSP team
These teams use PagerDuty to receive notifications for urgent tickets submitted by internal or external customers. Tickets submitted by external customers might come from high-value customers, customers with an agreed SLA, or any customer who identifies an issue as high priority.
How PagerDuty incidents are triggered
PagerDuty incidents are triggered manually or automatically:
- Manually: A customer sends an email to a designated email address, and that email triggers a PagerDuty incident.
- Automatically: Ticketing systems such as JIRA, Zendesk, and ServiceNow can be configured to automatically trigger a PagerDuty incident when a ticket meets certain conditions (e.g., Priority level = P1).
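The condition-based trigger can be thought of as a simple filter over incoming tickets. The sketch below assumes a ticket is represented as a dictionary with a `priority` field. That field name, the ticket IDs, and the `should_trigger_incident` function are hypothetical, since each ticketing system exposes its own fields and rule configuration.

```python
# Priorities that should page the on-call team (assumed for this example).
URGENT_PRIORITIES = {"P1"}


def should_trigger_incident(ticket: dict) -> bool:
    """Return True when the ticket meets the paging condition."""
    return ticket.get("priority") in URGENT_PRIORITIES


tickets = [
    {"id": "TICKET-1", "priority": "P1"},
    {"id": "TICKET-2", "priority": "P3"},
    {"id": "TICKET-3"},  # no priority set yet
]

urgent = [ticket["id"] for ticket in tickets if should_trigger_incident(ticket)]
print(urgent)  # ['TICKET-1']
```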
How users respond to PagerDuty incidents
If an incident is triggered manually, the on-call user will acknowledge the incident to stop the escalation process and begin investigating the problem. The on-call team will then manually resolve the incident to indicate that they have resolved the customer’s problem.
If an incident is triggered automatically, then the team assigned to the incident will acknowledge it, work on the issue in their ticketing system, and then let the PagerDuty incident automatically resolve itself through the integration. For example, if integrating with JIRA, the PagerDuty incident will automatically resolve once the JIRA ticket is set to “Done.”