Responding to Incidents in PagerDuty

(Hailey Hickman) #1

Let’s review what to do when an incident is assigned to you.

Acknowledge or Reassign

  • First, acknowledge the incident if you are taking ownership of it. Acknowledging assigns the incident to you and stops it from escalating to the next person on call.
  • If the incident belongs to another person or team, you can reassign it to a different escalation policy (or user), which restarts the notification process for that team.
  • After you acknowledge an incident, you can snooze it to stop it from notifying you again for a specified period of time. Snoozing is useful if the service has an incident acknowledgement timeout, which re-triggers the incident and starts notifying you again once the timeout elapses. You can also unacknowledge an incident to return it to the triggered state and restart the notification process.
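
The steps above describe the web and mobile UI. For readers who script these actions, here is a minimal sketch of the request shapes involved, assuming the PagerDuty REST API v2 endpoints `PUT /incidents/{id}` and `POST /incidents/{id}/snooze`. The helper names and the IDs are hypothetical, and the functions only build requests without sending them; check the official API reference before relying on the field names.

```python
API_BASE = "https://api.pagerduty.com"  # assumed REST API v2 base URL


def acknowledge_request(incident_id):
    """Build (method, url, body) to acknowledge an incident."""
    body = {"incident": {"type": "incident_reference", "status": "acknowledged"}}
    return "PUT", f"{API_BASE}/incidents/{incident_id}", body


def reassign_request(incident_id, escalation_policy_id):
    """Build (method, url, body) to move an incident to another escalation policy."""
    body = {
        "incident": {
            "type": "incident_reference",
            "escalation_policy": {
                "id": escalation_policy_id,
                "type": "escalation_policy_reference",
            },
        }
    }
    return "PUT", f"{API_BASE}/incidents/{incident_id}", body


def snooze_request(incident_id, seconds):
    """Build (method, url, body) to snooze an acknowledged incident."""
    return "POST", f"{API_BASE}/incidents/{incident_id}/snooze", {"duration": seconds}
```

For example, `snooze_request("PABC123", 3600)` describes a one-hour snooze for a placeholder incident ID. Unacknowledging is left out here because this sketch makes no assumption about how that action is exposed outside the UI.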

Assess Impact and Assign a Priority

  • Once the incident has been acknowledged, assess the impact and determine actionable next steps. Assigning a priority defines the incident's impact and gives your team a common language and criteria for communicating about incidents. Priorities also make the most business-critical incidents more visible: they are color-coded and moved to the top of the incident dashboard view.
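
As a rough illustration of assigning a priority programmatically, the sketch below looks up a priority by name and builds the update body. It assumes the REST API v2 exposes the account's priorities via `GET /priorities` and accepts a `priority_reference` on `PUT /incidents/{id}`. The function names and sample IDs are hypothetical.

```python
def priority_id_by_name(priorities_response, name):
    """Find a priority ID by display name in a decoded GET /priorities response."""
    for priority in priorities_response.get("priorities", []):
        if priority.get("name") == name:
            return priority["id"]
    raise KeyError(f"no priority named {name!r}")


def set_priority_body(priority_id):
    """Build the PUT /incidents/{id} body that assigns a priority."""
    return {
        "incident": {
            "type": "incident_reference",
            "priority": {"id": priority_id, "type": "priority_reference"},
        }
    }


# Hypothetical sample data shaped like a priorities listing.
sample = {"priorities": [{"id": "PRIO1", "name": "P1"}, {"id": "PRIO2", "name": "P2"}]}
body = set_priority_body(priority_id_by_name(sample, "P1"))
```

Looking the ID up by name keeps scripts readable ("P1") while still sending the reference the API is assumed to expect.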

Mobilize Response Team

  • A widespread issue might require a multi-team response, in which case you will want to add responders. Responders are any additional individuals who are directly involved in responding to and resolving an incident. When you add responders, include a response bridge (conference line or meeting link) so the team can collaborate quickly. You can also pre-configure a conference bridge at the account level or at the service level.
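
Adding responders can also be scripted. The following is a sketch only, assuming the REST API v2 endpoint `POST /incidents/{id}/responder_requests` and its nested `responder_request_target` body shape. The function name, user IDs, and bridge link are hypothetical placeholders.

```python
def responder_request_body(requester_id, message, user_ids):
    """Build a body asking the given users to join an incident as responders."""
    targets = [
        {"responder_request_target": {"id": user_id, "type": "user_reference"}}
        for user_id in user_ids
    ]
    return {
        "requester_id": requester_id,  # the user making the request
        "message": message,            # a good place for the bridge details
        "responder_request_targets": targets,
    }


# Hypothetical IDs and meeting link, for illustration only.
body = responder_request_body(
    "PREQ001",
    "Please join the response bridge: https://example.com/bridge",
    ["PUSR001", "PUSR002"],
)
```

Putting the bridge details in the message mirrors the advice above to share a conference line or meeting link at the moment responders are added.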

Inform Business Stakeholders

  • In addition to adding responders to an incident, you can proactively inform business stakeholders by subscribing them to the incident. Subscribed users receive a notification whenever a status update is posted.
  • To fully automate adding responders and subscribers, we have Response Plays. For example, at PagerDuty I know that as soon as I classify an incident as P1, I need to loop in the same response teams and notify the same business stakeholders. A Response Play is preconfigured with these teams and individuals, so I can notify all of them with the click of a button.
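
Since subscribers are notified when a status update is posted, a small helper for composing consistent updates can be handy. This is a sketch that assumes status updates are posted with `POST /incidents/{id}/status_updates` and a `message` field. The message template and function name are illustrative choices, not a PagerDuty convention.

```python
def status_update_request(incident_id, impact, next_update_minutes):
    """Build (method, url, body) for a stakeholder-facing status update."""
    if not impact.strip():
        raise ValueError("impact description must not be empty")
    message = (
        f"Impact: {impact.strip()} "
        f"Next update in {next_update_minutes} minutes."
    )
    url = f"https://api.pagerduty.com/incidents/{incident_id}/status_updates"
    return "POST", url, {"message": message}


# "PABC123" is a hypothetical incident ID.
method, url, body = status_update_request("PABC123", "Checkout is degraded.", 30)
```

A fixed template like this keeps updates short and predictable for stakeholders who are not following the technical details.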

Add Notes to Provide Additional Context

  • You can add notes for context and documentation during the incident response, or in the pop-up that appears after you click Resolve.
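
Notes can also be attached by automation, for instance to record what a runbook script did. A minimal sketch, assuming the REST API v2 endpoint `POST /incidents/{id}/notes` with a `note.content` field; the function name and incident ID are hypothetical.

```python
def note_request(incident_id, content):
    """Build (method, url, body) to attach a note to an incident."""
    url = f"https://api.pagerduty.com/incidents/{incident_id}/notes"
    return "POST", url, {"note": {"content": content}}


# "PABC123" is a hypothetical incident ID.
method, url, body = note_request("PABC123", "Restarted the worker pool; error rate dropping.")
```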

Merge Incidents

  • If two or more incidents are related, you can merge them into a single incident.
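
Merging can be expressed as one request against the incident you want to keep. The sketch below assumes the REST API v2 endpoint `PUT /incidents/{id}/merge`, where the incident in the URL is the surviving one and `source_incidents` lists the incidents folded into it. The function name and IDs are hypothetical.

```python
def merge_request(target_incident_id, source_incident_ids):
    """Build (method, url, body) to merge source incidents into a target incident."""
    if not source_incident_ids:
        raise ValueError("at least one source incident is required")
    body = {
        "source_incidents": [
            {"id": source_id, "type": "incident_reference"}
            for source_id in source_incident_ids
        ]
    }
    url = f"https://api.pagerduty.com/incidents/{target_incident_id}/merge"
    return "PUT", url, body


# Hypothetical IDs: fold two related incidents into PABC123.
method, url, body = merge_request("PABC123", ["PDEF456", "PGHI789"])
```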

Run a Custom Incident Action

Resolve the Incident

  • Once you are confident that the service has been fully restored, resolve the PagerDuty incident.
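
To show what a complete call might look like, here is a sketch that builds (but does not send) a resolve request with Python's standard library. It assumes the REST API v2 conventions of a `Token token=...` authorization header, a versioned `Accept` header, and a `From` header carrying the acting user's email. The function name, token, email, and incident ID are placeholders.

```python
import json
import urllib.request


def build_resolve_request(incident_id, api_token, from_email):
    """Return an unsent urllib Request that would mark an incident resolved."""
    body = {"incident": {"type": "incident_reference", "status": "resolved"}}
    return urllib.request.Request(
        url=f"https://api.pagerduty.com/incidents/{incident_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            "From": from_email,  # email of the user performing the action
        },
    )


# Placeholder credentials and ID; nothing is sent over the network here.
req = build_resolve_request("PABC123", "example-token", "responder@example.com")
```

Passing `req` to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect or log before it changes the incident's state.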

Check out a video overview here