Some opinions on "How should I set up my Pager Notifications?"


(Waldo G) #1

An extremely common question I hear when introducing a new team to PagerDuty (or any notification tool, really) is “How should I set up my notifications?”

Here’s an answer that has worked well for me.

A Sample Notification Escalation Policy

Notify on Assignment

This is a sample notification policy that alerts me when an incident is assigned to me. Its intention is to be “helpfully persistent”. It’s tailored to my order-of-notification preference, as well as my “dexterity issues” during the wee hours. Feel free to tweak as desired.

| Timeframe   | Method |
| ----------- | ------ |
| Immediately | Email  |
| Immediately | Push   |
| 1 Minute    | Push   |
| 2 Minutes   | SMS    |
| 3 Minutes   | Call   |

One common mistake is to have all notification types trigger at the same time. What ends up happening is that in the time it takes you to start responding to one notification, another triggers on top of it. You can’t acknowledge the push notification before the SMS comes in, and while you’re trying to type the number to ACK the SMS, the call comes in.

This is why I stagger my notifications. Emails don’t trigger any special notification for me (no beeps or vibrations, tyvm), but I still send one immediately because some alert messages may be mangled or truncated on a mobile device, and knowing that I can reference the original message elsewhere is helpful.

As an aside, staggering is also much more humane for the people around you. If others are nearby when your pager might go off (a partner at night, or coworkers during the day), it spares them from every page turning into a barrage of vibrations and sounds.
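If you like to sanity-check a policy (yours or a teammate’s) before saving it, here’s a minimal Python sketch of the staggering idea. It’s hypothetical and not a PagerDuty API: the `Rule` class, the `collisions()` function, and the one-minute default gap are my own assumptions, and it treats email as silent, following the point above that email doesn’t beep or vibrate.

```python
from dataclasses import dataclass


# Hypothetical model of one notification rule: minutes after assignment + contact method.
@dataclass(frozen=True)
class Rule:
    delay_minutes: int
    method: str


# Assumption: email makes no noise, so it can't step on another notification.
SILENT_METHODS = {"email"}


def collisions(rules, min_gap_minutes=1):
    """Return adjacent pairs of audible rules that fire less than min_gap_minutes apart."""
    # Keep only methods that beep/vibrate/ring, ordered by when they fire.
    audible = sorted(
        (r for r in rules if r.method.lower() not in SILENT_METHODS),
        key=lambda r: r.delay_minutes,
    )
    clashes = []
    # Because the list is sorted, checking neighbours is enough to find every crowded spot.
    for earlier, later in zip(audible, audible[1:]):
        if later.delay_minutes - earlier.delay_minutes < min_gap_minutes:
            clashes.append((earlier, later))
    return clashes


# The sample "Notify on Assignment" table from this post.
staggered = [
    Rule(0, "Email"),
    Rule(0, "Push"),
    Rule(1, "Push"),
    Rule(2, "SMS"),
    Rule(3, "Call"),
]

# The common mistake: everything fires at once.
all_at_once = [Rule(0, "Push"), Rule(0, "SMS"), Rule(0, "Call")]

print(collisions(staggered))    # []
print(collisions(all_at_once))  # two clashes: Push/SMS and SMS/Call
```

With the sample table, nothing audible fires within a minute of anything else, while the all-at-once policy gets flagged twice. Tweak `min_gap_minutes` to match how long it actually takes you to fumble for your phone at 3 a.m.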

Notify on Resolution

Next, I like being notified immediately if an incident is resolved. Perhaps an alert was being tested and accidentally went live, or someone is already working the issue and doesn’t need help; either way, I want to know immediately that action on my part is no longer needed. If your alerting product of choice supports it, consider setting up a notification for when an incident is resolved, like so:

| Timeframe   | Method |
| ----------- | ------ |
| Immediately | Email  |
| Immediately | Push   |

Notify Before Going On-Call

As much as I like to think that I have my life together, and that everyone else does, too, things happen. Whether I’ve forgotten that a rotation is coming up, or that I promised to cover someone’s shift, or a colleague is suddenly ill and needs me to cover for them, I like being reminded that I’m going on-call.

| Timeframe   | Method |
|-------------|--------|
| Immediately | Email  |
| Immediately | Push   |
| 1 Hour      | Email  |
| 1 Hour      | Push   |
| 1 Day       | Email  |
| 7 Days      | Email  |
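To make the lead times concrete, here’s a small Python sketch that turns the table above into actual send times for an upcoming shift. It assumes “Immediately” means at the moment the shift starts and that every other timeframe is measured back from that start; `REMINDERS` and `reminder_schedule()` are hypothetical names for illustration, not part of any PagerDuty API.

```python
from datetime import datetime, timedelta

# Hypothetical encoding of the table: (lead time before shift start, method).
# Assumption: "Immediately" = a lead time of zero, i.e. when the shift begins.
REMINDERS = [
    (timedelta(0), "Email"),
    (timedelta(0), "Push"),
    (timedelta(hours=1), "Email"),
    (timedelta(hours=1), "Push"),
    (timedelta(days=1), "Email"),
    (timedelta(days=7), "Email"),
]


def reminder_schedule(shift_start, now, reminders=REMINDERS):
    """Return (send_time, method) for reminders not yet due to have fired, earliest first."""
    schedule = [(shift_start - lead, method) for lead, method in reminders]
    # Drop reminders whose send time has already passed (e.g. you picked up the shift late).
    return sorted(entry for entry in schedule if entry[0] >= now)


# Shift starts Monday 2024-01-08 09:00; it's currently Saturday 2024-01-06 09:00.
for send_time, method in reminder_schedule(datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 6, 9, 0)):
    print(send_time, method)
```

In this example the 7-day email would have gone out on Jan 1, so it’s skipped; you’d get the 1-day email Sunday at 09:00, the 1-hour email and push Monday at 08:00, and the final email and push at 09:00. That last case is the one that saves you when you pick up a shift at short notice.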

You may notice that this article shares some similarity with the “OnCall Handbook” repo on GitHub that was initiated by Alice Goldfuss. There’s a very good reason for that: I wrote some of it. I call this out to help bring awareness to this great resource, and to head off concerns that some of the content was “stolen”. I thank Alice for helping to answer the question of “how do I do on-call?” that we hear so often.


What do you think? What works (or doesn’t work) for you?


(Jonathan Curry) #2

I like to get email and push notifications immediately, and hold the SMS alerts for 3 minutes and calls for 5 minutes. I’m a pretty heavy sleeper, so I have a second call set up at 10 minutes in case I slept through the first call (which sometimes happens despite setting my PagerDuty contact to use the most frightening alarm sound I can), or if I accidentally dismissed the call… possibly from reaching over to my nightstand and throwing my phone across the room. :speak_no_evil:

Staggering notifications doesn’t just help me ack incidents (as you mentioned, it’s pretty hard to take action one way when you immediately start getting yelled at another way); it also really helps me keep my composure. If push, SMS, and phone notifications all came in at the same time, I would get frustrated when my phone started ringing just as I was trying to tap ack or reply :four: (or whatever number it is in that case), and I’d lose my cool. I might even want to throw my phone against the wall, and then I’d take longer to start working on the problem because I’d still be annoyed at my phone and PagerDuty for a while. So I personally need time to review an incident peacefully before I start getting called if I’m already awake, which is why I always cringe when I hear somebody say they want to force very specific notification rules on every user in PagerDuty.

As for status updates, I spend quite a bit of time in Slack and we have the PagerDuty :left_right_arrow: Slack integration set up, so I just use the Slack extension to ack or resolve incidents when I’m in Slack and to get incident updates whenever I’m not in PagerDuty itself.

I use the default (I think?) 24-hour and immediate email rules to be notified when I go on-call, and I export my personal PagerDuty schedule to the Apple Calendar app since I already use it every day.