An extremely common question that I receive when introducing a new team to PagerDuty (or any notification tool, really) is “how should I set up my notifications?”
Here’s an answer that has worked well for me.
A Sample Notification Escalation Policy
Notify on Assignment
This is a sample notification policy to alert me when an incident is assigned to me. Its intention is to be “helpfully persistent”. It is tailored to my order-of-notification preference, as well as my “dexterity issues” during the wee hours. Feel free to tweak as desired.
| Timeframe   | Method |
| ----------- | ------ |
| Immediately | Email  |
| Immediately | Push   |
| 1 Minute    | Push   |
| 2 Minutes   | SMS    |
| 3 Minutes   | Call   |
One common mistake is to have all notification types trigger at the same time. What ends up happening is that, in the time it takes you to start responding to one notification, another triggers on top of it. You can’t acknowledge the push notification before the SMS comes in, and while you’re trying to type the number to ACK the SMS, the call comes in.
It’s for this reason that I prefer to stagger notifications. Emails don’t provide any special notification to me (no beeps or vibrations, tyvm), but I still send one immediately because some alert messages may be mangled or truncated on a mobile device. Knowing that I can reference the original message elsewhere is helpful to me.
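To make the difference concrete, here is a minimal sketch that models the staggered policy from the table above as plain data and checks which notifications would fire before I manage to acknowledge. The names `Rule`, `STAGGERED`, `ALL_AT_ONCE`, and `fired_before_ack` are hypothetical and made up for this illustration; this is not PagerDuty’s API or configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    delay_minutes: int  # minutes after the incident is assigned
    method: str         # "email", "push", "sms", or "call"


# The staggered policy from the table above.
STAGGERED = [
    Rule(0, "email"),
    Rule(0, "push"),
    Rule(1, "push"),
    Rule(2, "sms"),
    Rule(3, "call"),
]

# The "common mistake": every method fires at the same moment.
ALL_AT_ONCE = [Rule(0, m) for m in ("email", "push", "sms", "call")]


def fired_before_ack(rules, ack_after_seconds):
    """Return the methods that fire strictly before the acknowledgement."""
    return [r.method for r in rules if r.delay_minutes * 60 < ack_after_seconds]


if __name__ == "__main__":
    # Assume it takes me 30 seconds to wake up and acknowledge.
    print(fired_before_ack(STAGGERED, 30))
    print(fired_before_ack(ALL_AT_ONCE, 30))
```

With a 30-second acknowledgement, the staggered policy only ever sends the email and the first push, while the all-at-once policy has already sent everything, including the phone call.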
As an aside, this is also much more humane for the people around you. If you have people nearby when your pager might go off (such as a partner at night, or coworkers during the day), this will save them from having to listen to every page become a barrage of vibrations and sounds.
Notify on Resolution
Next, I like being notified immediately if an incident is resolved. Perhaps an alert was being tested and accidentally went live, or someone is already working the issue and doesn’t need help, but I want to know immediately that action on my part is no longer needed. If your alerting product of choice supports it, you may consider setting up a notification that an incident has been resolved, like so:
| Timeframe   | Method |
| ----------- | ------ |
| Immediately | Email  |
| Immediately | Push   |
Notify before going on-call
As much as I like to think that I have my life together, and that everyone else does, too, things happen. Whether I forget that a rotation is coming up, or that I promised to cover someone’s shift, or a colleague is suddenly ill and I’m needed to cover for them, I like being reminded that I’m going on-call.
| Timeframe   | Method |
|-------------|--------|
| Immediately | Email  |
| Immediately | Push   |
| 1 Hour      | Email  |
| 1 Hour      | Push   |
| 1 Day       | Email  |
| 7 Days      | Email  |
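If it helps to see when those reminders would actually land, here is a small sketch that turns the table above into concrete send times for a given shift. I am assuming that “Immediately” means “at the moment the shift starts” and that the other rows are lead times before the shift. `REMINDERS` and `reminder_schedule` are hypothetical names for this illustration, not part of any tool’s API.

```python
from datetime import datetime, timedelta

# Lead times from the table above, earliest reminder first.
# timedelta(0) stands for "Immediately", read here as "at shift start"
# (an assumption on my part).
REMINDERS = [
    (timedelta(days=7), "email"),
    (timedelta(days=1), "email"),
    (timedelta(hours=1), "email"),
    (timedelta(hours=1), "push"),
    (timedelta(0), "email"),
    (timedelta(0), "push"),
]


def reminder_schedule(shift_start):
    """Return (send_time, method) pairs in the order they would be sent."""
    return [(shift_start - lead, method) for lead, method in REMINDERS]


if __name__ == "__main__":
    # Example shift: Monday 2024-01-08 at 09:00.
    for send_time, method in reminder_schedule(datetime(2024, 1, 8, 9, 0)):
        print(send_time.isoformat(), method)
```

The week-out and day-out reminders are email only, since there is nothing to act on yet; the push notifications only join in once the shift is close enough to matter.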
You may notice that this article shares some similarity with the “OnCall Handbook” repo on GitHub that was initiated by Alice Goldfuss. There’s a very good reason for that: I wrote some of that. I call this out to help bring awareness to this great resource, and to help head off concerns that some of the content was “stolen”. I thank Alice for helping to answer the question of “how do I do on-call?” that we hear so often.
What do you think? What works (or doesn’t work) for you?