Need to escalate everything, regardless of urgency

urgencies
feature-request
escalation-policies
questions

(Raymond Goldstein) #1

Hello, we need the ability to escalate all Incidents, regardless of Urgency settings. All our responders have Low Urgency notifications set for Push, Text and Email, and High Urgency settings that add phone calls. While our business end-users do not need us to respond immediately to Low Urgency Incidents, our IT management wants to know that someone on the response team is aware of the issue. If someone misses a Low Urgency notification, we would like it to escalate to the next level, without necessarily having to use the High Urgency settings. The only way to do this now is to use High Urgency for everything, which is not a good solution. How can we have 2nd level notified that no one has acknowledged a Low Urgency incident after, say, an hour or two (or, on a weekend, a day or two)? We’d rather not resort to High Urgency for everything.


(Jade Paoletta) #2

Hi Raymond,

Thanks for reaching out.

At the moment, we don’t have a way to escalate a low-urgency incident after a set period of time. I can certainly submit a feature request for this; however, the current functionality is that low-urgency incidents do not escalate.

In the meantime, you may want to check out our Support Hours feature. This allows you to set defined hours for when triggered incidents will notify users via low-urgency rules, as well as hours when they should escalate via high-urgency notification rules.

I hope this helps. Feel free to reach out if you have any additional questions or concerns!


(Raymond Goldstein) #3

Thank you Jade. We tried that. It’s a tough thing to balance: making sure our managers know we saw the issue while not having responders’ phones ring for low urgency issues. Their phones ring enough after hours, and we don’t want to add to that for every low urgency alert. But at the same time, we want to make sure that when upper management sees the alert, they can also see that someone acknowledged it and will get to it eventually.


(Jade Paoletta) #4

Hi Raymond,

We appreciate your feedback! I’ve gone ahead and communicated this to our Product team in the form of a feature request.

If you are trying to increase the visibility of your incidents without creating additional noise via phone notifications, you may also want to consider looking into our Extensions. Our Extensions send trigger, acknowledge and resolve data to external channels. For example, our Slack extension is a popular way to sync incidents on a service to a Slack channel, increasing the visibility of the incident status.

Let me know if you have any additional questions!


(Ben Abrams) #5

That sounds awful; your management does not understand what incident priority means. If it’s low priority, it means it is something that someone should never be pulled out of bed for and that it can wait until morning. While I think there is value in the feature suggested, I don’t feel it solves the underlying problem you have. Your management does not attempt to limit exposure to on-call to only the most critical issues. Failure to do so leads to elevated burnout and to missing or ignoring the important alerts because your phone is going off for things that can wait.

Just my $0.02


(Raymond Goldstein) #6

You nailed it. Have you worked here before? :wink:
But in their defense, they definitely want to make sure folks are not pulled out of bed. In fact there are very few things that are deemed “High Urgency” and do wake us up in the dead of night. We want to avoid a situation where, for example, a hard drive fills to 97% on a Friday night, triggers a PD Low Urgency Alert, and it sits as an open incident all weekend (just a hypothetical example). All they want is to see it acknowledged sometime over the weekend. If only to say “I saw it, will fix on Monday”. So I don’t think it is unreasonable to have a Low Urgency alert escalate (with Low Urgency notifications) to 2nd level, if no one acknowledged the alert after a given time. After all, “On Call” does not mean 9-5 on Weekdays.
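The check described above (low urgency, still unacknowledged, older than some threshold) can be sketched in a few lines. This is only an illustration of the logic, not a PagerDuty feature: the function name `stale_low_urgency` and the sample data are hypothetical, and the sketch assumes incident records shaped like dictionaries with `status`, `urgency` and ISO 8601 `created_at` fields, which an external script would have to fetch itself.

```python
from datetime import datetime, timedelta, timezone


def stale_low_urgency(incidents, now, max_age):
    """Return low-urgency incidents that are still unacknowledged after max_age.

    Assumes each incident is a dict with 'status', 'urgency' and an
    ISO 8601 'created_at' timestamp (an assumption for this sketch).
    """
    stale = []
    for inc in incidents:
        # datetime.fromisoformat() does not accept a trailing 'Z' on older Pythons.
        created = datetime.fromisoformat(inc["created_at"].replace("Z", "+00:00"))
        unacknowledged = inc["status"] == "triggered"
        if inc["urgency"] == "low" and unacknowledged and now - created >= max_age:
            stale.append(inc)
    return stale


# Hypothetical sample data: a disk alert raised on Friday night, checked Saturday noon.
incidents = [
    {"id": "A", "urgency": "low", "status": "triggered", "created_at": "2024-01-05T22:00:00Z"},
    {"id": "B", "urgency": "low", "status": "acknowledged", "created_at": "2024-01-05T22:00:00Z"},
    {"id": "C", "urgency": "low", "status": "triggered", "created_at": "2024-01-06T11:30:00Z"},
    {"id": "D", "urgency": "high", "status": "triggered", "created_at": "2024-01-05T22:00:00Z"},
]
now = datetime(2024, 1, 6, 12, 0, tzinfo=timezone.utc)
stale_ids = [inc["id"] for inc in stale_low_urgency(incidents, now, timedelta(hours=2))]
```

Whatever is returned would then go to 2nd level through a quiet channel (email or chat, for instance) rather than a phone call.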


(Jade Paoletta) #7

Hi Raymond,

We can certainly appreciate and see the value in this use case.

If the concern is not being woken up during off hours, the other workaround you might consider would be to use support hours. You could potentially configure an hour each day, or every other day, during working/waking hours where incidents would escalate via high-urgency rules. The rest of the time, you could have the incidents only notify via low-urgency rules so that they are not waking up your users in the middle of the night.

This workaround would escalate the incidents during the escalation period via high-urgency rules, which may include a phone call if this is configured within the user’s high-urgency rules, so that is one thing to keep in mind.

I’ll go ahead and include your use case as feedback for having low-urgency incidents that escalate but also continue to notify via low-urgency notification rules.

Thanks!


(Ben Abrams) #8

You nailed it. Have you worked here before?

Nope but I have worked at places where this mentality led me to quit.

In fact there are very few things that are deemed “High Urgency” and do wake us up in the dead of night

Glad there is at least something being done right here.

All they want is to see it acknowledged sometime over the weekend. If only to say “I saw it, will fix on Monday”. So I don’t think it is unreasonable to have a Low Urgency alert escalate (with Low Urgency notifications) to 2nd level, if no one acknowledged the alert after a given time. After all, “On Call” does not mean 9-5 on Weekdays.

What’s the value in this? If it is low enough to not warrant fixing the issue, but you still have to take time away from your personal life, it is still a disruption. If it’s low priority, then just let it keep going to the same person, who will eventually get to it. There is no sense in ruining someone else’s weekend.

After all, “On Call” does not mean 9-5 on Weekdays.

That depends. We have a Mon-Fri on-call that covers day shifts 9-5, and a separate schedule for after-hours support. The after-hours schedule includes a front-line support team that normally would not try to address issues during business hours, since during the day we have the people available who are most likely to get a quick resolution. After hours, we are OK paying a little cost in time to limit our people’s exposure to only actually critical things overnight.

I am definitely in favor of adding this feature, though it will likely be used both responsibly by some businesses and by others in ways that abuse their employees.


(Raymond Goldstein) #9

That’s how we roll.


(Simon Fiddaman) #10

Hey @RaymondGoldstein, couldn’t you differentiate based on criticality (i.e. use the mapped Common Event Fields) to set a High Urgency response on a CRITICAL (e.g. disk at 97%) and Low Urgency response on everything else? This way you can demonstrate that issues which tip the balance into real problems will be handled with urgency (as with any other high urgency issue), and everything else could safely be left unacknowledged for a while / for the weekend.
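As a rough illustration of that split, assuming the event’s severity takes one of the Common Event Format values (`critical`, `error`, `warning`, `info`), the mapping amounts to the following. In practice this would be configured in the service or event rules rather than written as code; the function name `urgency_for_severity` is hypothetical.

```python
def urgency_for_severity(severity: str) -> str:
    """Map an event severity to an incident urgency.

    Only 'critical' is treated as high urgency; everything else
    ('error', 'warning', 'info', or anything unrecognized) stays low.
    """
    return "high" if severity.strip().lower() == "critical" else "low"


# Hypothetical examples: a disk at 97% reported as critical vs. a routine warning.
disk_full = urgency_for_severity("critical")
routine = urgency_for_severity("warning")
```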



(Raymond Goldstein) #11

Thanks Simon. Tried that but it won’t work for us. We’ve also tried making things high-urgency during certain hours and low-urgency at night, but high-urgency notifications (which are intense) cause home and cell phones to ring, and are disruptive to our staff’s families.

Let me give you some background. We are a large government contractor that has some level of business operations 24/7. But IT is not, and will not be, staffed for that. So On-Call teams handle non-core business hours (nights, weekends and holidays). That’s just how it is. We have over 800 servers, use SolarWinds as our primary monitoring and alerting system, and deal with literally hundreds of alerts a day. Some we deal with immediately, others we let slide for a while. All are handled within SolarWinds. But rather than have our after-hours On-Call teams sort through hundreds of SolarWinds alerts, we chose about 5 or 6 conditions that SolarWinds should escalate to PagerDuty after hours. There are High-Urgency alerts like an HVAC casualty, fire, flooding, or certain mission-critical server outages. These are set to wake people up at night, no matter what. Then there are the low-urgency alerts like disks filling up, lower-criticality server outages, etc. that don’t need to be fixed immediately, but need to be resolved before normal business hours resume on the next business day. So in essence, PagerDuty is our first level of after-hours escalation. Anything that is not business critical is not escalated to PagerDuty. But if it goes to PagerDuty at all, it was deemed important enough to have a response at some point. So in our case, unacknowledged is unacceptable. We can argue with our Management’s philosophy on this, but as they said in The Godfather: “This is the business we’ve chosen”. :wink:


(Simon Fiddaman) #12

Hi @RaymondGoldstein, thanks for the detailed background. I’m so incredibly context-driven I needed this to understand.

I’m on board now - I can see how that would give you value. I can’t think of anything else which is going to meet all of your criteria right now, but if I do, I’ll post back.

Hey, if you need someone to talk to about it, or just rubber duck it, I’m here. :slight_smile:

Cheers,
@simonfiddaman


(Raymond Goldstein) #13

Many thanks Simon! :grinning: