Trigger 1 PagerDuty alert after multiple failures from New Relic

I’m trying to configure PagerDuty as a sort of gatekeeper to filter out low-quality violations from New Relic ping tests. My goal is to trigger a PagerDuty alert through the escalation policy chain only if the server is down for more than 10 minutes. Right now we’re getting alerts in the middle of the night if the server is down for a few seconds.

I have my New Relic Alert Policy incident preferences set to open an incident every time a condition is violated and I’m running tests every 5 minutes. The Synthetic is set to search for an element that does not exist on the page to specifically trigger a violation for testing purposes only. I’ve configured an Event Rule so when the body contains “Ping” the following actions are performed:

* Route to [Backend Critical (New Relic)](https://24datainc.pagerduty.com/services/XXXX)
* Suppress until more than `2 alerts` are received within `12 minutes`
* Then stop processing
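
To make the intent of that threshold concrete, here is a minimal sketch of how "suppress until more than 2 alerts within 12 minutes" could behave with checks every 5 minutes. It is only a model: the `ThresholdSuppressor` class is hypothetical, and both the sliding-window behavior and the strict reading of "more than" are assumptions, not confirmed PagerDuty behavior.

```python
from datetime import datetime, timedelta


class ThresholdSuppressor:
    """Hypothetical model of 'suppress until more than N alerts within M minutes'."""

    def __init__(self, threshold, window):
        self.threshold = threshold  # N in "more than N alerts"
        self.window = window        # M minutes, as a timedelta
        self.seen = []              # timestamps of recent alerts

    def should_create_incident(self, received_at):
        # Keep only the alerts that are still inside the sliding window.
        cutoff = received_at - self.window
        self.seen = [t for t in self.seen if t > cutoff]
        self.seen.append(received_at)
        # Assumption: "more than N" is strict, so N alerts are not enough.
        return len(self.seen) > self.threshold


# Three consecutive failed checks, 5 minutes apart (arbitrary example times).
start = datetime(2020, 1, 1, 3, 0)
rule = ThresholdSuppressor(threshold=2, window=timedelta(minutes=12))
decisions = [
    rule.should_create_incident(start + timedelta(minutes=5 * i))
    for i in range(3)
]
print(decisions)  # [False, False, True]
```

Under this reading, the first two failures are suppressed and the third one, 10 minutes after the first, would create the incident. If the strict reading is right, two forced violations would not be enough to trigger anything.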

I am artificially forcing 2 violations in a row in New Relic in the hope that PagerDuty will filter out the first warning and trigger an incident to the escalation policy on the second, but it seems like:
a. New Relic is only sending PagerDuty one notification, and
b. PagerDuty is receiving that first notification from NR and immediately sending it through the escalation policy.

Any advice please?

It sounds like you’re on the right track, but if PagerDuty is triggering an incident and your rule is set to trigger an incident only when 2 events are received within 12 minutes, it sounds like they might not be hitting that rule. Are you sending incidents to the key/endpoint listed in Configuration > Event Rules > Incoming Data Source? If so, could you point me to an incident that was supposed to hit a rule but didn’t?
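
One way to check whether events sent to a given key actually hit the rule is to send a test event by hand. The sketch below builds an Events API v2 trigger payload and posts it to `https://events.pagerduty.com/v2/enqueue`. The placeholder key, the summary text, and the source name are illustrative, and it is an assumption that your Event Rules integration key accepts this format and that the rule's "body contains" condition looks at the summary.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, summary, source, severity="critical"):
    """Build a minimal Events API v2 'trigger' payload."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }


def send_event(event):
    """POST the event to PagerDuty and return the decoded JSON response."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Placeholder key: replace with the key shown under
# Configuration > Event Rules > Incoming Data Source.
event = build_trigger_event(
    routing_key="YOUR_EVENT_RULES_INTEGRATION_KEY",
    summary="Ping check failed (manual test)",
    source="manual-test",
)
print(json.dumps(event, indent=2))
# Call send_event(event) once the real key is filled in.
```

If a hand-sent event follows the rule but the New Relic notifications do not, the notification channel in New Relic is probably pointed at a different integration key.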

Thanks for the reply, Tom, and thanks for pointing me in the right direction. I was using the wrong integration, but now that I’m using the correct one, all events are suppressed. There’s something I’m not understanding here.

The most current alert at 10:58 today is a good example.

I’ve had three failures so far but still no alerts from PagerDuty.

Since you’re just now setting things up, is there a chance you’ve got a schedule set up with gaps in it? If so, you may want to directly add yourself to the escalation policy while testing to ensure an incident is triggered, as somebody must be on-call for a service in order for an incident to be created.

There are some other common issues you can check for here:

Thanks for the advice Jonathan. I’m not using a schedule but rather I have myself directly added to the escalation policy at every stage so that I don’t bother my responders with alerts while I’m getting set up.

Hi Greg,

We’ll have to do some inbound event troubleshooting to investigate this further for you. We don’t do that kind of troubleshooting in the Community forums, so I’d recommend reaching out to Support over email. Feel free to reference this Community thread.

The team will let you know what further information they need to investigate.

Hi @gregu,

Looking at the suppression message, that will always match your SFL Form policy, as you have a suppression rule which matches Form or other things.

Cheers,
@simonfiddaman

Thank you @simonfiddaman. I changed my settings last week after I found out that I could achieve what I wanted through New Relic. Originally, though, there was an option in the last line of the Event Rules to suppress incident creation unless PagerDuty received X alerts within Y minutes, and then create an incident. That option seems to have been removed.

This was a recent design decision to remove that option in the catch-all rule; the idea is that by definition a “catch all” rule shouldn’t have criteria in it.

If this type of filtering is required, you can get the same functionality by adding a similar filter as your last rule (before the catch-all rule).
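
As a rough illustration of why a filter placed just before the catch-all gives the same result, here is a minimal sketch of first-match rule evaluation. The rule conditions and action strings are made up, and it is an assumption that rules are evaluated top to bottom with the first match winning (which is what "Then stop processing" suggests).

```python
from typing import Callable, List, Tuple

# Each rule pairs a condition on the event body with a description of its action.
Rule = Tuple[Callable[[str], bool], str]


def pick_action(body: str, rules: List[Rule], catch_all: str) -> str:
    """Return the action of the first matching rule, else the catch-all action."""
    for matches, action in rules:
        if matches(body):
            return action
    # The catch-all has no criteria of its own; it only runs when nothing matched.
    return catch_all


rules = [
    (lambda body: "Ping" in body, "route to Backend Critical with threshold"),
    # Broad filter added as the last rule, just before the catch-all.
    (lambda body: "New Relic" in body, "suppress until threshold is met"),
]
CATCH_ALL = "create incident on default service"

print(pick_action("New Relic: Ping failed", rules, CATCH_ALL))
print(pick_action("New Relic: CPU high", rules, CATCH_ALL))
print(pick_action("Disk full", rules, CATCH_ALL))
```

The broad last rule handles everything that used to rely on the threshold option in the catch-all, and the catch-all itself stays free of criteria.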
