Help test PagerDuty -> Zabbix ("2-way ack") integration?

Hey there,

We’re moving to Zabbix from an old legacy monitoring system, and one thing that was essential for us was having acknowledgements made in PagerDuty show up in the monitoring system.

So we wrote it: https://github.com/sonic-com/pagerduty2zabbix

Handles:

  • PD incident ack -> Zabbix event ack
  • PD incident un-ack -> Zabbix event un-ack
  • PD trigger (creation) -> Zabbix comment containing a link to the PD incident
  • PD incident note -> Zabbix event comment
  • PD incident resolved -> Zabbix event close attempt
  • PD incident priority update -> Zabbix event severity update
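For anyone reviewing the mapping above, here is a rough Python sketch (the real project is Perl) of how each PagerDuty event could translate into a single Zabbix `event.acknowledge` call. The action bit flags (1 close, 2 acknowledge, 4 add message, 8 change severity, 16 unacknowledge) come from the Zabbix API docs for 6.0+. The PagerDuty V3 webhook event names are real, but the mapping dict and the helper `build_acknowledge_request` are hypothetical and not taken from the pagerduty2zabbix code. The PD priority to Zabbix severity mapping is left to the caller.

```python
from typing import Iterable, Optional

# Zabbix API event.acknowledge "action" bit flags (Zabbix 6.0+ docs).
CLOSE_PROBLEM = 1
ACKNOWLEDGE = 2
ADD_MESSAGE = 4
CHANGE_SEVERITY = 8
UNACKNOWLEDGE = 16

# Hypothetical mapping: PagerDuty V3 webhook event type -> Zabbix action flags.
PD_EVENT_TO_ACTION = {
    "incident.acknowledged": ACKNOWLEDGE,
    "incident.unacknowledged": UNACKNOWLEDGE,
    "incident.triggered": ADD_MESSAGE,  # comment carrying the PD incident link
    "incident.annotated": ADD_MESSAGE,  # PD note -> Zabbix comment
    "incident.resolved": CLOSE_PROBLEM,  # only works if the trigger allows manual close
    "incident.priority_updated": CHANGE_SEVERITY,
}


def build_acknowledge_request(
    pd_event_type: str,
    eventids: Iterable[str],
    message: Optional[str] = None,
    severity: Optional[int] = None,
    request_id: int = 1,
) -> Optional[dict]:
    """Return a JSON-RPC body for event.acknowledge, or None if the PD event isn't handled."""
    action = PD_EVENT_TO_ACTION.get(pd_event_type)
    if action is None:
        return None
    params = {"eventids": list(eventids)}
    if message:
        action |= ADD_MESSAGE  # send the "add message" flag alongside the text
        params["message"] = message
    if action & CHANGE_SEVERITY:
        if severity is None:
            raise ValueError("severity (0-5) is required for a severity change")
        params["severity"] = severity
    params["action"] = action
    # Authentication is left out on purpose; how the API token is passed
    # (header vs. "auth" field) depends on the Zabbix version.
    return {"jsonrpc": "2.0", "method": "event.acknowledge", "params": params, "id": request_id}
```

Folding everything into one `action` bitmask means an ack that also carries a note still needs only one API call.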

Anybody want to help test it out?

Caveat: it’s not designed for very high alert volumes (it’s a Perl CGI).

Also, in my testing it only works if the PD integration is the generic “Events API v2” one; the “Zabbix”-branded integration seems to make the dedup_key silently vanish, and the script needs it.

FYI, 2-way integrations that depend on the deduplication key to identify alerts in the upstream monitoring system have historically only worked if “create alerts and incidents” is disabled for the service, because with it enabled the dedup_key on the incident is blank (the alerts on the incident carry the deduplication keys instead). We also don’t recommend the super-legacy setting that keeps dedup_key in the incident model, as it doesn’t work with any of the new hotness we’ve developed over the years for managing alerts.

I see, however, that the source code is already halfway to a workaround: it retrieves the incident record from the REST API (in the subroutine pagerduty_get_event_details). All you need to do is add the query parameter include[]=alerts to the URL. The alerts property of the incident in the response (a list of alert objects) then lets the subroutine put together a list of dedup keys for all the alerts, which can be used to find and ack any Zabbix events that match them.
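A minimal Python sketch of that workaround, assuming the response wraps the record as `{"incident": {...}}`, that include[]=alerts embeds full alert objects under `alerts`, and that each alert exposes its dedup key as `alert_key`. Those field names are my reading of the PagerDuty REST API and should be checked against a real response. The HTTP call and the `Authorization` header are left out, and both helper names are hypothetical.

```python
from typing import Any, Dict, List
from urllib.parse import quote, urlencode

PD_API_BASE = "https://api.pagerduty.com"


def incident_url_with_alerts(incident_id: str) -> str:
    """URL for fetching one incident with its alerts embedded via include[]=alerts."""
    query = urlencode([("include[]", "alerts")])  # percent-encodes to include%5B%5D=alerts
    return f"{PD_API_BASE}/incidents/{quote(incident_id)}?{query}"


def dedup_keys_from_alerts(incident_response: Dict[str, Any]) -> List[str]:
    """Collect distinct, non-empty alert_key values from the embedded alerts, in order."""
    incident = incident_response.get("incident", {})
    keys: List[str] = []
    for alert in incident.get("alerts") or []:
        key = alert.get("alert_key")
        if key and key not in keys:
            keys.append(key)
    return keys
```

Each returned key would then be looked up in Zabbix and passed to the ack call, so an incident with several alerts acks every matching Zabbix event rather than just one.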

Note also that unless all your PagerDuty services have this setting enabled (it should be the default for all new services), the script will need to tell the two cases apart and know when to use the incident model vs. the alerts to find the dedup_key.
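One way to handle mixed services is a simple fallback: use the incident-level key when it is present, otherwise gather the keys from the embedded alerts. This is a heuristic sketch in Python, not the project's logic. It assumes the incident-level key is exposed as `incident_key` and each alert's key as `alert_key`, and that an incident from an alert-creating service leaves `incident_key` empty, as described above.

```python
from typing import Any, Dict, List


def dedup_keys_for_incident(incident: Dict[str, Any]) -> List[str]:
    """Heuristic: prefer the incident-level key, else fall back to the alerts' keys.

    Assumed field names: "incident_key" on the incident (legacy, incidents-only
    services) and "alert_key" on each embedded alert (alert-creating services).
    """
    incident_key = incident.get("incident_key")
    if incident_key:
        return [incident_key]
    keys: List[str] = []
    for alert in incident.get("alerts") or []:
        key = alert.get("alert_key")
        if key and key not in keys:
            keys.append(key)
    return keys
```

If the heuristic proves unreliable, checking the service's alert-creation setting directly would be the more explicit alternative, at the cost of an extra API call.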

Good luck, and thank you for contributing! I made something like this years ago in Python but never got it ready for broader consumption, so by and large it hasn’t seen the light of day.


I’ve got it working for my setup with just include[]=body, but it was definitely finicky (some PD-side configs broke it, such as using the “Zabbix”-branded integration).

I have noticed that merging breaks things (it “resolves” most of the alerts and only updates the “parent” item afterwards), though we don’t merge often. It sounds like using the data fetched via include[]=alerts might get me to a fix for that issue…
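A possible direction for the merge problem, sketched in Python: when handling a resolve, only close Zabbix events whose PagerDuty alert itself reports `status` "resolved", instead of closing everything tied to the incident. This rests on an unverified assumption: alerts moved into the surviving incident by a merge still report "triggered", so their Zabbix events stay open. The field names `alerts`, `alert_key`, and `status` are assumed as in the earlier sketches, and `keys_safe_to_close` is a hypothetical helper.

```python
from typing import Any, Dict, List


def keys_safe_to_close(incident: Dict[str, Any]) -> List[str]:
    """Return dedup keys only for alerts that PagerDuty reports as resolved.

    Assumption (not verified against PD's merge behaviour): an alert that now
    belongs to a still-open incident after a merge has status "triggered", so
    filtering on "resolved" keeps its Zabbix event open.
    """
    keys: List[str] = []
    for alert in incident.get("alerts") or []:
        key = alert.get("alert_key")
        if key and alert.get("status") == "resolved" and key not in keys:
            keys.append(key)
    return keys
```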

Thanks for the info!