Monitoring Team and Service configuration by Best Practices

Hello :wave:,

We had multiple events where our applications sent alerts to PagerDuty through the Icinga2 integration, but the PagerDuty team, service, schedule, or escalation policy wasn’t configured properly.
So we decided to monitor, from Icinga, the PagerDuty target of each alert: for every Icinga2 User with an Integration Key, we will run a check against the PagerDuty API to see whether a recipient for that Integration Key exists and whether it’s properly configured.

We are having trouble implementing this with the PagerDuty API and would love some assistance.
The best way to do this would be to look up the service name by integration key through the PagerDuty API.
I didn’t find a way to do that. I looked for an integration endpoint, and I tried going over all the services to find an integration with my integration key, but the API doesn’t return a usable result for either.

I even considered creating an Event with the integration key and a defined incident key, then either querying the service name of the created incident or just checking whether an incident was created at all, which would mean the service is properly configured and someone is on call. However, this sends alerts to my development teams, which is very problematic.

I would appreciate any ideas or suggestions.
Thank you in advance!

I recommend integrating Icinga2 with our Event Rulesets feature rather than directly with one or more PagerDuty Services. This simplifies your integration to one user/key and gives you the flexibility of event rules to ensure the events/alerts are sent to the right services and notify the right on-call team/responder.

Additionally, you’ll want to ensure the Icinga2 events/alerts contain rich metadata such as the service or application name, team ownership, and environment information. This will then enable you to create the event ruleset rules needed to properly route, enrich, suppress, or take other desired actions on incoming events/alerts.
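
As a rough illustration of the kind of metadata meant here, the sketch below builds an Events API v2 style trigger event in Python. The helper name `build_event`, the keys under `custom_details` (`service_name`, `team`, `environment`), and all sample values are assumptions made for this example, not something Icinga2 emits by default. It only builds and prints the payload and does not send anything.

```python
import json


def build_event(routing_key, summary, source, severity,
                service_name, team, environment):
    """Build an Events API v2 style trigger event.

    The custom_details keys are hypothetical names chosen for this
    sketch; use whatever fields your event rules will match on.
    """
    return {
        "routing_key": routing_key,      # key of the ruleset or integration
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,        # critical, error, warning, or info
            "custom_details": {
                "service_name": service_name,
                "team": team,
                "environment": environment,
            },
        },
    }


# Placeholder values only; the routing key is not a real key.
event = build_event(
    routing_key="EXAMPLE_ROUTING_KEY",
    summary="Disk usage above 90% on web-01",
    source="web-01.example.com",
    severity="critical",
    service_name="checkout-api",
    team="payments",
    environment="production",
)

print(json.dumps(event, indent=2))
```

With fields like these in place, a rule could match on, for example, the `team` value to pick the target service.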

Looks interesting, I will read about it and update here after a successful implementation.

EDIT:
The shift to Event Rulesets will change a lot of how we work, so we don’t want to put that effort in yet.
Plus, we would rather keep as much of the check logic as possible in Icinga.

Open to more ideas :slight_smile: