Hi everyone,
I’m part of an IT team working to unify our incident response with infrastructure automation, and we’re exploring better ways to integrate tools like Ansible, Rundeck, or Attune with PagerDuty for real-time operations.
Here’s what we’re trying to achieve:
- Automatically trigger predefined scripts or runbooks (via Rundeck or Ansible Playbooks) when certain PagerDuty alerts are fired
- Create custom PagerDuty services for various infrastructure tiers (e.g., DB layer, web servers, networking) with tailored responses
- Ensure clean logging, rollback, and audit trails for all automated responses triggered by PagerDuty incidents
- Tie notifications to team roles or escalation policies based on specific server groups or alert types
Has anyone here successfully set up something similar?
We’d love to hear:
- What integration method worked best for you (webhooks, APIs, plugins)?
- Any challenges around permissions or security controls when executing remote tasks post-incident?
- Advice on keeping automation safe and non-disruptive during active incidents?
We’d really appreciate any workflows, tooling stacks, or gotchas to avoid!
Thanks in advance,