Hello everyone,
I'm having trouble setting up alert routing for our Kubernetes clusters with Prometheus and PagerDuty, and I'd appreciate your advice on how to solve it.
### Problem:
We currently use Prometheus rules to trigger alerts for our Kubernetes clusters and send them to PagerDuty (PD). At the moment, all alerts are routed to a single PagerDuty service. We now need to separate these alerts based on labels in the Prometheus rules so that each team handles its own alerts. Specifically:
- We want to route alerts for different teams based on labels defined in the Prometheus alerting rules.
- Each team should have its own escalation policy within PagerDuty, but we must keep using a single PagerDuty service.
- Unfortunately, we can't use PagerDuty's **AIOps** feature to help with this segregation.
### Example:
We have two teams, **Team A** and **Team B**. Alerts for these teams should be routed based on a label in the Prometheus alert rules. For example:
- Alerts with the label `team="a"` should follow **Team A's** escalation policy.
- Alerts with the label `team="b"` should follow **Team B's** escalation policy.
Here is a simplified example of how our Prometheus rules look:
```yaml
groups:
  - name: example-group
    rules:
      - alert: HighCpuUsage
        expr: cpu_usage > 90
        labels:
          severity: critical
          team: "a"
      - alert: HighMemoryUsage
        expr: memory_usage > 90
        labels:
          severity: warning
          team: "b"
```
We want to ensure that:
- Alerts for **Team A** (with the label `team="a"`) follow their specific PagerDuty escalation policy.
- Alerts for **Team B** (with the label `team="b"`) follow their own escalation policy.
### Goals:
1. Split alerts based on labels (e.g., `team="a"` or `team="b"`) in Prometheus rules.
2. Route these alerts to different PagerDuty escalation policies for each team.
3. Continue using the same PagerDuty service without creating separate services for each team.
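
To make goal 1 concrete, here is a minimal sketch of how the label split might look on the Alertmanager side. It assumes the alerts reach PagerDuty through Alertmanager with an Events API v2 integration key. The receiver names and the key placeholder are hypothetical, not our real configuration. Every receiver points at the same service key and only forwards the `team` label into the PagerDuty event.

```yaml
# Sketch only: receiver names and the routing key are placeholders (assumptions).
route:
  receiver: pagerduty-default          # fallback for alerts without a team label
  group_by: ['alertname', 'team']
  routes:
    - matchers:
        - team = "a"
      receiver: pagerduty-team-a
    - matchers:
        - team = "b"
      receiver: pagerduty-team-b

receivers:
  - name: pagerduty-default
    pagerduty_configs:
      - routing_key: "<shared-service-integration-key>"
  - name: pagerduty-team-a
    pagerduty_configs:
      - routing_key: "<shared-service-integration-key>"   # same service as below
        group: team-a                                      # PagerDuty event "group" field
        details:
          team: '{{ .CommonLabels.team }}'
  - name: pagerduty-team-b
    pagerduty_configs:
      - routing_key: "<shared-service-integration-key>"
        group: team-b
        details:
          team: '{{ .CommonLabels.team }}'
```

This covers only the split by label. All events still land on one service, so it does not by itself select an escalation policy, which is the part we are stuck on.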
### Has anyone encountered a similar situation or found an effective solution to this?
We would appreciate any insights on configurations or workarounds within Prometheus, PagerDuty, or integrations between the two that can help us achieve this routing.
Looking forward to your thoughts and suggestions!
Thanks in advance!