Hello everyone,

I am facing a challenge with setting up alert routing for Kubernetes clusters using Prometheus and PagerDuty, and I wanted to seek your advice or suggestions on how to solve this.


### Problem:
We are currently using Prometheus rules to trigger alerts and send them to PagerDuty (PD) for our Kubernetes clusters. At the moment, all our alerts are routed to a single PagerDuty service. Now, we need to separate these alerts based on labels in the Prometheus rules to ensure they are handled by different teams. Specifically:

- We want to route alerts for different teams based on labels defined in the Prometheus alerting rules.
- Each team should have its own escalation policy within PagerDuty, but we must keep using a single PagerDuty service.
- Unfortunately, we can't use the **AI Ops feature of PagerDuty** to help with this segregation.


### Example:
We have two teams, **Team A** and **Team B**. Alerts for these teams should be routed based on a label in the Prometheus alert rules. For example:

- Alerts with the label `team="a"` should follow **Team A's** escalation policy.
- Alerts with the label `team="b"` should follow **Team B's** escalation policy.

Here is a simplified example of how our Prometheus rules look:


```yaml
groups:
  - name: example-group
    rules:
      - alert: HighCpuUsage
        expr: cpu_usage > 90
        labels:
          severity: critical
          team: "a"
      - alert: HighMemoryUsage
        expr: memory_usage > 90
        labels:
          severity: warning
          team: "b"
```

We want to ensure that:
- Alerts for **Team A** (with the label `team="a"`) follow their specific PagerDuty escalation policy.
- Alerts for **Team B** (with the label `team="b"`) follow their own escalation policy.


### Goals:
1. Split alerts based on labels (e.g., `team="a"` or `team="b"`) in Prometheus rules.
2. Route these alerts to different PagerDuty escalation policies for each team.
3. Continue using the same PagerDuty service without creating separate services for each team.

### Has anyone encountered a similar situation or found an effective solution to this? 
We would appreciate any insights on configurations or workarounds within Prometheus, PagerDuty, or integrations between the two that can help us achieve this routing.

Looking forward to your thoughts and suggestions!

Thanks in advance 

@Minku This is possible to do with Event Orchestration (you can start a free AIOps trial here), which would allow you to set up rules based on fields on the incoming event, to dynamically assign an escalation policy.

Alternatively, that event rule could set custom fields on the incident, which an Incident Workflow could then use to reassign the incident to a particular escalation policy.
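For either approach, the `team` label has to reach PagerDuty as a field on the event. A minimal Alertmanager sketch of that, assuming alerts are delivered through Alertmanager's `pagerduty_configs` (Events API v2): the receiver name `pagerduty-shared` and the routing key below are placeholders, not values from the original post, and the exact field path to match on in the orchestration rule (likely something like `event.custom_details.team`) should be checked against an actual incoming event.

```yaml
route:
  receiver: pagerduty-shared          # hypothetical receiver name
  # Group by team so each notification carries exactly one team value
  # and .CommonLabels.team is always populated.
  group_by: ['alertname', 'team']

receivers:
  - name: pagerduty-shared
    pagerduty_configs:
      # Placeholder: the integration key of the single shared service.
      - routing_key: '<your-integration-key>'
        # Pass the rule's severity label through (critical / warning).
        severity: '{{ .CommonLabels.severity }}'
        details:
          # Sent as a custom detail on the PagerDuty event, where an
          # Event Orchestration rule can match on it.
          team: '{{ .CommonLabels.team }}'
```

With this in place, one rule per team (for example, matching `team` equal to `a` or `b`) could assign the corresponding escalation policy while all alerts still arrive at the same service.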

