Customization of Alert Subject and Description/Message

Hi PD team,

  1. We are getting alerts like the one below, but the message is not useful to an engineer because it does not give a clear description. We have multiple alerts whose Prometheus metrics carry a job, a pod, or both. When a job is present, a pod is always present, but the reverse is not true: if the metric is for a deployment, the pod is present but the job is not.

So we want the description to be more meaningful for both job and pod, so that an engineer can act on the alerts.

Below is the alert we are getting, whose message/description is not meaningful.

Labels:
 - alertname = KubeJobCompletion
 - cluster = pre2
 - endpoint = http
 - instance = 10.2.45.126:8080
 - job = kube-state-metrics
 - job_name = vault-backup-1581410820
 - namespace = dev-vault
 - pod = prometheus-operator-kube-state-metrics-7ccb55bbc7-nssmp
 - prometheus = dev-monitoring/dev-prometheus
 - service = prometheus-operator-kube-state-metrics
 - severity = warning
Annotations:
 - message = Job dev-vault/vault-backup-1581410820 is taking more than one hour to complete.
 - runbook_url = https://someurl/

Below is the configuration:

global:
  resolve_timeout: 5m
route:
  group_by: [ 'namespace', 'pod', 'job', 'severity', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'dev-teams-test' #Default rec. Maybe a slack channel for non-critical/fallback alerts.
  routes:
  - receiver: dev-pager
    match_re:
      ## The default namespace is included because some API alerts specify that namespace.
      ## The trailing '|' is intentional. This will give us alerts from any non-namespaced alerts
      namespace: dev.*|kube.*|istio.*|default|
      severity: critical|error|warning
    continue: true
receivers:
- name: dev-pager
  pagerduty_configs:
  - send_resolved: true
    routing_key:
    url: https://events.pagerduty.com/v2/enqueue
    severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}'
    description: '{{ if .CommonLabels.job }}{{ .CommonLabels.job | toLower }}{{ else }}{{ .CommonLabels.pod | toLower }}{{ end }}'

In the configuration, I tried to add the description. Is it valid? If not, could you please suggest a fix or share an example?

  2. The second issue concerns the alert name we are getting.

Alert subject/Name : [FIRING:6] 10.2.45.126:8080 kube-state-metrics dev-vault warning (pre2 http prometheus-operator-kube-state-metrics-7ccb55bbc7-nssmp dev-monitoring/dev-prometheus prometheus-operator-kube-state-metrics)

Here, the common labels include unwanted data such as http.

How can we remove http?

This is the Alertmanager template entry that adds the common labels:

{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}

Kindly advise on both issues.

Hi Gourav,

Regarding the first issue reported, Prometheus’s documentation provides an example of how to explicitly set the description for a PagerDuty incident. Can you take a look at that example and see if you can apply it to the configuration?
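
As a minimal sketch of an explicit description, assuming the labels shown in the alert above: in that alert the `job` label holds the scrape job (kube-state-metrics) and `pod` holds the kube-state-metrics pod, while the Kubernetes Job itself is in `job_name`, so the sketch checks `job_name` first and falls back to `pod`. The routing key value is a placeholder, and adding `alertname` and `job_name` to `group_by` is an assumption made so that both labels stay common to every alert in a group.

```yaml
route:
  # Assumption: grouping by alertname and job_name keeps both labels in .CommonLabels.
  group_by: ['alertname', 'namespace', 'job_name', 'pod', 'severity']
  receiver: dev-pager
receivers:
- name: dev-pager
  pagerduty_configs:
  - send_resolved: true
    routing_key: '<integration-key>'  # placeholder, not a real key
    # Prefer the Kubernetes Job name; fall back to the pod when there is no job_name label.
    description: '{{ .CommonLabels.alertname }} in {{ .CommonLabels.namespace }}: {{ if .CommonLabels.job_name }}job {{ .CommonLabels.job_name }}{{ else }}pod {{ .CommonLabels.pod }}{{ end }}'
```

With the alert shown above, this should render as "KubeJobCompletion in dev-vault: job vault-backup-1581410820".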

Regarding the alert name, could you provide more detail on how you would like the alert name to appear? Also, based on your template, could you clarify whether http and the other data are part of the common labels?

I also wanted to add that, while these issues center on configuration on the Alertmanager side, it is also possible to extract a field into the title of an incident using field extraction via event rules. This can be done on the PagerDuty end.

Hi Jade,

Thank you for your input. I was able to set the description explicitly, and the results are encouraging. Using the description and common labels, I was also able to get an alert name/subject that conveys a meaningful message to the on-call engineer. With these changes, I was also able to remove redundant fields such as http. Thank you again.
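
The thread does not show the final configuration. One way to get a subject without fields such as http is to name the wanted labels explicitly instead of appending every remaining common label, as the `__subject` template does. A sketch, where the template name `pagerduty.custom.description` is hypothetical and the choice of labels is an assumption:

```
{{/* Hypothetical template name; lists chosen labels instead of all common labels. */}}
{{ define "pagerduty.custom.description" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }} {{ .CommonLabels.namespace }} {{ .CommonLabels.severity }}{{ end }}
```

The file holding this definition would need to be listed under `templates:` in the Alertmanager configuration, and the receiver would then reference it with `description: '{{ template "pagerduty.custom.description" . }}'`.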

Hi Gourav,

Stepping in here for Jade. Thanks for letting us know. We are glad that you have it working.

If there is anything else we can help you with, please do not hesitate to ask.

Thanks,
