Integrating Nagios with Multiple PagerDuty Services

(Demitri Morgan) #1

Introduction

Let’s say your Nagios instance monitors groups of services and hosts that are under the care of different teams. In a basic integration, all alerts for all services and hosts would go through the same PagerDuty service and thus the same escalation policy, alerting the same personnel. However, it is possible to route alerts for distinct hosts and services to different integrations and thus to different PagerDuty services and escalation policies.

Furthermore, being able to route to multiple services grants another option for performing automatic alert triage: setting the urgency of incidents to low on some services (designated for low-priority systems), as opposed to others that are more mission-critical.

Disclaimer

Nagios is highly configurable. How it is configured out-of-the-box when it is first installed may vary dramatically between implementations, e.g. depending on whether it is installed from source or through a software package built by the maintainers of a Linux distribution. With features like templating, groups and inheritance, the design of Nagios’ configuration language permits a virtual infinitude of different ways in which one can achieve any given goal in configuration.

That being said, the configuration examples given here are not the only way to achieve the goal of integrating Nagios with multiple PagerDuty services, and an actual working path to this goal may vary based on your preexisting Nagios configuration. If you paste them into your configuration as-is and substitute your own object names, not only will they probably not work, but they might also break or otherwise disrupt your existing Nagios configuration.

You must understand what you are doing. If you feel comfortable with the configuration language of Nagios, you should be able to adapt these rudimentary examples to best fit your implementation.

Prerequisites

It is thus strongly advised to have some familiarity with the following topics:

To learn more about your Nagios installation, you can start by finding out where your configuration resides, since that is what you’ll be editing. Here’s one way to do it:

    root@ip-172-30-0-109:~# ps aux | grep nagios
    nagios   15006  0.0  0.4  42576  4844 ?        SNsl 01:01   0:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg

The -d option starts Nagios in daemon mode; the argument that follows it is the path to the main configuration file.

Next, note any instances of cfg_file or cfg_dir in the main config file. These indicate secondary configuration files that are loaded, parsed and applied.
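
As a rough illustration, the relevant lines in a main configuration file might look like the excerpt below. The paths are assumptions based on the /etc/nagios3 location in the example above; yours will likely differ.

    # Excerpt from a main configuration file (paths are illustrative only)

    # A single object configuration file to load
    cfg_file=/etc/nagios3/commands.cfg

    # A directory to scan; files ending in .cfg inside it are loaded
    cfg_dir=/etc/nagios3/conf.d

Any of the files pulled in this way may contain the contact, host and service definitions you will need to edit.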

How It Works

First, note that in both the agent-based and agentless integrations (per the PagerDuty configuration templates and the configuration changes shown in the integration guides), the integration works as follows:

  • Two notification commands are defined, one for service notifications and one for host notifications. Each calls an external integration script (supplied by PagerDuty) with the arguments needed to submit alert data to PagerDuty.
  • A contact is defined, whose pager property is the integration key, and which is configured to use the special notification commands for event delivery to PagerDuty.
  • The contact is added to all hosts and services by way of adding it to a contact group.
  • The contact group, to which the contact is added, is used on all hosts and all services by way of templates that specify that contact group for alerts. This structure applies to many out-of-box implementations of Nagios, particularly for Debian and RHEL-based installations, which is why it is advised.
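
The last two points can be sketched roughly as follows. This is only an outline under assumptions: the contact name pagerduty and the contact group name admins are hypothetical, the template excerpts omit every directive except the relevant one, and the two notification commands from the PagerDuty-supplied template are not reproduced here.

    ; Hypothetical contact group containing the PagerDuty pseudo-contact
    define contactgroup {
        contactgroup_name       admins          ; assumed group name
        alias                   Nagios Administrators
        members                 pagerduty       ; assumed contact name
    }

    ; Excerpt of a host template: only the relevant directive is shown
    define host {
        name                    generic-host
        contact_groups          admins
        register                0               ; template only, not a real host
    }

    ; Excerpt of a service template: only the relevant directive is shown
    define service {
        name                    generic-service
        contact_groups          admins
        register                0               ; template only, not a real service
    }

With a layout like this, every host and service that uses these templates notifies the one contact, which is why all alerts end up on a single PagerDuty service.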

What is touched upon in the FAQ of the Nagios integration guides is the fact that, by re-using the same notification commands on a different contact with a different pager property (set to a distinct integration key), that contact can be used for sending Nagios alerts to a different service.

Sample Configuration

First, let’s define two distinct contacts based on the original configuration template, which have distinct values for contact_name and pager.

    define contact {
           contact_name                             pagerduty_foo
           alias                                    PagerDuty Pseudo-Contact (FOO)
           service_notification_period              24x7
           host_notification_period                 24x7
           service_notification_options             w,u,c,r
           host_notification_options                d,r
           service_notification_commands            notify-service-by-pagerduty
           host_notification_commands               notify-host-by-pagerduty
           pager                                    291a2412ebec493b9e4cd0f92aceb8eb
    }

    define contact {
           contact_name                             pagerduty_bar
           alias                                    PagerDuty Pseudo-Contact (BAR)
           service_notification_period              24x7
           host_notification_period                 24x7
           service_notification_options             w,u,c,r
           host_notification_options                d,r
           service_notification_commands            notify-service-by-pagerduty
           host_notification_commands               notify-host-by-pagerduty
           pager                                    66ac7e430b924344b4dff67353171722
    }

Next, let’s say we have a host foohost and want an incident to be opened on the PagerDuty service associated with pagerduty_bar when the host is unreachable. Note: this example assumes a host template named generic-host, which defines reasonable default values for various options. This template is defined in the pre-built Nagios software packages for Debian and RHEL-based implementations. We add:

    define host {
        use                     generic-host
        host_name               foohost
        alias                   foohost
        address                 192.168.7.6
        contacts                pagerduty_bar ; Required for PagerDuty integration
    }

On this host, we are running HTTP and SSH services. To send alerts to pagerduty_foo's service when the HTTP service is inaccessible, and pagerduty_bar's service when SSH is inaccessible (note also here the template generic-service):

    define service {
        host_name                       foohost
        service_description             HTTP
        check_command                   check_http
        use                             generic-service
        notification_interval           0
        contacts                        pagerduty_foo ; Required for PagerDuty integration
    }

    define service {
        host_name                       foohost
        service_description             SSH
        check_command                   check_ssh
        use                             generic-service
        notification_interval           2
        check_interval                  2
        contacts                        pagerduty_bar ; Required for PagerDuty integration
    }
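
If many services route to the same PagerDuty service, repeating the contacts directive on each one gets tedious. One possible approach, sketched here on the assumption that generic-service exists as above, is an intermediate template. The template name bar-team-service and the FTP service with check_ftp are hypothetical; substitute names from your own configuration.

    ; Hypothetical intermediate template for everything owned by the "bar" team
    define service {
        name                            bar-team-service
        use                             generic-service
        contacts                        pagerduty_bar
        register                        0 ; template only, not a real service
    }

    ; Hypothetical service that inherits its contact from the template
    define service {
        use                             bar-team-service
        host_name                       foohost
        service_description             FTP
        check_command                   check_ftp
    }

Keep in mind that contacts and contact_groups are separate directives, so any contact_groups inherited from generic-service would presumably still be notified alongside pagerduty_bar. Check your templates if alerts reach more people than intended.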
