Is there a way to do this? I need to pick up notes as they are posted to an incident, but I don’t see any way to implement webhooks to make PagerDuty issue a POST call. Alternatively, I can poll the incident externally via GET every few minutes, but I already have generic webhooks hitting an AWS Lambda function upon incident creation or status change. It would be great if I could get the notes as well.
It’s not currently possible to get notified via webhooks whenever notes are added to incidents, but it’s definitely something we’re exploring for the future, especially since this could be used to improve several integrations (e.g. ServiceNow, which currently requires manually clicking Refresh PagerDuty Notes, or a specific update such as an ack or resolve in PD, in order to sync notes between the two).
Would you be able to share more about your overall goal so we can pass this on to our product team? They’d love to get an idea of what you’re trying to do as they plan our roadmap, whether it’s just syncing notes with a ticketing system, firing additional actions based on the content of incident notes, or something else entirely.
In the meantime, I’d just recommend making a GET request to fetch the latest notes every time your webhook processor receives ack/unack, escalation, resolve, etc., in addition to polling on a regular interval. That way you get some semblance of liveness by polling immediately after other webhooks are sent, rather than just waiting for your next cron job to fire.
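To make the suggestion above concrete, here is a rough Python sketch of a webhook processor that fetches notes right after any other incident webhook arrives and returns only the ones it has not seen before. The helper names (build_notes_request, new_notes, on_webhook) and the in-memory seen_ids set are made up for illustration, and the API token is a placeholder. It assumes the REST API v2 notes endpoint (GET /incidents/{id}/notes) and that your code has already extracted the incident ID from the webhook payload.

```python
import json
import urllib.request

PD_API = "https://api.pagerduty.com"


def build_notes_request(incident_id, api_token):
    """Build the GET request for an incident's notes (REST API v2)."""
    return urllib.request.Request(
        f"{PD_API}/incidents/{incident_id}/notes",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        method="GET",
    )


def fetch_notes(incident_id, api_token):
    """Call PagerDuty and return the list of note objects for the incident."""
    with urllib.request.urlopen(build_notes_request(incident_id, api_token)) as resp:
        return json.load(resp).get("notes", [])


def new_notes(notes, seen_ids):
    """Return notes whose id is not yet in seen_ids, and record them as seen."""
    fresh = [n for n in notes if n["id"] not in seen_ids]
    seen_ids.update(n["id"] for n in fresh)
    return fresh


def on_webhook(incident_id, seen_ids, fetch=fetch_notes, api_token="YOUR_TOKEN"):
    """Run on every ack/unack/escalate/resolve webhook: fetch, then de-duplicate.

    `fetch` is injectable so the same function can be reused by the cron poller
    or replaced with a stub in tests.
    """
    return new_notes(fetch(incident_id, api_token), seen_ids)
```

In a Lambda function the seen_ids set would need to live somewhere persistent (a database table, for example), since memory is not guaranteed to survive between invocations. The same on_webhook function can be called from the cron job, so both paths share one de-duplication step.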
We have many products and services with mission-critical uptime requirements. Because of this, our executives take an active interest in any sort of outage. To meet their needs and reduce our dependency on email, I developed 3-way integration between PagerDuty, StatusPage, and HipChat. Each of these products has some built-in tools for integration with the other products, but my management asked for things that ultimately required custom integration.
When a PagerDuty incident is created, the webhook hits an AWS Lambda function that creates an incident in StatusPage and sets the appropriate component status, depending on the PagerDuty service name. As part of that process, a dedicated HipChat room is created for the incident, and a note is posted back to the PagerDuty incident with the chat room URL. Any changes to the PagerDuty incident status become webhook calls to AWS Lambda and ultimately updates to the incident on StatusPage. Most importantly, engineers working on an outage are expected to join the chat room and post a note every 30 minutes to give upper management some idea about progress being made, estimated recovery time, etc. To post these notes, we have designed a custom /PD command for HipChat, which takes the rest of the command line and posts it as a note to the PagerDuty incident. I am currently polling PagerDuty, using GET calls issued via cron, to grab the notes and post them to StatusPage. Polling is a clumsy solution, given that PagerDuty webhooks can capture just about everything else I need in real time. I suppose the other choice is to add a second POST to the HipChat /PD command to update StatusPage.
This 3-way integration gives us a system-wide dashboard in StatusPage, with a summary of outage response activity. We want our executives to see the overall state of the world and to be aware of all remedial actions being taken, without forcing them to try to understand the engineering details being discussed in the chat room. At the end of the incident, we retain the engineers’ detailed conversation in HipChat, which facilitates the production of root cause analysis documents later on.
Ahhh, that’s interesting. It sounds like you basically built your own version of what we call stakeholder engagement.
While it’s not a substitute for webhook notifications on notes, which still has its own merits, it sounds like you can get the desired effect by using stakeholder engagement and adding your exec team as subscribers to your incidents. This way they can automatically get updates from PagerDuty whenever notes are added to an incident, on a simplified incident status page where they can’t take unintended actions on the incident (i.e. can’t accidentally resolve it), and you can simply add a link to the incident status page to your HipChat room topic for anybody else who may be interested in following your incident as well.
Status page URLs look like this:
Out of curiosity, why do you create a new HipChat room for every incident? Is it just to have a single place to look for writing postmortems, or is there another reason behind that?
We re-use a single Slack channel here, so that everybody, including our support, sales, and other teams, knows exactly where to go whenever there’s an incident and they want to stay up-to-date on what’s going on with it. Then we just use the postmortem tool to import Slack messages during the incident timeframe from that single channel.
We have a few requirements that go beyond the scope of stakeholder engagement. In some ways, what we developed is the opposite perspective. For example, we have a unique dedicated chat room per incident, but our status page shows all services and a historical view of all incidents.
In our microservices environment, a problem that shows up in one location may have a root cause elsewhere, and may trigger additional failures. Our custom integration evaluates a set of known dependencies for each service and provides appropriate warnings on the status page. Considerable effort is being made to document and automate the analysis of dependencies, as it will help us dispatch the right engineering team on the first try.
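For anyone curious what a dependency evaluation like this might look like, here is a minimal Python sketch. It is not our actual implementation: the service names and the DEPENDS_ON map are invented for illustration, and the function simply walks the map to find every service that directly or transitively depends on the one that failed, which is the set a status page might flag with a warning.

```python
from collections import deque

# Hypothetical dependency map: service -> services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "catalog"],
    "payments": ["database"],
    "catalog": ["database"],
    "database": [],
}


def affected_services(failed, depends_on):
    """Return, sorted, every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    # Breadth-first walk outward from the failed service.
    seen, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for svc in dependents.get(current, ()):
            if svc not in seen:
                seen.add(svc)
                queue.append(svc)

    seen.discard(failed)  # guard against cycles that lead back to the start
    return sorted(seen)
```

With the sample map, a failure in "database" flags "catalog", "checkout", and "payments", while a failure in "checkout" flags nothing because no other service depends on it.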
Having a single HipChat room for each incident serves 2 purposes: (1) It keeps the engineering conversation in one place, ready to be incorporated in a postmortem via copy/paste. (2) We have coded HipChat extensions to help our engineers meet a policy directive that requires a note to be posted in PagerDuty every 30 minutes during an outage. Because we have a 1:1 correlation of chat rooms to PD incidents, engineers can issue a HipChat command in the form “/PD This is my note”. Because we know which incident the room belongs to, we know where to post the note in PagerDuty. Also, we are working on a /PDCLOSE command that accepts a final note in HipChat, posts the note to PagerDuty, and simultaneously closes the incident in PagerDuty and StatusPage. Ultimately, if we can ensure that all PagerDuty notes are posted via HipChat, we would no longer care whether PagerDuty has webhooks for note creation.
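As an illustration of the /PD flow described above, here is a simplified Python sketch. The room-to-incident table, the room and incident IDs, and the function names are hypothetical, and the HipChat side (how the slash command reaches the code) is left out entirely. It assumes the PagerDuty REST API v2 endpoint for creating a note (POST /incidents/{id}/notes with a body of the form {"note": {"content": ...}}). The sketch only builds the request; actually sending it also needs the usual Authorization header and, as far as I know, a From header containing a valid PagerDuty user's email.

```python
import json

# Hypothetical 1:1 mapping of chat room IDs to PagerDuty incident IDs.
ROOM_TO_INCIDENT = {"room-123": "PABC123"}


def parse_pd_command(message):
    """Return the note text from a '/PD <note>' message, or None if it is not one."""
    parts = message.strip().split(None, 1)
    if len(parts) != 2 or parts[0].upper() != "/PD":
        return None
    return parts[1].strip()


def build_note_request(room_id, message, room_to_incident):
    """Describe the PagerDuty call for a /PD command issued in a given room.

    Returns None when the message is not a /PD command or the room has no
    incident attached, so the caller can ignore it.
    """
    note = parse_pd_command(message)
    incident_id = room_to_incident.get(room_id)
    if note is None or incident_id is None:
        return None
    return {
        "method": "POST",
        "url": f"https://api.pagerduty.com/incidents/{incident_id}/notes",
        "body": json.dumps({"note": {"content": note}}),
    }
```

Because the room determines the incident, the engineer never has to type an incident ID. The same spot is where a second POST to StatusPage could be added, which would remove the need to poll PagerDuty for notes at all.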
Because the audience for the status page includes high-level management of a large company, we receive a great deal of input regarding what the page needs to look like. A static URL is a must, because accessing the page can never be more involved than clicking a bookmark. In a multiple-outage scenario, stakeholders need to see the “state of the world” in one place. Every element of status page design is “in play”, meaning that our stakeholders expect to see a specific layout and even our official corporate CSS. We will never get any single vendor to give us all of this, so we leverage the APIs of various products (PagerDuty, StatusPage, HipChat) to make them work together in our complex environment.
We have a very similar use case to David’s, so the Note webhook trigger is a must-have for us too!
Pinging this thread to let you know that we’ve released the ability to generate a webhook on incident note creation.
You can create a Generic v2 Webhook (not available on v1 webhooks) on a service. The webhook event type is
More documentation on the new v2 webhook is available here: https://v2.developer.pagerduty.com/docs/webhooks-v2-overview
Let me know if you have questions!