Getting a list of inactive or unused PagerDuty users


(Demitri Morgan) #1


One may wish to detect and remove PagerDuty users who are no longer working in one’s organization, to avoid unused user seats. This post deals with how to obtain that user list.

Getting user sets

There are a few ways of doing this, and all of them ultimately lead to obtaining difference between two sets: the list of all users, which we’ll call A, and the list of active users, which we’ll call B.

Once each list of users is obtained, it is useful to construct a mapping of user IDs to user email addresses, and vice versa. The set comparison can use either. However, to make subsequent API calls easier, it’s best to retain/memoize the user IDs, i.e. so that (for instance) DELETE REST API calls already have the IDs and don’t have to obtain them once again by querying for the email address. Furthermore, if B is obtained from a third-party system, i.e. an employee roster or identity provider, and these data contain the same email addresses as used in PagerDuty, having a list of user emails makes the comparison easier.

A: The list of all users

A is easy to obtain; one can query the list of all users by iterating through successive calls to the /users endpoint with offset and limit parameters until the more property of the response is false (see: Pagination (v2 REST API guide)). For instance, if one has 178 users, there will be a total of two API calls, as follows:

# response should contain "more": true 
# response should contain "more": false

In each user record returned from the above endpoint (the list in the users property), per the response schema specification, will contain an id and email property, corresponding to their user ID and login email, respectively.

B: The list of active users

This is where one will need to pick an approach that best fits the use case. For instance, if using an identity provider for single sign-on, the most straightforward approach is to get a list of users still provisioned in the IDP, since anyone without an account in the IDP is not going intended to be logging in to PagerDuty. Furthermore, one will need to handle edge cases such as identifying any dummy users, and the account owner.

A few techniques, apart from those already mentioned, will both involve iterating over records that implicate a user in some activity within PagerDuty, and adding their user ID to the set as they are encountered.

Get all users that still have on-call shifts

First obtain the list of active users using the on-calls API endpoint, setting the since and until parameters at the end of the URL to specify a range of dates (see: DateTime type format). In each element of the list returned in the response schema of that endpoint (see the documentation for further details), the property at namespace path has the ID of the user, which uniquely identifies the user.

For the date/time range, one should pick one that is as broad as possible to include all users who are on call but infrequently, but narrow enough to exclude the last time that the inactive users were on-call.

A - B will then involve a comparison based on user IDs. Depending on the next steps one wishes to take, one might need to get the users via their API endpoint url, given in the user.self property of each on-call list entry.

See who’s been doing work and getting involved vis-à-vis log entries

The REST API Incident Log Entries endpoint lists all the activity of every incident’s timeline and accepts the same since and until parameters as the /oncalls endpoint.

Note, however, that there will be a lot of data in this endpoint, and so going back far enough in time will require more paging through results using the limit and offset parameters. For instance, one could construct a while loop that runs until the created_at date of any log entry is more than a set amount of time ago, and incrementing the offset parameter by the limit parameter in each iteration.

Hence, one could use this to determine the last time any given user acknowledged, resolved, escalated, was assigned or was notified about an incident in a time range. This would be in the agent property of each list entry returned from the endpoint, in the id field, wherever type (of the agent property’s object) is user_reference. The created_at property contains the time that the action was taken.

Similar to using /oncalls, one will then need to use the self property (it will be in agent.self), and the set difference will involve a comparison of user IDs.

The disadvantage of this method is that if there are agents who are only on-call at higher levels of escalation policies, they might be less often paged or involved in incident response (unless escalation is common enough), and so they might end up excluded from the list of active users.

Computing the difference

This is by far the easiest part; many scripting languages have utilities for this already. For example, Python has the native set type, which recognizes the minus operator, so (given two set type objects A and B) it will be as simple as:

inactive_users = A - B

De-provisioning Users

Before deleting a user, one must:

  • Remove them from all on-call schedules and escalation policies
  • Ensure that there are no open incidents such that the service used an escalation policy that the user was on at the time the incident was triggered

Fortunately, we have a solution for this, which automates these tasks:

To use it, one will just need the list of users’ login email addresses.

Only active users in
(system) #2