One may wish to detect and remove PagerDuty users who are no longer working in one’s organization, to avoid unused user seats. This post deals with how to obtain that user list.
Getting user sets
There are a few ways of doing this, and all of them ultimately lead to obtaining the difference between two sets: the list of all users, which we’ll call A, and the list of active users, which we’ll call B.
Once each list of users is obtained, it is useful to construct a mapping of user IDs to user email addresses, and vice versa. The set comparison can use either. However, to make subsequent API calls easier, it’s best to retain/memoize the user IDs, so that (for instance) DELETE REST API calls already have the IDs and don’t have to obtain them again by querying for the email address. Furthermore, if B is obtained from a third-party system, e.g. an employee roster or identity provider, and these data contain the same email addresses as used in PagerDuty, having a list of user emails makes the comparison easier.
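As a minimal sketch of the two-way mapping described above, the following builds both lookups from a list of user records. The sample records and the helper name build_user_maps are hypothetical, and lower-casing the email addresses is an assumption made here so that comparisons against another system are not tripped up by capitalization.

```python
# Hypothetical user records, shaped like entries of a user list
# (only the "id" and "email" fields are used here).
users = [
    {"id": "PABC123", "email": "Alice@example.com"},
    {"id": "PDEF456", "email": "bob@example.com"},
]


def build_user_maps(records):
    """Return (id -> email, email -> id) dictionaries for the given records."""
    id_to_email = {}
    email_to_id = {}
    for record in records:
        # Assumption: normalize case/whitespace so emails compare reliably.
        email = record["email"].strip().lower()
        id_to_email[record["id"]] = email
        email_to_id[email] = record["id"]
    return id_to_email, email_to_id


id_to_email, email_to_id = build_user_maps(users)
```

With both dictionaries in hand, the comparison can run on whichever key the other data source provides, and the user IDs remain available for later API calls.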
A: The list of all users
A is easy to obtain; one can query the list of all users by iterating through successive calls to the /users endpoint with the limit and offset parameters until the more property of the response is false (see: Pagination (v2 REST API guide)). For instance, if one has 178 users and uses a limit of 100, there will be a total of two API calls, as follows:
# response should contain "more": true
https://api.pagerduty.com/users?limit=100&offset=0
# response should contain "more": false
https://api.pagerduty.com/users?limit=100&offset=100
In each user record returned from the above endpoint (the list in the
users property), per the response schema specification, will contain an
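The paging loop described in this section could look roughly like the sketch below. The function names fetch_page and list_all are hypothetical. The request headers reflect how the v2 REST API is commonly called with an API key, but they should be treated as an assumption and checked against the API documentation. The page-fetching function is passed in as an argument so that the loop itself can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.pagerduty.com"


def fetch_page(path, params, api_key):
    """Fetch one page of results as a dict (performs a real HTTP request)."""
    url = f"{API_BASE}{path}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers={
        # Assumed header formats; verify against the REST API documentation.
        "Authorization": f"Token token={api_key}",
        "Accept": "application/vnd.pagerduty+json;version=2",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_all(path, key, get_page, limit=100):
    """Collect every record under `key`, paging until "more" is false."""
    records = []
    offset = 0
    while True:
        page = get_page(path, {"limit": limit, "offset": offset})
        records.extend(page[key])
        if not page.get("more"):
            return records
        offset += limit
```

A call such as list_all("/users", "users", lambda path, params: fetch_page(path, params, api_key)) would then produce list A, where api_key is one's own REST API key.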
B: The list of active users
This is where one will need to pick an approach that best fits the use case. For instance, if using an identity provider for single sign-on, the most straightforward approach is to get a list of users still provisioned in the IDP, since anyone without an account in the IDP is not intended to be logging in to PagerDuty. Furthermore, one will need to handle edge cases, such as identifying any dummy users and the account owner.
The two techniques below, apart from those already mentioned, both involve iterating over records that implicate a user in some activity within PagerDuty, and adding that user’s ID to the set as it is encountered.
Get all users that still have on-call shifts
First obtain the list of active users using the on-calls API endpoint, setting the until parameters at the end of the URL to specify a range of dates (see: DateTime type format). In each element of the list returned in the response schema of that endpoint (see the documentation for further details), the property at namespace path user.id has the ID of the user, which uniquely identifies the user.
For the date/time range, one should pick a range broad enough to include all users who are on call only infrequently, but narrow enough to exclude the last time that the inactive users were on call.
A - B will then involve a comparison based on user IDs. Depending on the next steps one wishes to take, one might need to get the users via their API endpoint URL, given in the user.self property of each on-call list entry.
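A rough sketch of this approach follows: one helper builds the date-range query parameters, and another collects the user IDs found at user.id in each on-call entry. The helper names and sample entries are hypothetical. The since parameter name, paired here with until, is an assumption about the endpoint and should be confirmed in the API reference.

```python
from datetime import datetime, timedelta, timezone


def date_range_params(days_back, now=None):
    """Build ISO 8601 range parameters covering the last `days_back` days."""
    now = now or datetime.now(timezone.utc)
    return {
        "since": (now - timedelta(days=days_back)).isoformat(),  # assumed name
        "until": now.isoformat(),
    }


def active_ids_from_oncalls(oncalls):
    """Return the set of user IDs found at user.id in each on-call entry."""
    return {entry["user"]["id"] for entry in oncalls if entry.get("user")}


# Hypothetical on-call entries; a user on several shifts appears repeatedly.
sample_oncalls = [
    {"user": {"id": "PABC123", "self": "https://api.pagerduty.com/users/PABC123"}},
    {"user": {"id": "PABC123", "self": "https://api.pagerduty.com/users/PABC123"}},
    {"user": {"id": "PDEF456", "self": "https://api.pagerduty.com/users/PDEF456"}},
]
B = active_ids_from_oncalls(sample_oncalls)
```

Because B is a set, users who appear in many on-call entries are counted only once.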
See who’s been doing work and getting involved vis-à-vis log entries
The REST API Incident Log Entries endpoint lists all the activity of every incident’s timeline and accepts the same until parameters as the on-calls endpoint.
Note, however, that there will be a lot of data in this endpoint, and so going back far enough in time will require more paging through results using the limit and offset parameters. For instance, one could construct a while loop that runs until the created_at date of any log entry is more than a set amount of time ago, incrementing the offset parameter by the limit parameter in each iteration.
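The while loop described above might be sketched as follows. This assumes that entries come back newest first and that the response keeps them under a log_entries key. Both are assumptions to verify against the endpoint's documentation. The names recent_log_entries and get_page are hypothetical, and get_page stands in for whatever function performs the actual request.

```python
from datetime import datetime, timedelta, timezone


def parse_time(value):
    """Parse an ISO 8601 timestamp; a trailing "Z" is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def recent_log_entries(get_page, max_age, now=None, limit=100):
    """Page through log entries until one is older than `max_age`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    entries = []
    offset = 0
    while True:
        page = get_page({"limit": limit, "offset": offset})
        for entry in page["log_entries"]:  # assumed response key
            if parse_time(entry["created_at"]) < cutoff:
                return entries  # assumes newest-first ordering
            entries.append(entry)
        if not page.get("more"):
            return entries
        offset += limit  # advance by one page per iteration
```

For example, recent_log_entries(get_page, timedelta(days=90)) would stop paging as soon as it meets an entry created more than 90 days ago.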
Hence, one could use this endpoint to determine the last time any given user acknowledged, resolved, escalated, was assigned or was notified about an incident in a time range. The user’s ID would be in the id field of the agent property of each list entry returned from the endpoint, wherever the type (of the agent property’s object) indicates that the agent is a user. The created_at property contains the time that the action was taken.
Similar to using /oncalls, one will then need to use the self property (it will be in agent.self), and the set difference will involve a comparison of user IDs.
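To illustrate, the sketch below reduces a list of log entries to the most recent activity time per user. The function name and sample entries are hypothetical, and the type value user_reference used to recognize user agents is an assumption that should be checked against the log entry schema.

```python
from datetime import datetime


def last_activity_by_user(log_entries, user_type="user_reference"):
    """Map each user ID to the latest created_at among that user's entries."""
    latest = {}
    for entry in log_entries:
        agent = entry.get("agent") or {}
        if agent.get("type") != user_type:  # skip non-user agents
            continue
        when = datetime.fromisoformat(entry["created_at"].replace("Z", "+00:00"))
        user_id = agent["id"]
        if user_id not in latest or when > latest[user_id]:
            latest[user_id] = when
    return latest


# Hypothetical log entries: two by a user, one by a service, one with no agent.
sample_entries = [
    {"created_at": "2024-05-01T10:00:00Z",
     "agent": {"id": "PABC123", "type": "user_reference"}},
    {"created_at": "2024-05-03T10:00:00Z",
     "agent": {"id": "PABC123", "type": "user_reference"}},
    {"created_at": "2024-05-02T10:00:00Z",
     "agent": {"id": "PSVC001", "type": "service_reference"}},
    {"created_at": "2024-05-02T10:00:00Z"},
]
latest = last_activity_by_user(sample_entries)
B = set(latest)  # the active-user set for the comparison
```

Keeping the timestamps, rather than only the IDs, makes it possible to tighten or loosen the activity window later without querying the API again.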
The disadvantage of this method is that if there are agents who are only on-call at higher levels of escalation policies, they might be less often paged or involved in incident response (unless escalation is common enough), and so they might end up excluded from the list of active users.
Computing the difference
This is by far the easiest part; many scripting languages have utilities for this already. For example, Python has the native set type, which recognizes the minus operator, so (given two set type objects A and B) it will be as simple as:
inactive_users = A - B
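As a slightly fuller sketch with made-up data, the difference can be taken on user IDs and then translated back to email addresses through the ID-to-email mapping. The last step shows the variant where the active list is a set of email addresses from an external roster. All IDs and addresses below are hypothetical.

```python
# Hypothetical mapping of every PagerDuty user ID to a login email.
id_to_email = {
    "PABC123": "alice@example.com",
    "PDEF456": "bob@example.com",
    "PGHI789": "carol@example.com",
}

A = set(id_to_email)        # all user IDs
B = {"PABC123", "PGHI789"}  # IDs of users found to be active

inactive_users = A - B
inactive_emails = sorted(id_to_email[user_id] for user_id in inactive_users)

# Variant: the active list is a roster of email addresses instead of IDs.
roster_emails = {"alice@example.com", "carol@example.com"}
inactive_by_email = {
    user_id for user_id, email in id_to_email.items()
    if email not in roster_emails
}
```

Either route ends with a set of user IDs, plus the matching email addresses when those are needed.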
Before deleting a user, one must:
- Remove them from all on-call schedules and escalation policies
- Ensure that there are no open incidents on any service whose escalation policy included the user at the time the incident was triggered
Fortunately, we have a solution for this, which automates these tasks:
To use it, one will just need the list of users’ login email addresses.