
I am trying to extract data from the PagerDuty API using a Python script. The script reads team data stored in JSON, gets the list of all incidents for a specified date range, and uses that data to display incident results from the API https://api.pagerduty.com/analytics/raw/incidents/{incident_id}. I am able to retrieve data for one or two teams, but when I add 500+ teams the result is around 10,000 rows and I am not sure how to retrieve it faster. The script ran for 2 hours and kept running. I am not sure how to get around this PagerDuty API limit; I have added pagination and used offset and limit, but there is no straightforward guide on this. Please help.

Hello @aatiya!


Depending on what you are trying to achieve, I would recommend looking into the Analytics endpoints instead. They provide aggregated incident information per team, per service, and more.
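As a rough illustration, here is a minimal sketch of a call to the per-team aggregated metrics endpoint. The payload fields and response keys shown are assumptions based on the Analytics API reference, so adjust them to your scenario:

import requests

# Sketch only: per-team aggregated incident metrics instead of raw incident rows.
# Payload filter fields and response field names are assumptions; check the
# Analytics API reference for the exact schema for your account.
url = "https://api.pagerduty.com/analytics/metrics/incidents/teams"

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token token=api_key",  # replace with your REST API key
}

payload = {
    "filters": {
        "created_at_start": "2023-11-30T00:00:00Z",
        "created_at_end": "2023-12-03T00:00:00Z",
        "team_ids": ["PXXXXXX"],  # hypothetical team ID
    }
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()

# One aggregated row per team instead of one row per incident
for row in response.json().get("data", []):
    print(row.get("team_name"), row.get("total_incident_count"))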


PagerDuty API endpoints have a limit of returning at most 10,000 results through pagination (offset + limit < 10,000), as mentioned in our documentation. The recommendation is to apply filters (per service, per team, etc., depending on your scenario and the endpoint you're querying) and then aggregate the data yourself.
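For example, a minimal sketch of the "filter, then aggregate yourself" approach against the raw incidents endpoint: query one small batch of teams at a time so each query stays well under the 10,000-result cap, then merge the batches locally. The batch size and placeholder team IDs below are illustrative assumptions:

import requests

url = "https://api.pagerduty.com/analytics/raw/incidents"
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token token=api_key",  # replace with your REST API key
}

def fetch_batch(team_ids):
    # One request covering only a small subset of teams
    payload = {
        "filters": {
            "created_at_start": "2023-11-30T00:00:00Z",
            "created_at_end": "2023-12-03T00:00:00Z",
            "team_ids": team_ids,
        },
        "limit": 1000,
    }
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    return response.json().get("data", [])

all_team_ids = ["PXXXXXX", "PYYYYYY"]  # hypothetical; in practice, your full 500+ list
all_incidents = []
for i in range(0, len(all_team_ids), 25):  # 25 teams per request (arbitrary batch size)
    all_incidents.extend(fetch_batch(all_team_ids[i:i + 25]))

Each batch may itself still need pagination if it returns more rows than the per-request limit; the cursor loop further down this thread shows one way to handle that.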


Still, this can be challenging to achieve depending on what you are trying to do. Let me know if this helps!


If it doesn’t and you want to give me more details on your specific scenario I can try to help.


Hi Tiago,

Thanks for your response. I changed my code to fetch data from the endpoint https://api.pagerduty.com/analytics/raw/incidents. I was able to fetch the desired data, but the limit is set to 1000 whereas the actual output is more than 4,000 records, and I want to extend my code to display results with almost 10,000 records. I don't think the analytics endpoint supports classic pagination, and the documentation you shared mentions another option, cursor-based pagination, but it seems to be supported only for the audit records endpoint. I am struggling to get this sorted, and there is no complete documentation on how to use the starting_after and ending_before cursors. I have been struggling for months; any quick help would be appreciated.

Code I tried:


import requests
import csv
import json

url = "https://api.pagerduty.com/analytics/raw/incidents"

with open("team_data.json", 'r') as file:
    team_data = json.load(file)

team_ids = team_data['pagerduty_id']

payload = {
    "filters": {
        "created_at_start": "2023-11-30T00:00:00Z",
        "created_at_end": "2023-12-03T00:00:00Z",
        "team_ids": team_ids,
    },
    "limit": 1000,
    "order": "desc",
    "order_by": "created_at",
    "time_zone": "Etc/UTC"
}

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token token=api_key"
}

response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    # The raw analytics endpoint returns its rows under the "data" key
    raw_data = response.json().get('data', [])

    for incident in raw_data:
        print(f"Incident Number: {incident.get('incident_number')}")

    # Specify the file path for the CSV file
    csv_file_path = "incident_data.csv"

    # Writing incident data to CSV file
    with open(csv_file_path, mode='w', newline='') as csv_file:
        fieldnames = ["incident_number", "urgency", "priority_name", "service_name",
                      "team_name", "created_at", "resolved_at", "auto_resolved",
                      "assignment_count", "escalation_policy_name",
                      "escalation_policy_id", "resolved_by_user_name",
                      "business_hour_interruptions"]

        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()

        for d in raw_data:
            writer.writerow({
                "incident_number": d.get("incident_number"),
                "urgency": d.get("urgency"),
                "priority_name": d.get("priority_name"),
                "service_name": d.get("service_name"),
                "team_name": d.get("team_name"),
                "created_at": d.get("created_at"),
                "resolved_at": d.get("resolved_at"),
                "auto_resolved": d.get("auto_resolved"),
                "assignment_count": d.get("assignment_count"),
                "escalation_policy_name": d.get("escalation_policy_name"),
                "escalation_policy_id": d.get("escalation_policy_id"),
                "resolved_by_user_name": d.get("resolved_by_user_name"),
                "business_hour_interruptions": d.get("business_hour_interruptions"),
            })

    print(f"Data written to {csv_file_path}")
else:
    print(f"Error: {response.status_code}")


For anyone looking for a solution: it was a nightmare to figure out how to retrieve this.

I used the "more" field and ran a loop until "more" is false to read all the pages, setting the "starting_after" parameter in the payload to the value of the "last" field on every iteration. This sets the starting cursor for each next page.
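Roughly, the loop looks like this (a sketch reusing the payload and headers from my earlier code; adjust to your setup):

import requests
import json

url = "https://api.pagerduty.com/analytics/raw/incidents"

with open("team_data.json", 'r') as file:
    team_data = json.load(file)
team_ids = team_data['pagerduty_id']

headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Token token=api_key"
}

payload = {
    "filters": {
        "created_at_start": "2023-11-30T00:00:00Z",
        "created_at_end": "2023-12-03T00:00:00Z",
        "team_ids": team_ids,
    },
    "limit": 1000,
    "order": "desc",
    "order_by": "created_at",
    "time_zone": "Etc/UTC"
}

all_rows = []
while True:
    response = requests.post(url, json=payload, headers=headers)
    response.raise_for_status()
    body = response.json()
    all_rows.extend(body.get("data", []))
    if not body.get("more"):                      # no further pages left
        break
    payload["starting_after"] = body.get("last")  # cursor for the next page

print(f"Fetched {len(all_rows)} rows across all pages")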

