August 28 Kafka Outages – What Happened and How We’re Improving
Hi everyone, We sincerely apologize for this morning's technical issues that prevented registrants from joining the live webinar with Arize. The full recording is now available. Since you couldn't ask questions live, we've created this thread for Q&A. Our speakers are ready to answer your questions about AI reliability, monitoring, observability, and implementation strategies. How to participate: watch the recording, post your questions in the comments, and get answers directly from our speakers. Whether you want to dive deeper into specific topics or discuss how to apply these concepts to your organization, ask away!
I am configuring a new PD service where “Outside support hours” is set to “Dynamic notifications based on alert severity”. If the severity is “high”, the on-call engineer gets paged; otherwise the incident is treated as low urgency. Up to this point, all good. However, I want the incident to be raised to high urgency once we enter support hours. This is possible when “Outside of support hours” is set to “Low-urgency notifications, do not escalate”, as there’s a check-box, “Raise urgency of unacknowledged incidents to high”, that I can tick. Why is this check-box not available when I select “Dynamic notifications based on alert severity”? Please let me know of an alternative approach. I want to avoid having to create 2 services for this.
Exciting news for RBA users! PDU has revamped our RBA learning path! 🚀 Whether a user is new to automation or looking to deepen their expertise, this learning path guides users from foundational concepts to advanced operational controls, empowering them to design, deploy, and manage resilient automation at scale. Even more exciting news: We have launched our PagerDuty Runbook Automation Certification! All users that have access to certifications will see this certification in their PagerDuty University account. Check out the one-pager about this new certification here, and the study guide for the certification exam here.
Will you be at KubeCon + CloudNativeCon North America in Atlanta this November 10-13? PagerDuty will be there, and we’d love to connect - whether it’s at our booth or during an exclusive happy-hour event! Drinks and eats on us! Register to attend our Happy Hour on Wednesday, November 12. Visit the PagerDuty Booth #455 for interactive demos, participate in the booth passport program, and enter our raffle to win a Pagey Lego set!
Hello, when attempting to run an Ansible play via a bash script on a remote RHEL 9 node, I am getting "[ERROR]: Task failed: Finalization of task args for 'ansible.builtin.set_fact' failed: Error while resolving value for 'infoblox_next_ip': The lookup plugin 'nios_next_ip' failed: a bytes-like object is required, not 'str'". This play uses the Infoblox plugin to look up IP addresses. It works great if I run it from the local CLI on the remote node; however, when it is triggered from Rundeck against the RHEL 9 node, I get this error. From my testing it appears to be related to the password and special characters, but I'm not 100% sure. Any ideas on how to troubleshoot?
Not working (triggered from Rundeck): rundeck -> shell script -> ansible-playbook -> infoblox module
Working (local CLI on RHEL node): ansible-playbook -> infoblox module
250 - name: Get next available IP address in given subnet with exclusions
251 ansible.builtin.set_fact:
252 infoblox_next_ip: >-
^ column 23
<&
I am new to PagerDuty and trying to set up alerts and an escalation policy, but nothing seems to work at all. I set the escalation policy to repeat 9 times if no one acknowledges. And guess what: after that change I stopped getting any kind of alerts, even if I acknowledged the incident while the alert was still firing on the Grafana side. It doesn't repeat any alert at all for some reason, and it also creates incidents in a single payload. Why would I want all incidents in one payload? Also, do you have a feature where the alert closes itself once it stops firing on the Grafana side, as happens in Opsgenie?
Are you passionate about building resilient systems or have a story to tell about managing incidents in creative ways? We want to hear from you! We’re excited to announce the Call for Papers is officially open for PagerDuty on Tour San Francisco 2026, taking place on March 3rd, 2026. This is your chance to showcase your innovative ideas, share real-world lessons, and connect with a vibrant community of peers and industry experts. Session Guidelines: Session format: Talks (20-30 minutes including Q&A) or Panels (40 minutes moderated discussion). Session topics: Back to Basics, Building resilient systems and teams, Automation and AI in operations, and more - see full list of topics in the submission form. Audience: developers/SREs - we recommend providing at least three actionable takeaways fit for a technical audience. Ready to submit your idea? Submit your proposal here by October 31st and help shape the future of incident response, automation, and reliability. We can’t wait to see what
Hey all, I’m utilizing the /incidents API to fetch all incidents for an internal report. From what I can tell the since/until date parameter is filtering based on the created_at date. Does anyone know if there is a way to filter based on the updated_at field? I’d like to incrementally load a warehouse with the incidents from this API, but you can’t really do that if you can only filter by the created_at date.
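For readers following along, the windowed fetch described above might look roughly like the sketch below. It only reproduces the since/until behaviour the post describes and does not filter by updated_at. It assumes the REST API's classic offset pagination (an `incidents` list plus a `more` flag in each response); `incidents_url`, `fetch_window`, and the injected `get_json` callable are hypothetical names for illustration, not part of any PagerDuty client.

```python
from urllib.parse import urlencode

API_BASE = "https://api.pagerduty.com"


def incidents_url(since, until, offset=0, limit=100):
    """Build a GET /incidents URL for one page of a since/until window."""
    params = {"since": since, "until": until, "offset": offset, "limit": limit}
    return f"{API_BASE}/incidents?{urlencode(params)}"


def fetch_window(get_json, since, until, limit=100):
    """Collect all incidents in the window, one page at a time.

    get_json is a hypothetical callable supplied by the caller: it takes a
    URL, performs the authenticated request, and returns the decoded JSON.
    Assumes each page carries an "incidents" list and a "more" flag.
    """
    incidents = []
    offset = 0
    while True:
        page = get_json(incidents_url(since, until, offset, limit))
        incidents.extend(page["incidents"])
        if not page.get("more"):
            return incidents
        offset += limit
```

Passing the HTTP call in as `get_json` keeps authentication and retries outside the paging loop, so the loop can be exercised with a stub before it is pointed at a real account.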
For over 16 years, PagerDuty has been trusted by 32,000+ organizations as the incident management leader, helping teams forge a path forward with modern operations. This launch is built around one principle: champion the customer. Here's what we built for you: 🌟 The End-to-End Incident Management Features You've Been Waiting For. Flexible scheduling that fits how you run on-call: Modern Flexible Shifts (forthcoming EA for IM and CSOps plans) - Create intuitive iCal-based recurring shifts that eliminate the complexity of layers entirely, with support for multiple responders per shift, shadow schedules for backup coverage, and automatic conflict detection. Shift Agent (GA, PagerDuty Advance customers) - Handle on-call conflicts and overrides directly in chat with AI-powered recommendations. Core incident management enhancements: Required Fields on Resolve (EA for IM and CSOps Enterprise plans) - Ensure critical data is captured before incidents can be closed. Reopen Incidents (GA, IM and CSOps
Hi, in the Get Incidents API, the list of acknowledgements gets cleared after the incident has been resolved/triggered. Why is that? Can we please preserve the list of acknowledgements? We need this data for our reporting.
Our open-source API client for Python has now been refactored from a monolithic one-file module into a multi-file module. This change is being made to fulfill a need for long-term maintainability and improved readability. This new release does not add new features, but aims to make contribution of new features far easier going forward, most notably enabling us to add a py.typed file to enable using typehints in projects (issue #26). More about this release:
https://github.com/PagerDuty/python-pagerduty/releases/tag/v2.0.0
https://pagerduty.github.io/python-pagerduty/changelog.html