Meet Wataru Manji, a Site Reliability Engineer (SRE) based in Japan. With PagerDuty in his toolkit, he effortlessly identifies automation opportunities and streamlines processes to eliminate unnecessary alerts. The result? Enhanced report value and operational excellence, all powered by PagerDuty.
What is your current job title? How many years of experience in the industry do you have?
SRE. 5 years.
How many years have you worked with PagerDuty products (please mention which ones)?
4 years
Tell us about your experience using a PagerDuty product, native integration, custom API, extension, or add-on: the service or system challenges you were trying to fix, solve, or improve, and what you achieved after the implementation (feel free to include performance metrics, if available, or any fun, odd, or remarkable stories).
I have implemented a detailed incident and alert reporting feature using the PagerDuty API, and Python's pandas & numpy libraries.
Retrieving data from the API, such as incident resolution time, the scope of the outage, the number of concurrent occurrences, and the frequency of occurrence, significantly enhanced the value of the reports. This streamlined the process of eliminating unnecessary alerts and identifying opportunities for automation. As a result, in one service, the number of alerts fell by more than 90% within six months of introducing PagerDuty, while the service level was maintained.
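To make the kind of report described above more concrete, here is a minimal sketch of how such metrics might be computed with pandas and numpy. It is not Wataru's actual implementation. It makes no API calls and runs on a small, made-up table of incidents; the column names (service, created_at, resolved_at) and the helper names peak_concurrency and build_report are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for incidents already fetched from the API.
# The column names are assumptions for this sketch, not a confirmed API schema.
incidents = pd.DataFrame(
    {
        "service": ["api", "api", "db", "api"],
        "created_at": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 00:10", "2024-01-01 00:20", "2024-01-02 09:00"]
        ),
        "resolved_at": pd.to_datetime(
            ["2024-01-01 00:30", "2024-01-01 00:25", "2024-01-01 01:20", "2024-01-02 09:15"]
        ),
    }
)


def peak_concurrency(starts: pd.Series, ends: pd.Series) -> int:
    """Return the largest number of incidents open at the same moment."""
    # Turn timestamps into integers so they can be sorted alongside the deltas.
    times = np.concatenate(
        [starts.to_numpy().astype("int64"), ends.to_numpy().astype("int64")]
    )
    # +1 when an incident opens, -1 when it is resolved.
    deltas = np.concatenate(
        [np.ones(len(starts), dtype=int), -np.ones(len(ends), dtype=int)]
    )
    # Sort by time; at equal times, resolutions (-1) come before openings (+1).
    order = np.lexsort((deltas, times))
    return int(np.cumsum(deltas[order]).max())


def build_report(df: pd.DataFrame) -> dict:
    """Compute resolution time, per-service summary, daily frequency, and peak concurrency."""
    df = df.copy()  # leave the caller's DataFrame untouched
    df["resolution_min"] = (
        df["resolved_at"] - df["created_at"]
    ).dt.total_seconds() / 60

    by_service = df.groupby("service").agg(
        incident_count=("resolution_min", "count"),
        median_resolution_min=("resolution_min", "median"),
    )
    per_day = df.groupby(df["created_at"].dt.date).size()

    return {
        "by_service": by_service,
        "per_day": per_day,
        "peak_concurrent": peak_concurrency(df["created_at"], df["resolved_at"]),
    }


report = build_report(incidents)
print(report["by_service"])
print(report["per_day"])
print("Peak concurrent incidents:", report["peak_concurrent"])
```

In a real setup, the sample table would be replaced by records retrieved from the PagerDuty API, and the per-service summary is where noisy, quickly resolved alerts would likely stand out as candidates for removal or automation.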
What type of resources do you like or use the most when you need support? Knowledge Base, Ops Guides, LinkedIn/Twitch streams, 1:1 with a DevOps Advocate/Account Manager/Solutions Consultant, Community Forums, PagerDuty University training/certifications, or other. Tell us why!
The first resource you should refer to is the official documentation. In particular, the API documentation is user-friendly as it provides specific examples of request/response scenarios.
Do you have a favorite PagerDuty product, API, or integration? Tell us why!
I believe Slack Integration is quite beneficial for software engineers. While the standard PagerDuty User Interface displays incidents as separate entries, Slack Integration provides an overview timeline of incidents. This makes it convenient for retrospection.
What feature would you like to see implemented in our Operations Cloud ecosystem in the future?
I believe it would be beneficial to have a feature that allows for the construction of a publicly accessible Status Page. This feature should not only provide incident status updates, but also include capabilities for managing service levels.
Be the next PagerDuty Community member featured
Worked on a custom integration, extension, or add-on using PagerDuty? Have a successful use case story? Wrote an article that can help the dev community?
Join the Community and share your project with us to get the chance to be featured in PagerDuty Community content across social channels!