Lots of people talk about “operational maturity”: becoming more “mature” in your operations by proactively managing your teams, processes, applications, systems, and infrastructure. Back in 2018, we mapped out what this maturity looks like as a progression of levels:
The customers I’ve shown this graph to usually place themselves in the “Responsive” category, which leads to the question: “How do we get to the next level? And then to the one after that?”
One of my customers was eager to answer exactly that question, specifically “What does each of these maturity levels look like in PagerDuty?” and “What do our teams need to be doing in PagerDuty to ‘get better’?”
So I created a checklist:
The checklist is structured to align not only with the Operations Maturity Levels but also with the way I see many teams mature in their PagerDuty adoption. It has five levels:
- Level 0: Foundations. These are the building blocks of PagerDuty (PD): what every team that will respond to incidents and alerts needs to have set up before it can do anything in PD.
- Level 1: Basics. This is what many teams start doing in their early PD days: integrating their monitoring tools, making sure to funnel in only critical incidents, and getting into the habit of receiving and responding to PD notifications on their phones. (Reactive/Responsive)
- Level 2: Intermediate. As more monitoring event data flows into PD, teams find they need to manage their alerts better, so they update their PD configuration to categorize, suppress, or group alerts and focus on the actionable events that matter most. (Responsive)
- Level 3: Advanced. Once teams have a strong handle on their monitoring event data, they shift their focus to managing major incidents more effectively, which means automating manual incident response processes in PD (such as notifying stakeholders and additional incident response teams). (Proactive)
- Level 4: Operationally Mature. For the rare teams that have a strong handle on their operations, the focus turns to becoming complete data nerds. This means analyzing on-call, service, and infrastructure health data, not only to understand their operations proactively but also to put measures in place that prevent on-call burnout or another major incident. (Preventative)
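To make Levels 1 and 2 a little more concrete, here is a minimal Python sketch of forwarding monitoring alerts into PagerDuty through the Events API v2. It assumes a hypothetical alert format from your monitoring tool (dicts with `check`, `host`, `severity`, and `message` keys), and the helper names and routing key are placeholders, not part of any PagerDuty SDK. It forwards only alerts at or above a chosen severity (Level 1) and gives repeat alerts for the same check and host a shared `dedup_key`, so PagerDuty deduplicates them into one open alert instead of paging again (a simple form of the noise reduction in Level 2). In practice, PagerDuty's built-in integrations and event-management features usually handle this without custom code.

```python
import json
import urllib.request

# PagerDuty Events API v2 endpoint for sending events.
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

# Payload severities accepted by Events API v2, ranked lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2, "critical": 3}


def build_trigger_event(routing_key, alert):
    """Turn one alert (hypothetical dict format) into an Events API v2 trigger body."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Repeat triggers sharing a dedup_key are deduplicated into the same
        # open alert instead of creating a new one (Level 2: less noise).
        "dedup_key": f"{alert['check']}:{alert['host']}",
        "payload": {
            "summary": alert["message"],
            "source": alert["host"],
            "severity": alert["severity"],
        },
    }


def events_to_forward(alerts, routing_key, min_severity="critical"):
    """Keep only alerts at or above min_severity (Level 1: funnel in what matters)."""
    threshold = SEVERITY_RANK[min_severity]
    return [
        build_trigger_event(routing_key, alert)
        for alert in alerts
        # Unknown severities rank below everything and are dropped.
        if SEVERITY_RANK.get(alert["severity"], -1) >= threshold
    ]


def send_event(event):
    """POST one event to PagerDuty and return the parsed JSON response (not called below)."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    sample_alerts = [
        {"check": "disk_full", "host": "db-1", "severity": "critical", "message": "Disk 98% full on db-1"},
        {"check": "cpu_high", "host": "web-2", "severity": "warning", "message": "CPU at 85% on web-2"},
        {"check": "disk_full", "host": "db-1", "severity": "critical", "message": "Disk 99% full on db-1"},
    ]
    # Print what would be sent; swap print for send_event(...) with a real routing key.
    for event in events_to_forward(sample_alerts, "YOUR_ROUTING_KEY"):
        print(event["dedup_key"], event["payload"]["summary"])
```

With the sample data, the warning-level CPU alert is filtered out, and both disk alerts share the `dedup_key` `disk_full:db-1`, so PagerDuty would treat the second as a repeat of the first. Lowering `min_severity` to `"warning"` would forward all three.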
Ultimately, this checklist is intended to be used as a guideline to make sure that as you mature in your operations, you’re also adopting the right PagerDuty features and products to support you in your maturity.
Download the PDF of the checklist here, PLUS get access to links to best-practice and how-to videos that will help you complete each checklist item.