One way to break the cycle of alert fatigue is to improve the quality of the signals you monitor. That can mean ingesting and processing monitoring data at a higher resolution, applying smarter statistical methods to aggregate and correlate data across multiple services, or routing alerts through an escalation and incident management system.
But what if the right signals aren’t emitted by your systems, but by the actions of your operations team?
In this talk, I’ll share how we combined application metrics with metadata from our chatops tools to unlock insights about the behavior of our operations team, and how monitoring that behavior in real life gave our leaders earlier and more accurate warnings about incidents as they happened.
Arijit Mukherji, CTO, SignalFX