Our next guest for the PagerDuty Community AMA is Jennifer Petoff!

Jennifer Petoff is a Senior Program Manager for Google’s Site Reliability Engineering team based in Dublin, Ireland. She is the global lead for Google’s SRE EDU program and is one of the co-editors of the best-selling book, Site Reliability Engineering: How Google Runs Production Systems.

You can find her on Twitter at @jennski and can read her book online here.

How This Works

Post your questions to Jennifer in this thread - we’ll collect them and she will answer them in a live-stream video on May 28. Questions should be posted no later than Friday, May 24. You can also tweet your questions for her to us at our Twitter handle, @pagerduty. Please use the hashtag #pagerdutyama.

In addition to your questions about Jennifer’s experiences, we encourage you to interpret “AMA” as “Ask My Advice”!

Hi Jennifer!

Chapter 14 of the SRE book talks about managing incidents as they grow in scale, and about involving an Incident Commander.

Can you talk about how you train someone to be an effective Incident Commander? How do you build up the confidence that they will be able to handle highly visible, stressful situations successfully?
In your organization, is there a formal list of Incident Commanders, or is it a role that any teammate is expected to be able to take on?

thanks for sharing!

Jennifer, thanks for doing this AMA with us.

I was recently having a conversation about SLOs and error budgets with Google Cloud Developer Advocate Nathen Harvey, and he told me that attempts to set SLOs often fail because business leaders either don’t understand or can’t define the consequences of violating an SLO. He recommends that, to get buy-in for SLOs, folks start by defining these consequences. Getting non-technical business leaders to think in this way strikes me as a bit of a tall order, particularly in firms that are not as engineering-centric as Google.

Do you have any thoughts on how technical folks can help their management define and quantify these consequences and tie them to factors that business leaders do care about?

The video for the AMA is online!