AMA with Charity Majors - June 15, 2017


(Charity Majors) #25

And engineering for the sake of engineering is a waste of time. Delivering on company goals is what makes you a fucking badass. :slight_smile:

(Charity Majors) #26

What kind of auto-remediation do you feel is not a prerequisite feature of production systems already? I’m confused. I would definitely say it already is.

(Charity Majors) #27

The best code is no code at all.

The second best code is code that fits your use case, that someone else maintains, and optimally that you get to read the source of.

The third best code is any other code.

(h/t @pvh)

(Charity Majors) #28

Seriously though … :slight_smile:

I think a super useful framework here is @jessitron’s iconic blog post on where to spend your technical risk. She has a continuum from dev tools (risk away, wheee!) to storage engines and databases (hold your horses pardner). This is because of the time it takes to harden them respectively and the possible blast radius when they fail. That’s the best way to think of buy vs build imho – in terms of maturity and time to mature.

(Charity Majors) #29

Oh god. Almost every decision I’ve ever made has been obsoleted or could be improved upon by now.

For example, I remember when building and owning your own metrics in house was the sign of a Serious eng team, because we thought it was our core competency. Shit, I remember when RUNNING OUR OWN POSTFIX AND CYRUS IMAP and training our own spamassassin filters and clamav antivirus filters was a sign of Serious Eng Teams.

Fuck those days so hard…

(Eric Sigler) #30


(Charity Majors) #31

Oh man. What an awesome question. I think there’s no question but that we’re seeing a revolution in quality for tools in the infrastructure space. We used to hack the most awful shit together and take PRIDE in our ability to navigate and gain signal out of horrendous interfaces.

Turns out engineers are people too. We believe in consumer quality developer tools at Honeycomb, in tools that are elegant and intuitive and opinionated and designed. An opinionated tool is great because it lets you use your intuition when dealing with it, instead of making you memorize arcane rules. A good tool gets out of your way and is basically invisible.

Our process … well my cofounder Christine is the god of all things UX. But for the first 6 months of the company we intensely studied Kathy Sierra’s work, esp her “Badass” book. I highly recommend it for every engineer. To paraphrase … your job is not to build a better camera, it’s to build a better photographer. People will fall in love with you if you make them radically better at their work!

So that’s really one way of putting our base goal: we want to build a better engineer. :slight_smile:

(Charity Majors) #32

Hi greeno!! <3

God no. I grew up on a farm in rural Idaho, and I went to college when I was super young for classical piano performance. I fell in love with computers but let’s be honest, I also just didn’t want to be poor all my life. (And I didn’t want to be around women because I thought being a woman meant babies and dishes, but that’s another story. It’s unfortunate that I got into computers due to misogyny but I will be doing penance for a long long time. :slight_smile: )

Yes … I think we need to demystify the priesthood. Computers are just really fucking fun and powerful for people who like solving problems. They really aren’t that hard. We need to put the ridiculousness and fun back into what we do.

(Charity Majors) #33

Hm. Honestly? Dashboards. I fucking hate dashboards. A dashboard is a place you go to stop and consume data passively, scanning with your eyeballs and pattern matching. You need to be actively modeling your systems and testing hypotheses and falsifying them all the time. Dashboards are the antithesis of this, and they can go die in a fire.

(Charity Majors) #34

However, that’s a very OLD problem. A newer one… let me think.

(Eric Sigler) #35


(Charity Majors) #36

A trend in the world of reliability, resiliency, distributed systems that I just don’t like is … bringing more and more people into the on call fold (good!) without commensurate attention toward making oncall NOT MISERABLE (bad!)

On call has a terrible reputation because we’ve been flagellating ourselves for … as long as I’ve been in computers. This is terrible. It’s totally reasonable not to want to be on call in a miserable way that kills your life and fucks up your sleep systems. I’m ashamed of us, and I’m sorry.

We have to fix this. It’s unfortunate and inhumane that management and execs don’t consider quality of life as important as quality of systems. What if, next to the uptime %, we were required to report the % of times a human was woken up … every. fucking. time?

(Charity Majors) #37

Another thing I don’t really like is the amount of duct tape and baling wire and homegrown one-offs we are using to make dashboards, metrics, and logs kinda sorta work. What people really need is a high-cardinality, high-dimensionality event-driven solution. But they are so locked into the mental limitations of the metrics model that they don’t even accept the possibilities that are unlocked by events. We have a lot of things to build and education to do, just to rebuild these muscles as a community.

(Alex Maier) #38

So what is the difference between metrics and events?

(Charity Majors) #39

Oh, I’m so glad you asked!! A metric is a single ‘dot’ of information (usually an int or a float), sometimes with “tags” appended (a set of metrics may be tagged with the build_id, host_name, etc). An event is a collection of bits of information that are all submitted at the same time.

A metric: “system_cpu_15m_avg” = “2.0”

An event: { “first_name”: “charity”, “last_name” : “majors”, “visiting_company”: “pagerduty”, “temperature_outdoors” : 80, “distance_walked”: 1.5, “distance_units”: “miles”, “duration”: 120 }

An event is a thing that changed a system. People are used to thinking of log lines in ways that closely resemble the way you should think about an event. Something happened, so we log it.

We typically use metrics to describe the state of the system when an event takes place.

In Honeycomb, we handle arbitrarily wide events, so you can simply append a bunch of values about the state of the system to the event itself. Alternately, you can periodically poll the state of the system (every second, 30 sec, etc) and submit it to the same dataset as you are submitting the events, and then you have the best of both worlds.
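
To make the distinction concrete, here is a rough Python sketch of a metric next to a wide event that has system state appended to it. This is only an illustration and not Honeycomb's API: the `build_event` helper, the `host_name` value `web-1`, and the `timestamp` field are assumptions made up for the example.

```python
import json
import time

# A metric: one named number, optionally with a few tags attached.
# The tag value "web-1" is a made-up placeholder.
metric = {
    "name": "system_cpu_15m_avg",
    "value": 2.0,
    "tags": {"host_name": "web-1"},
}


def build_event(fields, system_state=None):
    """Return one wide event: the fields describing what happened,
    plus any system state captured at the same moment.

    Hypothetical helper for illustration only. If a system_state key
    collides with a field name, the system_state value wins.
    """
    event = dict(fields)
    event["timestamp"] = time.time()
    if system_state:
        event.update(system_state)
    return event


# An event: many bits of information submitted together, with the
# metric's value simply appended as one more column.
event = build_event(
    {
        "first_name": "charity",
        "last_name": "majors",
        "visiting_company": "pagerduty",
        "temperature_outdoors": 80,
        "distance_walked": 1.5,
        "distance_units": "miles",
        "duration": 120,
    },
    system_state={metric["name"]: metric["value"]},
)

print(json.dumps(event, sort_keys=True))
```

The point of the sketch is that the event stays a single flat record, so the CPU reading can be queried alongside `visiting_company` or `duration` rather than living in a separate metrics store.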

No more hopping around between your log aggregator and event store and your metrics. Which is just a terrible way to live.

(David Shackelford) #40

Thanks @charity! On the PD product team we love Kathy’s book.

(Dom) #41

Hi Charity - here’s a shout out from Seattle - PagerDuty Summit!

(Charity Majors) #42

YESSS. Every every every engineer ever should read kathy’s book, it’s like my bible. :slight_smile:

(Charity Majors) #43

HI folks!! Wish you could be here, but i’ll have a shot for each of you. I suffer for you all. :heart_eyes:

(Charity Majors) #44

Signing off for now. Thank you to the awesome pagerduty folks, and thanks everyone for your terrific questions. And thanks most of all for the swag, the real reason any of us do any of these things. :slight_smile: