I think there's something subtly powerful about being able to take data / logic and surround it with rhetoric and prose. Most tools separate these out across Word / Excel / Powerpoint, but the success of dashboarding tools and other data-driven visualization tools makes me really excited about the kinds of tools we should be focused on building as technologists.
Sunday, February 23, 2020
Saturday, February 15, 2020
Incident Response
Part 1 and Part 2 are here – main takeaways below, but I recommend reading the raw papers for more detail. Overall, I think we do a pretty good job of this, but especially as we move towards Operational Responsibility, Baseline, and PRX working closely together, it's worth reviewing for ideas about where things fail in a distributed incident response environment. Some of my notes below:
- It was really cool to see the rigor with which they analyzed the response rather than the incident itself – there are a lot of papers about “bad bugs” but less about “collective multi-engineer debugging.”
- Organizational complexity often mirrors system complexity (Conway’s Law), complicating incident response.
- Decentralized agency / autonomy is critical in incident response – “plans / actions” are often poorly decomposed and require high-trust operators to go off script.
- Reminds me a lot of patterns in immunology with antibodies / T-cells / the general sympathetic nervous response, as well as trauma triage principles in emergency medicine.
- Willingness to mount an “immune response” and bring in expertise rather than fearing the repercussions of over-escalating.
Thematically, the most important takeaway for me is that preventative design is not enough: while it’s really important to design our software to prevent catastrophic failure, it’s probably equally valuable to design our human / organizational response systems in a way that quickly resolves these catastrophes.