

Rethinking On-Call: Compensation, Runbooks, and Sustainable Practices

Old School Burke
1 min read

Let us talk about one of the most-hated aspects of engineering work: being on call.


  1. How to Write Good Runbooks: This article emphasizes the importance of well-crafted runbooks in incident management. It offers practical advice on making runbooks actionable, reducing the stress and uncertainty that often accompany on-call situations.
  2. Navigating On-Call Compensation in the Tech Industry in 2023: This article discusses the evolving landscape of on-call compensation and explains why fair, motivating compensation practices are essential for maintaining team morale and performance.
  3. Incident Metrics Tell You Nothing About Reliability by Dan Slimmon: Dan critically assesses how well incident metrics gauge system reliability and proposes a more nuanced approach to understanding what these metrics truly indicate about our systems.
  4. Oncall and Sustainable Software Development: This piece links effective on-call practices with sustainable software development, suggesting ways to align on-call duties with a broader commitment to developer health and software quality.
  5. Project Star: Streamlining Our On-Call Process: A case study from LinkedIn detailing how they refined their on-call process to boost developer satisfaction and productivity, providing a practical example of successful on-call management.