Tag: Operational excellence

  • The Temple and the Error Budget

    What happens when you put Solomon’s Temple next to a modern error budget and ask them both what “perfection” really means? In this episode, we explore the idea that reliable service is not just a technical outcome but a moral consequence: the visible result of character, duty, and brotherly love expressed through IT work.

    Drawing on Freemasonry, Stoic philosophy, and the writings of Marcus Aurelius, we unpack what it means to work asymptotically toward an ideal you will never fully reach. We contrast the Masonic Temple and its working tools with SRE and ITIL principles: why 100% uptime is the wrong target, how continual improvement mirrors lifelong moral refinement, and how duty becomes the backbone of both spiritual life and professional reliability.

    Then we zoom in on the real builders of today’s “Temple”: the backup and recovery specialist guarding the sacred data; the infrastructure engineer hewing and setting the foundation; the Citrix/WebSphere/DB2 specialist adorning the inward workings; the mainframe programmer quietly automating away chaos; and the mainframe operator keeping vigil in the sanctum of production. By the end, your ticket queue, your runbooks, and your change windows look less like random toil and more like stonework on a shared, enduring structure.
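
The point about 100% uptime being the wrong target can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the episode: the function name `allowed_downtime` and the 30-day window are assumptions chosen for the example. It shows how an availability target translates into a downtime allowance (the error budget), and how that allowance shrinks to nothing at 100%.

```python
from datetime import timedelta


def allowed_downtime(slo_target: float, window: timedelta) -> timedelta:
    """Return the downtime implied by an availability target over a window.

    `slo_target` is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    # The error budget is whatever fraction of the window the target leaves over.
    return window * (1.0 - slo_target)


if __name__ == "__main__":
    window = timedelta(days=30)  # assumed measurement window
    for target in (0.99, 0.999, 0.9999, 1.0):
        print(f"{target:.2%} target -> {allowed_downtime(target, window)} of downtime allowed")
```

A target of 100% leaves a budget of zero, so any change, maintenance, or failure at all counts as a breach. That is the sense in which the ideal can be approached but never held.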

    Source #1: ITIL 4 Foundation

    Source #2: The Meditations by Marcus Aurelius

  • The Trestle-board and the SLO

    Join us as we uncover how the timeless lessons of structure, planning, and meticulous refinement, taught within the degrees of the Entered Apprentice, Fellow Craft, and Master Mason, are utilized by modern Site Reliability Engineers (SREs). These lessons are crucial for designing, deploying, and maintaining reliable computing systems.

    What You Will Learn:
     – The Blueprint for Reliability: Adherence to Design. Discover how SREs apply the principles of the Trestle-board (used by the Master-workman to draw his designs) to their infrastructure. We discuss the foundational importance of explicit planning, focusing on translating business goals into measurable Service Level Objectives (SLOs). The goal is to build a “spiritual building” (the reliable service) that achieves figure, strength, and beauty.
     – Refining the Rough Ashlar: Eliminating Toil. Learn how the SRE mandate to eliminate toil directly mirrors the builders’ transition from the Rough Ashlar (representing a crude, imperfect state) to the Perfect Ashlar (a stone made ready by the hands of the workmen). Toil is the manual, repetitive, automatable work that lacks enduring value and scales linearly with service growth. SREs dedicate at least 50% of their time to engineering work, writing software that replaces this manual labor so that staffing scales sublinearly with system size.
     – Searching for Truth: Mastery Through Failure. The diligent worker must search to the foundations of knowledge to find the Truth buried under error. We explore SRE’s commitment to rigorous self-assessment, particularly through blameless postmortems following significant incidents. This practice is essential for finding the root causes of failures, improving systems, and making the organization more resilient as a whole.
     – The Discipline of the Craft: Understand the emphasis SRE places on high standards for workmanship and conduct. Just as the craft requires “virtuous education”, SREs prioritize continuous learning and structured training, including studying the liberal arts and sciences, to master the complexity of distributed systems. We look at how practicing mental discipline, combined with preparation exercises like disaster role-playing, helps engineers stay rational, focused, and deliberate during emergencies.
    This episode demonstrates that whether erecting physical edifices or building the world’s largest cloud services, success hinges on meticulous execution, relentless refinement, and an unwavering commitment to quality and fidelity.
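
To illustrate what a measurable SLO might look like in practice, here is a minimal sketch. It assumes a request-based availability indicator (good requests divided by total requests); the names `SloReport` and `evaluate_slo` and the sample figures are hypothetical, not drawn from the episode or from either source.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    sli: float          # measured fraction of good requests
    met: bool           # True if the measured SLI meets the target
    budget_used: float  # fraction of the error budget consumed


def evaluate_slo(good: int, total: int, target: float) -> SloReport:
    """Compare a request-based SLI against an SLO target (a fraction in (0, 1])."""
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need total > 0 and 0 <= good <= total")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in the range (0, 1]")
    failures = total - good
    allowed_failures = total * (1.0 - target)  # the error budget, in requests
    if allowed_failures == 0:
        # A 100% target has no budget: any failure overspends it.
        used = math.inf if failures else 0.0
    else:
        used = failures / allowed_failures
    sli = good / total
    return SloReport(sli=sli, met=sli >= target, budget_used=used)


if __name__ == "__main__":
    # Hypothetical month: one million requests, 500 of them failed.
    print(evaluate_slo(good=999_500, total=1_000_000, target=0.999))
```

Reporting how much of the budget has been spent, rather than only pass or fail, is one way a team could decide when to pause risky changes and when there is room to ship.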

    Source #1: Duncan’s Masonic Ritual & Monitor (1866) by Malcolm C. Duncan

    Source #2: Site Reliability Engineering edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy