Tag: Site Reliability Engineering

  • The Square and the Server

    In this episode, Change Advisory Board draws a straight line from the lodge to the datacenter via the square, exploring how the symbolic working tools of Freemasonry — the gauge, gavel, square, level, plumb, compasses, and trowel — can be reinterpreted as instruments of modern Site Reliability Engineering.

    From the Entered Apprentice’s 24-inch gauge, which Masonic ritual uses to divide the 24 hours of the day, to the SRE’s time budgets and service-level objectives, each tool becomes a lens for understanding the moral and operational discipline behind reliable systems. The common gavel’s task of knocking off rough edges parallels how engineers filter noise out of telemetry. The Fellow Craft’s square and level emerge as early templates for data integrity and fairness: the moral geometry of incident response. The plumb rule, once a test of uprightness, becomes the model for aligned observability, with systems and people both measured against their true vertical.

    Finally, the Master Mason’s compasses and trowel remind us that every great system, like every enduring fraternity, is held together not by code alone but by the invisible cement of trust, accountability, and shared purpose. Observability, in this light, is not just about data; it is the moral act of ensuring that what we build is true, just, and aligned with the architecture of higher principles.

    It’s a conversation about craftsmanship in code and in character: an investigation into how humanity’s oldest working tools still guide one of its newest engineering disciplines.

    Source #1: The Lecture of the Second Degree of Freemasonry

    Source #2: Site Reliability Engineering edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy

  • The Watchtower and the Mirror

    This episode examines two core operational practices, monitoring and observability, through the lens of Masonic symbolism. Monitoring is aligned with the Watchtower: much as the Tiler guards the door of the lodge against expected intruders, monitoring collects and tracks real-time quantitative data about known system conditions so that anticipated problems are detected. Observability, by contrast, is compared to the All-Seeing Eye and the Mirror, representing the capacity to ask new questions about a system’s inner workings in order to troubleshoot novel problems, the “unknown unknowns.” Together, these practices constitute the operational wisdom that Site Reliability Engineers (SREs) depend on, and the episode maps that wisdom onto the Masonic pillars of Wisdom, Strength, and Beauty as a guide to pursuing reliability, efficiency, and continuous improvement.

    Source #1: The Lecture of the Second Degree of Freemasonry

    Source #2: Site Reliability Engineering edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy

  • The Trestle-board and the SLO

    Join us as we uncover how the timeless lessons of structure, planning, and meticulous refinement taught in the degrees of Entered Apprentice, Fellow Craft, and Master Mason are applied by modern Site Reliability Engineers (SREs) to design, deploy, and maintain reliable computing systems.

    What You Will Learn:
     – The Blueprint for Reliability: Adherence to Design. Discover how SREs apply the principles of the Trestle-board (used by the Master-workman to draw his designs) to their infrastructure. We discuss the foundational importance of explicit planning, focusing on translating business goals into measurable Service Level Objectives (SLOs). The goal is to build a “spiritual building” (the reliable service) that achieves figure, strength, and beauty.
     – Refining the Rough Ashlar: Eliminating Toil. Learn how the SRE mandate to eliminate toil mirrors the builders’ progression from the Rough Ashlar (representing a crude, imperfect state) to the Perfect Ashlar (a stone made ready by the hands of the workmen). Toil is manual, repetitive, automatable work that lacks enduring value and scales linearly as a service grows. SREs devote at least 50% of their time to engineering work, writing software that replaces this manual labor so that staffing scales sublinearly with system size.
     – Searching for Truth: Mastery Through Failure. The diligent worker must search to the foundations of knowledge to find the Truth buried under error. We explore SRE’s commitment to rigorous self-assessment, particularly through blameless postmortems following significant incidents. This practice is essential for finding the root causes of failures, improving systems, and making the organization more resilient as a whole.
     – The Discipline of the Craft: Understand the emphasis SRE places on high standards of workmanship and conduct. Just as the craft calls for a “virtuous education” and urges the Fellow Craft to study the liberal arts and sciences, SREs prioritize continuous learning and structured training to master the complexity of distributed systems. We look at how mental discipline, combined with preparation exercises such as disaster role-playing, helps engineers keep their thinking rational, focused, and deliberate during emergencies.

    This episode demonstrates that whether one is erecting physical edifices or building some of the world’s largest cloud services, success hinges on meticulous execution, relentless refinement, and an unwavering commitment to quality and fidelity.

    Source #1: Duncan’s Masonic Ritual & Monitor (1866) by Malcolm C. Duncan

    Source #2: Site Reliability Engineering edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy