“SRE is what happens when you ask a software engineer to design an operations team.” That is how Ben Treynor, the founder of SRE at Google and its VP of Engineering, describes Site Reliability Engineering (SRE).

Site Reliability Engineering sits at the crossroads of software engineering and system operations. Its main goal is to create scalable and highly reliable software systems. At its core, SRE is about combining release speed (required by development teams), quality control (required by QA teams) and system stability (required by operations teams).

Site Reliability Engineering goals

Adopting Site Reliability Engineering offers significant benefits. SRE lets teams manage error budgets and efficiency costs so that software delivery stays continuous and reliable. As a result, a team can keep shipping new features while still meeting an availability target such as 99.9%, and users' satisfaction with the service and its performance improves.
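To make the error budget idea concrete: an availability target of 99.9% leaves 0.1% of a measurement window as tolerable unavailability. The sketch below is a minimal illustration, assuming a simple time-based availability SLO and a 30-day window; the function name `error_budget` is hypothetical and not part of any specific SRE tooling.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Return the downtime an availability SLO tolerates over a window.

    Hypothetical helper for illustration. The error budget is the share of
    the window the service may be unavailable: (1 - slo) * window,
    where `slo` is a fraction such as 0.999.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return window * (1.0 - slo)


if __name__ == "__main__":
    # Assumed example: a 99.9% availability target over a 30-day window.
    budget = error_budget(0.999, timedelta(days=30))
    print(f"Allowed downtime: {budget.total_seconds() / 60:.1f} minutes")
```

Under these assumptions, a 99.9% target over 30 days allows about 43.2 minutes of downtime. A common policy is that once this budget is spent, the team slows feature releases in favour of reliability work.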

At Digital Architects Zurich, we embrace the SRE approach. Site Reliability Engineering has become a fundamental part of how we help businesses and manage our products, and a key feature of our value proposition. We believe that Site Reliability Engineering is the way to go when tackling digital transformation challenges.

If you want to dive deeper, Google has published two books that review SRE in detail and describe how it has been implemented within Google's software teams: “Site Reliability Engineering: How Google Runs Production Systems” and “The Site Reliability Workbook: Practical Ways to Implement SRE”. Both can be read online for free.

Effective SRE by Digital Architects Zurich 

At Digital Architects Zurich, we believe that the SRE approach should be accessible to any organisation, even outside of the IT field. That is why we conceptualised the “Effective SRE” approach.

Building on Google’s best practices and more than 20 years of field experience in managing digital projects, we aim to democratise the Site Reliability Engineering approach and make it operational for any organisation pursuing digital transformation.

We define “Effective SRE” as “the application of a systematic, holistic, disciplined approach to the definition, evaluation and cost-effective assurance of Service Level Objectives in the context of high-speed software delivery, usually required as a central capability in Digital, DevOps or Cloud transformation roadmaps”.
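As a rough illustration of what “evaluation” of a Service Level Objective can mean in practice, the sketch below compares a measured service level indicator (SLI) against an objective and reports how much of the error budget is left. It assumes a request-based SLI (good requests divided by total requests) over a single window; `SloReport` and `evaluate_slo` are hypothetical names, not part of the Effective SRE method or any particular tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # measured fraction of good requests
    met: bool                # True if the SLI meets the objective
    budget_remaining: float  # share of the error budget left (negative if overspent)


def evaluate_slo(good: int, total: int, objective: float) -> SloReport:
    """Evaluate a request-based availability SLO over one window (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0.0 < objective < 1.0:
        raise ValueError("objective must be strictly between 0 and 1")
    sli = good / total
    allowed_bad = (1.0 - objective) * total  # error budget expressed in requests
    actual_bad = total - good                # requests that failed the SLI
    remaining = 1.0 - actual_bad / allowed_bad
    return SloReport(sli=sli, met=sli >= objective, budget_remaining=remaining)


if __name__ == "__main__":
    # Assumed example: 500 failed requests out of one million, against 99.9%.
    print(evaluate_slo(good=999_500, total=1_000_000, objective=0.999))
```

In this example the objective is met with roughly half of the error budget remaining. A request-based SLI is only one option; time-based or latency-based indicators follow the same pattern of comparing a measurement against a target.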

Effective SRE’s main objectives are:

  • Providing a clear, structured set of activities required to design, pilot/control and operate the digital highway (Cloud-Native Continuous Delivery Pipelines and Operations).
  • Reducing complexity and increasing efficiency by leveraging emerging observability standards (such as OpenTelemetry and OpenTracing), AI, cloud-native stacks and CI/CD tooling capabilities.
  • Defining clear roles, profile descriptions and related modular training that can be adapted to the organisation's context (maturity, size, roadmap, etc.).

Effective SRE scope

Effective SRE Scope, Digital Architects Zurich, 2020.

In concrete terms, Effective SRE is a practical method that any organisation can use to adopt Site Reliability Engineering step by step. We believe it is the best way to ensure operational efficiency while providing continuous and highly reliable software delivery.

To learn more:

For deeper insight into how we at Digital Architects Zurich understand and apply Site Reliability Engineering, including “Effective SRE”, visit our blog posts section.

If you have comments or feedback, or would like to get in touch for more information, please feel free to reach out to us at info@digital-architects-zurich.ch or via our social media channels.