What is Site Reliability Engineering (SRE)?
“SRE is what happens when a software engineer is tasked with what used to be called operations.” That is how Ben Treynor, VP of engineering at Google and the mastermind behind SRE, describes Site Reliability Engineering (SRE).
Effectively, SRE sits at the crossroads of software engineering and system operations. Its main goal is to create scalable and highly reliable software systems. Combining release speed (required by development teams), quality control (required by QA teams) and system stability (required by operations teams) is the core of what SRE is about.
Adopting SRE unlocks considerable potential: teams can manage error budgets and efficiency costs in order to sustain continuous and reliable software delivery. A Site Reliability Engineer can thus implement new features while preserving an availability of 99.9%, so that users stay satisfied with the service and its performance.
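To make the error budget idea concrete, here is a minimal sketch of the arithmetic behind a 99.9% availability target. The function names and the 30-day window are assumptions chosen for illustration; they are not part of any specific SRE tool or of our own method.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window.

    `slo` is a fraction (0.999 means 99.9%). The 30-day default window is
    an illustrative assumption, not a fixed rule.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)
    print(f"Error budget: {budget:.1f} minutes per 30 days")
    print(f"Remaining after 10 minutes of downtime: "
          f"{budget_remaining(0.999, 10.0):.0%}")
```

With these assumptions, a 99.9% target leaves roughly 43 minutes of tolerated downtime per 30 days. As long as some of that budget remains, the team can keep releasing new features; once it is spent, the focus shifts to stability.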
At Digital Architects Zurich, we embrace the SRE concept. It has become a fundamental part of how we help businesses and how we manage our products, and it is a key feature of our value proposition. We believe that Site Reliability Engineering is the way to go in order to meet digital transformation challenges.
If you want to dive deeper, Google has published two books that review in detail what SRE is about and how it has been implemented within Google’s software teams: “Site Reliability Engineering: How Google Runs Production Systems” and “The Site Reliability Workbook: Practical Ways to Implement SRE”. Both can be read online for free.
Effective SRE by Digital Architects Zurich
At Digital Architects Zurich, we believe that the SRE approach should be accessible to any organisation, even outside of the IT field. That is why we conceptualised the “Effective SRE” approach. Building on Google’s best practices and more than 20 years of field experience in managing digital projects, we aim to democratise the SRE approach and make it workable for any organisation looking towards digital transformation.
We define “Effective SRE” as:
“the application of a systematic, holistic, disciplined approach to the definition, evaluation and cost-effective assurance of Service Level Objectives in the context of high-speed software delivery, usually required as a central capability in Digital, DevOps, or Cloud transformation roadmaps”.
Its main objectives are:
- A clear and structured set of activities required to design, pilot/control and operate the digital highway (Cloud-Native Continuous Delivery Pipelines and Operations).
- Reduced complexity and increased efficiency by leveraging emerging Observability/OpenTelemetry/OpenTracing standards, AI, cloud-native stacks and CI/CD tooling capabilities.
- Clear roles, profile descriptions and related modular training which can be adapted to the organisation’s context (maturity, size, roadmap…).
In more concrete terms, Effective SRE is a practical method that any organisation can use for a step-by-step adoption of Site Reliability Engineering. It is, we believe, the best way to ensure efficient operations while providing continuous and highly reliable software delivery.
More Insight on Effective SRE
You can gain deeper insight into how we at Digital Architects Zurich perceive and apply Site Reliability Engineering, including “Effective SRE”, by visiting our blog posts section.
If you have comments or feedback, or you would like to get in touch for more information, please feel free to reach out to us at email@example.com or via our social media channels.
Our offers for Effective SRE
Digital Architects Zurich has several offers to help you adopt Site Reliability Engineering (SRE) in your organisation:
- Set up Site Reliability Engineering in your organisation
- Adopt and integrate SRE with your existing operating model
- Develop the required skills for SRE through training and by establishing practices