Site reliability engineering (SRE) continues to gain popularity as organizations embrace hybrid cloud strategies and IT automation at scale. By applying software engineering principles to operations and infrastructure practices, SRE enables organizations to streamline and automate IT processes.
In the Dynatrace State of SRE Report: 2022 Edition, 88% of surveyed site reliability engineers (SREs) said there is now greater recognition of the strategic importance of their role in business success than there was three years ago. SRE is becoming an essential discipline in organizations that use DevOps (the combination of development and operations) and agile methodologies.
SRE adoption is growing, yet gaps remain. To unleash innovation, organizations need to evolve their SRE approaches. The report uncovers six site reliability engineering trends that will help organizations get the most from DevOps practices.
1. MTTR reduction remains top of the list for SREs
SREs improve the reliability of production systems, and reducing mean-time-to-repair (MTTR) is their top priority. But research shows that 60% of SREs find they spend most of their time building and maintaining automation code. While increasing automation is a key goal, organizations lose the efficiency they gain from it if building and maintaining that automation is arduous and time-consuming.
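For readers who want to see the metric concretely, MTTR is commonly calculated as the total time spent repairing incidents divided by the number of incidents. The following is a minimal sketch under that assumption. The function name `mean_time_to_repair` and the incident timestamps are hypothetical illustrations, not data or tooling from the report.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average repair duration over (detected, resolved) timestamp pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum each incident's repair duration, starting from a zero timedelta.
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log: (detected, resolved) pairs.
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 45)),     # 45 minutes
    (datetime(2022, 3, 4, 14, 0), datetime(2022, 3, 4, 16, 0)),    # 120 minutes
    (datetime(2022, 3, 9, 22, 30), datetime(2022, 3, 9, 22, 45)),  # 15 minutes
]

mttr = mean_time_to_repair(incidents)
print(mttr)  # 1:00:00
```

Tracking a figure like this over time is one way a team could check whether its automation work is actually shortening repairs.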
Much of the problem stems from how site reliability engineering teams build automation for DevOps workflows. Often, teams handle this on a case-by-case basis because their tooling doesn’t come with automation built in and doesn’t offer everything-as-code capabilities. As a result, they must build a layer of automation on top of their tooling.
Over time, this creates a complex web of code that becomes more difficult to scale across the DevOps pipeline. If SRE teams don’t identify a more efficient approach to DevOps from day one, they will undoubtedly find more of their time drained in the future. That time sink results in developers chasing down bugs and other code issues rather than focusing on strategic, revenue-generating work.
This underscores the need for SREs to work with DevOps teams, developers, and architects to ensure that software not only meets a business need but also is resilient and automatable by default. Enabling teams to easily integrate new automation capabilities with existing tools and workflows reduces manual effort and improves engineering practices.
2. A shift to SRE-driven engineering takes hold
More than half (51%) of SREs say they dedicate significant time to influencing architectural design decisions to improve reliability. This suggests that organizations have made progress toward SRE-driven engineering across departments to improve reliability, resiliency, and security. But there’s still a long way to go.
The most mature organizations embrace site reliability engineering practices that include developers who have fought the battles. They understand what it takes to build systems that can scale from 10 users to 1,000, or from 1 million to 10 million users. Integrating these developers with the design process provides insight that enables architects to incorporate reliability from day one.
3. Security is a core pillar of site reliability engineering
SREs are also making progress in extending organization-wide DevSecOps approaches to ensure organizations can restore systems quickly after discovering a vulnerability. More than two-thirds (68%) of SREs say they expect their role in security to become even more central in the future. This trend will increase as organizations continue using third-party libraries for cloud-native application development.
As the Log4Shell vulnerability demonstrated when it emerged in December 2021, third-party code libraries can introduce significant security risks. Site reliability engineering teams are critical to identifying and eliminating those flaws, both to protect IT systems and to minimize cloud and third-party risks.
4. SREs need the freedom to experiment
While more than half (52%) of SREs dedicate a significant amount of time to designing experiments and tests to reduce the risk of production failure, only 1 in 10 highlights this as their top priority.
Experimentation is critical to SRE. Teams still need to make progress to ensure they have more time available for these tasks. For SREs to deliver more strategic business value, engineers must streamline tasks that involve intensive manual effort.
5. SREs need the license to prioritize strategic work
Although experimentation falls relatively low on their priority list, 51% of SREs say they’re encouraged to experiment. However, only about a quarter (26%) of organizations see incremental project failure as OK. The weak emphasis on experimentation highlights the many distractions SREs face, which limit the time they have to focus on it.
Organizations must consider new strategies that enable site reliability engineering teams to focus on more strategic tasks. That way, teams have more time to experiment. Team leaders also need to foster a culture that accepts failure and understands that the principle “fail fast, fail often” provides the greatest competitive edge.
Unshackle SRE teams from traditional organizational structures that view IT as a cost center and a burden. Organizations can’t benefit from lessons gained from IT mistakes without openness to failure.
6. Organizations recognize and reward site reliability engineering teams
SREs must be free to challenge accepted norms and set new benchmarks for innovation-led design and engineering practices. Many organizations are making strides in this direction and have methods for rewarding SRE teams’ successes.
Nearly a third (31%) use hackathons to devise new ways to improve reliability, offering prizes to winning SRE teams. These approaches are key to encouraging a culture of experimentation that promotes the strategic value of site reliability engineering for the business.
All six of these SRE trends can help your organization advance its SRE practices. Embrace experimentation and unleash exciting innovation.