Site Reliability Engineering

Empower SRE to improve availability, performance, and user experience and solve problems proactively with full-stack visibility and real-time insights.

State of SRE Report

We asked 450 SREs to share their perspectives on the challenges they face and the ways site reliability engineering (SRE) is evolving as a discipline.

Download our complimentary report to see why:

  • 88% say there’s a greater understanding of the strategic importance of their role now vs. three years ago.
  • 99% encounter challenges when defining and creating SLOs to evaluate service levels for applications and infrastructure.
  • By 2025, 85% want to standardize on the same observability platform from Dev to Ops and Security.

Propel SRE with observability and security insights

  • Drive production reliability

    Reduce risk by using the Site Reliability Guardian app to evaluate every change to applications, services, and infrastructure with critical dependencies against key metrics, SLOs, and security data.

  • Reduce MTTR

    Combine answers from observability data with automation workflows to intelligently orchestrate remediation and incident management. Understand the root cause of issues so you can triage and resolve them quickly.

  • Power proactivity

    Leverage Service Level Objectives (SLOs) and error budgets to proactively monitor critical metrics and take action before any violations occur. Keep all your SLAs in check and the business happy.
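
To make the error budget idea concrete, here is a minimal sketch of how the remaining budget for an event-based SLO could be computed. The function name and inputs are hypothetical and chosen for illustration; this is not Dynatrace's implementation or API.

```python
def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Return the unspent fraction of the error budget (negative once the SLO is violated).

    slo_target is the required share of good events, e.g. 0.999 for 99.9%.
    All names here are illustrative assumptions, not a vendor API.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 1.0  # nothing happened yet, so nothing has been spent

    allowed_bad = (1.0 - slo_target) * total_events  # bad events the SLO tolerates
    actual_bad = total_events - good_events          # bad events observed so far
    return 1.0 - actual_bad / allowed_bad


# 99.9% target, 1,000,000 requests, 500 failures: about half the budget is left.
remaining = error_budget_remaining(0.999, 999_500, 1_000_000)
print(f"error budget remaining: {remaining:.1%}")
```

A team might alert when the remaining budget drops below a chosen threshold, which is one way to act before a violation occurs.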

Product Tour

Service Level Objectives (SLOs): A fundamental tenet for SREs

Define, measure, and monitor your SLOs with automated, AI-powered monitoring and analysis. Get real-time, actionable insights to proactively resolve issues and optimize performance, allowing you to meet business goals and improve customer experience.

Common SLOs we monitor

Dynatrace can monitor a wide range of Service Level Objectives (SLOs), including:

Common business SLOs

  • Availability: Is the service available for users?
  • Engagement: How engaged are the users?
  • Conversion: What is the rate of users reaching my business goals?
  • User Satisfaction (Apdex): How satisfied are users with the performance of my app, on a scale from 0 to 1?
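
As a rough illustration of how a 0-to-1 satisfaction score can be derived, the sketch below applies the standard Apdex formula: requests at or under a threshold T count as satisfied, requests between T and 4T count as tolerating (half weight), and slower ones count as frustrated. The function name and sample values are assumptions made for this example; the page does not state how the score is computed in the product.

```python
def apdex(response_times_s: list[float], threshold_s: float) -> float:
    """Compute an Apdex score in [0, 1] from response times in seconds.

    Standard Apdex: (satisfied + tolerating / 2) / total samples.
    Names and sample data are illustrative, not a vendor API.
    """
    if not response_times_s:
        raise ValueError("at least one sample is required")

    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(1 for t in response_times_s if threshold_s < t <= 4 * threshold_s)
    return (satisfied + tolerating / 2) / len(response_times_s)


# With T = 0.5 s: two satisfied, two tolerating, one frustrated sample.
score = apdex([0.1, 0.2, 0.6, 1.0, 3.0], threshold_s=0.5)
print(f"Apdex: {score:.2f}")
```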

Common performance SLOs

  • Utilization: Average share of time that resources are busy servicing work
  • Success Rate: Ratio of successful requests to total requests
  • Response time: Time it takes to service a request
  • Saturation: Resources that are most constrained
  • Traffic: Measure of how much demand is being placed on your system
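
Two of these indicators, success rate and response time, reduce to simple ratios over request records. The sketch below assumes a hypothetical `Request` record with a success flag and a duration; real monitoring data will have a different shape, so treat the field names as placeholders.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record used only for this illustration."""
    succeeded: bool
    duration_ms: float


def success_rate(requests: list[Request]) -> float:
    """Share of requests that succeeded (successful requests / total requests)."""
    if not requests:
        raise ValueError("at least one request is required")
    return sum(r.succeeded for r in requests) / len(requests)


def fast_request_rate(requests: list[Request], threshold_ms: float) -> float:
    """Share of requests served faster than threshold_ms."""
    if not requests:
        raise ValueError("at least one request is required")
    return sum(r.duration_ms < threshold_ms for r in requests) / len(requests)


sample = [Request(True, 80), Request(True, 120), Request(False, 300), Request(True, 95)]
print(success_rate(sample))             # 3 of 4 requests succeeded
print(fast_request_rate(sample, 100))   # 2 of 4 requests finished under 100 ms
```

Expressing both indicators as "good events over total events" keeps them directly comparable to a percentage SLO target.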

Common SLOs for mobile apps

  • App adoption: Ratio of daily users to total users
  • Availability: Rate of requests with a valid response
  • App rating: Ratings in the Android or iOS app store
  • Response time: Rate of login requests faster than 100 ms
  • Crashes: Crash rate on officially supported devices
  • Success rate: Rate of successful requests

The benefits of SRE

Leveraging SRE to drive automation and keep systems reliable has clear benefits across the enterprise.

  • Improve MTTR
  • Improve availability and uptime
  • Improve engineering productivity
  • Improve CSAT
  • Improve customer retention

Insights from our experts