Site Reliability Guardian objective examples
Latest Dynatrace
Use these objective examples as inspiration for creating your own guardians.
Error log entries
Objective
We want the number of error log entries to be less than or equal to 10, with a warning at 8.
Indicator
For this, we'll create a query that fetches the log events from your environment and counts the entries with the loglevel set to ERROR:
fetch logs
| summarize countIf(loglevel == "ERROR")
Comparison operator
Less than or equal to (Lower is better)
Warning and failure thresholds
- Failure: 10
- Warning: 8
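To see how the comparison operator and the two thresholds interact, here is a minimal Python sketch. It is not Dynatrace code: the `Status` enum, the function names, and the sample records are hypothetical, and we assume that a value exactly equal to a threshold still satisfies it (so 8 errors passes and 10 errors is a warning, not a failure).

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    WARNING = "warning"
    FAIL = "fail"


def count_errors(records: list[dict]) -> int:
    # Python counterpart of countIf(loglevel == "ERROR").
    return sum(1 for record in records if record.get("loglevel") == "ERROR")


def evaluate_lower_is_better(value: float, failure: float, warning: float) -> Status:
    """Classify a value for a "Less than or equal to" objective.

    Assumption: a value equal to a threshold still satisfies it, so only
    values strictly above a threshold trigger that status.
    """
    if warning > failure:
        raise ValueError("for lower-is-better, warning must not exceed failure")
    if value > failure:
        return Status.FAIL
    if value > warning:
        return Status.WARNING
    return Status.PASS


# Hypothetical sample: 9 ERROR entries and 5 INFO entries.
sample = [{"loglevel": "ERROR"}] * 9 + [{"loglevel": "INFO"}] * 5
errors = count_errors(sample)
print(errors, evaluate_lower_is_better(errors, failure=10, warning=8).value)
# prints: 9 warning
```

With "Lower is better", the warning threshold sits below the failure threshold, which is why the sketch rejects a warning value larger than the failure value.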
Request success rate - log-based
Objective
We want the ratio of successful requests to total requests to be greater than or equal to 98%, with a warning at 99%.
Indicator
For this, we'll use a query that
- fetches specific log entries and parses them for events indicating requests,
- defines what counts as a successful response in these entries (an HTTP status below 400),
- and calculates the success rate as a percentage of all fetched entries.
fetch logs
| filter endsWith(log.source,"pgi.log")
| parse content, "LD IPADDR:ip ':' LONG:payload SPACE LD 'HTTP_STATUS' SPACE INT:http_status LD (EOL| EOS)"
| fieldsAdd success = toLong(http_status < 400)
| summarize successRate = sum(success)/count() * 100
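The following Python sketch walks through the same steps on a few hypothetical pgi.log lines. The regular expression is only a rough, IPv4-only stand-in for the DPL pattern in the query, and the log line format is invented for illustration. We also assume, as our reading of the query, that a line the pattern does not match leaves `http_status` null, so it still counts toward `count()` but never toward `sum(success)`.

```python
import re

# Rough regex stand-in (IPv4 only) for the DPL pattern
# "LD IPADDR:ip ':' LONG:payload SPACE LD 'HTTP_STATUS' SPACE INT:http_status LD (EOL| EOS)".
PATTERN = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<payload>\d+)[ \t]+.*?"
    r"HTTP_STATUS[ \t]+(?P<http_status>-?\d+)"
)


def success_rate(lines: list[str]) -> float:
    """Percentage of lines whose parsed HTTP status is below 400.

    A line that does not match gets no status: it still counts in the
    denominator (like count()) but never in the numerator (like sum()
    skipping nulls). This is our reading of the query, not Dynatrace code.
    """
    if not lines:
        raise ValueError("no log lines to evaluate")
    successes = 0
    for line in lines:
        match = PATTERN.search(line)
        # success = toLong(http_status < 400)
        if match and int(match.group("http_status")) < 400:
            successes += 1
    # successRate = sum(success) / count() * 100
    return 100 * successes / len(lines)


# Hypothetical pgi.log lines: 99 successful requests and 1 server error.
ok_line = "request from 10.0.0.1:5120 GET /cart HTTP_STATUS 200"
error_line = "request from 10.0.0.2:4096 POST /checkout HTTP_STATUS 503"
lines = [ok_line] * 99 + [error_line]
print(success_rate(lines))  # prints: 99.0
```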
Comparison operator
Greater than or equal to (Higher is better)
Warning and failure thresholds
- Failure: 98
- Warning: 99
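With "Higher is better", the thresholds work the other way round: the warning threshold sits above the failure threshold, values below the failure threshold fail, and values between the two raise a warning. Here is a minimal Python sketch of that rule; the `Status` enum and the function name are hypothetical, and we assume that reaching a threshold exactly satisfies it.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"
    WARNING = "warning"
    FAIL = "fail"


def evaluate_higher_is_better(value: float, failure: float, warning: float) -> Status:
    """Classify a value for a "Greater than or equal to" objective.

    Assumption: a value equal to a threshold still satisfies it, so only
    values strictly below a threshold trigger that status.
    """
    if warning < failure:
        raise ValueError("for higher-is-better, warning must not be below failure")
    if value < failure:
        return Status.FAIL
    if value < warning:
        return Status.WARNING
    return Status.PASS


# Hypothetical success rates (in percent) checked against failure 98 / warning 99.
for rate in (99.5, 98.5, 97.0):
    print(rate, evaluate_higher_is_better(rate, failure=98, warning=99).value)
# prints:
# 99.5 pass
# 98.5 warning
# 97.0 fail
```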
Request failure rate - log-based
Objective
We want the ratio of failed requests to total requests to be less than or equal to 1%, with a warning at 0.5%.
Indicator
For this, we'll use a query that
- fetches specific log entries and parses them for events indicating requests,
- defines what counts as a failed request in these entries (an HTTP status of 400 or higher),
- and calculates the failure rate as a percentage of all fetched entries.
fetch logs
| filter endsWith(log.source,"pgi.log")
| parse content, "LD 'HTTP_STATUS' SPACE INT:http_status LD (EOL| EOS)"
| fieldsAdd failure = toLong(http_status >= 400)
| summarize failureRate = sum(failure)/count() * 100
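A minimal Python sketch of the same calculation follows. The regular expression is a rough stand-in for the DPL pattern, the log lines are hypothetical, and, as our reading of the query, a line without a parsable status counts toward the total but never as a failure.

```python
import re

# Rough regex stand-in for the DPL pattern
# "LD 'HTTP_STATUS' SPACE INT:http_status LD (EOL| EOS)".
STATUS = re.compile(r"HTTP_STATUS[ \t]+(?P<http_status>-?\d+)")


def failure_rate(lines: list[str]) -> float:
    """Percentage of lines whose parsed HTTP status is 400 or higher.

    Unmatched lines count toward the total but never as failures, which is
    our reading of how sum() and count() treat null fields in the query.
    """
    if not lines:
        raise ValueError("no log lines to evaluate")
    failures = 0
    for line in lines:
        match = STATUS.search(line)
        # failure = toLong(http_status >= 400)
        if match and int(match.group("http_status")) >= 400:
            failures += 1
    # failureRate = sum(failure) / count() * 100
    return 100 * failures / len(lines)


# Hypothetical pgi.log lines: 199 successful requests and 1 client error.
lines = ["GET /cart HTTP_STATUS 200"] * 199 + ["GET /missing HTTP_STATUS 404"]
print(failure_rate(lines))  # prints: 0.5
```

When every fetched line carries an HTTP_STATUS value, this failure rate is simply 100 minus the success rate computed with the same 400 cutoff; the two objectives differ mainly in their thresholds.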
Comparison operator
Less than or equal to (Lower is better)
Warning and failure thresholds
- Failure: 1
- Warning: 0.5