Problem lifecycle and important timings
The Dynatrace Davis® root-cause engine collects all single events that belong to the same incident. As a result, Davis generates a problem that references all incident-relevant information such as individual events that were detected on the impacted topology graph. The figure below shows how two individual events are analyzed within one problem generated by Davis.
As you can see, each event comes with its own start and end timestamps. Each event producer observes its own sliding time window, which we call the event analysis time (shown in yellow).
Let's consider an example of a metric event configured to use a 5-minute sliding window in which 3 one-minute samples need to violate the threshold to raise an event. In that case, the metric starts violating the threshold at least 3 minutes before the timestamp at which the event is raised. The moment the violation first occurs is recorded as the event start analysis timestamp, so the information about exactly when the violation started is not lost.
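The timing in this example can be sketched in a few lines of Python. This is only an illustration, not how Davis is implemented: the function name `detect_event`, the one-minute sample spacing, and the convention that a sample's timestamp marks the start of its one-minute interval are all assumptions made for the sketch.

```python
from datetime import datetime, timedelta

SAMPLE_LENGTH = timedelta(minutes=1)  # assumed sample spacing


def detect_event(samples, threshold, window=5, required=3):
    """Return (start_analysis, end_analysis) for the first raised event, or None.

    samples: chronologically ordered (timestamp, value) pairs, one per minute.
    An event is raised once `required` samples within the last `window`
    samples exceed `threshold`.
    """
    for i in range(len(samples)):
        # The sliding window covers the current sample and up to window-1 before it.
        recent = samples[max(0, i - window + 1): i + 1]
        violating = [ts for ts, value in recent if value > threshold]
        if len(violating) >= required:
            start_analysis = violating[0]                  # first violating sample
            end_analysis = samples[i][0] + SAMPLE_LENGTH   # enough samples collected
            return start_analysis, end_analysis
    return None


base = datetime(2024, 1, 1, 12, 0)
values = [10, 10, 10, 95, 96, 97, 98]
samples = [(base + timedelta(minutes=i), v) for i, v in enumerate(values)]
result = detect_event(samples, threshold=90)
```

With these hypothetical samples, the violation begins at 12:03 and the event is raised at 12:06, so the start analysis timestamp lies 3 minutes before the point at which the event is raised.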
While the event start analysis timestamp represents the earliest point in time at which the violating state was observed, the event end analysis timestamp represents the point in time at which all necessary violating samples have been collected and the Davis problem is raised.
Because each event involved in the problem uses a sliding window, each problem has a trailing period during which a closed problem might be reopened. This is called the reopening period, and its maximum length is 30 minutes.
If a problem remains open for longer than 90 minutes, no new events will be merged into it after the 90-minute point. This prevents Davis from collecting unrelated information for long-lasting incidents (for example, a synthetic test constantly failing and keeping problems open for weeks).
Summary of the problem lifecycle:
- Individual events use variable analysis sliding windows.
- A problem is raised at the event end analysis timestamp.
- A problem's lifespan is defined by the lifespans of the individual events in the problem.
- A problem is closed when all events in the problem are closed.
- A closed problem can be reopened during a reopening period of 30 minutes.
- If a problem lasts for longer than 90 minutes, no new events will be merged after the 90-minute point; a new problem will be raised instead.
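
The rules summarized above can be expressed as a small model. The following Python sketch is an assumption-laden illustration rather than the Davis implementation: the `Event` and `Problem` classes and the `accepts` method are hypothetical, the 90-minute limit is assumed to be measured from the start of the earliest event, and topology-based correlation between events is ignored entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

REOPEN_PERIOD = timedelta(minutes=30)  # maximum reopening period
MERGE_CUTOFF = timedelta(minutes=90)   # no merging after this problem age


@dataclass
class Event:
    start: datetime
    end: Optional[datetime] = None  # None while the event is still open


@dataclass
class Problem:
    events: List[Event] = field(default_factory=list)  # at least one event

    @property
    def start(self) -> datetime:
        # Assumption: the problem starts with its earliest event.
        return min(e.start for e in self.events)

    @property
    def end(self) -> Optional[datetime]:
        # A problem is closed only when all of its events are closed.
        if any(e.end is None for e in self.events):
            return None
        return max(e.end for e in self.events)

    def accepts(self, event_time: datetime) -> bool:
        """Whether a new event at event_time would be merged into this problem."""
        if event_time - self.start > MERGE_CUTOFF:
            return False  # past the 90-minute point: a new problem is raised
        if self.end is None:
            return True   # problem is still open
        # Closed problem: it can be reopened only within the reopening period.
        return event_time - self.end <= REOPEN_PERIOD


t0 = datetime(2024, 1, 1, 12, 0)
open_problem = Problem([Event(t0)])
closed_problem = Problem([Event(t0, t0 + timedelta(minutes=10))])
```

In this model, `open_problem.accepts(t0 + timedelta(minutes=60))` is true while the same call at 91 minutes is false, and `closed_problem` accepts a new event 20 minutes after closing but not 35 minutes after.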