Root Cause Analysis

Root Cause Analysis analyzes the test execution events surrounding the time when a suspected abnormality occurred in a Backbone, Last Mile, Mobile, or Private Last Mile test. It identifies the most common performance, availability, and page content issues that may have led to the abnormality. For each issue discovered, Root Cause Analysis provides a list of the top-ranked items that contributed most to the issue.

When you access Root Cause Analysis for a specific data point, Root Cause Analysis shows the analysis results for a 6-hour time period surrounding the selected data point, five hours before and one hour after. Root Cause Analysis shows all of the problems that were identified, from any testing location, for the test being analyzed.
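As an illustration of the window described above, the following minimal sketch computes the start and end of the analysis period for a selected data point. The function name and constants are hypothetical and are not part of the Dynatrace Portal; the sketch assumes the window is exactly five hours before and one hour after the data point.

```python
from datetime import datetime, timedelta

# Assumption: the window is exactly 5 hours before and 1 hour after
# the selected data point, as described in the text above.
HOURS_BEFORE = 5
HOURS_AFTER = 1


def analysis_window(data_point: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) of the 6-hour window around the data point.

    Hypothetical helper for illustration only.
    """
    start = data_point - timedelta(hours=HOURS_BEFORE)
    end = data_point + timedelta(hours=HOURS_AFTER)
    return start, end


# Example: a data point selected at 12:00 on an arbitrary date.
start, end = analysis_window(datetime(2024, 3, 1, 12, 0))
print(start, "to", end)
```

For a data point at 12:00, the window in this sketch runs from 07:00 to 13:00 on the same day.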

Root Cause Analysis is performed using pre-configured metrics.

Accessing Root Cause Analysis

  1. In the Dynatrace Portal, create an interactive chart for the test and time period in question.
    You can also go to the interactive chart from a chart in the Trend details page by clicking View in interactive charts above the chart.

  2. Configure the chart as follows:

    • Any time frame except Last 6 Months, Last 12 Months, or Year to Date.
    • Interval – 5 minutes, 15 minutes, 30 minutes, or 1 hour
    • Focus – Response Time or Availability
    • Calculation – Average
  3. Click a data point and select the Root Cause Analysis option from the menu.

The Root Cause Analysis window displays the initial analysis findings.
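The chart settings listed in step 2 can be summarized as a simple check. This is only a sketch of the documented rules, not Portal code: the function name, the parameter names, and the sample time frame "Last 24 Hours" are assumptions made for illustration.

```python
# Time frames for which Root Cause Analysis is not offered (from step 2).
EXCLUDED_TIME_FRAMES = {"Last 6 Months", "Last 12 Months", "Year to Date"}
# Supported chart intervals, expressed in minutes.
ALLOWED_INTERVALS_MIN = {5, 15, 30, 60}
ALLOWED_FOCUS = {"Response Time", "Availability"}
REQUIRED_CALCULATION = "Average"


def supports_root_cause_analysis(time_frame, interval_minutes, focus, calculation):
    """Return True if the chart settings match the documented requirements.

    Hypothetical helper; names and string values are illustrative.
    """
    return (
        time_frame not in EXCLUDED_TIME_FRAMES
        and interval_minutes in ALLOWED_INTERVALS_MIN
        and focus in ALLOWED_FOCUS
        and calculation == REQUIRED_CALCULATION
    )


# "Last 24 Hours" is an assumed time frame name used only as an example.
print(supports_root_cause_analysis("Last 24 Hours", 15, "Response Time", "Average"))
print(supports_root_cause_analysis("Year to Date", 15, "Response Time", "Average"))
```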

Root Cause Analysis information also appears in the Alert details page for Backbone, Mobile, and Private Last Mile alerts.

Root Cause Analysis provides information on issues categorized by problem or by location. The initial window displays information on causes across all problems.

Regardless of how you view the information, all windows display similar information.

The top of the window lists:

  • The level of analysis: test, step, or geography.
  • The number of test execution events (step executions, objects retrieved) that were reviewed as part of the analysis.
  • The test name.

The center section for every window shows:

  • The 6-hour time period used to analyze the data
  • A summary of the total number of unique problems that were identified.
    The tables below may list more problems than the number shown here, because the same problem can be found in more than one step.
  • The number of testing locations, out of the total testing locations for this test, which contributed at least one problem to the results.
  • The type of problems found.
    Click a problem type to hide that type of problem in the table below.
  • The minimum and maximum response time values during the analysis window.
  • The number of failed and successful steps.
  • The option to download the information in a window to a CSV file.

The information shown in the table depends on whether you are viewing:

  • Causes across all problems. This is the initial window that appears when you select to view Root Cause Analysis information.
  • A summary of an individual problem
  • Location information

At the bottom of every window are the date, time, response time, and availability for the selected data point.

Problem types

Root Cause Analysis categorizes problems into these categories:

  • Performance
  • Availability
  • Page content

Root Cause Analysis metrics

These metrics define the thresholds for identifying performance problems. The threshold values are fixed and cannot be changed.

  • Minimum variance for Page Response Times between high and low times – 1000 milliseconds (1 second)
  • Minimum % deviation for Page Response Time between high and low values – 100%
  • Minimum variance for Page Objects between high and low values – 20 objects
  • Minimum % deviation for Page Objects between high and low values – 50%
  • Minimum variance for Page Size between high and low values – 100,000 bytes
  • Minimum % deviation for Page Size between high and low values – 40%
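The thresholds above can be read as a pair of conditions per metric. The sketch below is one possible interpretation, not the product's actual algorithm: it assumes that "variance" means the high value minus the low value, that "% deviation" is that difference expressed as a percentage of the low value, and that both minimums must be met. All names are hypothetical.

```python
# (minimum variance, minimum % deviation) per metric, taken from the list above.
THRESHOLDS = {
    "page_response_time_ms": (1000, 100.0),
    "page_objects": (20, 50.0),
    "page_size_bytes": (100_000, 40.0),
}


def exceeds_threshold(metric, low, high):
    """Return True if the low/high pair meets both minimums for the metric.

    Assumptions (not confirmed by the documentation):
      - variance is high - low
      - % deviation is measured relative to the low value
      - both conditions must hold
    """
    min_variance, min_pct = THRESHOLDS[metric]
    variance = high - low
    if low <= 0:
        # Percentage is undefined; this sketch does not flag the pair.
        return False
    pct_deviation = variance / low * 100
    return variance >= min_variance and pct_deviation >= min_pct


# 800 ms -> 2000 ms: difference 1200 ms and 150% deviation, so both minimums are met.
print(exceeds_threshold("page_response_time_ms", 800, 2000))
# 1500 ms -> 2600 ms: difference 1100 ms but only about 73% deviation.
print(exceeds_threshold("page_response_time_ms", 1500, 2600))
```

Under these assumptions, a large absolute change on an already slow page is not flagged unless the relative change is also large.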

Viewing the Root Cause Analysis by problem

To view details about each of the problems discovered in the 6-hour analysis window, click in the Problems section. Root Cause Analysis displays the views described in the following sections.

Viewing causes across all problems

The initial Root Cause Analysis window displays causes for all problems.

Root Cause Analysis looks for common themes by analyzing all individual causes for each problem and then ranks these causes. This helps you determine if there is a common issue causing a problem. The columns in the table show:

  • Rank – Shows the ranking order for the specific cause found in the problem analysis results. Factors that determine the rank of a cause are the significance of the problem in which this cause was isolated, the commonality of this cause across other problems and the number of causes coming from the same host or provider.
  • Causes – Provides a description of the factors contributing to the problem, such as the affected host and object combination, return codes for each of the actions or other critical information, depending on the type of problem.
  • First Occurrence – The date and time that the problem first occurred. Click the time stamp in this column to view the Waterfall summary page for the test execution. From there, you can drill down to the waterfall chart for further analysis.

When navigating the Root Cause Analysis window, to view this window again:

  • Click to display this information when viewing location information.
  • Click Causes across all problems when viewing individual problem summary information.

Viewing host information

View the host information to determine if all of the causes belong to a specific host. This helps you determine whether the problem is inside or outside your data center.

Click to view host information.

The table lists the hosts associated with each identified cause.

Click to hide this information and return to the previous window.

Viewing individual problems

To view information on individual problems:

  1. Click if this option is not already selected.
  2. Click Individual problem summary.

This window lists each problem found. The columns in the table show:

  • Rank – Factors that determine the rank are the type of problem (availability, performance, or page content), the percentage of problem testing sites reporting the same problem, the percentage of observed bad test runs exhibiting this problem, the severity of the problem, and the time the problem first occurred with a greater emphasis given to issues seen earlier in the time line.
  • Problems – The problem that occurred.
  • Problem Types – The problem type: Performance, Availability, or Page Content.
  • Step – The step number and name where the problem occurred.
  • First Occurrence – The date and time that the problem first occurred. Click the time stamp in this column to view a waterfall chart of the step where the problem first occurred.
  • Locations – The number of locations, out of the total testing locations, that observed this problem. The first value in this column shows the number of locations where this problem occurred. The second value shows the total number of locations where any problem occurred during the 6-hour analysis time period.
    For example, if, during the 6-hour analysis time period, problems occurred at eight locations, but the selected problem only occurred at two of the eight locations, the value in this column is 2/8. If this problem occurred at all of the locations, the value in this column is 8/8.
  • Occurrences – The number of times during the 6-hour analysis time period that the problem occurred.
  • Observations – The total number of observations of this step across all locations that exhibited this problem in the 6-hour analysis window.
  • Low Value and High Value – These columns show the lowest and highest value, if applicable, for a particular problem, for example, the lowest and highest response time value.
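The value in the Locations column can be reproduced with a small calculation. The following sketch uses made-up location names and a hypothetical function name to show how a value such as 2/8 is derived from the set of locations that saw the selected problem and the set of locations that saw any problem.

```python
def locations_column(problem_locations, all_problem_locations):
    """Format the Locations value as 'affected/total'.

    problem_locations: locations where the selected problem occurred.
    all_problem_locations: locations where any problem occurred in the window.
    Hypothetical helper for illustration only.
    """
    total = set(all_problem_locations)
    # Count only locations that are part of the overall problem set.
    affected = set(problem_locations) & total
    return f"{len(affected)}/{len(total)}"


# Illustrative location names; not real testing locations.
all_locations = ["Loc1", "Loc2", "Loc3", "Loc4", "Loc5", "Loc6", "Loc7", "Loc8"]
selected_problem = ["Loc2", "Loc5"]

print(locations_column(selected_problem, all_locations))
print(locations_column(all_locations, all_locations))
```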

In this window, you can:

  • Sort by the Rank column. You may find the main problem is the same for all of the locations.
  • Sort a column by clicking a column heading.
  • Select the time link in the First Occurrence column to view a waterfall chart of the current problem.
  • Hover over a problem type to view the description.
  • Select a problem to view a Problem Summary of the top ranked contributors to the problem, including a description that includes the main factors contributing to the problem, such as the affected host and object combination, return codes for each of the actions or other critical information, depending on the type of the problem.

    Click View All to see all of the locations that experienced this same problem and the top 3 problem causes at each location.

Viewing Root Cause Analysis by location

Click in the Locations section to view the locations that are experiencing a problem.

The table in this window lists:

  • Locations – The testing location where the problems occurred during the 6-hour analysis time period.

  • Problem Types – The type of problem that occurred:

    • Performance
    • Availability
    • Page Content
  • Problem – The number of problems that occurred during the 6-hour analysis time period at this location.

Click a testing location to display a table that lists all problems that occurred at this location during the 6-hour analysis time period.

There is a row for each problem. The columns in the table show:

  • Rank – Factors that determine the rank are the type of problem (availability, performance, or page content), the percentage of problem testing sites reporting the same problem, the percentage of observed bad test runs exhibiting this problem, the severity of the problem, and the time the problem first occurred with a greater emphasis given to issues seen earlier in the time line.

  • Problem Types – The problem type: Performance, Availability, or Page Content.

  • Problem – The problem that occurred.

    • Hover over a problem to view the description.
    • Click the problem name to view all of the locations that experienced this same problem and the top 3 problem causes at each location.
  • Step – The step number and name where the problem occurred.

  • First Occurrence – The date and time that the problem first occurred. Click the time stamp in this column to view a waterfall chart of the step where the problem first occurred.

  • Occurrences – The number of times during the 6-hour analysis time period that the problem occurred.

  • Observations – The total number of observations of this step across all locations that exhibited this problem in the 6-hour analysis window.

  • Low Value and High Value – These columns show the lowest and highest value, if applicable, for a particular problem, for example, the lowest and highest response time value.

Drill down for further analysis

In the Causes across all problems or Individual problem summary table, click a First Occurrence timestamp to go to the Waterfall summary page. From this page, you can drill down to the waterfall chart for a selected step.