Powered by Grail
The forecast analysis predicts future values of any time series of numeric values. The forecast analyzer is not limited to stored metric data—you can bring your own data, use a metric query, or run a data query that results in a time series of numeric values.
The analysis is agnostic to the distribution of the input data. The forecast is calculated without any assumption about specific data distribution and works for both symmetric and non-symmetric distributions.
Input time series serving as the base for the prediction analysis.
It must include at least 14 data points. If fewer than 14 data points are available, the forecast fails.
Number of simulated paths
The number of paths (nPath) simulated by the sampling forecaster. The more paths that are simulated, the more accurate the forecast and the longer it takes to calculate.
The default value is
The number of data points to be predicted. The more data points that are predicted, the less accurate the forecast is.
The default value is
The coverage probability targeted by the forecaster, which translates to the following quantiles:
For example, if the coverage probability is
The default value is
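As a rough illustration of how a coverage probability can map to a pair of quantiles, the sketch below assumes a central prediction interval that leaves equal probability in both tails. That symmetry, and the helper name `coverage_to_quantiles`, are assumptions made for this example, not a description of the analyzer's internals.

```python
def coverage_to_quantiles(coverage):
    """Map a coverage probability to (lower, upper) quantiles.

    Assumes a central interval: the probability outside the interval
    is split evenly between the lower and the upper tail.
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    tail = (1.0 - coverage) / 2.0
    return tail, 1.0 - tail


lower_q, upper_q = coverage_to_quantiles(0.9)
print(f"lower quantile ~ {lower_q:.3f}, upper quantile ~ {upper_q:.3f}")
```

Under this assumption, a wider coverage probability pushes both quantiles toward the extremes and therefore widens the prediction interval.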
When you trigger a forecast analysis, the time series is sent to an appropriate forecaster that produces the forecast. Then the forecast quality is evaluated, and the forecast and evaluation are returned as analysis results.
The analysis uses one of the two forecasters:
The sampling forecaster is used if the variance of the linear history timeframe is larger than the maximum value of the
0.001 × input time series variance pair. The linear history timeframe is the X most recent data points, with X being in the range from
In any other case (including the case when training of the sampling forecaster fails), the linear extrapolation forecaster is used.
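The selection rule can be sketched as follows. This is a minimal illustration, not the analyzer's code: `choose_forecaster` is a hypothetical name, and both `linear_history_len` (the X most recent data points) and `variance_floor` (the other member of the compared pair) are left as caller-supplied placeholders. Using the population variance is also an assumption, and the fallback on a failed training run is not modeled.

```python
import statistics


def choose_forecaster(series, linear_history_len, variance_floor):
    """Return the name of the forecaster the rule would pick.

    linear_history_len and variance_floor are placeholders for values
    that this sketch does not know; pass whatever applies to your case.
    """
    recent = series[-linear_history_len:]
    # Threshold: the larger of the floor and 0.001 x input variance.
    threshold = max(variance_floor, 0.001 * statistics.pvariance(series))
    if statistics.pvariance(recent) > threshold:
        return "sampling"
    # Any other case falls back to linear extrapolation.
    return "linear_extrapolation"


flat_tail = [0, 10] * 10 + [5] * 20
print(choose_forecaster(flat_tail, linear_history_len=20, variance_floor=1e-9))
```

In this example the most recent 20 points are constant, so their variance is zero and the rule selects the linear extrapolation forecaster.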
The sampling forecaster provides multi-step forecasts based on seasonal patterns. Its architecture is shown in the image below.
First, the path transformer is applied to the input time series, and then the forecaster is trained on it. If the resulting model is invalid, the linear extrapolation forecaster is used instead. If the model is valid, sampling paths are created by the seasonal forecaster. Those paths are then inverse transformed before applying the quantiles on each time step.
The forecaster always looks for various seasonal patterns in the input time series. The input time series must contain enough data to detect seasonality reliably.
|Seasonal pattern|Required time series duration|
|---|---|
|1 hour|2+ hours|
|1 day (24 hours)|7+ days|
|1 week (7 days)|14+ days|
|Day of week|7+ days|
|Time of day|2+ days|
|Minute of hour|2+ hours|
Apart from those natural patterns, the forecaster looks for any repetitive patterns in the time series by breaking them based on a certain time window.
1. The seasonal forecaster estimates quartiles of the time series distribution at the time stamp of the next data point.
2. The distribution for the value at the next time step is estimated from the quartiles. The bell curve in the image is for illustrative purposes and doesn't represent the distribution used in the sampling forecaster.
3. The value for the next time step is sampled from the distribution obtained in step 2.
4. The value obtained in step 3 is now considered a part of the time series, and the forecaster repeats steps 1 to 4 until it reaches the forecast horizon.
5. The predicted data points form a single sampling path (shown in red in the image below).
6. The forecaster repeats this process N times, creating multiple sampling paths.
7. The forecaster takes the sought-after quantile at each time step, forming the prediction interval (shown in light purple in the image below).
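The steps above can be sketched in a few lines of Python. This is a toy illustration under several assumptions, not the sampling forecaster itself: the seasonal period is passed in rather than detected, quartiles are taken from past values at the same seasonal phase, and a normal distribution built from the median and interquartile range stands in for the real (undisclosed) distribution. The names `sample_path`, `quantile`, and `forecast` are hypothetical, and the path transformer is omitted.

```python
import random
import statistics


def sample_path(series, horizon, period, rng):
    """Create one sampling path of `horizon` values."""
    path = list(series)
    for _ in range(horizon):
        # Quartiles of the values seen at the same seasonal phase.
        phase = len(path) % period
        q1, median, q3 = statistics.quantiles(path[phase::period], n=4)
        # Stand-in distribution: normal, with the IQR converted to a
        # standard deviation (an assumption made for this sketch).
        sigma = (q3 - q1) / 1.349
        # Sample the next value and treat it as part of the series.
        path.append(rng.gauss(median, sigma))
    return path[len(series):]


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def forecast(series, horizon, period, n_paths=200, coverage=0.9, seed=0):
    """Return (lower, median, upper) for each forecast step."""
    rng = random.Random(seed)
    paths = [sample_path(series, horizon, period, rng) for _ in range(n_paths)]
    lower_q, upper_q = (1 - coverage) / 2, (1 + coverage) / 2
    bands = []
    for step in range(horizon):
        values = sorted(path[step] for path in paths)
        bands.append((quantile(values, lower_q),
                      quantile(values, 0.5),
                      quantile(values, upper_q)))
    return bands


noise = random.Random(1)
history = [v + noise.uniform(-1, 1) for v in [10, 20, 30, 20] * 6]
for lower, mid, upper in forecast(history, horizon=4, period=4):
    print(f"{lower:.2f} <= {mid:.2f} <= {upper:.2f}")
```

The sketch needs at least two full periods of history so that every seasonal phase has two or more observations. Raising `n_paths` makes the per-step quantiles more stable at the cost of more computation, which mirrors the trade-off described for the number of simulated paths.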
Linear extrapolation forecaster
The linear extrapolation forecaster is an algorithm that uses simple linear regression to find the best-fitting line through the last 14 to 20 data points and extends this line into the future. The number of data points used to train the forecaster is determined as nHistory = min(20, max(14, n)), where n is the length of the input time series. That is, at least 14 data points of the input time series are needed to train the linear forecaster.
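A minimal sketch of this point forecast is shown below. It assumes evenly spaced data points, uses the point index as the regressor x, and fits an ordinary least-squares line; the function name `linear_extrapolation` is hypothetical and the confidence interval is not computed here.

```python
def linear_extrapolation(series, horizon):
    """Extend the least-squares line through the last nHistory points."""
    n = len(series)
    if n < 14:
        raise ValueError("the forecast needs at least 14 data points")
    n_history = min(20, max(14, n))
    ys = series[-n_history:]
    xs = list(range(n_history))  # assumption: evenly spaced points
    x_mean = sum(xs) / n_history
    y_mean = sum(ys) / n_history
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Predict the points that follow the training window.
    return [intercept + slope * (n_history + k) for k in range(horizon)]


history = [2 * i + 1 for i in range(30)]  # a perfectly linear series
print(linear_extrapolation(history, horizon=3))
```

Because only the most recent 14 to 20 points are used, older history has no influence on the fitted line.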
For the sake of simplicity, we calculate the confidence interval under the assumption that the residuals are normally distributed as follows:
where tcrit is the 95th percentile of the Student's t-distribution. The standard error is
X̅ is the mean value of x, the sums taken over the training data, and
s is the sample standard error of the last
nHistory points of the input time series.
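For orientation, the textbook interval for a new observation in simple linear regression has the form below. It uses the same symbols as the text, but it is given here as an assumption about the general shape of the calculation and may differ in detail from what the analyzer computes.

```latex
\hat{y}(x) \pm t_{\mathrm{crit}} \cdot SE(x),
\qquad
SE(x) = s \sqrt{1 + \frac{1}{n_{\mathrm{History}}}
        + \frac{(x - \bar{x})^{2}}{\sum_{i} (x_i - \bar{x})^{2}}}
```

In this form the interval widens as x moves away from the mean of the training data, which is why forecasts further into the future come with wider bounds.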
Forecast quality assessment
After the predictions are generated, we assess their quality to spot potential numerical problems. To assess the forecast quality, the analyzer compares the standard deviation of the prediction to the standard deviation of the input time series (SDinput).
The standard deviation of the predictions (SDprediction) is calculated as the maximum standard deviation of the lower and upper bounds of the prediction interval as well as the standard deviation of the point prediction.
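The calculation of SDprediction can be sketched as follows. The helper name `prediction_sd` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator is used.

```python
import statistics


def prediction_sd(lower, point, upper):
    """Largest standard deviation among the three forecast series."""
    return max(
        statistics.pstdev(lower),   # lower bound of the interval
        statistics.pstdev(point),   # point prediction
        statistics.pstdev(upper),   # upper bound of the interval
    )


lower = [0.0, 0.0, 0.0]
point = [1.0, 2.0, 3.0]
upper = [2.0, 6.0, 10.0]
print(prediction_sd(lower, point, upper))
```

In this example the upper bound varies the most, so its standard deviation becomes SDprediction.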
To account for acceptable trends in the predictions, the analyzer uses a scaling factor (SCF). When the length of the prediction data (Nprediction) is large compared to the input time series (Ninput), we allow a larger standard deviation of the prediction than for a small prediction.
An additional input to the scale factor is the standard deviation factor (SDfactor), with a default value of 100. The scale factor is calculated as follows:
The forecast is evaluated by the following condition:
If the condition is satisfied, the forecast is assessed as valid. Otherwise, the forecast is invalid.
Start forecast analysis
You can trigger a forecast analysis from your notebook.
1. Navigate to the required notebook or create a new one.
2. If needed, create a visualization of data.
3. Hover over the required time series in the sidebar and select > Filter and forecast.
Davis calculates the forecast and shows it, extending your visualization.
The quality of the forecast depends on the quality of data you're feeding to the analyzer and the width of the timeframe you want to predict. The best results are derived for a short timeframe from data without noise and with clear seasonal patterns. Here are some examples of forecasts for various types of data.
This example shows the forecast for an available disk metric. There's no seasonal pattern in the data, and there's a downward trend. Here, the linear extrapolation forecaster is used, producing a wide forecasted interval.
This example shows the forecast for a 1.5-hour timeframe. Notice that the forecasted interval is not smooth, reflecting the noise in the input data.
This example shows the forecast for a 6-hour timeframe. Apart from the long forecast timeframe, the data itself has some noise and a downward trend that affect forecast quality: the forecasted interval widens more and more as it goes into the future.