gluonts.evaluation package#

class gluonts.evaluation.Evaluator(quantiles: typing.Iterable[typing.Union[float, str]] = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), seasonality: typing.Optional[int] = None, alpha: float = 0.05, calculate_owa: bool = False, custom_eval_fn: typing.Optional[typing.Dict] = None, num_workers: typing.Optional[int] = 4, chunk_size: int = 32, aggregation_strategy: typing.Callable = <function aggregate_no_nan>, ignore_invalid_values: bool = True, allow_nan_forecast: bool = False)[source]#

Bases: object

Evaluator class to compute accuracy metrics by comparing observations to forecasts.

Parameters
  • quantiles – list of strings of the form ‘p10’ or floats in [0, 1] with the quantile levels

  • seasonality – Seasonality to use for seasonal_error. If nothing is passed, the default seasonality for the given series frequency is used, as returned by get_seasonality.

  • alpha – Parameter of the MSIS metric from the M4 competition that defines the prediction interval. For alpha=0.05 (default), the 95% prediction interval is considered in the metric; see https://www.m4.unic.ac.cy/wp-content/uploads/2018/03/M4-Competitors-Guide.pdf for more detail on MSIS.

  • calculate_owa – Determines whether the OWA metric should also be calculated; it is computationally expensive to evaluate and thus slows down the evaluation process considerably. By default False.

  • custom_eval_fn – Option to include custom evaluation metrics. Expected input is a dictionary whose keys name the custom metrics and whose values are lists containing three elements: first, a callable which takes the target and forecast as input and returns the evaluation metric; second, a string specifying how to aggregate the metric across all time series, e.g. "mean" or "sum"; third, either "mean" or "median" to specify whether the mean or median forecast should be passed to the custom evaluation function. E.g. {"RMSE": [rmse, "mean", "median"]}; see the example after this parameter list.

  • num_workers – The number of multiprocessing workers that will be used to process the data in parallel. Default is multiprocessing.cpu_count(). Setting it to 0 or None means no multiprocessing.

  • chunk_size – Controls the approximate chunk size each worker handles at a time. Default is 32.

  • ignore_invalid_values – Ignore NaN and inf values in the time series when calculating metrics.

  • aggregation_strategy – Function for aggregating the per-time-series metrics. Available options are: aggregate_valid | aggregate_all | aggregate_no_nan. The default function is aggregate_no_nan.

  • allow_nan_forecast – Whether to allow NaN values in forecasts. If False, raises an error when forecast contains NaN values. Defaults to False.
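
Example

A minimal usage sketch; ts_iterator and forecast_iterator are placeholder iterators over ground-truth series and Forecast objects (e.g. as produced by make_evaluation_predictions below), and rmse is a hypothetical custom metric:

    import numpy as np
    from gluonts.evaluation import Evaluator

    # Hypothetical custom metric: RMSE computed on the median forecast.
    def rmse(target, forecast):
        return float(np.sqrt(np.mean((target - forecast) ** 2)))

    evaluator = Evaluator(
        quantiles=[0.1, 0.5, 0.9],
        custom_eval_fn={"RMSE": [rmse, "mean", "median"]},
    )

    # Calling the evaluator yields aggregate metrics (dict) and
    # per-time-series metrics (pandas DataFrame).
    agg_metrics, item_metrics = evaluator(ts_iterator, forecast_iterator)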

default_quantiles = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)#
static extract_past_data(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) numpy.ndarray[source]#
Parameters
  • time_series – complete time series, including the forecast horizon

  • forecast – Forecast object whose dates determine the trailing observations to drop

Returns

time series without the forecast dates

Return type

np.ndarray

static extract_pred_target(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) numpy.ndarray[source]#
Parameters
  • time_series – complete time series, including the forecast horizon

  • forecast – Forecast object whose dates determine the portion to extract

Returns

the portion of the time series that falls within the dates of the Forecast object

Return type

np.ndarray

get_aggregate_metrics(metric_per_ts: pandas.core.frame.DataFrame) Tuple[Dict[str, float], pandas.core.frame.DataFrame][source]#
get_base_metrics(forecast: gluonts.model.forecast.Forecast, pred_target, mean_fcst, median_fcst, seasonal_error) Dict[str, Optional[Union[float, str]]][source]#
get_metrics_per_ts(time_series: Union[pandas.core.series.Series, pandas.core.frame.DataFrame], forecast: gluonts.model.forecast.Forecast) Mapping[str, Union[float, str, None, numpy.ma.core.MaskedConstant]][source]#
class gluonts.evaluation.MultivariateEvaluator(quantiles: Iterable[Union[float, str]] = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]), seasonality: Optional[int] = None, alpha: float = 0.05, eval_dims: Optional[List[int]] = None, target_agg_funcs: Dict[str, Callable] = {}, custom_eval_fn: Optional[dict] = None, num_workers: Optional[int] = None)[source]#

Bases: gluonts.evaluation._base.Evaluator

The MultivariateEvaluator class evaluates forecasts for multivariate or multi-dimensional observations.

Evaluations of individual dimensions will be stored with the corresponding dimension prefix and contain metrics calculated only for this dimension. Metrics with the plain metric name correspond to metrics calculated over all dimensions. Additionally, the user can provide custom aggregation functions that first aggregate the target and forecast over dimensions and then calculate the metric. These metrics will be prefixed with m_<aggregation_fun_name>_.

The evaluation dimensions can be set by the user.

Example

{'0_MSE': 0.004307240342677687,      # MSE of dimension 0
 '0_abs_error': 1.6246897801756859,
 '1_MSE': 0.003949341769475723,      # MSE of dimension 1
 '1_abs_error': 1.5052175521850586,
 'MSE': 0.004128291056076705,        # MSE of all dimensions
 'abs_error': 3.1299073323607445,
 'm_sum_MSE': 0.02,                  # MSE of aggregated target and aggregated
                                     # forecast (if target_agg_funcs is set)
 'm_sum_abs_error': 4.2}
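
A minimal usage sketch; ts_iterator and forecast_iterator are placeholder iterators over multivariate ground-truth DataFrames and Forecast objects:

    import numpy as np
    from gluonts.evaluation import MultivariateEvaluator

    # Aggregate target and forecast across dimensions with np.sum; the
    # resulting metrics appear under the "m_sum_" prefix shown above.
    evaluator = MultivariateEvaluator(
        quantiles=(np.arange(20) / 20.0)[1:],
        target_agg_funcs={"sum": np.sum},
    )

    agg_metrics, item_metrics = evaluator(ts_iterator, forecast_iterator)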

calculate_aggregate_multivariate_metrics(ts_iterator: Iterator[pandas.core.frame.DataFrame], forecast_iterator: Iterator[gluonts.model.forecast.Forecast], agg_fun: Callable) Dict[str, float][source]#
Parameters
  • ts_iterator – Iterator over time series

  • forecast_iterator – Iterator over forecasts

  • agg_fun – aggregation function

Returns

dictionary with aggregate metrics of the dimension-aggregated dataset

Return type

Dict[str, float]

calculate_aggregate_vector_metrics(all_agg_metrics: Dict[str, float], all_metrics_per_ts: pandas.core.frame.DataFrame) Dict[str, float][source]#
Parameters
  • all_agg_metrics – dictionary with aggregate metrics of individual dimensions

  • all_metrics_per_ts – DataFrame containing metrics for all time series of all evaluated dimensions

Returns

dictionary with aggregate metrics of the individual (evaluated) dimensions and of the entire vector

Return type

Dict[str, float]

static extract_aggregate_forecast(forecast_iterator: Iterator[gluonts.model.forecast.Forecast], agg_fun: Callable) Iterator[gluonts.model.forecast.Forecast][source]#
static extract_aggregate_target(it_iterator: Iterator[pandas.core.frame.DataFrame], agg_fun: Callable) Iterator[pandas.core.frame.DataFrame][source]#
static extract_forecast_by_dim(forecast_iterator: Iterator[gluonts.model.forecast.Forecast], dim: int) Iterator[gluonts.model.forecast.Forecast][source]#
static extract_target_by_dim(it_iterator: Iterator[pandas.core.frame.DataFrame], dim: int) Iterator[pandas.core.frame.DataFrame][source]#
get_eval_dims(target_dimensionality: int) List[int][source]#
static get_target_dimensionality(forecast: gluonts.model.forecast.Forecast) int[source]#
static peek(iterator: Iterator[Any]) Tuple[Any, Iterator[Any]][source]#
gluonts.evaluation.aggregate_all(metric_per_ts: pandas.core.frame.DataFrame, agg_funs: Dict[str, str]) Dict[str, float][source]#

No filtering applied.

Both nan and inf are possible in the aggregate metrics.

gluonts.evaluation.aggregate_no_nan(metric_per_ts: pandas.core.frame.DataFrame, agg_funs: Dict[str, str]) Dict[str, float][source]#

Filter all nan but keep inf.

nan is only possible in the aggregate metric if all time series for a metric resulted in nan.

gluonts.evaluation.aggregate_valid(metric_per_ts: pandas.core.frame.DataFrame, agg_funs: Dict[str, str]) Dict[str, Union[float, numpy.ma.core.MaskedConstant]][source]#

Filter all nan & inf values from metric_per_ts.

If all metrics in a column of metric_per_ts are nan or inf, the result will be np.ma.masked for that column.
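
A minimal sketch of selecting an aggregation strategy when constructing an Evaluator (these functions are normally not called directly):

    from gluonts.evaluation import Evaluator, aggregate_valid

    # Drop nan and inf values per metric column before aggregating; a column
    # that is entirely invalid yields np.ma.masked in the aggregate metrics.
    evaluator = Evaluator(aggregation_strategy=aggregate_valid)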

gluonts.evaluation.backtest_metrics(test_dataset: gluonts.dataset.Dataset, predictor: gluonts.model.predictor.Predictor, evaluator=<gluonts.evaluation._base.Evaluator object>, num_samples: int = 100, logging_file: typing.Optional[str] = None) Tuple[dict, pandas.core.frame.DataFrame][source]#
Parameters
  • test_dataset – Dataset to use for testing.

  • predictor – The predictor to test.

  • evaluator – Evaluator to use.

  • num_samples – Number of samples to use when generating sample-based forecasts. Only sampling-based models will use this.

  • logging_file – If specified, information of the backtest is redirected to this file.

Returns

A tuple of aggregate metrics and per-time-series metrics obtained by evaluating the given predictor on test_dataset with the provided evaluator.

Return type

Tuple[dict, pd.DataFrame]
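
A minimal usage sketch; test_ds is a placeholder for a gluonts Dataset and predictor for an already-trained Predictor:

    from gluonts.evaluation import Evaluator, backtest_metrics

    agg_metrics, item_metrics = backtest_metrics(
        test_dataset=test_ds,
        predictor=predictor,
        evaluator=Evaluator(quantiles=[0.1, 0.5, 0.9]),
        num_samples=100,
    )

    print(agg_metrics["MASE"])   # aggregate metrics as a dict
    print(item_metrics.head())   # per-time-series metrics as a DataFrame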

gluonts.evaluation.make_evaluation_predictions(dataset: gluonts.dataset.Dataset, predictor: gluonts.model.predictor.Predictor, num_samples: int = 100) Tuple[Iterator[gluonts.model.forecast.Forecast], Iterator[pandas.core.series.Series]][source]#

Returns predictions for the trailing prediction_length observations of the given time series, using the given predictor.

The predictor will take as input the given time series without the trailing prediction_length observations.

Parameters
  • dataset – Dataset where the evaluation will happen. Only the portion of each series excluding the trailing prediction_length observations is used when making predictions.

  • predictor – Model used to draw predictions.

  • num_samples – Number of samples to draw from the model when evaluating. Only sampling-based models will use this.

Returns

A pair of iterators, the first one yielding the forecasts, and the second one yielding the corresponding ground truth series.

Return type

Tuple[Iterator[Forecast], Iterator[pd.Series]]
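
A minimal usage sketch combining make_evaluation_predictions with Evaluator; test_ds is a placeholder for a gluonts Dataset and predictor for an already-trained Predictor:

    from gluonts.evaluation import Evaluator, make_evaluation_predictions

    forecast_it, ts_it = make_evaluation_predictions(
        dataset=test_ds,
        predictor=predictor,
        num_samples=100,
    )

    forecasts = list(forecast_it)   # Forecast objects
    tss = list(ts_it)               # matching ground-truth series

    agg_metrics, item_metrics = Evaluator()(tss, forecasts)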