gluonts.dataset.pandas module

class gluonts.dataset.pandas.PandasDataset(dataframes: InitVar[Union[pd.DataFrame, pd.Series, Iterable[pd.DataFrame], Iterable[pd.Series], Iterable[tuple[Any, pd.DataFrame]], Iterable[tuple[Any, pd.Series]], dict[str, pd.DataFrame], dict[str, pd.Series]]], target: Union[str, list[str]] = 'target', feat_dynamic_real: Optional[list[str]] = None, past_feat_dynamic_real: Optional[list[str]] = None, timestamp: Optional[str] = None, freq: Optional[str] = None, static_features: InitVar[Optional[pd.DataFrame]] = None, future_length: int = 0, unchecked: bool = False, assume_sorted: bool = False, dtype: Type = numpy.float32)

Bases: object

A dataset type based on pandas.DataFrame.

This class is constructed with a collection of pandas.DataFrame objects, where each DataFrame represents one time series. Each DataFrame must provide the target values and the timestamps, the latter either as a column or as the index. Dynamic features of a series can be specified together with the series’ DataFrame, while static features can be specified in a separate DataFrame object via the static_features argument.
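To illustrate the expected input layout, the sketch below builds a dict of per-series DataFrames plus a separate static-feature table. The helper `make_series` and the column names `timestamp`, `target`, `price` and `store_size` are hypothetical, chosen only for this example; the code uses only pandas and NumPy.

```python
import numpy as np
import pandas as pd


def make_series(start: str, n: int, seed: int) -> pd.DataFrame:
    """Build one item's DataFrame: a timestamp column, a target column,
    and one (hypothetical) dynamic real feature column."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "timestamp": pd.date_range(start, periods=n, freq="D"),
            "target": rng.normal(size=n).astype(np.float32),
            "price": rng.uniform(1.0, 2.0, size=n),  # hypothetical dynamic feature
        }
    )


# One DataFrame per time series; the dict keys act as the item_id values.
dataframes = {
    "A": make_series("2021-01-01", 30, seed=0),
    "B": make_series("2021-02-01", 20, seed=1),
}

# Static features live in a separate DataFrame indexed by the same keys.
static_features = pd.DataFrame(
    {"store_size": [100.0, 250.0]},  # hypothetical static feature
    index=pd.Index(["A", "B"], name="item_id"),
)
```

Assuming GluonTS is installed, these objects could then be passed as `PandasDataset(dataframes, target="target", timestamp="timestamp", freq="D", feat_dynamic_real=["price"], static_features=static_features)`, using only parameters from the class signature.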

Parameters
  • dataframes (InitVar[Union[pd.DataFrame, pd.Series, Iterable[pd.DataFrame], Iterable[pd.Series], Iterable[tuple[Any, pd.DataFrame]], Iterable[tuple[Any, pd.Series]], dict[str, pd.DataFrame], dict[str, pd.Series]]]) – A single pd.DataFrame or pd.Series, or a collection of them (a list, an iterable of (item_id, object) pairs, or a dict), containing at least timestamp and target values. If a dict is provided, its keys are used as the item_id values.

  • target (Union[str, list[str]]) – Name of the column that contains the target time series. For multivariate targets, a list of column names should be provided.

  • timestamp (Optional[str]) – Name of the column that contains the timestamp information. If not given, the index of each DataFrame is used as the timestamp.

  • freq (Optional[str]) – Frequency of observations in the time series. Must be a valid pandas frequency.

  • feat_dynamic_real (Optional[list[str]]) – List of column names that contain dynamic real features.

  • past_feat_dynamic_real (Optional[list[str]]) – List of column names that contain dynamic real features only available in the past.

  • static_features (InitVar[Optional[pd.DataFrame]]) – pd.DataFrame containing static features for the series. Its index should contain the keys (item_id values) that identify the series in the dataframes argument.

  • future_length (int) – When iterating over the dataset, the last future_length elements are removed from the target and from the past dynamic features.

  • unchecked (bool) – Whether consistency checks on indexes should be skipped. (Default: False)

  • assume_sorted (bool) – Whether to assume that indexes are sorted by time, and skip sorting. (Default: False)
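The effect of future_length can be pictured with a small pandas/NumPy sketch. `trim_for_future` is a hypothetical helper, not part of GluonTS, and the (num_features, time) layout of the feature arrays is an assumption made for the example. It mirrors the documented rule that the target and past dynamic features lose their last future_length elements; keeping feat_dynamic_real at full length is this sketch's reading of the fact that the description does not mention it.

```python
import numpy as np
import pandas as pd


def trim_for_future(
    df: pd.DataFrame,
    target: str,
    feat_dynamic_real: list[str],
    past_feat_dynamic_real: list[str],
    future_length: int,
) -> dict:
    """Hypothetical helper mimicking the documented effect of future_length."""
    # Number of rows kept for target and past-only features. Using an explicit
    # end index avoids the df[:-0] pitfall when future_length == 0.
    end = len(df) - future_length
    return {
        "target": df[target].to_numpy()[:end],
        # Assumed layout: one row per feature, one column per time step.
        "past_feat_dynamic_real": df[past_feat_dynamic_real].to_numpy().T[:, :end],
        # Features known in the future keep the full length in this sketch.
        "feat_dynamic_real": df[feat_dynamic_real].to_numpy().T,
    }
```

With `future_length=3` on a 10-row frame, the target and past features come back with 7 time steps while feat_dynamic_real still covers all 10.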

assume_sorted: bool = False
dataframes: InitVar[Union[pd.DataFrame, pd.Series, Iterable[pd.DataFrame], Iterable[pd.Series], Iterable[tuple[Any, pd.DataFrame]], Iterable[tuple[Any, pd.Series]], dict[str, pd.DataFrame], dict[str, pd.Series]]]
dtype

alias of numpy.float32

feat_dynamic_real: Optional[list[str]] = None
freq: Optional[str] = None
classmethod from_long_dataframe(dataframe: pd.DataFrame, item_id: str, timestamp: Optional[str] = None, static_feature_columns: Optional[list[str]] = None, static_features: pd.DataFrame = pd.DataFrame(), **kwargs) → PandasDataset

Construct PandasDataset out of a long data frame.

A long dataframe contains time series data (both the target series and covariates) about multiple items at once. An item_id column distinguishes the items, and the data is grouped by that column to obtain one series per item.

Static features can be included in the long data frame as well (as columns with a constant value per item), or be given as a separate data frame indexed by the item_id values.
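The grouping described above can be sketched in plain pandas. `split_long_dataframe` is a hypothetical helper for illustration, not necessarily how `from_long_dataframe` is implemented: it splits the long frame by the item_id column and collapses the constant static columns to one row per item.

```python
import pandas as pd


def split_long_dataframe(
    dataframe: pd.DataFrame, item_id: str, static_feature_columns=()
) -> tuple[dict, pd.DataFrame]:
    """Hypothetical helper: per-item DataFrames plus a static-feature table."""
    static_feature_columns = list(static_feature_columns)
    # One DataFrame per item, without the grouping and static columns.
    per_item = {
        key: group.drop(columns=[item_id] + static_feature_columns)
        for key, group in dataframe.groupby(item_id)
    }
    # Static columns must be constant per item, so exactly one row per item
    # should remain after dropping duplicates.
    static = (
        dataframe[[item_id] + static_feature_columns]
        .drop_duplicates()
        .set_index(item_id)
    )
    if len(static) != len(per_item):
        raise ValueError("static feature columns are not constant within each item")
    return per_item, static
```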

Note: on large datasets, this constructor can take some time to complete since it does some indexing and groupby operations on the data, and caches the result.

Parameters
  • dataframe – pandas.DataFrame containing at least timestamp, target and item_id columns.

  • item_id – Name of the column that, when grouped by, gives the different time series.

  • static_feature_columns – Columns in dataframe containing static features.

  • static_features – Dedicated DataFrame for static features. If both static_features and static_feature_columns are specified, then the two sets of features are appended together.

  • **kwargs – Additional keyword arguments, passed on to the PandasDataset constructor (same parameters as the PandasDataset class).
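One reading of "appended together" is a column-wise join of the two static-feature tables on their item_id index. The hypothetical `combine_static_features` below sketches that assumption in pandas; it is not taken from GluonTS.

```python
import pandas as pd


def combine_static_features(
    static_features: pd.DataFrame, from_columns: pd.DataFrame
) -> pd.DataFrame:
    """Hypothetical helper: join two static-feature tables on their index.

    Assumption: "appended together" means column-wise concatenation
    aligned on the item_id index."""
    # Skip empty tables so that the default pd.DataFrame() adds nothing.
    if static_features.empty:
        return from_columns
    if from_columns.empty:
        return static_features
    return pd.concat([static_features, from_columns], axis=1)
```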

Returns

Dataset containing series data from the given long dataframe.

Return type

PandasDataset

future_length: int = 0
property num_feat_dynamic_real: int
property num_feat_static_cat: int
property num_feat_static_real: int
property num_past_feat_dynamic_real: int
past_feat_dynamic_real: Optional[list[str]] = None
property static_cardinalities
static_features: InitVar[Optional[pd.DataFrame]] = None
target: Union[str, list[str]] = 'target'
timestamp: Optional[str] = None
unchecked: bool = False
gluonts.dataset.pandas.infer_freq(index: pandas.core.indexes.base.Index) → str
gluonts.dataset.pandas.is_uniform(index: pandas.core.indexes.period.PeriodIndex) → bool

Check if index contains monotonically increasing periods, evenly spaced with frequency index.freq.

>>> import pandas as pd
>>> from gluonts.dataset.pandas import is_uniform
>>> ts = ["2021-01-01 00:00", "2021-01-01 02:00", "2021-01-01 04:00"]
>>> is_uniform(pd.DatetimeIndex(ts).to_period("2H"))
True
>>> ts = ["2021-01-01 00:00", "2021-01-01 04:00"]
>>> is_uniform(pd.DatetimeIndex(ts).to_period("2H"))
False
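The check described above can be expressed by comparing the index with a regular period range that starts at its first element. `is_uniform_sketch` is a hypothetical stand-in, not necessarily how GluonTS implements `is_uniform`; treating an empty index as uniform is an additional assumption.

```python
import pandas as pd


def is_uniform_sketch(index: pd.PeriodIndex) -> bool:
    """Return True if index equals the evenly spaced, increasing period
    range that starts at index[0] and steps by index.freq."""
    if len(index) == 0:
        return True  # assumption: an empty index counts as uniform
    expected = pd.period_range(start=index[0], periods=len(index), freq=index.freq)
    # Element-wise comparison; any gap or out-of-order period makes it False.
    return bool((expected == index).all())
```

For example, three consecutive daily periods pass the check, while `["2021-01-01", "2021-01-03"]` converted with `.to_period("D")` fails because the second period is not one day after the first.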
gluonts.dataset.pandas.pair_with_item_id(obj: Union[tuple, pandas.core.frame.DataFrame, pandas.core.series.Series])