gluonts.dataset.split module
Train/test splitter
This module defines strategies for splitting a whole dataset into train and test
subsets. The split() function can also be used to trigger their logic.
For uniform datasets, where all time series start and end at the same point in
time, OffsetSplitter can be used:
splitter = OffsetSplitter(offset=7)
train, test_template = splitter.split(whole_dataset)
For all other datasets, the more flexible DateSplitter can be used:
splitter = DateSplitter(
    date=pd.Period('2018-01-31', freq='D')
)
train, test_template = splitter.split(whole_dataset)
In the above examples, the train output is a regular Dataset that can
be used for training purposes; test_template can generate test instances
as follows:
test_dataset = test_template.generate_instances(
    prediction_length=7,
    windows=2,
)
The windows argument controls how many test windows to generate from each
entry in the original dataset. Each window will begin after the split point,
and so will not contain any training data. By default, windows are
non-overlapping, but this can be controlled with the optional distance
argument:
test_dataset = test_template.generate_instances(
    prediction_length=7,
    windows=2,
    distance=3,  # windows are three time steps apart from each other
)
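To make the interplay of windows and distance concrete, here is a minimal sketch on plain integer positions. It is not GluonTS code: the helper name window_ranges and the idea of expressing the split point as an integer position are assumptions made for illustration. It assumes, as described above, that the first window starts at the split point and that omitting distance yields non-overlapping windows.

```python
from typing import List, Optional, Tuple


def window_ranges(
    split_point: int,
    prediction_length: int,
    windows: int = 1,
    distance: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (start, stop) positions of the label part of each test window.

    Hypothetical helper for illustration only; it mirrors the behaviour
    described in the text, not the library's internals.
    """
    if distance is None:
        # Assumed default: consecutive windows do not overlap.
        distance = prediction_length
    ranges = []
    for k in range(windows):
        start = split_point + k * distance
        ranges.append((start, start + prediction_length))
    return ranges


default_ranges = window_ranges(split_point=10, prediction_length=7, windows=2)
overlapping_ranges = window_ranges(
    split_point=10, prediction_length=7, windows=2, distance=3
)
print(default_ranges)      # [(10, 17), (17, 24)]
print(overlapping_ranges)  # [(10, 17), (13, 20)]
```

With distance=3 the second window starts three steps after the first, so the two label ranges overlap by four positions.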
- class gluonts.dataset.split.AbstractBaseSplitter
Bases: abc.ABC
Base class for all other splitters.
- generate_test_pairs(dataset: gluonts.dataset.Dataset, prediction_length: int, windows: int = 1, distance: Optional[int] = None, max_history: Optional[int] = None) -> Generator[Tuple[Dict[str, Any], Dict[str, Any]], None, None]
- generate_training_entries(dataset: gluonts.dataset.Dataset) -> Generator[Dict[str, Any], None, None]
- split(dataset: gluonts.dataset.Dataset) -> Tuple[gluonts.dataset.split.TrainingDataset, gluonts.dataset.split.TestTemplate]
- class gluonts.dataset.split.DateSplitter(date: pandas._libs.tslibs.period.Period)
Bases: gluonts.dataset.split.AbstractBaseSplitter
A splitter that slices training and test data based on a pandas.Period.
Training entries obtained from this class will be limited to observations up to (and including) the given date.
- Parameters
date (pandas._libs.tslibs.period.Period) – pandas.Period determining where the training data ends.
- date: pandas._libs.tslibs.period.Period
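The inclusive cut-off can be illustrated without GluonTS or pandas. The following is a rough sketch for a daily series using only the standard library; the helper training_part is hypothetical, and it assumes that the split date is not earlier than the first observation.

```python
from datetime import date
from typing import List


def training_part(start: date, values: List[int], split_date: date) -> List[int]:
    """Keep the observations of a daily series up to and including split_date.

    Hypothetical helper for illustration; `start` is the date of the first
    observation and `split_date` is assumed to be on or after `start`.
    """
    # Number of daily observations from `start` through `split_date`.
    n_train = (split_date - start).days + 1
    return values[:n_train]


values = list(range(10))  # daily observations for 2018-01-25 .. 2018-02-03
train = training_part(date(2018, 1, 25), values, date(2018, 1, 31))
print(train)  # [0, 1, 2, 3, 4, 5, 6]
```

The observation for 2018-01-31 itself (value 6) is part of the training slice, matching the "up to (and including)" wording above.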
- class gluonts.dataset.split.InputDataset(test_data: gluonts.dataset.split.TestData)
Bases: object
- test_data: gluonts.dataset.split.TestData
- class gluonts.dataset.split.LabelDataset(test_data: gluonts.dataset.split.TestData)
Bases: object
- test_data: gluonts.dataset.split.TestData
- class gluonts.dataset.split.OffsetSplitter(offset: int)
Bases: gluonts.dataset.split.AbstractBaseSplitter
A splitter that slices training and test data based on a fixed integer offset.
- Parameters
offset (int) – Offset determining where the training data ends. A positive offset indicates how many observations, counted from the start of each series, should be in the training slice; a negative offset indicates how many observations at the end of each series should be excluded from the training slice.
- offset: int
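The two sign conventions for offset correspond to ordinary Python slicing. The sketch below shows this on a plain list; training_slice is a hypothetical helper for illustration, not part of the library.

```python
from typing import List


def training_slice(series: List[int], offset: int) -> List[int]:
    """Return the training part of one series (hypothetical helper)."""
    # Positive offset: keep the first `offset` observations.
    # Negative offset: drop the last `abs(offset)` observations.
    return series[:offset]


series = list(range(1, 11))  # ten observations: 1 .. 10
head = training_slice(series, 7)
trimmed = training_slice(series, -3)
print(head)     # [1, 2, 3, 4, 5, 6, 7]
print(trimmed)  # [1, 2, 3, 4, 5, 6, 7]
```

For a series of length ten, offset=7 and offset=-3 select the same training slice; on series of differing lengths a positive offset fixes the training length, while a negative offset fixes the amount held out.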
- class gluonts.dataset.split.TestData(dataset: gluonts.dataset.Dataset, splitter: gluonts.dataset.split.AbstractBaseSplitter, prediction_length: int, windows: int = 1, distance: Optional[int] = None, max_history: Optional[int] = None)
Bases: object
An iterable type used for wrapping test data.
Elements of a TestData object are pairs (input, label), where input is the input data for models, while label is the future ground truth that models are supposed to predict.
- Parameters
dataset (gluonts.dataset.Dataset) – Whole dataset used for testing.
splitter (gluonts.dataset.split.AbstractBaseSplitter) – A specific splitter that knows how to slice training and test data.
prediction_length (int) – Length of the prediction interval in test data.
windows (int) – Indicates how many test windows to generate for each original dataset entry.
distance (Optional[int]) – Distance between the starts of consecutive test windows generated for each original dataset entry. If not given, windows are non-overlapping.
max_history (Optional[int]) – If given, all entries in the test set have a maximum length of max_history. This can be used to produce smaller file sizes.
- dataset: gluonts.dataset.Dataset
- distance: Optional[int] = None
- property input: gluonts.dataset.split.InputDataset
- property label: gluonts.dataset.split.LabelDataset
- max_history: Optional[int] = None
- prediction_length: int
- windows: int = 1
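The shape of the (input, label) pairs and the effect of max_history can be sketched on a plain list. This is an illustration under stated assumptions rather than the library's implementation: make_pairs and the integer split_point are hypothetical, and max_history is assumed to keep only the most recent observations of each input.

```python
from typing import Iterator, List, Optional, Tuple


def make_pairs(
    series: List[int],
    split_point: int,
    prediction_length: int,
    windows: int = 1,
    distance: Optional[int] = None,
    max_history: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (input, label) pairs for one series (hypothetical helper)."""
    if distance is None:
        distance = prediction_length  # assumed default: no overlap
    for k in range(windows):
        cut = split_point + k * distance
        input_part = series[:cut]
        if max_history is not None:
            # Assumption: truncation keeps the latest observations.
            input_part = input_part[-max_history:]
        label_part = series[cut:cut + prediction_length]
        yield input_part, label_part


series = list(range(20))
pairs = list(
    make_pairs(series, split_point=10, prediction_length=3, windows=2, max_history=4)
)
print(pairs)
# [([6, 7, 8, 9], [10, 11, 12]), ([9, 10, 11, 12], [13, 14, 15])]
```

Each label immediately follows its input, and later windows see more history unless max_history caps it.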
- class gluonts.dataset.split.TestTemplate(dataset: gluonts.dataset.Dataset, splitter: gluonts.dataset.split.AbstractBaseSplitter)
Bases: object
A class used for generating test data.
- Parameters
dataset (gluonts.dataset.Dataset) – Whole dataset used for testing.
splitter (gluonts.dataset.split.AbstractBaseSplitter) – A specific splitter that knows how to slice training and test data.
- dataset: gluonts.dataset.Dataset
- generate_instances(prediction_length: int, windows: int = 1, distance: Optional[int] = None, max_history: Optional[int] = None) -> gluonts.dataset.split.TestData
Generate an iterable test dataset whose entries consist of an input part and a label part.
- Parameters
prediction_length – Length of the prediction interval in test data.
windows – Indicates how many test windows to generate for each original dataset entry.
distance – Distance between the starts of consecutive test windows generated for each original dataset entry. If not given, windows are non-overlapping.
max_history – If given, all entries in the test set have a maximum length of max_history. This can be used to produce smaller file sizes.
- class gluonts.dataset.split.TrainingDataset(dataset: gluonts.dataset.Dataset, splitter: gluonts.dataset.split.AbstractBaseSplitter)
Bases: object
- dataset: gluonts.dataset.Dataset
- gluonts.dataset.split.periods_between(start: pandas._libs.tslibs.period.Period, end: pandas._libs.tslibs.period.Period) -> int
Count how many periods fit between start and end (inclusive). The frequency is taken from start.
For example:
>>> start = pd.Period("2021-01-01 00", freq="2H")
>>> end = pd.Period("2021-01-01 11", "2H")
>>> periods_between(start, end)
6
>>> start = pd.Period("2021-03-03 23:00", freq="30T")
>>> end = pd.Period("2021-03-04 03:29", freq="30T")
>>> periods_between(start, end)
9
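The arithmetic behind these two results can be reproduced with the standard library alone. The sketch below is a stand-in, not the library's implementation: count_periods is a hypothetical name, it works on datetime and timedelta instead of pandas.Period, and it assumes end is not earlier than start.

```python
from datetime import datetime, timedelta


def count_periods(start: datetime, end: datetime, step: timedelta) -> int:
    """Count periods of length `step`, aligned at `start`, up to and
    including the one that contains `end` (hypothetical stand-in)."""
    # Whole steps that fit before `end`, plus the period containing `end`.
    return (end - start) // step + 1


two_hourly = count_periods(
    datetime(2021, 1, 1, 0), datetime(2021, 1, 1, 11), timedelta(hours=2)
)
half_hourly = count_periods(
    datetime(2021, 3, 3, 23, 0), datetime(2021, 3, 4, 3, 29), timedelta(minutes=30)
)
print(two_hourly, half_hourly)  # 6 9
```

In the first case eleven hours contain five whole two-hour steps, and the period containing the end point adds one more, giving 6.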
- gluonts.dataset.split.slice_data_entry(entry: Dict[str, Any], slice_: slice, prediction_length: int = 0) -> Dict[str, Any]
- gluonts.dataset.split.split(dataset: gluonts.dataset.Dataset, *, offset: Optional[int] = None, date: Optional[pandas._libs.tslibs.period.Period] = None) -> Tuple[gluonts.dataset.split.TrainingDataset, gluonts.dataset.split.TestTemplate]