gluonts.transform package#
- class gluonts.transform.AddAgeFeature(target_field: str, output_field: str, pred_length: int, log_scale: bool = True, dtype: typing.Type = <class 'numpy.float32'>)[source]#
Bases:
gluonts.transform._base.MapTransformation
Adds an ‘age’ feature to the data_entry.
The age feature starts with a small value at the start of the time series and grows over time.
If is_train=True, the age feature has the same length as the target field. If is_train=False, the age feature has length len(target) + pred_length.
- Parameters
target_field – Field with target values (array) of time series
output_field – Field name to use for the output.
pred_length – Prediction length
log_scale – If set to true, the age feature grows logarithmically over time; otherwise it grows linearly.
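The length rule and the two growth modes can be illustrated with a small NumPy sketch. This is not the library's code: the helper name age_feature and the log10(2 + t) formula are assumptions chosen so that the feature starts small and grows monotonically.

```python
import numpy as np


def age_feature(target, pred_length, is_train, log_scale=True, dtype=np.float32):
    """Hypothetical helper mirroring the description of AddAgeFeature."""
    # Training: same length as the target.
    # Inference: the feature must also cover the prediction range.
    length = len(target) if is_train else len(target) + pred_length
    age = np.arange(length, dtype=dtype)
    if log_scale:
        # Assumption: log10(2 + t) keeps the first value small and positive.
        age = np.log10(2.0 + age)
    return age


train_age = age_feature(np.zeros(5), pred_length=3, is_train=True)
pred_age = age_feature(np.zeros(5), pred_length=3, is_train=False)
```

Here train_age has 5 entries and pred_age has 8, matching len(target) and len(target) + pred_length respectively.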
- class gluonts.transform.AddAggregateLags(target_field: str, output_field: str, pred_length: int, base_freq: str, agg_freq: str, agg_lags: typing.List[int], agg_fun: str = 'mean', dtype: typing.Type = <class 'numpy.float32'>)[source]#
Bases:
gluonts.transform._base.MapTransformation
Adds aggregate lags as a feature to the data_entry.
Aggregates the original time series to a new frequency and selects the aggregated lags of interest. It does not use aggregate lags that need the last prediction_length values to be computed. Therefore the transformation is applicable to both training and inference.
If is_train=True, the lags have the same length as the target field. If is_train=False, the lags have length len(target) + pred_length.
- Parameters
target_field – Field with target values (array) of time series
output_field – Field name to use for the output.
pred_length – Prediction length.
base_freq – Base frequency, i.e., the frequency of the original time series.
agg_freq – Aggregate frequency, i.e., the frequency of the aggregate time series.
agg_lags – List of aggregate lags given in the aggregate frequency. If some of them are invalid (need some of the last prediction_length values to be computed) they are ignored.
agg_fun – Aggregation function. Default is ‘mean’.
- class gluonts.transform.AddConstFeature(output_field: str, target_field: str, pred_length: int, const: float = 1.0, dtype: typing.Type = <class 'numpy.float32'>)[source]#
Bases:
gluonts.transform._base.MapTransformation
Expands a const value along the time axis as a dynamic feature, where the T-dimension is defined as the sum of the pred_length parameter and the length of a time series specified by the target_field.
If is_train=True the feature matrix has the same length as the target field. If is_train=False the feature matrix has length len(target) + pred_length.
- Parameters
output_field – Field name for output.
target_field – Field containing the target array. The length of this array will be used.
pred_length – Prediction length (this is necessary since features have to be available in the future)
const – Constant value to use.
dtype – Numpy dtype to use for resulting array.
- class gluonts.transform.AddObservedValuesIndicator(target_field: str, output_field: str, imputation_method: typing.Optional[gluonts.transform.feature.MissingValueImputation] = gluonts.transform.feature.DummyValueImputation(dummy_value=0.0), dtype: typing.Type = <class 'numpy.float32'>)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Replaces missing values in a numpy array (NaNs) with a dummy value and adds an “observed”-indicator that is 1 when values are observed and 0 when values are missing.
- Parameters
target_field – Field for which missing values will be replaced
output_field – Field name to use for the indicator
imputation_method – An instance of a MissingValueImputation subclass (for example DummyValueImputation). If set to None, no imputation is done and only the indicator is included.
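A minimal NumPy sketch of the behaviour described above, assuming dummy-value imputation. The function name add_observed_indicator is hypothetical and this is not the library's implementation.

```python
import numpy as np


def add_observed_indicator(values, dummy_value=0.0, dtype=np.float32):
    """Return (imputed values, observed indicator) for a 1-D array with NaNs."""
    values = np.asarray(values, dtype=dtype)
    nan_mask = np.isnan(values)
    # Indicator: 1 where the value was observed, 0 where it was missing.
    observed = (~nan_mask).astype(dtype)
    # Replace every NaN with the same dummy value.
    filled = np.where(nan_mask, dummy_value, values).astype(dtype)
    return filled, observed


filled, observed = add_observed_indicator([1.0, float("nan"), 3.0])
```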
- class gluonts.transform.AddTimeFeatures(start_field: str, target_field: str, output_field: str, time_features: typing.List[typing.Callable[[pandas.core.indexes.period.PeriodIndex], numpy.ndarray]], pred_length: int, dtype: typing.Type = <class 'numpy.float32'>)[source]#
Bases:
gluonts.transform._base.MapTransformation
Adds a set of time features.
If is_train=True, the feature matrix has the same length as the target field. If is_train=False, the feature matrix has length len(target) + pred_length.
- Parameters
start_field – Field with the start time stamp of the time series
target_field – Field with the array containing the time series values
output_field – Field name for result.
time_features – list of time features to use.
pred_length – Prediction length
- class gluonts.transform.AdhocTransform(func: Callable[[Dict[str, Any]], Dict[str, Any]])[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Applies a function as a transformation. This is called ad-hoc because it is not serializable.
It is OK to use this for experiments and outside of a model pipeline that needs to be serialized.
- class gluonts.transform.AsNumpyArray(field: str, expected_ndim: int, dtype: typing.Type = <class 'numpy.float32'>)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Converts the value of a field into a numpy array.
- Parameters
expected_ndim – Expected number of dimensions. Throws an exception if the number of dimensions does not match.
dtype – numpy dtype to use.
- class gluonts.transform.BucketInstanceSampler(*, axis: int = - 1, min_past: int = 0, min_future: int = 0, scale_histogram: gluonts.dataset.stat.ScaleHistogram)[source]#
Bases:
gluonts.transform.sampler.InstanceSampler
This sampler can be used when working with a set of time series that have a skewed distribution, for instance, if the dataset contains many time series with small values and few with large values.
The probability of sampling from bucket i is the inverse of its number of elements.
- Parameters
scale_histogram (gluonts.dataset.stat.ScaleHistogram) – The histogram of scale for the time series. Here scale is the mean absolute value of the time series.
- scale_histogram: gluonts.dataset.stat.ScaleHistogram#
- class gluonts.transform.CDFtoGaussianTransform(target_dim: int, target_field: str, observed_values_field: str, cdf_suffix='_cdf', max_context_length: typing.Optional[int] = None, dtype: typing.Type = <class 'numpy.float32'>)[source]#
Bases:
gluonts.transform._base.MapTransformation
Marginal transformation that transforms the target via an empirical CDF to a standard gaussian as described here: https://arxiv.org/abs/1910.03002.
To be used in conjunction with a multivariate gaussian to form a copula. Note that this transformation is currently intended for multivariate targets only.
- static winsorized_cutoff(m: float) -> float [source]#
Apply truncation to the empirical CDF estimator to reduce variance as described here: https://arxiv.org/abs/0903.0649.
- Parameters
m – Input empirical CDF value.
- Returns
Truncated empirical CDF value.
- Return type
float
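The idea behind the marginal transformation can be sketched for a single target dimension: rank the values to get an empirical CDF, truncate it away from 0 and 1, and push it through the inverse standard normal CDF. The function name and the fixed delta parameter below are assumptions made for illustration; the library derives its truncation level differently and operates on multivariate targets.

```python
import numpy as np
from statistics import NormalDist


def empirical_cdf_to_gaussian(x, delta=0.05):
    """Hypothetical one-dimensional sketch of a CDF-to-Gaussian transform."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Rank of each value (1..n); ranks / n is the empirical CDF at that value.
    ranks = np.argsort(np.argsort(x)) + 1
    cdf = ranks / n
    # Truncate so the inverse normal CDF stays finite.
    # Assumption: a fixed delta is used here for simplicity.
    cdf = np.clip(cdf, delta, 1.0 - delta)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(u) for u in cdf])


x = np.array([3.0, 1.0, 2.0])
z = empirical_cdf_to_gaussian(x)
```

The transform is monotone, so the ordering of z matches the ordering of x.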
- class gluonts.transform.CanonicalInstanceSplitter(target_field: str, is_pad_field: str, start_field: str, forecast_start_field: str, instance_sampler: gluonts.transform.sampler.InstanceSampler, instance_length: int, output_NTC: bool = True, time_series_fields: List[str] = [], allow_target_padding: bool = False, pad_value: float = 0.0, use_prediction_features: bool = False, prediction_length: Optional[int] = None)[source]#
Bases:
gluonts.transform._base.FlatMapTransformation
Selects instances by slicing the target and other time-series-like arrays at random points in training mode, or at the last time point in prediction mode. The assumption is that all time-like arrays start at the same time point.
In training mode, the returned instances contain past_`target_field` as well as past_`time_series_fields`.
In prediction mode, one can set use_prediction_features to get future_`time_series_fields`.
If the target array is one-dimensional, the target_field in the resulting instance has shape (instance_length). In the multi-dimensional case, the instance has shape (dim, instance_length), where dim can also take a value of 1.
If there are not enough time series values, the transformation also adds a field ‘past_is_pad’ that indicates whether values were padded or not, and the target is padded with pad_value (default 0.0). This is done only if allow_target_padding is True and the length of the target is smaller than instance_length.
- Parameters
target_field – field that contains the time series
is_pad_field – output field indicating whether padding happened
start_field – field containing the start date of the time series
forecast_start_field – field containing the forecast start date
instance_sampler – instance sampler that provides sampling indices given a time series
instance_length – length of the target seen before making prediction
output_NTC – whether to have time series output in (time, dimension) or in (dimension, time) layout
time_series_fields – fields that contain time series; they are split in the same interval as the target
allow_target_padding – flag to allow padding
pad_value – value to be used for padding
use_prediction_features – flag to indicate if prediction range features should be returned
prediction_length – length of the prediction range, must be set if use_prediction_features is True
- class gluonts.transform.CausalMeanValueImputation[source]#
Bases:
gluonts.transform.feature.MissingValueImputation
This class replaces each missing value with the average of all the values up to this point.
(If the first values are missing, they are replaced by the closest non missing value.)
- class gluonts.transform.Chain(transformations: List[gluonts.transform._base.Transformation])[source]#
Bases:
gluonts.transform._base.Transformation
Chain multiple transformations together.
- transformations: List[gluonts.transform._base.Transformation]#
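Chaining can be pictured as function composition over a stream of dictionaries. The sketch below uses plain generator functions rather than the library's Transformation classes; the names Step, set_field, rename_field and chain are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List

DataEntry = Dict[str, Any]
# Hypothetical step type for this sketch: maps a stream of entries
# (plus the is_train flag) to a new stream of entries.
Step = Callable[[Iterator[DataEntry], bool], Iterator[DataEntry]]


def set_field(name: str, value: Any) -> Step:
    def step(stream, is_train):
        for entry in stream:
            yield {**entry, name: value}
    return step


def rename_field(old: str, new: str) -> Step:
    def step(stream, is_train):
        for entry in stream:
            entry = dict(entry)  # do not mutate the caller's dictionary
            if old in entry:
                entry[new] = entry.pop(old)
            yield entry
    return step


def chain(steps: List[Step]) -> Step:
    def step(stream, is_train):
        # Feed the output stream of each step into the next one.
        for s in steps:
            stream = s(stream, is_train)
        return stream
    return step


pipeline = chain([set_field("scale", 1.0), rename_field("target", "values")])
result = list(pipeline(iter([{"target": [1, 2, 3]}]), True))
```

Because every step is lazy, entries flow through the whole chain one at a time instead of being materialized between steps.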
- class gluonts.transform.ConcatFeatures(output_field: str, input_fields: List[str], drop_inputs: bool = True)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Concatenate fields together using np.concatenate.
Fields with value None are ignored.
- Parameters
output_field – Field name to use for the output
input_fields – Fields to stack together
drop_inputs – If set to true the input fields will be dropped.
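As a rough illustration of this behaviour on a plain dictionary (the helper name concat_fields is hypothetical and this is not the library's code):

```python
import numpy as np


def concat_fields(data, output_field, input_fields, drop_inputs=True):
    """Concatenate the listed fields, skipping fields whose value is None."""
    arrays = [data[f] for f in input_fields if data[f] is not None]
    # Optionally drop the inputs from the resulting entry.
    out = {
        k: v for k, v in data.items()
        if not (drop_inputs and k in input_fields)
    }
    out[output_field] = np.concatenate(arrays)
    return out


entry = {"a": np.array([1, 2]), "b": None, "c": np.array([3]), "id": "x"}
merged = concat_fields(entry, "feat", ["a", "b", "c"])
```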
- class gluonts.transform.ContinuousTimeInstanceSplitter(past_interval_length: float, future_interval_length: float, freq: pandas._libs.tslibs.offsets.BaseOffset, instance_sampler: gluonts.transform.sampler.ContinuousTimePointSampler, target_field: str = 'target', start_field: str = 'start', end_field: str = 'end', forecast_start_field: str = 'forecast_start')[source]#
Bases:
gluonts.transform._base.FlatMapTransformation
Selects training instances by slicing “intervals” from a continuous-time process instantiation. Concretely, the input data is expected to describe an instantiation from a point (or jump) process, with the “target” identifying inter-arrival times and other features (marks), as described in detail below.
The splitter will then take random points in continuous time from each given observation, and return a (variable-length) array of points in the past (context) and the future (prediction) intervals.
The transformation is analogous to its discrete counterpart InstanceSplitter, except that it (i) does not allow “incomplete” records, that is, the past and future intervals sampled are always complete; (ii) outputs a (T, C) layout; and (iii) does not accept time_series_fields (i.e., only accepts target fields), as these would typically not be available in TPP data.
The target arrays are expected to have (2, T) layout where the first axis corresponds to the (i) inter-arrival times between consecutive points, in order and (ii) integer identifiers of marks (from {0, 1, …, num_marks}). The returned arrays will have (T, 2) layout.
For example, the array below corresponds to a target array where points with timestamps 0.5, 1.1, and 1.5 were observed belonging to categories (marks) 3, 1 and 0 respectively: [[0.5, 0.6, 0.4], [3, 1, 0]].
- Parameters
past_interval_length – length of the interval seen before making prediction
future_interval_length – length of the interval that must be predicted
instance_sampler – instance sampler that provides sampling indices given a time series
target_field – field containing the target
start_field – field containing the start date of the point process observation
end_field – field containing the end date of the point process observation
forecast_start_field – output field that will contain the time point where the forecast starts
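The (2, T) target layout from the example above can be decoded with a cumulative sum, after which points are assigned to a past and a future interval around a split point. This is only a sketch: split_at is a hypothetical helper, the split point t is passed in rather than sampled, and for simplicity it returns absolute timestamps with marks in (T, 2) layout instead of re-based inter-arrival times.

```python
import numpy as np


def split_at(target, t, past_len, future_len):
    """Split a (2, T) point-process target around continuous time t."""
    ia_times, marks = target
    # Inter-arrival times -> absolute timestamps.
    ts = np.cumsum(ia_times)
    past = (ts >= t - past_len) & (ts < t)
    future = (ts >= t) & (ts < t + future_len)

    def to_tc(mask):
        # (T, 2) layout: one row per point, columns (timestamp, mark).
        return np.stack([ts[mask], marks[mask]], axis=1)

    return to_tc(past), to_tc(future)


target = np.array([[0.5, 0.6, 0.4], [3, 1, 0]])
timestamps = np.cumsum(target[0])
past, future = split_at(target, t=1.0, past_len=1.0, future_len=1.0)
```

With the split point at 1.0, the point at 0.5 falls in the past interval and the points at 1.1 and 1.5 fall in the future interval.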
- class gluonts.transform.ContinuousTimePointSampler(*, min_past: float = 0.0, min_future: float = 0.0)[source]#
Bases:
pydantic.v1.main.BaseModel
Abstract class for “continuous time” samplers, which, given a lower bound and upper bound, sample “points” (events) in continuous time from a specified interval.
- min_future: float#
- min_past: float#
- class gluonts.transform.ContinuousTimePredictionSampler(*, min_past: float = 0.0, min_future: float = 0.0, allow_empty_interval: bool = False)[source]#
Bases:
gluonts.transform.sampler.ContinuousTimePointSampler
- allow_empty_interval: bool#
- class gluonts.transform.ContinuousTimeUniformSampler(*, min_past: float = 0.0, min_future: float = 0.0, num_instances: int)[source]#
Bases:
gluonts.transform.sampler.ContinuousTimePointSampler
Implements a simple random sampler to sample points in the continuous interval between a and b.
- num_instances: int#
- class gluonts.transform.DummyValueImputation(dummy_value: float = 0.0)[source]#
Bases:
gluonts.transform.feature.MissingValueImputation
This class replaces all the missing values with the same dummy value given in advance.
- class gluonts.transform.ExpandDimArray(field: str, axis: Optional[int] = None)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Expands dims along the specified axis. If no axis is given, this does nothing. (This essentially calls np.expand_dims.)
- Parameters
field – Field in dictionary to use
axis – Axis to expand (see np.expand_dims for details)
- class gluonts.transform.ExpectedNumInstanceSampler(*, axis: int = - 1, min_past: int = 0, min_future: int = 0, num_instances: float, min_instances: int = 0, total_length: int = 0, n: int = 0)[source]#
Bases:
gluonts.transform.sampler.InstanceSampler
Keeps track of the average time series length and adjusts the probability per time point such that on average num_instances training examples are generated per time series.
- Parameters
num_instances (float) – number of time points to sample per time series on average
min_instances (int) – minimum number of time points to sample per time series
- min_instances: int#
- n: int#
- num_instances: float#
- total_length: int#
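The bookkeeping described above can be sketched as follows. The class name, the seed argument, and the use of numpy.random.default_rng are assumptions of this sketch; min_instances handling is omitted for brevity.

```python
import numpy as np


class ExpectedNumSketch:
    """Hypothetical sampler that yields num_instances indices on average."""

    def __init__(self, num_instances, min_past=0, min_future=0, seed=0):
        self.num_instances = num_instances
        self.min_past = min_past
        self.min_future = min_future
        self.total_length = 0  # sum of window sizes seen so far
        self.n = 0             # number of time series seen so far
        self.rng = np.random.default_rng(seed)

    def __call__(self, ts):
        a = self.min_past
        b = ts.shape[-1] - self.min_future
        window = b - a + 1
        if window <= 0:
            return np.array([], dtype=int)
        # Update the running average of the window length.
        self.n += 1
        self.total_length += window
        avg_length = self.total_length / self.n
        # Per-point probability so that, on average, num_instances
        # points are selected per series.
        p = self.num_instances / avg_length
        (idx,) = np.where(self.rng.random(window) < p)
        return idx + a


sampler = ExpectedNumSketch(num_instances=8.0, min_past=2, min_future=1)
indices = sampler(np.zeros(10))
```

In this usage the window has 8 candidate points and num_instances is 8, so the per-point probability is 1 and every candidate index is selected.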
- class gluonts.transform.FlatMapTransformation[source]#
Bases:
gluonts.transform._base.Transformation
Transformations that yield zero or more results per input, but do not combine elements from the input stream.
- class gluonts.transform.InstanceSampler(*, axis: int = - 1, min_past: int = 0, min_future: int = 0)[source]#
Bases:
pydantic.v1.main.BaseModel
An InstanceSampler is called with the time series ts, and returns a set of indices at which training instances will be generated.
The sampled indices i satisfy a <= i <= b, where a = min_past and b = ts.shape[axis] - min_future.
- axis: int#
- min_future: int#
- min_past: int#
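The bounds on the sampled indices follow directly from the formula above. The helper name sampling_bounds is hypothetical.

```python
import numpy as np


def sampling_bounds(ts, axis=-1, min_past=0, min_future=0):
    """Return (a, b) such that valid indices i satisfy a <= i <= b."""
    a = min_past
    b = ts.shape[axis] - min_future
    return a, b


ts = np.zeros((3, 20))  # 3 target dimensions, 20 time steps
a, b = sampling_bounds(ts, min_past=5, min_future=2)
# All indices a concrete sampler could choose from (empty if a > b).
candidates = np.arange(a, b + 1) if a <= b else np.array([], dtype=int)
```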
- class gluonts.transform.InstanceSplitter(target_field: str, is_pad_field: str, start_field: str, forecast_start_field: str, instance_sampler: gluonts.transform.sampler.InstanceSampler, past_length: int, future_length: int, lead_time: int = 0, output_NTC: bool = True, time_series_fields: List[str] = [], dummy_value: float = 0.0)[source]#
Bases:
gluonts.transform._base.FlatMapTransformation
Split instances from a dataset, by slicing the target and other time series fields at points in time selected by the specified sampler. The assumption is that all time series fields start at the same time point.
It is assumed that time axis is always the last axis.
The target_field and each field in time_series_fields are removed and replaced by two new fields, with prefix past_ and future_ respectively.
A past_is_pad is also added, that indicates whether values at a given time point are padding or not.
- Parameters
target_field – field containing the target
is_pad_field – output field indicating whether padding happened
start_field – field containing the start date of the time series
forecast_start_field – output field that will contain the time point where the forecast starts
instance_sampler – instance sampler that provides sampling indices given a time series
past_length – length of the target seen before making prediction
future_length – length of the target that must be predicted
lead_time – gap between the past and future windows (default: 0)
output_NTC – whether to have time series output in (time, dimension) or in (dimension, time) layout (default: True)
time_series_fields – fields that contain time series; they are split in the same interval as the target (default: [])
dummy_value – Value to use for padding. (default: 0.0)
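The slicing at a single sampled index can be sketched like this. The helper split_instance is hypothetical, and padding on the left of the past window is an assumption of the sketch; the time axis is taken to be the last axis, as stated above.

```python
import numpy as np


def split_instance(target, i, past_length, future_length,
                   lead_time=0, dummy_value=0.0):
    """Slice one training instance at index i along the last (time) axis."""
    past = target[..., max(0, i - past_length):i]
    pad_len = past_length - past.shape[-1]
    # Assumption: missing history is padded on the left with dummy_value.
    pad = np.full(target.shape[:-1] + (pad_len,), dummy_value,
                  dtype=target.dtype)
    start = i + lead_time
    return {
        "past_target": np.concatenate([pad, past], axis=-1),
        # 1 marks padded positions, 0 marks real observations.
        "past_is_pad": np.concatenate(
            [np.ones(pad_len), np.zeros(past.shape[-1])]
        ),
        "future_target": target[..., start:start + future_length],
    }


target = np.arange(1.0, 6.0)  # [1, 2, 3, 4, 5]
inst = split_instance(target, i=2, past_length=4, future_length=2)
```

With only two values before index 2 and past_length=4, the first two positions of past_target are padding.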
- class gluonts.transform.LastValueImputation[source]#
Bases:
gluonts.transform.feature.MissingValueImputation
This class replaces each missing value with the last value that was not missing.
(If the first values are missing, they are replaced by the closest non missing value.)
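A vectorized sketch of this forward-fill behaviour, including the rule for leading gaps. The function name last_value_impute is hypothetical, and arrays with no observed values are simply returned unchanged here.

```python
import numpy as np


def last_value_impute(values):
    """Forward-fill NaNs; leading NaNs take the first observed value."""
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    if not observed.any():
        return values.copy()  # nothing to copy from in this sketch
    # For each position, the index of the latest observed value so far.
    idx = np.where(observed, np.arange(len(values)), 0)
    idx = np.maximum.accumulate(idx)
    filled = values[idx]
    # Leading gaps have no earlier value: use the closest observed one.
    first = int(np.argmax(observed))
    filled[:first] = values[first]
    return filled


nan = float("nan")
filled_last = last_value_impute([nan, 1.0, nan, 3.0, nan])
```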
- class gluonts.transform.LeavesMissingValues[source]#
Bases:
gluonts.transform.feature.MissingValueImputation
Just leaves the missing values untouched.
- class gluonts.transform.ListFeatures(output_field: str, input_fields: List[str], drop_inputs: bool = True)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Creates a new field which contains a list of features.
- Parameters
output_field – Field name for output
input_fields – Fields to combine into list
drop_inputs – If true the input fields will be removed from the result.
- class gluonts.transform.MapTransformation[source]#
Bases:
gluonts.transform._base.Transformation
Base class for Transformations that returns exactly one result per input in the stream.
- class gluonts.transform.MeanValueImputation[source]#
Bases:
gluonts.transform.feature.MissingValueImputation
This class replaces all the missing values with the mean of the non missing values.
Note that this is not a ‘causal’ method, in the sense that it leaks information about the future into the imputation. You may prefer to use CausalMeanValueImputation instead.
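The difference between a global mean and a causal mean can be seen in a small sketch. Both function names are hypothetical. The causal variant here averages only the values observed before each gap, which is an assumption of this sketch; the library's CausalMeanValueImputation may treat previously imputed values differently.

```python
import numpy as np


def mean_impute(values):
    """Fill every NaN with the mean of all observed values (uses the future)."""
    values = np.asarray(values, dtype=float)
    return np.where(np.isnan(values), np.nanmean(values), values)


def causal_mean_impute(values):
    """Fill each NaN with the mean of the values observed before it."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    total, count = 0.0, 0
    for t, v in enumerate(values):
        if np.isnan(v):
            if count > 0:
                out[t] = total / count
        else:
            total += v
            count += 1
    # Leading gaps have no history: use the closest observed value.
    observed = ~np.isnan(values)
    if observed.any():
        first = int(np.argmax(observed))
        out[:first] = values[first]
    return out


nan = float("nan")
series = [1.0, nan, 3.0, nan, 8.0]
global_fill = mean_impute(series)
causal_fill = causal_mean_impute(series)
```

The global mean fills both gaps with 4.0, which depends on the later value 8.0, while the causal version fills them with 1.0 and 2.0 using only earlier observations.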
- class gluonts.transform.MissingValueImputation[source]#
Bases:
object
The parent class for all the missing value imputation classes.
You can implement your own imputation method by inheriting from this class.
- class gluonts.transform.NumInstanceSampler(*, axis: int = - 1, min_past: int = 0, min_future: int = 0, N: int)[source]#
Bases:
gluonts.transform.sampler.InstanceSampler
Samples N time points from each series.
- Parameters
N (int) – number of time points to sample from each time series.
- N: int#
- class gluonts.transform.QuantizeMeanScaled(bin_edges: List[float], past_target_field: str = 'past_target', past_observed_values_field: str = 'past_observed_values', future_target_field: str = 'future_target', scale_field: str = 'scale')[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Rescale and quantize the target variable. Requires past_target_field, and future_target_field to be present.
The mean absolute value of the past_target is used to rescale past_target and future_target. Then the bin_edges are used to quantize the rescaled target.
The calculated scale is stored in the scale_field.
- Parameters
bin_edges – The bin edges for quantization.
past_target_field (optional) – The field name that contains past_target, by default “past_target”
past_observed_values_field (optional) – The field name that contains past_observed_values, by default “past_observed_values”
future_target_field (optional) – The field name that contains future_target, by default “future_target”
scale_field (optional) – The field name where scale will be stored, by default “scale”
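The rescale-then-quantize steps can be sketched with np.digitize. The helper name is hypothetical, observed-value masking is ignored, and falling back to a scale of 1.0 when the mean is zero is an assumption of this sketch.

```python
import numpy as np


def quantize_mean_scaled(past_target, future_target, bin_edges):
    """Rescale by the mean absolute past value, then bin both targets."""
    scale = float(np.mean(np.abs(past_target)))
    if scale <= 0:
        scale = 1.0  # assumption: avoid division by zero for all-zero input
    past_q = np.digitize(past_target / scale, bin_edges)
    future_q = np.digitize(future_target / scale, bin_edges)
    return past_q, future_q, scale


past_q, future_q, scale = quantize_mean_scaled(
    np.array([1.0, 3.0]), np.array([4.0]), bin_edges=[0.0, 1.0, 2.0]
)
```

Here the scale is 2.0, so the rescaled values 0.5, 1.5 and 2.0 fall into bins 1, 2 and 3.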
- class gluonts.transform.RemoveFields(field_names: List[str])[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Remove field names if present.
- Parameters
field_names – List of names of the fields that will be removed
- class gluonts.transform.RenameFields(mapping: Dict[str, str])[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Rename fields using a mapping, if source field present.
- Parameters
mapping – Name mapping input_name -> output_name
- class gluonts.transform.RollingMeanValueImputation(window_size: int = 10)[source]#
Bases:
gluonts.transform.feature.MissingValueImputation
This class replaces each missing value with the average of the last window_size (default=10) values.
(If the first values are missing, they are replaced by the closest non missing value.)
- class gluonts.transform.SampleTargetDim(field_name: str, target_field: str, observed_values_field: str, num_samples: int, shuffle: bool = True)[source]#
Bases:
gluonts.transform._base.FlatMapTransformation
Samples random dimensions from the target at training time.
- class gluonts.transform.SelectFields(input_fields: List[str], allow_missing: bool = False)[source]#
Bases:
gluonts.transform._base.MapTransformation
Only keep the listed fields.
- Parameters
input_fields – List of fields to keep.
allow_missing – If True, skip any missing field. Default: False.
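On a plain dictionary, the effect of allow_missing can be sketched as below. The helper select_fields is hypothetical, and raising KeyError for an absent field when allow_missing is False is an assumption of this sketch.

```python
def select_fields(data, input_fields, allow_missing=False):
    """Keep only the listed fields of a data entry."""
    if allow_missing:
        # Silently skip fields that are not present.
        return {f: data[f] for f in input_fields if f in data}
    # Assumption: a missing field is an error (KeyError) in strict mode.
    return {f: data[f] for f in input_fields}


entry = {"target": [1, 2, 3], "start": "2020-01-01", "item_id": "A"}
strict = select_fields(entry, ["target", "start"])
lenient = select_fields(entry, ["target", "feat_static"], allow_missing=True)
```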
- class gluonts.transform.SetField(output_field: str, value: Any)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Sets a field in the dictionary with the given value.
- Parameters
output_field – Name of the field that will be set
value – Value to be set
- class gluonts.transform.SetFieldIfNotPresent(field: str, value: Any)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Sets a field in the dictionary with the given value, in case it does not exist already.
- Parameters
field – Name of the field that will be set
value – Value to be set
- class gluonts.transform.SimpleTransformation[source]#
Bases:
gluonts.transform._base.MapTransformation
Element-wise transformations that are the same in train and test mode.
- class gluonts.transform.SwapAxes(input_fields: List[str], axes: Tuple[int, int])[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Apply np.swapaxes to fields.
- Parameters
input_fields – Field to apply to
axes – Axes to use
- class gluonts.transform.TargetDimIndicator(field_name: str, target_field: str)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Label-encoding of the target dimensions.
- gluonts.transform.TestSplitSampler(axis: int = - 1, min_past: int = 0) -> gluonts.transform.sampler.PredictionSplitSampler [source]#
- class gluonts.transform.Transformation[source]#
Bases:
object
Base class for all Transformations.
A Transformation works on a stream (iterator) of dictionaries.
- apply(dataset: gluonts.dataset.Dataset, is_train: bool = True) -> gluonts.transform._base.TransformedDataset [source]#
- class gluonts.transform.TransformedDataset(base_dataset: gluonts.dataset.Dataset, transformation: gluonts.transform._base.Transformation, is_train=True)[source]#
Bases:
gluonts.dataset.Dataset
A dataset that corresponds to applying a list of transformations to each element in the base_dataset. This only supports SimpleTransformations, which do the same thing at prediction and training time.
- Parameters
base_dataset – Dataset to transform
transformation – Transformation to apply
- class gluonts.transform.UniformSplitSampler(*, axis: int = - 1, min_past: int = 0, min_future: int = 0, p: float)[source]#
Bases:
gluonts.transform.sampler.InstanceSampler
Samples each point with the same fixed probability.
- Parameters
p (float) – Probability of selecting a time point
- p: float#
- gluonts.transform.ValidationSplitSampler(axis: int = - 1, min_past: int = 0, min_future: int = 0) -> gluonts.transform.sampler.PredictionSplitSampler [source]#
- class gluonts.transform.VstackFeatures(output_field: str, input_fields: List[str], drop_inputs: bool = True, h_stack: bool = False)[source]#
Bases:
gluonts.transform._base.SimpleTransformation
Stack fields together using np.vstack when h_stack = False. Otherwise stack fields together using np.hstack.
Fields with value None are ignored.
- Parameters
output_field – Field name to use for the output
input_fields – Fields to stack together
drop_inputs – If set to true the input fields will be dropped.
h_stack – To stack horizontally instead of vertically
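The effect of h_stack on the output shape can be seen in a short sketch on a plain dictionary. The helper stack_fields is hypothetical and not the library's implementation.

```python
import numpy as np


def stack_fields(data, output_field, input_fields,
                 drop_inputs=True, h_stack=False):
    """Stack the listed fields, skipping fields whose value is None."""
    arrays = [data[f] for f in input_fields if data[f] is not None]
    stack = np.hstack if h_stack else np.vstack
    out = {
        k: v for k, v in data.items()
        if not (drop_inputs and k in input_fields)
    }
    out[output_field] = stack(arrays)
    return out


entry = {"age": np.zeros((1, 4)), "const": np.ones((1, 4)), "extra": None}
v = stack_fields(entry, "feat", ["age", "const", "extra"])
h = stack_fields(entry, "feat", ["age", "const", "extra"], h_stack=True)
```

Two (1, 4) feature arrays become a (2, 4) array when stacked vertically and a (1, 8) array when stacked horizontally.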
- gluonts.transform.cdf_to_gaussian_forward_transform(input_batch: Dict[str, Any], outputs: numpy.ndarray) -> numpy.ndarray [source]#
Forward transformation of the CDFtoGaussianTransform.
- Parameters
input_batch – Input data to the predictor.
outputs – Predictor outputs.
- Returns
Forward transformed outputs.
- Return type
numpy.ndarray