gluonts.torch.model.i_transformer package
- class gluonts.torch.model.i_transformer.ITransformerEstimator(prediction_length: int, context_length: Optional[int] = None, d_model: int = 32, nhead: int = 4, dim_feedforward: int = 128, dropout: float = 0.1, activation: str = 'relu', norm_first: bool = False, num_encoder_layers: int = 2, lr: float = 0.001, weight_decay: float = 1e-08, scaling: Optional[str] = 'mean', distr_output: gluonts.torch.distributions.output.Output = gluonts.torch.distributions.studentT.StudentTOutput(beta=0.0), num_parallel_samples: int = 100, batch_size: int = 32, num_batches_per_epoch: int = 50, trainer_kwargs: Optional[Dict[str, Any]] = None, train_sampler: Optional[gluonts.transform.sampler.InstanceSampler] = None, validation_sampler: Optional[gluonts.transform.sampler.InstanceSampler] = None, nonnegative_pred_samples: bool = False)
Bases: gluonts.torch.model.estimator.PyTorchLightningEstimator
An estimator training the iTransformer model for multivariate forecasting as described in https://arxiv.org/abs/2310.06625 extended to be probabilistic.
This class uses the model defined in ITransformerModel, and wraps it into an ITransformerLightningModule for training purposes: training is performed using PyTorch Lightning’s pl.Trainer class.
- Parameters
prediction_length (int) – Length of the prediction horizon.
context_length – Number of time steps prior to prediction time that the model takes as inputs (default: 10 * prediction_length).
d_model – Size of the latent representation in the Transformer encoder.
nhead – Number of attention heads in the Transformer encoder, which must divide d_model.
dim_feedforward – Size of hidden layers in the Transformer encoder.
dropout – Dropout probability in the Transformer encoder.
activation – Activation function in the Transformer encoder.
norm_first – Whether to apply normalization before or after the attention.
num_encoder_layers – Number of layers in the Transformer encoder.
lr – Learning rate (default: 1e-3).
weight_decay – Weight decay regularization parameter (default: 1e-8).
scaling – Scaling method; can be “mean”, “std” or None.
distr_output – Distribution to use to evaluate observations and sample predictions (default: StudentTOutput()).
num_parallel_samples – Number of samples per time series that the resulting predictor should produce (default: 100).
batch_size – The size of the batches to be used for training (default: 32).
num_batches_per_epoch – Number of batches to be processed in each training epoch (default: 50).
trainer_kwargs – Additional arguments to provide to pl.Trainer for construction.
train_sampler – Controls the sampling of windows during training.
validation_sampler – Controls the sampling of windows during validation.
nonnegative_pred_samples – Whether the final prediction samples should be non-negative. If so, an activation function is applied to ensure non-negativity. Note that this is applied only to the final samples, not during training.
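The parameters above can be combined as in the following minimal sketch. It assumes GluonTS is installed with its PyTorch dependencies; the helper names `resolve_hyperparameters` and `build_estimator` are hypothetical, and the `max_epochs` value is purely illustrative. The checks mirror two rules stated in the parameter list: `nhead` must divide `d_model`, and `context_length` defaults to `10 * prediction_length`.

```python
from typing import Any, Dict, Optional


def resolve_hyperparameters(
    prediction_length: int,
    context_length: Optional[int] = None,
    d_model: int = 32,
    nhead: int = 4,
) -> Dict[str, Any]:
    """Hypothetical helper: apply the documented default and constraint."""
    # The number of attention heads must divide the latent size.
    if d_model % nhead != 0:
        raise ValueError(f"nhead={nhead} must divide d_model={d_model}")
    # Documented default: ten times the prediction horizon.
    if context_length is None:
        context_length = 10 * prediction_length
    return {
        "prediction_length": prediction_length,
        "context_length": context_length,
        "d_model": d_model,
        "nhead": nhead,
    }


def build_estimator(prediction_length: int, **overrides: Any):
    """Hypothetical helper: construct the estimator (needs gluonts and torch)."""
    # Imported lazily so the checks above also run without GluonTS installed.
    from gluonts.torch.model.i_transformer import ITransformerEstimator

    params = resolve_hyperparameters(prediction_length, **overrides)
    return ITransformerEstimator(
        **params,
        scaling="mean",
        batch_size=32,
        num_batches_per_epoch=50,
        trainer_kwargs={"max_epochs": 5},  # forwarded to pl.Trainer
    )
```

Under these assumptions, `build_estimator(24)` would yield an estimator with a context length of 240 steps, which could then be trained on a multivariate GluonTS dataset.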
- create_lightning_module() → lightning.pytorch.core.module.LightningModule
Create and return the network used for training (i.e., computing the loss).
- Returns
The network that computes the loss given input data.
- Return type
pl.LightningModule
- create_predictor(transformation: gluonts.transform._base.Transformation, module) → gluonts.torch.model.predictor.PyTorchPredictor
Create and return a predictor object.
- Parameters
transformation – Transformation to be applied to data before it goes into the model.
module – A trained pl.LightningModule object.
- Returns
A predictor wrapping an nn.Module used for inference.
- Return type
PyTorchPredictor
- create_training_data_loader(data: gluonts.dataset.Dataset, module: gluonts.torch.model.i_transformer.lightning_module.ITransformerLightningModule, shuffle_buffer_length: Optional[int] = None, **kwargs) → Iterable
Create a data loader for training purposes.
- Parameters
data – Dataset from which to create the data loader.
module – The pl.LightningModule object that will receive the batches from the data loader.
- Returns
The data loader, i.e. an iterable over batches of data.
- Return type
Iterable
- create_transformation() → gluonts.transform._base.Transformation
Create and return the transformation needed for training and inference.
- Returns
The transformation that will be applied entry-wise to datasets, at training and inference time.
- Return type
Transformation
- create_validation_data_loader(data: gluonts.dataset.Dataset, module: gluonts.torch.model.i_transformer.lightning_module.ITransformerLightningModule, **kwargs) → Iterable
Create a data loader for validation purposes.
- Parameters
data – Dataset from which to create the data loader.
module – The pl.LightningModule object that will receive the batches from the data loader.
- Returns
The data loader, i.e. an iterable over batches of data.
- Return type
Iterable
- lead_time: int
- prediction_length: int
- class gluonts.torch.model.i_transformer.ITransformerLightningModule(model_kwargs: dict, num_parallel_samples: int = 100, lr: float = 0.001, weight_decay: float = 1e-08)
Bases: lightning.pytorch.core.module.LightningModule
A pl.LightningModule class that can be used to train an ITransformerModel with PyTorch Lightning.
This is a thin layer around a (wrapped) ITransformerModel object that exposes the methods to evaluate training and validation loss.
- Parameters
model_kwargs – Keyword arguments to construct the ITransformerModel to be trained.
num_parallel_samples – Number of evaluation samples per time series to sample during inference.
lr – Learning rate.
weight_decay – Weight decay regularization parameter.
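As a hedged illustration, `model_kwargs` can be a plain dictionary whose keys match the constructor arguments of `ITransformerModel` documented in this package. The helper names `make_model_kwargs` and `make_lightning_module` are hypothetical, and the concrete values simply reuse the estimator defaults listed earlier; they are assumptions for illustration, not recommendations.

```python
from typing import Any, Dict

# Constructor arguments of ITransformerModel that have no default value,
# as listed in the class signature in this package.
REQUIRED_MODEL_ARGS = (
    "prediction_length",
    "context_length",
    "d_model",
    "nhead",
    "dim_feedforward",
    "dropout",
    "activation",
    "norm_first",
    "num_encoder_layers",
    "scaling",
)


def make_model_kwargs(prediction_length: int, context_length: int) -> Dict[str, Any]:
    """Hypothetical helper: build a complete model_kwargs dictionary."""
    kwargs = {
        "prediction_length": prediction_length,
        "context_length": context_length,
        "d_model": 32,
        "nhead": 4,
        "dim_feedforward": 128,
        "dropout": 0.1,
        "activation": "relu",
        "norm_first": False,
        "num_encoder_layers": 2,
        "scaling": "mean",
    }
    # Fail early if a required constructor argument is missing.
    missing = [name for name in REQUIRED_MODEL_ARGS if name not in kwargs]
    if missing:
        raise ValueError(f"missing model arguments: {missing}")
    return kwargs


def make_lightning_module(model_kwargs: Dict[str, Any]):
    """Hypothetical helper: wrap the model (needs gluonts and torch)."""
    # Imported lazily so the dictionary can be built without GluonTS installed.
    from gluonts.torch.model.i_transformer import ITransformerLightningModule

    return ITransformerLightningModule(
        model_kwargs=model_kwargs,
        num_parallel_samples=100,
        lr=1e-3,
        weight_decay=1e-8,
    )
```

In normal use the estimator builds this module itself through `create_lightning_module()`, so constructing it by hand is mainly useful for inspection or custom training loops.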
- class gluonts.torch.model.i_transformer.ITransformerModel(prediction_length: int, context_length: int, d_model: int, nhead: int, dim_feedforward: int, dropout: float, activation: str, norm_first: bool, num_encoder_layers: int, scaling: Optional[str], distr_output=gluonts.torch.distributions.studentT.StudentTOutput(beta=0.0), nonnegative_pred_samples: bool = False)
Bases: torch.nn.modules.module.Module
Module implementing the iTransformer model for multivariate forecasting as described in https://arxiv.org/abs/2310.06625 extended to be probabilistic.
- Parameters
prediction_length – Number of time points to predict.
context_length – Number of time steps prior to prediction time that the model takes as inputs.
d_model – Transformer latent dimension.
nhead – Number of attention heads, which must divide d_model.
dim_feedforward – Dimension of the transformer’s feedforward network model.
dropout – Dropout rate for the transformer.
activation – Activation function for the transformer.
norm_first – Whether to normalize the input before the transformer.
num_encoder_layers – Number of transformer encoder layers.
scaling – Whether to scale the input using mean or std or None.
distr_output – Distribution to use to evaluate observations and sample predictions. Default: StudentTOutput().
nonnegative_pred_samples – Whether the final prediction samples should be non-negative. If so, an activation function is applied to ensure non-negativity. Note that this is applied only to the final samples, not during training.
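The parameter list does not spell out what the "mean" and "std" scaling options compute. The sketch below is an assumption about the usual conventions (mean scaling divides by the mean absolute observed value; std scaling subtracts the observed mean and divides by the observed standard deviation), written in plain Python for a single series. The function name `scale_series` and the `eps` guard are hypothetical and are not part of the GluonTS API.

```python
import math
from typing import List, Optional, Tuple


def scale_series(
    values: List[float],
    observed: List[int],
    scaling: Optional[str],
    eps: float = 1e-10,
) -> Tuple[List[float], float, float]:
    """Return (scaled_values, loc, scale) for one series.

    Only entries flagged as observed contribute to the statistics.
    """
    obs = [v for v, o in zip(values, observed) if o]
    n = max(len(obs), 1)  # avoid division by zero when nothing is observed

    if scaling == "mean":
        loc = 0.0
        scale = max(sum(abs(v) for v in obs) / n, eps)
    elif scaling == "std":
        loc = sum(obs) / n
        var = sum((v - loc) ** 2 for v in obs) / n
        scale = math.sqrt(var + eps)
    elif scaling is None:
        loc, scale = 0.0, 1.0  # identity: leave the series unchanged
    else:
        raise ValueError(f"unknown scaling: {scaling!r}")

    return [(v - loc) / scale for v in values], loc, scale
```

Keeping `loc` and `scale` alongside the scaled values matters because predictions made in the scaled space have to be mapped back to the original units.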
- describe_inputs(batch_size=1) → gluonts.model.inputs.InputSpec
- forward(past_target: torch.Tensor, past_observed_values: torch.Tensor) → Tuple[Tuple[torch.Tensor, ...], torch.Tensor, torch.Tensor]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- loss(past_target: torch.Tensor, past_observed_values: torch.Tensor, future_target: torch.Tensor, future_observed_values: torch.Tensor) → torch.Tensor
- training: bool