gluonts.mx.trainer.model_iteration_averaging module
- class gluonts.mx.trainer.model_iteration_averaging.Alpha_Suffix(epochs: int, alpha: float = 0.75, eta: float = 0)[source]
Bases:
gluonts.mx.trainer.model_iteration_averaging.IterationAveragingStrategy
Implement Alpha Suffix model averaging.
This method is based on the paper “Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization” (https://arxiv.org/pdf/1109.5647.pdf). A configuration sketch is given below.
- alpha_suffix: float
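For instance, to average parameters over roughly the last quarter of a 100-epoch run (a minimal sketch; the epoch count and alpha value are illustrative, with alpha denoting the trailing fraction of training that is averaged):

```python
from gluonts.mx.trainer.model_iteration_averaging import Alpha_Suffix

# alpha is the fraction of training, counted from the end, over which
# model parameters are averaged: here, the last 25% of a 100-epoch run.
avg_strategy = Alpha_Suffix(epochs=100, alpha=0.25)
```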
- class gluonts.mx.trainer.model_iteration_averaging.IterationAveragingStrategy(eta: float = 0)[source]
Bases:
object
The model averaging is based on the paper “Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes” (http://proceedings.mlr.press/v28/shamir13.pdf), which implements polynomial-decay averaging, parameterized by eta.
When eta = 0, it is equivalent to a simple average over all iterations with equal weights.
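The recursion behind polynomial-decay averaging can be written as an incremental update. The following standalone sketch (plain NumPy with illustrative names, not the class's internal code) shows how eta shifts weight toward later iterates:

```python
import numpy as np

def polynomial_decay_average(iterates, eta=0.0):
    """Incrementally average a sequence of parameter arrays.

    With eta = 0 this reduces to the arithmetic mean over all iterates;
    larger eta puts more weight on later iterates.
    """
    average = np.zeros_like(iterates[0])
    for t, w in enumerate(iterates, start=1):
        alpha = (eta + 1.0) / (eta + t)  # moving-average step size
        average += alpha * (w - average)
    return average

weights = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
print(polynomial_decay_average(weights, eta=0.0))  # [2.], the plain mean
```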
- apply(model: mxnet.gluon.block.HybridBlock) → Optional[Dict] [source]
- Parameters
model – The model of the current iteration.
- Return type
The averaged model, or None if averaging has not started.
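The ModelIterationAveraging callback invokes apply after every training batch. A minimal sketch of calling it directly (net and avg_strategy stand in for the trained HybridBlock and a configured strategy):

```python
# net: the mxnet.gluon.HybridBlock being trained
# avg_strategy: a configured IterationAveragingStrategy

# After each batch's parameter update:
averaged = avg_strategy.apply(net)
if averaged is not None:
    # Averaging has been triggered: `averaged` maps parameter
    # names to their running-average NDArrays.
    pass
```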
- average_counter: int
- averaged_model: Optional[Dict[str, mxnet.ndarray.ndarray.NDArray]]
- averaging_started: bool
- cached_model: Optional[Dict[str, mxnet.ndarray.ndarray.NDArray]]
- load_averaged_model(model: mxnet.gluon.block.HybridBlock)[source]
To validate or evaluate the averaged model partway through training, first call load_averaged_model to load the averaged parameters and overwrite the current model, run the evaluation, and then call load_cached_model to restore the current model (see the sketch under load_cached_model below).
- Parameters
model – The model that the averaged model is loaded to.
- load_cached_model(model: mxnet.gluon.block.HybridBlock)[source]
- Parameters
model – The model that the cached model is loaded to.
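A typical mid-training evaluation cycle then looks like the following sketch, where evaluate is a hypothetical stand-in for whatever validation routine is in use:

```python
# net: the mxnet.gluon.HybridBlock being trained
# avg_strategy: an IterationAveragingStrategy that has started averaging

avg_strategy.load_averaged_model(net)  # cache current weights, load the average
val_loss = evaluate(net)               # hypothetical validation routine
avg_strategy.load_cached_model(net)    # restore the cached training weights
```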
- class gluonts.mx.trainer.model_iteration_averaging.ModelIterationAveraging(avg_strategy: gluonts.mx.trainer.model_iteration_averaging.IterationAveragingStrategy)[source]
Bases:
gluonts.mx.trainer.callback.Callback
Callback to implement iteration based model averaging strategies.
- Parameters
avg_strategy – IterationAveragingStrategy; one of NTA or Alpha_Suffix from gluonts.mx.trainer.model_iteration_averaging.
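For example, assuming a GluonTS version whose mx Trainer exposes a callbacks argument, the callback can be attached as follows (the epoch counts are illustrative):

```python
from gluonts.mx.trainer import Trainer
from gluonts.mx.trainer.model_iteration_averaging import (
    NTA,
    ModelIterationAveraging,
)

# Attach NTA-based iteration averaging to the training loop.
trainer = Trainer(
    epochs=100,
    callbacks=[ModelIterationAveraging(avg_strategy=NTA(epochs=100))],
)
```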
- on_epoch_end(epoch_no: int, epoch_loss: float, training_network: mxnet.gluon.block.HybridBlock, trainer: mxnet.gluon.trainer.Trainer, best_epoch_info: Dict[str, Any], ctx: mxnet.context.Context) → bool [source]
Hook that is called after every epoch. Like on_train_epoch_end and on_validation_epoch_end, it returns a boolean indicating whether training should continue. This hook is always called after on_train_epoch_end and on_validation_epoch_end, regardless of those hooks’ return values.
- Parameters
epoch_no – The current epoch (the first epoch has epoch_no = 0).
epoch_loss – The validation loss that was recorded in the last epoch if validation data was provided. The training loss otherwise.
training_network – The network that is being trained.
trainer – The trainer which is running the training.
best_epoch_info – Aggregate information about the best epoch. Contains keys params_path, epoch_no and score. The score is the best validation loss if validation data is provided or the best training loss otherwise.
ctx – The MXNet context used.
- Returns
A boolean indicating whether training should continue. Defaults to True.
- Return type
bool
- on_train_batch_end(training_network: mxnet.gluon.block.HybridBlock) → bool [source]
Hook that is called after each training batch.
- Parameters
training_network – The network that is being trained.
- Returns
A boolean indicating whether training should continue. Defaults to True.
- Return type
bool
- on_train_end(training_network: mxnet.gluon.block.HybridBlock, temporary_dir: str, ctx: Optional[mxnet.context.Context] = None) → None [source]
Hook that is called after training is finished. This is the last hook to be called.
- Parameters
training_network – The network that was trained.
temporary_dir – The directory where model parameters are logged throughout training.
ctx – The MXNet context used, if any.
- on_validation_epoch_end(epoch_no: int, epoch_loss: float, training_network: mxnet.gluon.block.HybridBlock, trainer: mxnet.gluon.trainer.Trainer) → bool [source]
Hook that is called after each validation epoch. Like on_train_epoch_end, this method returns a boolean indicating whether training should continue. Note that it is always called after on_train_epoch_end within a single epoch. If on_train_epoch_end returned False, this method will not be called.
- Parameters
epoch_no – The current epoch (the first epoch has epoch_no = 0).
epoch_loss – The validation loss that was recorded in the last epoch.
training_network – The network that is being trained.
trainer – The trainer which is running the training.
- Returns
A boolean indicating whether training should continue. Defaults to True.
- Return type
bool
- class gluonts.mx.trainer.model_iteration_averaging.NTA(epochs: int, n: int = 5, maximize: bool = False, last_n_trigger: bool = False, eta: float = 0, fallback_alpha: float = 0.05)[source]
Bases:
gluonts.mx.trainer.model_iteration_averaging.IterationAveragingStrategy
Implement Non-monotonically Triggered AvSGD (NTA).
This method is based on the paper “Regularizing and Optimizing LSTM Language Models” (https://openreview.net/pdf?id=SyyGPP0TZ); an implementation is available in the Salesforce GitHub repository (https://github.com/salesforce/awd-lstm-lm/blob/master/main.py). Note that it differs from the arXiv (and GluonNLP) version, which is referred to as NTA_V2 below.
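The core of the non-monotonic trigger can be sketched as a standalone function (a simplified illustration with hypothetical names; the actual implementation also handles maximize, last_n_trigger, and fallback_alpha):

```python
def nta_should_start_averaging(val_logs, metric, n=5):
    """Trigger averaging once `metric` is no better than the best
    validation result recorded more than `n` checks ago."""
    return len(val_logs) > n and metric > min(val_logs[:-n])
```

Once triggered, averaging continues for the remainder of training; each new validation result is then appended to val_logs.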
- update_average_trigger(metric: Optional[Any] = None, epoch: int = 0, **kwargs)[source]
- Parameters
metric – The criterion used to trigger averaging.
epoch – The epoch at which to start averaging; not used in NTA.
- val_logs: List[Any]