Robustness of LSTM neural networks for multi-step forecasting of chaotic time series

M Sangiorgio, F Dercole - Chaos, Solitons & Fractals, 2020 - Elsevier
Abstract
Recurrent neurons (and in particular LSTM cells) have proved efficient when used as basic blocks to build sequence-to-sequence architectures, which represent the state-of-the-art approach in many sequential tasks related to natural language processing. In this work, these architectures are proposed as general-purpose, multi-step predictors for nonlinear time series. We analyze artificial, noise-free data generated by chaotic oscillators and compare LSTM nets with the benchmarks set by feed-forward, one-step-recursive and multi-output predictors. We focus on two different training methods for LSTM nets. The traditional one makes use of so-called teacher forcing, i.e., the ground-truth data are used as input at each time step ahead, rather than the outputs predicted for the previous steps. Conversely, the second feeds the previous predictions back into the recurrent neurons, as happens when the network is used for forecasting. LSTM predictors robustly combine the strengths of the two benchmark competitors, i.e., the good short-term performance of one-step-recursive predictors and greatly improved mid- to long-term predictions with respect to feed-forward, multi-output predictors. Training LSTM predictors without teacher forcing is recommended to improve accuracy and robustness, and it ensures a more uniform distribution of the predictive power within the chaotic attractor. We also show that LSTM architectures maintain good performance when the number of time lags included in the input differs from the actual embedding dimension of the dataset, a feature that is very important when working with real data.
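The abstract contrasts teacher forcing with feeding the network's own predictions back during training. The sketch below is an illustrative, PyTorch-style assumption rather than the authors' code: the class `LSTMPredictor`, the choice `hidden_size=32`, and the warm-up/horizon interface are all hypothetical, but it shows how the two training modes differ only in which value is fed back to the LSTM cell at each prediction step.

```python
# Minimal sketch (assumed implementation, not the paper's code): an LSTM
# multi-step predictor for a scalar series, trained either with teacher
# forcing (ground truth fed at every step) or in free-running mode
# (its own predictions fed back, as at forecasting time).
import torch
import torch.nn as nn

class LSTMPredictor(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.cell = nn.LSTMCell(input_size=1, hidden_size=hidden_size)
        self.readout = nn.Linear(hidden_size, 1)

    def forward(self, warmup, horizon, targets=None, teacher_forcing=False):
        # warmup:  (batch, m, 1) past samples that initialise the state
        # targets: (batch, horizon, 1) ground truth, needed only for teacher forcing
        batch = warmup.size(0)
        h = torch.zeros(batch, self.cell.hidden_size)
        c = torch.zeros(batch, self.cell.hidden_size)
        for t in range(warmup.size(1)):          # condition on the past window
            h, c = self.cell(warmup[:, t], (h, c))
        x = self.readout(h)                      # one-step-ahead prediction
        preds = [x]
        for t in range(1, horizon):
            if teacher_forcing:
                x_in = targets[:, t - 1]         # ground truth as next input
            else:
                x_in = x                         # feed the prediction back
            h, c = self.cell(x_in, (h, c))
            x = self.readout(h)
            preds.append(x)
        return torch.stack(preds, dim=1)         # (batch, horizon, 1)

# Usage sketch with hypothetical shapes: predict 10 steps from a 6-sample window.
model = LSTMPredictor()
past = torch.randn(4, 6, 1)       # batch of warm-up windows
future = torch.randn(4, 10, 1)    # ground truth, required only for teacher forcing
out_tf = model(past, horizon=10, targets=future, teacher_forcing=True)
out_fr = model(past, horizon=10)  # free-running training, the mode the paper recommends
loss = nn.MSELoss()(out_fr, future)
```

Under these assumptions, training without teacher forcing simply means computing the loss on `out_fr`, so that the network learns with the same prediction-feedback loop it will use when deployed as a multi-step forecaster.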