Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

KODA: A Data-Driven Recursive Model for Forecasting and Data Assimilation Using Koopman operator

Ashutosh Singh1, Ashish Singh3, Tales Imbiriba1, 2, Deniz Erdogmus1, Ricardo Borsoi4

KODA: A Data-Driven Recursive Model for Time Series Forecasting and data assimilation using koopman operators

Ashutosh Singh1, Ashish Singh3, Tales Imbiriba1, 2, Deniz Erdogmus1, Ricardo Borsoi4
Abstract

Approaches based on Koopman operators have shown great promise in forecasting time series data generated by complex nonlinear dynamical systems (NLDS). Although such approaches are able to capture the latent state representation of a NLDS, they still face difficulty in long term forecasting when applied to real world data. Specifically many real-world NLDS exhibit time-varying behavior, leading to nonstationarity that is hard to capture with such models. Furthermore they lack a systematic data-driven approach to perform data assimilation, that is, exploiting noisy measurements on the fly in the forecasting task. To alleviate the above issues, we propose a Koopman operator-based approach (named KODA - Koopman Operator with Data Assimilation) that integrates forecasting and data assimilation in NLDS. In particular we use a Fourier domain filter to disentangle the data into a physical component whose dynamics can be accurately represented by a Koopman operator, and residual dynamics that represents the local or time varying behavior that are captured by a flexible and learnable recursive model. We carefully design an architecture and training criterion that ensures this decomposition lead to stable and long-term forecasts. Moreover, we introduce a course correction strategy to perform data assimilation with new measurements at inference time. The proposed approach is completely data-driven and can be learned end-to-end. Through extensive experimental comparisons we show that KODA outperforms existing state of the art methods on multiple time series benchmarks such as electricity, temperature, weather, lorenz 63 and duffing oscillator demonstrating its superior performance and efficacy along the three tasks a) forecasting, b) data assimilation and c) state prediction.

Introduction

Many real world applications tackle phenomena which are dynamic in nature. Measuring and tracking the evolution of such phenomena holds extreme importance in many fields such as weather forecasting (Pathak et al. 2022; Lam et al. 2022), evolution of PDEs (Li et al. 2020; Kovachki et al. 2023), modeling neural dynamics (Brunton et al. 2016; Singh et al. 2021), etc. Most of these phenomena are characterised as nonlinear dynamical systems (NLDS). If prior knowledge about the underlying system’s evolution is known then it can inform modeling, see, e.g., (Cai et al. 2021; Cuomo et al. 2022; Raissi, Perdikaris, and Karniadakis 2019; Imbiriba et al. 2023). However, in most scenarios no direct information is available about the physics of the underlying system. Therefore, for such cases we can only depend on the measurement data to model system’s dynamics (Guen and Thome 2020; Afshar, Germ, and Morris 2023; Frion et al. 2024b). Key motivations for such modeling is to evolve the model and generate accurate forecasts over long horizon. But the non-linear and non-stationary characteristic of these system makes it hard to capture the state representation or generate accurate forecasts. One classical approach is to model the NLDS in a representation space where the system dynamics is linear. Using the tools from spectral theory and linear algebra we can study various characteristics of the NLDS such as stability, asymptotic nature etc.

Refer to caption
Figure 1: Pictorial representation of data assimilation using a recursive prediction-correction. The blue line represents actual state trajectory while the dotted line represents the trajectory generated by the learned model. Red triangles represent the model prediction while the green squares represent the corrected prediction using the noisy measurement represented by the blue circles.

Koopman operator theory (Koopman 1931; Phillips 1961) generates a representation with a linear dynamics from the NLDS. It achieves this by mapping the system’s dynamics into an infinite-dimensional space using a set of measurement functions. Given that learning such a mapping is computationally infeasible, numerous algorithms have been proposed in the literature which seek to approximate the Koopman operator in finite dimensions and effectively describe the NLDS (Brunton et al. 2021). In recent years, deep learning algorithms have emerged to learn these operators directly from the measurement snapshot data to predict the system’s state space evolution. Despite these advancements, such methods are still susceptible to error accumulation, which can lead to significant drift in state space estimation during long-term forecasting. This calls for an adaptive framework that can utilise measurement data on the fly to correct its prediction. 1 represents a prediction-correction mechanism for modeling a dynamical system. When measurements snapshots are available they are used to correct prediction error.

The Kalman filter presents a systematic way for state space prediction and correction mechanism using measurement data. Many deep learning methods have been recently proposed that takes inspiration from the traditional Kalman filter framework, motivating the use of neural network architectures in an adaptive manner to produce robust long term forecasts (Revach et al. 2022; Singh et al. 2024; Guen and Thome 2020).

In this paper, we propose a branched architecture that disentangles the dynamics into global and local (residual) components in a segmented fashion. This allows us to recover the dynamics of the underlying global factor while being robust to noise and local variations within each segment. Additionally, by not discarding the local sources of variation and capturing it with a recurrent residual model we are able to generate long term time series forecasting (LTSF). We further motivate a data assimilation framework for this disentangled view of the state space to perform online data assimilation during inference. The main contributions of this work are:

  • The proposed approach is able to disentangle the physical and residual components of the dynamics and evolve them separately to generate forecasts.

  • We introduce a Kalman filter-inspired data assimilation framework with the Koopman operator-based prediction model.

  • The proposed approach can leverage data available during inference time to improve the prediction of the model hence making it possible to generate accurate forecasts over long horizons, hence presenting a solution for LTSF.

  • The proposed approach is one of the first works that extends data assimilation to a disentangled state representation-based model.

Related Works

Learning with Koopman Operator

Koopman operator theory (Brunton et al. 2021) serves as a potent tool for unveiling the inherent dynamics of non-linear systems. Notably, the recent surge in interest surrounding Koopman operators can be attributed to the strong theoretical foundations and empirical success of such algorithms. Most famous among them is Dynamic Mode Decomposition (Tu 2013), a data-driven approach (Brunton, Proctor, and Kutz 2016; Kutz et al. 2016) enabling a practical approximation of the Koopman operator. Recent works (see, e.g., (Kawahara 2016; Lusch, Kutz, and Brunton 2018; Yeung, Kundu, and Hodas 2019; Takeishi, Kawahara, and Yairi 2017; Bevanda et al. 2023)) further explored learning theory for estimating Koopman operators. In particularly, the Koopman autoencoder architecture proposed in (Azencot et al. 2020) and (Lusch, Kutz, and Brunton 2018) has become one of the most widely used deep learning models for this task. Since then several work have been proposed in the intersection of learning theory and Koopman operators (Fathi et al. 2024; Wang, Xu, and Mu 2023; Berman, Naiman, and Azencot 2023). The recent models proposed in (Liu et al. 2024; Wang et al. 2023b), proposed Koopman-based neural forecasting methods by disentangling the system’s dynamics into local and global components. However, they lack a systematic approach for data assimilation during inference time.

Data Assimilation

Some of the earlier work that combine Koopman operators with the Kalman filter are (Benosman, Mansour, and Huroyan 2017), where the author formulates a linear observer design using Koopman operators to predict crowd flow, and (Jiang et al. 2022) where the authors use a kernel-based Koopman operator for robotic systems. Unlike these works, the proposed approach is a deep learning-based recursive approach. (Frion et al. 2024a) is one of the first works that motivates data assimilation with a neural Koopman operator. While the model proposed in (Frion et al. 2024a) uses a Koopman operator as prior for variational data assimilation, in KODA we adopt a branched prediction model within a Kalman filter-inspired framework for both forecasting and assimilation. Additionally, unlike (Frion et al. 2024a), KODA can achieve online data assimilation during inference. Similarly, in (Singh et al. 2024) the authors propose a Kalman filter-based data assimilation using neural operators for semilinear PDEs. Differently, our method motivates data assimilation jointly with a Koopman operator-based prediction model and a parallel residual model. We also don’t assume access to a ground truth of a latent state representation of the system and instead learn to evolve the model just from measurement data. In (Frion et al. 2024b) authors propose a way of uncertainty quantification for time series forecasting using Koopman ensembles.

Problem Setting

In this section, we formally define the problem of learning a model that can not only produce solution for LTSF problem, provided sufficient historical data, but also utilise new data available during inference to correct the model prediction. The underlying physical quantity whose dynamic we observe is referred to as states and the observed data itself is referred to as measurements of the state. Therefore, let us denote s(t)𝑠𝑡s(t)\in\mathcal{H}italic_s ( italic_t ) ∈ caligraphic_H as the state representation, and by y(t)𝒴𝑦𝑡𝒴y(t)\in\mathcal{Y}italic_y ( italic_t ) ∈ caligraphic_Y as the representation of the measurement snapshots. \mathcal{H}caligraphic_H and 𝒴𝒴\mathcal{Y}caligraphic_Y are finite dimensional vector spaces. Hence a NLDS can be described as:

s(t)t=𝒜(s(t))+η(t)𝑠𝑡𝑡𝒜𝑠𝑡𝜂𝑡\displaystyle\frac{\partial s(t)}{\partial t}=\mathcal{A}(s(t))+\eta(t)divide start_ARG ∂ italic_s ( italic_t ) end_ARG start_ARG ∂ italic_t end_ARG = caligraphic_A ( italic_s ( italic_t ) ) + italic_η ( italic_t ) (1)
y(t)=(s(t))+ϵ(t)𝑦𝑡𝑠𝑡italic-ϵ𝑡\displaystyle y(t)=\mathcal{B}(s(t))+\epsilon(t)italic_y ( italic_t ) = caligraphic_B ( italic_s ( italic_t ) ) + italic_ϵ ( italic_t ) (2)

where operator 𝒜::𝒜\mathcal{A}:\mathcal{H}\to\mathcal{H}caligraphic_A : caligraphic_H → caligraphic_H models the evolution of the states, :𝒴:𝒴\mathcal{B}:\mathcal{H}\to\mathcal{Y}caligraphic_B : caligraphic_H → caligraphic_Y represents the mapping from the states to the measurements, η(t)𝜂𝑡\eta(t)\in\mathcal{H}italic_η ( italic_t ) ∈ caligraphic_H denote a process noise which represents possible stochasticity in the state evolution, and ϵ(t)𝒴italic-ϵ𝑡𝒴\epsilon(t)\in\mathcal{Y}italic_ϵ ( italic_t ) ∈ caligraphic_Y denotes measurement noise.
The main objective of the forecasting problem is to obtain accurate estimates of the future measurement trajectory y~t+τ=(s(t+τ))subscript~𝑦𝑡𝜏𝑠𝑡𝜏\tilde{y}_{t+\tau}=\mathcal{B}(s(t+\tau))over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_t + italic_τ end_POSTSUBSCRIPT = caligraphic_B ( italic_s ( italic_t + italic_τ ) ), τ{1,2,3,}𝜏123\tau\in\{1,2,3,\ldots\}italic_τ ∈ { 1 , 2 , 3 , … } based on past measurements {y0,,yt}subscript𝑦0subscript𝑦𝑡\{y_{0},\ldots,y_{t}\}{ italic_y start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT }, where ytsubscript𝑦𝑡y_{t}italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT denote a discrete measurement of y(t)𝑦𝑡y(t)italic_y ( italic_t ). Note that while the NLDS (2) might be described in continuous time, we consider the discrete-time forecasting problem, based on discrete measurements. In order to keep notation clear, we denote the discrete time index as a subscript. Considering some training dataset with different realizations of {y1,,yT}subscript𝑦1subscript𝑦𝑇\{y_{1},\ldots,y_{T}\}{ italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT }, the forecasting objective can be formulated as learning an operator Φ:t×:Φsuperscript𝑡subscript\Phi:\mathcal{H}^{t}\times\mathbb{N}_{*}\to\mathcal{H}roman_Φ : caligraphic_H start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT × roman_ℕ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT → caligraphic_H that predicts the future measurements as y~t+τ=Φ(y1,,yt;τ)subscript~𝑦𝑡𝜏Φsubscript𝑦1subscript𝑦𝑡𝜏\tilde{y}_{t+\tau}=\Phi(y_{1},\ldots,y_{t};\tau)over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_t + italic_τ end_POSTSUBSCRIPT = roman_Φ ( italic_y start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ; italic_τ ). The above objective is generally optimised by minimizing the prediction loss between the forecast and ground truth during training. The data assimilation problem extends this task by including the possibility of incorporating a sparse set of measurements, irregularly present along the forecast horizon, on the fly to improve the forecast estimate.

Note that learning such a model is not easy since the amount of training data might be limited, and the NLDS under consideration can be highly nonlinear. Moreover the measurements y(t)𝑦𝑡y(t)italic_y ( italic_t ) are often accompanied with noise. This makes it harder to avoid drift in the forecast over time. In the section below, we introduce KODA a framework for generating long term forecasts and preforming data assimilation, while enabling us to study the properties of the NLDS through the learned koopman operator.

KODA

In this section, we formally propose KODA and its various elements. KODA adopts a disentangled view of the state representation separating it into a physical and a residual component 2. To achieve this purely from the measurement data, we separate the dominant spectrum across the data using a Fourier filter. We then use an encoding model that learns the physical and residual components, of the state representation, from the dominant and non-dominant parts of the measurement data. Using separate prediction models, for both the state space components, we generate their future estimates. Upon adding the predicate from the two models we compute the future state estimates. We then use a decoder to map the state estimate back to the measurement space, hence recovering the future forecast. We perform this recursively in a windowed fashion to generate future trajectory. At inference time whenever the measurement data is available we use it in the correction mechanism of KODA to update the model predictions. Hence KODA presents a framework that is purely learning based and data driven.

Refer to caption
Figure 2: Schematic diagram of one time step prediction from KDOA. Z represent physical state component, R represents the residual while Y represents measurement data. Frsubscript𝐹𝑟F_{r}italic_F start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT and K𝐾Kitalic_K rperesents prediction models.

Data Encoding Scheme

State Disentanglement

In order to facilitate the task of LTSF, we aim to learn a latent representation htsubscript𝑡h_{t}\in\mathcal{H}italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ caligraphic_H. This representation act as the learning based approximation of the true stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT in (2). Assume that htsubscript𝑡h_{t}italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT comprises of multiple sources of variation (physical and residual) we use the following disentanglement of the state trajectory,

ht=zt+rt,subscript𝑡subscript𝑧𝑡subscript𝑟𝑡\displaystyle h_{t}=z_{t}+r_{t}\,,italic_h start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , (3)

where ztsubscript𝑧𝑡z_{t}\in\mathcal{H}italic_z start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ caligraphic_H represents the underlying physical component while rtsubscript𝑟𝑡r_{t}\in\mathcal{H}italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ caligraphic_H represents the residual element of the state trajectory. By disentangling the dynamics into these two components we aim at recovering the physical component i.e. the stable component (time invariant in the selected window). The benefit of such disentanglement is that we are able to separate physical component from the local ones. The physical component is expected to capture sources of variation that cause long term change and are stable in smaller window analysis of which could shed light on the spectral property of the underlying system. Due to its usefulness this disentangled view of the dynamics have garnered interest in recent literature (Guen and Thome 2020), (Liu et al. 2024) and (Wang et al. 2023b).

Measurement Disentanglement

Since no ground truth is available about the true state, we learn to approximate the state representations purely from the measurement data. We use Fourier filtering to find and separate the dominant spectrum present across the measurement space. To achieve the best performance for LTSF we pre-computed a filter by taking all the τ𝜏\tauitalic_τ length windows in the training data and extracted the dominant spectrum across them. We define Yk={yt,yt+1,,yt+τ}subscript𝑌𝑘subscript𝑦𝑡subscript𝑦𝑡1subscript𝑦𝑡𝜏Y_{k}=\{y_{t},y_{t+1},...,y_{t+\tau}\}italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = { italic_y start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_y start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT , … , italic_y start_POSTSUBSCRIPT italic_t + italic_τ end_POSTSUBSCRIPT } as the sliding window over the data with τ𝜏subscript\tau\in\mathbb{N}_{*}italic_τ ∈ roman_ℕ start_POSTSUBSCRIPT ∗ end_POSTSUBSCRIPT as the window length. We use this Fourier filtering to represent the resulting disentanglement in the measurement space as,

Ykgsubscriptsuperscript𝑌𝑔𝑘\displaystyle Y^{g}_{k}italic_Y start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT =1(Gα((Yk)))absentsuperscript1subscript𝐺𝛼subscript𝑌𝑘\displaystyle=\mathcal{F}^{-1}(G_{\alpha}(\mathcal{F}(Y_{k})))= caligraphic_F start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_G start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ( caligraphic_F ( italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) ) (4)
Ykrsubscriptsuperscript𝑌𝑟𝑘\displaystyle Y^{r}_{k}italic_Y start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT =1(G¯α((Yk)))absentsuperscript1subscript¯𝐺𝛼subscript𝑌𝑘\displaystyle=\mathcal{F}^{-1}(\overline{G}_{\alpha}(\mathcal{F}(Y_{k})))= caligraphic_F start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( over¯ start_ARG italic_G end_ARG start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT ( caligraphic_F ( italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) ) (5)
s.t.Yks.t.subscript𝑌𝑘\displaystyle\text{s.t.}\quad Y_{k}s.t. italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT =Ykg+Ykrabsentsubscriptsuperscript𝑌𝑔𝑘subscriptsuperscript𝑌𝑟𝑘\displaystyle=Y^{g}_{k}+Y^{r}_{k}= italic_Y start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_Y start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT (6)

here Gαsubscript𝐺𝛼G_{\alpha}italic_G start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT represents the filter, where α𝛼\alphaitalic_α is a hyper-parameter, and G¯αsubscript¯𝐺𝛼\overline{G}_{\alpha}over¯ start_ARG italic_G end_ARG start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT represents its complement which is designed such that (6) is satisfied, guaranteeing no information loss. \mathcal{F}caligraphic_F and 1superscript1\mathcal{F}^{-1}caligraphic_F start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT represents Fourier transform and its inverse applied in the time dimension. Both ytgsubscriptsuperscript𝑦𝑔𝑡y^{g}_{t}italic_y start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT and ytrsubscriptsuperscript𝑦𝑟𝑡y^{r}_{t}italic_y start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT are defined such that ytg𝒴subscriptsuperscript𝑦𝑔𝑡𝒴y^{g}_{t}\in\mathcal{Y}italic_y start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ caligraphic_Y and ytr𝒴subscriptsuperscript𝑦𝑟𝑡𝒴y^{r}_{t}\in\mathcal{Y}italic_y start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ caligraphic_Y. We assume that whenever the data is available it always exist as a block of τ𝜏\tauitalic_τ samples. In many application, such as electricity, motor fault analysis etc, it is quite common that the data is collected for a small window of time at irregular intervals. Hence during the data assimilation task we assume that data is irregularly present as a block of τ𝜏\tauitalic_τ.

Segmentation and Encoding

Within each kthsuperscript𝑘𝑡k^{th}italic_k start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT window we create further s𝑠sitalic_s segments of segment length w=τ/s𝑤𝜏𝑠w=\tau/sitalic_w = italic_τ / italic_s. Hence Ykg={𝕪k1g,𝕪k2g,,𝕪ksg}subscriptsuperscript𝑌𝑔𝑘subscriptsuperscript𝕪𝑔𝑘1subscriptsuperscript𝕪𝑔𝑘2subscriptsuperscript𝕪𝑔𝑘𝑠Y^{g}_{k}=\{\mathbb{y}^{g}_{k1},\mathbb{y}^{g}_{k2},...,\mathbb{y}^{g}_{ks}\}italic_Y start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = { roman_𝕪 start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k 1 end_POSTSUBSCRIPT , roman_𝕪 start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k 2 end_POSTSUBSCRIPT , … , roman_𝕪 start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_s end_POSTSUBSCRIPT } after reshaping, where 𝕪kigsubscriptsuperscript𝕪𝑔𝑘𝑖\mathbb{y}^{g}_{ki}roman_𝕪 start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT represents the ithsuperscript𝑖𝑡i^{th}italic_i start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT segment of kthsuperscript𝑘𝑡k^{th}italic_k start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT window. Same segmentation is used for Ykrsubscriptsuperscript𝑌𝑟𝑘Y^{r}_{k}italic_Y start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. Therefore both 𝕪kig𝒴wsubscriptsuperscript𝕪𝑔𝑘𝑖superscript𝒴𝑤\mathbb{y}^{g}_{ki}\in\mathcal{Y}^{w}roman_𝕪 start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT ∈ caligraphic_Y start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT and 𝕪kir𝒴wsubscriptsuperscript𝕪𝑟𝑘𝑖superscript𝒴𝑤\mathbb{y}^{r}_{ki}\in\mathcal{Y}^{w}roman_𝕪 start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT ∈ caligraphic_Y start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT. We use identical encoder architecture ϕg:𝒴w:subscriptitalic-ϕ𝑔superscript𝒴𝑤\phi_{g}:\mathcal{Y}^{w}\rightarrow\mathcal{H}italic_ϕ start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT : caligraphic_Y start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT → caligraphic_H and ϕr:𝒴w:subscriptitalic-ϕ𝑟superscript𝒴𝑤\phi_{r}:\mathcal{Y}^{w}\rightarrow\mathcal{H}italic_ϕ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT : caligraphic_Y start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT → caligraphic_H for Ykgsubscriptsuperscript𝑌𝑔𝑘Y^{g}_{k}italic_Y start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and Ykrsubscriptsuperscript𝑌𝑟𝑘Y^{r}_{k}italic_Y start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT respectively, to then get Zk={zk1,zk2,,zks}subscript𝑍𝑘subscript𝑧𝑘1subscript𝑧𝑘2subscript𝑧𝑘𝑠Z_{k}=\{z_{k1},z_{k2},...,z_{ks}\}italic_Z start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = { italic_z start_POSTSUBSCRIPT italic_k 1 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT italic_k 2 end_POSTSUBSCRIPT , … , italic_z start_POSTSUBSCRIPT italic_k italic_s end_POSTSUBSCRIPT } and Rk={rk1,rk2,,rks}subscript𝑅𝑘subscript𝑟𝑘1subscript𝑟𝑘2subscript𝑟𝑘𝑠R_{k}=\{r_{k1},r_{k2},...,r_{ks}\}italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = { italic_r start_POSTSUBSCRIPT italic_k 1 end_POSTSUBSCRIPT , italic_r start_POSTSUBSCRIPT italic_k 2 end_POSTSUBSCRIPT , … , italic_r start_POSTSUBSCRIPT italic_k italic_s end_POSTSUBSCRIPT } the respective representations as

zki=ϕg(𝕪kig),rki=ϕr(𝕪kir).formulae-sequencesubscript𝑧𝑘𝑖subscriptitalic-ϕ𝑔subscriptsuperscript𝕪𝑔𝑘𝑖subscript𝑟𝑘𝑖subscriptitalic-ϕ𝑟subscriptsuperscript𝕪𝑟𝑘𝑖\displaystyle z_{ki}=\phi_{g}(\mathbb{y}^{g}_{ki})\,,\quad r_{ki}=\phi_{r}(% \mathbb{y}^{r}_{ki})\,.italic_z start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT = italic_ϕ start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT ( roman_𝕪 start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT ) , italic_r start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT = italic_ϕ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( roman_𝕪 start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT ) . (7)

Previous work (Lin et al. 2023) have shown that segment wise iteration leads to better forecast for the LTSF problem instead of point wise iteration.

Prediction Model

Based on the disentanglement of the latent state representation (3), we design a prediction model which is used to propagate the states (7) forward along the temporal dimension. The prediction model consists of two parallel branches: (i) a Global model, which propagates the component Zksubscript𝑍𝑘Z_{k}italic_Z start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and (ii) a Residual model which propagates the component Rksubscript𝑅𝑘R_{k}italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. We describe each of these models below.

Residual Model

Modeling the residual is important for the reconstruction and for the forecast of the measurement trajectory. We can define the residual Rtsubscript𝑅𝑡R_{t}italic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT trajectory as follows,

rki=Fr(rk(i1))subscript𝑟𝑘𝑖subscript𝐹𝑟subscript𝑟𝑘𝑖1r_{ki}=F_{r}(r_{k(i-1)})italic_r start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT = italic_F start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( italic_r start_POSTSUBSCRIPT italic_k ( italic_i - 1 ) end_POSTSUBSCRIPT ) (8)

here Fr::subscript𝐹𝑟F_{r}:\mathcal{H}\rightarrow\mathcal{H}italic_F start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT : caligraphic_H → caligraphic_H represents the residual model. We use a gated recurrent unit (GRU) (Chung et al. 2014) to then propagate the residual trajectory. Other recurrent architecture such as LSTM (Cheng, Dong, and Lapata 2016) or RNN (Salehinejad et al. 2017) could also be used here.

Physical Model

Our prediction model for the physical component of the signal learns koopman operator 𝒦::𝒦\mathcal{K}:\mathcal{H}\rightarrow\mathcal{H}caligraphic_K : caligraphic_H → caligraphic_H such that,

z^k(i)subscript^𝑧𝑘𝑖\displaystyle\hat{z}_{k(i)}over^ start_ARG italic_z end_ARG start_POSTSUBSCRIPT italic_k ( italic_i ) end_POSTSUBSCRIPT =𝒦z^k(i1)absent𝒦subscript^𝑧𝑘𝑖1\displaystyle=\mathcal{K}\circ\hat{z}_{k(i-1)}= caligraphic_K ∘ over^ start_ARG italic_z end_ARG start_POSTSUBSCRIPT italic_k ( italic_i - 1 ) end_POSTSUBSCRIPT (9)

many methods in literature has been proposed to estimate the Koopman operator from data. Note that z^kisubscript^𝑧𝑘𝑖\hat{z}_{ki}over^ start_ARG italic_z end_ARG start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT in (9) and zkisubscript𝑧𝑘𝑖z_{ki}italic_z start_POSTSUBSCRIPT italic_k italic_i end_POSTSUBSCRIPT in (7) are different as the later represents the encoding from the measurement while the prior represents the koopman forecast. To align the two representation, an alignment loss is computed and minimised. In (Lusch, Kutz, and Brunton 2018; Azencot et al. 2020) the 𝒦𝒦\mathcal{K}caligraphic_K is composed recursively with z^0subscript^𝑧0\hat{z}_{0}over^ start_ARG italic_z end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT to generate all the future z^^𝑧\hat{z}over^ start_ARG italic_z end_ARG. This leads to poor performance and drift especially for multivariate high dimensional non-stationary data as shown in (Fathi et al. 2024). This also leads to poor generalisation across state space not observed in the look-back window. To mitigate this, during inference for LTSF task we make use of periodic re-encoding that is after every kthsuperscript𝑘𝑡k^{th}italic_k start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT windows we use the decoder to generate predicted measurement window (k+1)thsuperscript𝑘1𝑡(k+1)^{th}( italic_k + 1 ) start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT and use the encoding model to get state representation.

Upon adding the output of (9) and (8) we can get the state representation for the next segment. For each prediction cycle we use (9) and (8) to predict all the segments in the future window and stack them in order to get Z^ksubscript^𝑍𝑘\hat{Z}_{k}over^ start_ARG italic_Z end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and Rksubscript𝑅𝑘R_{k}italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT respectively. Similarly we stack and add the output of the encoders in (7) to get Reconstruction of (k1)thsuperscript𝑘1𝑡(k-1)^{th}( italic_k - 1 ) start_POSTSUPERSCRIPT italic_t italic_h end_POSTSUPERSCRIPT window.

Y~k=ψ(Z^k+Rk),Y^k1=ψ(Zk1+Rk1)formulae-sequencesubscript~𝑌𝑘𝜓subscript^𝑍𝑘subscript𝑅𝑘subscript^𝑌𝑘1𝜓subscript𝑍𝑘1subscript𝑅𝑘1\tilde{Y}_{k}=\psi(\hat{Z}_{k}+R_{k}),\quad\hat{Y}_{k-1}=\psi(Z_{k-1}+R_{k-1})over~ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_ψ ( over^ start_ARG italic_Z end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , over^ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT = italic_ψ ( italic_Z start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT + italic_R start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT ) (10)

here ψ:s𝒴s×w:𝜓superscript𝑠superscript𝒴𝑠𝑤\psi:\mathcal{H}^{s}\rightarrow\mathcal{Y}^{s\times w}italic_ψ : caligraphic_H start_POSTSUPERSCRIPT italic_s end_POSTSUPERSCRIPT → caligraphic_Y start_POSTSUPERSCRIPT italic_s × italic_w end_POSTSUPERSCRIPT is the decoder that maps from state space to the segmented view of the measurement space. We reshape Y~ksubscript~𝑌𝑘\tilde{Y}_{k}over~ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and Y^k1subscript^𝑌𝑘1\hat{Y}_{k-1}over^ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT back to 𝒴τsuperscript𝒴𝜏\mathcal{Y}^{\tau}caligraphic_Y start_POSTSUPERSCRIPT italic_τ end_POSTSUPERSCRIPT. Therefore by continuously evolving the physical and residual components and decoding their output (10) we can generate Y~ksubscript~𝑌𝑘\tilde{Y}_{k}over~ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and reconstructed trajectory Y^k1subscript^𝑌𝑘1\hat{Y}_{k-1}over^ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT.

Correction Model

Table 1: Multivariate long-term time series forecasting results. We evaluate all the methods across forecast horizons of H{96,192,336,720}𝐻96192336720H\in\{96,192,336,720\}italic_H ∈ { 96 , 192 , 336 , 720 } in all datasets. We take the average of 10 iterations for KODA. Bold represents the best and underlined represents the second best.

Models KODA (ours) SegRNN (2023) PatchTST (2023) FEDformer (2022) Informer (2021) TiDE (2023) Dlinear (2023) KOOPA (2023) KNF (2022) MICN (2023) TimesNet (2023) Metric MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE ETTh1 96 0.339 0.359 0.341 0.376 0.376 0.402 0.376 0.415 0.941 0.769 0.375 0.398 0.374 0.399 0.374 0.408 0.457 0.459 0.421 0.431 0.384 0.402 192 0.356 0.388 0.385 0.402 0.413 0.429 0.423 0.446 1.007 0.786 0.412 0.422 0.405 0.416 0.425 0.446 0.518 0.520 0.474 0.487 0.436 0.429 336 0.394 0.404 0.403 0.417 0.429 0.438 0.444 0.462 1.038 0.784 0.435 0.433 0.439 0.443 0.437 0.472 0.592 0.551 0.569 0.551 0.491 0.469 720 0.411 0.451 0.434 0.447 0.448 0.468 0.469 0.492 1.144 0.857 0.454 0.465 0.472 0.49 0.461 0.498 0.784 0.596 0.775 0.672 0.521 0.538 ETTh2 96 0.255 0.324 0.263 0.321 0.274 0.337 0.332 0.374 1.549 0.952 0.284 0.336 0.289 0.353 0.297 0.355 0.433 0.446 0.299 0.364 0.34 0.374 192 0.316 0.361 0.321 0.367 0.341 0.382 0.407 0.446 3.792 1.542 0.332 0.381 0.383 0.418 0.356 0.394 0.528 0.503 0.441 0.454 0.402 0.414 336 0.327 0.385 0.325 0.374 0.329 0.384 0.410 0.447 4.215 1.642 0.365 0.407 0.448 0.465 0.385 0.421 0.631 0.617 0.654 0.567 0.452 0.452 720 0.362 0.416 0.394 0.424 0.379 0.422 0.412 0.469 3.656 1.619 0.419 0.451 0.605 0.551 0.427 0.476 0.795 0.738 0.956 0.716 0.462 0.468 ETTm1 96 0.281 0.331 0.284 0.339 0.293 0.336 0.326 0.39 0.626 0.56 0.306 0.349 0.299 0.344 0.296 0.344 0.341 0.394 0.316 0.362 0.338 0.375 192 0.323 0.362 0.319 0.375 .335 0.371 0.365 0.415 0.725 0.619 0.335 0.366 0.335 0.368 0.339 0.378 0.414 0.428 0.363 0.390 0.374 0.387 336 0.341 0.379 0.349 0.383 0.369 0.392 0.392 0.425 1.005 0.741 0.364 0.381 0.369 0.386 0.381 0.392 0.426 0.440 0.408 0.426 0.410 0.411 720 0.409 0.420 0.407 0.418 0.416 0.420 0.446 0.458 1.133 0.845 0.413 0.418 0.425 0.421 0.435 0.438 0.499 0.467 0.481 0.476 0.478 0.458 ETTm2 96 0.154 0.237 0.158 0.241 0.166 0.256 0.182 0.271 0.355 0.462 0.161 0.251 0.167 0.260 0.171 0.254 0.312 0.328 0.179 0.275 0.187 0.267 192 0.211 0.278 0.215 0.283 0.223 0.296 0.252 0.318 0.595 0.586 0.215 0.281 0.224 0.303 0.226 0.298 0.425 0.490 0.307 0.376 0.249 0.309 336 0.267 0.325 0.263 0.317 0.274 0.329 0.324 0.364 1.277 0.871 0.267 0.326 0.281 0.342 0.287 0.362 0.491 0.547 0.325 0.388 0.321 0.351 720 0.322 0.358 0.328 0.366 0.362 0.385 0.410 0.424 3.001 1.267 0.352 0.383 0.397 0.421 0.391 0.406 0.568 0.612 0.502 0.493 0.408 0.403 Electricity 96 0.131 0.220 0.128 0.219 0.129 0.222 0.186 0.302 0.304 0.393 0.132 0.229 0.142 0.237 0.136 0.236 0.198 0.284 0.164 0.269 0.168 0.272 192 0.144 0.225 0.148 0.239 0.147 0.249 0.197 0.311 0.327 0.417 0.147 0.243 0.153 0.249 0.156 0.255 0.245 0.320 0.177 0.285 0.184 0.289 336 0.157 0.251 0.166 0.258 0.163 0.259 0.213 0.328 0.333 0.422 0.161 0.261 0.169 0.267 0.178 0.281 0.281 0.351 0.193 0.304 0.198 0.309 720 0.189 0.293 0.201 0.298 0.197 0.291 0.233 0.344 0.351 0.427 0.196 0.294 0.203 0.301 0.210 0.314 0.304 0.387 0.212 0.321 0.22 0.321 Traffic 96 0.397 0.227 0.543 0.235 0.361 0.249 0.576 0.359 0.733 0.412 0.334 0.253 0.410 0.282 0.401 0.275 0.643 0.376 0.519 0.309 0.593 0.321 192 0.396 0.248 0.567 0.242 0.379 0.256 0.619 0.381 0.777 0.435 0.342 0.257 0.423 0.287 0.405 0.284 0.699 0.405 0.537 0.315 0.617 0.336 336 0.403 0.260 0.602 0.262 0.393 0.264 0.608 0.375 0.776 0.434 0.355 0.261 0.436 0.296 0.442 0.301 0.728 0.429 0.534 0.313 0.629 0.336 720 0.428 0.277 0.671 0.281 0.432 0.286 0.621 0.375 0.827 0.466 0.388 0.273 0.466 0.315 0.468 0.309 0.782 0.440 0.577 0.325 0.641 0.352 Weather 96 0.147 0.193 0.142 0.181 0.149 0.198 0.238 0.314 0.354 0.405 0.166 0.222 0.176 0.237 0.154 0.205 0.293 0.308 0.161 0.229 0.172 0.226 192 0.184 0.239 0.187 0.227 0.194 0.238 0.275 0.329 0.419 0.434 0.209 0.263 0.227 0.282 0.195 0.247 0.406 0.457 0.225 0.281 0.219 0.261 336 0.231 0.258 0.237 0.270 0.245 0.282 0.339 0.377 0.583 0.543 0.254 0.301 0.265 0.319 0.247 0.306 0.462 0.492 0.278 0.331 0.280 0.306 720 0.305 0.312 0.311 0.327 0.314 0.325 0.389 0.409 0.916 0.705 0.310 0.341 0.323 0.362 0.316 0.351 0.517 0.527 0.311 0.356 0.365 0.359

When the model is used to make predictions over long time horizons the predictions starts to experience drift from the actual trajectory. In many use cases the measurement data is sparsely and irregularly available the long prediction window. Using a data assimilation-based framework we can make use of these measurements to correct the drift present in the predictions in a systematic way.

The extended Kalman filter (EKF) framework is a principled framework which updates the predicted states in a NLDS as a function of the error between the observed and predicted measurements (Särkkä and Svensson 2023). Assuming data Yksubscript𝑌𝑘Y_{k}italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is available, we can define the correction equation inspired by the EKF as

[ZkRk]=[Z^kRk]+[Lg00Lr][JψTJψT][Y~kYk].matrixsubscriptsuperscript𝑍𝑘subscriptsuperscript𝑅𝑘matrixsubscript^𝑍𝑘subscript𝑅𝑘matrixsubscript𝐿𝑔00subscript𝐿𝑟matrixsuperscriptsubscript𝐽𝜓𝑇superscriptsubscript𝐽𝜓𝑇matrixsubscript~𝑌𝑘subscript𝑌𝑘\displaystyle\begin{bmatrix}Z^{\prime}_{k}\\ R^{\prime}_{k}\end{bmatrix}=\begin{bmatrix}\hat{Z}_{k}\\ R_{k}\end{bmatrix}+\begin{bmatrix}L_{g}&0\\ 0&L_{r}\end{bmatrix}\begin{bmatrix}J_{\psi}^{T}\\ J_{\psi}^{T}\end{bmatrix}\begin{bmatrix}\tilde{Y}_{k}-Y_{k}\end{bmatrix}\,.[ start_ARG start_ROW start_CELL italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] = [ start_ARG start_ROW start_CELL over^ start_ARG italic_Z end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] + [ start_ARG start_ROW start_CELL italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_J start_POSTSUBSCRIPT italic_ψ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_J start_POSTSUBSCRIPT italic_ψ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL over~ start_ARG italic_Y end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] . (11)

Here Jψsubscript𝐽𝜓J_{\psi}italic_J start_POSTSUBSCRIPT italic_ψ end_POSTSUBSCRIPT is the Jacobian of the decoder function evaluated at the output of the prediction step i.e. h^k=Z^k+Rksubscript^𝑘subscript^𝑍𝑘subscript𝑅𝑘\hat{h}_{k}=\hat{Z}_{k}+R_{k}over^ start_ARG italic_h end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = over^ start_ARG italic_Z end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT. The Jacobian linearizes the nonlinear measurement function around the current predicted state:

Jψ=ψ(;θ)h|h^ksubscript𝐽𝜓evaluated-at𝜓𝜃subscript^𝑘J_{\psi}=\frac{\partial\psi(\cdot;\theta)}{\partial h}\bigg{|}_{\hat{h}_{k}}italic_J start_POSTSUBSCRIPT italic_ψ end_POSTSUBSCRIPT = divide start_ARG ∂ italic_ψ ( ⋅ ; italic_θ ) end_ARG start_ARG ∂ italic_h end_ARG | start_POSTSUBSCRIPT over^ start_ARG italic_h end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT (12)

The output of the correction step hk=Zk+Rksubscriptsuperscript𝑘subscriptsuperscript𝑍𝑘subscriptsuperscript𝑅𝑘h^{\prime}_{k}=Z^{\prime}_{k}+R^{\prime}_{k}italic_h start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_Z start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + italic_R start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is then fed to the decoder to get the corrected prediction of the measurement trajectory.

Kalman Gain

The gain matrices Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT and Lrsubscript𝐿𝑟L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT represent the component of Kalman gain for the global and residual branches. We use a gating mechanism parameterised by neural network to define the Kalman gains,

Lgsubscript𝐿𝑔\displaystyle L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT =tanh(Wzz(Z^k)+Wyg(ϕg(Ykg))+bg),absentsuperscriptsubscript𝑊𝑧𝑧subscript^𝑍𝑘superscriptsubscript𝑊𝑦𝑔subscriptitalic-ϕ𝑔subscriptsuperscript𝑌𝑔𝑘subscript𝑏𝑔\displaystyle=\tanh(W_{z}^{z}(\hat{Z}_{k})+W_{y}^{g}(\phi_{g}(Y^{g}_{k}))+b_{g% })\,,= roman_tanh ( italic_W start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT ( over^ start_ARG italic_Z end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + italic_W start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT ( italic_ϕ start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT ( italic_Y start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) + italic_b start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT ) , (13)
Lrsubscript𝐿𝑟\displaystyle L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT =tanh(Wzr(Rk)+Wyr(ϕr(Ykr))+br).absentsuperscriptsubscript𝑊𝑧𝑟subscript𝑅𝑘superscriptsubscript𝑊𝑦𝑟subscriptitalic-ϕ𝑟subscriptsuperscript𝑌𝑟𝑘subscript𝑏𝑟\displaystyle=\tanh(W_{z}^{r}(R_{k})+W_{y}^{r}(\phi_{r}(Y^{r}_{k}))+b_{r})\,.= roman_tanh ( italic_W start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_R start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + italic_W start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_ϕ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( italic_Y start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) + italic_b start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) . (14)

here Wzz,Wzr,Wyr,Wygsuperscriptsubscript𝑊𝑧𝑧superscriptsubscript𝑊𝑧𝑟superscriptsubscript𝑊𝑦𝑟superscriptsubscript𝑊𝑦𝑔W_{z}^{z},W_{z}^{r},W_{y}^{r},W_{y}^{g}italic_W start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_z end_POSTSUPERSCRIPT , italic_W start_POSTSUBSCRIPT italic_z end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , italic_W start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , italic_W start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_g end_POSTSUPERSCRIPT are all single layer neural networks with ReLU activation. br,bgsubscript𝑏𝑟subscript𝑏𝑔b_{r},b_{g}italic_b start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT are bias terms for Lr,Lgsubscript𝐿𝑟subscript𝐿𝑔L_{r},L_{g}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT respectively.

Learning criterion

Operator Learning Loss

The physical model consists of the following trainable components: (i) latent Koopman model 𝒦𝒦\mathcal{K}caligraphic_K, (ii) state encoding model and (iii) state decoding model. To ensure that the koopman dynamcis is respected by the learned physical model we minimise the following losses,

recon=t=0Hy^kyk2,subscript𝑟𝑒𝑐𝑜𝑛superscriptsubscript𝑡0𝐻subscriptnormsubscript^𝑦𝑘subscript𝑦𝑘2\displaystyle\mathcal{L}_{recon}=\sum_{t=0}^{H}||\hat{y}_{k}-y_{k}||_{2},caligraphic_L start_POSTSUBSCRIPT italic_r italic_e italic_c italic_o italic_n end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_t = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_H end_POSTSUPERSCRIPT | | over^ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , pred=t=1Ty~kyk2,subscript𝑝𝑟𝑒𝑑superscriptsubscript𝑡1𝑇subscriptnormsubscript~𝑦𝑘subscript𝑦𝑘2\displaystyle\mathcal{L}_{pred}=\sum_{t=1}^{T}||\tilde{y}_{k}-y_{k}||_{2},caligraphic_L start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT | | over~ start_ARG italic_y end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ,
alignsubscript𝑎𝑙𝑖𝑔𝑛\displaystyle\mathcal{L}_{align}caligraphic_L start_POSTSUBSCRIPT italic_a italic_l italic_i italic_g italic_n end_POSTSUBSCRIPT =k=1T/kZ^kZk2absentsuperscriptsubscript𝑘1𝑇𝑘subscriptnormsubscript^𝑍𝑘subscript𝑍𝑘2\displaystyle=\sum_{k=1}^{T/k}||\hat{Z}_{k}-Z_{k}||_{2}= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T / italic_k end_POSTSUPERSCRIPT | | over^ start_ARG italic_Z end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_Z start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT

H𝐻Hitalic_H is forecast horizon and T𝑇Titalic_T is lookback window length. To achieve the best performance we train KODA in stages. First we train the prediction model with alignsubscript𝑎𝑙𝑖𝑔𝑛\mathcal{L}_{align}caligraphic_L start_POSTSUBSCRIPT italic_a italic_l italic_i italic_g italic_n end_POSTSUBSCRIPT, predsubscript𝑝𝑟𝑒𝑑\mathcal{L}_{pred}caligraphic_L start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT and reconsubscript𝑟𝑒𝑐𝑜𝑛\mathcal{L}_{recon}caligraphic_L start_POSTSUBSCRIPT italic_r italic_e italic_c italic_o italic_n end_POSTSUBSCRIPT. We then add the correction model and train only using predsubscript𝑝𝑟𝑒𝑑\mathcal{L}_{pred}caligraphic_L start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT. This two stage training process allows us to first learn the prediction model and then in the second stage learn the gain function while further improving the prediction output.

Training Criteria for Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT and Lrsubscript𝐿𝑟L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT

The Kalman gain Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT and Lrsubscript𝐿𝑟L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT depends on whether the measurement data Yksubscript𝑌𝑘Y_{k}italic_Y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is available for assimilation or not. The gating mechanism is first trained separately to allow correction to happen only when measurement data is made available during the inference time. During inference when no data is available the correction component should be 00 such that (14) is only driven by the prediction term. For further details on training criteria and network architecture please refer to the appendix.

Experimental setting

Table 2: Multivariate long-term time series forecasting with data assimilation results for KODA. We evaluate KODA across forecast horizon of H=720𝐻720H=720italic_H = 720 in all datasets. We use α𝛼\alphaitalic_α to denote %percent\%% of measurement data available for online assimilation task in the forecast horizon. We report the mean of 10 runs for each value of α𝛼\alphaitalic_α.

Alpha(α𝛼\alphaitalic_α) ETTh1 ETTh2 ETTm1 ETTm2 Electricity Traffic Weather % MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE 10 0.397 0.439 0.323 0.402 0.391 0.420 0.311 0.358 0.181 0.293 0.411 0.268 0.367 0.295 20 0.378 0.402 0.318 0.489 0.355 0.413 0.289 0.341 0.175 0.278 0.352 0.245 0.336 0.277 30 0.351 0.370 0.296 0.362 0.340 0.392 0.268 0.305 0.173 0.271 0.328 0.219 0.319 0.258

Table 3: State prediction for simple NLDS. KODA(G) represents the model with just the global Koopman component of KODA (without Fourier filtering). We compare different Koopman autoencoder-based models for thr state prediction task. These models are evaluated using the 100×100\times100 × MSE over H=1000𝐻1000H=1000italic_H = 1000.

Data KODA KODA(G) KAE KAE (PE) Pendulum 0.059 0.106 12.023 0.214 Duffing Oscillator 0.208 0.451 23.042 1.198 Lotka-Volterra 0.119 0.128 2.956 0.449 Lorenz’ 63 32.288 49.327 - 83.416

Table 4: We perform ablation study of the KODA framework. All results are for H=720𝐻720H=720italic_H = 720 on ETT benchmark. We compare four different variations of the KODA frameworks.

Dataset ETTh1 ETTh2 ETTm1 ETTm2 Metric MSE MAE MSE MAE MSE MAE MSE MAE KODA(G (w. Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT)) 0.508 0.536 0.442 0.479 0.459 0.468 0.389 0.418 KODA(R (w. Lrsubscript𝐿𝑟L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT)) 0.451 0.497 0.406 0.436 0.453 0.455 0.395 0.422 KODA(P w/o C) 0.429 0.458 0.379 0.425 0.414 0.423 0.329 0.360 KODA(w/o L) 0.417 0.454 0.358 0.418 0.421 0.426 0.328 0.362 KODA(complete) 0.411 0.451 0.362 0.416 0.409 0.420 0.322 0.358

We evaluate KODA across multiple datasets and conduct an extensive comparison across different baselines. We evaluate KODA over two main tasks: (1) Forecasting and (2) Data Assimilation. We also present comparison based on some well-known NLDS datasets for the (3) State Prediction task. Through our experiments we showcase the comparable effectiveness of KODA with state of the art baselines for forecasting and then further improvement with the data assimilation strategy.

Datasets:

We compare KODA across 4 different real world multivariate NLDS time series datasets : (1) weather (Wetterstation), (2) traffic (PeMS), (3) ECL (UCI), (4) ETT (Zhou et al. 2021)(including 4 subsets: ETTh1, ETTh2, ETTm1, ETTm2). We also evaluate the performance of KODA on M4 (SpyrosMakridakis 2018) which is a real world univariate time series on periodically collected marketing data. We use the same data prepossessing and test time split ratio as presented in (Wu et al. 2023).

We use the same datasets for showing results for data assimilation task. We present additional results on state prediction task for some of the well known uni-variate NLDS: (1) Lokta Volterra, (2) Duffing Oscillator, (3) pendulum and (4) Lorenz 63. These NLDS present extremely nonlinear dynamics with varying dimensionality including multiple fixed points. Appendix B for NLDS description.

Baseline:

We choose state of the art methods to compare KODA performance on forecasting and data assimilation tasks. We would like to make the note that most deep learning methods discussed as our baseline don’t have a well defined way for data assimilation. For the comparison on the forecasting task we evaluate our model against MLP based methods: Koopman forecaster (KNF)(Wang et al. 2023b), Koopa (Liu et al. 2024), Dlinear (Zeng et al. 2022), TiDE (Das et al. 2023); Transformer based methods: fedformer (Zhou et al. 2022), PatchTST (Nie et al. 2023); Informer (Zhou et al. 2021) Temporal convolution method: TimesNet (Wu et al. 2023), MICN (Wang et al. 2023a). For the state prediction task from noisy measurements on the known NLDS we compare KODA against vanilla Koopman autoencoder (Lusch, Kutz, and Brunton 2018), MLP and Koopman autoencoder with periodic re-encoding (Fathi et al. 2024). We compare KODA against the method presented in (Frion et al. 2024a) for the data assimilation and forecasting task when data is sparsely present in the lookback window. KODA’s framework allows us to assimilate measurement data into prediction recursively. Hence, to evaluate KODA’s performance we use different amount of noisy measurement present in the forecast window and evaluate model forecast on the rest of the forecast window. We compare this over different sets of noise levels in the measurement data.

Evaluation Metrics:

The most commonly used metric for comparing the accuracy of the forecasting method is: (1) Mean Absolute Error (MAE) and (2) Mean Squared Error (MSE). For further details on the experimental setup, please refer to Appendix B.

Results

Long Term Forecast

In Table 1 we present the results for long term forecasting across several recent state-of-the-art methods. We compare all the methods on four different multivariate time series datasets. The choice of horizon THsubscript𝑇𝐻T_{H}italic_T start_POSTSUBSCRIPT italic_H end_POSTSUBSCRIPT is set to be {96,192,336,720}96192336720\{96,192,336,720\}{ 96 , 192 , 336 , 720 }. Through this task we are solely evaluating KODA’s ability to generate long term accurate forecasts. The lookback window for all the models in Table 1 was set to be TL=720subscript𝑇𝐿720T_{L}=720italic_T start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT = 720. We present additional results across different lookback windows in the appendix. We observe that except for the traffic benchmark, KODA almost beats all the other methods across benchmarks. In the case of traffic dataset we observe that the best performing model were TiDE and PatchTST across several of the prediction horizon length except when H=720𝐻720H=720italic_H = 720, where KODA provides the second best performance. One interesting observation is that KODA outperforms all the other Koopman-based models: KOOPA and KNF. This shows that KODA presents a flexible model that can produce long term accurate forecast for multivariate non-stationary time series datasets. We provide additional results including results on uni-variate time-series benchmark M4 (SpyrosMakridakis 2018) in Appendix C.

Data Assimilation

One of the main contribution of KODA is in its ability to perform online data assimilation during inference time hence making the model highly adaptive. In Table 2 we present the result for the online data assimilation task. We define α𝛼\alphaitalic_α as the percentage of data available in the forecast horizon for assimilation. We show results for α{10%,20%,30%}𝛼percent10percent20percent30\alpha\in\{10\%,20\%,30\%\}italic_α ∈ { 10 % , 20 % , 30 % } across all the benchmarks. For this experiment we chose τ=24𝜏24\tau=24italic_τ = 24 and w=8𝑤8w=8italic_w = 8. For the sake of simplicity we distributed the available measurements along the prediction horizon uniformly. We see a consistent improvement in the forecast results when correction is used. This shows that KODA other than being able to generate accurate long term forecasts, KODA is also able to make use of the data available at a future time to further improve its forecast. This makes KODA framework really potent in tasks where data is irregularly present at different intervals and long term forecasting is required. Additionally KODA can also be used when the data are irregularly present instead of uniformly. We show additional results on assimilation task across benchmarks in the appendix C.

State Prediction

The main attraction of using Koopman operators based approaches is achieving system identification in a complete data driven approach. To demonstrate KODA’s ability for such, we generate 100 trajectories of the different NLDS mentioned above. For further details on data generation process and NLDS desciption refer appendix A. We train the model using first 500 steps of the generated trajectories. During inference we generate the forecast for 1000 steps from the model. We observe that the model is able to generalise to the part of state space that were previously not seen during the training. We present the results in Table 3. Here we compare KODA with Koopman autoencoder methods proposed in literature (Lusch, Kutz, and Brunton 2018; Fathi et al. 2024). We also evaluate KODA and KODA (G) separately to evaluate the effectiveness of the physical branch of KODA. We observe that not only KODA but KODA (G) performs much better than KAE and KAE (PE). This attributed to the structure of our training procedure that doesn’t enforce the whole trajectory in the latent space to be globally linear but only enforces local linearity in the latent space. This is similar to the course correction achieved in (Fathi et al. 2024) using periodic re-encoding. Yet our model is largely different from the one used in (Fathi et al. 2024). We can observe that both KODA and KODA (G) performs better than KAE(PE) on all four datasets.

Ablation Study

To compare the efficacy of different elements of KODA we evaluate 5 different variations of KODA framework across different subsets of ETT dataset. These variations include 1. global model of KODA with the: KODA (G (w. Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT)); 2. Residual model of KODA: KODA (R (w. Lrsubscript𝐿𝑟L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT)); 3. Only the prediction model without any correction component: KODA (P w/o C); 4. Full KODA model but without the Kalman gain gating function L𝐿Litalic_L: KODA (w/o L𝐿Litalic_L) and finally 5. Full KODA model. For KODA (G (w. Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT)) and KODA (R (w. Lrsubscript𝐿𝑟L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT)) we also don’t use the measurement disentanglement. We present the results of the ablation study in Table 4. When we only use the residual component, KODA reduces to an implementation of SegRNN (Lin et al. 2023). Albeit in (Lin et al. 2023) there is no data assimilation framework. KODA(complete) gives the best performance. We also observe a significant jump when the whole prediction model is used versus individual branches. We provide further ablation study of the KODA framework in appendix C.

Discussion

In this paper we proposed a novel framework for time series forecasting and data assimilation named KODA. Through rigorous experimentation across different benchmarks we showed that KODA is able to (1) Generate accurate long term forecast, (2) Generate a disentangled view of the NLDS into local and global components, (3) Use a correction framework to perform online data assimilation, (4) Perform state prediction of the state space for known NLDS. Therefore KODA present a flexible recursive framework that is able to exploit the benefits of koopman operator representation while being able to generate long term forecast and perform data assimilation.

References

  • Afshar, Germ, and Morris (2023) Afshar, S.; Germ, F.; and Morris, K. 2023. Extended Kalman filter based observer design for semilinear infinite-dimensional systems. IEEE Transactions on Automatic Control.
  • Azencot et al. (2020) Azencot, O.; Erichson, N. B.; Lin, V.; and Mahoney, M. 2020. Forecasting sequential data using consistent koopman autoencoders. In International Conference on Machine Learning, 475–485. PMLR.
  • Benosman, Mansour, and Huroyan (2017) Benosman, M.; Mansour, H.; and Huroyan, V. 2017. Koopman-operator observer-based estimation of pedestrian crowd flows. IFAC-PapersOnLine, 50(1): 14028–14033.
  • Berman, Naiman, and Azencot (2023) Berman, N.; Naiman, I.; and Azencot, O. 2023. Multifactor Sequential Disentanglement via Structured Koopman Autoencoders. In The Eleventh International Conference on Learning Representations.
  • Bevanda et al. (2023) Bevanda, P.; Beier, M.; Lederer, A.; Sosnowski, S. G.; Hüllermeier, E.; and Hirche, S. 2023. Koopman Kernel Regression. In Thirty-seventh Conference on Neural Information Processing Systems.
  • Brunton et al. (2016) Brunton, B. W.; Johnson, L. A.; Ojemann, J. G.; and Kutz, J. N. 2016. Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. Journal of neuroscience methods, 258: 1–15.
  • Brunton et al. (2021) Brunton, S. L.; Budišić, M.; Kaiser, E.; and Kutz, J. N. 2021. Modern Koopman theory for dynamical systems. arXiv preprint arXiv:2102.12086.
  • Brunton, Proctor, and Kutz (2016) Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2016. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences, 113(15): 3932–3937.
  • Cai et al. (2021) Cai, S.; Mao, Z.; Wang, Z.; Yin, M.; and Karniadakis, G. E. 2021. Physics-informed neural networks (PINNs) for fluid mechanics: A review. Acta Mechanica Sinica, 37(12): 1727–1738.
  • Cheng, Dong, and Lapata (2016) Cheng, J.; Dong, L.; and Lapata, M. 2016. Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733.
  • Chung et al. (2014) Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • Cuomo et al. (2022) Cuomo, S.; Di Cola, V. S.; Giampaolo, F.; Rozza, G.; Raissi, M.; and Piccialli, F. 2022. Scientific machine learning through physics–informed neural networks: Where we are and what’s next. Journal of Scientific Computing, 92(3): 88.
  • Das et al. (2023) Das, A.; Kong, W.; Leach, A.; Mathur, S. K.; Sen, R.; and Yu, R. 2023. Long-term Forecasting with TiDE: Time-series Dense Encoder. Transactions on Machine Learning Research.
  • Fathi et al. (2024) Fathi, M.; Gehring, C.; Pilault, J.; Kanaa, D.; Bacon, P.-L.; and Goroshin, R. 2024. Course Correcting Koopman Representations. In The Twelfth International Conference on Learning Representations.
  • Frion et al. (2024a) Frion, A.; Drumetz, L.; Dalla Mura, M.; Tochon, G.; and El Bey, A. A. 2024a. Neural Koopman prior for data assimilation. IEEE Transactions on Signal Processing.
  • Frion et al. (2024b) Frion, A.; Drumetz, L.; Tochon, G.; Mura, M. D.; and Bey, A. A. E. 2024b. Koopman Ensembles for Probabilistic Time Series Forecasting. arXiv preprint arXiv:2403.06757.
  • Guen and Thome (2020) Guen, V. L.; and Thome, N. 2020. Disentangling physical dynamics from unknown factors for unsupervised video prediction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 11474–11484.
  • Imbiriba et al. (2023) Imbiriba, T.; Straka, O.; Duník, J.; and Closas, P. 2023. Augmented physics-based machine learning for navigation and tracking. IEEE Transactions on Aerospace and Electronic Systems.
  • Jiang et al. (2022) Jiang, W.; long Zhang, X.; Zuo, Z.; Shi, M.; and Su, S. 2022. Data-driven Kalman Filter with Kernel-based Koopman Operators for Nonlinear Robot Systems. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 12872–12878. IEEE.
  • Kawahara (2016) Kawahara, Y. 2016. Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis. Advances in neural information processing systems, 29.
  • Koopman (1931) Koopman, B. O. 1931. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5): 315–318.
  • Kovachki et al. (2023) Kovachki, N.; Li, Z.; Liu, B.; Azizzadenesheli, K.; Bhattacharya, K.; Stuart, A.; and Anandkumar, A. 2023. Neural operator: Learning maps between function spaces with applications to pdes. Journal of Machine Learning Research, 24(89): 1–97.
  • Kutz et al. (2016) Kutz, J. N.; Brunton, S. L.; Brunton, B. W.; and Proctor, J. L. 2016. Dynamic mode decomposition: data-driven modeling of complex systems. SIAM.
  • Lam et al. (2022) Lam, R.; Sanchez-Gonzalez, A.; Willson, M.; Wirnsberger, P.; Fortunato, M.; Alet, F.; Ravuri, S.; Ewalds, T.; Eaton-Rosen, Z.; Hu, W.; et al. 2022. GraphCast: Learning skillful medium-range global weather forecasting. arXiv preprint arXiv:2212.12794.
  • Li et al. (2020) Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; and Anandkumar, A. 2020. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895.
  • Lin et al. (2023) Lin, S.; Lin, W.; Wu, W.; Zhao, F.; Mo, R.; and Zhang, H. 2023. Segrnn: Segment recurrent neural network for long-term time series forecasting. arXiv preprint arXiv:2308.11200.
  • Liu et al. (2024) Liu, Y.; Li, C.; Wang, J.; and Long, M. 2024. Koopa: Learning non-stationary time series dynamics with koopman predictors. Advances in Neural Information Processing Systems, 36.
  • Lusch, Kutz, and Brunton (2018) Lusch, B.; Kutz, J. N.; and Brunton, S. L. 2018. Deep learning for universal linear embeddings of nonlinear dynamics. Nature communications, 9(1): 4950.
  • Nie et al. (2023) Nie, Y.; Nguyen, N. H.; Sinthong, P.; and Kalagnanam, J. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In The Eleventh International Conference on Learning Representations.
  • Pathak et al. (2022) Pathak, J.; Subramanian, S.; Harrington, P.; Raja, S.; Chattopadhyay, A.; Mardani, M.; Kurth, T.; Hall, D.; Li, Z.; Azizzadenesheli, K.; et al. 2022. Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv preprint arXiv:2202.11214.
  • Phillips (1961) Phillips, J. 1961. Generalized Koopmans’ Theorem. Physical Review, 123(2): 420.
  • Raissi, Perdikaris, and Karniadakis (2019) Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378: 686–707.
  • Revach et al. (2022) Revach, G.; Shlezinger, N.; Ni, X.; Escoriza, A. L.; Van Sloun, R. J.; and Eldar, Y. C. 2022. KalmanNet: Neural network aided Kalman filtering for partially known dynamics. IEEE Transactions on Signal Processing, 70: 1532–1547.
  • Salehinejad et al. (2017) Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; and Valaee, S. 2017. Recent advances in recurrent neural networks. arXiv preprint arXiv:1801.01078.
  • Särkkä and Svensson (2023) Särkkä, S.; and Svensson, L. 2023. Bayesian filtering and smoothing, volume 17. Cambridge university press.
  • Singh et al. (2024) Singh, A.; Borsoi, R. A.; Erdogmus, D.; and Imbiriba, T. 2024. Learning semilinear neural operators: A unified recursive framework for prediction and data assimilation. In The Twelfth International Conference on Learning Representations.
  • Singh et al. (2021) Singh, A.; Westlin, C.; Eisenbarth, H.; Losin, E. A. R.; Andrews-Hanna, J. R.; Wager, T. D.; Satpute, A. B.; Barrett, L. F.; Brooks, D. H.; and Erdogmus, D. 2021. Variation is the norm: Brain state dynamics evoked by emotional video clips. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 6003–6007. IEEE.
  • SpyrosMakridakis (2018) SpyrosMakridakis. 2018. M4 dataset.
  • Takeishi, Kawahara, and Yairi (2017) Takeishi, N.; Kawahara, Y.; and Yairi, T. 2017. Learning Koopman invariant subspaces for dynamic mode decomposition. Advances in neural information processing systems, 30.
  • Tu (2013) Tu, J. H. 2013. Dynamic mode decomposition: Theory and applications. Ph.D. thesis, Princeton University.
  • Wang et al. (2023a) Wang, H.; Peng, J.; Huang, F.; Wang, J.; Chen, J.; and Xiao, Y. 2023a. MICN: Multi-scale Local and Global Context Modeling for Long-term Series Forecasting. In The Eleventh International Conference on Learning Representations.
  • Wang et al. (2023b) Wang, R.; Dong, Y.; Arik, S.; and Yu, R. 2023b. Koopman Neural Forecaster for Time Series with Temporal Distribution Shifts. In International Conference on Learning Representations (ICLR).
  • Wang, Xu, and Mu (2023) Wang, X.; Xu, X.; and Mu, Y. 2023. Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10597–10607.
  • Wu et al. (2023) Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; and Long, M. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In The Eleventh International Conference on Learning Representations.
  • Yeung, Kundu, and Hodas (2019) Yeung, E.; Kundu, S.; and Hodas, N. 2019. Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), 4832–4839. IEEE.
  • Zeng et al. (2022) Zeng, A.; Chen, M.; Zhang, L.; and Xu, Q. 2022. Are Transformers Effective for Time Series Forecasting? arXiv preprint arXiv:2205.13504.
  • Zhou et al. (2021) Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; and Zhang, W. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, 11106–11115.
  • Zhou et al. (2022) Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; and Jin, R. 2022. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning, 27268–27286. PMLR.