KODA: A Data-Driven Recursive Model for Forecasting and Data Assimilation Using Koopman operator

Ashutosh Singh¹, Ashish Singh³, Tales Imbiriba^{1, 2}, Deniz Erdogmus¹, Ricardo Borsoi⁴

KODA: A Data-Driven Recursive Model for Time Series Forecasting and data assimilation using koopman operators

Ashutosh Singh¹, Ashish Singh³, Tales Imbiriba^{1, 2}, Deniz Erdogmus¹, Ricardo Borsoi⁴

Abstract

Approaches based on Koopman operators have shown great promise in forecasting time series data generated by complex nonlinear dynamical systems (NLDS). Although such approaches are able to capture the latent state representation of a NLDS, they still face difficulty in long term forecasting when applied to real world data. Specifically many real-world NLDS exhibit time-varying behavior, leading to nonstationarity that is hard to capture with such models. Furthermore they lack a systematic data-driven approach to perform data assimilation, that is, exploiting noisy measurements on the fly in the forecasting task. To alleviate the above issues, we propose a Koopman operator-based approach (named KODA - Koopman Operator with Data Assimilation) that integrates forecasting and data assimilation in NLDS. In particular we use a Fourier domain filter to disentangle the data into a physical component whose dynamics can be accurately represented by a Koopman operator, and residual dynamics that represents the local or time varying behavior that are captured by a flexible and learnable recursive model. We carefully design an architecture and training criterion that ensures this decomposition lead to stable and long-term forecasts. Moreover, we introduce a course correction strategy to perform data assimilation with new measurements at inference time. The proposed approach is completely data-driven and can be learned end-to-end. Through extensive experimental comparisons we show that KODA outperforms existing state of the art methods on multiple time series benchmarks such as electricity, temperature, weather, lorenz 63 and duffing oscillator demonstrating its superior performance and efficacy along the three tasks a) forecasting, b) data assimilation and c) state prediction.

Introduction

Many real world applications tackle phenomena which are dynamic in nature. Measuring and tracking the evolution of such phenomena holds extreme importance in many fields such as weather forecasting (Pathak et al. 2022; Lam et al. 2022), evolution of PDEs (Li et al. 2020; Kovachki et al. 2023), modeling neural dynamics (Brunton et al. 2016; Singh et al. 2021), etc. Most of these phenomena are characterised as nonlinear dynamical systems (NLDS). If prior knowledge about the underlying system’s evolution is known then it can inform modeling, see, e.g., (Cai et al. 2021; Cuomo et al. 2022; Raissi, Perdikaris, and Karniadakis 2019; Imbiriba et al. 2023). However, in most scenarios no direct information is available about the physics of the underlying system. Therefore, for such cases we can only depend on the measurement data to model system’s dynamics (Guen and Thome 2020; Afshar, Germ, and Morris 2023; Frion et al. 2024b). Key motivations for such modeling is to evolve the model and generate accurate forecasts over long horizon. But the non-linear and non-stationary characteristic of these system makes it hard to capture the state representation or generate accurate forecasts. One classical approach is to model the NLDS in a representation space where the system dynamics is linear. Using the tools from spectral theory and linear algebra we can study various characteristics of the NLDS such as stability, asymptotic nature etc.

Refer to caption — Figure 1: Pictorial representation of data assimilation using a recursive prediction-correction. The blue line represents actual state trajectory while the dotted line represents the trajectory generated by the learned model. Red triangles represent the model prediction while the green squares represent the corrected prediction using the noisy measurement represented by the blue circles.

Koopman operator theory (Koopman 1931; Phillips 1961) generates a representation with a linear dynamics from the NLDS. It achieves this by mapping the system’s dynamics into an infinite-dimensional space using a set of measurement functions. Given that learning such a mapping is computationally infeasible, numerous algorithms have been proposed in the literature which seek to approximate the Koopman operator in finite dimensions and effectively describe the NLDS (Brunton et al. 2021). In recent years, deep learning algorithms have emerged to learn these operators directly from the measurement snapshot data to predict the system’s state space evolution. Despite these advancements, such methods are still susceptible to error accumulation, which can lead to significant drift in state space estimation during long-term forecasting. This calls for an adaptive framework that can utilise measurement data on the fly to correct its prediction. 1 represents a prediction-correction mechanism for modeling a dynamical system. When measurements snapshots are available they are used to correct prediction error.

The Kalman filter presents a systematic way for state space prediction and correction mechanism using measurement data. Many deep learning methods have been recently proposed that takes inspiration from the traditional Kalman filter framework, motivating the use of neural network architectures in an adaptive manner to produce robust long term forecasts (Revach et al. 2022; Singh et al. 2024; Guen and Thome 2020).

In this paper, we propose a branched architecture that disentangles the dynamics into global and local (residual) components in a segmented fashion. This allows us to recover the dynamics of the underlying global factor while being robust to noise and local variations within each segment. Additionally, by not discarding the local sources of variation and capturing it with a recurrent residual model we are able to generate long term time series forecasting (LTSF). We further motivate a data assimilation framework for this disentangled view of the state space to perform online data assimilation during inference. The main contributions of this work are:

•

The proposed approach is able to disentangle the physical and residual components of the dynamics and evolve them separately to generate forecasts.
•

We introduce a Kalman filter-inspired data assimilation framework with the Koopman operator-based prediction model.
•

The proposed approach can leverage data available during inference time to improve the prediction of the model hence making it possible to generate accurate forecasts over long horizons, hence presenting a solution for LTSF.
•

The proposed approach is one of the first works that extends data assimilation to a disentangled state representation-based model.

Related Works

Learning with Koopman Operator

Koopman operator theory (Brunton et al. 2021) serves as a potent tool for unveiling the inherent dynamics of non-linear systems. Notably, the recent surge in interest surrounding Koopman operators can be attributed to the strong theoretical foundations and empirical success of such algorithms. Most famous among them is Dynamic Mode Decomposition (Tu 2013), a data-driven approach (Brunton, Proctor, and Kutz 2016; Kutz et al. 2016) enabling a practical approximation of the Koopman operator. Recent works (see, e.g., (Kawahara 2016; Lusch, Kutz, and Brunton 2018; Yeung, Kundu, and Hodas 2019; Takeishi, Kawahara, and Yairi 2017; Bevanda et al. 2023)) further explored learning theory for estimating Koopman operators. In particularly, the Koopman autoencoder architecture proposed in (Azencot et al. 2020) and (Lusch, Kutz, and Brunton 2018) has become one of the most widely used deep learning models for this task. Since then several work have been proposed in the intersection of learning theory and Koopman operators (Fathi et al. 2024; Wang, Xu, and Mu 2023; Berman, Naiman, and Azencot 2023). The recent models proposed in (Liu et al. 2024; Wang et al. 2023b), proposed Koopman-based neural forecasting methods by disentangling the system’s dynamics into local and global components. However, they lack a systematic approach for data assimilation during inference time.

Data Assimilation

Some of the earlier work that combine Koopman operators with the Kalman filter are (Benosman, Mansour, and Huroyan 2017), where the author formulates a linear observer design using Koopman operators to predict crowd flow, and (Jiang et al. 2022) where the authors use a kernel-based Koopman operator for robotic systems. Unlike these works, the proposed approach is a deep learning-based recursive approach. (Frion et al. 2024a) is one of the first works that motivates data assimilation with a neural Koopman operator. While the model proposed in (Frion et al. 2024a) uses a Koopman operator as prior for variational data assimilation, in KODA we adopt a branched prediction model within a Kalman filter-inspired framework for both forecasting and assimilation. Additionally, unlike (Frion et al. 2024a), KODA can achieve online data assimilation during inference. Similarly, in (Singh et al. 2024) the authors propose a Kalman filter-based data assimilation using neural operators for semilinear PDEs. Differently, our method motivates data assimilation jointly with a Koopman operator-based prediction model and a parallel residual model. We also don’t assume access to a ground truth of a latent state representation of the system and instead learn to evolve the model just from measurement data. In (Frion et al. 2024b) authors propose a way of uncertainty quantification for time series forecasting using Koopman ensembles.

Problem Setting

In this section, we formally define the problem of learning a model that can not only produce solution for LTSF problem, provided sufficient historical data, but also utilise new data available during inference to correct the model prediction. The underlying physical quantity whose dynamic we observe is referred to as states and the observed data itself is referred to as measurements of the state. Therefore, let us denote $s(t)\in\mathcal{H}$ as the state representation, and by $y(t)\in\mathcal{Y}$ as the representation of the measurement snapshots. $\mathcal{H}$ and $\mathcal{Y}$ are finite dimensional vector spaces. Hence a NLDS can be described as:

	$\displaystyle\frac{\partial s(t)}{\partial t}=\mathcal{A}(s(t))+\eta(t)$		(1)
	$\displaystyle y(t)=\mathcal{B}(s(t))+\epsilon(t)$		(2)

where operator $\mathcal{A}:\mathcal{H}\to\mathcal{H}$ models the evolution of the states, $\mathcal{B}:\mathcal{H}\to\mathcal{Y}$ represents the mapping from the states to the measurements, $\eta(t)\in\mathcal{H}$ denote a process noise which represents possible stochasticity in the state evolution, and $\epsilon(t)\in\mathcal{Y}$ denotes measurement noise.
The main objective of the forecasting problem is to obtain accurate estimates of the future measurement trajectory $\tilde{y}_{t+\tau}=\mathcal{B}(s(t+\tau))$ , $\tau\in\{1,2,3,\ldots\}$ based on past measurements $\{y_{0},\ldots,y_{t}\}$ , where $y_{t}$ denote a discrete measurement of $y(t)$ . Note that while the NLDS (2) might be described in continuous time, we consider the discrete-time forecasting problem, based on discrete measurements. In order to keep notation clear, we denote the discrete time index as a subscript. Considering some training dataset with different realizations of $\{y_{1},\ldots,y_{T}\}$ , the forecasting objective can be formulated as learning an operator $\Phi:\mathcal{H}^{t}\times\mathbb{N}_{*}\to\mathcal{H}$ that predicts the future measurements as $\tilde{y}_{t+\tau}=\Phi(y_{1},\ldots,y_{t};\tau)$ . The above objective is generally optimised by minimizing the prediction loss between the forecast and ground truth during training. The data assimilation problem extends this task by including the possibility of incorporating a sparse set of measurements, irregularly present along the forecast horizon, on the fly to improve the forecast estimate.

Note that learning such a model is not easy since the amount of training data might be limited, and the NLDS under consideration can be highly nonlinear. Moreover the measurements $y(t)$ are often accompanied with noise. This makes it harder to avoid drift in the forecast over time. In the section below, we introduce KODA a framework for generating long term forecasts and preforming data assimilation, while enabling us to study the properties of the NLDS through the learned koopman operator.

KODA

In this section, we formally propose KODA and its various elements. KODA adopts a disentangled view of the state representation separating it into a physical and a residual component 2. To achieve this purely from the measurement data, we separate the dominant spectrum across the data using a Fourier filter. We then use an encoding model that learns the physical and residual components, of the state representation, from the dominant and non-dominant parts of the measurement data. Using separate prediction models, for both the state space components, we generate their future estimates. Upon adding the predicate from the two models we compute the future state estimates. We then use a decoder to map the state estimate back to the measurement space, hence recovering the future forecast. We perform this recursively in a windowed fashion to generate future trajectory. At inference time whenever the measurement data is available we use it in the correction mechanism of KODA to update the model predictions. Hence KODA presents a framework that is purely learning based and data driven.

Data Encoding Scheme

State Disentanglement

In order to facilitate the task of LTSF, we aim to learn a latent representation $h_{t}\in\mathcal{H}$ . This representation act as the learning based approximation of the true $s_{t}$ in (2). Assume that $h_{t}$ comprises of multiple sources of variation (physical and residual) we use the following disentanglement of the state trajectory,

\displaystyle h_{t}=z_{t}+r_{t}\,,

(3)

where $z_{t}\in\mathcal{H}$ represents the underlying physical component while $r_{t}\in\mathcal{H}$ represents the residual element of the state trajectory. By disentangling the dynamics into these two components we aim at recovering the physical component i.e. the stable component (time invariant in the selected window). The benefit of such disentanglement is that we are able to separate physical component from the local ones. The physical component is expected to capture sources of variation that cause long term change and are stable in smaller window analysis of which could shed light on the spectral property of the underlying system. Due to its usefulness this disentangled view of the dynamics have garnered interest in recent literature (Guen and Thome 2020), (Liu et al. 2024) and (Wang et al. 2023b).

Measurement Disentanglement

Since no ground truth is available about the true state, we learn to approximate the state representations purely from the measurement data. We use Fourier filtering to find and separate the dominant spectrum present across the measurement space. To achieve the best performance for LTSF we pre-computed a filter by taking all the $\tau$ length windows in the training data and extracted the dominant spectrum across them. We define $Y_{k}=\{y_{t},y_{t+1},...,y_{t+\tau}\}$ as the sliding window over the data with $\tau\in\mathbb{N}_{*}$ as the window length. We use this Fourier filtering to represent the resulting disentanglement in the measurement space as,

$\displaystyle Y^{g}_{k}$	$\displaystyle=\mathcal{F}^{-1}(G_{\alpha}(\mathcal{F}(Y_{k})))$	(4)
$\displaystyle Y^{r}_{k}$	$\displaystyle=\mathcal{F}^{-1}(\overline{G}_{\alpha}(\mathcal{F}(Y_{k})))$	(5)
$\displaystyle\text{s.t.}\quad Y_{k}$	$\displaystyle=Y^{g}_{k}+Y^{r}_{k}$	(6)

here $G_{\alpha}$ represents the filter, where $\alpha$ is a hyper-parameter, and $\overline{G}_{\alpha}$ represents its complement which is designed such that (6) is satisfied, guaranteeing no information loss. $\mathcal{F}$ and $\mathcal{F}^{-1}$ represents Fourier transform and its inverse applied in the time dimension. Both $y^{g}_{t}$ and $y^{r}_{t}$ are defined such that $y^{g}_{t}\in\mathcal{Y}$ and $y^{r}_{t}\in\mathcal{Y}$ . We assume that whenever the data is available it always exist as a block of $\tau$ samples. In many application, such as electricity, motor fault analysis etc, it is quite common that the data is collected for a small window of time at irregular intervals. Hence during the data assimilation task we assume that data is irregularly present as a block of $\tau$ .

Segmentation and Encoding

Within each $k^{th}$ window we create further $s$ segments of segment length $w=\tau/s$ . Hence $Y^{g}_{k}=\{\mathbb{y}^{g}_{k1},\mathbb{y}^{g}_{k2},...,\mathbb{y}^{g}_{ks}\}$ after reshaping, where $\mathbb{y}^{g}_{ki}$ represents the $i^{th}$ segment of $k^{th}$ window. Same segmentation is used for $Y^{r}_{k}$ . Therefore both $\mathbb{y}^{g}_{ki}\in\mathcal{Y}^{w}$ and $\mathbb{y}^{r}_{ki}\in\mathcal{Y}^{w}$ . We use identical encoder architecture $\phi_{g}:\mathcal{Y}^{w}\rightarrow\mathcal{H}$ and $\phi_{r}:\mathcal{Y}^{w}\rightarrow\mathcal{H}$ for $Y^{g}_{k}$ and $Y^{r}_{k}$ respectively, to then get $Z_{k}=\{z_{k1},z_{k2},...,z_{ks}\}$ and $R_{k}=\{r_{k1},r_{k2},...,r_{ks}\}$ the respective representations as

\displaystyle z_{ki}=\phi_{g}(\mathbb{y}^{g}_{ki})\,,\quad r_{ki}=\phi_{r}(% \mathbb{y}^{r}_{ki})\,.

(7)

Previous work (Lin et al. 2023) have shown that segment wise iteration leads to better forecast for the LTSF problem instead of point wise iteration.

Prediction Model

Based on the disentanglement of the latent state representation (3), we design a prediction model which is used to propagate the states (7) forward along the temporal dimension. The prediction model consists of two parallel branches: (i) a Global model, which propagates the component $Z_{k}$ , and (ii) a Residual model which propagates the component $R_{k}$ . We describe each of these models below.

Residual Model

Modeling the residual is important for the reconstruction and for the forecast of the measurement trajectory. We can define the residual $R_{t}$ trajectory as follows,

r_{ki}=F_{r}(r_{k(i-1)})

(8)

here $F_{r}:\mathcal{H}\rightarrow\mathcal{H}$ represents the residual model. We use a gated recurrent unit (GRU) (Chung et al. 2014) to then propagate the residual trajectory. Other recurrent architecture such as LSTM (Cheng, Dong, and Lapata 2016) or RNN (Salehinejad et al. 2017) could also be used here.

Physical Model

Our prediction model for the physical component of the signal learns koopman operator $\mathcal{K}:\mathcal{H}\rightarrow\mathcal{H}$ such that,

\displaystyle\hat{z}_{k(i)}

\displaystyle=\mathcal{K}\circ\hat{z}_{k(i-1)}

(9)

many methods in literature has been proposed to estimate the Koopman operator from data. Note that $\hat{z}_{ki}$ in (9) and $z_{ki}$ in (7) are different as the later represents the encoding from the measurement while the prior represents the koopman forecast. To align the two representation, an alignment loss is computed and minimised. In (Lusch, Kutz, and Brunton 2018; Azencot et al. 2020) the $\mathcal{K}$ is composed recursively with $\hat{z}_{0}$ to generate all the future $\hat{z}$ . This leads to poor performance and drift especially for multivariate high dimensional non-stationary data as shown in (Fathi et al. 2024). This also leads to poor generalisation across state space not observed in the look-back window. To mitigate this, during inference for LTSF task we make use of periodic re-encoding that is after every $k^{th}$ windows we use the decoder to generate predicted measurement window $(k+1)^{th}$ and use the encoding model to get state representation.

Upon adding the output of (9) and (8) we can get the state representation for the next segment. For each prediction cycle we use (9) and (8) to predict all the segments in the future window and stack them in order to get $\hat{Z}_{k}$ and $R_{k}$ respectively. Similarly we stack and add the output of the encoders in (7) to get Reconstruction of $(k-1)^{th}$ window.

\tilde{Y}_{k}=\psi(\hat{Z}_{k}+R_{k}),\quad\hat{Y}_{k-1}=\psi(Z_{k-1}+R_{k-1})

(10)

here $\psi:\mathcal{H}^{s}\rightarrow\mathcal{Y}^{s\times w}$ is the decoder that maps from state space to the segmented view of the measurement space. We reshape $\tilde{Y}_{k}$ and $\hat{Y}_{k-1}$ back to $\mathcal{Y}^{\tau}$ . Therefore by continuously evolving the physical and residual components and decoding their output (10) we can generate $\tilde{Y}_{k}$ and reconstructed trajectory $\hat{Y}_{k-1}$ .

Correction Model

Table 1: Multivariate long-term time series forecasting results. We evaluate all the methods across forecast horizons of

H\in\{96,192,336,720\}

in all datasets. We take the average of 10 iterations for KODA. Bold represents the best and underlined represents the second best.

Models KODA (ours) SegRNN (2023) PatchTST (2023) FEDformer (2022) Informer (2021) TiDE (2023) Dlinear (2023) KOOPA (2023) KNF (2022) MICN (2023) TimesNet (2023) Metric MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE ETTh1 96 0.339 0.359 0.341 0.376 0.376 0.402 0.376 0.415 0.941 0.769 0.375 0.398 0.374 0.399 0.374 0.408 0.457 0.459 0.421 0.431 0.384 0.402 192 0.356 0.388 0.385 0.402 0.413 0.429 0.423 0.446 1.007 0.786 0.412 0.422 0.405 0.416 0.425 0.446 0.518 0.520 0.474 0.487 0.436 0.429 336 0.394 0.404 0.403 0.417 0.429 0.438 0.444 0.462 1.038 0.784 0.435 0.433 0.439 0.443 0.437 0.472 0.592 0.551 0.569 0.551 0.491 0.469 720 0.411 0.451 0.434 0.447 0.448 0.468 0.469 0.492 1.144 0.857 0.454 0.465 0.472 0.49 0.461 0.498 0.784 0.596 0.775 0.672 0.521 0.538 ETTh2 96 0.255 0.324 0.263 0.321 0.274 0.337 0.332 0.374 1.549 0.952 0.284 0.336 0.289 0.353 0.297 0.355 0.433 0.446 0.299 0.364 0.34 0.374 192 0.316 0.361 0.321 0.367 0.341 0.382 0.407 0.446 3.792 1.542 0.332 0.381 0.383 0.418 0.356 0.394 0.528 0.503 0.441 0.454 0.402 0.414 336 0.327 0.385 0.325 0.374 0.329 0.384 0.410 0.447 4.215 1.642 0.365 0.407 0.448 0.465 0.385 0.421 0.631 0.617 0.654 0.567 0.452 0.452 720 0.362 0.416 0.394 0.424 0.379 0.422 0.412 0.469 3.656 1.619 0.419 0.451 0.605 0.551 0.427 0.476 0.795 0.738 0.956 0.716 0.462 0.468 ETTm1 96 0.281 0.331 0.284 0.339 0.293 0.336 0.326 0.39 0.626 0.56 0.306 0.349 0.299 0.344 0.296 0.344 0.341 0.394 0.316 0.362 0.338 0.375 192 0.323 0.362 0.319 0.375 .335 0.371 0.365 0.415 0.725 0.619 0.335 0.366 0.335 0.368 0.339 0.378 0.414 0.428 0.363 0.390 0.374 0.387 336 0.341 0.379 0.349 0.383 0.369 0.392 0.392 0.425 1.005 0.741 0.364 0.381 0.369 0.386 0.381 0.392 0.426 0.440 0.408 0.426 0.410 0.411 720 0.409 0.420 0.407 0.418 0.416 0.420 0.446 0.458 1.133 0.845 0.413 0.418 0.425 0.421 0.435 0.438 0.499 0.467 0.481 0.476 0.478 0.458 ETTm2 96 0.154 0.237 0.158 0.241 0.166 0.256 0.182 0.271 0.355 0.462 0.161 0.251 0.167 0.260 0.171 0.254 0.312 0.328 0.179 0.275 0.187 0.267 192 0.211 0.278 0.215 0.283 0.223 0.296 0.252 0.318 0.595 0.586 0.215 0.281 0.224 0.303 0.226 0.298 0.425 0.490 0.307 0.376 0.249 0.309 336 0.267 0.325 0.263 0.317 0.274 0.329 0.324 0.364 1.277 0.871 0.267 0.326 0.281 0.342 0.287 0.362 0.491 0.547 0.325 0.388 0.321 0.351 720 0.322 0.358 0.328 0.366 0.362 0.385 0.410 0.424 3.001 1.267 0.352 0.383 0.397 0.421 0.391 0.406 0.568 0.612 0.502 0.493 0.408 0.403 Electricity 96 0.131 0.220 0.128 0.219 0.129 0.222 0.186 0.302 0.304 0.393 0.132 0.229 0.142 0.237 0.136 0.236 0.198 0.284 0.164 0.269 0.168 0.272 192 0.144 0.225 0.148 0.239 0.147 0.249 0.197 0.311 0.327 0.417 0.147 0.243 0.153 0.249 0.156 0.255 0.245 0.320 0.177 0.285 0.184 0.289 336 0.157 0.251 0.166 0.258 0.163 0.259 0.213 0.328 0.333 0.422 0.161 0.261 0.169 0.267 0.178 0.281 0.281 0.351 0.193 0.304 0.198 0.309 720 0.189 0.293 0.201 0.298 0.197 0.291 0.233 0.344 0.351 0.427 0.196 0.294 0.203 0.301 0.210 0.314 0.304 0.387 0.212 0.321 0.22 0.321 Traffic 96 0.397 0.227 0.543 0.235 0.361 0.249 0.576 0.359 0.733 0.412 0.334 0.253 0.410 0.282 0.401 0.275 0.643 0.376 0.519 0.309 0.593 0.321 192 0.396 0.248 0.567 0.242 0.379 0.256 0.619 0.381 0.777 0.435 0.342 0.257 0.423 0.287 0.405 0.284 0.699 0.405 0.537 0.315 0.617 0.336 336 0.403 0.260 0.602 0.262 0.393 0.264 0.608 0.375 0.776 0.434 0.355 0.261 0.436 0.296 0.442 0.301 0.728 0.429 0.534 0.313 0.629 0.336 720 0.428 0.277 0.671 0.281 0.432 0.286 0.621 0.375 0.827 0.466 0.388 0.273 0.466 0.315 0.468 0.309 0.782 0.440 0.577 0.325 0.641 0.352 Weather 96 0.147 0.193 0.142 0.181 0.149 0.198 0.238 0.314 0.354 0.405 0.166 0.222 0.176 0.237 0.154 0.205 0.293 0.308 0.161 0.229 0.172 0.226 192 0.184 0.239 0.187 0.227 0.194 0.238 0.275 0.329 0.419 0.434 0.209 0.263 0.227 0.282 0.195 0.247 0.406 0.457 0.225 0.281 0.219 0.261 336 0.231 0.258 0.237 0.270 0.245 0.282 0.339 0.377 0.583 0.543 0.254 0.301 0.265 0.319 0.247 0.306 0.462 0.492 0.278 0.331 0.280 0.306 720 0.305 0.312 0.311 0.327 0.314 0.325 0.389 0.409 0.916 0.705 0.310 0.341 0.323 0.362 0.316 0.351 0.517 0.527 0.311 0.356 0.365 0.359

When the model is used to make predictions over long time horizons the predictions starts to experience drift from the actual trajectory. In many use cases the measurement data is sparsely and irregularly available the long prediction window. Using a data assimilation-based framework we can make use of these measurements to correct the drift present in the predictions in a systematic way.

The extended Kalman filter (EKF) framework is a principled framework which updates the predicted states in a NLDS as a function of the error between the observed and predicted measurements (Särkkä and Svensson 2023). Assuming data $Y_{k}$ is available, we can define the correction equation inspired by the EKF as

\displaystyle\begin{bmatrix}Z^{\prime}_{k}\\ R^{\prime}_{k}\end{bmatrix}=\begin{bmatrix}\hat{Z}_{k}\\ R_{k}\end{bmatrix}+\begin{bmatrix}L_{g}&0\\ 0&L_{r}\end{bmatrix}\begin{bmatrix}J_{\psi}^{T}\\ J_{\psi}^{T}\end{bmatrix}\begin{bmatrix}\tilde{Y}_{k}-Y_{k}\end{bmatrix}\,.

(11)

Here $J_{\psi}$ is the Jacobian of the decoder function evaluated at the output of the prediction step i.e. $\hat{h}_{k}=\hat{Z}_{k}+R_{k}$ . The Jacobian linearizes the nonlinear measurement function around the current predicted state:

J_{\psi}=\frac{\partial\psi(\cdot;\theta)}{\partial h}\bigg{|}_{\hat{h}_{k}}

(12)

The output of the correction step $h^{\prime}_{k}=Z^{\prime}_{k}+R^{\prime}_{k}$ is then fed to the decoder to get the corrected prediction of the measurement trajectory.

Kalman Gain

The gain matrices $L_{g}$ and $L_{r}$ represent the component of Kalman gain for the global and residual branches. We use a gating mechanism parameterised by neural network to define the Kalman gains,

	$\displaystyle L_{g}$	$\displaystyle=\tanh(W_{z}^{z}(\hat{Z}_{k})+W_{y}^{g}(\phi_{g}(Y^{g}_{k}))+b_{g% })\,,$		(13)
	$\displaystyle L_{r}$	$\displaystyle=\tanh(W_{z}^{r}(R_{k})+W_{y}^{r}(\phi_{r}(Y^{r}_{k}))+b_{r})\,.$		(14)

here $W_{z}^{z},W_{z}^{r},W_{y}^{r},W_{y}^{g}$ are all single layer neural networks with ReLU activation. $b_{r},b_{g}$ are bias terms for $L_{r},L_{g}$ respectively.

Learning criterion

Operator Learning Loss

The physical model consists of the following trainable components: (i) latent Koopman model $\mathcal{K}$ , (ii) state encoding model and (iii) state decoding model. To ensure that the koopman dynamcis is respected by the learned physical model we minimise the following losses,

	$\displaystyle\mathcal{L}_{recon}=\sum_{t=0}^{H}\|\|\hat{y}_{k}-y_{k}\|\|_{2},$	$\displaystyle\mathcal{L}_{pred}=\sum_{t=1}^{T}\|\|\tilde{y}_{k}-y_{k}\|\|_{2},$
	$\displaystyle\mathcal{L}_{align}$	$\displaystyle=\sum_{k=1}^{T/k}\|\|\hat{Z}_{k}-Z_{k}\|\|_{2}$

$H$ is forecast horizon and $T$ is lookback window length. To achieve the best performance we train KODA in stages. First we train the prediction model with $\mathcal{L}_{align}$ , $\mathcal{L}_{pred}$ and $\mathcal{L}_{recon}$ . We then add the correction model and train only using $\mathcal{L}_{pred}$ . This two stage training process allows us to first learn the prediction model and then in the second stage learn the gain function while further improving the prediction output.

Training Criteria for $L_{g}$ and $L_{r}$

The Kalman gain $L_{g}$ and $L_{r}$ depends on whether the measurement data $Y_{k}$ is available for assimilation or not. The gating mechanism is first trained separately to allow correction to happen only when measurement data is made available during the inference time. During inference when no data is available the correction component should be $0$ such that (14) is only driven by the prediction term. For further details on training criteria and network architecture please refer to the appendix.

Experimental setting

Table 2: Multivariate long-term time series forecasting with data assimilation results for KODA. We evaluate KODA across forecast horizon of

H=720

in all datasets. We use

\alpha

to denote

\%

of measurement data available for online assimilation task in the forecast horizon. We report the mean of 10 runs for each value of

\alpha

Alpha( $\alpha$ ) ETTh1 ETTh2 ETTm1 ETTm2 Electricity Traffic Weather % MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE 10 0.397 0.439 0.323 0.402 0.391 0.420 0.311 0.358 0.181 0.293 0.411 0.268 0.367 0.295 20 0.378 0.402 0.318 0.489 0.355 0.413 0.289 0.341 0.175 0.278 0.352 0.245 0.336 0.277 30 0.351 0.370 0.296 0.362 0.340 0.392 0.268 0.305 0.173 0.271 0.328 0.219 0.319 0.258

Table 3: State prediction for simple NLDS. KODA(G) represents the model with just the global Koopman component of KODA (without Fourier filtering). We compare different Koopman autoencoder-based models for thr state prediction task. These models are evaluated using the

100\times

MSE over

H=1000

Data KODA KODA(G) KAE KAE (PE) Pendulum 0.059 0.106 12.023 0.214 Duffing Oscillator 0.208 0.451 23.042 1.198 Lotka-Volterra 0.119 0.128 2.956 0.449 Lorenz’ 63 32.288 49.327 - 83.416

Table 4: We perform ablation study of the KODA framework. All results are for

H=720

on ETT benchmark. We compare four different variations of the KODA frameworks.

Dataset ETTh1 ETTh2 ETTm1 ETTm2 Metric MSE MAE MSE MAE MSE MAE MSE MAE KODA(G (w. $L_{g}$ )) 0.508 0.536 0.442 0.479 0.459 0.468 0.389 0.418 KODA(R (w. $L_{r}$ )) 0.451 0.497 0.406 0.436 0.453 0.455 0.395 0.422 KODA(P w/o C) 0.429 0.458 0.379 0.425 0.414 0.423 0.329 0.360 KODA(w/o L) 0.417 0.454 0.358 0.418 0.421 0.426 0.328 0.362 KODA(complete) 0.411 0.451 0.362 0.416 0.409 0.420 0.322 0.358

We evaluate KODA across multiple datasets and conduct an extensive comparison across different baselines. We evaluate KODA over two main tasks: (1) Forecasting and (2) Data Assimilation. We also present comparison based on some well-known NLDS datasets for the (3) State Prediction task. Through our experiments we showcase the comparable effectiveness of KODA with state of the art baselines for forecasting and then further improvement with the data assimilation strategy.

Datasets:

We compare KODA across 4 different real world multivariate NLDS time series datasets : (1) weather (Wetterstation), (2) traffic (PeMS), (3) ECL (UCI), (4) ETT (Zhou et al. 2021)(including 4 subsets: ETTh1, ETTh2, ETTm1, ETTm2). We also evaluate the performance of KODA on M4 (SpyrosMakridakis 2018) which is a real world univariate time series on periodically collected marketing data. We use the same data prepossessing and test time split ratio as presented in (Wu et al. 2023).

We use the same datasets for showing results for data assimilation task. We present additional results on state prediction task for some of the well known uni-variate NLDS: (1) Lokta Volterra, (2) Duffing Oscillator, (3) pendulum and (4) Lorenz 63. These NLDS present extremely nonlinear dynamics with varying dimensionality including multiple fixed points. Appendix B for NLDS description.

Baseline:

We choose state of the art methods to compare KODA performance on forecasting and data assimilation tasks. We would like to make the note that most deep learning methods discussed as our baseline don’t have a well defined way for data assimilation. For the comparison on the forecasting task we evaluate our model against MLP based methods: Koopman forecaster (KNF)(Wang et al. 2023b), Koopa (Liu et al. 2024), Dlinear (Zeng et al. 2022), TiDE (Das et al. 2023); Transformer based methods: fedformer (Zhou et al. 2022), PatchTST (Nie et al. 2023); Informer (Zhou et al. 2021) Temporal convolution method: TimesNet (Wu et al. 2023), MICN (Wang et al. 2023a). For the state prediction task from noisy measurements on the known NLDS we compare KODA against vanilla Koopman autoencoder (Lusch, Kutz, and Brunton 2018), MLP and Koopman autoencoder with periodic re-encoding (Fathi et al. 2024). We compare KODA against the method presented in (Frion et al. 2024a) for the data assimilation and forecasting task when data is sparsely present in the lookback window. KODA’s framework allows us to assimilate measurement data into prediction recursively. Hence, to evaluate KODA’s performance we use different amount of noisy measurement present in the forecast window and evaluate model forecast on the rest of the forecast window. We compare this over different sets of noise levels in the measurement data.

Evaluation Metrics:

The most commonly used metric for comparing the accuracy of the forecasting method is: (1) Mean Absolute Error (MAE) and (2) Mean Squared Error (MSE). For further details on the experimental setup, please refer to Appendix B.

Results

Long Term Forecast

In Table 1 we present the results for long term forecasting across several recent state-of-the-art methods. We compare all the methods on four different multivariate time series datasets. The choice of horizon $T_{H}$ is set to be $\{96,192,336,720\}$ . Through this task we are solely evaluating KODA’s ability to generate long term accurate forecasts. The lookback window for all the models in Table 1 was set to be $T_{L}=720$ . We present additional results across different lookback windows in the appendix. We observe that except for the traffic benchmark, KODA almost beats all the other methods across benchmarks. In the case of traffic dataset we observe that the best performing model were TiDE and PatchTST across several of the prediction horizon length except when $H=720$ , where KODA provides the second best performance. One interesting observation is that KODA outperforms all the other Koopman-based models: KOOPA and KNF. This shows that KODA presents a flexible model that can produce long term accurate forecast for multivariate non-stationary time series datasets. We provide additional results including results on uni-variate time-series benchmark M4 (SpyrosMakridakis 2018) in Appendix C.

Data Assimilation

One of the main contribution of KODA is in its ability to perform online data assimilation during inference time hence making the model highly adaptive. In Table 2 we present the result for the online data assimilation task. We define $\alpha$ as the percentage of data available in the forecast horizon for assimilation. We show results for $\alpha\in\{10\%,20\%,30\%\}$ across all the benchmarks. For this experiment we chose $\tau=24$ and $w=8$ . For the sake of simplicity we distributed the available measurements along the prediction horizon uniformly. We see a consistent improvement in the forecast results when correction is used. This shows that KODA other than being able to generate accurate long term forecasts, KODA is also able to make use of the data available at a future time to further improve its forecast. This makes KODA framework really potent in tasks where data is irregularly present at different intervals and long term forecasting is required. Additionally KODA can also be used when the data are irregularly present instead of uniformly. We show additional results on assimilation task across benchmarks in the appendix C.

State Prediction

The main attraction of using Koopman operators based approaches is achieving system identification in a complete data driven approach. To demonstrate KODA’s ability for such, we generate 100 trajectories of the different NLDS mentioned above. For further details on data generation process and NLDS desciption refer appendix A. We train the model using first 500 steps of the generated trajectories. During inference we generate the forecast for 1000 steps from the model. We observe that the model is able to generalise to the part of state space that were previously not seen during the training. We present the results in Table 3. Here we compare KODA with Koopman autoencoder methods proposed in literature (Lusch, Kutz, and Brunton 2018; Fathi et al. 2024). We also evaluate KODA and KODA (G) separately to evaluate the effectiveness of the physical branch of KODA. We observe that not only KODA but KODA (G) performs much better than KAE and KAE (PE). This attributed to the structure of our training procedure that doesn’t enforce the whole trajectory in the latent space to be globally linear but only enforces local linearity in the latent space. This is similar to the course correction achieved in (Fathi et al. 2024) using periodic re-encoding. Yet our model is largely different from the one used in (Fathi et al. 2024). We can observe that both KODA and KODA (G) performs better than KAE(PE) on all four datasets.

Ablation Study

To compare the efficacy of different elements of KODA we evaluate 5 different variations of KODA framework across different subsets of ETT dataset. These variations include 1. global model of KODA with the: KODA (G (w. $L_{g}$ )); 2. Residual model of KODA: KODA (R (w. $L_{r}$ )); 3. Only the prediction model without any correction component: KODA (P w/o C); 4. Full KODA model but without the Kalman gain gating function $L$ : KODA (w/o $L$ ) and finally 5. Full KODA model. For KODA (G (w. $L_{g}$ )) and KODA (R (w. $L_{r}$ )) we also don’t use the measurement disentanglement. We present the results of the ablation study in Table 4. When we only use the residual component, KODA reduces to an implementation of SegRNN (Lin et al. 2023). Albeit in (Lin et al. 2023) there is no data assimilation framework. KODA(complete) gives the best performance. We also observe a significant jump when the whole prediction model is used versus individual branches. We provide further ablation study of the KODA framework in appendix C.

Discussion

In this paper we proposed a novel framework for time series forecasting and data assimilation named KODA. Through rigorous experimentation across different benchmarks we showed that KODA is able to (1) Generate accurate long term forecast, (2) Generate a disentangled view of the NLDS into local and global components, (3) Use a correction framework to perform online data assimilation, (4) Perform state prediction of the state space for known NLDS. Therefore KODA present a flexible recursive framework that is able to exploit the benefits of koopman operator representation while being able to generate long term forecast and perform data assimilation.

References

Afshar, Germ, and Morris (2023) Afshar, S.; Germ, F.; and Morris, K. 2023. Extended Kalman filter based observer design for semilinear infinite-dimensional systems. IEEE Transactions on Automatic Control.
Azencot et al. (2020) Azencot, O.; Erichson, N. B.; Lin, V.; and Mahoney, M. 2020. Forecasting sequential data using consistent koopman autoencoders. In International Conference on Machine Learning, 475–485. PMLR.
Benosman, Mansour, and Huroyan (2017) Benosman, M.; Mansour, H.; and Huroyan, V. 2017. Koopman-operator observer-based estimation of pedestrian crowd flows. IFAC-PapersOnLine, 50(1): 14028–14033.
Berman, Naiman, and Azencot (2023) Berman, N.; Naiman, I.; and Azencot, O. 2023. Multifactor Sequential Disentanglement via Structured Koopman Autoencoders. In The Eleventh International Conference on Learning Representations.
Bevanda et al. (2023) Bevanda, P.; Beier, M.; Lederer, A.; Sosnowski, S. G.; Hüllermeier, E.; and Hirche, S. 2023. Koopman Kernel Regression. In Thirty-seventh Conference on Neural Information Processing Systems.
Brunton et al. (2016) Brunton, B. W.; Johnson, L. A.; Ojemann, J. G.; and Kutz, J. N. 2016. Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. Journal of neuroscience methods, 258: 1–15.
Brunton et al. (2021) Brunton, S. L.; Budišić, M.; Kaiser, E.; and Kutz, J. N. 2021. Modern Koopman theory for dynamical systems. arXiv preprint arXiv:2102.12086.
Brunton, Proctor, and Kutz (2016) Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2016. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the national academy of sciences, 113(15): 3932–3937.
Cai et al. (2021) Cai, S.; Mao, Z.; Wang, Z.; Yin, M.; and Karniadakis, G. E. 2021. Physics-informed neural networks (PINNs) for fluid mechanics: A review. Acta Mechanica Sinica, 37(12): 1727–1738.
Cheng, Dong, and Lapata (2016) Cheng, J.; Dong, L.; and Lapata, M. 2016. Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733.
Chung et al. (2014) Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
Cuomo et al. (2022) Cuomo, S.; Di Cola, V. S.; Giampaolo, F.; Rozza, G.; Raissi, M.; and Piccialli, F. 2022. Scientific machine learning through physics–informed neural networks: Where we are and what’s next. Journal of Scientific Computing, 92(3): 88.
Das et al. (2023) Das, A.; Kong, W.; Leach, A.; Mathur, S. K.; Sen, R.; and Yu, R. 2023. Long-term Forecasting with TiDE: Time-series Dense Encoder. Transactions on Machine Learning Research.
Fathi et al. (2024) Fathi, M.; Gehring, C.; Pilault, J.; Kanaa, D.; Bacon, P.-L.; and Goroshin, R. 2024. Course Correcting Koopman Representations. In The Twelfth International Conference on Learning Representations.
Frion et al. (2024a) Frion, A.; Drumetz, L.; Dalla Mura, M.; Tochon, G.; and El Bey, A. A. 2024a. Neural Koopman prior for data assimilation. IEEE Transactions on Signal Processing.
Frion et al. (2024b) Frion, A.; Drumetz, L.; Tochon, G.; Mura, M. D.; and Bey, A. A. E. 2024b. Koopman Ensembles for Probabilistic Time Series Forecasting. arXiv preprint arXiv:2403.06757.
Guen and Thome (2020) Guen, V. L.; and Thome, N. 2020. Disentangling physical dynamics from unknown factors for unsupervised video prediction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 11474–11484.
Imbiriba et al. (2023) Imbiriba, T.; Straka, O.; Duník, J.; and Closas, P. 2023. Augmented physics-based machine learning for navigation and tracking. IEEE Transactions on Aerospace and Electronic Systems.
Jiang et al. (2022) Jiang, W.; long Zhang, X.; Zuo, Z.; Shi, M.; and Su, S. 2022. Data-driven Kalman Filter with Kernel-based Koopman Operators for Nonlinear Robot Systems. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 12872–12878. IEEE.
Kawahara (2016) Kawahara, Y. 2016. Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis. Advances in neural information processing systems, 29.
Koopman (1931) Koopman, B. O. 1931. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5): 315–318.
Kovachki et al. (2023) Kovachki, N.; Li, Z.; Liu, B.; Azizzadenesheli, K.; Bhattacharya, K.; Stuart, A.; and Anandkumar, A. 2023. Neural operator: Learning maps between function spaces with applications to pdes. Journal of Machine Learning Research, 24(89): 1–97.
Kutz et al. (2016) Kutz, J. N.; Brunton, S. L.; Brunton, B. W.; and Proctor, J. L. 2016. Dynamic mode decomposition: data-driven modeling of complex systems. SIAM.
Lam et al. (2022) Lam, R.; Sanchez-Gonzalez, A.; Willson, M.; Wirnsberger, P.; Fortunato, M.; Alet, F.; Ravuri, S.; Ewalds, T.; Eaton-Rosen, Z.; Hu, W.; et al. 2022. GraphCast: Learning skillful medium-range global weather forecasting. arXiv preprint arXiv:2212.12794.
Li et al. (2020) Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; and Anandkumar, A. 2020. Fourier neural operator for parametric partial differential equations. arXiv preprint arXiv:2010.08895.
Lin et al. (2023) Lin, S.; Lin, W.; Wu, W.; Zhao, F.; Mo, R.; and Zhang, H. 2023. Segrnn: Segment recurrent neural network for long-term time series forecasting. arXiv preprint arXiv:2308.11200.
Liu et al. (2024) Liu, Y.; Li, C.; Wang, J.; and Long, M. 2024. Koopa: Learning non-stationary time series dynamics with koopman predictors. Advances in Neural Information Processing Systems, 36.
Lusch, Kutz, and Brunton (2018) Lusch, B.; Kutz, J. N.; and Brunton, S. L. 2018. Deep learning for universal linear embeddings of nonlinear dynamics. Nature communications, 9(1): 4950.
Nie et al. (2023) Nie, Y.; Nguyen, N. H.; Sinthong, P.; and Kalagnanam, J. 2023. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In The Eleventh International Conference on Learning Representations.
Pathak et al. (2022) Pathak, J.; Subramanian, S.; Harrington, P.; Raja, S.; Chattopadhyay, A.; Mardani, M.; Kurth, T.; Hall, D.; Li, Z.; Azizzadenesheli, K.; et al. 2022. Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv preprint arXiv:2202.11214.
Phillips (1961) Phillips, J. 1961. Generalized Koopmans’ Theorem. Physical Review, 123(2): 420.
Raissi, Perdikaris, and Karniadakis (2019) Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378: 686–707.
Revach et al. (2022) Revach, G.; Shlezinger, N.; Ni, X.; Escoriza, A. L.; Van Sloun, R. J.; and Eldar, Y. C. 2022. KalmanNet: Neural network aided Kalman filtering for partially known dynamics. IEEE Transactions on Signal Processing, 70: 1532–1547.
Salehinejad et al. (2017) Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; and Valaee, S. 2017. Recent advances in recurrent neural networks. arXiv preprint arXiv:1801.01078.
Särkkä and Svensson (2023) Särkkä, S.; and Svensson, L. 2023. Bayesian filtering and smoothing, volume 17. Cambridge university press.
Singh et al. (2024) Singh, A.; Borsoi, R. A.; Erdogmus, D.; and Imbiriba, T. 2024. Learning semilinear neural operators: A unified recursive framework for prediction and data assimilation. In The Twelfth International Conference on Learning Representations.
Singh et al. (2021) Singh, A.; Westlin, C.; Eisenbarth, H.; Losin, E. A. R.; Andrews-Hanna, J. R.; Wager, T. D.; Satpute, A. B.; Barrett, L. F.; Brooks, D. H.; and Erdogmus, D. 2021. Variation is the norm: Brain state dynamics evoked by emotional video clips. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 6003–6007. IEEE.
SpyrosMakridakis (2018) SpyrosMakridakis. 2018. M4 dataset.
Takeishi, Kawahara, and Yairi (2017) Takeishi, N.; Kawahara, Y.; and Yairi, T. 2017. Learning Koopman invariant subspaces for dynamic mode decomposition. Advances in neural information processing systems, 30.
Tu (2013) Tu, J. H. 2013. Dynamic mode decomposition: Theory and applications. Ph.D. thesis, Princeton University.
Wang et al. (2023a) Wang, H.; Peng, J.; Huang, F.; Wang, J.; Chen, J.; and Xiao, Y. 2023a. MICN: Multi-scale Local and Global Context Modeling for Long-term Series Forecasting. In The Eleventh International Conference on Learning Representations.
Wang et al. (2023b) Wang, R.; Dong, Y.; Arik, S.; and Yu, R. 2023b. Koopman Neural Forecaster for Time Series with Temporal Distribution Shifts. In International Conference on Learning Representations (ICLR).
Wang, Xu, and Mu (2023) Wang, X.; Xu, X.; and Mu, Y. 2023. Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10597–10607.
Wu et al. (2023) Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; and Long, M. 2023. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In The Eleventh International Conference on Learning Representations.
Yeung, Kundu, and Hodas (2019) Yeung, E.; Kundu, S.; and Hodas, N. 2019. Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. In 2019 American Control Conference (ACC), 4832–4839. IEEE.
Zeng et al. (2022) Zeng, A.; Chen, M.; Zhang, L.; and Xu, Q. 2022. Are Transformers Effective for Time Series Forecasting? arXiv preprint arXiv:2205.13504.
Zhou et al. (2021) Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; and Zhang, W. 2021. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 35, 11106–11115.
Zhou et al. (2022) Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; and Jin, R. 2022. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International conference on machine learning, 27268–27286. PMLR.

	$\displaystyle\mathcal{L}_{recon}=\sum_{t=0}^{H}\|\|\hat{y}_{k}-y_{k}\|\|_{2},$	$\displaystyle\mathcal{L}_{pred}=\sum_{t=1}^{T}\|\|\tilde{y}_{k}-y_{k}\|\|_{2},$
	$\displaystyle\mathcal{L}_{align}$	$\displaystyle=\sum_{k=1}^{T/k}\|\|\hat{Z}_{k}-Z_{k}\|\|_{2}$

KODA: A Data-Driven Recursive Model for Forecasting and Data Assimilation Using Koopman operator

KODA: A Data-Driven Recursive Model for Time Series Forecasting and data assimilation using koopman operators

Abstract

Introduction

Related Works

Learning with Koopman Operator

Data Assimilation

Problem Setting

KODA

Data Encoding Scheme

State Disentanglement

Measurement Disentanglement

Segmentation and Encoding

Prediction Model

Residual Model

Physical Model

Correction Model

Kalman Gain

Learning criterion

Operator Learning Loss

Training Criteria for Lgsubscript𝐿𝑔L_{g}italic_L start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT and Lrsubscript𝐿𝑟L_{r}italic_L start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT

Experimental setting

Datasets:

Baseline:

Evaluation Metrics:

Results

Long Term Forecast

Data Assimilation

State Prediction

Ablation Study

Discussion

References

Training Criteria for $L_{g}$ and $L_{r}$