License: arXiv.org perpetual non-exclusive license
arXiv:2312.00713v1 [math.NA] 01 Dec 2023

Nonlinear-manifold reduced order models with domain decomposition

Alejandro N. Diaz
Rice University
Houston, TX 77005
and5@rice.edu
Youngsoo Choi
Lawrence Livermore National Laboratory
Livermore, CA 94550
choi15@llnl.gov
Matthias Heinkenschloss
Rice University
Houston, TX 77005
heinken@rice.edu
Abstract

A nonlinear-manifold reduced order model (NM-ROM) incorporates underlying physics principles into a neural network-based, data-driven approach. We combine NM-ROMs with domain decomposition (DD) for efficient computation. NM-ROMs offer benefits over linear-subspace ROMs (LS-ROMs) but can be costly to train because the number of parameters scales with the size of the full-order model (FOM). To address this, we employ DD on the FOM, compute subdomain NM-ROMs, and then merge them into a global NM-ROM. This approach has multiple advantages: parallel training of subdomain NM-ROMs, fewer parameters than global NM-ROMs, and adaptability to subdomain-specific FOM features. Each subdomain NM-ROM uses a shallow, sparse autoencoder, enabling hyper-reduction (HR) for improved computational speed. In this paper, we detail an algebraic DD formulation for the FOM, train HR-equipped NM-ROMs for subdomains, and numerically compare them to DD LS-ROMs with HR. Results show that the proposed DD NM-ROMs are about an order of magnitude more accurate than DD LS-ROMs for the 2D steady-state Burgers’ equation.

1 Introduction

In science and engineering, complex tasks often involve repeatedly simulating a large-scale, parameterized, nonlinear system referred to as the full-order model (FOM). Ensuring high fidelity requires a high-dimensional model, leading to significant computational costs and lengthy simulations. As a result, tasks like design optimization become impractical for large-scale problems. Model reduction offers a solution by replacing the FOM with a computationally efficient, low-dimensional model called a reduced-order model (ROM). This ROM approximates the FOM’s behavior with adjustable accuracy, making it suitable for many-query applications. However, construction of accurate and computationally efficient ROMs poses challenges. To address them, we integrate the nonlinear-manifold ROM (NM-ROM) approach with an algebraic domain-decomposition (DD) framework.

Various model reduction methods have been integrated with DD, such as reduced basis elements (RBE) Maday and Rønquist [2002, 2004], Iapichino et al. [2012], Antonietti et al. [2016], Eftang et al. [2012], Huynh et al. [2013], Eftang and Patera [2013], and the alternating Schwarz method Buffoni et al. [2009], Barnett et al. [2022], Smetana and Taddei [2022], Iollo et al. [2023]. However, these methods are often specialized to specific problems and decompose the physical domain at the PDE level. In contrast, the authors in Hoang et al. [2021] take an algebraic approach by decomposing the FOM at the discrete level and computing linear-subspace ROMs (LS-ROMs) for each subdomain. While LS-ROMs work well in many cases Haasdonk [2017], Quarteroni et al. [2016], Hinze and Volkwein [2005], Gubisch and Volkwein [2017], Cheung et al. [2023], Copeland et al. [2022], Carlberg et al. [2018], Antoulas [2005], Benner and Breiten [2017], Antoulas et al. [2020], Gu [2011], Benner and Breiten [2015], Mayo and Antoulas [2007], Antoulas et al. [2016], Gosea and Antoulas [2018], Choi et al. [2021], Kim et al. [2021], Choi and Carlberg [2019], it is well known that advection-dominated problems and problems with sharp gradients cannot be well-approximated using low-dimensional linear subspaces. These problems are said to have slowly decaying Kolmogorov $n$-width Ohlberger and Rave [2016]. Recent approaches, such as nonlinear-manifold ROMs (NM-ROMs), address these problems by approximating the FOM state on a low-dimensional nonlinear manifold. This is typically achieved by training an autoencoder on FOM snapshot data (e.g., Kashima [2016], Hartman and Mestha [2017], Lee and Carlberg [2020], Kim et al. [2022, 2020]). However, training NM-ROMs is expensive. Indeed, in the monolithic single-domain case, the high dimensionality of the FOM training data results in a large number of neural network (NN) parameters requiring training. In Barnett et al. [2023] this cost issue was mitigated by first computing a low-dimensional proper orthogonal decomposition (POD) model, and then using a NN to train the coefficients in this POD. Instead, we integrate an autoencoder framework with DD. By coupling NM-ROM with DD, one can compute FOM training data on subdomains, thus reducing the dimensionality of subdomain NM-ROM training data, resulting in fewer parameters that need to be trained per subdomain NM-ROM.

We also note that couplings of NNs and DD for solutions of partial differential equations (PDEs) have been considered in previous work (e.g., Li et al. [2020a, b], Sun et al. [2022], Li et al. [2023]). However, these approaches use deep learning to solve a PDE by representing its solution as a NN and minimizing a corresponding physics-informed loss function. In contrast, our work uses autoencoders to reduce the dimensionality of an existing numerical model. The autoencoders are pretrained in an offline stage to find low-dimensional representations of FOM snapshot data, and used in an online stage to significantly reduce the computational cost and runtime of numerical simulations. Our work is the first to couple autoencoders with DD in the reduced-order modeling context.

Here, we extend the work of Hoang et al. [2021] on DD LS-ROM and integrate NM-ROM with hyper-reduction (HR) using shallow, sparse autoencoders discussed in Kim et al. [2022]. We incorporate the NM-ROM approach into this framework because of its success when applied to problems with slowly decaying Kolmogorov $n$-width. DD allows one to compute FOM training snapshots on subdomains, thus reducing the dimensionality of subdomain NM-ROM training data, resulting in fewer parameters that need to be trained per subdomain NM-ROM. We use a wide, shallow, and sparse autoencoder architecture, which allows HR to be applied efficiently, thus reducing the complexity caused by nonlinearity and yielding computational speedup. Additionally, we modify the wide, shallow, and sparse architecture used in Kim et al. [2022] to include a sparsity mask for the encoder input layer as well as the decoder output layer. The proposed DD NM-ROM approach is compared with DD LS-ROM on the 2D Burgers’ equation.

2 DD full order model

First consider the monolithic, single-domain FOM written as a residual equation

$$\boldsymbol{r}(\boldsymbol{x};\boldsymbol{\mu})=\boldsymbol{0}, \tag{1}$$

where $\boldsymbol{x}\in\mathbb{R}^{N_{x}}$ is the state, $\boldsymbol{\mu}\in\mathcal{D}\subset\mathbb{R}^{N_{\mu}}$ is a parameter, and $\boldsymbol{r}:\mathbb{R}^{N_{x}}\times\mathbb{R}^{N_{\mu}}\to\mathbb{R}^{N_{x}}$ is the residual function. FOMs of the form (1) typically arise from discretizations of partial differential equations (PDEs). One can reformulate (1) into a DD formulation by partitioning the residual equation into $n_{\Omega}$ systems of equations (so-called algebraic subdomains), coupling them via compatibility constraints, and converting the systems of equations into a least-squares problem, resulting in

$$\min_{(\boldsymbol{x}_{i}^{\Omega},\boldsymbol{x}_{i}^{\Gamma}),\,i=1,\dots,n_{\Omega}}\quad\frac{1}{2}\sum_{i=1}^{n_{\Omega}}\left\|\boldsymbol{r}_{i}\left(\boldsymbol{x}_{i}^{\Omega},\boldsymbol{x}_{i}^{\Gamma};\boldsymbol{\mu}\right)\right\|_{2}^{2},\quad{\rm s.t.}\quad\sum_{i=1}^{n_{\Omega}}\boldsymbol{A}_{i}\boldsymbol{x}_{i}^{\Gamma}=\boldsymbol{0}, \tag{2}$$

where $\boldsymbol{x}_{i}^{\Omega}\in\mathbb{R}^{N_{i}^{\Omega}}$, $\boldsymbol{x}_{i}^{\Gamma}\in\mathbb{R}^{N_{i}^{\Gamma}}$, $\boldsymbol{r}_{i}:\mathbb{R}^{N_{i}^{\Omega}}\times\mathbb{R}^{N_{i}^{\Gamma}}\times\mathcal{D}\to\mathbb{R}^{N_{i}^{r}}$, and $\boldsymbol{A}_{i}\in\{-1,0,1\}^{N_{a}\times N_{i}^{\Gamma}}$ are the $i$-th subdomain interior-state, interface-state, residual function, and compatibility constraint matrix, respectively. The sparsity pattern of the monolithic residual function $\boldsymbol{r}$ determines the structure of the subdomain residual functions $\boldsymbol{r}_{i}$, as well as the decomposition of the state $\boldsymbol{x}$ into subdomain states $(\boldsymbol{x}_{i}^{\Omega},\boldsymbol{x}_{i}^{\Gamma})$. The interior-states $\boldsymbol{x}_{i}^{\Omega}$ are those that are only used to compute the residual $\boldsymbol{r}_{i}$ in the $i$-th subdomain, whereas the interface-states $\boldsymbol{x}_{i}^{\Gamma}$ are also used in the residual computation of neighboring subdomains. The equality constraint determined by $\boldsymbol{A}_{i}$ enforces equality on the overlapping interface states. For further details, see [Diaz et al., 2023, Sec. 2] or [Hoang et al., 2021, Sec. 2].
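
As an illustration of the algebraic decomposition, the following is a minimal NumPy sketch on a hypothetical six-state residual with a tridiagonal sparsity pattern. The toy residual, the two-way row partition, and all names (`rows`, `interior`, `interface`, `subdomain_residual`) are assumptions made for this example and are not taken from the paper's implementation. It derives interior and interface states from the sparsity pattern and builds compatibility matrices with entries in $\{-1,0,1\}$.

```python
import numpy as np

# Toy monolithic residual r(x) = K x + x**2 - f with tridiagonal K (illustrative only).
N = 6
K = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
f = np.ones(N)

def residual(x):
    return K @ x + x**2 - f

# Sparsity pattern of the residual Jacobian; the x**2 term only touches the diagonal.
pattern = K != 0

# Partition the residual rows into n_Omega = 2 algebraic subdomains.
rows = [np.arange(0, 3), np.arange(3, 6)]

# States that each subdomain's residual rows depend on.
touched = [np.flatnonzero(pattern[r].any(axis=0)) for r in rows]

# Interface states are used by more than one subdomain; the rest are interior.
counts = np.zeros(N, dtype=int)
for t in touched:
    counts[t] += 1
interior = [t[counts[t] == 1] for t in touched]
interface = [t[counts[t] > 1] for t in touched]

# Compatibility matrices A_i with entries in {-1, 0, 1}: one row per shared state.
# This sketch assumes every shared state belongs to exactly two subdomains.
shared = np.flatnonzero(counts > 1)
A = [np.zeros((shared.size, g.size), dtype=int) for g in interface]
for row, s in enumerate(shared):
    first, second = [i for i, g in enumerate(interface) if s in g]
    A[first][row, np.flatnonzero(interface[first] == s)[0]] = 1
    A[second][row, np.flatnonzero(interface[second] == s)[0]] = -1

def subdomain_residual(i, x_int, x_gam):
    """r_i(x_i^Omega, x_i^Gamma): the rows of r owned by subdomain i."""
    x = np.zeros(N)
    x[interior[i]] = x_int
    x[interface[i]] = x_gam
    return residual(x)[rows[i]]
```

In this toy case both subdomains carry their own copy of states 2 and 3, and the constraint $\boldsymbol{A}_{1}\boldsymbol{x}_{1}^{\Gamma}+\boldsymbol{A}_{2}\boldsymbol{x}_{2}^{\Gamma}=\boldsymbol{0}$ forces the two copies to agree.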

3 DD nonlinear-manifold reduced order model

For each subdomain $i\in\{1,\dots,n_{\Omega}\}$, let $\boldsymbol{g}_{i}^{\Omega}:\mathbb{R}^{n_{i}^{\Omega}}\to\mathbb{R}^{N_{i}^{\Omega}}$, $n_{i}^{\Omega}\ll N_{i}^{\Omega}$, and $\boldsymbol{g}_{i}^{\Gamma}:\mathbb{R}^{n_{i}^{\Gamma}}\to\mathbb{R}^{N_{i}^{\Gamma}}$, $n_{i}^{\Gamma}\ll N_{i}^{\Gamma}$, be decoders such that $\boldsymbol{x}_{i}^{\Omega}\approx\boldsymbol{g}_{i}^{\Omega}(\widehat{\boldsymbol{x}}_{i}^{\Omega})$ and $\boldsymbol{x}_{i}^{\Gamma}\approx\boldsymbol{g}_{i}^{\Gamma}(\widehat{\boldsymbol{x}}_{i}^{\Gamma})$. Also let $\boldsymbol{B}_{i}\in\{0,1\}^{N_{i}^{B}\times N_{i}^{r}}$, $N_{i}^{B}\leq N_{i}^{r}$, denote a row-sampling matrix for collocation HR, and let $\boldsymbol{C}\in\mathbb{R}^{n_{C}\times N_{a}}$, $n_{C}\ll N_{a}$, be a Gaussian test matrix. The DD NM-ROM is evaluated by solving

$$\min_{(\widehat{\boldsymbol{x}}_{i}^{\Omega},\widehat{\boldsymbol{x}}_{i}^{\Gamma}),\,i=1,\dots,n_{\Omega}}\quad\frac{1}{2}\sum_{i=1}^{n_{\Omega}}\left\|\boldsymbol{B}_{i}\boldsymbol{r}_{i}\left(\boldsymbol{g}_{i}^{\Omega}\left(\widehat{\boldsymbol{x}}_{i}^{\Omega}\right),\boldsymbol{g}_{i}^{\Gamma}\left(\widehat{\boldsymbol{x}}_{i}^{\Gamma}\right)\right)\right\|_{2}^{2},\quad{\rm s.t.}\quad\sum_{i=1}^{n_{\Omega}}\boldsymbol{C}\boldsymbol{A}_{i}\boldsymbol{g}_{i}^{\Gamma}(\widehat{\boldsymbol{x}}_{i}^{\Gamma})=\boldsymbol{0}. \tag{3}$$

If HR is not applied (i.e., $\boldsymbol{B}_{i}=\boldsymbol{I}$ in (3)), the ROM’s computational savings are limited because the evaluation of the residuals $(\widehat{\boldsymbol{x}}_{i}^{\Omega},\widehat{\boldsymbol{x}}_{i}^{\Gamma})\to(\boldsymbol{g}_{i}^{\Omega}(\widehat{\boldsymbol{x}}_{i}^{\Omega}),\boldsymbol{g}_{i}^{\Gamma}(\widehat{\boldsymbol{x}}_{i}^{\Gamma}))\to\boldsymbol{r}_{i}(\boldsymbol{g}_{i}^{\Omega}(\widehat{\boldsymbol{x}}_{i}^{\Omega}),\boldsymbol{g}_{i}^{\Gamma}(\widehat{\boldsymbol{x}}_{i}^{\Gamma}))$ scales with the sizes $N_{i}^{\Omega}$ and $N_{i}^{\Gamma}$ of the FOM. Thus, HR is applied to decrease the computational complexity caused by the nonlinearity of $\boldsymbol{r}_{i}$, and increase the computational speedup. We use [Carlberg et al., 2013, Algo. 3] to greedily compute a row-sampling matrix $\boldsymbol{B}_{i}$ for collocation HR. The application of HR to the decoders $\boldsymbol{g}_{i}^{\Omega}$ and $\boldsymbol{g}_{i}^{\Gamma}$ is discussed further in Sec. 3.1. Following Hoang et al. [2021], we apply a Gaussian test matrix $\boldsymbol{C}\in\mathbb{R}^{n_{C}\times N_{a}}$, $n_{C}\ll N_{a}$, to convert the compatibility constraints into a so-called “weak compatibility constraint”, which decreases the number of constraints to avoid making the DD ROM over-determined.
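
To make the roles of $\boldsymbol{B}_{i}$ and $\boldsymbol{C}$ concrete, here is a small NumPy sketch under stated assumptions: the sample indices are hand-picked placeholders rather than the output of the greedy algorithm cited above, the residual is a random stand-in vector, and two subdomains are assumed to share all $N_{a}$ interface states so that $\boldsymbol{A}_{1}=\boldsymbol{I}$ and $\boldsymbol{A}_{2}=-\boldsymbol{I}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Collocation HR: B has a single 1 per row, so B @ r just selects entries of r.
N_r = 8
sample_idx = np.array([1, 4, 6])          # hypothetical sample; the paper selects rows greedily
B = np.zeros((sample_idx.size, N_r))
B[np.arange(sample_idx.size), sample_idx] = 1.0

r = rng.standard_normal(N_r)              # stand-in for a subdomain residual vector
sampled = B @ r                           # equals r[sample_idx]

# Weak compatibility: compress N_a constraints to n_C << N_a with a Gaussian test matrix.
N_a, n_C = 10, 3
C = rng.standard_normal((n_C, N_a))
A1, A2 = np.eye(N_a), -np.eye(N_a)        # two subdomains sharing N_a interface states

x1 = rng.standard_normal(N_a)
x2_match = x1.copy()                      # interface states agree
x2_off = x1 + 0.1                         # interface states disagree

strong_match = A1 @ x1 + A2 @ x2_match    # N_a equations, all zero
weak_match = C @ strong_match             # n_C equations, all zero
weak_off = C @ (A1 @ x1 + A2 @ x2_off)    # n_C equations, generically nonzero
```

In an actual HR implementation one would evaluate only the sampled residual entries instead of forming $\boldsymbol{B}_{i}$ and the full residual; the dense product here is only for checking the identity.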

The DD FOM (2) and DD NM-ROM (3) are solved using an inexact Lagrange-Newton sequential quadratic programming (SQP) solver, where the Hessian of the Lagrangian is replaced with a Gauss-Newton approximation. This avoids computation of second order derivatives of residuals and constraints in (3), but still achieves good convergence for (2) and (3). For further details, see Diaz et al. [2023].
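
The structure of such an iteration can be sketched as follows. This is a minimal illustration on a hypothetical two-variable problem, not the solver of Diaz et al. [2023]: it solves each KKT system exactly rather than inexactly, uses full steps without any globalization, and the residual, its Jacobian, and the single linear constraint are invented for the example. The only feature it shares with the text is that the Hessian of the Lagrangian is replaced by the Gauss-Newton term $\boldsymbol{J}^{T}\boldsymbol{J}$.

```python
import numpy as np

def residual(x):
    # Hypothetical nonlinear residual with a zero-residual solution on the line x0 = x1.
    return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 3.0])

def jacobian(x):
    return np.array([[2.0 * x[0], 1.0], [1.0, 2.0 * x[1]]])

A = np.array([[1.0, -1.0]])  # linear equality constraint c(x) = A x = 0

def gauss_newton_sqp(x, max_iter=20, tol=1e-12):
    """Minimize 0.5*||r(x)||^2 subject to A x = 0 with a Gauss-Newton KKT iteration."""
    n, m = x.size, A.shape[0]
    lam = np.zeros(m)
    for _ in range(max_iter):
        r, J, c = residual(x), jacobian(x), A @ x
        grad_L = J.T @ r + A.T @ lam            # gradient of the Lagrangian
        if np.linalg.norm(grad_L) + np.linalg.norm(c) < tol:
            break
        # KKT system with J^T J in place of the exact Hessian of the Lagrangian.
        kkt = np.block([[J.T @ J, A.T], [A, np.zeros((m, m))]])
        step = np.linalg.solve(kkt, -np.concatenate([grad_L, c]))
        x = x + step[:n]
        lam = lam + step[n:]
    return x, lam

x_opt, lam_opt = gauss_newton_sqp(np.array([1.0, 2.0]))
```

Only first derivatives of the residual and constraint appear in the KKT matrix, which is the point of the Gauss-Newton approximation.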

The DD NM-ROM (3) formulation has several benefits. Training, i.e., computation of the $\boldsymbol{g}_{i}^{\Omega}$ and $\boldsymbol{g}_{i}^{\Gamma}$, is local, involves few parameters, and can be done in parallel. The ROMs can be adjusted to localized features of the problem, which may result in smaller ROMs. Parallelization can be used to speed up ROM computation/training and ROM execution.

3.1 NM-ROM architecture and training

We use single-layer, wide, and sparse decoders with smooth activation functions to represent the maps $\boldsymbol{g}_{i}^{\Omega}$ and $\boldsymbol{g}_{i}^{\Gamma}$. The corresponding encoders, denoted $\boldsymbol{h}_{i}^{\Omega}$ and $\boldsymbol{h}_{i}^{\Gamma}$, are also single-layer, wide, and sparse. Shallow networks are used for computational efficiency; fewer layers correspond to fewer repeated matrix-vector multiplications when evaluating the decoders. The shallow depth necessitates a wide network to maintain enough expressiveness for use in NM-ROM. Smooth activations (i.e., swish) are used to ensure that $\boldsymbol{g}_{i}^{\Omega}$ and $\boldsymbol{g}_{i}^{\Gamma}$ are continuously differentiable. Normalization and de-normalization layers are also applied at the encoder input and decoder output layers, respectively.
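
A shallow decoder of this kind, together with its Jacobian, might look like the following NumPy sketch. The class name, the dense (unmasked) weights, the bias terms, and the affine de-normalization via `scale` and `shift` are assumptions for illustration; the paper's implementation uses PyTorch with sparse layers. The analytic Jacobian relies on swish being smooth, which is what a derivative-based solver needs.

```python
import numpy as np

def swish(z):
    return z / (1.0 + np.exp(-z))

def swish_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s + z * s * (1.0 - s)

class ShallowDecoder:
    """g(xhat) = scale * (W2 @ swish(W1 @ xhat + b1) + b2) + shift (hypothetical layout)."""

    def __init__(self, n_latent, n_hidden, n_out, rng):
        self.W1 = rng.standard_normal((n_hidden, n_latent)) / np.sqrt(n_latent)
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_out, n_hidden)) / np.sqrt(n_hidden)
        self.b2 = np.zeros(n_out)
        self.scale = np.ones(n_out)    # de-normalization scale (assumed affine)
        self.shift = np.zeros(n_out)   # de-normalization shift (assumed affine)

    def __call__(self, xhat):
        hidden = swish(self.W1 @ xhat + self.b1)
        return self.scale * (self.W2 @ hidden + self.b2) + self.shift

    def jacobian(self, xhat):
        # Chain rule: diag(scale) @ W2 @ diag(swish'(W1 xhat + b1)) @ W1
        d = swish_prime(self.W1 @ xhat + self.b1)
        return (self.scale[:, None] * self.W2 * d) @ self.W1

rng = np.random.default_rng(0)
g = ShallowDecoder(n_latent=3, n_hidden=50, n_out=20, rng=rng)
xhat = rng.standard_normal(3)
J = g.jacobian(xhat)                   # shape (20, 3)
```

With a single hidden layer, each decoder evaluation costs two matrix-vector products, and the Jacobian reuses the same pre-activation.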

Sparsity is applied at the decoder output layer so that HR can be applied. The sparsity allows one to compute a subnet, which only keeps track of the hidden nodes required to compute the output nodes that remain after HR. Further details can be found in [Kim et al., 2022, Sec. 3.2], [Diaz et al., 2023, Sec. 5.3]. We also apply a sparsity mask to the encoder input layer so that the autoencoders are symmetric across the latent layer. The sparsity pattern has a tri-banded structure inspired by 2D finite difference stencils, where the number of nonzeros per band and the separation between bands are hyper-parameters.
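
The subnet idea can be sketched in NumPy as follows. The mask construction below (three bands per output row, centered on a position that moves linearly with the row index) is only one plausible reading of a tri-banded pattern and is an assumption of this example, as are the parameter names `width` and `sep`, the omitted biases at the output layer, and the hand-picked `kept_outputs`. See the cited references for the actual construction.

```python
import numpy as np

def swish(z):
    return z / (1.0 + np.exp(-z))

def tri_banded_mask(n_out, n_hidden, width, sep):
    """Boolean mask with three bands of `width` nonzeros per row, `sep` columns apart."""
    mask = np.zeros((n_out, n_hidden), dtype=bool)
    centers = np.round(np.linspace(0, n_hidden - 1, n_out)).astype(int)
    for i, c in enumerate(centers):
        for band in (c - sep, c, c + sep):
            lo = max(band - width // 2, 0)
            hi = min(band - width // 2 + width, n_hidden)
            if lo < hi:
                mask[i, lo:hi] = True
    return mask

rng = np.random.default_rng(0)
n_latent, n_hidden, n_out = 4, 200, 100
mask = tri_banded_mask(n_out, n_hidden, width=3, sep=10)
W1 = rng.standard_normal((n_hidden, n_latent))
b1 = rng.standard_normal(n_hidden)
W2 = rng.standard_normal((n_out, n_hidden)) * mask   # masked decoder output layer

def decoder(xhat):
    return W2 @ swish(W1 @ xhat + b1)

# Output nodes that remain after HR (hypothetical choice for the example).
kept_outputs = np.array([5, 40, 77])
# Hidden nodes that feed at least one kept output node.
hidden_idx = np.flatnonzero(mask[kept_outputs].any(axis=0))

def subnet(xhat):
    """Evaluate only the kept outputs using only the hidden nodes they depend on."""
    hidden = swish(W1[hidden_idx] @ xhat + b1[hidden_idx])
    return W2[np.ix_(kept_outputs, hidden_idx)] @ hidden

xhat = rng.standard_normal(n_latent)
```

Here the subnet touches at most 27 of the 200 hidden nodes, yet reproduces the kept entries of the full decoder output.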

To train the autoencoders, we first generate FOM snapshots in an offline stage by solving (2) at parameters $\{\boldsymbol{\mu}_{\ell}\}_{\ell=1}^{M}$, and collect interior- and interface-state snapshot datasets $\boldsymbol{X}_{i}^{\Omega}\in\mathbb{R}^{N_{i}^{\Omega}\times M}$ and $\boldsymbol{X}_{i}^{\Gamma}\in\mathbb{R}^{N_{i}^{\Gamma}\times M}$. Alternatively, one can solve the monolithic FOM (1) at each $\boldsymbol{\mu}_{\ell}$ and restrict the corresponding states $\boldsymbol{x}(\boldsymbol{\mu}_{\ell})$ to interior-states $\boldsymbol{x}_{i}^{\Omega}(\boldsymbol{\mu}_{\ell})$ and interface-states $\boldsymbol{x}_{i}^{\Gamma}(\boldsymbol{\mu}_{\ell})$ for each subdomain. We use the latter approach. The autoencoders $(\boldsymbol{h}_{i}^{\Omega},\boldsymbol{g}_{i}^{\Omega})$ and $(\boldsymbol{h}_{i}^{\Gamma},\boldsymbol{g}_{i}^{\Gamma})$ are then trained in parallel by minimizing the respective MSE losses

$$\mathcal{L}_{i}^{\Omega}=\frac{1}{M}\sum_{\ell=1}^{M}\left\|\boldsymbol{x}_{i}^{\Omega}(\boldsymbol{\mu}_{\ell})-\boldsymbol{g}_{i}^{\Omega}(\boldsymbol{h}_{i}^{\Omega}(\boldsymbol{x}_{i}^{\Omega}(\boldsymbol{\mu}_{\ell})))\right\|_{2}^{2},\quad\mathcal{L}_{i}^{\Gamma}=\frac{1}{M}\sum_{\ell=1}^{M}\left\|\boldsymbol{x}_{i}^{\Gamma}(\boldsymbol{\mu}_{\ell})-\boldsymbol{g}_{i}^{\Gamma}(\boldsymbol{h}_{i}^{\Gamma}(\boldsymbol{x}_{i}^{\Gamma}(\boldsymbol{\mu}_{\ell})))\right\|_{2}^{2} \tag{4}$$

for each subdomain $i=1,\ldots,n_{\Omega}$. The snapshots undergo a random 90-10 split for training and validation, and the MSE loss is minimized using the Adam optimizer over $2000$ epochs with a batch size of $32$. We also apply early stopping (Prechelt [1998]) with a patience of $300$ and reduce the learning rate on plateau with an initial learning rate of $10^{-3}$. The implementation was done in PyTorch and used the PyTorch Sparse and SparseLinear packages.
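The split and the stopping rule described above can be sketched without any deep learning framework. The following is a minimal illustration, not the authors' PyTorch implementation: the helper names `split_indices` and `EarlyStopping` are hypothetical, the seed is arbitrary, and the snapshot count of 6400 is the one reported in the numerical experiment.

```python
import random


def split_indices(n_snapshots, train_fraction=0.9, seed=0):
    """Randomly split snapshot indices into training and validation sets."""
    indices = list(range(n_snapshots))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_snapshots))
    return indices[:n_train], indices[n_train:]


class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=300):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# 90-10 split of the 6400 snapshots: 5760 for training, 640 for validation.
train_idx, val_idx = split_indices(6400)
stopper = EarlyStopping(patience=300)
```

In a training loop, `stopper.step(val_loss)` would be called once per epoch, and the loop would exit early if it returns `True` before the 2000-epoch budget is used up.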

4 Numerical experiment: 2D Burgers’ equation

We compare the DD LS-ROM of Hoang et al. [2021] and the proposed DD NM-ROM with HR for the 2D steady-state Burgers' equation. The DD LS-ROM can be regarded as a special case of the DD NM-ROM in which the encoders and decoders appearing in Equation (4) are replaced by linear operators derived from a singular value decomposition. We compute the relative error as

$$
e = \left( \frac{1}{n_{\Omega}} \sum_{i=1}^{n_{\Omega}} \Big( \left\| \boldsymbol{x}_i^{\Omega} - \boldsymbol{g}_i^{\Omega}(\widehat{\boldsymbol{x}}_i^{\Omega}) \right\|_2^2 + \left\| \boldsymbol{x}_i^{\Gamma} - \boldsymbol{g}_i^{\Gamma}(\widehat{\boldsymbol{x}}_i^{\Gamma}) \right\|_2^2 \Big) \Big/ \Big( \left\| \boldsymbol{x}_i^{\Omega} \right\|_2^2 + \left\| \boldsymbol{x}_i^{\Gamma} \right\|_2^2 \Big) \right)^{1/2}. \tag{5}
$$
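The error measure in (5) averages, over subdomains, the squared reconstruction error normalized by the squared norm of the FOM state, and then takes the square root. A minimal sketch follows, assuming the FOM states and the decoded ROM states are already available as plain lists of floats per subdomain; the function and argument names are hypothetical.

```python
import math


def squared_norm(vector):
    """Squared Euclidean norm of a list of floats."""
    return sum(value * value for value in vector)


def relative_error(fom_interior, fom_interface, rom_interior, rom_interface):
    """Relative error in the form of Equation (5).

    Each argument is a list with one state vector per subdomain.
    `rom_interior[i]` and `rom_interface[i]` are assumed to be the decoded
    ROM states, i.e. the decoder outputs for subdomain i.
    """
    n_subdomains = len(fom_interior)
    total = 0.0
    for x_int, x_face, y_int, y_face in zip(
        fom_interior, fom_interface, rom_interior, rom_interface
    ):
        numerator = squared_norm([a - b for a, b in zip(x_int, y_int)])
        numerator += squared_norm([a - b for a, b in zip(x_face, y_face)])
        denominator = squared_norm(x_int) + squared_norm(x_face)
        total += numerator / denominator
    return math.sqrt(total / n_subdomains)
```

Note that the normalization is applied per subdomain before averaging, so a subdomain with a small state norm contributes as much to $e$ as one with a large state norm.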

All training and computations were performed on the Lassen machine at Lawrence Livermore National Laboratory, which consists of an IBM Power9 processor with NVIDIA V100 (Volta) GPUs, clock speed between 2.3-3.8 GHz, and 256 GB DDR4 memory. The code can be found at https://anonymous.4open.science/r/DDNMROM_NeurIPS-4160/.

The implementation was done sequentially, but to highlight potential advantages of a parallel implementation, the reported wall clock time for computing subdomain-specific quantities for the SQP solver is taken to be the largest wall clock time incurred among all subdomains. The wall clock time for the remaining steps of the SQP solver is set to the overall wall clock time.
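One reading of this timing convention is sketched below: the subdomain-specific work is charged at the cost of the slowest subdomain, and the remaining solver steps are charged at their measured time. The function name and the sample timings are hypothetical and are not measurements from the paper.

```python
def reported_wall_time(subdomain_times, remaining_time):
    """Emulated parallel wall clock time for one solver run.

    subdomain_times: measured times of the subdomain-specific computations,
                     one entry per subdomain (run sequentially in practice).
    remaining_time:  measured time of all other solver steps.
    """
    return max(subdomain_times) + remaining_time


# Hypothetical timings in seconds for a 4-subdomain problem.
times = [0.2, 0.5, 0.3, 0.4]
parallel_estimate = reported_wall_time(times, remaining_time=0.1)
sequential_total = sum(times) + 0.1
```

Under this convention the reported time is an estimate of what an ideal parallel implementation would achieve, ignoring communication overhead.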

We consider the 2D steady-state Burgers’ equation

$$
u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} = \nu\left(\frac{\partial^{2} u}{\partial x^{2}} + \frac{\partial^{2} u}{\partial y^{2}}\right), \qquad
u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} = \nu\left(\frac{\partial^{2} v}{\partial x^{2}} + \frac{\partial^{2} v}{\partial y^{2}}\right) \tag{6}
$$

for $(x,y)\in[-1,1]\times[0,0.05]$ with viscosity $\nu=0.1$. As in Hoang et al. [2021], we use the exact solution $u_{ex} = -2\nu\,\frac{\partial \psi}{\partial x}/\psi$, $v_{ex} = -2\nu\,\frac{\partial \psi}{\partial y}/\psi$, where $\psi(x,y;a,\lambda) = a(1+x) + \left(e^{\lambda(x-1)} + e^{-\lambda(x-1)}\right)\cos(\lambda y)$ and $(a,\lambda)$ are parameters, and its restriction to the boundary as Dirichlet boundary conditions. The PDE is discretized using centered finite differences with $482$ uniformly spaced grid points in the $x$-direction and $26$ uniformly spaced grid points in the $y$-direction. For ROM training, we collected $6400$ FOM snapshots corresponding to parameters $(a,\lambda)\in[1,10^{4}]\times[5,25]$ (see Fig. 1) sampled on a uniform $80\times 80$ grid. We use the ROMs to predict the out-of-sample case $(a,\lambda) = (7692.5384, 21.9230)$.
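The exact solution can be evaluated directly by differentiating $\psi$ analytically. The sketch below does this and builds the $80\times 80$ training parameter grid. It assumes the grid is linearly spaced in both $a$ and $\lambda$, which the text does not state explicitly; the function names are hypothetical.

```python
import math

NU = 0.1  # viscosity used in the experiment


def exact_solution(x, y, a, lam, nu=NU):
    """Return (u_ex, v_ex) = (-2 nu psi_x / psi, -2 nu psi_y / psi)."""
    e_plus = math.exp(lam * (x - 1.0))
    e_minus = math.exp(-lam * (x - 1.0))
    psi = a * (1.0 + x) + (e_plus + e_minus) * math.cos(lam * y)
    # Analytic partial derivatives of psi with respect to x and y.
    dpsi_dx = a + lam * (e_plus - e_minus) * math.cos(lam * y)
    dpsi_dy = -lam * (e_plus + e_minus) * math.sin(lam * y)
    return -2.0 * nu * dpsi_dx / psi, -2.0 * nu * dpsi_dy / psi


def linspace(start, stop, count):
    """Evenly spaced values from start to stop, both endpoints included."""
    step = (stop - start) / (count - 1)
    return [start + k * step for k in range(count)]


# Assumed: training parameters on a linearly spaced 80 x 80 grid (6400 samples).
param_grid = [
    (a, lam)
    for a in linspace(1.0, 1.0e4, 80)
    for lam in linspace(5.0, 25.0, 80)
]

# Evaluate the exact solution at one boundary point for the out-of-sample case.
u_test, v_test = exact_solution(-1.0, 0.05, a=7692.5384, lam=21.9230)
```

On the stated domain and parameter range, $\lambda y \le 1.25 < \pi/2$, so $\cos(\lambda y) > 0$ and $\psi$ stays positive; the division is therefore well defined.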

Figure 1: FOM $u$ and $v$ components for different $(a,\lambda)$: (a) $(a,\lambda)=(1,25)$; (b) $(a,\lambda)=(10^{4},5)$. The distance of the shock from the left boundary and its steepness are determined by $a$ and $\lambda$, respectively.

First, we consider the DD problem with $4$ uniformly sized subdomains in a $2\times 2$ configuration and vary the ROM sizes $n_i^{\Omega}$ and $n_i^{\Gamma}$. Table 1 shows that NM-ROM has roughly an order of magnitude lower error than LS-ROM, with and without HR, when comparing ROMs of the same size. In the non-HR case, LS-ROM only achieves an error of order $10^{-3}$ for a ROM with $96$ total DoF (error = $2.66\times 10^{-3}$), while NM-ROM achieves a similar error with only $36$ DoF (error = $2.42\times 10^{-3}$) and a higher speedup (speedup = $26.2$) than the LS-ROM of similar accuracy (speedup = $18.3$). LS-ROM achieves a much higher speedup in the HR cases while retaining errors similar to those of the non-HR cases. NM-ROM also retains high accuracy after HR, and gains an extra $15$-$20$ times speedup after applying HR.

| Method | $n_i^{\Omega}$ | $n_i^{\Gamma}$ | DoF | Error | Speedup | Error (HR) | Speedup (HR) |
|---|---|---|---|---|---|---|---|
| LS-ROM | $6$ | $3$ | $36$ | $2.06\times 10^{-2}$ | $48.7$ | $1.78\times 10^{-2}$ | $340.0$ |
| LS-ROM | $8$ | $4$ | $48$ | $1.98\times 10^{-2}$ | $30.0$ | $1.44\times 10^{-2}$ | $347.6$ |
| LS-ROM | $10$ | $5$ | $60$ | $1.50\times 10^{-2}$ | $16.3$ | $1.16\times 10^{-2}$ | $329.6$ |
| LS-ROM | $16$ | $8$ | $96$ | $2.66\times 10^{-3}$ | $18.3$ | $3.23\times 10^{-3}$ | $280.4$ |
| NM-ROM | $6$ | $3$ | $36$ | $2.42\times 10^{-3}$ | $26.2$ | $2.60\times 10^{-3}$ | $44.7$ |
| NM-ROM | $8$ | $4$ | $48$ | $1.28\times 10^{-3}$ | $21.7$ | $1.64\times 10^{-3}$ | $43.9$ |
| NM-ROM | $10$ | $5$ | $60$ | $1.09\times 10^{-3}$ | $15.0$ | $1.19\times 10^{-3}$ | $43.6$ |
| NM-ROM | $16$ | $8$ | $96$ | $7.87\times 10^{-4}$ | $13.9$ | $9.80\times 10^{-4}$ | $37.5$ |

Table 1: Relative error and speedup for LS-ROM and NM-ROM with and without HR for varying ROM size. We use $N_i^{B}=100$ HR nodes per subdomain in the HR case.
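The DoF column of Table 1 follows from the per-subdomain ROM sizes, and the non-HR error columns can be compared size by size. The short sketch below reproduces that arithmetic from the tabulated values; the variable names are hypothetical, and the DoF formula (number of subdomains times the sum of interior and interface ROM dimensions) is inferred from the table rather than stated as a formula in the text.

```python
N_SUBDOMAINS = 4  # 2 x 2 configuration

# (interior ROM size, interface ROM size) per subdomain, as listed in Table 1.
rom_sizes = [(6, 3), (8, 4), (10, 5), (16, 8)]

# Non-HR relative errors copied from Table 1, in the same row order.
ls_rom_error = [2.06e-2, 1.98e-2, 1.50e-2, 2.66e-3]
nm_rom_error = [2.42e-3, 1.28e-3, 1.09e-3, 7.87e-4]

# Total DoF: every subdomain carries n_interior + n_interface unknowns.
total_dof = [N_SUBDOMAINS * (n_int + n_face) for n_int, n_face in rom_sizes]

# Factor by which the NM-ROM error is smaller than the LS-ROM error at equal size.
error_ratio = [ls / nm for ls, nm in zip(ls_rom_error, nm_rom_error)]

for dof, ratio in zip(total_dof, error_ratio):
    print(f"DoF = {dof:3d}: LS-ROM error / NM-ROM error = {ratio:.1f}")
```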

Next, we examine the per-subdomain reduction in the required number of autoencoder parameters for different subdomain configurations compared to the monolithic single-domain NM-ROM. We use the notation $2\times 1$ subdomains to indicate $2$ subdomains in the $x$-direction and $1$ subdomain in the $y$-direction. As expected, Table 2 shows that the maximum number of NN parameters per subdomain decreases significantly as more subdomains are used. Furthermore, the total number of NN parameters in the DD cases also decreases relative to the single-domain case. We also note that the error increases as more subdomains are used. We kept the ROM size $(n_i^{\Omega}, n_i^{\Gamma}) = (6,3)$ constant for each subdomain configuration to isolate the effect of DD on the number of NN parameters, but this may cause overfitting in the $16$-subdomain case. More careful hyper-parameter tuning is necessary to mitigate increases in error as the number of subdomains is increased.

| Subdomains | Max # subdomain params. | Reduction | Total # params. | Error |
|---|---|---|---|---|
| $1\times 1$ | $2.995\times 10^{6}$ | $0.0$ % | $2.995\times 10^{6}$ | $1.08\times 10^{-3}$ |
| $2\times 1$ | $1.147\times 10^{6}$ | $61.7$ % | $2.307\times 10^{6}$ | $1.27\times 10^{-3}$ |
| $2\times 2$ | $5.257\times 10^{5}$ | $82.4$ % | $2.384\times 10^{6}$ | $2.42\times 10^{-3}$ |
| $4\times 2$ | $2.617\times 10^{5}$ | $91.3$ % | $2.391\times 10^{6}$ | $4.26\times 10^{-3}$ |
| $8\times 2$ | $1.297\times 10^{5}$ | $95.7$ % | $2.406\times 10^{6}$ | $4.58\times 10^{-2}$ |

Table 2: Max number of NN parameters per subdomain, the per-subdomain reduction in number of NN parameters, the total number of parameters, and the corresponding error for different subdomain configurations. For the single-domain case, an NM-ROM of dimension $n=9$ is used. For the DD cases, $(n_i^{\Omega}, n_i^{\Gamma}) = (6,3)$, resulting in $9$ DoF per subdomain. HR was not used to evaluate the NM-ROMs in these examples.
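The Reduction column of Table 2 is consistent with measuring the largest subdomain autoencoder against the single-domain autoencoder. The sketch below checks this reading against the tabulated values; the formula is an inference from the table, and the names are hypothetical.

```python
SINGLE_DOMAIN_PARAMS = 2.995e6  # 1 x 1 configuration, from Table 2

# Max number of NN parameters per subdomain, copied from Table 2.
max_subdomain_params = {
    "2x1": 1.147e6,
    "2x2": 5.257e5,
    "4x2": 2.617e5,
    "8x2": 1.297e5,
}


def reduction_percent(subdomain_params, single_domain_params=SINGLE_DOMAIN_PARAMS):
    """Percentage reduction of the largest subdomain model relative to the monolithic one."""
    return 100.0 * (1.0 - subdomain_params / single_domain_params)


reductions = {
    config: round(reduction_percent(count), 1)
    for config, count in max_subdomain_params.items()
}
```

Because subdomain autoencoders can be trained independently, the per-subdomain maximum (rather than the total) is the quantity that bounds the memory needed by any single training job.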

5 Conclusion

We extended the DD framework of Hoang et al. [2021] to compute ROMs using NM-ROM with HR as presented in Kim et al. [2022]. Our experiments on the 2D Burgers' equation show that NM-ROM achieves an order of magnitude lower relative error than LS-ROM in nearly all cases tested. While LS-ROM with HR achieves much higher speedup than NM-ROM with HR, NM-ROM is still the clear winner in terms of ROM accuracy for a given ROM size. Moreover, HR allows NM-ROM to gain an extra $15$-$20$ times speedup compared to the non-HR cases. While the speedup is not as drastic as for LS-ROM, these speedup gains are, to our knowledge, the highest that have been achieved for NM-ROM. We also showed that using the DD approach significantly decreases the number of required NN parameters per subdomain compared to the monolithic single-domain NM-ROM. In future work, we plan to apply DD NM-ROM to more challenging problems, including those with slowly decaying Kolmogorov $n$-width, and to time-dependent problems. Other directions for future research include a greedy sampling strategy for choosing which FOM snapshots to compute for NM-ROM training, and applying the DD NM-ROM framework to decomposable or component-based systems.

Acknowledgments and Disclosure of Funding

This work was performed at Lawrence Livermore National Laboratory. A. N. Diaz was supported for this work by a Defense Science and Technology Internship (DSTI) at Lawrence Livermore National Laboratory and a 2021 National Defense Science and Engineering Graduate Fellowship. Y. Choi was supported for this work by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, as part of the CHaRMNET Mathematical Multifaceted Integrated Capability Center (MMICC) program, under Award Number DE-SC0023164 and partially by LDRD (21-SI-006). M. Heinkenschloss was supported by AFOSR Grant FA9550-22-1-0004 at Rice University. Lawrence Livermore National Laboratory is operated by Lawrence Livermore National Security, LLC, for the U.S. Department of Energy, National Nuclear Security Administration under Contract DE-AC52-07NA27344. IM review: LLNL-CONF-854737.

References