Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Solving partial differential equations
with sampled neural networks

Chinmay Datar1, Taniya Kapoor2, Abhishek Chandra3, Qing Sun1, Iryna Burak1,
Erik Lien Bolager1, Anna Veselovska1, Massimo Fornasier1, and Felix Dietrich1

1
School of Computation, Information and Technology
Technical University of Munich
Garching, Germany
&2Department of Engineering Structures
Delft University of Technology
Delft, The Netherlands
&3Department of Electrical Engineering
Eindhoven University of Technology
Eindhoven, The Netherlands
Corresponding author, felix.dietrich@tum.de.
Abstract

Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Using neural networks as an ansatz for the solution has proven a challenge in terms of training time and approximation accuracy. In this contribution, we discuss how sampling the hidden weights and biases of the ansatz network from data-agnostic and data-dependent probability distributions allows us to progress on both challenges. In most examples, the random sampling schemes outperform iterative, gradient-based optimization of physics-informed neural networks regarding training time and accuracy by several orders of magnitude. For time-dependent PDE, we construct neural basis functions only in the spatial domain and then solve the associated ordinary differential equation with classical methods from scientific computing over a long time horizon. This alleviates one of the greatest challenges for neural PDE solvers because it does not require us to parameterize the solution in time. For second-order elliptic PDE in Barron spaces, we prove the existence of sampled networks with L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT convergence to the solution. We demonstrate our approach on several time-dependent and static PDEs. We also illustrate how sampled networks can effectively solve inverse problems in this setting. Benefits compared to common numerical schemes include spectral convergence and mesh-free construction of basis functions.

1 Introduction

Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Many numerical solution methods using neural networks involve the parametrization of the solution over the entire space and time domain, followed by an expensive hyperparameter search over network architectures and training parameters. In this paper, we demonstrate that sampling a specific, data-dependent probability distribution for the weights of neural networks allows us to solve PDE by using individual neurons as basis functions (cf. Figure 1).

Refer to caption Refer to caption
PDE: Δu=fΔ𝑢𝑓-\Delta u=f- roman_Δ italic_u = italic_f
Neural basis: ϕ1,ϕ2,ϕ3,subscriptitalic-ϕ1subscriptitalic-ϕ2subscriptitalic-ϕ3{\color[rgb]{0,0.39453125,0.7421875}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0.39453125,0.7421875}\phi_{1}},{\color[rgb]{0.890625,0.4453125,0.1328125}% \definecolor[named]{pgfstrokecolor}{rgb}{0.890625,0.4453125,0.1328125}\phi_{2}% },{\color[rgb]{0.63671875,0.6796875,0}\definecolor[named]{pgfstrokecolor}{rgb}% {0.63671875,0.6796875,0}\phi_{3}},\dotsitalic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , …
Ansatz: u^=kckϕk^𝑢subscript𝑘subscript𝑐𝑘subscriptitalic-ϕ𝑘\hat{u}=\sum_{k}c_{k}\phi_{k}over^ start_ARG italic_u end_ARG = ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT
Approximation: kckΔϕkfsubscript𝑘subscript𝑐𝑘Δsubscriptitalic-ϕ𝑘𝑓-\sum_{k}c_{k}\Delta\phi_{k}\approx f- ∑ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT roman_Δ italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≈ italic_f
Figure 1: Solving a Poisson PDE with a neural network ansatz. We propose to sample random neural basis functions ϕksubscriptitalic-ϕ𝑘\phi_{k}italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT on the given domain, and then proceed to solve for their coefficients cksubscript𝑐𝑘c_{k}italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT.

We identify three problem settings that neural PDE solvers deal with and propose algorithms and examples that leverage random sampling of internal weights to address these. We compare the results of our sampled networks with the Physics-informed neural networks (PINNs) and discuss our method’s benefits and drawbacks compared to Isogeometric Analysis in the Finite Element Method framework (IGA-FEM).

Linear, static PDEs on complicated geometries present a challenge for mesh-based methods. Our sampling approach is particularly effective because it is mesh-free, and the linear operator of the PDE must only be applied to the sampled and fixed basis. Consequently, only a single linear system must be solved to construct the remaining weights in the network’s final layer to compute the solution.

Time-dependent problems present a challenge for recently developed neural PDE solvers, especially when the solution changes rapidly. Many solvers treat the temporal dimension as just another space dimension, meaning the neural network solution must capture all variations in time and space. Our sampling approach allows us to use the separation of variables to split the problem into a space- and a time-dependent part. The neurons we sample only capture the spatial variability, while the temporal variations are resolved through the solution of an associated ordinary differential equation. This allows us to integrate over exceedingly long time horizons with high accuracy.

Inverse problems present an even greater challenge for neural PDE solvers that are fully trained with back-propagation, because multiple neural networks must be trained jointly to compute the solution and the coefficients in the PDE. We demonstrate that randomly selecting internal weights simplifies much of the joint optimization challenge, resulting in optimization problems that are either low-dimensional or bi-linear.

2 Related Work

Many classical methods to solve PDE exist, together with theoretical guarantees and approximations with high accuracy. Isogeometric analysis (IGA) is one such method in which spline-based basis functions are defined over a structured grid (cf. Hughes et al. [33], Cottrell et al. [11, 12]) and is designed to integrate computer-aided design (CAD) and finite element analysis (FEA). Here, we will compare the results, benefits, and disadvantages of the neural PDE solvers to classical methods like IGA. Despite their solid theoretical grounding, classical mesh-based methods often entail a time-consuming setup phase, especially when mesh generation is necessary. Moreover, user-friendly software is often not readily available. Given the challenges posed by the mesh-based methods, there has been a substantial development of meshfree methods, including those based on radial basis functions (RBFs) (cf. Powell [54], Chen et al. [7]), moving least squares (MLS) (cf. Shepard [64], Lancaster and Salkauskas [40]), and smoothed particle hydrodynamics (SPH) (cf. Lucy [47], Gingold and Monaghan [28], Shadloo et al. [61]).

Neural PDE solvers offer an attractive framework to approximate solutions of PDEs because of the high expressivity of neural networks (cf. Rudi and Rosasco [60]), their ability to represent functions in high dimensions (cf. E [24], Wu and Long [71]), and powerful software for automatic differentiation (e.g., Pytorch cf. [51], TensorFlow cf. [1], and specialized software like DeepXDE cf. [46]). Earlier work on solving PDEs through neural networks (cf. Dissanayake and Phan-Thien [16], Lagaris et al. [39]) was recently popularized in the form of Physics-informed neural networks (PINNs) (cf. Raissi et al. [56]) and neural operators (cf. Lu et al. [45], Li et al. [42], Raonic et al. [58]), and others (cf. Han et al. [30], Sirignano and Spiliopoulos [65]). There has been a lot of development in accelerating neural PDE solvers involving variants of PINNs (cf  Cho et al. [10],  Meng et al. [48]), Sharma and Shankar [63], and Chiu et al. [9]), methods based on hash-encoding (cf  Huang and Alkhalifah [32], Wang et al. [68]) and transfer learning (cf  Kapoor et al. [37])). However, these methods rely on backpropagation and necessitate computing the derivatives of the given PDE for each forward pass, rendering them computationally intensive, even for simple PDEs, and often resulting in limited approximation accuracy. Moreover, despite the recent extensions of PINNs and Deep Operator Networks (DeepONets) (cf. Kim et al. [38], Kapoor et al. [34, 36], Ren et al. [59], Rao et al. [57], Zhu et al. [73]), extrapolating in time and training these networks over large space-time domains (without using multiple neural networks in space-time) poses a big challenge.

Spectral methods for solving PDEs promise fast convergence with a small number of basis functions. Meuris et al. [49] presents a machine learning-based spectral method in which hierarchical spatial basis functions are extracted from a trained DeepONet and employed in a spectral method to solve the PDE. Xia et al. [72] integrate adaptive techniques for spectral methods into PINN-based PDE solvers to obtain numerical solutions of unbounded domain problems that standard PINNs cannot efficiently approximate. Lange et al. [41] propose spectral methods that fit linear and nonlinear oscillators to data and facilitate long-term forecasting of temporal signals. Dresdner et al. [20] demonstrate ML-augmented spectral solvers that provide sub-grid corrections to classical spectral methods and improve the accuracy of spectral-only methods. Recently, Du et al. [21] proposed an approach using fixed orthogonal bases to learn PDE solutions as mappings between spectral coefficients and introduce a new training strategy based on spectral loss. These methods differ from ours in problem setting, architecture, and training.

Randomized neural networks for solving PDEs have been studied, mostly combining Extreme Learning Machines (ELMs) with the self-supervised setting of PINNs (cf. Chen et al. [6], Wang and Dong [70], Shang and Wang [62], Sun et al. [66]). Dwivedi and Srinivasan [22] propose a physics-informed extreme learning machine (PIELM) to efficiently solve linear PDEs, while Calabrò et al. [5], Galaris et al. [27] employ ELMs to learn invariant manifolds as well as PDE from data. Dong and Yang [19] establish that given a fixed computational budget, ELMs achieve substantially higher accuracy compared to classical second-order FEM and slightly higher accuracy compared to higher-order FEM. For static, nonlinear PDEs, ELMs can be used together with nonlinear optimization schemes (cf. Fabiani et al. [26]). On larger spatiotemporal domains, Dong and Li [17] and Dwivedi et al. [23] propose using multiple distributed ELMs on multiple subdomains. These approaches for solving time-dependent PDEs, treat time as an extra dimension in space, and neural basis functions span (a part of) the space-time domain. Moreover, our approach is significantly different as we will discuss in Section 4.

Inverse problems pose a challenge for iterative training methods because the loss landscape is typically more complicated to navigate. Owing to the problems associated with overfitting to noisy data and high sensitivity of accuracy with respect to the number of neurons in PIELM, Liu et al. [44] propose a Bayesian physics-informed extreme learning machine (BPIELM) to solve both forward and inverse linear PDE problems with noisy data. Dwivedi et al. [23] use distributed PINNs to solve inverse problems for predicting scalar coefficients of PDEs. Miao and Chen [50] propose VC-PINN specifically designed for PDE problems with variable coefficients, and Herrmann et al. [31] discuss PINNs for full-waveform inversion problems. Dong and Wang [18] use ELMs to solve inverse PDE problems involving predictions of scalar coefficients in PDEs and space-varying coefficients as functions of space. In 5.2, we show that many of these inverse problems are solved in a setting that makes them more challenging than they really are.

3 Mathematical framework

We discuss solution methods for PDEs on domains ΩdΩsuperscript𝑑\Omega\in\mathbb{R}^{d}roman_Ω ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT with boundary ΩΩ\partial\Omega∂ roman_Ω. We mostly address linear PDEs with solutions u:Ω×:𝑢Ωu:\Omega\times\mathbb{R}\to\mathbb{R}italic_u : roman_Ω × blackboard_R → blackboard_R. These PDEs are defined by linear operators L𝐿Litalic_L and B𝐵Bitalic_B that only involve derivative operators in space, and functions f:Ω:𝑓Ωf:\Omega\to\mathbb{R}italic_f : roman_Ω → blackboard_R, g:Ω:𝑔Ωg:\partial\Omega\to\mathbb{R}italic_g : ∂ roman_Ω → blackboard_R that define forcing and boundary conditions. For nonlinear PDEs, we denote N𝑁Nitalic_N as a nonlinear operator, and its scaling λ0𝜆0\lambda\geq 0italic_λ ≥ 0 is either zero (for linear PDEs) or positive (for nonlinear PDEs). Then,

ut(x,t)+Lu(x,t)+λN(u)(x,t)subscript𝑢𝑡𝑥𝑡𝐿𝑢𝑥𝑡𝜆𝑁𝑢𝑥𝑡\displaystyle u_{t}(x,t)+Lu(x,t)+\lambda N(u)(x,t)italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x , italic_t ) + italic_L italic_u ( italic_x , italic_t ) + italic_λ italic_N ( italic_u ) ( italic_x , italic_t ) =\displaystyle== f(x),xΩ,t[0,T],formulae-sequence𝑓𝑥𝑥Ω𝑡0𝑇\displaystyle f(x),\ x\in\Omega,\ t\in[0,T],italic_f ( italic_x ) , italic_x ∈ roman_Ω , italic_t ∈ [ 0 , italic_T ] , (1)
Bu(x,t)𝐵𝑢𝑥𝑡\displaystyle Bu(x,t)italic_B italic_u ( italic_x , italic_t ) =\displaystyle== g(x),xΩ,𝑔𝑥𝑥Ω\displaystyle g(x),\ x\in\partial\Omega,italic_g ( italic_x ) , italic_x ∈ ∂ roman_Ω , (2)

where we denote by utsubscript𝑢𝑡u_{t}italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT the first derivative of u𝑢uitalic_u by time.

3.1 Neural network ansatz

We parameterize the approximation of a solution with a neural network with one hidden layer, activation function σ=tanh𝜎\sigma=\tanhitalic_σ = roman_tanh, and m𝑚mitalic_m neurons so that

u^(x,t)=c(t)Aσ(Wx+b)+c0(t).^𝑢𝑥𝑡𝑐superscript𝑡top𝐴𝜎𝑊𝑥𝑏subscript𝑐0𝑡\hat{u}(x,t)=c(t)^{\top}A\sigma(Wx+b)+c_{0}(t).over^ start_ARG italic_u end_ARG ( italic_x , italic_t ) = italic_c ( italic_t ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A italic_σ ( italic_W italic_x + italic_b ) + italic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t ) . (3)

The parameters of the network are split into time-dependent and time-independent ones, c(t)m𝑐𝑡superscript𝑚c(t)\in\mathbb{R}^{m}italic_c ( italic_t ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT, c0(t)subscript𝑐0𝑡c_{0}(t)\in\mathbb{R}italic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t ) ∈ blackboard_R and Am×m𝐴superscript𝑚𝑚A\in\mathbb{R}^{m\times m}italic_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_m × italic_m end_POSTSUPERSCRIPT, Wm×d𝑊superscript𝑚𝑑W\in\mathbb{R}^{m\times d}italic_W ∈ blackboard_R start_POSTSUPERSCRIPT italic_m × italic_d end_POSTSUPERSCRIPT, bm𝑏superscript𝑚b\in\mathbb{R}^{m}italic_b ∈ blackboard_R start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT. We will distinguish between two approaches with two different weight spaces for the hidden layer. For the extreme learning machine (ELM) framework, the weight and bias space is the full space m×d×[η,η]superscript𝑚𝑑𝜂𝜂\mathbb{R}^{m\times d}\times[-\eta,\eta]blackboard_R start_POSTSUPERSCRIPT italic_m × italic_d end_POSTSUPERSCRIPT × [ - italic_η , italic_η ], where η𝜂\etaitalic_η is sufficiently large. The second approach, the sampling where it matters (SWIM) framework, follows Bolager et al. [4] and restricts the weight space to Ω×ΩΩΩ\Omega\times\Omegaroman_Ω × roman_Ω. We construct each weight and bias pair wk,bksubscript𝑤𝑘subscript𝑏𝑘w_{k},b_{k}italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT by taking two points x(1),x(2)Ωsuperscript𝑥1superscript𝑥2Ωx^{(1)},x^{(2)}\in\Omegaitalic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT ∈ roman_Ω and construct the weight and bias as

wk=s1x(2)x(1)x(2)x(1)2,bk=wk,x(1)+s2,formulae-sequencesubscript𝑤𝑘subscript𝑠1superscript𝑥2superscript𝑥1superscriptdelimited-∥∥superscript𝑥2superscript𝑥12subscript𝑏𝑘subscript𝑤𝑘superscript𝑥1subscript𝑠2\displaystyle w_{k}=s_{1}\frac{x^{(2)}-x^{(1)}}{\lVert x^{(2)}-x^{(1)}\rVert^{% 2}},\quad b_{k}=-\langle w_{k},x^{(1)}\rangle+s_{2},italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT divide start_ARG italic_x start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT end_ARG start_ARG ∥ italic_x start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG , italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - ⟨ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ⟩ + italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , (4)

where s1,s2subscript𝑠1subscript𝑠2s_{1},s_{2}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are constants dependent on the activation function. We distinguish between the two approaches by referring to neural networks constructed by SWIM or ELM. While Section 4 describes each approach in more detail, here we will show that using neural networks as an ansatz is useful, even in the restrictive SWIM setting.

3.2 Existence of solutions

We consider a family of second-order elliptic PDEs on dsuperscript𝑑\mathbb{R}^{d}blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT of the form

u=(Ψu)+ψu=f.𝑢Ψ𝑢𝜓𝑢𝑓\mathcal{L}u=-\nabla\cdot(\Psi\nabla u)+\psi u=f.caligraphic_L italic_u = - ∇ ⋅ ( roman_Ψ ∇ italic_u ) + italic_ψ italic_u = italic_f . (5)

Following Chen et al. [8], we add two assumptions on the coefficients ΨΨ\Psiroman_Ψ and ψ𝜓\psiitalic_ψ, as well as the source term f𝑓fitalic_f. The first assumption, Assumption A.1, assures the existence of a unique weak solution usuperscript𝑢u^{*}italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT. The second assumption, Assumption A.2, narrows the space of PDEs down to those where the coefficients and source term are in a Barron space. For more on what types of Ψ,ψ,fΨ𝜓𝑓\Psi,\psi,froman_Ψ , italic_ψ , italic_f are included and the definition of Barron spaces, see Appendix A.

With this in mind, the following result shows that there exists sampled (both ELM and SWIM) neural networks that can accurately approximate a weak solution of an elliptic PDE, where the SWIM setting restricts the weights and biases to the form Equation 4.

Theorem 3.1.

Consider a second-order elliptic PDE of the form (5) with the coefficients Ψ(x),ψ(x)Ψ𝑥𝜓𝑥\Psi(x),\psi(x)roman_Ψ ( italic_x ) , italic_ψ ( italic_x ) and the source term f(x)𝑓𝑥f(x)italic_f ( italic_x ) satisfying Assumption A.1 and Assumption A.2, with weak solution uH1(d)superscript𝑢superscript𝐻1superscript𝑑u^{*}\in H^{1}(\mathbb{R}^{d})italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ). Let ΩdΩsuperscript𝑑\Omega\subset\mathbb{R}^{d}roman_Ω ⊂ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT be any open, bounded subset following Definition 1 in Appendix A and μ𝜇\muitalic_μ being the Lebesgue measure. Then, for any ϵ(0,12)italic-ϵ012\epsilon\in(0,\frac{1}{2})italic_ϵ ∈ ( 0 , divide start_ARG 1 end_ARG start_ARG 2 end_ARG ), there exists a neural network with one hidden layer and tanh\tanhroman_tanh activation function of the form (3), constructed either by ELM or SWIM, such that

u^uL2(Ω,μ)ϵ.subscriptdelimited-∥∥^𝑢superscript𝑢subscript𝐿2Ω𝜇italic-ϵ\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}\leq\epsilon.∥ over^ start_ARG italic_u end_ARG - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ italic_ϵ .

The proof can be found in Appendix A. We continue the theoretical analysis to show that convergence in the setup of Theorem 3.1 can also be achieved when the hidden layer is sampled randomly, either by data-agnostic (ELM) or data-driven methods (SWIM) and then fixed, so that only the outer weights can vary. Let ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG be one of the three distributions described in Section 4.1, with η𝜂\etaitalic_η set sufficiently large.

Theorem 3.2.

With the assumptions and setup of Theorem 3.1, let the sampling method be either ELM or SWIM with a distribution ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG, and let δ(0,1]𝛿01\delta\in(0,1]italic_δ ∈ ( 0 , 1 ]. Then, for any ϵ(0,12)italic-ϵ012\epsilon\in(0,\frac{1}{2})italic_ϵ ∈ ( 0 , divide start_ARG 1 end_ARG start_ARG 2 end_ARG ), there exists a K𝐾K\in\mathbb{N}italic_K ∈ blackboard_N such that after sampling K𝐾Kitalic_K pairs of weights WK×d𝑊superscript𝐾𝑑W\in\mathbb{R}^{K\times d}italic_W ∈ blackboard_R start_POSTSUPERSCRIPT italic_K × italic_d end_POSTSUPERSCRIPT and biases bK𝑏superscript𝐾b\in\mathbb{R}^{K}italic_b ∈ blackboard_R start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT i.i.d. from ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG, there exist outer weights cK𝑐superscript𝐾c\in\mathbb{R}^{K}italic_c ∈ blackboard_R start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT such that with probability 1δ1𝛿1-\delta1 - italic_δ the following bound holds:

u^uL2(Ω,μ)=cσ(W+b)uL2(Ω,μ)ϵ.\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}=\lVert c^{\top}\sigma(W\cdot+b)-% u^{*}\rVert_{L^{2}{(\Omega,\mu)}}\leq\epsilon.∥ over^ start_ARG italic_u end_ARG - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT = ∥ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_σ ( italic_W ⋅ + italic_b ) - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ italic_ϵ .

The proof is in Section A.5, and relates the number of neurons K𝐾Kitalic_K to the accuracy ϵitalic-ϵ\epsilonitalic_ϵ and the probability 1δ1𝛿1-\delta1 - italic_δ.

4 Approximation of PDE solutions using the neural network ansatz

As hinted at in the previous section, the neural network ansatz is very flexible in approximating the solution of the PDE. We now describe how to construct the basis functions of the ansatz in Section 4.1. We then explain how to construct more specialized basis functions that satisfy boundary conditions in Section 4.3. If the PDE we want to solve is time-dependent, we propose solving an ordinary differential equation associated with our construction of the basis, cf. Section 4.2.

4.1 Random sampling of parameters in the hidden layer

Unlike classical machine learning approaches, we do not employ gradient-descent type optimization to construct the parameters of the hidden layer. Instead, we first construct an expressive function space basis composed of neurons σ(Wx+b)𝜎𝑊𝑥𝑏\sigma(Wx+b)italic_σ ( italic_W italic_x + italic_b ) over the domain ΩΩ\Omegaroman_Ω by sampling W𝑊Witalic_W and b𝑏bitalic_b from a certain probability distribution. Here, we consider two approaches for sampling: ELM and SWIM. In ELM, we sample from the full Euclidean space with a Gaussian distribution for the weights and a uniform distribution in [η,η]𝜂𝜂[-\eta,\eta][ - italic_η , italic_η ] for the bias. The second method, SWIM, needs a distribution over Ω×ΩΩΩ\Omega\times\Omegaroman_Ω × roman_Ω. The common choices are the uniform distribution or one where the density is based on f(x(2))f(x(1))x(2)x(1)delimited-∥∥𝑓superscript𝑥2𝑓superscript𝑥1delimited-∥∥superscript𝑥2superscript𝑥1\frac{\lVert f(x^{(2)})-f(x^{(1)})\rVert}{\lVert x^{(2)}-x^{(1)}\rVert}divide start_ARG ∥ italic_f ( italic_x start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT ) - italic_f ( italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ) ∥ end_ARG start_ARG ∥ italic_x start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT - italic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ∥ end_ARG, with f𝑓fitalic_f being the true function in a supervised setting. While ELM is a data-agnostic method, SWIM takes the data into account more directly and uses the geometry of the input space to inform which weights and biases are more useful for the approximation. This technique can be quite advantageous when sampling, as one wishes to minimize the amount of irrelevant or redundant weights and biases in a network. Empirically, it has been observed that SWIM outperforms ELM on many classical machine learning tasks [4].

4.2 Solving time-dependent PDE

For both linear and nonlinear time-dependent PDE, we can plug the ansatz (3) into the PDE (1) to formulate an ODE for the time-dependent coefficients. We denote a randomly sampled set of m𝑚mitalic_m neurons by ϕ(x):=(σ(w1x+b1),σ(w2x+b2),,σ(wmx+bm))assignitalic-ϕ𝑥𝜎superscriptsubscript𝑤1top𝑥subscript𝑏1𝜎superscriptsubscript𝑤2top𝑥subscript𝑏2𝜎superscriptsubscript𝑤𝑚top𝑥subscript𝑏𝑚\phi(x):=(\sigma(w_{1}^{\top}x+b_{1}),\sigma(w_{2}^{\top}x+b_{2}),\dots,\sigma% (w_{m}^{\top}x+b_{m}))italic_ϕ ( italic_x ) := ( italic_σ ( italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) , italic_σ ( italic_w start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) , … , italic_σ ( italic_w start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) ). For a set of Ncsubscript𝑁𝑐N_{c}italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT collocation points assembled in the columns of a matrix Xd×Nc𝑋superscript𝑑subscript𝑁𝑐X\in\mathbb{R}^{d\times N_{c}}italic_X ∈ blackboard_R start_POSTSUPERSCRIPT italic_d × italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, we denote the pseudo-inverse by +superscript\cdot^{+}⋅ start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT, write C:=(c,c0)m+1assign𝐶superscript𝑐topsubscript𝑐0superscript𝑚1C:=(c^{\top},c_{0})\in\mathbb{R}^{m+1}italic_C := ( italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , italic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_m + 1 end_POSTSUPERSCRIPT, and Φ(X):=(Aϕ(X),1)Nc×(m+1)assignΦ𝑋superscript𝐴italic-ϕ𝑋1topsuperscriptsubscript𝑁𝑐𝑚1\Phi(X):=(A\phi(X),1)^{\top}\in\mathbb{R}^{N_{c}\times(m+1)}roman_Φ ( italic_X ) := ( italic_A italic_ϕ ( italic_X ) , 1 ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT × ( italic_m + 1 ) end_POSTSUPERSCRIPT. We then define the ODE

Ct(t)=Φ(X)+P(X,C(t)),subscript𝐶𝑡𝑡Φsuperscript𝑋𝑃𝑋𝐶𝑡C_{t}(t)=\Phi(X)^{+}P(X,C(t)),italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_t ) = roman_Φ ( italic_X ) start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT italic_P ( italic_X , italic_C ( italic_t ) ) , (6)

with the reformulation of the PDE (1) by P(X,C(t))=LΦ(X)C(t)λN(Φ(X)C(t))+f(X)𝑃𝑋𝐶𝑡𝐿Φ𝑋𝐶𝑡𝜆𝑁Φ𝑋𝐶𝑡𝑓𝑋P(X,C(t))=-L\Phi(X)C(t)-\lambda N(\Phi(X)C(t))+f(X)italic_P ( italic_X , italic_C ( italic_t ) ) = - italic_L roman_Φ ( italic_X ) italic_C ( italic_t ) - italic_λ italic_N ( roman_Φ ( italic_X ) italic_C ( italic_t ) ) + italic_f ( italic_X ). The initial condition for this ODE is given through C(0)=Φ(X)+u(X,0)𝐶0Φsuperscript𝑋𝑢𝑋0C(0)=\Phi(X)^{+}u(X,0)italic_C ( 0 ) = roman_Φ ( italic_X ) start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT italic_u ( italic_X , 0 ). The boundary conditions are satisfied automatically if we compute the matrix A𝐴Aitalic_A as described next.

4.3 Constructing outer weights to satisfy the boundary condition

We can construct the parameter matrix A𝐴Aitalic_A so that the “outer basis functions” Aϕ(x)𝐴italic-ϕ𝑥A\phi(x)italic_A italic_ϕ ( italic_x ) approximately satisfy the boundary condition, i.e., BAϕ0𝐵𝐴italic-ϕ0BA\phi\approx 0italic_B italic_A italic_ϕ ≈ 0 on ΩΩ\partial\Omega∂ roman_Ω. This step is most important for time-dependent problems because the boundary condition must be satisfied at all times. For static PDE, the coefficients cAsuperscript𝑐top𝐴c^{\top}Aitalic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A can be constructed by jointly solving for parameters that satisfy boundary conditions and the PDE. For linear, static PDE, this can be achieved through the solution of a single linear system. For nonlinear, static PDE, we would need to resort to non-linear optimization problems to find cAsuperscript𝑐top𝐴c^{\top}Aitalic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A. For time-dependent problems, the construction of A𝐴Aitalic_A depends on which outer basis functions are useful for the given boundary conditions. For periodic boundaries, we find A𝐴Aitalic_A so that Aϕ(xl)=Aϕ(xr)𝐴italic-ϕsubscript𝑥𝑙𝐴italic-ϕsubscript𝑥𝑟A\phi(x_{l})=A\phi(x_{r})italic_A italic_ϕ ( italic_x start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) = italic_A italic_ϕ ( italic_x start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ), where xl,xrsubscript𝑥𝑙subscript𝑥𝑟x_{l},x_{r}italic_x start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_x start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT are the left and right boundary points of the spatial domain. In this paper, for xΩ𝑥Ωx\in\Omegaitalic_x ∈ roman_Ω and k=1,2,,m𝑘12𝑚k=1,2,\dots,mitalic_k = 1 , 2 , … , italic_m, we approximate [Aϕ]k(x)=sin(kx)subscriptdelimited-[]𝐴italic-ϕ𝑘𝑥𝑘𝑥[A\phi]_{k}(x)=\sin(kx)[ italic_A italic_ϕ ] start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) = roman_sin ( italic_k italic_x ) (for k𝑘kitalic_k even) and [Aϕ]k(x)=cos(kx)subscriptdelimited-[]𝐴italic-ϕ𝑘𝑥𝑘𝑥[A\phi]_{k}(x)=\cos(kx)[ italic_A italic_ϕ ] start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) = roman_cos ( italic_k italic_x ) (for k𝑘kitalic_k odd) and set c0(t)=0tsubscript𝑐0𝑡0for-all𝑡c_{0}(t)=0\;\forall titalic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t ) = 0 ∀ italic_t. As the ansatz is a linear combination of the basis functions, if all of them satisfy the boundary conditions, the ansatz satisfies them, too.

After the approximation of these new basis functions, we orthogonalize the basis functions to improve the condition number of the associated ODE (6). For this step, on the points XNc×d𝑋superscriptsubscript𝑁𝑐𝑑X\in\mathbb{R}^{N_{c}\times d}italic_X ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT × italic_d end_POSTSUPERSCRIPT, we compute a truncated singular value decomposition of Aϕ(X)Nc×m𝐴italic-ϕ𝑋superscriptsubscript𝑁𝑐𝑚A\phi(X)\in\mathbb{R}^{N_{c}\times m}italic_A italic_ϕ ( italic_X ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT × italic_m end_POSTSUPERSCRIPT to obtain matrices Ur,Σr,subscript𝑈𝑟subscriptΣ𝑟U_{r},\Sigma_{r},italic_U start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , roman_Σ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT , and Vrsubscript𝑉𝑟V_{r}italic_V start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT with rm𝑟𝑚r\leq mitalic_r ≤ italic_m so that UrΣrVr=Aϕ(x)+O(Σr+1)subscript𝑈𝑟subscriptΣ𝑟superscriptsubscript𝑉𝑟top𝐴italic-ϕ𝑥𝑂subscriptΣ𝑟1U_{r}\Sigma_{r}V_{r}^{\top}=A\phi(x)+O(\Sigma_{r+1})italic_U start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT roman_Σ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_V start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT = italic_A italic_ϕ ( italic_x ) + italic_O ( roman_Σ start_POSTSUBSCRIPT italic_r + 1 end_POSTSUBSCRIPT ). We then define Ar:=VrAassignsubscript𝐴𝑟superscriptsubscript𝑉𝑟top𝐴A_{r}:=V_{r}^{\top}Aitalic_A start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT := italic_V start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_A and use it instead of the matrix A𝐴Aitalic_A. This ensures Arϕ(x)subscript𝐴𝑟italic-ϕ𝑥A_{r}\phi(x)italic_A start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_ϕ ( italic_x ) are orthogonal functions on the data X𝑋Xitalic_X, and the matrix Arϕ(X)subscript𝐴𝑟italic-ϕ𝑋A_{r}\phi(X)italic_A start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT italic_ϕ ( italic_X ) has bounded condition number.

For Dirichlet boundary conditions (u(x)=0𝑢𝑥0u(x)=0italic_u ( italic_x ) = 0 for xΩ𝑥Ωx\in\partial\Omegaitalic_x ∈ ∂ roman_Ω), we can employ the same technique. However, it is also possible to augment the ODE (6) with an additional equation,

u^t(x)=κu^(x)for xΩCt(t)=Φ((X,Xb))+(m+1)×(Nc+Nb)(P(X,C(t)),κΦ(Xb)C(t))Nc+Nb,subscript^𝑢𝑡𝑥𝜅^𝑢𝑥for 𝑥Ωsubscript𝐶𝑡𝑡subscriptΦsuperscript𝑋subscript𝑋𝑏absentsuperscript𝑚1subscript𝑁𝑐subscript𝑁𝑏subscriptsuperscript𝑃𝑋𝐶𝑡𝜅Φsubscript𝑋𝑏𝐶𝑡topabsentsuperscriptsubscript𝑁𝑐subscript𝑁𝑏\hat{u}_{t}(x)=-\kappa\hat{u}(x)\,\text{for }x\in\partial\Omega\,\implies\,C_{% t}(t)=\underbrace{\Phi((X,X_{b}))^{+}}_{\in\mathbb{R}^{(m+1)\times(N_{c}+N_{b}% )}}\underbrace{(P(X,C(t)),-\kappa\Phi(X_{b})C(t))^{\top}}_{\in\mathbb{R}^{N_{c% }+N_{b}}},over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) = - italic_κ over^ start_ARG italic_u end_ARG ( italic_x ) for italic_x ∈ ∂ roman_Ω ⟹ italic_C start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_t ) = under⏟ start_ARG roman_Φ ( ( italic_X , italic_X start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) ) start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT end_ARG start_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT ( italic_m + 1 ) × ( italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT + italic_N start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT end_POSTSUBSCRIPT under⏟ start_ARG ( italic_P ( italic_X , italic_C ( italic_t ) ) , - italic_κ roman_Φ ( italic_X start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ) italic_C ( italic_t ) ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_ARG start_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT + italic_N start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ,

where κ>0𝜅0\kappa>0italic_κ > 0 is a fixed parameter, X𝑋Xitalic_X are the Ncsubscript𝑁𝑐N_{c}italic_N start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT collocation points and Xbsubscript𝑋𝑏X_{b}italic_X start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT is a (column-wise) collection of Nbsubscript𝑁𝑏N_{b}italic_N start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT points on the boundary ΩΩ\partial\Omega∂ roman_Ω. The augmented equation will quickly force the value of the ansatz u^^𝑢\hat{u}over^ start_ARG italic_u end_ARG to zero on the boundary ΩΩ\partial\Omega∂ roman_Ω. This technique allows setting A𝐴Aitalic_A to the identity matrix.

4.4 Solving inverse problems involving PDEs

The classical inverse problem setting is given when certain parts (parameters) of the PDE are unknown, and we only have sparse measurements of the solution. In this case, we propose two solution methods to construct both the unknown parameters of the PDE and its solution over the entire domain. The first method concerns PDEs where only a low-dimensional parameter space is involved, e.g., we only search for a low number of K𝐾Kitalic_K scalar parameters. Here, we can effectively optimize over the K𝐾Kitalic_K-dimensional parameter space. For each suggested optimum, we solve the PDE directly, because there are no unknown parameters left. We then evaluate our solution at the measurement points where we know the ground truth and compute the difference. This defines a loss function over the low-dimensional parameter space. We demonstrate this method in Section 5.2.1. The second method concerns PDEs for which an entire parameter field is unknown. Here, the parameter space is technically infinite-dimensional, and optimization methods working in low-dimensional spaces are not applicable anymore. We demonstrate in Section 5.2.2 that the solution and the parameter field can be parametrized with separate sampled neural networks. Then, we need to solve one joint problem for their last-layer weights. Even though this joint problem is generally nonlinear, it is only bi-linear in certain cases and can then be solved effectively.

5 Computational experiments

We now demonstrate our approach to solve PDEs in time-dependent, static, and inverse problems. The software and hardware environments used to perform the experiments are listed in Appendix B. The code to reproduce the experiments from the paper, and an up-to-date code base, can be found at

https://gitlab.com/felix.dietrich/swimpde-paper,

https://gitlab.com/felix.dietrich/swimpde.

Table 1 lists all PDE we solve together with forcing and boundary terms on the given domain.

Table 1: Summary of PDEs we solve in this paper. Dtsubscript𝐷𝑡D_{t}italic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT denotes differentiation by time, Dx,Dxxxxsubscript𝐷𝑥subscript𝐷𝑥𝑥𝑥𝑥D_{x},D_{xxxx}italic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT , italic_D start_POSTSUBSCRIPT italic_x italic_x italic_x italic_x end_POSTSUBSCRIPT by space (once and four times), ΔΔ\Deltaroman_Δ is the Laplacian in space, and T𝑇Titalic_T is the final time. Functions f,g𝑓𝑔f,gitalic_f , italic_g are different for each PDE, and parameters α,β,ν𝛼𝛽𝜈\alpha,\beta,\nuitalic_α , italic_β , italic_ν, and the field γ(x)𝛾𝑥\gamma(x)italic_γ ( italic_x ) are described in each subsection.
Section PDE Operator Forcing Boundary Domain
5.1.1 Advection Dt+βDxsubscript𝐷𝑡𝛽subscript𝐷𝑥D_{t}+\beta D_{x}italic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_β italic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT 0 Periodic [0,2π]×[0,T]02𝜋0𝑇[0,2\pi]\times[0,T][ 0 , 2 italic_π ] × [ 0 , italic_T ]
5.1.2 Euler-Bernoulli Dtt+Dxxxxsubscript𝐷𝑡𝑡subscript𝐷𝑥𝑥𝑥𝑥D_{tt}+D_{xxxx}italic_D start_POSTSUBSCRIPT italic_t italic_t end_POSTSUBSCRIPT + italic_D start_POSTSUBSCRIPT italic_x italic_x italic_x italic_x end_POSTSUBSCRIPT f (1,Dxx)=01subscript𝐷𝑥𝑥0(1,D_{xx})=0( 1 , italic_D start_POSTSUBSCRIPT italic_x italic_x end_POSTSUBSCRIPT ) = 0 [0,π]×[0,1]0𝜋01[0,\pi]\times[0,1][ 0 , italic_π ] × [ 0 , 1 ]
5.1.3 Burgers Dt+Dx()2νΔsubscript𝐷𝑡subscript𝐷𝑥superscript2𝜈ΔD_{t}+D_{x}(\cdot)^{2}-\nu\Deltaitalic_D start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ( ⋅ ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_ν roman_Δ 0 0 [1,1]×[0,1]1101[-1,1]\times[0,1][ - 1 , 1 ] × [ 0 , 1 ]
5.2.1 Poisson div((1,α)Dx)div1𝛼subscript𝐷𝑥-\text{div}((1,\alpha)D_{x})- div ( ( 1 , italic_α ) italic_D start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ) f g [0,1.4]2superscript01.42[0,1.4]^{2}[ 0 , 1.4 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
5.2.2 Helmholtz Δ+γ(x)Δ𝛾𝑥-\Delta+\gamma(x)- roman_Δ + italic_γ ( italic_x ) f g [0,1.5]2superscript01.52[0,1.5]^{2}[ 0 , 1.5 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT

In Section 5.1, we use increasingly difficult examples and demonstrate how time-dependent problems can be addressed with the approach of separation of variables. Separately, in Section C.1, we discuss how solving static linear PDE on a complicated geometry is straightforward with our approach. Finally, in Section 5.2, we show how inverse problems with unknown parameters and parameter fields can be solved. We use the root mean squared error (RMSE) and the relative L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error to quantify errors in all experiments (cf. Appendix B for the definitions). We compute the test error on a uniform grid for all time-dependent PDEs with 256 points in space and 100 points in time.

We compare our results with physics-informed neural networks (PINNs), both classical [56] and causality-respecting (causal PINNs) [69], and isogeometric finite element analysis (IGA-FEM)[33, 11, 12]. The details of these methods can be found in Section B.2 and Section B.3, respectively.

5.1 Time-dependent PDE

In this section, we compare our approach (cf. Section 4.2) to other neural PDE solvers and IGA-FEM on PDE problems involving high advection speeds, long-time simulation, higher-order derivative terms, and steep gradients in the solution.

5.1.1 Linear advection equation

We consider the linear advection equation (ut+βux=0subscript𝑢𝑡𝛽subscript𝑢𝑥0u_{t}+\beta u_{x}=0italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_β italic_u start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT = 0, also see Section C.2) with the initial condition u(x,0)=sin(x)𝑢𝑥0𝑥u(x,0)=\sin(x)italic_u ( italic_x , 0 ) = roman_sin ( italic_x ) and periodic boundary conditions. The analytical solution is given by u(x,t)=sin(xβt)𝑢𝑥𝑡𝑥𝛽𝑡u(x,t)=\sin(x-\beta t)italic_u ( italic_x , italic_t ) = roman_sin ( italic_x - italic_β italic_t ). In the first experiment, we solve this PDE using different neural PDE solvers and IGA-FEM for increasing flow velocities β𝛽\betaitalic_β over the domain Ω×T=[0,2π]×[0,1]Ω𝑇02𝜋01\Omega\times T=[0,2\pi]\times[0,1]roman_Ω × italic_T = [ 0 , 2 italic_π ] × [ 0 , 1 ]. The details on hyper-parameters and the setup of this experiment are listed in Section C.2. Figure 3 shows that approaches using basis functions in the entire spatio-temporal domain, such as PINNs, ELM, and SWIM fail as the flow velocity β𝛽\betaitalic_β increases beyond 40404040. In contrast, ELM-ODE, SWIM-ODE, and IGA-FEM can accurately solve the PDE, even for high values of β𝛽\betaitalic_β. Figure 3 shows that for β=40𝛽40\beta=40italic_β = 40, Lrelative2subscriptsuperscript𝐿2relative{L}^{2}_{\text{relative}}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT relative end_POSTSUBSCRIPT decays exponentially with the number of basis functions for ELM-ODE, SWIM-ODE, and IGA-FEM. In contrast to IGA-FEM, which uses local basis functions, ELM-ODE and SWIM-ODE require fewer basis functions for a fixed Lrelative2subscriptsuperscript𝐿2relative{L}^{2}_{\text{relative}}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT relative end_POSTSUBSCRIPT, because they use global basis functions. In this example, PINNs yield high errors. In the second experiment, we attempt solve the PDE with β=1𝛽1\beta=1italic_β = 1 for T=[0,1000]𝑇01000T=[0,1000]italic_T = [ 0 , 1000 ], with the true solution shown in Figure 4(a). Figure 4(b) shows that with ELM-ODE and SWIM-ODE, we can solve over 1000100010001000 seconds with Lrelative2subscriptsuperscript𝐿2relativeL^{2}_{\text{relative}}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT relative end_POSTSUBSCRIPT of less than 0.001%percent0.0010.001\%0.001 %, requiring only 0.940.940.940.94 seconds of runtime. Simulating long-time dynamics is a longstanding challenge for traditional neural PDE solvers [43, 36], and all solvers using neural network basis functions extending over both space and time, such as PINN, ELM, and SWIM, fail at approximating functions on long time intervals.

Refer to caption
Figure 2: Growth of test error with varying flow velocities β𝛽\betaitalic_β for different PDE solvers (14 basis functions) and IGA-FEM (15 basis functions).
Refer to caption
Figure 3: Fast exponential decay of test error with respect to the number of neurons in the hidden layer (number of basis functions) for β=40𝛽40\beta=40italic_β = 40.
Refer to caption
(a) Analytical solution u(x,t)=sin(xβt)𝑢𝑥𝑡𝑥𝛽𝑡u(x,t)=\sin(x-\beta t)italic_u ( italic_x , italic_t ) = roman_sin ( italic_x - italic_β italic_t ) at β=1𝛽1\beta=1italic_β = 1.
Refer to caption
(b) Slow error growth with time
Figure 4: Advection equation: Long time simulation at β=1𝛽1\beta=1italic_β = 1.

5.1.2 Fourth order Euler–Bernoulli Beam Equation

In this example, the beam equation (utt+uxxxx=fsubscript𝑢𝑡𝑡subscript𝑢𝑥𝑥𝑥𝑥𝑓u_{tt}+u_{xxxx}=fitalic_u start_POSTSUBSCRIPT italic_t italic_t end_POSTSUBSCRIPT + italic_u start_POSTSUBSCRIPT italic_x italic_x italic_x italic_x end_POSTSUBSCRIPT = italic_f, also see Section C.3) is solved with initial data u(x,0)=sin(x)𝑢𝑥0𝑥u(x,0)=\sin(x)italic_u ( italic_x , 0 ) = roman_sin ( italic_x ) on the spatial domain Ω=[0,π]Ω0𝜋\Omega=[0,\pi]roman_Ω = [ 0 , italic_π ]. The force function and the analytical solution are taken from [35]. The challenge here is not high advection speed but the higher orders of derivatives in space and time. The details of the PDE, the absolute error plots, and the comparison of approaches can be found in Section C.3. Table 2 shows that ODE-ELM is more than five orders of magnitude faster and more accurate than PINNs.

5.1.3 Non-linear Viscous Burgers equation

We now demonstrate how sampling weights and biases from a data-dependant distribution can be exploited to handle locally steep gradients in the solution of a non-linear Burgers equation. We compare our approximations to the reference solution provided by Raissi et al. [56]. Table 15 indicates that ODE-ELM cannot accurately represent the sharp gradient in the domain’s center due to the exponentially small probability of having large norms of internal weights. Sampling ELM-ODE weights from a broader uniform distribution increases the probability of having steeper basis functions, as Calabrò et al. [5] discuss for linear PDEs. However, given enough collocation points in the domain’s center, ODE-SWIM can create numerous basic functions with steep gradients, accurately placing them in the domain’s center by factoring in the data. To concentrate collocation points near the shock in the domain’s center, we resample them two times after a set number of time steps, guided by a probability distribution that leverages the gradient of the approximated solution. At the resampling time tr[0,T]subscript𝑡𝑟0𝑇t_{r}\in[0,T]italic_t start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ∈ [ 0 , italic_T ], we approximate the probability density p(x)|u^(x,tr)|similar-to𝑝𝑥^𝑢𝑥subscript𝑡𝑟{p(x)}\sim|\nabla\hat{u}(x,t_{r})|italic_p ( italic_x ) ∼ | ∇ over^ start_ARG italic_u end_ARG ( italic_x , italic_t start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ) |, which we then use to re-sample collocation points at random. While PINNs provide a reasonable error, ODE-SWIM is more accurate by an order of magnitude, almost twice as fast as regular PINN, and over ten times faster than causal PINN.

Refer to caption
Refer to caption
Figure 5: Comparison of SWIM solution to Burgers’ equation and ground truth. Black dashed lines indicate the times at which the solution is compared on the right. Gray dotted lines indicate the times at which the collocation points are re-sampled.
Table 2: Selective summary of the results for the forward problems (all results in Appendix C).
PDE Method Training time (ssecond\mathrm{s}roman_s) Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error
Advection PINN 30.56 6.92e-1 ±plus-or-minus\pm± 2.96e-2
ODE-ELM (our) 2.71 3.84e-6 ±plus-or-minus\pm± 5.2e-7
IGA-FEM 0.07 1.17e-10
Euler-Bernoulli PINN 2303.71 4.21e-3 ±plus-or-minus\pm± 9.56e-4
ODE-ELM (our) 0.06 3.50e-8 ±plus-or-minus\pm± 7.79e-9
IGA-FEM 0.94 4.21e-7
Burgers PINN 275.23 3.88e-3 ±plus-or-minus\pm± 2.61e-3
ODE-SWIM (our) 141.35 3.33e-4 ±plus-or-minus\pm± 4.63e-4
IGA-FEM 13.61 2.20e-4

5.2 Inverse problems

We now demonstrate the efficacy of our approach in solving inverse problems. They consist of PDEs with unknown parameters or source terms. Here, we solve two inverse problems, estimating a unknown scalar parameter and the coefficient field. Table 3 lists all results in this section.

Table 3: Summary of results for inverse problems. The Poisson PDE does not have parameter γ𝛾\gammaitalic_γ, and the Helmholtz PDE does not have parameter α𝛼\alphaitalic_α. We could not solve the Helmholtz PDE with PINNs.
Poisson coefficient Helmholtz field
PINN SWIM PINN SWIM
Architecture (2, 4×\times×20, 2) (2,1024,1) - (2,400,1) for u𝑢uitalic_u and γ𝛾\gammaitalic_γ
α𝛼\alphaitalic_α 9.94e-1 ±plus-or-minus\pm± 3.52e-3 1.003±plus-or-minus\pm±3.45e-3 - -
Rel. error (PDE) 1.11e-3 ±plus-or-minus\pm± 3.07e-4 8.8e-2±plus-or-minus\pm± 6.4e-2 - 1.85e-2 ±plus-or-minus\pm± 1.5e-2
Rel. error (γ𝛾\gammaitalic_γ) - - - 1.23e-2 ±plus-or-minus\pm± 9.10e-3
Training time (ssecond\mathrm{s}roman_s) 140.04 ±plus-or-minus\pm± 6.27 2.85±plus-or-minus\pm± 4.3e-1 - 5.2e-2 ±plus-or-minus\pm± 1.6e-2

5.2.1 Parametric Poisson PDE

Here, we solve a Poisson equation on Ω=[0,1.4]2Ωsuperscript01.42\Omega=[0,1.4]^{2}roman_Ω = [ 0 , 1.4 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT with parameterized, diagonal diffusivity matrix D=diag(1,α)𝐷diag1𝛼D=\text{diag}\left(1,\alpha\right)italic_D = diag ( 1 , italic_α ), α>0𝛼0\alpha>0italic_α > 0. The unknown parameter α𝛼\alphaitalic_α must be estimated from a set of 50505050 measurements, distributed uniformly at random over ΩΩ\Omegaroman_Ω. We only need to solve a one-dimensional optimization problem over α𝛼\alphaitalic_α. For any α𝛼\alphaitalic_α, we solve the PDE and compute the MSE of our approximation to the known measurements (cf. Section C.5). We minimize the MSE with scipy.optimize.minimize_scalar (cf. Table 3). For PINNs, the solution method is described in Section C.5.

5.2.2 Helmholtz PDE with parameter field

We assume access to values of u𝑢uitalic_u at N=300𝑁300N=300italic_N = 300 measurement points, distributed uniformly at random inside ΩΩ\Omegaroman_Ω. The goal is to construct an accurate approximation of u𝑢uitalic_u and γ𝛾\gammaitalic_γ on ΩΩ\Omegaroman_Ω. Different to Dong and Wang [18], we use tanh\tanhroman_tanh (not any uncommon activation function), and also do not distribute additional measurement points on the boundary (we do not assume access to any values from g𝑔gitalic_g). Figure 6 shows the resulting approximation with a network of M=400𝑀400M=400italic_M = 400 neurons in one hidden layer. The relative error is plotted over a grid of 101×101101101101\times 101101 × 101 test points.

Refer to caption
Refer to caption
Figure 6: Left: Ground truth solution and approximation for the Helmholtz PDE. Right: Ground truth parameter field γ𝛾\gammaitalic_γ and our approximation.

6 Conclusion

We discuss how PDEs can be solved with a neural network ansatz if the parameters of each neuron are sampled from specific probability distributions. We prove that such approximations converge to the true solution of the PDE in the L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT norm and discuss how time-dependent PDEs can be solved with a separation of variables approach. We introduce a simple method to compute new basis functions to satisfy the boundary conditions over time. We also demonstrate how to effectively solve inverse problems.

Benefits of the approach include reducing the number of parameters because we choose many randomly. For static, linear PDEs, finding the solution reduces to solving a linear problem, for which many efficient methods exist already. For general time-dependent PDEs, sampling the ansatz in space at random allows us to reduce the problem to solving a (high-dimensional) ODE. We can use existing solution methods to solve this ODE and find the time-dependent parameters of the ansatz. In the inverse problem setting, we utilize the random basis to parameterize both the PDE solution and parameter fields, so far fewer parameters must be optimized.

Disadvantages of our method compared to neural solvers using back-propagation include that our networks require more neurons for the same accuracy, leading to higher inference times. We could use our method to solve the PDE and then construct a smaller neural network to approximate the new solution. The linear systems we must solve depend on the number of data points in the domain and the number of neurons in the neural ansatz. Scaling up both leads to cubic complexity in computational time and memory. However, the number of neurons does not need to grow arbitrarily for a fixed problem and accuracy level. In this case, our method scales linearly with the number of points.

Ethical considerations are important for any new machine learning approach because neural networks are generally dual-use. Our approach is based on classical methods from scientific computing, which are well understood. This connection now allows researchers to better understand our neural solvers’ behavior, failure modes, and robustness. We believe that the benefits of our approach far outweigh the potential downsides of misuse because a system that is understood better can also be controlled more straightforwardly.

Remaining challenges and future work include very high-dimensional base spaces and experiments on domain decomposition, which is a very effective method for low-dimensional problems. In principle, our approach should also work for this setting because the constraints required on the inner boundaries of domains are linear. Many exciting new directions that we will explore in the future are related to real-world problems in science and engineering. Our method is tailored to scattered, noisy data, where mesh generation for classical methods is tedious and challenging. Our approach is mesh-free, flexible w.r.t. different PDE settings, and can approximate the solution with comparatively low computational resources.

Acknowledgments and Disclosure of Funding

F.D. and C.D. acknowledge Sølve Eidnes and Martine Mahlum for a great workshop on physics in machine learning (in Oslo, 2024). We are grateful to Tim Bürchner for providing an initial version of the IGA code, and to Jana Huhne for providing an initial version of the neural PDE solver code. F.D., I.B., and E.L.B. acknowledge funding by the German Research Foundation—project 468830823, and association to DFG-SPP-229. C.D. is partially funded by the Institute for Advanced Study (IAS) at the Technical University of Munich. F.D. and Q.S. are supported by the TUM Georg Nemetschek Institute - Artificial Intelligence for the Built World.

References

  • Abadi et al. [2015] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
  • Barron [1993] A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, May 1993. ISSN 0018-9448, 1557-9654. doi: 10.1109/18.256500.
  • Bellock et al. [2021] Ken Bellock, Neil Godber, and Philip Kahn. bellockk/alphashape: v1.3.1 release, April 2021.
  • Bolager et al. [2023] Erik L Bolager, Iryna Burak, Chinmay Datar, Qing Sun, and Felix Dietrich. Sampling weights of deep neural networks. In Advances in Neural Information Processing Systems, volume 36, pages 63075–63116. Curran Associates, Inc., 2023.
  • Calabrò et al. [2021] Francesco Calabrò, Gianluca Fabiani, and Constantinos Siettos. Extreme learning machine collocation for the numerical solution of elliptic PDEs with sharp gradients. Computer Methods in Applied Mechanics and Engineering, 387:114188, December 2021. ISSN 00457825. doi: 10.1016/j.cma.2021.114188.
  • Chen et al. [2024] Jingrun Chen, Weinan E, and Yifei Sun. Optimization of random feature method in the high-precision regime. Communications on Applied Mathematics and Computation, pages 1–28, 2024.
  • Chen et al. [2014] Wen Chen, Zhuo-Jia Fu, and C.S. Chen. Recent Advances in Radial Basis Function Collocation Methods. Springer Berlin Heidelberg, 2014. ISBN 9783642395727. doi: 10.1007/978-3-642-39572-7.
  • Chen et al. [2021] Ziang Chen, Jianfeng Lu, and Yulong Lu. On the representation of solutions to elliptic pdes in barron spaces. Advances in neural information processing systems, 34:6454–6465, 2021.
  • Chiu et al. [2022] Pao-Hsiung Chiu, Jian Cheng Wong, Chinchun Ooi, My Ha Dao, and Yew-Soon Ong. Can-pinn: A fast physics-informed neural network based on coupled-automatic–numerical differentiation method. Computer Methods in Applied Mechanics and Engineering, 395:114909, 2022.
  • Cho et al. [2024] Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, and Eunbyung Park. Separable physics-informed neural networks. Advances in Neural Information Processing Systems, 36, 2024.
  • Cottrell et al. [2009] J. Austin Cottrell, Thomas J. R. Hughes, and Yuri Bazilevs. Isogeometric Analysis: Toward Integration of CAD and FEA. Wiley Publishing, 1st edition, 2009. ISBN 0470748737.
  • Cottrell et al. [2006] J.A. Cottrell, A. Reali, Y. Bazilevs, and T.J.R. Hughes. Isogeometric analysis of structural vibrations. Computer Methods in Applied Mechanics and Engineering, 195(41):5257–5296, 2006. ISSN 0045-7825. doi: https://doi.org/10.1016/j.cma.2005.09.027. John H. Argyris Memorial Issue. Part II.
  • COX [1972] M. G. COX. The Numerical Evaluation of B-Splines*. IMA Journal of Applied Mathematics, 10(2):134–149, 10 1972. ISSN 0272-4960. doi: 10.1093/imamat/10.2.134.
  • Cybenko [1989] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, December 1989. ISSN 1435-568X. doi: 10.1007/BF02551274.
  • de Boor [1972] Carl de Boor. On calculating with b-splines. Journal of Approximation Theory, 6(1):50–62, 1972. ISSN 0021-9045. doi: https://doi.org/10.1016/0021-9045(72)90080-9.
  • Dissanayake and Phan-Thien [1994] MWMG Dissanayake and Nhan Phan-Thien. Neural-network-based approximations for solving partial differential equations. communications in Numerical Methods in Engineering, 10(3):195–201, 1994.
  • Dong and Li [2021] Suchuan Dong and Zongwei Li. Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations. Computer Methods in Applied Mechanics and Engineering, 387:114129, 2021.
  • Dong and Wang [2023] Suchuan Dong and Yiran Wang. A method for computing inverse parametric pde problems with random-weight neural networks. Journal of Computational Physics, 489:112263, 2023.
  • Dong and Yang [2022] Suchuan Dong and Jielin Yang. On computing the hyperparameter of extreme learning machines: Algorithm and application to computational pdes, and comparison with classical and high-order finite elements. Journal of Computational Physics, 463:111290, 2022.
  • Dresdner et al. [2022] Gideon Dresdner, Dmitrii Kochkov, Peter Christian Norgaard, Leonardo Zepeda-Nunez, Jamie Smith, Michael Brenner, and Stephan Hoyer. Learning to correct spectral methods for simulating turbulent flows. Transactions on Machine Learning Research, 2022.
  • Du et al. [2023] Yiheng Du, Nithin Chalapathi, and Aditi Krishnapriyan. Neural spectral methods: Self-supervised learning in the spectral domain. arXiv preprint arXiv:2312.05225, 2023.
  • Dwivedi and Srinivasan [2020] Vikas Dwivedi and Balaji Srinivasan. Physics informed extreme learning machine (pielm)–a rapid method for the numerical solution of partial differential equations. Neurocomputing, 391:96–118, 2020.
  • Dwivedi et al. [2021] Vikas Dwivedi, Nishant Parashar, and Balaji Srinivasan. Distributed learning machines for solving forward and inverse problems in partial differential equations. Neurocomputing, 420:299–316, 2021.
  • E [2020] Weinan E. Towards a Mathematical Understanding of Neural Network-Based Machine Learning: What We Know and What We Don’t. CSIAM Transactions on Applied Mathematics, 1(4):561–615, June 2020. ISSN 2708-0560, 2708-0579. doi: 10.4208/csiam-am.SO-2020-0002.
  • E et al. [2022] Weinan E, Chao Ma, and Lei Wu. The Barron Space and the Flow-Induced Function Spaces for Neural Network Models. Constructive Approximation, 55(1):369–406, February 2022. ISSN 0176-4276, 1432-0940. doi: 10.1007/s00365-021-09549-y.
  • Fabiani et al. [2021] Gianluca Fabiani, Francesco Calabrò, Lucia Russo, and Constantinos Siettos. Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. Journal of Scientific Computing, 89(2):44, November 2021. ISSN 0885-7474, 1573-7691. doi: 10.1007/s10915-021-01650-5.
  • Galaris et al. [2022] Evangelos Galaris, Gianluca Fabiani, Ioannis Gallos, Ioannis Kevrekidis, and Constantinos Siettos. Numerical Bifurcation Analysis of PDEs From Lattice Boltzmann Model Simulations: A Parsimonious Machine Learning Approach. Journal of Scientific Computing, 92(2):34, August 2022. ISSN 0885-7474, 1573-7691. doi: 10.1007/s10915-022-01883-y.
  • Gingold and Monaghan [1977] R. A. Gingold and J. J. Monaghan. Smoothed particle hydrodynamics: theory and application to non-spherical stars. Monthly Notices of the Royal Astronomical Society, 181(3):375–389, 12 1977. ISSN 0035-8711. doi: 10.1093/mnras/181.3.375.
  • Glorot and Bengio [2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010.
  • Han et al. [2018] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • Herrmann et al. [2023] Leon Herrmann, Tim Bürchner, Felix Dietrich, and Stefan Kollmannsberger. On the use of neural networks for full waveform inversion. Computer Methods in Applied Mechanics and Engineering, 415:116278, October 2023. ISSN 00457825. doi: 10.1016/j.cma.2023.116278.
  • Huang and Alkhalifah [2024] Xinquan Huang and Tariq Alkhalifah. Efficient physics-informed neural networks using hash encoding. Journal of Computational Physics, 501:112760, 2024.
  • Hughes et al. [2005] T.J.R. Hughes, J.A. Cottrell, and Y. Bazilevs. Isogeometric analysis: Cad, finite elements, nurbs, exact geometry and mesh refinement. Computer Methods in Applied Mechanics and Engineering, 194(39):4135–4195, 2005.
  • Kapoor et al. [2023a] Taniya Kapoor, Abhishek Chandra, Daniel Tartakovsky, Hongrui Wang, Alfredo Núñez, and Rolf Dollevoet. Neural oscillators for generalizing parametric pdes. In The Symbiosis of Deep Learning and Differential Equations III, 2023a.
  • Kapoor et al. [2023b] Taniya Kapoor, Hongrui Wang, Alfredo Núñez, and Rolf Dollevoet. Physics-informed neural networks for solving forward and inverse problems in complex beam systems. IEEE Transactions on Neural Networks and Learning Systems, 2023b.
  • Kapoor et al. [2024a] Taniya Kapoor, Abhishek Chandra, Daniel M Tartakovsky, Hongrui Wang, Alfredo Nunez, and Rolf Dollevoet. Neural oscillators for generalization of physics-informed machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 13059–13067, 2024a.
  • Kapoor et al. [2024b] Taniya Kapoor, Hongrui Wang, Alfredo Núñez, and Rolf Dollevoet. Transfer learning for improved generalizability in causal physics-informed neural networks for beam simulations. Engineering Applications of Artificial Intelligence, 133:108085, 2024b.
  • Kim et al. [2021] Jungeun Kim, Kookjin Lee, Dongeun Lee, Sheo Yon Jhin, and Noseong Park. Dpm: a novel training method for physics-informed neural networks in extrapolation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 8146–8154, 2021.
  • Lagaris et al. [1998] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
  • Lancaster and Salkauskas [1981] Peter Lancaster and Kestutis Salkauskas. Surfaces generated by moving least squares methods. Mathematics of Computation, 37:141–158, 1981.
  • Lange et al. [2021] Henning Lange, Steven L Brunton, and J Nathan Kutz. From fourier to koopman: Spectral methods for long-term time series prediction. Journal of Machine Learning Research, 22(41):1–38, 2021.
  • Li et al. [2020] Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar, et al. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2020.
  • Lippe et al. [2024] Phillip Lippe, Bas Veeling, Paris Perdikaris, Richard Turner, and Johannes Brandstetter. Pde-refiner: Achieving accurate long rollouts with neural pde solvers. Advances in Neural Information Processing Systems, 36, 2024.
  • Liu et al. [2023] Xu Liu, Wen Yao, Wei Peng, and Weien Zhou. Bayesian physics-informed extreme learning machine for forward and inverse pde problems with noisy data. Neurocomputing, 549:126425, 2023.
  • Lu et al. [2021a] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature machine intelligence, 3(3):218–229, 2021a.
  • Lu et al. [2021b] Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Review, 63(1):208–228, 2021b. doi: 10.1137/19M1274067.
  • Lucy [1977] L. B. Lucy. A numerical approach to the testing of the fission hypothesis. The Astronomical Journal, 82:1013–1024, December 1977. doi: 10.1086/112164.
  • Meng et al. [2020] Xuhui Meng, Zhen Li, Dongkun Zhang, and George Em Karniadakis. Ppinn: Parareal physics-informed neural network for time-dependent pdes. Computer Methods in Applied Mechanics and Engineering, 370:113250, 2020.
  • Meuris et al. [2023] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning-based spectral methods for partial differential equations. Scientific Reports, 13(1):1739, 2023.
  • Miao and Chen [2023] Zhengwu Miao and Yong Chen. Vc-pinn: Variable coefficient physics-informed neural network for forward and inverse problems of pdes with variable coefficient. Physica D: Nonlinear Phenomena, 456:133945, 2023.
  • Paszke et al. [2017] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch, 2017.
  • Piegl and Tiller [1997] Les Piegl and Wayne Tiller. The NURBS book (2nd ed.). Springer-Verlag, Berlin, Heidelberg, 1997. ISBN 3540615458.
  • Pinkus [1999] Allan Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, January 1999. ISSN 0962-4929, 1474-0508. doi: 10.1017/S0962492900002919.
  • Powell [1992] M J D Powell. The Theory of Radial Basis Function Approximation in 1990. In Advances in Numerical Analysis: Wavelets, Subdivision Algorithms, and Radial Basis Functions. Oxford University Press, 04 1992. ISBN 9780198534396. doi: 10.1093/oso/9780198534396.003.0003.
  • Prudhomme et al. [2001] Serge Prudhomme, Frédéric Pascal, J.Tinsley Oden, and Albert Romkes. A priori error estimate for the baumann–oden version of the discontinuous galerkin method. Comptes Rendus de l’Académie des Sciences - Series I - Mathematics, 332(9):851–856, 2001. ISSN 0764-4442. doi: https://doi.org/10.1016/S0764-4442(01)01936-X.
  • Raissi et al. [2019] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
  • Rao et al. [2023] Chengping Rao, Pu Ren, Qi Wang, Oral Buyukozturk, Hao Sun, and Yang Liu. Encoding physics to learn reaction–diffusion processes. Nature Machine Intelligence, 5(7):765–779, 2023.
  • Raonic et al. [2024] Bogdan Raonic, Roberto Molinaro, Tim De Ryck, Tobias Rohner, Francesca Bartolucci, Rima Alaifari, Siddhartha Mishra, and Emmanuel de Bézenac. Convolutional neural operators for robust and accurate learning of pdes. Advances in Neural Information Processing Systems, 36, 2024.
  • Ren et al. [2022] Pu Ren, Chengping Rao, Yang Liu, Jian-Xun Wang, and Hao Sun. Phycrnet: Physics-informed convolutional-recurrent network for solving spatiotemporal pdes. Computer Methods in Applied Mechanics and Engineering, 389:114399, 2022.
  • Rudi and Rosasco [2021] Alessandro Rudi and Lorenzo Rosasco. Generalization Properties of Learning with Random Features, April 2021.
  • Shadloo et al. [2016] M.S. Shadloo, G. Oger, and D. Le Touzé. Smoothed particle hydrodynamics method for fluid flows, towards industrial applications: Motivations, current state, and challenges. Computers & Fluids, 136:11–34, 2016. ISSN 0045-7930. doi: https://doi.org/10.1016/j.compfluid.2016.05.029.
  • Shang and Wang [2024] Yong Shang and Fei Wang. Randomized Neural Networks with Petrov–Galerkin Methods for Solving Linear Elasticity and Navier–Stokes Equations. Journal of Engineering Mechanics, 150(4):04024010, April 2024. ISSN 0733-9399, 1943-7889. doi: 10.1061/JENMDT.EMENG-7463.
  • Sharma and Shankar [2022] Ramansh Sharma and Varun Shankar. Accelerated training of physics-informed neural networks (pinns) using meshless discretizations. Advances in Neural Information Processing Systems, 35:1034–1046, 2022.
  • Shepard [1968] Donald Shepard. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM National Conference, ACM ’68, page 517–524, New York, NY, USA, 1968. Association for Computing Machinery. ISBN 9781450374866. doi: 10.1145/800186.810616.
  • Sirignano and Spiliopoulos [2018] Justin Sirignano and Konstantinos Spiliopoulos. Dgm: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339–1364, 2018.
  • Sun et al. [2024] Jingbo Sun, Suchuan Dong, and Fei Wang. Local randomized neural networks with discontinuous Galerkin methods for partial differential equations. Journal of Computational and Applied Mathematics, 445:115830, August 2024. ISSN 03770427. doi: 10.1016/j.cam.2024.115830.
  • Turk and Levoy [1994] Greg Turk and Marc Levoy. Zippered polygon meshes from range images. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 311–318, 1994.
  • Wang et al. [2024a] Haoxiang Wang, Tao Yu, Tianwei Yang, Hui Qiao, and Qionghai Dai. Neural physical simulation with multi-resolution hash grid encoding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5410–5418, 2024a.
  • Wang et al. [2024b] Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality for training physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 421:116813, 2024b.
  • Wang and Dong [2024] Yiran Wang and Suchuan Dong. An extreme learning machine-based method for computational pdes in higher dimensions. Computer Methods in Applied Mechanics and Engineering, 418:116578, 2024.
  • Wu and Long [2022] Lei Wu and Jihao Long. A Spectral-Based Analysis of the Separation between Two-Layer Neural Networks and Linear Methods. Journal of Machine Learning Research, 23(1), January 2022. ISSN 1532-4435.
  • Xia et al. [2023] Mingtao Xia, Lucas Böttcher, and Tom Chou. Spectrally adapted physics-informed neural networks for solving unbounded domain problems. Machine Learning: Science and Technology, 4(2):025024, 2023.
  • Zhu et al. [2023] Min Zhu, Handi Zhang, Anran Jiao, George Em Karniadakis, and Lu Lu. Reliable extrapolation of deep neural operators informed by physics or sparse observations. Computer Methods in Applied Mechanics and Engineering, 412:116064, 2023.

Appendix

Appendix A Theoretical results

In this section, we describe in more detail the setting of the PDE and the assumptions, define the particular variant of Barron spaces, and prove the theoretical result in the main paper.

A.1 Input space

We start by defining the input space ΩΩ\Omegaroman_Ω, following the setup from [4]. We set

dd(x,A)=inf{d(x,a):aA},subscript𝑑superscript𝑑𝑥𝐴infimumconditional-set𝑑𝑥𝑎𝑎𝐴\displaystyle d_{\mathbb{R}^{d}}(x,A)=\inf\{d(x,a)\colon a\in A\},italic_d start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_x , italic_A ) = roman_inf { italic_d ( italic_x , italic_a ) : italic_a ∈ italic_A } ,

where d𝑑ditalic_d is the canonical Euclidean distance in the space dsuperscript𝑑\mathbb{R}^{d}blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT and Ad𝐴superscript𝑑A\subseteq\mathbb{R}^{d}italic_A ⊆ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT. The medial axis is defined as

Med(A)={xd:pqA,px=qx=dd(x,A)}Med𝐴conditional-set𝑥superscript𝑑formulae-sequence𝑝𝑞𝐴delimited-∥∥𝑝𝑥delimited-∥∥𝑞𝑥subscript𝑑superscript𝑑𝑥𝐴\displaystyle\text{Med}(A)=\{x\in\mathbb{R}^{d}\colon\exists p\neq q\in A,% \lVert p-x\rVert=\lVert q-x\rVert=d_{\mathbb{R}^{d}}(x,A)\}Med ( italic_A ) = { italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT : ∃ italic_p ≠ italic_q ∈ italic_A , ∥ italic_p - italic_x ∥ = ∥ italic_q - italic_x ∥ = italic_d start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_x , italic_A ) }

and the reach is the scalar

τA=infaAdd(a,Med(A)),subscript𝜏𝐴subscriptinfimum𝑎𝐴subscript𝑑superscript𝑑𝑎Med𝐴\displaystyle\tau_{A}=\inf_{a\in A}d_{\mathbb{R}^{d}}(a,\text{Med}(A)),italic_τ start_POSTSUBSCRIPT italic_A end_POSTSUBSCRIPT = roman_inf start_POSTSUBSCRIPT italic_a ∈ italic_A end_POSTSUBSCRIPT italic_d start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_a , Med ( italic_A ) ) ,

i.e., the point in A𝐴Aitalic_A that is closest to the projection of points in Acsuperscript𝐴𝑐A^{c}italic_A start_POSTSUPERSCRIPT italic_c end_POSTSUPERSCRIPT.

Definition 1.

Let Ω~~Ω\widetilde{\Omega}over~ start_ARG roman_Ω end_ARG be a nonempty compact subset of dsuperscript𝑑\mathbb{R}^{d}blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT with reach τΩ~>0subscript𝜏~Ω0\tau_{\widetilde{\Omega}}>0italic_τ start_POSTSUBSCRIPT over~ start_ARG roman_Ω end_ARG end_POSTSUBSCRIPT > 0. The input space ΩΩ\Omegaroman_Ω is defined as

Ω={xd:dd(x,Ω~)<ϵΩ},Ωconditional-set𝑥superscript𝑑subscript𝑑superscript𝑑𝑥~Ωsubscriptitalic-ϵΩ\displaystyle\Omega=\{x\in\mathbb{R}^{d}\colon d_{\mathbb{R}^{d}}(x,\widetilde% {\Omega})<\epsilon_{\Omega}\},roman_Ω = { italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT : italic_d start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_x , over~ start_ARG roman_Ω end_ARG ) < italic_ϵ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT } ,

where 0<ϵΩ<min{τΩ~,1}0subscriptitalic-ϵΩsubscript𝜏~Ω10<\epsilon_{\Omega}<\min\{\tau_{\widetilde{\Omega}},1\}0 < italic_ϵ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT < roman_min { italic_τ start_POSTSUBSCRIPT over~ start_ARG roman_Ω end_ARG end_POSTSUBSCRIPT , 1 }. In addition, we denote

Ωc={xd:dd(x,Ω~)ϵΩ}.subscriptΩ𝑐conditional-set𝑥superscript𝑑subscript𝑑superscript𝑑𝑥~Ωsubscriptitalic-ϵΩ\displaystyle\Omega_{c}=\{x\in\mathbb{R}^{d}\colon d_{\mathbb{R}^{d}}(x,% \widetilde{\Omega})\leq\epsilon_{\Omega}\}.roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT = { italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT : italic_d start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( italic_x , over~ start_ARG roman_Ω end_ARG ) ≤ italic_ϵ start_POSTSUBSCRIPT roman_Ω end_POSTSUBSCRIPT } .

The input spaces encompassed by the definition above are still very broad, and as argued by [4], these assumptions typically hold due to noise in real-world applications. In addition, in many PDE settings, the input space is also included in the definition above or close enough. We altered the definition somewhat from [4], as we will need the open and bounded set ΩΩ\Omegaroman_Ω, as well as the compact set ΩcsubscriptΩ𝑐\Omega_{c}roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT.

A.2 PDE Setting

Consider a family of second-order elliptic PDEs on dsuperscript𝑑\mathbb{R}^{d}blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT of the form

u=(Ψu)+ψu=f.𝑢Ψ𝑢𝜓𝑢𝑓\mathcal{L}u=-\nabla\cdot(\Psi\nabla u)+\psi u=f.caligraphic_L italic_u = - ∇ ⋅ ( roman_Ψ ∇ italic_u ) + italic_ψ italic_u = italic_f . (7)
Assumption A.1.

Ψ(x)=(Ψij(x))1i,jdΨ𝑥subscriptsubscriptΨ𝑖𝑗𝑥formulae-sequence1𝑖𝑗𝑑\Psi(x)=(\Psi_{ij}(x))_{1\leq i,j\leq d}roman_Ψ ( italic_x ) = ( roman_Ψ start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ( italic_x ) ) start_POSTSUBSCRIPT 1 ≤ italic_i , italic_j ≤ italic_d end_POSTSUBSCRIPT is a symmetric matrix with Ψ(x)amaxnormΨ𝑥subscript𝑎𝑚𝑎𝑥\|\Psi(x)\|\leq a_{max}\leq\infty∥ roman_Ψ ( italic_x ) ∥ ≤ italic_a start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT ≤ ∞ and uniformly elliptic, that is, for some amin>0subscript𝑎𝑚𝑖𝑛0a_{min}>0italic_a start_POSTSUBSCRIPT italic_m italic_i italic_n end_POSTSUBSCRIPT > 0, it satisfies

ξΨ(x)ξaminξ2, for allx,ξd.formulae-sequencesuperscript𝜉topΨ𝑥𝜉subscript𝑎𝑚𝑖𝑛superscriptnorm𝜉2 for all𝑥𝜉superscript𝑑\xi^{\top}\Psi(x)\xi\geq a_{min}\|\xi\|^{2},\quad\text{ for all}\quad x,\xi\in% \mathbb{R}^{d}.italic_ξ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_Ψ ( italic_x ) italic_ξ ≥ italic_a start_POSTSUBSCRIPT italic_m italic_i italic_n end_POSTSUBSCRIPT ∥ italic_ξ ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , for all italic_x , italic_ξ ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT .

In addition, assume that 0<ψminψ(x)ψmax<10subscript𝜓𝑚𝑖𝑛𝜓𝑥subscript𝜓𝑚𝑎𝑥10<\psi_{min}\leq\psi(x)\leq\psi_{max}<10 < italic_ψ start_POSTSUBSCRIPT italic_m italic_i italic_n end_POSTSUBSCRIPT ≤ italic_ψ ( italic_x ) ≤ italic_ψ start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT < 1 and fL2(d)𝑓subscript𝐿2superscript𝑑f\in L_{2}(\mathbb{R}^{d})italic_f ∈ italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ).

Under the above assumptions, the Lax-Milgram theorem implies that there exists a unique weak solution uH1(d)superscript𝑢subscript𝐻1superscript𝑑u^{*}\in H_{1}(\mathbb{R}^{d})italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ italic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ), such that u=fH(d)𝑢𝑓superscript𝐻superscript𝑑\mathcal{L}u=f\in H^{*}(\mathbb{R}^{d})caligraphic_L italic_u = italic_f ∈ italic_H start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ), with H(d)superscript𝐻superscript𝑑H^{*}(\mathbb{R}^{d})italic_H start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) the dual space of H1(d)subscript𝐻1superscript𝑑H_{1}(\mathbb{R}^{d})italic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) [8], i.e.,

dΨuvdx+dψuvdx=dfvdx, for allvH1(d).formulae-sequencesubscriptsuperscript𝑑Ψsuperscript𝑢𝑣d𝑥subscriptsuperscript𝑑𝜓superscript𝑢𝑣differential-d𝑥subscriptsuperscript𝑑𝑓𝑣differential-d𝑥 for all𝑣subscript𝐻1superscript𝑑\int_{\mathbb{R}^{d}}\Psi\nabla u^{*}\cdot\nabla v\mathrm{d}x+\int_{\mathbb{R}% ^{d}}\psi u^{*}v\mathrm{d}x=\int_{\mathbb{R}^{d}}fv\mathrm{d}x,\quad\text{ for% all}\quad v\in H_{1}(\mathbb{R}^{d}).∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_Ψ ∇ italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ⋅ ∇ italic_v roman_d italic_x + ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_ψ italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT italic_v roman_d italic_x = ∫ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_f italic_v roman_d italic_x , for all italic_v ∈ italic_H start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) .

Following the setup of [8], we consider the PDE whose coefficients and source term are in the Barron space.

A.3 Barron Spaces

The Barron space has been defined in distinct ways [2, 25], but all definitions can be seen as the space of infinitely wide two-layered neural networks. Given an activation function σ~~𝜎\widetilde{\sigma}over~ start_ARG italic_σ end_ARG (not to be confused with σ𝜎\sigmaitalic_σ which we set to tanh\tanhroman_tanh in the main paper), and a probability measure ρ𝜌\rhoitalic_ρ over the parameter space, the function

uρ(x)=aσ~(wx+b)ρ(da,dw,db),xdformulae-sequencesubscript𝑢𝜌𝑥𝑎~𝜎superscript𝑤top𝑥𝑏𝜌d𝑎d𝑤d𝑏𝑥superscript𝑑u_{\rho}(x)=\int a\widetilde{\sigma}(w^{\top}x+b)\rho(\mathrm{d}a,\mathrm{d}w,% \mathrm{d}b),\quad x\in\mathbb{R}^{d}italic_u start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT ( italic_x ) = ∫ italic_a over~ start_ARG italic_σ end_ARG ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) italic_ρ ( roman_d italic_a , roman_d italic_w , roman_d italic_b ) , italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT (8)

can be seen as the infinite width function of the neural network

k=1Kakσ~(wkx+bk),{ak,wk,bk}k=1Kρ.similar-tosuperscriptsubscript𝑘1𝐾subscript𝑎𝑘~𝜎superscriptsubscript𝑤𝑘top𝑥subscript𝑏𝑘superscriptsubscriptsubscript𝑎𝑘subscript𝑤𝑘subscript𝑏𝑘𝑘1𝐾𝜌\displaystyle\sum_{k=1}^{K}a_{k}\widetilde{\sigma}(w_{k}^{\top}x+b_{k}),\quad% \{a_{k},w_{k},b_{k}\}_{k=1}^{K}\sim\rho.∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT over~ start_ARG italic_σ end_ARG ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , { italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ∼ italic_ρ .

The Barron space is a generalization from one particular probability distribution to all function that have at least one distribution ρ𝜌\rhoitalic_ρ such that Equation 8 holds.

Definition 2.

For a domain ΩΩ\Omegaroman_Ω and R[0,]𝑅0R\in[0,\infty]italic_R ∈ [ 0 , ∞ ] and a function g:d:𝑔superscript𝑑g\colon\mathbb{R}^{d}\rightarrow\mathbb{R}italic_g : blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT → blackboard_R such that g=uρ𝑔subscript𝑢𝜌g=u_{\rho}italic_g = italic_u start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT with some probability measure ρ𝜌\rhoitalic_ρ, we define the Barron norm of g𝑔gitalic_g on ΩΩ\Omegaroman_Ω with index p[1,]𝑝1p\in[1,\infty]italic_p ∈ [ 1 , ∞ ] and support radius R𝑅Ritalic_R by

gRp(Ω)subscriptdelimited-∥∥𝑔subscriptsuperscript𝑝𝑅Ω\displaystyle\lVert g\rVert_{\mathcal{B}^{p}_{R}(\Omega)}∥ italic_g ∥ start_POSTSUBSCRIPT caligraphic_B start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT =infρ{(|a|pρ(da,dw,db))1/p:g(x)=aσ~(wx+b)ρ(da,dw,db)\displaystyle=\inf_{\rho}\Bigg{\{}\left(\int\lvert a\rvert^{p}\rho(\mathrm{d}a% ,\mathrm{d}w,\mathrm{d}b)\right)^{1/p}\colon g(x)=\int a\widetilde{\sigma}(w^{% \top}x+b)\rho(\mathrm{d}a,\mathrm{d}w,\mathrm{d}b)= roman_inf start_POSTSUBSCRIPT italic_ρ end_POSTSUBSCRIPT { ( ∫ | italic_a | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT italic_ρ ( roman_d italic_a , roman_d italic_w , roman_d italic_b ) ) start_POSTSUPERSCRIPT 1 / italic_p end_POSTSUPERSCRIPT : italic_g ( italic_x ) = ∫ italic_a over~ start_ARG italic_σ end_ARG ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) italic_ρ ( roman_d italic_a , roman_d italic_w , roman_d italic_b )
onΩwithsupp(ρ)×B¯Rd×},\displaystyle\quad\quad\text{on}\,\Omega\;\text{with}\;\mathrm{supp}(\rho)% \subseteq\mathbb{R}\times\overline{B}_{R}^{d}\times\mathbb{R}\Bigg{\}},on roman_Ω with roman_supp ( italic_ρ ) ⊆ blackboard_R × over¯ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × blackboard_R } ,

where B¯Rd:={xd:|x|R}assignsuperscriptsubscript¯𝐵𝑅𝑑conditional-set𝑥superscript𝑑𝑥𝑅\overline{B}_{R}^{d}:=\left\{x\in\mathbb{R}^{d}\colon\lvert x\rvert\leq R\right\}over¯ start_ARG italic_B end_ARG start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT := { italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT : | italic_x | ≤ italic_R }. The corresponding Barron space is then defined as

Rp={g:d:gRp(Ω)<}.superscriptsubscript𝑅𝑝conditional-set𝑔:superscript𝑑subscriptnorm𝑔subscriptsuperscript𝑝𝑅Ω\mathcal{B}_{R}^{p}=\left\{g:\mathbb{R}^{d}\to\mathbb{R}\colon\|g\|_{\mathcal{% B}^{p}_{R}(\Omega)}<\infty\right\}.caligraphic_B start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT = { italic_g : blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT → blackboard_R : ∥ italic_g ∥ start_POSTSUBSCRIPT caligraphic_B start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT < ∞ } . (9)

As already discussed in [8], this definition of Barron space adapts a similar definition in [25] with several important modifications for the purpose of PDE analysis. Namely, it is required that the w𝑤witalic_w-marginal of the probability measure ρ𝜌\rhoitalic_ρ has compact support in order to control the derivatives of a Barron function, see [8] for further explanation. In addition, the Barron norm defined here only considers the norm of the parameter a𝑎aitalic_a while [25] takes into consideration w𝑤witalic_w and b𝑏bitalic_b as well. This is because [25] uses unbounded activation functions (such as ReLU), which requires the moment condition in all parameters to make the integral in (8) well-defined. This is not necessary for us, as we will limit σ~~𝜎\widetilde{\sigma}over~ start_ARG italic_σ end_ARG to be bounded, and w𝑤witalic_w is already bounded by R𝑅Ritalic_R.

A.4 Existence result

Using the Barron space defined above, we can impose our second assumption on the type of PDE, following the ideas in [8].

Assumption A.2.

For some constants RΨ,Rψ,Rf(0,)subscript𝑅Ψsubscript𝑅𝜓subscript𝑅𝑓0R_{\Psi},R_{\psi},R_{f}\in(0,\infty)italic_R start_POSTSUBSCRIPT roman_Ψ end_POSTSUBSCRIPT , italic_R start_POSTSUBSCRIPT italic_ψ end_POSTSUBSCRIPT , italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ∈ ( 0 , ∞ ), the following bounds holds on the coefficients and the source term of our second-order elliptic PDE:

max1<i,j<dΨi,jRΨ1(d)<,ψRψ1(d)<,fRf1(d)<.\displaystyle\max_{1<i,j<d}\lVert\Psi_{i,j}\rVert_{\mathcal{B}^{1}_{R_{\Psi}}(% \mathbb{R}^{d})}<\infty,\quad\lVert\psi\rVert_{\mathcal{B}^{1}_{R_{\psi}}(% \mathbb{R}^{d})}<\infty,\quad\lVert f\rVert_{\mathcal{B}^{1}_{R_{f}}(\mathbb{R% }^{d})}<\infty.roman_max start_POSTSUBSCRIPT 1 < italic_i , italic_j < italic_d end_POSTSUBSCRIPT ∥ roman_Ψ start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT caligraphic_B start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT roman_Ψ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) end_POSTSUBSCRIPT < ∞ , ∥ italic_ψ ∥ start_POSTSUBSCRIPT caligraphic_B start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_ψ end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) end_POSTSUBSCRIPT < ∞ , ∥ italic_f ∥ start_POSTSUBSCRIPT caligraphic_B start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_R start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ) end_POSTSUBSCRIPT < ∞ .

The assumptions still allow for a broad range of PDE. The two assumptions, Assumption A.1 and Assumption A.2, on the constants holds as long as Im(σ~)=0𝐼𝑚~𝜎0Im(\widetilde{\sigma})=0italic_I italic_m ( over~ start_ARG italic_σ end_ARG ) = 0. Possible source terms can be found in Proposition A.1 in [8].

We are now ready to prove the existence result in the main paper.

Theorem A.1.

Consider a second-order elliptic PDE of the form (5)

u=(Ψu)+ψu=f𝑢Ψ𝑢𝜓𝑢𝑓\mathcal{L}u=-\nabla\cdot(\Psi\nabla u)+\psi u=fcaligraphic_L italic_u = - ∇ ⋅ ( roman_Ψ ∇ italic_u ) + italic_ψ italic_u = italic_f (10)

with the coefficients Ψ(x),ψ(x)Ψ𝑥𝜓𝑥\Psi(x),\psi(x)roman_Ψ ( italic_x ) , italic_ψ ( italic_x ) and the source term f(x)𝑓𝑥f(x)italic_f ( italic_x ) satisfying Assumption A.1 and Assumption A.2, with weak solution uH1(d)superscript𝑢superscript𝐻1superscript𝑑u^{*}\in H^{1}(\mathbb{R}^{d})italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∈ italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT ). Let ΩdΩsuperscript𝑑\Omega\subset\mathbb{R}^{d}roman_Ω ⊂ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT be any open, bounded subset following Definition 1 in Appendix A and μ𝜇\muitalic_μ being the Lebesgue measure. Then, for any ϵ(0,12)italic-ϵ012\epsilon\in(0,\frac{1}{2})italic_ϵ ∈ ( 0 , divide start_ARG 1 end_ARG start_ARG 2 end_ARG ), there exists one hidden layered neural network with tanh\tanhroman_tanh activation function of the form (3), constructed either by ELM or SWIM, such that

u^uL2(Ω,μ)ϵ.subscriptdelimited-∥∥^𝑢superscript𝑢subscript𝐿2Ω𝜇italic-ϵ\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}\leq\epsilon.∥ over^ start_ARG italic_u end_ARG - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ italic_ϵ .
Proof.

Consider the weak solution usuperscript𝑢u^{*}italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPTof the PDE (10). According to the Theorem 2.10 from [8], for any open ΩdΩsuperscript𝑑\Omega\subset\mathbb{R}^{d}roman_Ω ⊂ blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT and any ϵ(0,12)italic-ϵ012\epsilon\in(0,\frac{1}{2})italic_ϵ ∈ ( 0 , divide start_ARG 1 end_ARG start_ARG 2 end_ARG ), there exists a two-layer neural network ukcossubscriptsuperscript𝑢𝑘u^{\cos}_{k}italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with σ~=cos~𝜎cos\widetilde{\sigma}=\text{cos}over~ start_ARG italic_σ end_ARG = cos of the form

ukcos(x)=1kk=1Kakcos(wkx+bk),xΩ,formulae-sequencesubscriptsuperscript𝑢𝑘𝑥1𝑘superscriptsubscript𝑘1𝐾subscript𝑎𝑘superscriptsubscript𝑤𝑘top𝑥subscript𝑏𝑘𝑥Ωu^{\cos}_{k}(x)=\frac{1}{k}\sum_{k=1}^{K}a_{k}\cos(w_{k}^{\top}x+b_{k}),\quad x% \in\Omega,italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) = divide start_ARG 1 end_ARG start_ARG italic_k end_ARG ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT roman_cos ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , italic_x ∈ roman_Ω , (11)

with parameters (ak;wk;bk)×d×,k=1,2,,Kformulae-sequencesubscript𝑎𝑘subscript𝑤𝑘subscript𝑏𝑘superscript𝑑𝑘12𝐾(a_{k};w_{k};b_{k})\in\mathbb{R}\times\mathbb{R}^{d}\times\mathbb{R},k=1,2,% \ldots,K( italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ; italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ; italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∈ blackboard_R × blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT × blackboard_R , italic_k = 1 , 2 , … , italic_K and Kγμ(Ω)dβln(ϵ/2)𝐾𝛾𝜇Ωsuperscript𝑑𝛽italic-ϵ2K\leq\gamma\mu(\Omega)d^{\beta\ln{(\epsilon/2)}}italic_K ≤ italic_γ italic_μ ( roman_Ω ) italic_d start_POSTSUPERSCRIPT italic_β roman_ln ( italic_ϵ / 2 ) end_POSTSUPERSCRIPT with some absolute constants γ,β>0𝛾𝛽0\gamma,\beta>0italic_γ , italic_β > 0 (see [8] for more details) and μ(Ω)𝜇Ω\mu(\Omega)italic_μ ( roman_Ω ) the Lebesgue measure of ΩΩ\Omegaroman_Ω, such that

uukcosH1(Ω,μ)ϵ2.subscriptdelimited-∥∥superscript𝑢subscriptsuperscript𝑢𝑘superscript𝐻1Ω𝜇italic-ϵ2\lVert u^{*}-u^{\cos}_{k}\rVert_{H^{1}(\Omega,\mu)}\leq\frac{\epsilon}{2}.∥ italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ divide start_ARG italic_ϵ end_ARG start_ARG 2 end_ARG .

As for any function g𝑔gitalic_g we have gL2(Ω,μ)gH1(Ω,μ)subscriptdelimited-∥∥𝑔superscript𝐿2Ω𝜇subscriptdelimited-∥∥𝑔superscript𝐻1Ω𝜇\lVert g\rVert_{L^{2}(\Omega,\mu)}\leq\lVert g\rVert_{H^{1}(\Omega,\mu)}∥ italic_g ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ ∥ italic_g ∥ start_POSTSUBSCRIPT italic_H start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT, it holds that

uukcosL2(Ω,μ)ϵ2.subscriptdelimited-∥∥superscript𝑢subscriptsuperscript𝑢𝑘superscript𝐿2Ω𝜇italic-ϵ2\lVert u^{*}-u^{\cos}_{k}\rVert_{L^{2}(\Omega,\mu)}\leq\frac{\epsilon}{2}.∥ italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT - italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ divide start_ARG italic_ϵ end_ARG start_ARG 2 end_ARG . (12)

As ukcossubscriptsuperscript𝑢𝑘u^{\cos}_{k}italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is a continuous function on ΩcsubscriptΩ𝑐\Omega_{c}roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT, we know that for any ϵ2(0,1)subscriptitalic-ϵ201\epsilon_{2}\in(0,1)italic_ϵ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∈ ( 0 , 1 ) there exist neural networks u^^𝑢\hat{u}over^ start_ARG italic_u end_ARG with one hidden layer of arbitrary width and tanh\tanhroman_tanh-activation function such that

ukcosu^L(Ωc)ϵ2.subscriptnormsubscriptsuperscript𝑢𝑘^𝑢superscript𝐿subscriptΩ𝑐subscriptitalic-ϵ2\|u^{\cos}_{k}-\hat{u}\|_{L^{\infty}(\Omega_{c})}\leq\epsilon_{2}.∥ italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - over^ start_ARG italic_u end_ARG ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ≤ italic_ϵ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT .

This holds both for the ELM setting [14, 53], but also the SWIM setting [4]. From the Hölder inequality it follows that

ukcosμ^L2(Ωc,μ)μ(Ωc)1/2ukcosu^L(Ωc),subscriptnormsubscriptsuperscript𝑢𝑘^𝜇superscript𝐿2subscriptΩ𝑐𝜇𝜇superscriptsubscriptΩ𝑐12subscriptnormsubscriptsuperscript𝑢𝑘^𝑢superscript𝐿subscriptΩ𝑐\displaystyle\|u^{\cos}_{k}-\hat{\mu}\|_{L^{2}(\Omega_{c},\mu)}\leq\mu(\Omega_% {c})^{1/2}\|u^{\cos}_{k}-\hat{u}\|_{L^{\infty}(\Omega_{c})},∥ italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - over^ start_ARG italic_μ end_ARG ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT , italic_μ ) end_POSTSUBSCRIPT ≤ italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT ∥ italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - over^ start_ARG italic_u end_ARG ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT ,

and we have

ukcosu^L2(Ω,μ)μ(Ωc)1/2ϵ2.subscriptnormsubscriptsuperscript𝑢𝑘^𝑢superscript𝐿2Ω𝜇𝜇superscriptsubscriptΩ𝑐12subscriptitalic-ϵ2\|u^{\cos}_{k}-\hat{u}\|_{L^{2}(\Omega,\mu)}\leq\mu(\Omega_{c})^{1/2}\epsilon_% {2}.∥ italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - over^ start_ARG italic_u end_ARG ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT italic_ϵ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT .

Taking ϵ2=μ(Ωc)1/2ϵ2subscriptitalic-ϵ2𝜇superscriptsubscriptΩ𝑐12italic-ϵ2\epsilon_{2}=\mu(\Omega_{c})^{1/2}\frac{\epsilon}{2}italic_ϵ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT divide start_ARG italic_ϵ end_ARG start_ARG 2 end_ARG, we get

ukcosu^L(Ω,μ)ϵ2.subscriptnormsubscriptsuperscript𝑢𝑘^𝑢superscript𝐿Ω𝜇italic-ϵ2\|u^{\cos}_{k}-\hat{u}\|_{L^{\infty}(\Omega,\mu)}\leq\frac{\epsilon}{2}.∥ italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - over^ start_ARG italic_u end_ARG ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ divide start_ARG italic_ϵ end_ARG start_ARG 2 end_ARG . (13)

Now, allowing for (12) and (13), we see that the weak PDE solution usuperscript𝑢u^{*}italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT can be well approximated by the tanh-network in terms of the L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-norm due to

u^uL2(Ω)u^ukcosL2(Ω,μ)+ukcosuL2(Ω,μ)ϵ2+ϵ2=ϵ,subscriptnorm^𝑢superscript𝑢superscript𝐿2Ωsubscriptnorm^𝑢subscriptsuperscript𝑢𝑘superscript𝐿2Ω𝜇subscriptnormsubscriptsuperscript𝑢𝑘superscript𝑢superscript𝐿2Ω𝜇italic-ϵ2italic-ϵ2italic-ϵ\|\hat{u}-u^{*}\|_{L^{2}(\Omega)}\leq\|\hat{u}-u^{\cos}_{k}\|_{L^{2}(\Omega,% \mu)}+\|u^{\cos}_{k}-u^{*}\|_{L^{2}(\Omega,\mu)}\leq\frac{\epsilon}{2}+\frac{% \epsilon}{2}=\epsilon,∥ over^ start_ARG italic_u end_ARG - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω ) end_POSTSUBSCRIPT ≤ ∥ over^ start_ARG italic_u end_ARG - italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT + ∥ italic_u start_POSTSUPERSCRIPT roman_cos end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ divide start_ARG italic_ϵ end_ARG start_ARG 2 end_ARG + divide start_ARG italic_ϵ end_ARG start_ARG 2 end_ARG = italic_ϵ ,

which completes the proof. ∎

A.5 Convergence for sampled networks

We now consider the case where we sample the hidden layer, and determine if the outer weights exist such that the network is ϵitalic-ϵ\epsilonitalic_ϵ-close to the weak solution of the elliptic PDE. First, we define the kernel 𝒦𝒦\mathcal{K}caligraphic_K

𝒦(x,x)=𝒲σ(wx+b)σ(wx+b)ρ~(dw,db),𝒦𝑥superscript𝑥subscript𝒲𝜎superscript𝑤top𝑥𝑏𝜎superscript𝑤topsuperscript𝑥𝑏~𝜌d𝑤d𝑏\displaystyle\mathcal{K}(x,x^{\prime})=\int_{\mathcal{W}}\sigma(w^{\top}x+b)% \sigma(w^{\top}x^{\prime}+b)\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b),caligraphic_K ( italic_x , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT + italic_b ) over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b ) , (14)

where ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG is the distribution from which we sample the weights and biases of the hidden layer, and 𝒲𝒲\mathcal{W}caligraphic_W is the parameter space. As σ𝜎\sigmaitalic_σ equals tanh, 𝒦𝒦\mathcal{K}caligraphic_K exists and is well-defined, due to boundedness of σ𝜎\sigmaitalic_σ. The accompanying RKHS is then defined as

2={𝒲α(w,b)σ(w+b)ρ~(dw,db):𝒲|α(w,b)|2ρ~(dw,db)<}.\displaystyle\mathcal{H}_{2}=\left\{\int_{\mathcal{W}}\alpha(w,b)\sigma(w^{% \top}\cdot+b)\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)\colon\int_{\mathcal{W}}% \lvert\alpha(w,b)\rvert^{2}\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)<\infty% \right\}.caligraphic_H start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = { ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT italic_α ( italic_w , italic_b ) italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ⋅ + italic_b ) over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b ) : ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT | italic_α ( italic_w , italic_b ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b ) < ∞ } .

Before we can show the main result, we need a simple lemma stating that any finite neural network can be approximated by a function in 2subscript2\mathcal{H}_{2}caligraphic_H start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.

Lemma A.1.

Let 𝒦𝒦\mathcal{K}caligraphic_K in Equation 14 be well-defined. For every neural network u^(x)=k=1Kakσ(wkx+bk)^𝑢𝑥superscriptsubscript𝑘1𝐾subscript𝑎𝑘𝜎superscriptsubscript𝑤𝑘top𝑥subscript𝑏𝑘\hat{u}(x)=\sum_{k=1}^{K}a_{k}\sigma(w_{k}^{\top}x+b_{k})over^ start_ARG italic_u end_ARG ( italic_x ) = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_σ ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) and ϵ(0,1)italic-ϵ01\epsilon\in(0,1)italic_ϵ ∈ ( 0 , 1 ), if for all k=1,2,,K𝑘12𝐾k=1,2,\dots,Kitalic_k = 1 , 2 , … , italic_K, there exist δk>0subscript𝛿𝑘0\delta_{k}>0italic_δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT > 0 such that 0<ρ~(B¯Rd+1(wk,bk))<0~𝜌subscriptsuperscript¯𝐵𝑑1𝑅subscript𝑤𝑘subscript𝑏𝑘0<\widetilde{\rho}(\overline{B}^{d+1}_{R}(w_{k},b_{k}))<\infty0 < over~ start_ARG italic_ρ end_ARG ( over¯ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT italic_d + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_R end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) < ∞ for all R(0,δk]𝑅0subscript𝛿𝑘R\in(0,\delta_{k}]italic_R ∈ ( 0 , italic_δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ], then there is a function h2subscript2h\in\mathcal{H}_{2}italic_h ∈ caligraphic_H start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT such that

u^hL2(Ω,μ)<ϵ,subscriptdelimited-∥∥^𝑢superscript𝐿2Ω𝜇italic-ϵ\displaystyle\lVert\hat{u}-h\rVert_{L^{2}(\Omega,\mu)}<\epsilon,∥ over^ start_ARG italic_u end_ARG - italic_h ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT < italic_ϵ ,

where μ𝜇\muitalic_μ is the Lebesgue measure.

Proof.

By continuity of σ𝜎\sigmaitalic_σ and on a compact subset ΩcsubscriptΩ𝑐\Omega_{c}roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT, we have the following: for every ϵ^=ϵmax{|ak|,1}k=1KKμ(Ω)>0\hat{\epsilon}=\frac{\sqrt{\epsilon}}{\max\{\lvert a_{k}\rvert,1\}_{k=1}^{K}% \cdot K\cdot\mu(\Omega)}>0over^ start_ARG italic_ϵ end_ARG = divide start_ARG square-root start_ARG italic_ϵ end_ARG end_ARG start_ARG roman_max { | italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT | , 1 } start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ⋅ italic_K ⋅ italic_μ ( roman_Ω ) end_ARG > 0, there exists a δ𝛿\deltaitalic_δ such that for all k=1,2,,K𝑘12𝐾k=1,2,\dots,Kitalic_k = 1 , 2 , … , italic_K, if [w~k,b~k]B¯min{δk,δ}d+1(wk,bk)subscript~𝑤𝑘subscript~𝑏𝑘subscriptsuperscript¯𝐵𝑑1subscript𝛿𝑘𝛿subscript𝑤𝑘subscript𝑏𝑘[\tilde{w}_{k},\tilde{b}_{k}]\in\overline{B}^{d+1}_{\min\{\delta_{k},\delta\}}% (w_{k},b_{k})[ over~ start_ARG italic_w end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ∈ over¯ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT italic_d + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_min { italic_δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_δ } end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), then

σ(w~k+b~k)σ(wk+bk)L(Ωc)2<ϵ^.\displaystyle\lVert\sigma(\tilde{w}_{k}^{\top}\cdot+\tilde{b}_{k})-\sigma(w_{k% }^{\top}\cdot+b_{k})\rVert^{2}_{L^{\infty}(\Omega_{c})}<\hat{\epsilon}.∥ italic_σ ( over~ start_ARG italic_w end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ⋅ + over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) - italic_σ ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ⋅ + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_POSTSUBSCRIPT < over^ start_ARG italic_ϵ end_ARG .

Let Bk=B¯min{δk,δ}d+1(wk,bk)subscript𝐵𝑘subscriptsuperscript¯𝐵𝑑1subscript𝛿𝑘𝛿subscript𝑤𝑘subscript𝑏𝑘B_{k}=\overline{B}^{d+1}_{\min\{\delta_{k},\delta\}}(w_{k},b_{k})italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = over¯ start_ARG italic_B end_ARG start_POSTSUPERSCRIPT italic_d + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_min { italic_δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_δ } end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ). We then set

α(w,b)={akρ~(Bk),if [w,b]Bk0,otherwise.𝛼𝑤𝑏casessubscript𝑎𝑘~𝜌subscript𝐵𝑘if 𝑤𝑏subscript𝐵𝑘0otherwise.\displaystyle\alpha(w,b)=\begin{cases}\frac{a_{k}}{\widetilde{\rho}(B_{k})},&% \text{if }[w,b]\in B_{k}\\ 0,&\text{otherwise.}\end{cases}italic_α ( italic_w , italic_b ) = { start_ROW start_CELL divide start_ARG italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG over~ start_ARG italic_ρ end_ARG ( italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG , end_CELL start_CELL if [ italic_w , italic_b ] ∈ italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 , end_CELL start_CELL otherwise. end_CELL end_ROW

Constructing hhitalic_h with this α𝛼\alphaitalic_α, we have

h(x)𝑥\displaystyle h(x)italic_h ( italic_x ) =𝒲α(w,b)σ(wx+b)ρ~(dw,db)absentsubscript𝒲𝛼𝑤𝑏𝜎superscript𝑤top𝑥𝑏~𝜌d𝑤d𝑏\displaystyle=\int_{\mathcal{W}}\alpha(w,b)\sigma(w^{\top}x+b)\widetilde{\rho}% (\mathrm{d}w,\mathrm{d}b)= ∫ start_POSTSUBSCRIPT caligraphic_W end_POSTSUBSCRIPT italic_α ( italic_w , italic_b ) italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b )
=k=1KBkakρ~(Bk)σ(wx+b)ρ~(dw,db)absentsuperscriptsubscript𝑘1𝐾subscriptsubscript𝐵𝑘subscript𝑎𝑘~𝜌subscript𝐵𝑘𝜎superscript𝑤top𝑥𝑏~𝜌d𝑤d𝑏\displaystyle=\sum_{k=1}^{K}\int_{B_{k}}\frac{a_{k}}{\widetilde{\rho}(B_{k})}% \sigma(w^{\top}x+b)\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT ∫ start_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT divide start_ARG italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG over~ start_ARG italic_ρ end_ARG ( italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b )
=k=1Kakσ(wkx+bk)+akρ~(Bk)Bk(σ(wx+b)σ(wkx+bk))ρ~(dw,db)absentsuperscriptsubscript𝑘1𝐾subscript𝑎𝑘𝜎superscriptsubscript𝑤𝑘top𝑥subscript𝑏𝑘subscript𝑎𝑘~𝜌subscript𝐵𝑘subscriptsubscript𝐵𝑘𝜎superscript𝑤top𝑥𝑏𝜎superscriptsubscript𝑤𝑘top𝑥subscript𝑏𝑘~𝜌d𝑤d𝑏\displaystyle=\sum_{k=1}^{K}a_{k}\sigma(w_{k}^{\top}x+b_{k})+\frac{a_{k}}{% \widetilde{\rho}(B_{k})}\int_{B_{k}}(\sigma(w^{\top}x+b)-\sigma(w_{k}^{\top}x+% b_{k}))\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_σ ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) + divide start_ARG italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG over~ start_ARG italic_ρ end_ARG ( italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG ∫ start_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) - italic_σ ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b )
=u^+k=1Kakρ~(Bk)Bk(σ(wx+b)σ(wkx+bk))ρ~(dw,db).absent^𝑢superscriptsubscript𝑘1𝐾subscript𝑎𝑘~𝜌subscript𝐵𝑘subscriptsubscript𝐵𝑘𝜎superscript𝑤top𝑥𝑏𝜎superscriptsubscript𝑤𝑘top𝑥subscript𝑏𝑘~𝜌d𝑤d𝑏\displaystyle=\hat{u}+\sum_{k=1}^{K}\frac{a_{k}}{\widetilde{\rho}(B_{k})}\int_% {B_{k}}(\sigma(w^{\top}x+b)-\sigma(w_{k}^{\top}x+b_{k}))\widetilde{\rho}(% \mathrm{d}w,\mathrm{d}b).= over^ start_ARG italic_u end_ARG + ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT divide start_ARG italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG over~ start_ARG italic_ρ end_ARG ( italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG ∫ start_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) - italic_σ ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ) over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b ) .

We then have

u^hL2(Ω,μ)2superscriptsubscriptdelimited-∥∥^𝑢superscript𝐿2Ω𝜇2\displaystyle\lVert\hat{u}-h\rVert_{L^{2}(\Omega,\mu)}^{2}∥ over^ start_ARG italic_u end_ARG - italic_h ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT =k=1Kakρ~(Bk)Bk(σ(wk+bk)σ(w+b))ρ~(dw,db)L2(Ω,μ)2\displaystyle=\left\lVert\sum_{k=1}^{K}\frac{a_{k}}{\widetilde{\rho}(B_{k})}% \int_{B_{k}}(\sigma(w_{k}^{\top}\cdot+b_{k})-\sigma(w^{\top}\cdot+b))% \widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)\right\rVert_{L^{2}(\Omega,\mu)}^{2}= ∥ ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT divide start_ARG italic_a start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG over~ start_ARG italic_ρ end_ARG ( italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) end_ARG ∫ start_POSTSUBSCRIPT italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_σ ( italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ⋅ + italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) - italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ⋅ + italic_b ) ) over~ start_ARG italic_ρ end_ARG ( roman_d italic_w , roman_d italic_b ) ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
<k=1KϵKμ(Ω)L2(Ω,μ)2=ϵ.absentsuperscriptsubscriptdelimited-∥∥superscriptsubscript𝑘1𝐾italic-ϵ𝐾𝜇Ωsuperscript𝐿2Ω𝜇2italic-ϵ\displaystyle<\left\lVert\sum_{k=1}^{K}\frac{\sqrt{\epsilon}}{K\cdot\mu(\Omega% )}\right\rVert_{L^{2}(\Omega,\mu)}^{2}=\epsilon.< ∥ ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT divide start_ARG square-root start_ARG italic_ϵ end_ARG end_ARG start_ARG italic_K ⋅ italic_μ ( roman_Ω ) end_ARG ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_ϵ .

Remark 1.

The condition on ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG and the balls surrounding the weights of the parameters of u^^𝑢\hat{u}over^ start_ARG italic_u end_ARG holds in both the case where u^^𝑢\hat{u}over^ start_ARG italic_u end_ARG is constructed by ELM and the case where it is constructed by SWIM. For the ELM case, as long as we set η𝜂\etaitalic_η sufficiently large enough which we will assume, this holds true. In SWIM, any network used to prove convergence of Theorem 1 in [4] relies on weights and biases that corresponds to points in the interior of the input space. This implies we can find a δksubscript𝛿𝑘\delta_{k}italic_δ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT small enough such that the support of ϕ~~italic-ϕ\widetilde{\phi}over~ start_ARG italic_ϕ end_ARG includes the balls Bksubscript𝐵𝑘B_{k}italic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT in the proof above. The finite nonzero requirement holds due to ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG is either the Gaussian/uniform when sampling with ELM, and uniform or the finite-based distribution in [4] when sampling with SWIM.

We are now ready to prove the main result of convergence when sampling the weights. Letting ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG be one of the three distributions discussed in the previous remark, such that Lemma A.1 holds.

Theorem A.2.

With the same PDE and input setup, and assumptions as in Theorem A.1, let the sampling method be either ELM or SWIM with distribution ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG, and let δ(0,1]𝛿01\delta\in(0,1]italic_δ ∈ ( 0 , 1 ]. Then for any ϵ(0,12)italic-ϵ012\epsilon\in(0,\frac{1}{2})italic_ϵ ∈ ( 0 , divide start_ARG 1 end_ARG start_ARG 2 end_ARG ), there exist a K𝐾K\in\mathbb{N}italic_K ∈ blackboard_N, so that by sampling K𝐾Kitalic_K pairs of weights WK×d𝑊superscript𝐾𝑑W\in\mathbb{R}^{K\times d}italic_W ∈ blackboard_R start_POSTSUPERSCRIPT italic_K × italic_d end_POSTSUPERSCRIPT and biases bK𝑏superscript𝐾b\in\mathbb{R}^{K}italic_b ∈ blackboard_R start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT i.i.d. from ρ~~𝜌\widetilde{\rho}over~ start_ARG italic_ρ end_ARG, there exists outer weights cK𝑐superscript𝐾c\in\mathbb{R}^{K}italic_c ∈ blackboard_R start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT, such that the bound

u^uL2(Ω,μ)=cσ(W+b)uL2(Ω,μ)ϵ.\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}=\lVert c^{\top}\sigma(W\cdot+b)-% u^{*}\rVert_{L^{2}{(\Omega,\mu)}}\leq\epsilon.∥ over^ start_ARG italic_u end_ARG - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT = ∥ italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_σ ( italic_W ⋅ + italic_b ) - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT ≤ italic_ϵ .

holds with probability 1δ1𝛿1-\delta1 - italic_δ.

Proof.

We start by constructing a neural network with tanh-activation function u^1subscript^𝑢1\hat{u}_{1}over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, such that

u^1uL2(Ω,μ)2ϵ3,superscriptsubscriptdelimited-∥∥subscript^𝑢1superscript𝑢superscript𝐿2Ω𝜇2italic-ϵ3\displaystyle\lVert\hat{u}_{1}-u^{*}\rVert_{L^{2}(\Omega,\mu)}^{2}\leq\frac{% \epsilon}{3},∥ over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ divide start_ARG italic_ϵ end_ARG start_ARG 3 end_ARG ,

which holds by Theorem A.1. Secondly, we choose a h2subscript2h\in\mathcal{H}_{2}italic_h ∈ caligraphic_H start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT such that

hu^1L2(Ω,μ)2ϵ3,superscriptsubscriptdelimited-∥∥subscript^𝑢1superscript𝐿2Ω𝜇2italic-ϵ3\displaystyle\lVert h-\hat{u}_{1}\rVert_{L^{2}(\Omega,\mu)}^{2}\leq\frac{% \epsilon}{3},∥ italic_h - over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ divide start_ARG italic_ϵ end_ARG start_ARG 3 end_ARG ,

which we can by Lemma A.1.

We then construct a probability measure μ~=μμ(Ωc)~𝜇𝜇𝜇subscriptΩ𝑐\widetilde{\mu}=\frac{\mu}{\mu(\Omega_{c})}over~ start_ARG italic_μ end_ARG = divide start_ARG italic_μ end_ARG start_ARG italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) end_ARG. Now we can observe a few things. Firstly, |σ(wx+b)|1𝜎superscript𝑤top𝑥𝑏1\lvert\sigma(w^{\top}x+b)\rvert\leq 1| italic_σ ( italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_x + italic_b ) | ≤ 1 a.s. Secondly, |h(x)|l𝑥𝑙\lvert h(x)\rvert\leq l| italic_h ( italic_x ) | ≤ italic_l a.s. w.r.t. μ~~𝜇\widetilde{\mu}over~ start_ARG italic_μ end_ARG, where l>0𝑙0l>0italic_l > 0. This is due to the compactness of ΩcsubscriptΩ𝑐\Omega_{c}roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT and hhitalic_h being continuous.

From this, we can deduce that Theorem 1 in [60] holds, with the distribution over the data π𝜋\piitalic_π being such that the marginal for xΩc𝑥subscriptΩ𝑐x\in\Omega_{c}italic_x ∈ roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT is μ~~𝜇\widetilde{\mu}over~ start_ARG italic_μ end_ARG, and the distribution of y𝑦yitalic_y conditioned on x𝑥xitalic_x is the Dirac delta at h(x)𝑥h(x)italic_h ( italic_x ). This means there exist a N0subscript𝑁0N_{0}italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, and we set

N=max{N0,zδ2},zδ=e1log218δϵ/3μ(Ωc)2,formulae-sequence𝑁subscript𝑁0superscriptsubscript𝑧𝛿2subscript𝑧𝛿subscript𝑒1superscript218𝛿italic-ϵ3𝜇superscriptsubscriptΩ𝑐2\displaystyle N=\max\{N_{0},z_{\delta}^{2}\},\quad z_{\delta}=\frac{e_{1}\log^% {2}\frac{18}{\delta}}{\epsilon/3\cdot\mu(\Omega_{c})^{2}},italic_N = roman_max { italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT } , italic_z start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT = divide start_ARG italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT roman_log start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT divide start_ARG 18 end_ARG start_ARG italic_δ end_ARG end_ARG start_ARG italic_ϵ / 3 ⋅ italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ,

where e1>0subscript𝑒10e_{1}>0italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > 0 is a constant. We then sample i.i.d. {xn,f(xn)}n=1Nπsimilar-tosuperscriptsubscriptsubscript𝑥𝑛𝑓subscript𝑥𝑛𝑛1𝑁𝜋\{x_{n},f(x_{n})\}_{n=1}^{N}\sim\pi{ italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_f ( italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∼ italic_π, set λ=N1/2𝜆superscript𝑁12\lambda=N^{-1/2}italic_λ = italic_N start_POSTSUPERSCRIPT - 1 / 2 end_POSTSUPERSCRIPT, and sample

K=e0max{N0,zδ}log108max{N0,zδ}δ,𝐾subscript𝑒0subscript𝑁0subscript𝑧𝛿108subscript𝑁0subscript𝑧𝛿𝛿\displaystyle K=e_{0}\max\{\sqrt{N_{0}},z_{\delta}\}\log\frac{108\max\{\sqrt{N% _{0}},z_{\delta}\}}{\delta},italic_K = italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT roman_max { square-root start_ARG italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG , italic_z start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT } roman_log divide start_ARG 108 roman_max { square-root start_ARG italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG , italic_z start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT } end_ARG start_ARG italic_δ end_ARG ,

weights and biases, where e0>0subscript𝑒00e_{0}>0italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT > 0 is a constant. Let W𝑊Witalic_W be the resulting sampled weight matrix, and b𝑏bitalic_b be the resulting bias vector. By setting c𝑐citalic_c in similar fashion to Equation 7 in [60], and u^=cσ(W+b)\hat{u}=c^{\top}\sigma(W\cdot+b)over^ start_ARG italic_u end_ARG = italic_c start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT italic_σ ( italic_W ⋅ + italic_b ), we have that with probability 1δ1𝛿1-\delta1 - italic_δ,

u^hL2(Ωc,μ~)2e1log218δzδ=ϵ3μ(Ωc)2.superscriptsubscriptdelimited-∥∥^𝑢superscript𝐿2subscriptΩ𝑐~𝜇2subscript𝑒1superscript218𝛿subscript𝑧𝛿italic-ϵ3𝜇superscriptsubscriptΩ𝑐2\displaystyle\lVert\hat{u}-h\rVert_{L^{2}(\Omega_{c},\widetilde{\mu})}^{2}\leq% \frac{e_{1}\log^{2}\frac{18}{\delta}}{z_{\delta}}=\frac{\epsilon}{3\cdot\mu(% \Omega_{c})^{2}}.∥ over^ start_ARG italic_u end_ARG - italic_h ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT , over~ start_ARG italic_μ end_ARG ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ≤ divide start_ARG italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT roman_log start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT divide start_ARG 18 end_ARG start_ARG italic_δ end_ARG end_ARG start_ARG italic_z start_POSTSUBSCRIPT italic_δ end_POSTSUBSCRIPT end_ARG = divide start_ARG italic_ϵ end_ARG start_ARG 3 ⋅ italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG .

Combining the results above, with probability 1δ1𝛿1-\delta1 - italic_δ, the following bound holds,

u^uL2(Ω,μ)2superscriptsubscriptdelimited-∥∥^𝑢superscript𝑢superscript𝐿2Ω𝜇2\displaystyle\lVert\hat{u}-u^{*}\rVert_{L^{2}(\Omega,\mu)}^{2}∥ over^ start_ARG italic_u end_ARG - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT μ(Ωc)2μ(Ωc)2u^hL2(Ω,μ)2+hu^1L2(Ω,μ)2+u^1uL2(Ω,μ)2absent𝜇superscriptsubscriptΩ𝑐2𝜇superscriptsubscriptΩ𝑐2superscriptsubscriptdelimited-∥∥^𝑢superscript𝐿2Ω𝜇2superscriptsubscriptdelimited-∥∥subscript^𝑢1superscript𝐿2Ω𝜇2superscriptsubscriptdelimited-∥∥subscript^𝑢1superscript𝑢superscript𝐿2Ω𝜇2\displaystyle\leq\frac{\mu(\Omega_{c})^{2}}{\mu(\Omega_{c})^{2}}\lVert\hat{u}-% h\rVert_{L^{2}(\Omega,\mu)}^{2}+\lVert h-\hat{u}_{1}\rVert_{L^{2}(\Omega,\mu)}% ^{2}+\lVert\hat{u}_{1}-u^{*}\rVert_{L^{2}(\Omega,\mu)}^{2}≤ divide start_ARG italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ over^ start_ARG italic_u end_ARG - italic_h ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ italic_h - over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ∥ over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω , italic_μ ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
μ(Ωc)2u^hL2(Ωc,μ~)2+ϵ3+ϵ3absent𝜇superscriptsubscriptΩ𝑐2superscriptsubscriptdelimited-∥∥^𝑢superscript𝐿2subscriptΩ𝑐~𝜇2italic-ϵ3italic-ϵ3\displaystyle\leq\mu(\Omega_{c})^{2}\lVert\hat{u}-h\rVert_{L^{2}(\Omega_{c},% \widetilde{\mu})}^{2}+\frac{\epsilon}{3}+\frac{\epsilon}{3}≤ italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∥ over^ start_ARG italic_u end_ARG - italic_h ∥ start_POSTSUBSCRIPT italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT , over~ start_ARG italic_μ end_ARG ) end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG italic_ϵ end_ARG start_ARG 3 end_ARG + divide start_ARG italic_ϵ end_ARG start_ARG 3 end_ARG
μ(Ωc)2ϵ3μ(Ωc)2+2ϵ3=ϵ.absent𝜇superscriptsubscriptΩ𝑐2italic-ϵ3𝜇superscriptsubscriptΩ𝑐22italic-ϵ3italic-ϵ\displaystyle\leq\mu(\Omega_{c})^{2}\cdot\frac{\epsilon}{3\cdot\mu(\Omega_{c})% ^{2}}+\frac{2\epsilon}{3}=\epsilon.≤ italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ⋅ divide start_ARG italic_ϵ end_ARG start_ARG 3 ⋅ italic_μ ( roman_Ω start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG + divide start_ARG 2 italic_ϵ end_ARG start_ARG 3 end_ARG = italic_ϵ .

Appendix B Methods

The computational experiments were performed on the following system: UBUNTU 20.04.6 LTS, PYTHON 3.12.3, NUMPY 1.26.4, SCIPY 1.13.0, MATPLOTLIB 3.9.0, PYTORCH 1.12.1, and NVIDIA Driver 515.105.01, i7 CPU.

The code to reproduce the experiments from the paper, and an up-to-date code base, can be found (with MIT Licence) at

https://gitlab.com/felix.dietrich/swimpde-paper,

https://gitlab.com/felix.dietrich/swimpde.

Let d𝑑ditalic_d be the dimension of space and N𝑁Nitalic_N be the total number of points in the spatio-temporal domain ΩΩ\Omegaroman_Ω (d,)absentsuperscript𝑑\subset(\mathbb{R}^{d},\mathbb{R})⊂ ( blackboard_R start_POSTSUPERSCRIPT italic_d end_POSTSUPERSCRIPT , blackboard_R ). Then, given N𝑁Nitalic_N points in a test set X𝑋Xitalic_X, we define the error metrics we use to compare numerical results are RMSE and relative L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error by

RMSE:=xX(utrue(x)upred(x))2N,andLrelative2:=xX(utrue(x)upred(x))2xX(utrue(x))2.formulae-sequenceassignRMSEsubscript𝑥𝑋superscriptsubscript𝑢𝑡𝑟𝑢𝑒𝑥subscript𝑢𝑝𝑟𝑒𝑑𝑥2𝑁assignandsubscriptsuperscript𝐿2relativesubscript𝑥𝑋superscriptsubscript𝑢𝑡𝑟𝑢𝑒𝑥subscript𝑢𝑝𝑟𝑒𝑑𝑥2subscript𝑥𝑋superscriptsubscript𝑢𝑡𝑟𝑢𝑒𝑥2\textnormal{RMSE}:=\sqrt{\frac{\sum_{x\in X}(u_{true}(x)-u_{pred}(x))^{2}}{N}}% ,\ \textnormal{and}\ L^{2}_{\text{relative}}:=\frac{\sqrt{\sum_{x\in X}(u_{% true}(x)-u_{pred}(x))^{2}}}{\sqrt{\sum_{x\in X}(u_{true}(x))^{2}}}.RMSE := square-root start_ARG divide start_ARG ∑ start_POSTSUBSCRIPT italic_x ∈ italic_X end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_t italic_r italic_u italic_e end_POSTSUBSCRIPT ( italic_x ) - italic_u start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT ( italic_x ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_N end_ARG end_ARG , and italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT relative end_POSTSUBSCRIPT := divide start_ARG square-root start_ARG ∑ start_POSTSUBSCRIPT italic_x ∈ italic_X end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_t italic_r italic_u italic_e end_POSTSUBSCRIPT ( italic_x ) - italic_u start_POSTSUBSCRIPT italic_p italic_r italic_e italic_d end_POSTSUBSCRIPT ( italic_x ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG start_ARG square-root start_ARG ∑ start_POSTSUBSCRIPT italic_x ∈ italic_X end_POSTSUBSCRIPT ( italic_u start_POSTSUBSCRIPT italic_t italic_r italic_u italic_e end_POSTSUBSCRIPT ( italic_x ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG end_ARG .

B.1 SWIM, ELM, SWIM-ODE, ELM-ODE

The details of these methods are already discussed in the main text.

B.2 Physics-informed networks

This work employs two prominent variants of physics-informed machine learning to compare the results obtained by the proposed methods. In particular, physics-informed neural network (PINN) [56] and causality-respecting physics-informed neural network (causal PINN) [69] are utilized to compare the obtained approximations.

Vanilla PINNs are feedforward deep neural networks designed to simulate PDEs by incorporating physical laws into the learning process. The architecture of a vanilla PINN includes a deep neural network that maps inputs (e.g., space and time coordinates) to outputs (e.g., physical quantities of interest) and is trained to minimize a loss function that combines both data and physics-based errors. The data term ensures that the neural network fits the provided data points, while the physics term enforces the PDE constraints with automatic differentiation. Hence, the loss function for PINN could be defined by

L(μ)=λ1LPDE(μ)+λ2LData(μ).𝐿𝜇subscript𝜆1subscript𝐿PDE𝜇subscript𝜆2subscript𝐿Data𝜇L(\mu)=\lambda_{1}L_{\mathrm{PDE}}(\mu)+\lambda_{2}L_{\mathrm{Data}}(\mu).italic_L ( italic_μ ) = italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_μ ) + italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT roman_Data end_POSTSUBSCRIPT ( italic_μ ) . (15)

Here, μ𝜇\muitalic_μ represents the trainable network parameters. Considering the generic nonlinear PDE defined by (1) with well-posed boundary and initial conditions (2), the individual loss terms weighted by the hyperparameters λisubscript𝜆𝑖\lambda_{i}italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, i=1,2𝑖12i=1,2italic_i = 1 , 2, are defined by

LPDE(μ)=1Nintn=1Nintut(x(n),t(n))+Lu(x(n),t(n))+λN(u)(x(n),t(n))f(x(n))p.subscript𝐿PDE𝜇1subscript𝑁intsuperscriptsubscript𝑛1subscript𝑁intsuperscriptnormsubscriptsuperscript𝑢𝑡superscript𝑥𝑛superscript𝑡𝑛𝐿𝑢superscript𝑥𝑛superscript𝑡𝑛𝜆𝑁superscript𝑢superscript𝑥𝑛superscript𝑡𝑛𝑓superscript𝑥𝑛𝑝\begin{gathered}L_{\mathrm{PDE}}(\mu)=\frac{1}{N_{\mathrm{int}}}\sum_{n=1}^{N_% {\mathrm{int}}}||u^{*}_{t}(x^{(n)},t^{(n)})+Lu(x^{(n)},t^{(n)})+\lambda N(u^{*% })(x^{(n)},t^{(n)})-f(x^{(n)})||^{p}.\end{gathered}start_ROW start_CELL italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_μ ) = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT end_POSTSUPERSCRIPT | | italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT , italic_t start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ) + italic_L italic_u ( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT , italic_t start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ) + italic_λ italic_N ( italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ) ( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT , italic_t start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ) - italic_f ( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ) | | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT . end_CELL end_ROW (16)

The data loss term considering the initial and boundary conditions is defined by

LData(μ)=1Nin=1NiBu(x(n),t(n))g(x(n))p.subscript𝐿Data𝜇1subscript𝑁isuperscriptsubscript𝑛1subscript𝑁isuperscriptnorm𝐵superscript𝑢superscript𝑥𝑛superscript𝑡𝑛𝑔superscript𝑥𝑛𝑝\begin{gathered}L_{\mathrm{Data}}(\mu)=\frac{1}{N_{\mathrm{i}}}\sum_{n=1}^{N_{% \mathrm{i}}}||Bu^{*}(x^{(n)},t^{(n)})-g(x^{(n)})||^{p}.\end{gathered}start_ROW start_CELL italic_L start_POSTSUBSCRIPT roman_Data end_POSTSUBSCRIPT ( italic_μ ) = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT roman_i end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT | | italic_B italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT , italic_t start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ) - italic_g ( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ) | | start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT . end_CELL end_ROW (17)

Here, N𝑁Nitalic_N is the total number of training points, which is the sum of interior training points (Nintsubscript𝑁intN_{\mathrm{int}}italic_N start_POSTSUBSCRIPT roman_int end_POSTSUBSCRIPT) and initial or boundary training points (Nsubscript𝑁absentN_{\mathrm{}}italic_N start_POSTSUBSCRIPT end_POSTSUBSCRIPT). The neural network predicted solution of u𝑢uitalic_u at a point in computational domain, (x(n),t(n))superscript𝑥𝑛superscript𝑡𝑛(x^{(n)},t^{(n)})( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT , italic_t start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ) is denoted by u(x(n),t(n))superscript𝑢superscript𝑥𝑛superscript𝑡𝑛u^{*}(x^{(n)},t^{(n)})italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_x start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT , italic_t start_POSTSUPERSCRIPT ( italic_n ) end_POSTSUPERSCRIPT ). The experiments are trained with L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-norm, implying p=2𝑝2p=2italic_p = 2. The main goal is to minimize (15) and determine the optimal parameters (μ𝜇\muitalic_μ). These parameters, once optimized, are employed to predict the solution of the PDE within the computational domain.

The second physics-informed method employed is Causal PINNs, which modifies the PINN loss function to explicitly adhere to the temporal causality inherent in time-dependent PDEs. In vanilla PINNs, the loss function does not prioritize resolving the solution at lower times before higher times, leading to inaccuracies, especially in time-dependent problems. Causal PINNs address this by introducing a weighting factor for the loss at each time step, which depends on the accumulated loss from previous time steps. The resulting loss function ensures that the network prioritizes learning the solution accurately at earlier times before focusing on later times, thus maintaining the causal structure of the physical problem being solved. The causal PDE loss term is defined by

LPDE(μ)=i=1NtwiLPDE(ti,μ),w1=1,wi=eϵk=1i1LPDE(tk,μ),i=2,3,Nt.\begin{gathered}L_{\mathrm{PDE}}(\mu)=\sum_{i=1}^{N_{\mathrm{t}}}w_{i}L_{% \mathrm{PDE}}(t_{i},\mu),\\ w_{1}=1,\quad w_{i}=e^{-\epsilon\sum_{k=1}^{i-1}L_{\mathrm{PDE}}(t_{k},\mu)},% \quad i=2,3,\ldots N_{\mathrm{t}}.\end{gathered}start_ROW start_CELL italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_μ ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_μ ) , end_CELL end_ROW start_ROW start_CELL italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 1 , italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_e start_POSTSUPERSCRIPT - italic_ϵ ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_μ ) end_POSTSUPERSCRIPT , italic_i = 2 , 3 , … italic_N start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT . end_CELL end_ROW (18)

Ntsubscript𝑁tN_{\mathrm{t}}italic_N start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT represents the number of time steps into which the computational domain is divided. The causality hyperparameter ϵitalic-ϵ\epsilonitalic_ϵ regulates the steepness of the weights and is incorporated in the loss function similar to [37]. This modification introduces a weighting factor, wisubscript𝑤𝑖w_{i}italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, for the loss at each time level tisubscript𝑡𝑖t_{i}italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, with wisubscript𝑤𝑖w_{i}italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT being dependent on the cumulative PDE loss up to time tisubscript𝑡𝑖t_{i}italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. The network prioritizes a fully resolved solution at earlier time levels by exponentiating the negative of this accumulated loss. Consequently, the modified loss function for causal PINN is expressed as

LPDE(μ)=1Nt[w1LPDE(t1,μ)+i=2Nteϵk=1i1LPDE(tk,μ)LPDE(ti,μ)].subscript𝐿PDE𝜇1subscript𝑁𝑡delimited-[]subscript𝑤1subscript𝐿PDEsubscript𝑡1𝜇superscriptsubscript𝑖2subscript𝑁𝑡superscript𝑒italic-ϵsuperscriptsubscript𝑘1𝑖1subscript𝐿PDEsubscript𝑡𝑘𝜇subscript𝐿PDEsubscript𝑡𝑖𝜇\displaystyle L_{\mathrm{PDE}}(\mu)=\frac{1}{N_{t}}\left[w_{1}L_{\mathrm{PDE}}% (t_{1},\mu)+\sum_{i=2}^{N_{t}}e^{-\epsilon\sum_{k=1}^{i-1}L_{\mathrm{PDE}}(t_{% k},\mu)}L_{\mathrm{PDE}}(t_{i},\mu)\right].italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_μ ) = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG [ italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_μ ) + ∑ start_POSTSUBSCRIPT italic_i = 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_ϵ ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_μ ) end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT roman_PDE end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_μ ) ] . (19)

B.3 IGA-FEM

First introduced in [33], Isogeometric analysis (IGA) is a numerical method developed to unify the fields of computer-aided design (CAD) and finite element analysis (FEA). The key idea is to represent the solution space for the numerical analysis using the same functions that define the geometry in CAD [11], which include the B-Splines and Non-Uniform Rational B-Splines (NURBS) [52].

In this paper, we use B-Splines as the basis functions. The B-Splines are defined using the Cox-de Boor recursion formula [13, 15], i.e.,

Ni,0(ξ)={1ξiξ<ξi+10otherwise,subscript𝑁𝑖0𝜉cases1subscript𝜉𝑖𝜉subscript𝜉𝑖10otherwise,\displaystyle N_{i,0}(\xi)=\begin{cases}1&\xi_{i}\leq\xi<\xi_{i+1}\\ 0&\text{otherwise,}\end{cases}italic_N start_POSTSUBSCRIPT italic_i , 0 end_POSTSUBSCRIPT ( italic_ξ ) = { start_ROW start_CELL 1 end_CELL start_CELL italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ≤ italic_ξ < italic_ξ start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise, end_CELL end_ROW
Ni,p(ξ)=ξξiξi+pξiNi,p1(ξ)+ξi+p+1ξξi+p+1ξi+1Ni+1,p1(ξ),subscript𝑁𝑖𝑝𝜉𝜉subscript𝜉𝑖subscript𝜉𝑖𝑝subscript𝜉𝑖subscript𝑁𝑖𝑝1𝜉subscript𝜉𝑖𝑝1𝜉subscript𝜉𝑖𝑝1subscript𝜉𝑖1subscript𝑁𝑖1𝑝1𝜉\displaystyle N_{i,p}(\xi)=\frac{\xi-\xi_{i}}{\xi_{i+p}-\xi_{i}}N_{i,p-1}(\xi)% +\frac{\xi_{i+p+1}-\xi}{\xi_{i+p+1}-\xi_{i+1}}N_{i+1,p-1}(\xi),italic_N start_POSTSUBSCRIPT italic_i , italic_p end_POSTSUBSCRIPT ( italic_ξ ) = divide start_ARG italic_ξ - italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG start_ARG italic_ξ start_POSTSUBSCRIPT italic_i + italic_p end_POSTSUBSCRIPT - italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG italic_N start_POSTSUBSCRIPT italic_i , italic_p - 1 end_POSTSUBSCRIPT ( italic_ξ ) + divide start_ARG italic_ξ start_POSTSUBSCRIPT italic_i + italic_p + 1 end_POSTSUBSCRIPT - italic_ξ end_ARG start_ARG italic_ξ start_POSTSUBSCRIPT italic_i + italic_p + 1 end_POSTSUBSCRIPT - italic_ξ start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT end_ARG italic_N start_POSTSUBSCRIPT italic_i + 1 , italic_p - 1 end_POSTSUBSCRIPT ( italic_ξ ) ,

where ξisubscript𝜉𝑖\xi_{i}italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the i𝑖iitalic_ith knot, and p𝑝pitalic_p is the polynomial degree. The vector Ξ=[ξ1,ξ2,,ξn+p+1]Ξsubscript𝜉1subscript𝜉2subscript𝜉𝑛𝑝1\Xi=[\xi_{1},\xi_{2},\dots,\xi_{n+p+1}]roman_Ξ = [ italic_ξ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ξ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , italic_ξ start_POSTSUBSCRIPT italic_n + italic_p + 1 end_POSTSUBSCRIPT ] is the knot vector, where n𝑛nitalic_n is the number of B-Splines. By specifing the knot vector, we define the basis functions we use to solve the PDEs. We use an uniform open knot vector, where the first and last knots have multiplicity p+1𝑝1p+1italic_p + 1, the inner knots have no multiplicity, and all knots that have different values are uniformly distributed. We refer to the knots with different values as "nodes". The intervals between two successive nodes are knot spans, which can be viewed as "elements". The elements form a "patch". A domain can be partitioned into subdomains and each is represented by a patch. In our work, we use a single patch to represent the entire 1D domain. Figure 7(a) shows an example of such a patch, where the B-Splines are Cpsuperscript𝐶𝑝C^{p}italic_C start_POSTSUPERSCRIPT italic_p end_POSTSUPERSCRIPT-continuous within the knot spans and Cp1superscript𝐶𝑝1C^{p-1}italic_C start_POSTSUPERSCRIPT italic_p - 1 end_POSTSUPERSCRIPT continuous at the inner knots. In order to address the boundary conditions, we adapt the B-Splines as shown in Figure 7(b) Figure 7(c), so that the boundary conditions are directly built into the solution space.

Refer to caption
(a) Number of basis functions =7.absent7{=7.}= 7 .
Refer to caption
(b) Number of basis functions =5.absent5{=5.}= 5 .
Refer to caption
(c) Number of basis functions =5.absent5{=5.}= 5 .
Figure 7: Examples of B-Splines representing the 1D domain [0,1]01[0,1][ 0 , 1 ]. Number of nodes =6absent6=6= 6 and degree of polynomials =2.absent2=2.= 2 . Left: The original B-Splines. Middle: Adapted B-Splines to satisfy the Dirichlet boundary condition. Right: Adapted B-Splines to satisfy the periodic boundary condition. Note that the first (blue) spline is identical to the second last (brown) one, and the second (orange) spline is identical to the last (pink) one, as they share the same coefficient. The gray dashed lines indicate where the domain starts and ends.

In the following, we refer to the adapted B-Splines as basis functions ϕk(x)subscriptitalic-ϕ𝑘𝑥\phi_{k}(x)italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ). Thus, the solutions of PDEs are approximated by

u(x,t)=k=1Kck(t)ϕk(x).𝑢𝑥𝑡subscriptsuperscript𝐾𝑘1subscript𝑐𝑘𝑡subscriptitalic-ϕ𝑘𝑥\displaystyle u(x,t)=\sum^{K}_{k=1}c_{k}(t)\phi_{k}(x).italic_u ( italic_x , italic_t ) = ∑ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) .

We solve the PDEs in the weak formulation. For the linear advection equation (23), the weak form of the equation is

k=1Kck(t)Xϕk(x)v(x)𝑑x+βk=1Kck(t)Xϕk(x)v(x)𝑑x=0,subscriptsuperscript𝐾𝑘1subscriptsuperscript𝑐𝑘𝑡subscript𝑋subscriptitalic-ϕ𝑘𝑥𝑣𝑥differential-d𝑥𝛽subscriptsuperscript𝐾𝑘1subscript𝑐𝑘𝑡subscript𝑋subscriptsuperscriptitalic-ϕ𝑘𝑥𝑣𝑥differential-d𝑥0\displaystyle\sum^{K}_{k=1}c^{\prime}_{k}(t)\int_{X}\phi_{k}(x)v(x)dx+\beta% \sum^{K}_{k=1}c_{k}(t)\int_{X}\phi^{\prime}_{k}(x)v(x)dx=0,∑ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT italic_c start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) ∫ start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_ϕ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) italic_v ( italic_x ) italic_d italic_x + italic_β ∑ start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_t ) ∫ start_POSTSUBSCRIPT italic_X end_POSTSUBSCRIPT italic_ϕ start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) italic_v ( italic_x ) italic_d italic_x = 0 , (20)

where v(x)𝑣𝑥v(x)italic_v ( italic_x ) are the test functions. The test functions are chosen to be the same as the basis functions. The integral of the functions is computed using Gaussian quadrature. Then we solve the linear Ordinary differential equation (ODE)

𝐌𝐜˙+𝐊𝐜=𝟎,𝐌˙𝐜𝐊𝐜0\displaystyle\mathbf{M}\dot{\mathbf{c}}+\mathbf{Kc}=\mathbf{0},bold_M over˙ start_ARG bold_c end_ARG + bold_Kc = bold_0 ,

where matrix 𝐌𝐌\mathbf{M}bold_M and matrix 𝐊𝐊\mathbf{K}bold_K contain the integral of the B-Splines and their derivatives, and the coefficient β𝛽\betaitalic_β, which are given. We solve the Euler-Bernoulli equation (25) and the Burgers’ equation (26) in a similar way. The boundary condition for the Euler-Bernoulli equation is in addition weakly imposed, as is done in [55].

Appendix C Numerical Experiments

C.1 Helmholtz equation on a geometry described by the Stanford bunny dataset

We solve the Helmholtz equation with periodic boundary conditions on the domain Ω3Ωsuperscript3\Omega\subset\mathbb{R}^{3}roman_Ω ⊂ blackboard_R start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT. The Stanford bunny data set [67] contains 3-D point cloud data of 15,258 points. The PDE is defined by

Δu(x)u(x)Δ𝑢𝑥𝑢𝑥\displaystyle\Delta u(x)-u(x)roman_Δ italic_u ( italic_x ) - italic_u ( italic_x ) =f(x)forxΩ,formulae-sequenceabsent𝑓𝑥for𝑥Ω\displaystyle=f(x)\quad\textnormal{for}\quad x\in\Omega,= italic_f ( italic_x ) for italic_x ∈ roman_Ω , (21a)
u(x)𝑢𝑥\displaystyle u(x)italic_u ( italic_x ) =g(x)forxΩ.formulae-sequenceabsent𝑔𝑥for𝑥Ω\displaystyle=g(x)\quad\textnormal{for}\quad x\in\partial\Omega.= italic_g ( italic_x ) for italic_x ∈ ∂ roman_Ω . (21b)

We consider two cases with different analytical solutions. The boundary conditions g𝑔gitalic_g, and the forcing terms f𝑓fitalic_f for the two cases are chosen such that the respective exact solutions of the PDE are

u1(x)=exp(x1+x2+x3),andu2(x)=5+sin(πx1)sin(πx2)sin(4πx3).formulae-sequencesubscript𝑢1𝑥subscript𝑥1subscript𝑥2subscript𝑥3andsubscript𝑢2𝑥5𝜋subscript𝑥1𝜋subscript𝑥24𝜋subscript𝑥3\displaystyle u_{1}(x)=\exp(x_{1}+x_{2}+x_{3}),\ \text{and}\,u_{2}(x)=5+\sin(% \pi x_{1})\sin(\pi x_{2})\sin(4\pi x_{3}).italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_x ) = roman_exp ( italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) , and italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ) = 5 + roman_sin ( italic_π italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) roman_sin ( italic_π italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) roman_sin ( 4 italic_π italic_x start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ) . (22)

Solving PDEs with classical mesh-based methods requires dealing with difficulties and higher computational costs of generating meshes on complex geometries. The meshfree nature of neural PDE solvers such as PINN, ELM, and SWIM eliminates the difficulties associated with meshing and can cheaply find a good approximation as shown in Table 5 and Table 6. ELM requires fewer neurons for the same accuracy as SWIM for PDE solutions that do not involve steep local gradients. SWIM usually performs slightly better than ELM for solutions involving moderately steep local gradients, and PINNs struggle. Moreover, Figure 8 shows that Lrelative2subscriptsuperscript𝐿2relativeL^{2}_{\text{relative}}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT relative end_POSTSUBSCRIPT decreases exponentially with the neurons in the hidden layer for SWIM and ELM. We always choose the number of neurons as half the number of collocation points in Figure 8(b) and Figure 8(d).

Refer to caption
(a) Solution u1(x)subscript𝑢1𝑥u_{1}(x)~{}italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_x )(22).
Refer to caption
(b) Exponential error decay with respect to the hidden layer width.
Refer to caption
(c) Solution u2(x)subscript𝑢2𝑥u_{2}(x)italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ) (22).
Refer to caption
(d) Exponential error decay with respect to the hidden layer width.
Figure 8: Spectral convergence for the static Helmholtz PDE on a complicated domain.

In Table 4, we describe additional details in solving the Helmholtz equation with neural PDE solvers: ELM, SWIM, and PINN. Figure 9 and Figure 10 show the absolute errors obtained with the ELM, SWIM, and PINN PDE solvers. We use the Python package alphashape [3] to extract the boundary points of the Stanford bunny dataset. However, we do not uniformly sample the boundary points on the boundary, and there may be some points on the domain’s interior near the boundary that are selected as the boundary points. This, however, entails no changes in our algorithm.

For approximating u1(x)subscript𝑢1𝑥u_{1}(x)italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_x ) and u2(x)subscript𝑢2𝑥u_{2}(x)italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ), 6000 points in the interior and 888 points on the boundary of the bunny are sampled to train the networks. In addition, testing is performed on 8370 interior points and 99 boundary points. The exact architectures and comparison of training times and errors are presented in Table 5 and Table 6. With ELM and SWIM, we compute the SVD of the feature matrix and retain 1200 singular values for approximating u1(x)subscript𝑢1𝑥u_{1}(x)italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_x ) and 1600 singular values for approximating u2(x)subscript𝑢2𝑥u_{2}(x)italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ).

Table 4: Network hyperparameters used to solve the static PDE on the Stanford bunny dataset (on both examples).
Parameter Value
SWIM and ELM Number of layers 2222
Layer width [50,500,1000,2000,3000,4000]505001000200030004000[50,500,1000,2000,\textbf{3000},\textbf{4000}][ 50 , 500 , 1000 , 2000 , 3000 , 4000 ]
SVD/Outer layer [1200,1600]12001600[1200,1600][ 1200 , 1600 ]
Activation tanh
L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-regularization [108,1010,𝟏𝟎𝟏𝟐]superscript108superscript1010superscript1012[10^{-8},10^{-10},\mathbf{10^{-12}}][ 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 10 end_POSTSUPERSCRIPT , bold_10 start_POSTSUPERSCRIPT - bold_12 end_POSTSUPERSCRIPT ]
Loss mean-squared error
PINN Number of layers 4444
Layer width 20202020
Activation tanh
Optimizer LBFGS
Epochs 2500250025002500 (10000100001000010000 for approximating u2(x)subscript𝑢2𝑥u_{2}(x)italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ))
Loss mean-squared error
Learning rate 0.10.10.10.1
Batch size 888888888888
Parameter initialization Xavier [29]
Loss weights, λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.10.10.10.1, 1111
Refer to caption
(a) SWIM
Refer to caption
(b) ELM
Refer to caption
(c) PINN
Figure 9: Absolute error compared to the ground truth solution of the Helmholtz equation u1(x)subscript𝑢1𝑥u_{1}(x)italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_x ) (see Equation 22) on a complicated geometry.
Refer to caption
(a) SWIM
Refer to caption
(b) ELM
Refer to caption
(c) PINN
Figure 10: Absolute error compared to the ground truth solution of the Helmholtz equation u2(x)subscript𝑢2𝑥u_{2}(x)italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ) (see Equation 22) on a complicated geometry.
Table 5: Summary of Helmholtz equation with ground truth given by u1(x)subscript𝑢1𝑥u_{1}(x)italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_x ) (see Equation 22).
Method Training time (ssecond\mathrm{s}roman_s) RMSE Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error architecture
PINN 57.12 4.86e-4 ±plus-or-minus\pm± 1.53e-4 3.18e-4 ±plus-or-minus\pm± 9.99e-5 (3, 4 ×\times×20, 1)
SWIM 64.30 3.99e-4 ±plus-or-minus\pm± 1.47 e-4 2.61e-4 ±plus-or-minus\pm± 9.65e-5 (3, 3000, 1200, 1)
ELM 69.06 1.05e-6 ±plus-or-minus\pm± 4.31e-7 6.90e-7 ±plus-or-minus\pm± 2.82 e-7 (3, 3000, 1200, 1)
Table 6: Summary of Helmholtz equation with ground truth given by u2(x)subscript𝑢2𝑥u_{2}(x)italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_x ) (see Equation 22).
Method Training time (ssecond\mathrm{s}roman_s) RMSE Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error architecture
PINN 229.06 4.81e-2 ±plus-or-minus\pm± 1.01e-3 9.53e-3 ±plus-or-minus\pm± 1.99e-4 (3, 4 ×\times× 20, 1)
ELM 144.83 5.15e-3 ±plus-or-minus\pm± 2.64e-3 2.80e-3 ±plus-or-minus\pm± 3.13e-4 (3, 4000, 1600, 1)
SWIM 129.20 2.80e-3 ±plus-or-minus\pm± 3.13e-4 1.02e-3 ±plus-or-minus\pm± 5.24e-4 (3, 4000, 1600, 1)

C.2 Linear Advection Equation

The advection equation models the propagation of a quantity at a speed β𝛽\betaitalic_β without altering the shape. We solve the linear advection equation with periodic boundary conditions described by

ut(x,t)+βux(x,t)subscript𝑢𝑡𝑥𝑡𝛽subscript𝑢𝑥𝑥𝑡\displaystyle u_{t}(x,t)+\beta u_{x}(x,t)italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x , italic_t ) + italic_β italic_u start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT ( italic_x , italic_t ) =0,forx[0,2π],t[0,1],formulae-sequenceabsent0forformulae-sequence𝑥02𝜋𝑡01\displaystyle=0,\quad\textnormal{for}\quad x\in[0,2\pi],t\in[0,1],= 0 , for italic_x ∈ [ 0 , 2 italic_π ] , italic_t ∈ [ 0 , 1 ] , (23)
u(x,0)𝑢𝑥0\displaystyle u(x,0)italic_u ( italic_x , 0 ) =sin(x).absent𝑥\displaystyle=\sin(x).= roman_sin ( italic_x ) . (24)

We describe additional details in solving the advection equation with various neural PDE solvers in Table 7. In addition, hyperparameter optimization for PINN for the case of β=10𝛽10\beta=10italic_β = 10 was carried out, varying the number of neurons and interior points. The results for the hyperparameter optimization are detailed in Table 8, Table 9. For SWIM and ELM, we use 1000100010001000 interior points for β{102,101,1,10}𝛽superscript102superscript101110\beta\in\{10^{-2},10^{-1},1,10\}italic_β ∈ { 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , 1 , 10 }, and we use 8000800080008000 interior points for β{40,100}𝛽40100\beta\in\{40,100\}italic_β ∈ { 40 , 100 }. Figure 11 shows the absolute errors obtained with the SWIM-ODE, ELM-ODE, PINN, Causal PINN and IGA methods.

Refer to caption
(a) SWIM-ODE
Refer to caption
(b) ELM-ODE
Refer to caption
(c) PINN
Refer to caption
(d) Causal PINN
Refer to caption
(e) IGA
Refer to caption
(f) Ground truth
Figure 11: Advection equation: absolute error plots and ground truth
Table 7: Network hyperparameters used for the linear advection equation for β{102,101,1,10,40,100}𝛽superscript102superscript10111040100\beta\in\{10^{-2},10^{-1},1,10,40,100\}italic_β ∈ { 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , 1 , 10 , 40 , 100 }
Parameter Value
SWIM-ODE, ELM-ODE Number of layers 2222
Hidden layer width [140,𝟑𝟖𝟎,560]140380560[140,\mathbf{380},560][ 140 , bold_380 , 560 ]
Activation tanh
L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-regularization [108,𝟏𝟎𝟏𝟎,1012]superscript108superscript1010superscript1012[10^{-8},\mathbf{10^{-10}},10^{-12}][ 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT , bold_10 start_POSTSUPERSCRIPT - bold_10 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 12 end_POSTSUPERSCRIPT ]
Loss mean-squared error
SWIM, ELM Number of layers 2222
SVD cutoff 1012superscript101210^{-12}10 start_POSTSUPERSCRIPT - 12 end_POSTSUPERSCRIPT
Hidden layer width [140,𝟑𝟖𝟎,560]140380560[140,\mathbf{380},560][ 140 , bold_380 , 560 ]
Activation tanh
L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-regularization [108,𝟏𝟎𝟏𝟎,1012]superscript108superscript1010superscript1012[10^{-8},\mathbf{10^{-10}},10^{-12}][ 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT , bold_10 start_POSTSUPERSCRIPT - bold_10 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 12 end_POSTSUPERSCRIPT ]
Loss mean-squared error
# Initial and boundary points 400400400400
# Interior points [1000,8000]10008000[\textbf{1000},\textbf{8000}][ 1000 , 8000 ]
IGA Number of nodes 16161616
Degree of polynomials 8888
Number of basis functions 15151515
PINN Number of layers 4444
Layer width [10,20,30,40]10203040[10,20,\textbf{30},40][ 10 , 20 , 30 , 40 ]
Activation tanh
Optimizer LBFGS
Epochs 5000500050005000
Loss mean-squared error
Learning rate 0.10.10.10.1
Batch size 200200200200
Parameter initialization Xavier [29]
Loss weights, λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 1111, 1111
# Interior points [500,1000,1500,2000]500100015002000[500,1000,1500,\textbf{2000}][ 500 , 1000 , 1500 , 2000 ]
# Initial and boundary points 600600600600
Causal PINN Number of layers 4444
Layer width 30303030
Activation tanh
Optimizer ADAM followed by LBFGS
ADAM Epochs 2000200020002000
LBFGS Epochs 5000500050005000
Loss mean-squared error
Learning rate 0.10.10.10.1
Batch size 2000200020002000
Parameter initialization Xavier [29]
Loss weights, λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 1111, 1111
# Interior points 40000400004000040000
# Initial and boundary points 6000600060006000
Causality parameter, ϵitalic-ϵ\epsilonitalic_ϵ 10101010
Table 8: Advection equation (23): hyperparameter optimization for PINN varying layer width for β=10𝛽10\beta=10italic_β = 10.
Layer width Training time (ssecond\mathrm{s}roman_s) RMSE Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error
10 24.47 ±plus-or-minus\pm± 0.19 1.24e-3 ±plus-or-minus\pm± 2.38e-4 1.76e-3 ±plus-or-minus\pm± 3.37e-4
20 27.46 ±plus-or-minus\pm± 0.08 6.52e-4 ±plus-or-minus\pm± 2.59e-4 9.22e-4 ±plus-or-minus\pm± 3.66e-4
30 30.43 ±plus-or-minus\pm± 0.50 3.69e-4 ±plus-or-minus\pm± 4.33e-5 5.23e-4 ±plus-or-minus\pm± 6.13e-5
40 33.64 ±plus-or-minus\pm± 0.41 3.86e-4 ±plus-or-minus\pm± 9.37e-5 5.46e-4 ±plus-or-minus\pm± 1.32e-4
Table 9: Advection equation (23): hyperparameter optimization for PINN varying interior points for β=10𝛽10\beta=10italic_β = 10.
Interior points Training time (ssecond\mathrm{s}roman_s) RMSE Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error
500 25.76 ±plus-or-minus\pm± 0.29 4.10e-4 ±plus-or-minus\pm± 7.20e-5 5.80e-4 ±plus-or-minus\pm± 1.01e-4
1000 27.44 ±plus-or-minus\pm± 0.25 3.72e-4 ±plus-or-minus\pm± 4.06e-5 5.27e-4 ±plus-or-minus\pm± 5.74e-5
1500 29.61 ±plus-or-minus\pm± 0.16 5.68e-4 ±plus-or-minus\pm± 1.97e-4 8.03e-4 ±plus-or-minus\pm± 2.79e-4
2000 30.43 ±plus-or-minus\pm± 0.50 3.69e-4 ±plus-or-minus\pm± 4.33e-5 5.23e-4 ±plus-or-minus\pm± 6.13e-5
Table 10: Summary of results for the advection equation for β=40𝛽40\beta=40italic_β = 40.
Method Training time (ssecond\mathrm{s}roman_s) RMSE Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error architecture
SWIM 66.30 6.81 ±plus-or-minus\pm± 0.26 9.63 ±plus-or-minus\pm± 0.38 (2, 4000, 1)
ELM 59.05 3.78 ±plus-or-minus\pm± 0.21 5.35 ±plus-or-minus\pm± 0.29 (2, 4000, 1)
Causal PINN 357.63 ±plus-or-minus\pm± 3.11 2.07 ±plus-or-minus\pm± 0.87 2.92 ±plus-or-minus\pm± 1.23 (2, 4 ×\times× 30, 1)
PINN 30.56 ±plus-or-minus\pm± 0.27 4.89e-1 ±plus-or-minus\pm± 2.09e-2 6.92e-1 ±plus-or-minus\pm± 2.96e-2 (2, 4 ×\times× 30, 1)
ODE-SWIM (our) 2.72 5.04 e-6 ±plus-or-minus\pm± 1.45e-6 7.13e-6 ±plus-or-minus\pm± 2.05 e-6 (1, 350, 15, 1)
ODE-ELM (our) 2.71 2.73e-6 ±plus-or-minus\pm± 3.68e-7 3.84e-6 ±plus-or-minus\pm± 5.2e-7 (1, 350, 15, 1)
IGA-FEM 0.07 8.24e-11 1.17e-10 15

C.3 Euler-Bernoulli PDE

This is a time-dependent PDE, given by

utt+uxxxx=f(x,t)x[0,π],t[0,1]formulae-sequencesubscript𝑢ttsubscript𝑢xxxx𝑓𝑥𝑡formulae-sequence𝑥0𝜋𝑡01u_{\mathrm{tt}}+u_{\mathrm{xxxx}}=f(x,t)\quad x\in[0,\pi],t\in[0,1]italic_u start_POSTSUBSCRIPT roman_tt end_POSTSUBSCRIPT + italic_u start_POSTSUBSCRIPT roman_xxxx end_POSTSUBSCRIPT = italic_f ( italic_x , italic_t ) italic_x ∈ [ 0 , italic_π ] , italic_t ∈ [ 0 , 1 ] (25)

where f(x,t)=(116π2)sin(x)cos(4πt)𝑓𝑥𝑡116superscript𝜋2𝑥4𝜋𝑡f(x,t)=(1-16\pi^{2})\sin{(x)}\cos(4\pi t)italic_f ( italic_x , italic_t ) = ( 1 - 16 italic_π start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) roman_sin ( italic_x ) roman_cos ( 4 italic_π italic_t ), with initial and boundary conditions

u(x,0)=sin(x),ut(x,0)=0formulae-sequence𝑢𝑥0𝑥subscript𝑢𝑡𝑥00u(x,0)=\sin(x),\quad u_{t}(x,0)=0italic_u ( italic_x , 0 ) = roman_sin ( italic_x ) , italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x , 0 ) = 0
u(0,t)=u(π,t)=uxx(0,t)=uxx(π,t)=0.𝑢0𝑡𝑢𝜋𝑡subscript𝑢xx0𝑡subscript𝑢xx𝜋𝑡0u(0,t)=u(\pi,t)=u_{\mathrm{xx}}(0,t)=u_{\mathrm{xx}}(\pi,t)=0.italic_u ( 0 , italic_t ) = italic_u ( italic_π , italic_t ) = italic_u start_POSTSUBSCRIPT roman_xx end_POSTSUBSCRIPT ( 0 , italic_t ) = italic_u start_POSTSUBSCRIPT roman_xx end_POSTSUBSCRIPT ( italic_π , italic_t ) = 0 .

It models a simply supported beam with varying transverse force. We describe additional details in solving the Euler-Bernoulli with various neural PDE solvers in Table 11. Figure 12 shows the absolute errors obtained with the SWIM-ODE, ELM-ODE, PINN, and IGA methods.

Refer to caption
(a) SWIM-ODE
Refer to caption
(b) ELM-ODE
Refer to caption
(c) PINN
Refer to caption
(d) IGA
Refer to caption
(e) Ground truth
Figure 12: Euler-Bernoulli equation: absolute error plots and ground truth
Table 11: Network hyperparameters used for the Euler-Bernoulli equation.
Parameter Value
SWIM-ODE and ELM-ODE Number of layers [2]delimited-[]2[2][ 2 ]
Hidden layer width 200200200200
Outer layer width 10101010
Activation tanh
L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-regularization 1012superscript101210^{-12}10 start_POSTSUPERSCRIPT - 12 end_POSTSUPERSCRIPT
Loss mean-squared error
IGA Number of nodes 27272727
Degree of polynomials 9999
Number of basis functions 33333333
PINN Number of layers 4444
Layer width 20202020
Activation tanh
Optimizer LBFGS
Epochs 15000150001500015000
Loss mean-squared error
Learning rate 0.10.10.10.1
Batch size 2000200020002000
Parameter initialization Xavier [29]
Loss weights, λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.10.10.10.1, 1111
# Interior points 10000100001000010000
# Initial and boundary points 6000600060006000
Table 12: Summary of results for the Euler-Bernoulli beam equation.
Method Training time (ssecond\mathrm{s}roman_s) RMSE Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error architecture
PINN 2303.71 ±plus-or-minus\pm± 278.68 2.11e-3 ±plus-or-minus\pm± 4.79e-4 4.21e-3 ±plus-or-minus\pm± 9.56e-4 (2, 4×\times× 20, 1)
ODE-SWIM (our) 0.05 6.06e-8 ±plus-or-minus\pm± 2.96e-8 1.20e-7 ±plus-or-minus\pm± 5.91e-8 (1, 200, 10, 1)
ODE-ELM (our) 0.06 1.75e-8 ±plus-or-minus\pm± 3.91e-9 3.50e-8 ±plus-or-minus\pm± 7.79e-9 (1, 200, 10, 1)
IGA-FEM 0.94 2.11e-7 4.21e-7 33

C.4 Burgers

The inviscid Burgers’ equation is a non-linear PDE, which can form shock waves. We solve Burgers’ equation on Ω=[1,1]Ω11\Omega=[-1,1]roman_Ω = [ - 1 , 1 ] for time t(0,1]𝑡01t\in(0,1]italic_t ∈ ( 0 , 1 ], so that

ut+uux(0.01/π)uxxsubscript𝑢𝑡𝑢subscript𝑢𝑥0.01𝜋subscript𝑢𝑥𝑥\displaystyle u_{t}+uu_{x}-(0.01/\pi)u_{xx}italic_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_u italic_u start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT - ( 0.01 / italic_π ) italic_u start_POSTSUBSCRIPT italic_x italic_x end_POSTSUBSCRIPT =0,xΩ,,t[0,1],\displaystyle=0,\quad x\in\Omega,\quad,t\in[0,1],= 0 , italic_x ∈ roman_Ω , , italic_t ∈ [ 0 , 1 ] , (26)
u(0,x)=sin(πx),𝑢0𝑥𝜋𝑥\displaystyle u(0,x)=-\sin(\pi x),italic_u ( 0 , italic_x ) = - roman_sin ( italic_π italic_x ) , (27)
u(1,1)=u(t,1)=0.𝑢11𝑢𝑡10\displaystyle u(1,-1)=u(t,1)=0.italic_u ( 1 , - 1 ) = italic_u ( italic_t , 1 ) = 0 . (28)

We describe additional details in solving the Burgers’ equation with various neural PDE solvers in Table 13 and Table 14. Figure 13 shows the absolute errors obtained with the PINN, Causal PINN, and IGA methods.

Refer to caption
(a) PINN
Refer to caption
(b) Causal PINN
Refer to caption
(c) IGA
Refer to caption
(d) Ground truth
Figure 13: Burgers’ equation: absolute error plots and ground truth
Table 13: Network hyper-parameters used for SWIM-ODE and ELM-ODE to solve Burgers’ equation.
Parameter Value
SWIM-ODE Number of layers 2222
Hidden layer width [450]delimited-[]450[450][ 450 ]
Activation tanh
L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-regularization [106,𝟏𝟎𝟕,108,1010,1012]superscript106superscript107superscript108superscript1010superscript1012[10^{-6},\mathbf{10^{-7}},10^{-8},10^{-10},10^{-12}][ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT , bold_10 start_POSTSUPERSCRIPT - bold_7 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 10 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 12 end_POSTSUPERSCRIPT ]
Loss mean-squared error
# collocation points (space) [800]delimited-[]800[800][ 800 ]
# sampling points [6000]delimited-[]6000[6000][ 6000 ]
ELM-ODE Number of layers 2222
Hidden layer width [2000]delimited-[]2000[2000][ 2000 ]
Activation tanh
L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-regularization [106,𝟏𝟎𝟕,108,1010,1012]superscript106superscript107superscript108superscript1010superscript1012[10^{-6},\mathbf{10^{-7}},10^{-8},10^{-10},10^{-12}][ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT , bold_10 start_POSTSUPERSCRIPT - bold_7 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 10 end_POSTSUPERSCRIPT , 10 start_POSTSUPERSCRIPT - 12 end_POSTSUPERSCRIPT ]
Loss mean-squared error
# collocation points (space) [3000]delimited-[]3000[3000][ 3000 ]
# sampling points [6000]delimited-[]6000[6000][ 6000 ]
Table 14: Network hyper-parameters used for PINN, Causal PINN, and IGA to solve Burgers’ equation.
Parameter Value
PINN Number of layers 9999
Layer width 20202020
Activation tanh
Optimizer LBFGS
Epochs 10000100001000010000
Loss mean-squared error
Learning rate 0.10.10.10.1
Batch size 200200200200
Parameter initialization Xavier [29]
Loss weights, λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 1111, 1111
# Interior points 10000100001000010000
# Initial and boundary points 600600600600
Causal PINN Number of layers 9999
Layer width 20202020
Activation tanh
Optimizer ADAM followed by LBFGS
ADAM Epochs 5000500050005000
LBFGS Epochs 10000100001000010000
Loss mean-squared error
Learning rate 0.10.10.10.1
Batch size 200200200200
Parameter initialization Xavier [29]
Loss weights, λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 1111, 1111
# Interior points 40000400004000040000
# Initial and boundary points 600600600600
Causality parameter, ϵitalic-ϵ\epsilonitalic_ϵ 5555
IGA Number of nodes 400400400400
Degree of polynomials 8888
Number of basis functions 405405405405
Table 15: Summary of results for Burgers’ Equation.
Method Training time (ssecond\mathrm{s}roman_s) RMSE Relative L2superscript𝐿2{L}^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT error architecture
ODE-ELM (our) 2.41 1.51e-1 ±plus-or-minus\pm± 3.27 e-4 2.47e-1 ±plus-or-minus\pm± 5.33e-4 (1, 2000, 1)
PINN 275.23 ±plus-or-minus\pm± 5.38 2.38e-3 ±plus-or-minus\pm± 1.61e-3 3.88e-3 ±plus-or-minus\pm± 2.61e-3 (2, 9×\times×20, 1)
Causal-PINN 1531.79 ±plus-or-minus\pm± 18.45 9.85e-3 ±plus-or-minus\pm± 5.51e-3 1.60e-2 ±plus-or-minus\pm± 8.97e-3 (2, 9×\times×20, 1)
ODE-SWIM (our) 141.35 2.05e-4 ±plus-or-minus\pm± 2.84e-4 3.33e-4 ±plus-or-minus\pm± 4.63e-4 (1, 400, 1)
IGA-FEM 13.61 1.35e-4 2.20e-4 405
Table 16: Network hyperparameters for the two inverse problems.
Parameter Value
SWIM Number of layers 1
Layer width 1024 (Poisson), 400 (Helmholtz)
Activation tanh
L2superscript𝐿2L^{2}italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-regularization 1.0e-10 (Poisson), 0.0 for Helmholtz
Linear solver np.linalg.lstsq
Loss mean-squared error
# Interior points to solve PDE 900900900900 (Poisson and Helmholtz)
# Boundary points None (boundary condition not provided)
# Measurement points 50505050 (Poisson), 300300300300 (Helmholtz)
PINN Number of layers 4444
Layer width 20202020
Activation tanh
Optimizer LBFGS
Epochs 10000100001000010000
Loss mean-squared error
Learning rate 0.10.10.10.1
Batch size 375375375375
Parameter initialization Xavier [29]
Loss weights, λ1subscript𝜆1\lambda_{1}italic_λ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, λ2subscript𝜆2\lambda_{2}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.050.050.050.05, 1111
# Interior points to solve PDE 1500150015001500
# Initial and boundary points 1500150015001500
# Measurement points 50505050 (Poisson)

C.5 Parametric Poisson equation

We solve a parametric Poisson equation on Ω=[0,1.4]2Ωsuperscript01.42\Omega=[0,1.4]^{2}roman_Ω = [ 0 , 1.4 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT,

div(Du(x))=f(x)forxΩ,andu(x)=g(x)forxΩ,formulae-sequencediv𝐷𝑢𝑥𝑓𝑥for𝑥Ωand𝑢𝑥𝑔𝑥for𝑥Ω\text{div}(D\nabla u(x))=f(x)\,\text{for}\,x\in\Omega,\,\text{and}\,u(x)=g(x)% \,\text{for}\,x\in\partial\Omega,div ( italic_D ∇ italic_u ( italic_x ) ) = italic_f ( italic_x ) for italic_x ∈ roman_Ω , and italic_u ( italic_x ) = italic_g ( italic_x ) for italic_x ∈ ∂ roman_Ω , (29)

with parameterized, diagonal diffusivity matrix D=diag(1,α)𝐷diag1𝛼D=\text{diag}\left(1,\alpha\right)italic_D = diag ( 1 , italic_α ), α>0𝛼0\alpha>0italic_α > 0. Here u(x)𝑢𝑥u(x)italic_u ( italic_x ) denotes the unknown solution of the PDE and α𝛼\alphaitalic_α denotes the unknown model parameter that needs to be estimated given the set of measurements. We describe additional details in solving this inverse problem with various neural PDE solvers in Table 16. Figure 14 shows the ground truth as well as the distribution of the random measurement points in the two-dimensional domain. When solving the PDE with varying parameter α𝛼\alphaitalic_α, it is obvious from Figure 14 (right) that the ground truth parameter must be close to α=1𝛼1\alpha=1italic_α = 1.

Refer to caption
Refer to caption
Figure 14: Left: Ground truth and measurement points for the parametric Poisson PDE. Right: The measured loss over 51 trials of the unknown parameter α𝛼\alphaitalic_α.

For general inverse problems solved with PINN, the unknown parameter can be in the PDE or initial or boundary conditions. Hence, this unknown parameter will be the part of the loss function. To estimate this parameter in conjunction to training PINN and solving the forward problem, this parameter should be learnable. Following, we describe two possible ways to estimate this parameter. First, the unknown parameter can be considered as a learnable parameter like weights and biases of PINN and the parameter could be passed on to the optimizer to iterate over the landscape of parameter values to reach a suboptimal value. This approach works fine for cases with a limited number of unknown parameters, but has a downside of initialization. That is, when making the unknown parameter learnable, the parameter needs initialization, for which a good guess may not be known and sometimes can completely lead to the failure of PINN. The second approach is to consider a pseudo function whose mean over space-time is the unknown parameter. This pseudo function is considered as an output of the neural network. With such an approach, the mean of the function needs to be passed on to the loss function to estimate the parameter. Hence, although the neural network has two outputs, The quantities of interest are the first output which is the solution of the PDE and the mean of the second output, which is the unknown parameter itself. The benefit of this approach is that no initialization of the unknown parameter is required. However, a negative side is that if there are several parameters, the training can be expensive. However, as we have a single parameter, this approach suits well. In addition, if one has to approximate an unknown source term or function through PINN, the second approach is more feasible as the first approach would require an intelligent parametrization of the unknown function field.

C.6 Helmholtz equation with parameter field

We repeat the experiment from Dong and Wang [18] and set up an inverse problem involving the Helmholtz equation on Ω=[0,1.5]2Ωsuperscript01.52\Omega=[0,1.5]^{2}roman_Ω = [ 0 , 1.5 ] start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT with Dirichlet (fixed-value) boundary conditions, and a space-dependent parameter field γ𝛾\gammaitalic_γ, so that

Δu(x)γ(x)u(x)Δ𝑢𝑥𝛾𝑥𝑢𝑥\displaystyle\Delta u(x)-\gamma(x)u(x)roman_Δ italic_u ( italic_x ) - italic_γ ( italic_x ) italic_u ( italic_x ) =\displaystyle== f(x)forxΩ,𝑓𝑥for𝑥Ω\displaystyle f(x)\quad\textnormal{for}\quad x\in\Omega,italic_f ( italic_x ) for italic_x ∈ roman_Ω ,
u(x)𝑢𝑥\displaystyle u(x)italic_u ( italic_x ) =\displaystyle== g(x)forxΩ.𝑔𝑥for𝑥Ω\displaystyle g(x)\quad\textnormal{for}\quad x\in\partial\Omega.italic_g ( italic_x ) for italic_x ∈ ∂ roman_Ω .

Here, we fix the same ground truth solution as Dong and Wang [18], to

γ(x)𝛾𝑥\displaystyle\gamma(x)italic_γ ( italic_x ) =\displaystyle== 100[1+0.25sin(2πx1)+0.25sin(2πx2)],100delimited-[]10.252𝜋subscript𝑥10.252𝜋subscript𝑥2\displaystyle 100\left[1+0.25\sin(2\pi x_{1})+0.25\sin(2\pi x_{2})\right],100 [ 1 + 0.25 roman_sin ( 2 italic_π italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) + 0.25 roman_sin ( 2 italic_π italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ] ,
u(x)𝑢𝑥\displaystyle u(x)italic_u ( italic_x ) =\displaystyle== [2.5sin(πx10.4π)+1.5cos(2πx1+0.3π)]delimited-[]2.5𝜋subscript𝑥10.4𝜋1.52𝜋subscript𝑥10.3𝜋\displaystyle[2.5\sin(\pi x_{1}-0.4\pi)+1.5\cos(2\pi x_{1}+0.3\pi)][ 2.5 roman_sin ( italic_π italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - 0.4 italic_π ) + 1.5 roman_cos ( 2 italic_π italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT + 0.3 italic_π ) ]
[2.5sin(πx20.4π)+1.5cos(2πx2+0.3π)].absentdelimited-[]2.5𝜋subscript𝑥20.4𝜋1.52𝜋subscript𝑥20.3𝜋\displaystyle\cdot[2.5\sin(\pi x_{2}-0.4\pi)+1.5\cos(2\pi x_{2}+0.3\pi)].⋅ [ 2.5 roman_sin ( italic_π italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - 0.4 italic_π ) + 1.5 roman_cos ( 2 italic_π italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + 0.3 italic_π ) ] .
Refer to caption
Figure 15: PINN predictions for the inverse parametric Poisson PDE problem.