Solving partial differential equations
with sampled neural networks

Chinmay Datar¹, Taniya Kapoor², Abhishek Chandra³, Qing Sun¹, Iryna Burak¹,
Erik Lien Bolager¹, Anna Veselovska¹, Massimo Fornasier¹, and Felix Dietrich¹

¹School of Computation, Information and Technology
Technical University of Munich
Garching, Germany
&²Department of Engineering Structures
Delft University of Technology
Delft, The Netherlands
&³Department of Electrical Engineering
Eindhoven University of Technology
Eindhoven, The Netherlands
Corresponding author, felix.dietrich@tum.de.

Abstract

Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Using neural networks as an ansatz for the solution has proven a challenge in terms of training time and approximation accuracy. In this contribution, we discuss how sampling the hidden weights and biases of the ansatz network from data-agnostic and data-dependent probability distributions allows us to progress on both challenges. In most examples, the random sampling schemes outperform iterative, gradient-based optimization of physics-informed neural networks regarding training time and accuracy by several orders of magnitude. For time-dependent PDE, we construct neural basis functions only in the spatial domain and then solve the associated ordinary differential equation with classical methods from scientific computing over a long time horizon. This alleviates one of the greatest challenges for neural PDE solvers because it does not require us to parameterize the solution in time. For second-order elliptic PDE in Barron spaces, we prove the existence of sampled networks with $L^{2}$ convergence to the solution. We demonstrate our approach on several time-dependent and static PDEs. We also illustrate how sampled networks can effectively solve inverse problems in this setting. Benefits compared to common numerical schemes include spectral convergence and mesh-free construction of basis functions.

1 Introduction

Approximation of solutions to partial differential equations (PDE) is an important problem in computational science and engineering. Many numerical solution methods using neural networks involve the parametrization of the solution over the entire space and time domain, followed by an expensive hyperparameter search over network architectures and training parameters. In this paper, we demonstrate that sampling a specific, data-dependent probability distribution for the weights of neural networks allows us to solve PDE by using individual neurons as basis functions (cf. Figure 1).

Refer to caption — Figure 1: Solving a Poisson PDE with a neural network ansatz. We propose to sample random neural basis functions $\phi_{k}$ on the given domain, and then proceed to solve for their coefficients $c_{k}$ .

We identify three problem settings that neural PDE solvers deal with and propose algorithms and examples that leverage random sampling of internal weights to address these. We compare the results of our sampled networks with the Physics-informed neural networks (PINNs) and discuss our method’s benefits and drawbacks compared to Isogeometric Analysis in the Finite Element Method framework (IGA-FEM).

Linear, static PDEs on complicated geometries present a challenge for mesh-based methods. Our sampling approach is particularly effective because it is mesh-free, and the linear operator of the PDE must only be applied to the sampled and fixed basis. Consequently, only a single linear system must be solved to construct the remaining weights in the network’s final layer to compute the solution.

Time-dependent problems present a challenge for recently developed neural PDE solvers, especially when the solution changes rapidly. Many solvers treat the temporal dimension as just another space dimension, meaning the neural network solution must capture all variations in time and space. Our sampling approach allows us to use the separation of variables to split the problem into a space- and a time-dependent part. The neurons we sample only capture the spatial variability, while the temporal variations are resolved through the solution of an associated ordinary differential equation. This allows us to integrate over exceedingly long time horizons with high accuracy.

Inverse problems present an even greater challenge for neural PDE solvers that are fully trained with back-propagation, because multiple neural networks must be trained jointly to compute the solution and the coefficients in the PDE. We demonstrate that randomly selecting internal weights simplifies much of the joint optimization challenge, resulting in optimization problems that are either low-dimensional or bi-linear.

2 Related Work

Many classical methods to solve PDE exist, together with theoretical guarantees and approximations with high accuracy. Isogeometric analysis (IGA) is one such method in which spline-based basis functions are defined over a structured grid (cf. Hughes et al. [33], Cottrell et al. [11, 12]) and is designed to integrate computer-aided design (CAD) and finite element analysis (FEA). Here, we will compare the results, benefits, and disadvantages of the neural PDE solvers to classical methods like IGA. Despite their solid theoretical grounding, classical mesh-based methods often entail a time-consuming setup phase, especially when mesh generation is necessary. Moreover, user-friendly software is often not readily available. Given the challenges posed by the mesh-based methods, there has been a substantial development of meshfree methods, including those based on radial basis functions (RBFs) (cf. Powell [54], Chen et al. [7]), moving least squares (MLS) (cf. Shepard [64], Lancaster and Salkauskas [40]), and smoothed particle hydrodynamics (SPH) (cf. Lucy [47], Gingold and Monaghan [28], Shadloo et al. [61]).

Neural PDE solvers offer an attractive framework to approximate solutions of PDEs because of the high expressivity of neural networks (cf. Rudi and Rosasco [60]), their ability to represent functions in high dimensions (cf. E [24], Wu and Long [71]), and powerful software for automatic differentiation (e.g., Pytorch cf. [51], TensorFlow cf. [1], and specialized software like DeepXDE cf. [46]). Earlier work on solving PDEs through neural networks (cf. Dissanayake and Phan-Thien [16], Lagaris et al. [39]) was recently popularized in the form of Physics-informed neural networks (PINNs) (cf. Raissi et al. [56]) and neural operators (cf. Lu et al. [45], Li et al. [42], Raonic et al. [58]), and others (cf. Han et al. [30], Sirignano and Spiliopoulos [65]). There has been a lot of development in accelerating neural PDE solvers involving variants of PINNs (cf Cho et al. [10], Meng et al. [48]), Sharma and Shankar [63], and Chiu et al. [9]), methods based on hash-encoding (cf Huang and Alkhalifah [32], Wang et al. [68]) and transfer learning (cf Kapoor et al. [37])). However, these methods rely on backpropagation and necessitate computing the derivatives of the given PDE for each forward pass, rendering them computationally intensive, even for simple PDEs, and often resulting in limited approximation accuracy. Moreover, despite the recent extensions of PINNs and Deep Operator Networks (DeepONets) (cf. Kim et al. [38], Kapoor et al. [34, 36], Ren et al. [59], Rao et al. [57], Zhu et al. [73]), extrapolating in time and training these networks over large space-time domains (without using multiple neural networks in space-time) poses a big challenge.

Spectral methods for solving PDEs promise fast convergence with a small number of basis functions. Meuris et al. [49] presents a machine learning-based spectral method in which hierarchical spatial basis functions are extracted from a trained DeepONet and employed in a spectral method to solve the PDE. Xia et al. [72] integrate adaptive techniques for spectral methods into PINN-based PDE solvers to obtain numerical solutions of unbounded domain problems that standard PINNs cannot efficiently approximate. Lange et al. [41] propose spectral methods that fit linear and nonlinear oscillators to data and facilitate long-term forecasting of temporal signals. Dresdner et al. [20] demonstrate ML-augmented spectral solvers that provide sub-grid corrections to classical spectral methods and improve the accuracy of spectral-only methods. Recently, Du et al. [21] proposed an approach using fixed orthogonal bases to learn PDE solutions as mappings between spectral coefficients and introduce a new training strategy based on spectral loss. These methods differ from ours in problem setting, architecture, and training.

Randomized neural networks for solving PDEs have been studied, mostly combining Extreme Learning Machines (ELMs) with the self-supervised setting of PINNs (cf. Chen et al. [6], Wang and Dong [70], Shang and Wang [62], Sun et al. [66]). Dwivedi and Srinivasan [22] propose a physics-informed extreme learning machine (PIELM) to efficiently solve linear PDEs, while Calabrò et al. [5], Galaris et al. [27] employ ELMs to learn invariant manifolds as well as PDE from data. Dong and Yang [19] establish that given a fixed computational budget, ELMs achieve substantially higher accuracy compared to classical second-order FEM and slightly higher accuracy compared to higher-order FEM. For static, nonlinear PDEs, ELMs can be used together with nonlinear optimization schemes (cf. Fabiani et al. [26]). On larger spatiotemporal domains, Dong and Li [17] and Dwivedi et al. [23] propose using multiple distributed ELMs on multiple subdomains. These approaches for solving time-dependent PDEs, treat time as an extra dimension in space, and neural basis functions span (a part of) the space-time domain. Moreover, our approach is significantly different as we will discuss in Section 4.

Inverse problems pose a challenge for iterative training methods because the loss landscape is typically more complicated to navigate. Owing to the problems associated with overfitting to noisy data and high sensitivity of accuracy with respect to the number of neurons in PIELM, Liu et al. [44] propose a Bayesian physics-informed extreme learning machine (BPIELM) to solve both forward and inverse linear PDE problems with noisy data. Dwivedi et al. [23] use distributed PINNs to solve inverse problems for predicting scalar coefficients of PDEs. Miao and Chen [50] propose VC-PINN specifically designed for PDE problems with variable coefficients, and Herrmann et al. [31] discuss PINNs for full-waveform inversion problems. Dong and Wang [18] use ELMs to solve inverse PDE problems involving predictions of scalar coefficients in PDEs and space-varying coefficients as functions of space. In 5.2, we show that many of these inverse problems are solved in a setting that makes them more challenging than they really are.

3 Mathematical framework

We discuss solution methods for PDEs on domains $\Omega\in\mathbb{R}^{d}$ with boundary $\partial\Omega$ . We mostly address linear PDEs with solutions $u:\Omega\times\mathbb{R}\to\mathbb{R}$ . These PDEs are defined by linear operators $L$ and $B$ that only involve derivative operators in space, and functions $f:\Omega\to\mathbb{R}$ , $g:\partial\Omega\to\mathbb{R}$ that define forcing and boundary conditions. For nonlinear PDEs, we denote $N$ as a nonlinear operator, and its scaling $\lambda\geq 0$ is either zero (for linear PDEs) or positive (for nonlinear PDEs). Then,

	$\displaystyle u_{t}(x,t)+Lu(x,t)+\lambda N(u)(x,t)$	$\displaystyle=$	$\displaystyle f(x),\ x\in\Omega,\ t\in[0,T],$		(1)
	$\displaystyle Bu(x,t)$	$\displaystyle=$	$\displaystyle g(x),\ x\in\partial\Omega,$		(2)

where we denote by $u_{t}$ the first derivative of $u$ by time.

3.1 Neural network ansatz

We parameterize the approximation of a solution with a neural network with one hidden layer, activation function $\sigma=\tanh$ , and $m$ neurons so that

\hat{u}(x,t)=c(t)^{\top}A\sigma(Wx+b)+c_{0}(t).

(3)

The parameters of the network are split into time-dependent and time-independent ones, $c(t)\in\mathbb{R}^{m}$ , $c_{0}(t)\in\mathbb{R}$ and $A\in\mathbb{R}^{m\times m}$ , $W\in\mathbb{R}^{m\times d}$ , $b\in\mathbb{R}^{m}$ . We will distinguish between two approaches with two different weight spaces for the hidden layer. For the extreme learning machine (ELM) framework, the weight and bias space is the full space $\mathbb{R}^{m\times d}\times[-\eta,\eta]$ , where $\eta$ is sufficiently large. The second approach, the sampling where it matters (SWIM) framework, follows Bolager et al. [4] and restricts the weight space to $\Omega\times\Omega$ . We construct each weight and bias pair $w_{k},b_{k}$ by taking two points $x^{(1)},x^{(2)}\in\Omega$ and construct the weight and bias as

\displaystyle w_{k}=s_{1}\frac{x^{(2)}-x^{(1)}}{\lVert x^{(2)}-x^{(1)}\rVert^{% 2}},\quad b_{k}=-\langle w_{k},x^{(1)}\rangle+s_{2},

(4)

where $s_{1},s_{2}$ are constants dependent on the activation function. We distinguish between the two approaches by referring to neural networks constructed by SWIM or ELM. While Section 4 describes each approach in more detail, here we will show that using neural networks as an ansatz is useful, even in the restrictive SWIM setting.

3.2 Existence of solutions

We consider a family of second-order elliptic PDEs on $\mathbb{R}^{d}$ of the form

\mathcal{L}u=-\nabla\cdot(\Psi\nabla u)+\psi u=f.

(5)

Following Chen et al. [8], we add two assumptions on the coefficients $\Psi$ and $\psi$ , as well as the source term $f$ . The first assumption, Assumption A.1, assures the existence of a unique weak solution $u^{*}$ . The second assumption, Assumption A.2, narrows the space of PDEs down to those where the coefficients and source term are in a Barron space. For more on what types of $\Psi,\psi,f$ are included and the definition of Barron spaces, see Appendix A.

With this in mind, the following result shows that there exists sampled (both ELM and SWIM) neural networks that can accurately approximate a weak solution of an elliptic PDE, where the SWIM setting restricts the weights and biases to the form Equation 4.

Theorem 3.1.

Consider a second-order elliptic PDE of the form (5) with the coefficients $\Psi(x),\psi(x)$ and the source term $f(x)$ satisfying Assumption A.1 and Assumption A.2, with weak solution $u^{*}\in H^{1}(\mathbb{R}^{d})$ . Let $\Omega\subset\mathbb{R}^{d}$ be any open, bounded subset following Definition 1 in Appendix A and $\mu$ being the Lebesgue measure. Then, for any $\epsilon\in(0,\frac{1}{2})$ , there exists a neural network with one hidden layer and $\tanh$ activation function of the form (3), constructed either by ELM or SWIM, such that

\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}\leq\epsilon.

The proof can be found in Appendix A. We continue the theoretical analysis to show that convergence in the setup of Theorem 3.1 can also be achieved when the hidden layer is sampled randomly, either by data-agnostic (ELM) or data-driven methods (SWIM) and then fixed, so that only the outer weights can vary. Let $\widetilde{\rho}$ be one of the three distributions described in Section 4.1, with $\eta$ set sufficiently large.

Theorem 3.2.

With the assumptions and setup of Theorem 3.1, let the sampling method be either ELM or SWIM with a distribution $\widetilde{\rho}$ , and let $\delta\in(0,1]$ . Then, for any $\epsilon\in(0,\frac{1}{2})$ , there exists a $K\in\mathbb{N}$ such that after sampling $K$ pairs of weights $W\in\mathbb{R}^{K\times d}$ and biases $b\in\mathbb{R}^{K}$ i.i.d. from $\widetilde{\rho}$ , there exist outer weights $c\in\mathbb{R}^{K}$ such that with probability $1-\delta$ the following bound holds:

\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}=\lVert c^{\top}\sigma(W\cdot+b)-% u^{*}\rVert_{L^{2}{(\Omega,\mu)}}\leq\epsilon.

The proof is in Section A.5, and relates the number of neurons $K$ to the accuracy $\epsilon$ and the probability $1-\delta$ .

4 Approximation of PDE solutions using the neural network ansatz

As hinted at in the previous section, the neural network ansatz is very flexible in approximating the solution of the PDE. We now describe how to construct the basis functions of the ansatz in Section 4.1. We then explain how to construct more specialized basis functions that satisfy boundary conditions in Section 4.3. If the PDE we want to solve is time-dependent, we propose solving an ordinary differential equation associated with our construction of the basis, cf. Section 4.2.

4.1 Random sampling of parameters in the hidden layer

Unlike classical machine learning approaches, we do not employ gradient-descent type optimization to construct the parameters of the hidden layer. Instead, we first construct an expressive function space basis composed of neurons $\sigma(Wx+b)$ over the domain $\Omega$ by sampling $W$ and $b$ from a certain probability distribution. Here, we consider two approaches for sampling: ELM and SWIM. In ELM, we sample from the full Euclidean space with a Gaussian distribution for the weights and a uniform distribution in $[-\eta,\eta]$ for the bias. The second method, SWIM, needs a distribution over $\Omega\times\Omega$ . The common choices are the uniform distribution or one where the density is based on $\frac{\lVert f(x^{(2)})-f(x^{(1)})\rVert}{\lVert x^{(2)}-x^{(1)}\rVert}$ , with $f$ being the true function in a supervised setting. While ELM is a data-agnostic method, SWIM takes the data into account more directly and uses the geometry of the input space to inform which weights and biases are more useful for the approximation. This technique can be quite advantageous when sampling, as one wishes to minimize the amount of irrelevant or redundant weights and biases in a network. Empirically, it has been observed that SWIM outperforms ELM on many classical machine learning tasks [4].

4.2 Solving time-dependent PDE

For both linear and nonlinear time-dependent PDE, we can plug the ansatz (3) into the PDE (1) to formulate an ODE for the time-dependent coefficients. We denote a randomly sampled set of $m$ neurons by $\phi(x):=(\sigma(w_{1}^{\top}x+b_{1}),\sigma(w_{2}^{\top}x+b_{2}),\dots,\sigma% (w_{m}^{\top}x+b_{m}))$ . For a set of $N_{c}$ collocation points assembled in the columns of a matrix $X\in\mathbb{R}^{d\times N_{c}}$ , we denote the pseudo-inverse by $\cdot^{+}$ , write $C:=(c^{\top},c_{0})\in\mathbb{R}^{m+1}$ , and $\Phi(X):=(A\phi(X),1)^{\top}\in\mathbb{R}^{N_{c}\times(m+1)}$ . We then define the ODE

C_{t}(t)=\Phi(X)^{+}P(X,C(t)),

(6)

with the reformulation of the PDE (1) by $P(X,C(t))=-L\Phi(X)C(t)-\lambda N(\Phi(X)C(t))+f(X)$ . The initial condition for this ODE is given through $C(0)=\Phi(X)^{+}u(X,0)$ . The boundary conditions are satisfied automatically if we compute the matrix $A$ as described next.

4.3 Constructing outer weights to satisfy the boundary condition

We can construct the parameter matrix $A$ so that the “outer basis functions” $A\phi(x)$ approximately satisfy the boundary condition, i.e., $BA\phi\approx 0$ on $\partial\Omega$ . This step is most important for time-dependent problems because the boundary condition must be satisfied at all times. For static PDE, the coefficients $c^{\top}A$ can be constructed by jointly solving for parameters that satisfy boundary conditions and the PDE. For linear, static PDE, this can be achieved through the solution of a single linear system. For nonlinear, static PDE, we would need to resort to non-linear optimization problems to find $c^{\top}A$ . For time-dependent problems, the construction of $A$ depends on which outer basis functions are useful for the given boundary conditions. For periodic boundaries, we find $A$ so that $A\phi(x_{l})=A\phi(x_{r})$ , where $x_{l},x_{r}$ are the left and right boundary points of the spatial domain. In this paper, for $x\in\Omega$ and $k=1,2,\dots,m$ , we approximate $[A\phi]_{k}(x)=\sin(kx)$ (for $k$ even) and $[A\phi]_{k}(x)=\cos(kx)$ (for $k$ odd) and set $c_{0}(t)=0\;\forall t$ . As the ansatz is a linear combination of the basis functions, if all of them satisfy the boundary conditions, the ansatz satisfies them, too.

After the approximation of these new basis functions, we orthogonalize the basis functions to improve the condition number of the associated ODE (6). For this step, on the points $X\in\mathbb{R}^{N_{c}\times d}$ , we compute a truncated singular value decomposition of $A\phi(X)\in\mathbb{R}^{N_{c}\times m}$ to obtain matrices $U_{r},\Sigma_{r},$ and $V_{r}$ with $r\leq m$ so that $U_{r}\Sigma_{r}V_{r}^{\top}=A\phi(x)+O(\Sigma_{r+1})$ . We then define $A_{r}:=V_{r}^{\top}A$ and use it instead of the matrix $A$ . This ensures $A_{r}\phi(x)$ are orthogonal functions on the data $X$ , and the matrix $A_{r}\phi(X)$ has bounded condition number.

For Dirichlet boundary conditions ( $u(x)=0$ for $x\in\partial\Omega$ ), we can employ the same technique. However, it is also possible to augment the ODE (6) with an additional equation,

\hat{u}_{t}(x)=-\kappa\hat{u}(x)\,\text{for }x\in\partial\Omega\,\implies\,C_{% t}(t)=\underbrace{\Phi((X,X_{b}))^{+}}_{\in\mathbb{R}^{(m+1)\times(N_{c}+N_{b}% )}}\underbrace{(P(X,C(t)),-\kappa\Phi(X_{b})C(t))^{\top}}_{\in\mathbb{R}^{N_{c% }+N_{b}}},

where $\kappa>0$ is a fixed parameter, $X$ are the $N_{c}$ collocation points and $X_{b}$ is a (column-wise) collection of $N_{b}$ points on the boundary $\partial\Omega$ . The augmented equation will quickly force the value of the ansatz $\hat{u}$ to zero on the boundary $\partial\Omega$ . This technique allows setting $A$ to the identity matrix.

4.4 Solving inverse problems involving PDEs

The classical inverse problem setting is given when certain parts (parameters) of the PDE are unknown, and we only have sparse measurements of the solution. In this case, we propose two solution methods to construct both the unknown parameters of the PDE and its solution over the entire domain. The first method concerns PDEs where only a low-dimensional parameter space is involved, e.g., we only search for a low number of $K$ scalar parameters. Here, we can effectively optimize over the $K$ -dimensional parameter space. For each suggested optimum, we solve the PDE directly, because there are no unknown parameters left. We then evaluate our solution at the measurement points where we know the ground truth and compute the difference. This defines a loss function over the low-dimensional parameter space. We demonstrate this method in Section 5.2.1. The second method concerns PDEs for which an entire parameter field is unknown. Here, the parameter space is technically infinite-dimensional, and optimization methods working in low-dimensional spaces are not applicable anymore. We demonstrate in Section 5.2.2 that the solution and the parameter field can be parametrized with separate sampled neural networks. Then, we need to solve one joint problem for their last-layer weights. Even though this joint problem is generally nonlinear, it is only bi-linear in certain cases and can then be solved effectively.

5 Computational experiments

We now demonstrate our approach to solve PDEs in time-dependent, static, and inverse problems. The software and hardware environments used to perform the experiments are listed in Appendix B. The code to reproduce the experiments from the paper, and an up-to-date code base, can be found at

https://gitlab.com/felix.dietrich/swimpde-paper,

https://gitlab.com/felix.dietrich/swimpde.

Table 1 lists all PDE we solve together with forcing and boundary terms on the given domain.

Table 1: Summary of PDEs we solve in this paper.

D_{t}

denotes differentiation by time,

D_{x},D_{xxxx}

by space (once and four times),

\Delta

is the Laplacian in space, and

T

is the final time. Functions

f,g

are different for each PDE, and parameters

\alpha,\beta,\nu

, and the field

\gamma(x)

are described in each subsection.

Section	PDE	Operator	Forcing	Boundary	Domain
5.1.1	Advection	$D_{t}+\beta D_{x}$	0	Periodic	$[0,2\pi]\times[0,T]$
5.1.2	Euler-Bernoulli	$D_{tt}+D_{xxxx}$	f	$(1,D_{xx})=0$	$[0,\pi]\times[0,1]$
5.1.3	Burgers	$D_{t}+D_{x}(\cdot)^{2}-\nu\Delta$	0	0	$[-1,1]\times[0,1]$
5.2.1	Poisson	$-\text{div}((1,\alpha)D_{x})$	f	g	$[0,1.4]^{2}$
5.2.2	Helmholtz	$-\Delta+\gamma(x)$	f	g	$[0,1.5]^{2}$

In Section 5.1, we use increasingly difficult examples and demonstrate how time-dependent problems can be addressed with the approach of separation of variables. Separately, in Section C.1, we discuss how solving static linear PDE on a complicated geometry is straightforward with our approach. Finally, in Section 5.2, we show how inverse problems with unknown parameters and parameter fields can be solved. We use the root mean squared error (RMSE) and the relative $L^{2}$ error to quantify errors in all experiments (cf. Appendix B for the definitions). We compute the test error on a uniform grid for all time-dependent PDEs with 256 points in space and 100 points in time.

We compare our results with physics-informed neural networks (PINNs), both classical [56] and causality-respecting (causal PINNs) [69], and isogeometric finite element analysis (IGA-FEM)[33, 11, 12]. The details of these methods can be found in Section B.2 and Section B.3, respectively.

5.1 Time-dependent PDE

In this section, we compare our approach (cf. Section 4.2) to other neural PDE solvers and IGA-FEM on PDE problems involving high advection speeds, long-time simulation, higher-order derivative terms, and steep gradients in the solution.

5.1.1 Linear advection equation

We consider the linear advection equation ( $u_{t}+\beta u_{x}=0$ , also see Section C.2) with the initial condition $u(x,0)=\sin(x)$ and periodic boundary conditions. The analytical solution is given by $u(x,t)=\sin(x-\beta t)$ . In the first experiment, we solve this PDE using different neural PDE solvers and IGA-FEM for increasing flow velocities $\beta$ over the domain $\Omega\times T=[0,2\pi]\times[0,1]$ . The details on hyper-parameters and the setup of this experiment are listed in Section C.2. Figure 3 shows that approaches using basis functions in the entire spatio-temporal domain, such as PINNs, ELM, and SWIM fail as the flow velocity $\beta$ increases beyond $40$ . In contrast, ELM-ODE, SWIM-ODE, and IGA-FEM can accurately solve the PDE, even for high values of $\beta$ . Figure 3 shows that for $\beta=40$ , ${L}^{2}_{\text{relative}}$ decays exponentially with the number of basis functions for ELM-ODE, SWIM-ODE, and IGA-FEM. In contrast to IGA-FEM, which uses local basis functions, ELM-ODE and SWIM-ODE require fewer basis functions for a fixed ${L}^{2}_{\text{relative}}$ , because they use global basis functions. In this example, PINNs yield high errors. In the second experiment, we attempt solve the PDE with $\beta=1$ for $T=[0,1000]$ , with the true solution shown in Figure 4(a). Figure 4(b) shows that with ELM-ODE and SWIM-ODE, we can solve over $1000$ seconds with $L^{2}_{\text{relative}}$ of less than $0.001\%$ , requiring only $0.94$ seconds of runtime. Simulating long-time dynamics is a longstanding challenge for traditional neural PDE solvers [43, 36], and all solvers using neural network basis functions extending over both space and time, such as PINN, ELM, and SWIM, fail at approximating functions on long time intervals.

5.1.2 Fourth order Euler–Bernoulli Beam Equation

In this example, the beam equation ( $u_{tt}+u_{xxxx}=f$ , also see Section C.3) is solved with initial data $u(x,0)=\sin(x)$ on the spatial domain $\Omega=[0,\pi]$ . The force function and the analytical solution are taken from [35]. The challenge here is not high advection speed but the higher orders of derivatives in space and time. The details of the PDE, the absolute error plots, and the comparison of approaches can be found in Section C.3. Table 2 shows that ODE-ELM is more than five orders of magnitude faster and more accurate than PINNs.

5.1.3 Non-linear Viscous Burgers equation

We now demonstrate how sampling weights and biases from a data-dependant distribution can be exploited to handle locally steep gradients in the solution of a non-linear Burgers equation. We compare our approximations to the reference solution provided by Raissi et al. [56]. Table 15 indicates that ODE-ELM cannot accurately represent the sharp gradient in the domain’s center due to the exponentially small probability of having large norms of internal weights. Sampling ELM-ODE weights from a broader uniform distribution increases the probability of having steeper basis functions, as Calabrò et al. [5] discuss for linear PDEs. However, given enough collocation points in the domain’s center, ODE-SWIM can create numerous basic functions with steep gradients, accurately placing them in the domain’s center by factoring in the data. To concentrate collocation points near the shock in the domain’s center, we resample them two times after a set number of time steps, guided by a probability distribution that leverages the gradient of the approximated solution. At the resampling time $t_{r}\in[0,T]$ , we approximate the probability density ${p(x)}\sim|\nabla\hat{u}(x,t_{r})|$ , which we then use to re-sample collocation points at random. While PINNs provide a reasonable error, ODE-SWIM is more accurate by an order of magnitude, almost twice as fast as regular PINN, and over ten times faster than causal PINN.

Table 2: Selective summary of the results for the forward problems (all results in Appendix C).

PDE	Method	Training time ( $\mathrm{s}$ )	Relative ${L}^{2}$ error
Advection	PINN	30.56	6.92e-1 $\pm$ 2.96e-2
	ODE-ELM (our)	2.71	3.84e-6 $\pm$ 5.2e-7
	IGA-FEM	0.07	1.17e-10
Euler-Bernoulli	PINN	2303.71	4.21e-3 $\pm$ 9.56e-4
	ODE-ELM (our)	0.06	3.50e-8 $\pm$ 7.79e-9
	IGA-FEM	0.94	4.21e-7
Burgers	PINN	275.23	3.88e-3 $\pm$ 2.61e-3
	ODE-SWIM (our)	141.35	3.33e-4 $\pm$ 4.63e-4
	IGA-FEM	13.61	2.20e-4

5.2 Inverse problems

We now demonstrate the efficacy of our approach in solving inverse problems. They consist of PDEs with unknown parameters or source terms. Here, we solve two inverse problems, estimating a unknown scalar parameter and the coefficient field. Table 3 lists all results in this section.

Table 3: Summary of results for inverse problems. The Poisson PDE does not have parameter

\gamma

, and the Helmholtz PDE does not have parameter

\alpha

. We could not solve the Helmholtz PDE with PINNs.

	Poisson coefficient		Helmholtz field
	PINN	SWIM	PINN	SWIM
Architecture	(2, 4 $\times$ 20, 2)	(2,1024,1)	-	(2,400,1) for $u$ and $\gamma$
$\alpha$	9.94e-1 $\pm$ 3.52e-3	1.003 $\pm$ 3.45e-3	-	-
Rel. error (PDE)	1.11e-3 $\pm$ 3.07e-4	8.8e-2 $\pm$ 6.4e-2	-	1.85e-2 $\pm$ 1.5e-2
Rel. error ( $\gamma$ )	-	-	-	1.23e-2 $\pm$ 9.10e-3
Training time ( $\mathrm{s}$ )	140.04 $\pm$ 6.27	2.85 $\pm$ 4.3e-1	-	5.2e-2 $\pm$ 1.6e-2

5.2.1 Parametric Poisson PDE

Here, we solve a Poisson equation on $\Omega=[0,1.4]^{2}$ with parameterized, diagonal diffusivity matrix $D=\text{diag}\left(1,\alpha\right)$ , $\alpha>0$ . The unknown parameter $\alpha$ must be estimated from a set of $50$ measurements, distributed uniformly at random over $\Omega$ . We only need to solve a one-dimensional optimization problem over $\alpha$ . For any $\alpha$ , we solve the PDE and compute the MSE of our approximation to the known measurements (cf. Section C.5). We minimize the MSE with scipy.optimize.minimize_scalar (cf. Table 3). For PINNs, the solution method is described in Section C.5.

5.2.2 Helmholtz PDE with parameter field

We assume access to values of $u$ at $N=300$ measurement points, distributed uniformly at random inside $\Omega$ . The goal is to construct an accurate approximation of $u$ and $\gamma$ on $\Omega$ . Different to Dong and Wang [18], we use $\tanh$ (not any uncommon activation function), and also do not distribute additional measurement points on the boundary (we do not assume access to any values from $g$ ). Figure 6 shows the resulting approximation with a network of $M=400$ neurons in one hidden layer. The relative error is plotted over a grid of $101\times 101$ test points.

6 Conclusion

We discuss how PDEs can be solved with a neural network ansatz if the parameters of each neuron are sampled from specific probability distributions. We prove that such approximations converge to the true solution of the PDE in the $L^{2}$ norm and discuss how time-dependent PDEs can be solved with a separation of variables approach. We introduce a simple method to compute new basis functions to satisfy the boundary conditions over time. We also demonstrate how to effectively solve inverse problems.

Benefits of the approach include reducing the number of parameters because we choose many randomly. For static, linear PDEs, finding the solution reduces to solving a linear problem, for which many efficient methods exist already. For general time-dependent PDEs, sampling the ansatz in space at random allows us to reduce the problem to solving a (high-dimensional) ODE. We can use existing solution methods to solve this ODE and find the time-dependent parameters of the ansatz. In the inverse problem setting, we utilize the random basis to parameterize both the PDE solution and parameter fields, so far fewer parameters must be optimized.

Disadvantages of our method compared to neural solvers using back-propagation include that our networks require more neurons for the same accuracy, leading to higher inference times. We could use our method to solve the PDE and then construct a smaller neural network to approximate the new solution. The linear systems we must solve depend on the number of data points in the domain and the number of neurons in the neural ansatz. Scaling up both leads to cubic complexity in computational time and memory. However, the number of neurons does not need to grow arbitrarily for a fixed problem and accuracy level. In this case, our method scales linearly with the number of points.

Ethical considerations are important for any new machine learning approach because neural networks are generally dual-use. Our approach is based on classical methods from scientific computing, which are well understood. This connection now allows researchers to better understand our neural solvers’ behavior, failure modes, and robustness. We believe that the benefits of our approach far outweigh the potential downsides of misuse because a system that is understood better can also be controlled more straightforwardly.

Remaining challenges and future work include very high-dimensional base spaces and experiments on domain decomposition, which is a very effective method for low-dimensional problems. In principle, our approach should also work for this setting because the constraints required on the inner boundaries of domains are linear. Many exciting new directions that we will explore in the future are related to real-world problems in science and engineering. Our method is tailored to scattered, noisy data, where mesh generation for classical methods is tedious and challenging. Our approach is mesh-free, flexible w.r.t. different PDE settings, and can approximate the solution with comparatively low computational resources.

Acknowledgments and Disclosure of Funding

F.D. and C.D. acknowledge Sølve Eidnes and Martine Mahlum for a great workshop on physics in machine learning (in Oslo, 2024). We are grateful to Tim Bürchner for providing an initial version of the IGA code, and to Jana Huhne for providing an initial version of the neural PDE solver code. F.D., I.B., and E.L.B. acknowledge funding by the German Research Foundation—project 468830823, and association to DFG-SPP-229. C.D. is partially funded by the Institute for Advanced Study (IAS) at the Technical University of Munich. F.D. and Q.S. are supported by the TUM Georg Nemetschek Institute - Artificial Intelligence for the Built World.

References

Abadi et al. [2015] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
Barron [1993] A.R. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930–945, May 1993. ISSN 0018-9448, 1557-9654. doi: 10.1109/18.256500.
Bellock et al. [2021] Ken Bellock, Neil Godber, and Philip Kahn. bellockk/alphashape: v1.3.1 release, April 2021.
Bolager et al. [2023] Erik L Bolager, Iryna Burak, Chinmay Datar, Qing Sun, and Felix Dietrich. Sampling weights of deep neural networks. In Advances in Neural Information Processing Systems, volume 36, pages 63075–63116. Curran Associates, Inc., 2023.
Calabrò et al. [2021] Francesco Calabrò, Gianluca Fabiani, and Constantinos Siettos. Extreme learning machine collocation for the numerical solution of elliptic PDEs with sharp gradients. Computer Methods in Applied Mechanics and Engineering, 387:114188, December 2021. ISSN 00457825. doi: 10.1016/j.cma.2021.114188.
Chen et al. [2024] Jingrun Chen, Weinan E, and Yifei Sun. Optimization of random feature method in the high-precision regime. Communications on Applied Mathematics and Computation, pages 1–28, 2024.
Chen et al. [2014] Wen Chen, Zhuo-Jia Fu, and C.S. Chen. Recent Advances in Radial Basis Function Collocation Methods. Springer Berlin Heidelberg, 2014. ISBN 9783642395727. doi: 10.1007/978-3-642-39572-7.
Chen et al. [2021] Ziang Chen, Jianfeng Lu, and Yulong Lu. On the representation of solutions to elliptic pdes in barron spaces. Advances in neural information processing systems, 34:6454–6465, 2021.
Chiu et al. [2022] Pao-Hsiung Chiu, Jian Cheng Wong, Chinchun Ooi, My Ha Dao, and Yew-Soon Ong. Can-pinn: A fast physics-informed neural network based on coupled-automatic–numerical differentiation method. Computer Methods in Applied Mechanics and Engineering, 395:114909, 2022.
Cho et al. [2024] Junwoo Cho, Seungtae Nam, Hyunmo Yang, Seok-Bae Yun, Youngjoon Hong, and Eunbyung Park. Separable physics-informed neural networks. Advances in Neural Information Processing Systems, 36, 2024.
Cottrell et al. [2009] J. Austin Cottrell, Thomas J. R. Hughes, and Yuri Bazilevs. Isogeometric Analysis: Toward Integration of CAD and FEA. Wiley Publishing, 1st edition, 2009. ISBN 0470748737.
Cottrell et al. [2006] J.A. Cottrell, A. Reali, Y. Bazilevs, and T.J.R. Hughes. Isogeometric analysis of structural vibrations. Computer Methods in Applied Mechanics and Engineering, 195(41):5257–5296, 2006. ISSN 0045-7825. doi: https://doi.org/10.1016/j.cma.2005.09.027. John H. Argyris Memorial Issue. Part II.
COX [1972] M. G. COX. The Numerical Evaluation of B-Splines*. IMA Journal of Applied Mathematics, 10(2):134–149, 10 1972. ISSN 0272-4960. doi: 10.1093/imamat/10.2.134.
Cybenko [1989] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, December 1989. ISSN 1435-568X. doi: 10.1007/BF02551274.
de Boor [1972] Carl de Boor. On calculating with b-splines. Journal of Approximation Theory, 6(1):50–62, 1972. ISSN 0021-9045. doi: https://doi.org/10.1016/0021-9045(72)90080-9.
Dissanayake and Phan-Thien [1994] MWMG Dissanayake and Nhan Phan-Thien. Neural-network-based approximations for solving partial differential equations. communications in Numerical Methods in Engineering, 10(3):195–201, 1994.
Dong and Li [2021] Suchuan Dong and Zongwei Li. Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations. Computer Methods in Applied Mechanics and Engineering, 387:114129, 2021.
Dong and Wang [2023] Suchuan Dong and Yiran Wang. A method for computing inverse parametric pde problems with random-weight neural networks. Journal of Computational Physics, 489:112263, 2023.
Dong and Yang [2022] Suchuan Dong and Jielin Yang. On computing the hyperparameter of extreme learning machines: Algorithm and application to computational pdes, and comparison with classical and high-order finite elements. Journal of Computational Physics, 463:111290, 2022.
Dresdner et al. [2022] Gideon Dresdner, Dmitrii Kochkov, Peter Christian Norgaard, Leonardo Zepeda-Nunez, Jamie Smith, Michael Brenner, and Stephan Hoyer. Learning to correct spectral methods for simulating turbulent flows. Transactions on Machine Learning Research, 2022.
Du et al. [2023] Yiheng Du, Nithin Chalapathi, and Aditi Krishnapriyan. Neural spectral methods: Self-supervised learning in the spectral domain. arXiv preprint arXiv:2312.05225, 2023.
Dwivedi and Srinivasan [2020] Vikas Dwivedi and Balaji Srinivasan. Physics informed extreme learning machine (pielm)–a rapid method for the numerical solution of partial differential equations. Neurocomputing, 391:96–118, 2020.
Dwivedi et al. [2021] Vikas Dwivedi, Nishant Parashar, and Balaji Srinivasan. Distributed learning machines for solving forward and inverse problems in partial differential equations. Neurocomputing, 420:299–316, 2021.
E [2020] Weinan E. Towards a Mathematical Understanding of Neural Network-Based Machine Learning: What We Know and What We Don’t. CSIAM Transactions on Applied Mathematics, 1(4):561–615, June 2020. ISSN 2708-0560, 2708-0579. doi: 10.4208/csiam-am.SO-2020-0002.
E et al. [2022] Weinan E, Chao Ma, and Lei Wu. The Barron Space and the Flow-Induced Function Spaces for Neural Network Models. Constructive Approximation, 55(1):369–406, February 2022. ISSN 0176-4276, 1432-0940. doi: 10.1007/s00365-021-09549-y.
Fabiani et al. [2021] Gianluca Fabiani, Francesco Calabrò, Lucia Russo, and Constantinos Siettos. Numerical solution and bifurcation analysis of nonlinear partial differential equations with extreme learning machines. Journal of Scientific Computing, 89(2):44, November 2021. ISSN 0885-7474, 1573-7691. doi: 10.1007/s10915-021-01650-5.
Galaris et al. [2022] Evangelos Galaris, Gianluca Fabiani, Ioannis Gallos, Ioannis Kevrekidis, and Constantinos Siettos. Numerical Bifurcation Analysis of PDEs From Lattice Boltzmann Model Simulations: A Parsimonious Machine Learning Approach. Journal of Scientific Computing, 92(2):34, August 2022. ISSN 0885-7474, 1573-7691. doi: 10.1007/s10915-022-01883-y.
Gingold and Monaghan [1977] R. A. Gingold and J. J. Monaghan. Smoothed particle hydrodynamics: theory and application to non-spherical stars. Monthly Notices of the Royal Astronomical Society, 181(3):375–389, 12 1977. ISSN 0035-8711. doi: 10.1093/mnras/181.3.375.
Glorot and Bengio [2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256. JMLR Workshop and Conference Proceedings, 2010.
Han et al. [2018] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
Herrmann et al. [2023] Leon Herrmann, Tim Bürchner, Felix Dietrich, and Stefan Kollmannsberger. On the use of neural networks for full waveform inversion. Computer Methods in Applied Mechanics and Engineering, 415:116278, October 2023. ISSN 00457825. doi: 10.1016/j.cma.2023.116278.
Huang and Alkhalifah [2024] Xinquan Huang and Tariq Alkhalifah. Efficient physics-informed neural networks using hash encoding. Journal of Computational Physics, 501:112760, 2024.
Hughes et al. [2005] T.J.R. Hughes, J.A. Cottrell, and Y. Bazilevs. Isogeometric analysis: Cad, finite elements, nurbs, exact geometry and mesh refinement. Computer Methods in Applied Mechanics and Engineering, 194(39):4135–4195, 2005.
Kapoor et al. [2023a] Taniya Kapoor, Abhishek Chandra, Daniel Tartakovsky, Hongrui Wang, Alfredo Núñez, and Rolf Dollevoet. Neural oscillators for generalizing parametric pdes. In The Symbiosis of Deep Learning and Differential Equations III, 2023a.
Kapoor et al. [2023b] Taniya Kapoor, Hongrui Wang, Alfredo Núñez, and Rolf Dollevoet. Physics-informed neural networks for solving forward and inverse problems in complex beam systems. IEEE Transactions on Neural Networks and Learning Systems, 2023b.
Kapoor et al. [2024a] Taniya Kapoor, Abhishek Chandra, Daniel M Tartakovsky, Hongrui Wang, Alfredo Nunez, and Rolf Dollevoet. Neural oscillators for generalization of physics-informed machine learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 13059–13067, 2024a.
Kapoor et al. [2024b] Taniya Kapoor, Hongrui Wang, Alfredo Núñez, and Rolf Dollevoet. Transfer learning for improved generalizability in causal physics-informed neural networks for beam simulations. Engineering Applications of Artificial Intelligence, 133:108085, 2024b.
Kim et al. [2021] Jungeun Kim, Kookjin Lee, Dongeun Lee, Sheo Yon Jhin, and Noseong Park. Dpm: a novel training method for physics-informed neural networks in extrapolation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 8146–8154, 2021.
Lagaris et al. [1998] Isaac E Lagaris, Aristidis Likas, and Dimitrios I Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
Lancaster and Salkauskas [1981] Peter Lancaster and Kestutis Salkauskas. Surfaces generated by moving least squares methods. Mathematics of Computation, 37:141–158, 1981.
Lange et al. [2021] Henning Lange, Steven L Brunton, and J Nathan Kutz. From fourier to koopman: Spectral methods for long-term time series prediction. Journal of Machine Learning Research, 22(41):1–38, 2021.
Li et al. [2020] Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar, et al. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2020.
Lippe et al. [2024] Phillip Lippe, Bas Veeling, Paris Perdikaris, Richard Turner, and Johannes Brandstetter. Pde-refiner: Achieving accurate long rollouts with neural pde solvers. Advances in Neural Information Processing Systems, 36, 2024.
Liu et al. [2023] Xu Liu, Wen Yao, Wei Peng, and Weien Zhou. Bayesian physics-informed extreme learning machine for forward and inverse pde problems with noisy data. Neurocomputing, 549:126425, 2023.
Lu et al. [2021a] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via deeponet based on the universal approximation theorem of operators. Nature machine intelligence, 3(3):218–229, 2021a.
Lu et al. [2021b] Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations. SIAM Review, 63(1):208–228, 2021b. doi: 10.1137/19M1274067.
Lucy [1977] L. B. Lucy. A numerical approach to the testing of the fission hypothesis. The Astronomical Journal, 82:1013–1024, December 1977. doi: 10.1086/112164.
Meng et al. [2020] Xuhui Meng, Zhen Li, Dongkun Zhang, and George Em Karniadakis. Ppinn: Parareal physics-informed neural network for time-dependent pdes. Computer Methods in Applied Mechanics and Engineering, 370:113250, 2020.
Meuris et al. [2023] Brek Meuris, Saad Qadeer, and Panos Stinis. Machine-learning-based spectral methods for partial differential equations. Scientific Reports, 13(1):1739, 2023.
Miao and Chen [2023] Zhengwu Miao and Yong Chen. Vc-pinn: Variable coefficient physics-informed neural network for forward and inverse problems of pdes with variable coefficient. Physica D: Nonlinear Phenomena, 456:133945, 2023.
Paszke et al. [2017] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch, 2017.
Piegl and Tiller [1997] Les Piegl and Wayne Tiller. The NURBS book (2nd ed.). Springer-Verlag, Berlin, Heidelberg, 1997. ISBN 3540615458.
Pinkus [1999] Allan Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, January 1999. ISSN 0962-4929, 1474-0508. doi: 10.1017/S0962492900002919.
Powell [1992] M J D Powell. The Theory of Radial Basis Function Approximation in 1990. In Advances in Numerical Analysis: Wavelets, Subdivision Algorithms, and Radial Basis Functions. Oxford University Press, 04 1992. ISBN 9780198534396. doi: 10.1093/oso/9780198534396.003.0003.
Prudhomme et al. [2001] Serge Prudhomme, Frédéric Pascal, J.Tinsley Oden, and Albert Romkes. A priori error estimate for the baumann–oden version of the discontinuous galerkin method. Comptes Rendus de l’Académie des Sciences - Series I - Mathematics, 332(9):851–856, 2001. ISSN 0764-4442. doi: https://doi.org/10.1016/S0764-4442(01)01936-X.
Raissi et al. [2019] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
Rao et al. [2023] Chengping Rao, Pu Ren, Qi Wang, Oral Buyukozturk, Hao Sun, and Yang Liu. Encoding physics to learn reaction–diffusion processes. Nature Machine Intelligence, 5(7):765–779, 2023.
Raonic et al. [2024] Bogdan Raonic, Roberto Molinaro, Tim De Ryck, Tobias Rohner, Francesca Bartolucci, Rima Alaifari, Siddhartha Mishra, and Emmanuel de Bézenac. Convolutional neural operators for robust and accurate learning of pdes. Advances in Neural Information Processing Systems, 36, 2024.
Ren et al. [2022] Pu Ren, Chengping Rao, Yang Liu, Jian-Xun Wang, and Hao Sun. Phycrnet: Physics-informed convolutional-recurrent network for solving spatiotemporal pdes. Computer Methods in Applied Mechanics and Engineering, 389:114399, 2022.
Rudi and Rosasco [2021] Alessandro Rudi and Lorenzo Rosasco. Generalization Properties of Learning with Random Features, April 2021.
Shadloo et al. [2016] M.S. Shadloo, G. Oger, and D. Le Touzé. Smoothed particle hydrodynamics method for fluid flows, towards industrial applications: Motivations, current state, and challenges. Computers & Fluids, 136:11–34, 2016. ISSN 0045-7930. doi: https://doi.org/10.1016/j.compfluid.2016.05.029.
Shang and Wang [2024] Yong Shang and Fei Wang. Randomized Neural Networks with Petrov–Galerkin Methods for Solving Linear Elasticity and Navier–Stokes Equations. Journal of Engineering Mechanics, 150(4):04024010, April 2024. ISSN 0733-9399, 1943-7889. doi: 10.1061/JENMDT.EMENG-7463.
Sharma and Shankar [2022] Ramansh Sharma and Varun Shankar. Accelerated training of physics-informed neural networks (pinns) using meshless discretizations. Advances in Neural Information Processing Systems, 35:1034–1046, 2022.
Shepard [1968] Donald Shepard. A two-dimensional interpolation function for irregularly-spaced data. In Proceedings of the 1968 23rd ACM National Conference, ACM ’68, page 517–524, New York, NY, USA, 1968. Association for Computing Machinery. ISBN 9781450374866. doi: 10.1145/800186.810616.
Sirignano and Spiliopoulos [2018] Justin Sirignano and Konstantinos Spiliopoulos. Dgm: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339–1364, 2018.
Sun et al. [2024] Jingbo Sun, Suchuan Dong, and Fei Wang. Local randomized neural networks with discontinuous Galerkin methods for partial differential equations. Journal of Computational and Applied Mathematics, 445:115830, August 2024. ISSN 03770427. doi: 10.1016/j.cam.2024.115830.
Turk and Levoy [1994] Greg Turk and Marc Levoy. Zippered polygon meshes from range images. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 311–318, 1994.
Wang et al. [2024a] Haoxiang Wang, Tao Yu, Tianwei Yang, Hui Qiao, and Qionghai Dai. Neural physical simulation with multi-resolution hash grid encoding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 5410–5418, 2024a.
Wang et al. [2024b] Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality for training physics-informed neural networks. Computer Methods in Applied Mechanics and Engineering, 421:116813, 2024b.
Wang and Dong [2024] Yiran Wang and Suchuan Dong. An extreme learning machine-based method for computational pdes in higher dimensions. Computer Methods in Applied Mechanics and Engineering, 418:116578, 2024.
Wu and Long [2022] Lei Wu and Jihao Long. A Spectral-Based Analysis of the Separation between Two-Layer Neural Networks and Linear Methods. Journal of Machine Learning Research, 23(1), January 2022. ISSN 1532-4435.
Xia et al. [2023] Mingtao Xia, Lucas Böttcher, and Tom Chou. Spectrally adapted physics-informed neural networks for solving unbounded domain problems. Machine Learning: Science and Technology, 4(2):025024, 2023.
Zhu et al. [2023] Min Zhu, Handi Zhang, Anran Jiao, George Em Karniadakis, and Lu Lu. Reliable extrapolation of deep neural operators informed by physics or sparse observations. Computer Methods in Applied Mechanics and Engineering, 412:116064, 2023.

Appendix

Appendix A Theoretical results

In this section, we describe in more detail the setting of the PDE and the assumptions, define the particular variant of Barron spaces, and prove the theoretical result in the main paper.

A.1 Input space

We start by defining the input space $\Omega$ , following the setup from [4]. We set

\displaystyle d_{\mathbb{R}^{d}}(x,A)=\inf\{d(x,a)\colon a\in A\},

where $d$ is the canonical Euclidean distance in the space $\mathbb{R}^{d}$ and $A\subseteq\mathbb{R}^{d}$ . The medial axis is defined as

\displaystyle\text{Med}(A)=\{x\in\mathbb{R}^{d}\colon\exists p\neq q\in A,% \lVert p-x\rVert=\lVert q-x\rVert=d_{\mathbb{R}^{d}}(x,A)\}

and the reach is the scalar

\displaystyle\tau_{A}=\inf_{a\in A}d_{\mathbb{R}^{d}}(a,\text{Med}(A)),

i.e., the point in $A$ that is closest to the projection of points in $A^{c}$ .

Definition 1.

Let $\widetilde{\Omega}$ be a nonempty compact subset of $\mathbb{R}^{d}$ with reach $\tau_{\widetilde{\Omega}}>0$ . The input space $\Omega$ is defined as

\displaystyle\Omega=\{x\in\mathbb{R}^{d}\colon d_{\mathbb{R}^{d}}(x,\widetilde% {\Omega})<\epsilon_{\Omega}\},

where $0<\epsilon_{\Omega}<\min\{\tau_{\widetilde{\Omega}},1\}$ . In addition, we denote

\displaystyle\Omega_{c}=\{x\in\mathbb{R}^{d}\colon d_{\mathbb{R}^{d}}(x,% \widetilde{\Omega})\leq\epsilon_{\Omega}\}.

The input spaces encompassed by the definition above are still very broad, and as argued by [4], these assumptions typically hold due to noise in real-world applications. In addition, in many PDE settings, the input space is also included in the definition above or close enough. We altered the definition somewhat from [4], as we will need the open and bounded set $\Omega$ , as well as the compact set $\Omega_{c}$ .

A.2 PDE Setting

Consider a family of second-order elliptic PDEs on $\mathbb{R}^{d}$ of the form

\mathcal{L}u=-\nabla\cdot(\Psi\nabla u)+\psi u=f.

(7)

Assumption A.1.

$\Psi(x)=(\Psi_{ij}(x))_{1\leq i,j\leq d}$ is a symmetric matrix with $\|\Psi(x)\|\leq a_{max}\leq\infty$ and uniformly elliptic, that is, for some $a_{min}>0$ , it satisfies

\xi^{\top}\Psi(x)\xi\geq a_{min}\|\xi\|^{2},\quad\text{ for all}\quad x,\xi\in% \mathbb{R}^{d}.

In addition, assume that $0<\psi_{min}\leq\psi(x)\leq\psi_{max}<1$ and $f\in L_{2}(\mathbb{R}^{d})$ .

Under the above assumptions, the Lax-Milgram theorem implies that there exists a unique weak solution $u^{*}\in H_{1}(\mathbb{R}^{d})$ , such that $\mathcal{L}u=f\in H^{*}(\mathbb{R}^{d})$ , with $H^{*}(\mathbb{R}^{d})$ the dual space of $H_{1}(\mathbb{R}^{d})$ [8], i.e.,

\int_{\mathbb{R}^{d}}\Psi\nabla u^{*}\cdot\nabla v\mathrm{d}x+\int_{\mathbb{R}% ^{d}}\psi u^{*}v\mathrm{d}x=\int_{\mathbb{R}^{d}}fv\mathrm{d}x,\quad\text{ for% all}\quad v\in H_{1}(\mathbb{R}^{d}).

Following the setup of [8], we consider the PDE whose coefficients and source term are in the Barron space.

A.3 Barron Spaces

The Barron space has been defined in distinct ways [2, 25], but all definitions can be seen as the space of infinitely wide two-layered neural networks. Given an activation function $\widetilde{\sigma}$ (not to be confused with $\sigma$ which we set to $\tanh$ in the main paper), and a probability measure $\rho$ over the parameter space, the function

u_{\rho}(x)=\int a\widetilde{\sigma}(w^{\top}x+b)\rho(\mathrm{d}a,\mathrm{d}w,% \mathrm{d}b),\quad x\in\mathbb{R}^{d}

(8)

can be seen as the infinite width function of the neural network

\displaystyle\sum_{k=1}^{K}a_{k}\widetilde{\sigma}(w_{k}^{\top}x+b_{k}),\quad% \{a_{k},w_{k},b_{k}\}_{k=1}^{K}\sim\rho.

The Barron space is a generalization from one particular probability distribution to all function that have at least one distribution $\rho$ such that Equation 8 holds.

Definition 2.

For a domain $\Omega$ and $R\in[0,\infty]$ and a function $g\colon\mathbb{R}^{d}\rightarrow\mathbb{R}$ such that $g=u_{\rho}$ with some probability measure $\rho$ , we define the Barron norm of $g$ on $\Omega$ with index $p\in[1,\infty]$ and support radius $R$ by

	$\displaystyle\lVert g\rVert_{\mathcal{B}^{p}_{R}(\Omega)}$	$\displaystyle=\inf_{\rho}\Bigg{\{}\left(\int\lvert a\rvert^{p}\rho(\mathrm{d}a% ,\mathrm{d}w,\mathrm{d}b)\right)^{1/p}\colon g(x)=\int a\widetilde{\sigma}(w^{% \top}x+b)\rho(\mathrm{d}a,\mathrm{d}w,\mathrm{d}b)$
		$\displaystyle\quad\quad\text{on}\,\Omega\;\text{with}\;\mathrm{supp}(\rho)% \subseteq\mathbb{R}\times\overline{B}_{R}^{d}\times\mathbb{R}\Bigg{\}},$

where $\overline{B}_{R}^{d}:=\left\{x\in\mathbb{R}^{d}\colon\lvert x\rvert\leq R\right\}$ . The corresponding Barron space is then defined as

\mathcal{B}_{R}^{p}=\left\{g:\mathbb{R}^{d}\to\mathbb{R}\colon\|g\|_{\mathcal{% B}^{p}_{R}(\Omega)}<\infty\right\}.

(9)

As already discussed in [8], this definition of Barron space adapts a similar definition in [25] with several important modifications for the purpose of PDE analysis. Namely, it is required that the $w$ -marginal of the probability measure $\rho$ has compact support in order to control the derivatives of a Barron function, see [8] for further explanation. In addition, the Barron norm defined here only considers the norm of the parameter $a$ while [25] takes into consideration $w$ and $b$ as well. This is because [25] uses unbounded activation functions (such as ReLU), which requires the moment condition in all parameters to make the integral in (8) well-defined. This is not necessary for us, as we will limit $\widetilde{\sigma}$ to be bounded, and $w$ is already bounded by $R$ .

A.4 Existence result

Using the Barron space defined above, we can impose our second assumption on the type of PDE, following the ideas in [8].

Assumption A.2.

For some constants $R_{\Psi},R_{\psi},R_{f}\in(0,\infty)$ , the following bounds holds on the coefficients and the source term of our second-order elliptic PDE:

\displaystyle\max_{1<i,j<d}\lVert\Psi_{i,j}\rVert_{\mathcal{B}^{1}_{R_{\Psi}}(% \mathbb{R}^{d})}<\infty,\quad\lVert\psi\rVert_{\mathcal{B}^{1}_{R_{\psi}}(% \mathbb{R}^{d})}<\infty,\quad\lVert f\rVert_{\mathcal{B}^{1}_{R_{f}}(\mathbb{R% }^{d})}<\infty.

The assumptions still allow for a broad range of PDE. The two assumptions, Assumption A.1 and Assumption A.2, on the constants holds as long as $Im(\widetilde{\sigma})=0$ . Possible source terms can be found in Proposition A.1 in [8].

We are now ready to prove the existence result in the main paper.

Theorem A.1.

Consider a second-order elliptic PDE of the form (5)

\mathcal{L}u=-\nabla\cdot(\Psi\nabla u)+\psi u=f

(10)

with the coefficients $\Psi(x),\psi(x)$ and the source term $f(x)$ satisfying Assumption A.1 and Assumption A.2, with weak solution $u^{*}\in H^{1}(\mathbb{R}^{d})$ . Let $\Omega\subset\mathbb{R}^{d}$ be any open, bounded subset following Definition 1 in Appendix A and $\mu$ being the Lebesgue measure. Then, for any $\epsilon\in(0,\frac{1}{2})$ , there exists one hidden layered neural network with $\tanh$ activation function of the form (3), constructed either by ELM or SWIM, such that

\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}\leq\epsilon.

Proof.

Consider the weak solution $u^{*}$ of the PDE (10). According to the Theorem 2.10 from [8], for any open $\Omega\subset\mathbb{R}^{d}$ and any $\epsilon\in(0,\frac{1}{2})$ , there exists a two-layer neural network $u^{\cos}_{k}$ with $\widetilde{\sigma}=\text{cos}$ of the form

u^{\cos}_{k}(x)=\frac{1}{k}\sum_{k=1}^{K}a_{k}\cos(w_{k}^{\top}x+b_{k}),\quad x% \in\Omega,

(11)

with parameters $(a_{k};w_{k};b_{k})\in\mathbb{R}\times\mathbb{R}^{d}\times\mathbb{R},k=1,2,% \ldots,K$ and $K\leq\gamma\mu(\Omega)d^{\beta\ln{(\epsilon/2)}}$ with some absolute constants $\gamma,\beta>0$ (see [8] for more details) and $\mu(\Omega)$ the Lebesgue measure of $\Omega$ , such that

\lVert u^{*}-u^{\cos}_{k}\rVert_{H^{1}(\Omega,\mu)}\leq\frac{\epsilon}{2}.

As for any function $g$ we have $\lVert g\rVert_{L^{2}(\Omega,\mu)}\leq\lVert g\rVert_{H^{1}(\Omega,\mu)}$ , it holds that

\lVert u^{*}-u^{\cos}_{k}\rVert_{L^{2}(\Omega,\mu)}\leq\frac{\epsilon}{2}.

(12)

As $u^{\cos}_{k}$ is a continuous function on $\Omega_{c}$ , we know that for any $\epsilon_{2}\in(0,1)$ there exist neural networks $\hat{u}$ with one hidden layer of arbitrary width and $\tanh$ -activation function such that

\|u^{\cos}_{k}-\hat{u}\|_{L^{\infty}(\Omega_{c})}\leq\epsilon_{2}.

This holds both for the ELM setting [14, 53], but also the SWIM setting [4]. From the Hölder inequality it follows that

\displaystyle\|u^{\cos}_{k}-\hat{\mu}\|_{L^{2}(\Omega_{c},\mu)}\leq\mu(\Omega_% {c})^{1/2}\|u^{\cos}_{k}-\hat{u}\|_{L^{\infty}(\Omega_{c})},

and we have

\|u^{\cos}_{k}-\hat{u}\|_{L^{2}(\Omega,\mu)}\leq\mu(\Omega_{c})^{1/2}\epsilon_% {2}.

Taking $\epsilon_{2}=\mu(\Omega_{c})^{1/2}\frac{\epsilon}{2}$ , we get

\|u^{\cos}_{k}-\hat{u}\|_{L^{\infty}(\Omega,\mu)}\leq\frac{\epsilon}{2}.

(13)

Now, allowing for (12) and (13), we see that the weak PDE solution $u^{*}$ can be well approximated by the tanh-network in terms of the $L^{2}$ -norm due to

\|\hat{u}-u^{*}\|_{L^{2}(\Omega)}\leq\|\hat{u}-u^{\cos}_{k}\|_{L^{2}(\Omega,% \mu)}+\|u^{\cos}_{k}-u^{*}\|_{L^{2}(\Omega,\mu)}\leq\frac{\epsilon}{2}+\frac{% \epsilon}{2}=\epsilon,

which completes the proof. ∎

A.5 Convergence for sampled networks

We now consider the case where we sample the hidden layer, and determine if the outer weights exist such that the network is $\epsilon$ -close to the weak solution of the elliptic PDE. First, we define the kernel $\mathcal{K}$

\displaystyle\mathcal{K}(x,x^{\prime})=\int_{\mathcal{W}}\sigma(w^{\top}x+b)% \sigma(w^{\top}x^{\prime}+b)\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b),

(14)

where $\widetilde{\rho}$ is the distribution from which we sample the weights and biases of the hidden layer, and $\mathcal{W}$ is the parameter space. As $\sigma$ equals tanh, $\mathcal{K}$ exists and is well-defined, due to boundedness of $\sigma$ . The accompanying RKHS is then defined as

\displaystyle\mathcal{H}_{2}=\left\{\int_{\mathcal{W}}\alpha(w,b)\sigma(w^{% \top}\cdot+b)\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)\colon\int_{\mathcal{W}}% \lvert\alpha(w,b)\rvert^{2}\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)<\infty% \right\}.

Before we can show the main result, we need a simple lemma stating that any finite neural network can be approximated by a function in $\mathcal{H}_{2}$ .

Lemma A.1.

Let $\mathcal{K}$ in Equation 14 be well-defined. For every neural network $\hat{u}(x)=\sum_{k=1}^{K}a_{k}\sigma(w_{k}^{\top}x+b_{k})$ and $\epsilon\in(0,1)$ , if for all $k=1,2,\dots,K$ , there exist $\delta_{k}>0$ such that $0<\widetilde{\rho}(\overline{B}^{d+1}_{R}(w_{k},b_{k}))<\infty$ for all $R\in(0,\delta_{k}]$ , then there is a function $h\in\mathcal{H}_{2}$ such that

\displaystyle\lVert\hat{u}-h\rVert_{L^{2}(\Omega,\mu)}<\epsilon,

where $\mu$ is the Lebesgue measure.

Proof.

By continuity of $\sigma$ and on a compact subset $\Omega_{c}$ , we have the following: for every $\hat{\epsilon}=\frac{\sqrt{\epsilon}}{\max\{\lvert a_{k}\rvert,1\}_{k=1}^{K}% \cdot K\cdot\mu(\Omega)}>0$ , there exists a $\delta$ such that for all $k=1,2,\dots,K$ , if $[\tilde{w}_{k},\tilde{b}_{k}]\in\overline{B}^{d+1}_{\min\{\delta_{k},\delta\}}% (w_{k},b_{k})$ , then

\displaystyle\lVert\sigma(\tilde{w}_{k}^{\top}\cdot+\tilde{b}_{k})-\sigma(w_{k% }^{\top}\cdot+b_{k})\rVert^{2}_{L^{\infty}(\Omega_{c})}<\hat{\epsilon}.

Let $B_{k}=\overline{B}^{d+1}_{\min\{\delta_{k},\delta\}}(w_{k},b_{k})$ . We then set

\displaystyle\alpha(w,b)=\begin{cases}\frac{a_{k}}{\widetilde{\rho}(B_{k})},&% \text{if }[w,b]\in B_{k}\\ 0,&\text{otherwise.}\end{cases}

Constructing $h$ with this $\alpha$ , we have

	$\displaystyle h(x)$	$\displaystyle=\int_{\mathcal{W}}\alpha(w,b)\sigma(w^{\top}x+b)\widetilde{\rho}% (\mathrm{d}w,\mathrm{d}b)$
		$\displaystyle=\sum_{k=1}^{K}\int_{B_{k}}\frac{a_{k}}{\widetilde{\rho}(B_{k})}% \sigma(w^{\top}x+b)\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)$
		$\displaystyle=\sum_{k=1}^{K}a_{k}\sigma(w_{k}^{\top}x+b_{k})+\frac{a_{k}}{% \widetilde{\rho}(B_{k})}\int_{B_{k}}(\sigma(w^{\top}x+b)-\sigma(w_{k}^{\top}x+% b_{k}))\widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)$
		$\displaystyle=\hat{u}+\sum_{k=1}^{K}\frac{a_{k}}{\widetilde{\rho}(B_{k})}\int_% {B_{k}}(\sigma(w^{\top}x+b)-\sigma(w_{k}^{\top}x+b_{k}))\widetilde{\rho}(% \mathrm{d}w,\mathrm{d}b).$

We then have

	$\displaystyle\lVert\hat{u}-h\rVert_{L^{2}(\Omega,\mu)}^{2}$	$\displaystyle=\left\lVert\sum_{k=1}^{K}\frac{a_{k}}{\widetilde{\rho}(B_{k})}% \int_{B_{k}}(\sigma(w_{k}^{\top}\cdot+b_{k})-\sigma(w^{\top}\cdot+b))% \widetilde{\rho}(\mathrm{d}w,\mathrm{d}b)\right\rVert_{L^{2}(\Omega,\mu)}^{2}$
		$\displaystyle<\left\lVert\sum_{k=1}^{K}\frac{\sqrt{\epsilon}}{K\cdot\mu(\Omega% )}\right\rVert_{L^{2}(\Omega,\mu)}^{2}=\epsilon.$

∎

Remark 1.

The condition on $\widetilde{\rho}$ and the balls surrounding the weights of the parameters of $\hat{u}$ holds in both the case where $\hat{u}$ is constructed by ELM and the case where it is constructed by SWIM. For the ELM case, as long as we set $\eta$ sufficiently large enough which we will assume, this holds true. In SWIM, any network used to prove convergence of Theorem 1 in [4] relies on weights and biases that corresponds to points in the interior of the input space. This implies we can find a $\delta_{k}$ small enough such that the support of $\widetilde{\phi}$ includes the balls $B_{k}$ in the proof above. The finite nonzero requirement holds due to $\widetilde{\rho}$ is either the Gaussian/uniform when sampling with ELM, and uniform or the finite-based distribution in [4] when sampling with SWIM.

We are now ready to prove the main result of convergence when sampling the weights. Letting $\widetilde{\rho}$ be one of the three distributions discussed in the previous remark, such that Lemma A.1 holds.

Theorem A.2.

With the same PDE and input setup, and assumptions as in Theorem A.1, let the sampling method be either ELM or SWIM with distribution $\widetilde{\rho}$ , and let $\delta\in(0,1]$ . Then for any $\epsilon\in(0,\frac{1}{2})$ , there exist a $K\in\mathbb{N}$ , so that by sampling $K$ pairs of weights $W\in\mathbb{R}^{K\times d}$ and biases $b\in\mathbb{R}^{K}$ i.i.d. from $\widetilde{\rho}$ , there exists outer weights $c\in\mathbb{R}^{K}$ , such that the bound

\lVert\hat{u}-u^{*}\rVert_{L_{2}(\Omega,\mu)}=\lVert c^{\top}\sigma(W\cdot+b)-% u^{*}\rVert_{L^{2}{(\Omega,\mu)}}\leq\epsilon.

holds with probability $1-\delta$ .

Proof.

We start by constructing a neural network with tanh-activation function $\hat{u}_{1}$ , such that

\displaystyle\lVert\hat{u}_{1}-u^{*}\rVert_{L^{2}(\Omega,\mu)}^{2}\leq\frac{% \epsilon}{3},

which holds by Theorem A.1. Secondly, we choose a $h\in\mathcal{H}_{2}$ such that

\displaystyle\lVert h-\hat{u}_{1}\rVert_{L^{2}(\Omega,\mu)}^{2}\leq\frac{% \epsilon}{3},

which we can by Lemma A.1.

We then construct a probability measure $\widetilde{\mu}=\frac{\mu}{\mu(\Omega_{c})}$ . Now we can observe a few things. Firstly, $\lvert\sigma(w^{\top}x+b)\rvert\leq 1$ a.s. Secondly, $\lvert h(x)\rvert\leq l$ a.s. w.r.t. $\widetilde{\mu}$ , where $l>0$ . This is due to the compactness of $\Omega_{c}$ and $h$ being continuous.

From this, we can deduce that Theorem 1 in [60] holds, with the distribution over the data $\pi$ being such that the marginal for $x\in\Omega_{c}$ is $\widetilde{\mu}$ , and the distribution of $y$ conditioned on $x$ is the Dirac delta at $h(x)$ . This means there exist a $N_{0}$ , and we set

\displaystyle N=\max\{N_{0},z_{\delta}^{2}\},\quad z_{\delta}=\frac{e_{1}\log^% {2}\frac{18}{\delta}}{\epsilon/3\cdot\mu(\Omega_{c})^{2}},

where $e_{1}>0$ is a constant. We then sample i.i.d. $\{x_{n},f(x_{n})\}_{n=1}^{N}\sim\pi$ , set $\lambda=N^{-1/2}$ , and sample

\displaystyle K=e_{0}\max\{\sqrt{N_{0}},z_{\delta}\}\log\frac{108\max\{\sqrt{N% _{0}},z_{\delta}\}}{\delta},

weights and biases, where $e_{0}>0$ is a constant. Let $W$ be the resulting sampled weight matrix, and $b$ be the resulting bias vector. By setting $c$ in similar fashion to Equation 7 in [60], and $\hat{u}=c^{\top}\sigma(W\cdot+b)$ , we have that with probability $1-\delta$ ,

\displaystyle\lVert\hat{u}-h\rVert_{L^{2}(\Omega_{c},\widetilde{\mu})}^{2}\leq% \frac{e_{1}\log^{2}\frac{18}{\delta}}{z_{\delta}}=\frac{\epsilon}{3\cdot\mu(% \Omega_{c})^{2}}.

Combining the results above, with probability $1-\delta$ , the following bound holds,

	$\displaystyle\lVert\hat{u}-u^{*}\rVert_{L^{2}(\Omega,\mu)}^{2}$	$\displaystyle\leq\frac{\mu(\Omega_{c})^{2}}{\mu(\Omega_{c})^{2}}\lVert\hat{u}-% h\rVert_{L^{2}(\Omega,\mu)}^{2}+\lVert h-\hat{u}_{1}\rVert_{L^{2}(\Omega,\mu)}% ^{2}+\lVert\hat{u}_{1}-u^{*}\rVert_{L^{2}(\Omega,\mu)}^{2}$
		$\displaystyle\leq\mu(\Omega_{c})^{2}\lVert\hat{u}-h\rVert_{L^{2}(\Omega_{c},% \widetilde{\mu})}^{2}+\frac{\epsilon}{3}+\frac{\epsilon}{3}$
		$\displaystyle\leq\mu(\Omega_{c})^{2}\cdot\frac{\epsilon}{3\cdot\mu(\Omega_{c})% ^{2}}+\frac{2\epsilon}{3}=\epsilon.$

∎

Appendix B Methods

The computational experiments were performed on the following system: UBUNTU 20.04.6 LTS, PYTHON 3.12.3, NUMPY 1.26.4, SCIPY 1.13.0, MATPLOTLIB 3.9.0, PYTORCH 1.12.1, and NVIDIA Driver 515.105.01, i7 CPU.

The code to reproduce the experiments from the paper, and an up-to-date code base, can be found (with MIT Licence) at

https://gitlab.com/felix.dietrich/swimpde-paper,

https://gitlab.com/felix.dietrich/swimpde.

Let $d$ be the dimension of space and $N$ be the total number of points in the spatio-temporal domain $\Omega$ $\subset(\mathbb{R}^{d},\mathbb{R})$ . Then, given $N$ points in a test set $X$ , we define the error metrics we use to compare numerical results are RMSE and relative $L^{2}$ error by

\textnormal{RMSE}:=\sqrt{\frac{\sum_{x\in X}(u_{true}(x)-u_{pred}(x))^{2}}{N}}% ,\ \textnormal{and}\ L^{2}_{\text{relative}}:=\frac{\sqrt{\sum_{x\in X}(u_{% true}(x)-u_{pred}(x))^{2}}}{\sqrt{\sum_{x\in X}(u_{true}(x))^{2}}}.

B.1 SWIM, ELM, SWIM-ODE, ELM-ODE

The details of these methods are already discussed in the main text.

B.2 Physics-informed networks

This work employs two prominent variants of physics-informed machine learning to compare the results obtained by the proposed methods. In particular, physics-informed neural network (PINN) [56] and causality-respecting physics-informed neural network (causal PINN) [69] are utilized to compare the obtained approximations.

Vanilla PINNs are feedforward deep neural networks designed to simulate PDEs by incorporating physical laws into the learning process. The architecture of a vanilla PINN includes a deep neural network that maps inputs (e.g., space and time coordinates) to outputs (e.g., physical quantities of interest) and is trained to minimize a loss function that combines both data and physics-based errors. The data term ensures that the neural network fits the provided data points, while the physics term enforces the PDE constraints with automatic differentiation. Hence, the loss function for PINN could be defined by

L(\mu)=\lambda_{1}L_{\mathrm{PDE}}(\mu)+\lambda_{2}L_{\mathrm{Data}}(\mu).

(15)

Here, $\mu$ represents the trainable network parameters. Considering the generic nonlinear PDE defined by (1) with well-posed boundary and initial conditions (2), the individual loss terms weighted by the hyperparameters $\lambda_{i}$ , $i=1,2$ , are defined by

\begin{gathered}L_{\mathrm{PDE}}(\mu)=\frac{1}{N_{\mathrm{int}}}\sum_{n=1}^{N_% {\mathrm{int}}}||u^{*}_{t}(x^{(n)},t^{(n)})+Lu(x^{(n)},t^{(n)})+\lambda N(u^{*% })(x^{(n)},t^{(n)})-f(x^{(n)})||^{p}.\end{gathered}

(16)

The data loss term considering the initial and boundary conditions is defined by

\begin{gathered}L_{\mathrm{Data}}(\mu)=\frac{1}{N_{\mathrm{i}}}\sum_{n=1}^{N_{% \mathrm{i}}}||Bu^{*}(x^{(n)},t^{(n)})-g(x^{(n)})||^{p}.\end{gathered}

(17)

Here, $N$ is the total number of training points, which is the sum of interior training points ( $N_{\mathrm{int}}$ ) and initial or boundary training points ( $N_{\mathrm{}}$ ). The neural network predicted solution of $u$ at a point in computational domain, $(x^{(n)},t^{(n)})$ is denoted by $u^{*}(x^{(n)},t^{(n)})$ . The experiments are trained with $L^{2}$ -norm, implying $p=2$ . The main goal is to minimize (15) and determine the optimal parameters ( $\mu$ ). These parameters, once optimized, are employed to predict the solution of the PDE within the computational domain.

The second physics-informed method employed is Causal PINNs, which modifies the PINN loss function to explicitly adhere to the temporal causality inherent in time-dependent PDEs. In vanilla PINNs, the loss function does not prioritize resolving the solution at lower times before higher times, leading to inaccuracies, especially in time-dependent problems. Causal PINNs address this by introducing a weighting factor for the loss at each time step, which depends on the accumulated loss from previous time steps. The resulting loss function ensures that the network prioritizes learning the solution accurately at earlier times before focusing on later times, thus maintaining the causal structure of the physical problem being solved. The causal PDE loss term is defined by

\begin{gathered}L_{\mathrm{PDE}}(\mu)=\sum_{i=1}^{N_{\mathrm{t}}}w_{i}L_{% \mathrm{PDE}}(t_{i},\mu),\\ w_{1}=1,\quad w_{i}=e^{-\epsilon\sum_{k=1}^{i-1}L_{\mathrm{PDE}}(t_{k},\mu)},% \quad i=2,3,\ldots N_{\mathrm{t}}.\end{gathered}

(18)

$N_{\mathrm{t}}$ represents the number of time steps into which the computational domain is divided. The causality hyperparameter $\epsilon$ regulates the steepness of the weights and is incorporated in the loss function similar to [37]. This modification introduces a weighting factor, $w_{i}$ , for the loss at each time level $t_{i}$ , with $w_{i}$ being dependent on the cumulative PDE loss up to time $t_{i}$ . The network prioritizes a fully resolved solution at earlier time levels by exponentiating the negative of this accumulated loss. Consequently, the modified loss function for causal PINN is expressed as

\displaystyle L_{\mathrm{PDE}}(\mu)=\frac{1}{N_{t}}\left[w_{1}L_{\mathrm{PDE}}% (t_{1},\mu)+\sum_{i=2}^{N_{t}}e^{-\epsilon\sum_{k=1}^{i-1}L_{\mathrm{PDE}}(t_{% k},\mu)}L_{\mathrm{PDE}}(t_{i},\mu)\right].

(19)

B.3 IGA-FEM

First introduced in [33], Isogeometric analysis (IGA) is a numerical method developed to unify the fields of computer-aided design (CAD) and finite element analysis (FEA). The key idea is to represent the solution space for the numerical analysis using the same functions that define the geometry in CAD [11], which include the B-Splines and Non-Uniform Rational B-Splines (NURBS) [52].

In this paper, we use B-Splines as the basis functions. The B-Splines are defined using the Cox-de Boor recursion formula [13, 15], i.e.,

\displaystyle N_{i,0}(\xi)=\begin{cases}1&\xi_{i}\leq\xi<\xi_{i+1}\\ 0&\text{otherwise,}\end{cases}

\displaystyle N_{i,p}(\xi)=\frac{\xi-\xi_{i}}{\xi_{i+p}-\xi_{i}}N_{i,p-1}(\xi)% +\frac{\xi_{i+p+1}-\xi}{\xi_{i+p+1}-\xi_{i+1}}N_{i+1,p-1}(\xi),

where $\xi_{i}$ is the $i$ th knot, and $p$ is the polynomial degree. The vector $\Xi=[\xi_{1},\xi_{2},\dots,\xi_{n+p+1}]$ is the knot vector, where $n$ is the number of B-Splines. By specifing the knot vector, we define the basis functions we use to solve the PDEs. We use an uniform open knot vector, where the first and last knots have multiplicity $p+1$ , the inner knots have no multiplicity, and all knots that have different values are uniformly distributed. We refer to the knots with different values as "nodes". The intervals between two successive nodes are knot spans, which can be viewed as "elements". The elements form a "patch". A domain can be partitioned into subdomains and each is represented by a patch. In our work, we use a single patch to represent the entire 1D domain. Figure 7(a) shows an example of such a patch, where the B-Splines are $C^{p}$ -continuous within the knot spans and $C^{p-1}$ continuous at the inner knots. In order to address the boundary conditions, we adapt the B-Splines as shown in Figure 7(b) Figure 7(c), so that the boundary conditions are directly built into the solution space.

In the following, we refer to the adapted B-Splines as basis functions $\phi_{k}(x)$ . Thus, the solutions of PDEs are approximated by

\displaystyle u(x,t)=\sum^{K}_{k=1}c_{k}(t)\phi_{k}(x).

We solve the PDEs in the weak formulation. For the linear advection equation (23), the weak form of the equation is

\displaystyle\sum^{K}_{k=1}c^{\prime}_{k}(t)\int_{X}\phi_{k}(x)v(x)dx+\beta% \sum^{K}_{k=1}c_{k}(t)\int_{X}\phi^{\prime}_{k}(x)v(x)dx=0,

(20)

where $v(x)$ are the test functions. The test functions are chosen to be the same as the basis functions. The integral of the functions is computed using Gaussian quadrature. Then we solve the linear Ordinary differential equation (ODE)

\displaystyle\mathbf{M}\dot{\mathbf{c}}+\mathbf{Kc}=\mathbf{0},

where matrix $\mathbf{M}$ and matrix $\mathbf{K}$ contain the integral of the B-Splines and their derivatives, and the coefficient $\beta$ , which are given. We solve the Euler-Bernoulli equation (25) and the Burgers’ equation (26) in a similar way. The boundary condition for the Euler-Bernoulli equation is in addition weakly imposed, as is done in [55].

Appendix C Numerical Experiments

C.1 Helmholtz equation on a geometry described by the Stanford bunny dataset

We solve the Helmholtz equation with periodic boundary conditions on the domain $\Omega\subset\mathbb{R}^{3}$ . The Stanford bunny data set [67] contains 3-D point cloud data of 15,258 points. The PDE is defined by


$\displaystyle\Delta u(x)-u(x)$	$\displaystyle=f(x)\quad\textnormal{for}\quad x\in\Omega,$	(21a)

$\displaystyle u(x)$	$\displaystyle=g(x)\quad\textnormal{for}\quad x\in\partial\Omega.$	(21b)

We consider two cases with different analytical solutions. The boundary conditions $g$ , and the forcing terms $f$ for the two cases are chosen such that the respective exact solutions of the PDE are

\displaystyle u_{1}(x)=\exp(x_{1}+x_{2}+x_{3}),\ \text{and}\,u_{2}(x)=5+\sin(% \pi x_{1})\sin(\pi x_{2})\sin(4\pi x_{3}).

(22)

Solving PDEs with classical mesh-based methods requires dealing with difficulties and higher computational costs of generating meshes on complex geometries. The meshfree nature of neural PDE solvers such as PINN, ELM, and SWIM eliminates the difficulties associated with meshing and can cheaply find a good approximation as shown in Table 5 and Table 6. ELM requires fewer neurons for the same accuracy as SWIM for PDE solutions that do not involve steep local gradients. SWIM usually performs slightly better than ELM for solutions involving moderately steep local gradients, and PINNs struggle. Moreover, Figure 8 shows that $L^{2}_{\text{relative}}$ decreases exponentially with the neurons in the hidden layer for SWIM and ELM. We always choose the number of neurons as half the number of collocation points in Figure 8(b) and Figure 8(d).

In Table 4, we describe additional details in solving the Helmholtz equation with neural PDE solvers: ELM, SWIM, and PINN. Figure 9 and Figure 10 show the absolute errors obtained with the ELM, SWIM, and PINN PDE solvers. We use the Python package alphashape [3] to extract the boundary points of the Stanford bunny dataset. However, we do not uniformly sample the boundary points on the boundary, and there may be some points on the domain’s interior near the boundary that are selected as the boundary points. This, however, entails no changes in our algorithm.

For approximating $u_{1}(x)$ and $u_{2}(x)$ , 6000 points in the interior and 888 points on the boundary of the bunny are sampled to train the networks. In addition, testing is performed on 8370 interior points and 99 boundary points. The exact architectures and comparison of training times and errors are presented in Table 5 and Table 6. With ELM and SWIM, we compute the SVD of the feature matrix and retain 1200 singular values for approximating $u_{1}(x)$ and 1600 singular values for approximating $u_{2}(x)$ .

Table 4: Network hyperparameters used to solve the static PDE on the Stanford bunny dataset (on both examples).

	Parameter	Value
SWIM and ELM	Number of layers	$2$
	Layer width	$[50,500,1000,2000,\textbf{3000},\textbf{4000}]$
	SVD/Outer layer	$[1200,1600]$
	Activation	tanh
	$L^{2}$ -regularization	$[10^{-8},10^{-10},\mathbf{10^{-12}}]$
	Loss	mean-squared error
PINN	Number of layers	$4$
	Layer width	$20$
	Activation	tanh
	Optimizer	LBFGS
	Epochs	$2500$ ( $10000$ for approximating $u_{2}(x)$ )
	Loss	mean-squared error
	Learning rate	$0.1$
	Batch size	$888$
	Parameter initialization	Xavier [29]
	Loss weights, $\lambda_{1}$ , $\lambda_{2}$	$0.1$ , $1$

Table 5: Summary of Helmholtz equation with ground truth given by

u_{1}(x)

(see Equation 22).

Method	Training time ( $\mathrm{s}$ )	RMSE	Relative ${L}^{2}$ error	architecture
PINN	57.12	4.86e-4 $\pm$ 1.53e-4	3.18e-4 $\pm$ 9.99e-5	(3, 4 $\times$ 20, 1)
SWIM	64.30	3.99e-4 $\pm$ 1.47 e-4	2.61e-4 $\pm$ 9.65e-5	(3, 3000, 1200, 1)
ELM	69.06	1.05e-6 $\pm$ 4.31e-7	6.90e-7 $\pm$ 2.82 e-7	(3, 3000, 1200, 1)

Table 6: Summary of Helmholtz equation with ground truth given by

u_{2}(x)

(see Equation 22).

Method	Training time ( $\mathrm{s}$ )	RMSE	Relative ${L}^{2}$ error	architecture
PINN	229.06	4.81e-2 $\pm$ 1.01e-3	9.53e-3 $\pm$ 1.99e-4	(3, 4 $\times$ 20, 1)
ELM	144.83	5.15e-3 $\pm$ 2.64e-3	2.80e-3 $\pm$ 3.13e-4	(3, 4000, 1600, 1)
SWIM	129.20	2.80e-3 $\pm$ 3.13e-4	1.02e-3 $\pm$ 5.24e-4	(3, 4000, 1600, 1)

C.2 Linear Advection Equation

The advection equation models the propagation of a quantity at a speed $\beta$ without altering the shape. We solve the linear advection equation with periodic boundary conditions described by

	$\displaystyle u_{t}(x,t)+\beta u_{x}(x,t)$	$\displaystyle=0,\quad\textnormal{for}\quad x\in[0,2\pi],t\in[0,1],$		(23)
	$\displaystyle u(x,0)$	$\displaystyle=\sin(x).$		(24)

We describe additional details in solving the advection equation with various neural PDE solvers in Table 7. In addition, hyperparameter optimization for PINN for the case of $\beta=10$ was carried out, varying the number of neurons and interior points. The results for the hyperparameter optimization are detailed in Table 8, Table 9. For SWIM and ELM, we use $1000$ interior points for $\beta\in\{10^{-2},10^{-1},1,10\}$ , and we use $8000$ interior points for $\beta\in\{40,100\}$ . Figure 11 shows the absolute errors obtained with the SWIM-ODE, ELM-ODE, PINN, Causal PINN and IGA methods.

Table 7: Network hyperparameters used for the linear advection equation for

\beta\in\{10^{-2},10^{-1},1,10,40,100\}

	Parameter	Value
SWIM-ODE, ELM-ODE	Number of layers	$2$
	Hidden layer width	$[140,\mathbf{380},560]$
	Activation	tanh
	$L^{2}$ -regularization	$[10^{-8},\mathbf{10^{-10}},10^{-12}]$
	Loss	mean-squared error
SWIM, ELM	Number of layers	$2$
	SVD cutoff	$10^{-12}$
	Hidden layer width	$[140,\mathbf{380},560]$
	Activation	tanh
	$L^{2}$ -regularization	$[10^{-8},\mathbf{10^{-10}},10^{-12}]$
	Loss	mean-squared error
	# Initial and boundary points	$400$
	# Interior points	$[\textbf{1000},\textbf{8000}]$
IGA	Number of nodes	$16$
	Degree of polynomials	$8$
	Number of basis functions	$15$
PINN	Number of layers	$4$
	Layer width	$[10,20,\textbf{30},40]$
	Activation	tanh
	Optimizer	LBFGS
	Epochs	$5000$
	Loss	mean-squared error
	Learning rate	$0.1$
	Batch size	$200$
	Parameter initialization	Xavier [29]
	Loss weights, $\lambda_{1}$ , $\lambda_{2}$	$1$ , $1$
	# Interior points	$[500,1000,1500,\textbf{2000}]$
	# Initial and boundary points	$600$
Causal PINN	Number of layers	$4$
	Layer width	$30$
	Activation	tanh
	Optimizer	ADAM followed by LBFGS
	ADAM Epochs	$2000$
	LBFGS Epochs	$5000$
	Loss	mean-squared error
	Learning rate	$0.1$
	Batch size	$2000$
	Parameter initialization	Xavier [29]
	Loss weights, $\lambda_{1}$ , $\lambda_{2}$	$1$ , $1$
	# Interior points	$40000$
	# Initial and boundary points	$6000$
	Causality parameter, $\epsilon$	$10$

Table 8: Advection equation (23): hyperparameter optimization for PINN varying layer width for

\beta=10

Layer width	Training time ( $\mathrm{s}$ )	RMSE	Relative ${L}^{2}$ error
10	24.47 $\pm$ 0.19	1.24e-3 $\pm$ 2.38e-4	1.76e-3 $\pm$ 3.37e-4
20	27.46 $\pm$ 0.08	6.52e-4 $\pm$ 2.59e-4	9.22e-4 $\pm$ 3.66e-4
30	30.43 $\pm$ 0.50	3.69e-4 $\pm$ 4.33e-5	5.23e-4 $\pm$ 6.13e-5
40	33.64 $\pm$ 0.41	3.86e-4 $\pm$ 9.37e-5	5.46e-4 $\pm$ 1.32e-4

Table 9: Advection equation (23): hyperparameter optimization for PINN varying interior points for

\beta=10

Interior points	Training time ( $\mathrm{s}$ )	RMSE	Relative ${L}^{2}$ error
500	25.76 $\pm$ 0.29	4.10e-4 $\pm$ 7.20e-5	5.80e-4 $\pm$ 1.01e-4
1000	27.44 $\pm$ 0.25	3.72e-4 $\pm$ 4.06e-5	5.27e-4 $\pm$ 5.74e-5
1500	29.61 $\pm$ 0.16	5.68e-4 $\pm$ 1.97e-4	8.03e-4 $\pm$ 2.79e-4
2000	30.43 $\pm$ 0.50	3.69e-4 $\pm$ 4.33e-5	5.23e-4 $\pm$ 6.13e-5

Table 10: Summary of results for the advection equation for

\beta=40

Method	Training time ( $\mathrm{s}$ )	RMSE	Relative ${L}^{2}$ error	architecture
SWIM	66.30	6.81 $\pm$ 0.26	9.63 $\pm$ 0.38	(2, 4000, 1)
ELM	59.05	3.78 $\pm$ 0.21	5.35 $\pm$ 0.29	(2, 4000, 1)
Causal PINN	357.63 $\pm$ 3.11	2.07 $\pm$ 0.87	2.92 $\pm$ 1.23	(2, 4 $\times$ 30, 1)
PINN	30.56 $\pm$ 0.27	4.89e-1 $\pm$ 2.09e-2	6.92e-1 $\pm$ 2.96e-2	(2, 4 $\times$ 30, 1)
ODE-SWIM (our)	2.72	5.04 e-6 $\pm$ 1.45e-6	7.13e-6 $\pm$ 2.05 e-6	(1, 350, 15, 1)
ODE-ELM (our)	2.71	2.73e-6 $\pm$ 3.68e-7	3.84e-6 $\pm$ 5.2e-7	(1, 350, 15, 1)
IGA-FEM	0.07	8.24e-11	1.17e-10	15

C.3 Euler-Bernoulli PDE

This is a time-dependent PDE, given by

u_{\mathrm{tt}}+u_{\mathrm{xxxx}}=f(x,t)\quad x\in[0,\pi],t\in[0,1]

(25)

where $f(x,t)=(1-16\pi^{2})\sin{(x)}\cos(4\pi t)$ , with initial and boundary conditions

u(x,0)=\sin(x),\quad u_{t}(x,0)=0

u(0,t)=u(\pi,t)=u_{\mathrm{xx}}(0,t)=u_{\mathrm{xx}}(\pi,t)=0.

It models a simply supported beam with varying transverse force. We describe additional details in solving the Euler-Bernoulli with various neural PDE solvers in Table 11. Figure 12 shows the absolute errors obtained with the SWIM-ODE, ELM-ODE, PINN, and IGA methods.

Table 11: Network hyperparameters used for the Euler-Bernoulli equation.

	Parameter	Value
SWIM-ODE and ELM-ODE	Number of layers	$[2]$
	Hidden layer width	$200$
	Outer layer width	$10$
	Activation	tanh
	$L^{2}$ -regularization	$10^{-12}$
	Loss	mean-squared error
IGA	Number of nodes	$27$
	Degree of polynomials	$9$
	Number of basis functions	$33$
PINN	Number of layers	$4$
	Layer width	$20$
	Activation	tanh
	Optimizer	LBFGS
	Epochs	$15000$
	Loss	mean-squared error
	Learning rate	$0.1$
	Batch size	$2000$
	Parameter initialization	Xavier [29]
	Loss weights, $\lambda_{1}$ , $\lambda_{2}$	$0.1$ , $1$
	# Interior points	$10000$
	# Initial and boundary points	$6000$

Table 12: Summary of results for the Euler-Bernoulli beam equation.

Method	Training time ( $\mathrm{s}$ )	RMSE	Relative ${L}^{2}$ error	architecture
PINN	2303.71 $\pm$ 278.68	2.11e-3 $\pm$ 4.79e-4	4.21e-3 $\pm$ 9.56e-4	(2, 4 $\times$ 20, 1)
ODE-SWIM (our)	0.05	6.06e-8 $\pm$ 2.96e-8	1.20e-7 $\pm$ 5.91e-8	(1, 200, 10, 1)
ODE-ELM (our)	0.06	1.75e-8 $\pm$ 3.91e-9	3.50e-8 $\pm$ 7.79e-9	(1, 200, 10, 1)
IGA-FEM	0.94	2.11e-7	4.21e-7	33

C.4 Burgers

The inviscid Burgers’ equation is a non-linear PDE, which can form shock waves. We solve Burgers’ equation on $\Omega=[-1,1]$ for time $t\in(0,1]$ , so that

$\displaystyle u_{t}+uu_{x}-(0.01/\pi)u_{xx}$	$\displaystyle=0,\quad x\in\Omega,\quad,t\in[0,1],$	(26)
$\displaystyle u(0,x)=-\sin(\pi x),$		(27)
$\displaystyle u(1,-1)=u(t,1)=0.$		(28)

We describe additional details in solving the Burgers’ equation with various neural PDE solvers in Table 13 and Table 14. Figure 13 shows the absolute errors obtained with the PINN, Causal PINN, and IGA methods.

Table 13: Network hyper-parameters used for SWIM-ODE and ELM-ODE to solve Burgers’ equation.

	Parameter	Value
SWIM-ODE	Number of layers	$2$
	Hidden layer width	$[450]$
	Activation	tanh
	$L^{2}$ -regularization	$[10^{-6},\mathbf{10^{-7}},10^{-8},10^{-10},10^{-12}]$
	Loss	mean-squared error
	# collocation points (space)	$[800]$
	# sampling points	$[6000]$
ELM-ODE	Number of layers	$2$
	Hidden layer width	$[2000]$
	Activation	tanh
	$L^{2}$ -regularization	$[10^{-6},\mathbf{10^{-7}},10^{-8},10^{-10},10^{-12}]$
	Loss	mean-squared error
	# collocation points (space)	$[3000]$
	# sampling points	$[6000]$

Table 14: Network hyper-parameters used for PINN, Causal PINN, and IGA to solve Burgers’ equation.

	Parameter	Value
PINN	Number of layers	$9$
	Layer width	$20$
	Activation	tanh
	Optimizer	LBFGS
	Epochs	$10000$
	Loss	mean-squared error
	Learning rate	$0.1$
	Batch size	$200$
	Parameter initialization	Xavier [29]
	Loss weights, $\lambda_{1}$ , $\lambda_{2}$	$1$ , $1$
	# Interior points	$10000$
	# Initial and boundary points	$600$
Causal PINN	Number of layers	$9$
	Layer width	$20$
	Activation	tanh
	Optimizer	ADAM followed by LBFGS
	ADAM Epochs	$5000$
	LBFGS Epochs	$10000$
	Loss	mean-squared error
	Learning rate	$0.1$
	Batch size	$200$
	Parameter initialization	Xavier [29]
	Loss weights, $\lambda_{1}$ , $\lambda_{2}$	$1$ , $1$
	# Interior points	$40000$
	# Initial and boundary points	$600$
	Causality parameter, $\epsilon$	$5$
IGA	Number of nodes	$400$
	Degree of polynomials	$8$
	Number of basis functions	$405$

Table 15: Summary of results for Burgers’ Equation.

Method	Training time ( $\mathrm{s}$ )	RMSE	Relative ${L}^{2}$ error	architecture
ODE-ELM (our)	2.41	1.51e-1 $\pm$ 3.27 e-4	2.47e-1 $\pm$ 5.33e-4	(1, 2000, 1)
PINN	275.23 $\pm$ 5.38	2.38e-3 $\pm$ 1.61e-3	3.88e-3 $\pm$ 2.61e-3	(2, 9 $\times$ 20, 1)
Causal-PINN	1531.79 $\pm$ 18.45	9.85e-3 $\pm$ 5.51e-3	1.60e-2 $\pm$ 8.97e-3	(2, 9 $\times$ 20, 1)
ODE-SWIM (our)	141.35	2.05e-4 $\pm$ 2.84e-4	3.33e-4 $\pm$ 4.63e-4	(1, 400, 1)
IGA-FEM	13.61	1.35e-4	2.20e-4	405

Table 16: Network hyperparameters for the two inverse problems.

	Parameter	Value
SWIM	Number of layers	1
	Layer width	1024 (Poisson), 400 (Helmholtz)
	Activation	tanh
	$L^{2}$ -regularization	1.0e-10 (Poisson), 0.0 for Helmholtz
	Linear solver	np.linalg.lstsq
	Loss	mean-squared error
	# Interior points to solve PDE	$900$ (Poisson and Helmholtz)
	# Boundary points	None (boundary condition not provided)
	# Measurement points	$50$ (Poisson), $300$ (Helmholtz)
PINN	Number of layers	$4$
	Layer width	$20$
	Activation	tanh
	Optimizer	LBFGS
	Epochs	$10000$
	Loss	mean-squared error
	Learning rate	$0.1$
	Batch size	$375$
	Parameter initialization	Xavier [29]
	Loss weights, $\lambda_{1}$ , $\lambda_{2}$	$0.05$ , $1$
	# Interior points to solve PDE	$1500$
	# Initial and boundary points	$1500$
	# Measurement points	$50$ (Poisson)

C.5 Parametric Poisson equation

We solve a parametric Poisson equation on $\Omega=[0,1.4]^{2}$ ,

\text{div}(D\nabla u(x))=f(x)\,\text{for}\,x\in\Omega,\,\text{and}\,u(x)=g(x)% \,\text{for}\,x\in\partial\Omega,

(29)

with parameterized, diagonal diffusivity matrix $D=\text{diag}\left(1,\alpha\right)$ , $\alpha>0$ . Here $u(x)$ denotes the unknown solution of the PDE and $\alpha$ denotes the unknown model parameter that needs to be estimated given the set of measurements. We describe additional details in solving this inverse problem with various neural PDE solvers in Table 16. Figure 14 shows the ground truth as well as the distribution of the random measurement points in the two-dimensional domain. When solving the PDE with varying parameter $\alpha$ , it is obvious from Figure 14 (right) that the ground truth parameter must be close to $\alpha=1$ .

For general inverse problems solved with PINN, the unknown parameter can be in the PDE or initial or boundary conditions. Hence, this unknown parameter will be the part of the loss function. To estimate this parameter in conjunction to training PINN and solving the forward problem, this parameter should be learnable. Following, we describe two possible ways to estimate this parameter. First, the unknown parameter can be considered as a learnable parameter like weights and biases of PINN and the parameter could be passed on to the optimizer to iterate over the landscape of parameter values to reach a suboptimal value. This approach works fine for cases with a limited number of unknown parameters, but has a downside of initialization. That is, when making the unknown parameter learnable, the parameter needs initialization, for which a good guess may not be known and sometimes can completely lead to the failure of PINN. The second approach is to consider a pseudo function whose mean over space-time is the unknown parameter. This pseudo function is considered as an output of the neural network. With such an approach, the mean of the function needs to be passed on to the loss function to estimate the parameter. Hence, although the neural network has two outputs, The quantities of interest are the first output which is the solution of the PDE and the mean of the second output, which is the unknown parameter itself. The benefit of this approach is that no initialization of the unknown parameter is required. However, a negative side is that if there are several parameters, the training can be expensive. However, as we have a single parameter, this approach suits well. In addition, if one has to approximate an unknown source term or function through PINN, the second approach is more feasible as the first approach would require an intelligent parametrization of the unknown function field.

C.6 Helmholtz equation with parameter field

We repeat the experiment from Dong and Wang [18] and set up an inverse problem involving the Helmholtz equation on $\Omega=[0,1.5]^{2}$ with Dirichlet (fixed-value) boundary conditions, and a space-dependent parameter field $\gamma$ , so that

	$\displaystyle\Delta u(x)-\gamma(x)u(x)$	$\displaystyle=$	$\displaystyle f(x)\quad\textnormal{for}\quad x\in\Omega,$
	$\displaystyle u(x)$	$\displaystyle=$	$\displaystyle g(x)\quad\textnormal{for}\quad x\in\partial\Omega.$

Here, we fix the same ground truth solution as Dong and Wang [18], to

$\displaystyle\gamma(x)$	$\displaystyle=$	$\displaystyle 100\left[1+0.25\sin(2\pi x_{1})+0.25\sin(2\pi x_{2})\right],$
$\displaystyle u(x)$	$\displaystyle=$	$\displaystyle[2.5\sin(\pi x_{1}-0.4\pi)+1.5\cos(2\pi x_{1}+0.3\pi)]$
		$\displaystyle\cdot[2.5\sin(\pi x_{2}-0.4\pi)+1.5\cos(2\pi x_{2}+0.3\pi)].$


	PDE:	$-\Delta u=f$
	Neural basis:	${\color[rgb]{0,0.39453125,0.7421875}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0.39453125,0.7421875}\phi_{1}},{\color[rgb]{0.890625,0.4453125,0.1328125}% \definecolor[named]{pgfstrokecolor}{rgb}{0.890625,0.4453125,0.1328125}\phi_{2}% },{\color[rgb]{0.63671875,0.6796875,0}\definecolor[named]{pgfstrokecolor}{rgb}% {0.63671875,0.6796875,0}\phi_{3}},\dots$
	Ansatz:	$\hat{u}=\sum_{k}c_{k}\phi_{k}$
	Approximation:	$-\sum_{k}c_{k}\Delta\phi_{k}\approx f$

Solving partial differential equations with sampled neural networks

Abstract

1 Introduction

2 Related Work

3 Mathematical framework

3.1 Neural network ansatz

3.2 Existence of solutions

Theorem 3.1.

Theorem 3.2.

4 Approximation of PDE solutions using the neural network ansatz

4.1 Random sampling of parameters in the hidden layer

4.2 Solving time-dependent PDE

4.3 Constructing outer weights to satisfy the boundary condition

4.4 Solving inverse problems involving PDEs

5 Computational experiments

5.1 Time-dependent PDE

5.1.1 Linear advection equation

5.1.2 Fourth order Euler–Bernoulli Beam Equation

5.1.3 Non-linear Viscous Burgers equation

5.2 Inverse problems

5.2.1 Parametric Poisson PDE

5.2.2 Helmholtz PDE with parameter field

6 Conclusion

Acknowledgments and Disclosure of Funding

References

Appendix

Appendix A Theoretical results

A.1 Input space

Definition 1.

A.2 PDE Setting

Assumption A.1.

A.3 Barron Spaces

Definition 2.

A.4 Existence result

Assumption A.2.

Theorem A.1.

Proof.

A.5 Convergence for sampled networks

Lemma A.1.

Proof.

Remark 1.

Theorem A.2.

Proof.

Appendix B Methods

B.1 SWIM, ELM, SWIM-ODE, ELM-ODE

B.2 Physics-informed networks

B.3 IGA-FEM

Appendix C Numerical Experiments

C.1 Helmholtz equation on a geometry described by the Stanford bunny dataset

C.2 Linear Advection Equation

C.3 Euler-Bernoulli PDE

C.4 Burgers

C.5 Parametric Poisson equation

C.6 Helmholtz equation with parameter field

Solving partial differential equations
with sampled neural networks