Physics Informed Neural Network Theory and Applications
Cosmin Anitescu∗, Burak İsmail Ateş†, and Timon Rabczuk‡
Institut für Strukturmechanik, Bauhaus-Universität Weimar
Abstract
Methods that seek to employ machine learning algorithms for solving
engineering problems have gained increased interest. Physics-informed
neural networks (PINNs) are among the earliest approaches, which attempt
to employ the universal approximation property of artificial neural
networks to represent the solution field. In this framework, solving the
original differential equation can be seen as an optimization problem,
where we seek to minimize the residual or some energy functional. We
present the main concepts and implementation steps for PINNs, including
an overview of the basics for defining and training an artificial neural
network model. These methods are applied in several numerical examples of
forward and inverse problems, including the Poisson equation, Helmholtz
equation, linear elasticity and hyperelasticity.
1 Introduction
Machine learning (ML) methods based on artificial neural networks (ANNs)
have become increasingly used, particularly in data-rich fields such as text,
image and audio processing, where they have achieved remarkable results,
greatly surpassing the previous state-of-the-art algorithms. Typically, ML
methods are most efficient in applications where the patterns are difficult to
describe by clear-cut rules, such as handwriting recognition. In these cases, it
may be more efficient to generate the rules by a kind of high-dimensional re-
gression between a sufficiently large number of input-output pairs. However,
other techniques based on ANNs have also been successful in domains where
the rules are relatively easy to describe, such as AlphaZero [59] for game play-
ing and AlphaFold [28] for protein folding. Many of these advancements have
been driven by an increase in computational capabilities, in particular with
regard to Graphics Processing Units (GPUs) and Tensor Processing Units
(TPUs) [27], but also by theoretical advances related to the initialization
and architecture of the ANNs. In the scientific community, there has also
been increased interest in applying the new developments in ANNs and ML
to solve partial differential equations (PDEs) and other engineering problems
of interest.
∗ cosmin.anitescu@uni-weimar.de
† burak.ismail.ates@uni-weimar.de
‡ timon.rabczuk@uni-weimar.de
One can distinguish between supervised, unsupervised and reinforcement
learning. In supervised learning, the aim is to find the mapping between a set of inputs
and outputs, such as images of hand-written digits and the actual digit they
represent, so that when a new input is presented, the correct output can be
predicted by the ML algorithm. A prerequisite for the application of these
methods is the availability of labeled data. In engineering applications, such
approaches can be used e.g. for predicting the solution from the boundary
conditions for a given PDE based on a large set of input/solution pairs of
similar problems, see also operator-approximation methods [36, 41, 38].
However, a drawback is the requirement for possibly large amounts of labeled
data (i.e. solved examples) drawn from the same distribution as the problems
that we would like to solve in the first place. In contrast, in unsupervised
learning the algorithm aims to find patterns in the input data to produce useful
output based on some hard-coded rules or objectives. In classical ML, such
tasks include image segmentation, dimensionality reduction (such as princi-
pal component analysis or PCA), or different types of clustering (grouping
unlabeled data based on similarities or differences). Furthermore, there is
a middle ground category of semi-supervised learning, where a mixture of
labeled and unlabeled data is used in an attempt to overcome some of the
shortcomings of the first two categories. Related to this is the concept of
reinforcement learning, where an agent-based system seeks to learn the actions
that maximize a reward function.
Physics-informed neural networks (PINNs) are more closely related to
unsupervised or semi-supervised learning, whereby satisfying the governing
equations, including the boundary conditions, at a given set of collocation
points defines the objective function. This idea was originally proposed dur-
ing the 1990s in [33, 32] and further extended for domains with irregular
boundaries in [34]. As training neural networks became cheaper, further
developments were reported in [52, 60], among others, including the
extension to time-dependent problems and model parameter inference
(i.e. inverse problems). In [52], the term PINN is first used, along
with the concept of combining (possibly noisy) experimental data with the
governing equation in a small data or semi-supervised setting. Since then,
several improvements have been suggested, such as adaptively choosing the
collocation points [3, 68], variational formulations [69, 55, 29], and domain
decomposition approaches [58]. Moreover, PINNs have been applied to a
wide variety of problems, such as hyperelasticity [46], multiphase
poroelasticity [19], Kirchhoff plates [71], the eikonal equation [64],
biophysics [31], quantum chemistry [50], materials science [57] and others.
In this chapter, we give a concise overview of the main ideas of PINNs,
focusing on the implementation and potential applications to forward and
inverse PDEs. In Section 2, we introduce the building blocks required to
create and train a neural network model, while in Section 3 we present the
collocation and energy minimization approaches, along with a discussion of
enforcing the boundary conditions. In Section 4, we present some numerical
examples, focusing on some pedagogical examples of standard PINNs for
problems that are feasible to compute on regular desktops or even mobile
computers, followed by some concluding remarks in Section 5.
\[ u_{NN} = L_k \circ L_{k-1} \circ \dots \circ L_0 \tag{1} \]
with
\[ L_i(x_i) = \sigma_i(W_i x_i + b_i) = x_{i+1} \quad \text{for } i = 0, \dots, k. \tag{2} \]
Here $W_i$ are matrices of size $m_i \times n_i$, with $n_0 = n$,
$n_{i+1} = m_i$, and $m_k = m$, $x_{i+1}$ and $b_i$ are column vectors of
size $m_i$, and the activation functions $\sigma_i$ are applied
element-wise to the vectors $W_i x_i + b_i$. The entries of the matrices
$W_i$ are called weights and those of the vectors $b_i$ are called biases,
and together they represent the trainable parameters of the neural network.
For $k > 0$, the values $m_0, \dots, m_{k-1}$ can be chosen freely and
represent the number of neurons in each hidden layer. If the number of
hidden layers $k > 1$, we say that $u_{NN}$ is a deep neural network. A
schematic of a feed-forward network with 3 neurons in the input layer, two
hidden layers with 4 and 5 neurons, respectively, and an output layer
consisting of 2 neurons is shown in Figure 1. In a typical application,
many inputs are collected in a batch and evaluated together. Evaluating the
output of the neural network involves mainly linear algebra operations
(such as matrix and vector products) which can be easily parallelized. In
machine learning frameworks, such as TensorFlow [1], PyTorch [48] or
JAX [5], a computational graph is built to record the different operations.
This allows for efficient evaluation and also for computing the gradients
by automatic differentiation methods as will be detailed in Section 2.3.

[Figure 1: Schematic of a feed-forward network with an input layer, two hidden layers and an output layer.]
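As a minimal sketch of the composition in (1) and (2), the forward pass can be written in plain Python without any ML framework. The helper names `dense` and `feed_forward` and the hand-picked weights below are illustrative assumptions, not part of the implementation used in this chapter.

```python
import math


def dense(W, b, x, activation):
    """One layer L_i(x) = sigma_i(W_i x + b_i); W is given as a list of rows."""
    return [activation(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def feed_forward(layers, x):
    """Apply the layers in order, i.e. u_NN = L_k o ... o L_0."""
    for W, b, activation in layers:
        x = dense(W, b, x, activation)
    return x


def identity(z):
    """Linear activation, sigma(z) = z."""
    return z


# Hypothetical 2 -> 3 -> 1 network with hand-picked parameters.
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0], math.tanh),
    ([[1.0, 1.0, -1.0]], [0.5], identity),
]
u = feed_forward(layers, [0.3, -0.2])
```

In a framework such as TensorFlow the same computation is carried out on whole batches of inputs with matrix products, which is what makes it easy to parallelize.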
2.2.1 Linear activation
The simplest activation function is the linear activation, which means
that σ is simply the identity function:
σ(x) = x. (3)
On a network with no hidden layers, a linear activation function between the
input and output layers can be used to perform a linear regression between
the input and output data. For networks with one or more hidden layers,
stacking linear layers is not useful since a composition of linear activations
is still linear. However, linear layers can be combined with other non-linear
activation functions. For example, linear layers can be used as the last layer
to scale the output to arbitrary values. A non-trainable linear transformation
is often used to normalize the input of a network to speed up the training
(optimization) process, as will be detailed in Section 2.3.3.
2.2.3 Sigmoid
The sigmoid activation, also known as the logistic function, is defined as:
\[ \sigma(x) = \frac{1}{1 + \exp(-x)}. \tag{6} \]
This function has an S-shaped form, as shown in Figure 2b. The range of
this function is the interval (0, 1), therefore it is often used in the output
layer of neural networks used for binary classification tasks, where the output
is a probability that the input belongs to a given class. The function is
also differentiable infinitely many times, resulting in a smooth approximation
which is desirable for many applications.
[Figure 2: Plots of the activation functions on the interval [−10, 10]; panel (b) shows the sigmoid and panel (d) the swish function.]
\[ \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}. \tag{7} \]
This function looks similar to the sigmoid activation, maintaining the overall
S-shape and smoothness. An important difference is that the range of the
outputs is (−1, 1) which is centered at 0. This makes the tanh activation more
suitable for deep networks without creating a bias towards positive outputs.
2.2.5 Swish
The swish activation function is defined as:
\[ \mathrm{swish}(x) = \frac{x}{1 + \exp(-x)} = x \cdot \sigma(x), \tag{8} \]
where σ(x) is the sigmoid activation. The plot of this function is shown in
Figure 2d. The swish function looks similar to the ReLU activation. However,
like sigmoid and tanh, it is infinitely differentiable.
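The smooth activations in (6), (7) and (8) can be written down directly. The following is a plain-Python sketch for scalar inputs; it is not protected against overflow for inputs of very large magnitude, and frameworks such as TensorFlow provide their own vectorized versions.

```python
import math


def sigmoid(x):
    """Logistic function, equation (6)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent, equation (7)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def swish(x):
    """Swish activation, equation (8): x * sigmoid(x)."""
    return x * sigmoid(x)
```

Note that tanh(x) = 2 σ(2x) − 1, so tanh is a rescaled sigmoid whose output range is centered at 0.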
We note that there are several other activations that have been proposed
which are similar to ReLU and swish, such as Leaky ReLU [42], exponential
linear units (ELUs) [8], Gaussian error linear units (GELUs) [22], Mish [45]
and others. These have been shown to remedy some of the drawbacks of the
previously considered activation functions and provide a modest improvement
on some machine learning tasks, particularly related to image-based classi-
fication and segmentation tasks [37]. However, from the point of view of
function approximation where partial derivatives are involved, tanh or swish
are also well suited due to their smoothness properties.
The idea of using trainable parameters in the activation function was pro-
posed in [2], and further developed in the context of function and PDE so-
lution approximation in [25, 24, 26, 57] among others. Some adaptive or
trainable activation functions have a different form, for example the original
Swish activation proposed in [53] is of the form:
\[ \sigma_\beta(x) = \frac{x}{1 + \exp(-\beta x)}, \tag{10} \]
2.3 Training
As mentioned earlier, the training process involves optimizing the network
parameters (weights and biases) such that an objective function is minimized.
Suppose the loss function is denoted by $\mathcal{L}(u_{NN}(x; \theta))$,
where $u_{NN}$ is the neural network and $\theta$ represents the trainable
parameters, e.g. the matrices $W_i$ and vectors $b_i$ in (2). In the case of
regression, a commonly used loss function is the mean square error, defined as:
\[ \mathcal{L}_{MSE}(u_{NN}(x_j; \theta)) = \frac{1}{N} \sum_{j=1}^{N} |u_{NN}(x_j) - y_j|^2, \tag{11} \]
where $x_j$, $j = 1, \dots, N$ are input points at which the ground truth output
values $y_j$ are known. For the case of PDE approximations, more complicated
loss functions which contain the partial derivatives of $u_{NN}$ with respect to
the inputs can be devised. Additional terms can be used to incorporate the
governing equations and boundary conditions, as will be detailed in Section
3. Then the process of training a neural network can be described as:
\[ \text{Find } \theta^* = \arg\min_{\theta} \mathcal{L}(u_{NN}(x; \theta)). \tag{12} \]
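To make (11) concrete, the sketch below evaluates the mean square error for a simple stand-in model. The linear model, its parameters and the data points are assumptions chosen only for illustration; in practice `model` would be the neural network and `params` its weights and biases.

```python
def mse_loss(model, params, xs, ys):
    """Mean square error (11) of model(x; params) against targets ys."""
    return sum((model(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def linear_model(x, params):
    """Hypothetical stand-in for u_NN with parameters theta = (w, b)."""
    w, b = params
    return w * x + b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1

loss_exact = mse_loss(linear_model, (2.0, 1.0), xs, ys)  # parameters that fit the data
loss_off = mse_loss(linear_model, (1.0, 1.0), xs, ys)    # wrong slope
```

Training in the sense of (12) amounts to searching for the parameters that drive this value as low as possible.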
2.3.2 Network initialization
When initializing the training process, particular care is needed for the
selection of the initial value. For example, if all the weights and biases are set
to zero, then the gradients with respect to the weights within a layer will have
the same value. In a gradient descent update with a fixed step size, all the
parameters will be updated by the same amount, resulting in the equivalent
of a network with a single neuron per layer. Part of the recent success of deep
neural networks in applications is owed to better techniques for initializing
the values of the network parameters, such as Glorot (Xavier) [14] and He
[21] initialization.
While the initialization method can be seen as a hyper-parameter which
can be tuned according to the problem at hand, a commonly used one is
Glorot uniform, where the weights are chosen from a uniform distribution
$U[-l, l]$, where
\[ l = \sqrt{\frac{6}{n_{in} + n_{out}}}, \tag{13} \]
with $n_{in}$ and $n_{out}$ being the number of input and output neurons for a given
layer. The biases are initialized to zero. This is also the default initialization
used in the TensorFlow deep learning framework.
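A minimal sketch of Glorot uniform initialization according to (13) is given below, using only the Python standard library. The function name and the layer sizes are illustrative assumptions; deep learning frameworks ship their own initializers.

```python
import math
import random


def glorot_uniform(n_in, n_out, rng):
    """Sample an n_out x n_in weight matrix from U[-l, l] with l from (13).

    Biases are initialized to zero, as described in the text.
    """
    limit = math.sqrt(6.0 / (n_in + n_out))
    W = [[rng.uniform(-limit, limit) for _ in range(n_in)]
         for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b, limit


rng = random.Random(0)  # fixed seed for reproducibility
W, b, limit = glorot_uniform(4, 5, rng)
```

Because the weights are drawn randomly, different neurons in the same layer start from different values, which avoids the symmetry problem of an all-zero initialization.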
\[ T_{norm}(x) = \frac{2 (x - x_{min})}{x_{max} - x_{min}} - 1, \tag{14} \]
where $x_{max}$ and $x_{min}$ are the maximum and minimum input values,
respectively. In the case where the input values are points in the
computational domain, $x_{min}$ and $x_{max}$ represent the bounding box of
the domain. These
values must be fixed for training and testing, otherwise incorrect results will
be obtained.
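The normalization (14) can be sketched as a small closure that fixes the bounds once, so that the same transformation is applied during training and testing. The helper name `make_normalizer` and the interval [0, 8] are assumptions used only for illustration.

```python
def make_normalizer(x_min, x_max):
    """Return T_norm from (14), mapping [x_min, x_max] linearly onto [-1, 1]."""
    def t_norm(x):
        return 2.0 * (x - x_min) / (x_max - x_min) - 1.0
    return t_norm


# The bounds are fixed once and reused for both training and test inputs.
t_norm = make_normalizer(0.0, 8.0)
```

For a multi-dimensional domain, the same map would be applied to each coordinate with the corresponding bounds of the bounding box.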
the number of training points and the number of layers and neurons must
be correlated in the sense that, for optimal results, a larger number of pa-
rameters require a larger number of training points to avoid overfitting. The
performance of the network is then measured by testing and validating the
output.
In standard machine learning tasks, it is common to partition the avail-
able data into training/testing/validation subsets. The training data is used
in the optimization procedure (12) for finding the optimal trainable param-
eters (weights and biases). The validation data is used to monitor the per-
formance of the network by just evaluating the loss function. Tuning the
network hyperparameters, such as the type of activation function, network
size and optimization algorithm may require some trial and error. Although
the validation data is not used directly in the optimization process, it may
indirectly create a bias in the process of hyperparameter tuning. Therefore,
when the performance on the validation data is satisfactory, the network may
be further validated using the test set. A typical split is to use 80% of the
data for training and ca. 20% for testing and validation, although these ra-
tios may vary depending on the problem at hand. For example, in the case
of physics informed neural networks, where training and testing data are just
points in the domain, it may be useful to test the network by generating
many more points from a higher resolution sample. We note that in most
machine learning models, optimization is the most computationally intensive
part. Therefore the amount of training data is most closely related to the
amount of random access memory (RAM) and numerical (floating point) op-
erations required, while testing (evaluating) the model is comparatively much
cheaper.
In this section, we describe some of the pitfalls involved in training and
testing a network, and the countermeasures that can be implemented.
approximated occurs in interpolation by high-order polynomials, where the
interpolant can oscillate wildly between the interpolation points. In this
case, the training loss value decreases to a low value (even zero), while the
validation loss can be much higher.
A good strategy to avoid overfitting or underfitting is to monitor both
the training and validation losses and to stop the training when the validation
loss begins to increase. To illustrate, the results for regression of the function
u(x) = sin(πx) for x ∈ [−1, 1] are shown in Figure 3. A random uniform
noise with magnitude in the interval (0, 0.1) was added to the training and
validation data, which consist of 201 and 50 points, respectively. A neural
network with two hidden layers consisting of 64 neurons each has been used,
together with the tanh activation function for the first two layers and linear
activation in the last layer. The ADAM optimizer with the default parame-
ters and learning rate of 0.001 is used to minimize the mean square error of
the difference between the predicted and training values.
We observe from Figure 3a that after 300 iterations, the neural network
can start to approximate the sinusoidal function, but it is still quite far
away from the actual shape (underfitting). The training loss value at this
stage is 0.1129, while the validation loss is 0.1393. After 10000 iterations,
the approximation is already quite good, with only a small error between
the prediction and the actual function (without noise) as shown in Figure
3b. Here, the training loss is 0.0033 and the validation loss is quite close at
0.0035. Next, if we continue training, we start to observe that after many
more iterations, the training and validation loss start to diverge (see Figure
3d). After 100000 iterations, we notice that the predicted function has some
oscillations and spikes as it tries to capture the noise in the data as shown
in Figure 3c. At this stage, the training loss is 0.0026 and the validation
loss is 0.0037.
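One possible way to implement the stopping rule described above is a patience-based check on the validation loss. This is only a sketch: the function name, the `patience` parameter and the loss values are made up for illustration and are not the values from Figure 3.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a recorded validation-loss history.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs; the best epoch is the one to restore.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improving at first, then slowly rising.
val_history = [0.9, 0.5, 0.3, 0.31, 0.32, 0.33]
stop_epoch, best_epoch = early_stopping_epoch(val_history, patience=2)
```

In practice the network parameters from `best_epoch` would be saved and restored after stopping.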
[Figure 3: Regression of u(x) = sin(πx) from noisy data, showing the prediction, ground truth, training data and validation data: (a) Underfit, 300 iterations; (b) Proper fit, 10000 iterations; (c) prediction after 100000 iterations; (d) training and validation loss.]
2.5 Optimizers
We will now briefly describe the optimization algorithms commonly used
to train a neural network (i.e. to minimize the loss function). First, we mention
that two types of optimization strategies can be employed: full-batch training
and mini-batch training. In the former, the entire data set is used during
a forward pass through the network and the gradients with respect to all
the data points are computed in one step. In mini-batch training on the
other hand, the training data is split into several sub-sets of (approximately)
the same size called mini-batches. Then an optimization sub-step is taken
with respect to each mini-batch. When the entire data set is seen by the
optimization algorithm once, then a training epoch is completed. In general,
first order optimization methods, like gradient descent, are commonly used
with mini-batch training, while algorithms that make use of (approximations
of) second derivative information use full batch training. A detailed survey
of optimization methods used in machine learning has been presented in [62].
where η is the learning rate. In the case of mini-batch training, since the
mini-batches are typically randomly selected, the method is called stochastic
gradient descent (SGD). Using mini-batches has been shown to improve the
robustness, allowing the optimizer to find the global optimum (or better local
optima) even for non-convex problems [9, 44].
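The mini-batch procedure can be sketched for a one-parameter least-squares problem as follows. The update used is the standard gradient descent step, parameter minus learning rate times gradient; the model y = w x, the data and all hyperparameter values are assumptions chosen for illustration.

```python
import random


def sgd_fit(xs, ys, lr, batch_size, epochs, seed=0):
    """Fit y = w * x by mini-batch stochastic gradient descent on the MSE."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # random mini-batches make the method stochastic
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mini-batch MSE with respect to w
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad  # one optimization sub-step
    return w


xs = [i / 10.0 for i in range(1, 11)]
ys = [3.0 * x for x in xs]  # noise-free data with true slope 3
w_fit = sgd_fit(xs, ys, lr=0.1, batch_size=5, epochs=200)
```

Each pass of the outer loop sees the whole data set once and therefore corresponds to one training epoch.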
bias-correction is introduced in (18) and (19). This technique can smooth
out the oscillations in the gradients and usually improves the convergence
compared to the standard SGD optimizer.
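For illustration, the sketch below implements the commonly stated form of the ADAM update with bias-corrected moment estimates for a single scalar parameter. The hyperparameter defaults follow widely used values, and the quadratic test function is an assumption chosen for this example.

```python
import math


def adam_minimize(grad, theta, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a scalar function with ADAM, given its gradient `grad`."""
    m = 0.0  # running average of the gradients (first moment)
    v = 0.0  # running average of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero initialization of m and v
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def grad_quadratic(theta):
    """Gradient of the hypothetical objective (theta - 2)^2."""
    return 2.0 * (theta - 2.0)


theta_one_step = adam_minimize(grad_quadratic, 0.0, lr=0.1, steps=1)
theta_final = adam_minimize(grad_quadratic, 0.0, lr=0.05, steps=2000)
```

Thanks to the bias correction, the very first step has a magnitude close to the learning rate regardless of the size of the gradient.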
where F represents a differential operator for the domain interior, G is a dif-
ferential operator for the boundary conditions, u is the unknown function,
Ω and Γ are the computational domain and its boundary, and n is the outer
normal vector to the boundary. The interior differential operator may con-
tain any order of derivatives with respect to the inputs, while the boundary
operator may contain any order of derivative with respect to the outer normal
vector for Neumann-type boundary conditions.
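The loss function for a neural network $u_{NN}(x; \theta)$ with trainable
parameters $\theta$ (which include the weights and biases for each layer) can
be constructed based on the “mean square error” (MSE) evaluated at a set of
$N_{int}$ interior collocation points $\{x_i^{int}\}$, $i = 1, \dots, N_{int}$
and a set of $N_{bnd}$ boundary collocation points $\{x_j^{bnd}\}$,
$j = 1, \dots, N_{bnd}$ as:
\[
\begin{aligned}
\mathcal{L}_{coll}(\theta) = {} & \frac{\lambda_1}{N_{int}} \sum_{i=1}^{N_{int}} \mathcal{F}\left(u_{NN}(x_i^{int}; \theta), \frac{\partial u_{NN}(x_i^{int}; \theta)}{\partial x_1}, \dots\right)^2 \\
& + \frac{\lambda_2}{N_{bnd}} \sum_{j=1}^{N_{bnd}} \mathcal{G}\left(u_{NN}(x_j^{bnd}; \theta), \frac{\partial u_{NN}(x_j^{bnd}; \theta)}{\partial n}, \dots\right)^2.
\end{aligned}
\tag{23}
\]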
Here $\lambda_1$ and $\lambda_2$ are weight terms; usually choosing
$\lambda_2 \gg \lambda_1$ helps to speed up convergence by ensuring that the
boundary conditions are satisfied. Adaptive methods for choosing the weights
have also been proposed in [67]. In the case of
time-dependent problems, the classical PINNs use a space-time discretization,
where the time is considered as an additional dimension.
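The structure of the collocation loss (23) can be illustrated on a model problem, here −u'' = π² sin(πx) on (0, 1) with u(0) = u(1) = 0. In the sketch below a two-parameter trial function with an analytic second derivative stands in for the neural network and automatic differentiation; the problem, the trial function and all names are assumptions made for illustration.

```python
import math


def u_trial(x, a, c):
    """Hypothetical trial function standing in for u_NN(x; theta)."""
    return a * math.sin(math.pi * x) + c


def u_trial_xx(x, a, c):
    """Second derivative of the trial function (autodiff in a real PINN)."""
    return -a * math.pi ** 2 * math.sin(math.pi * x)


def source(x):
    """Right-hand side f(x) = pi^2 sin(pi x)."""
    return math.pi ** 2 * math.sin(math.pi * x)


def collocation_loss(a, c, x_int, x_bnd, lam1=1.0, lam2=1.0):
    """Weighted sum of interior residual MSE and boundary MSE, as in (23)."""
    interior = sum((u_trial_xx(x, a, c) + source(x)) ** 2 for x in x_int) / len(x_int)
    boundary = sum(u_trial(x, a, c) ** 2 for x in x_bnd) / len(x_bnd)
    return lam1 * interior + lam2 * boundary


x_int = [i / 20.0 for i in range(1, 20)]  # interior collocation points
x_bnd = [0.0, 1.0]                        # boundary collocation points
loss_exact = collocation_loss(1.0, 0.0, x_int, x_bnd)
loss_shifted = collocation_loss(1.0, 0.1, x_int, x_bnd, lam2=10.0)
```

The exact solution (a = 1, c = 0) gives a loss of essentially zero, while a constant shift c leaves the interior residual unchanged and is penalized only through the boundary term weighted by λ2.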
with Γ denoting the portion of the boundary over which the boundary term
is evaluated. Then we can define the loss function of the form:
\[ \mathcal{L}_{energy}(\theta) = \int_{\Omega} H_{int}(u_{NN}) \, d\Omega + \int_{\Gamma} H_{bnd}(u_{NN}) \, d\Gamma. \tag{25} \]
$1, \dots, Q_{int}$ for the interior integral and quadrature points
$\{q_j^{bnd}\}$ and weights $\{w_j^{bnd}\}$ with $j = 1, \dots, Q_{bnd}$ for
the boundary integral, i.e.
\[ \mathcal{L}_{energy}(\theta) \approx \sum_{i=1}^{Q_{int}} H_{int}(u_{NN}(q_i^{int})) \, w_i^{int} + \sum_{j=1}^{Q_{bnd}} H_{bnd}(u_{NN}(q_j^{bnd})) \, w_j^{bnd}. \tag{26} \]
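The quadrature approximation (26) can be sketched for the interior term in one dimension. The example assumes the energy density H_int = ½ u'² − f u for the problem −u'' = f on (0, 1) with f = π² sin(πx), evaluated at the exact solution u = sin(πx), for which the energy equals −π²/4. The composite two-point Gauss-Legendre rule and all names are illustrative choices; the boundary term is omitted.

```python
import math


def gauss_points(a, b, n_cells):
    """Two-point Gauss-Legendre points and weights on n_cells equal
    sub-intervals of [a, b]."""
    h = (b - a) / n_cells
    xi = 1.0 / math.sqrt(3.0)  # reference points are -xi and +xi on [-1, 1]
    points, weights = [], []
    for k in range(n_cells):
        mid = a + (k + 0.5) * h
        for s in (-xi, xi):
            points.append(mid + 0.5 * h * s)
            weights.append(0.5 * h)
    return points, weights


def energy_density(x):
    """H_int for u = sin(pi x): 0.5 * u'^2 - f * u with f = pi^2 sin(pi x)."""
    u = math.sin(math.pi * x)
    du = math.pi * math.cos(math.pi * x)
    f = math.pi ** 2 * math.sin(math.pi * x)
    return 0.5 * du * du - f * u


q, w = gauss_points(0.0, 1.0, 20)
energy = sum(energy_density(qi) * wi for qi, wi in zip(q, w))
```

In a PINN based on energy minimization, `u` and `du` would be obtained from the network and automatic differentiation, and the sum would be minimized with respect to the network parameters.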
4 Numerical Applications
By using a small set of training or input data (e.g., initial and boundary
conditions and/or measured data) as well as governing physical laws, PINNs
attempt to approximate the solution of the problem. Complex nonlinear sys-
tems and phenomena in physics and engineering are described by differential
equations.
PINNs have shown their capabilities to solve both forward and inverse
problems in science and engineering. A forward problem can be defined as
a problem of finding a particular effect of a given cause utilizing a physical
or mathematical model, whereas an inverse problem refers to finding causes
from the given effects [63]. We can investigate the one-dimensional
steady-state heat equation with a source term to give more concrete examples of
forward and inverse problems.
Let us consider a rod with unit length along the x-axis and the heat
flowing through this rod with a heat source as our model. We can represent
the temperature at location x on the rod as T(x). Under certain assumptions,
such as the rod being perfectly insulated and the source term q(x) being
known, the governing equation can be written as:
\[ \kappa \frac{d^2 T}{dx^2} + q(x) = 0 \tag{28} \]
where κ > 0 is the thermal diffusivity constant. Finding temperature at any
location on the rod is a forward problem. On the other hand, finding the
constant κ, which is a rod feature, from observed temperature data is a good
example of an inverse problem. These examples will be detailed in Sections
4.1 and 4.2.
To summarize, the procedures described in the previous sections for solving
differential equations with PINNs are made concrete through the numerical
applications in this section. The PINN solution of both forward and inverse
problems is discussed using simple as well as more complex examples.
of neurons with non-linear activation functions. Of course, the outcome of
the first forward propagation will not be compatible with the true solution.
Therefore, at this point, the physics and boundary knowledge will guide the
neural network to approximate the ground truth by updating the weights
and biases of the neural network. Let us elaborate on this step by step and
reinforce these steps with code snippets. Note that the code is written
for TensorFlow version 2.x with the Keras API.
We first generate 100 equidistant points in our domain. Here the choice
of the number of points is up to the user. However, it should be noted that
the number of points also has some influence on the number of iterations or
network size required to have results with similar accuracy. The ADAM op-
timizer with a learning rate of 0.005 is used for this example. An input layer,
three hidden layers with 32 neurons equipped with tanh activation function,
and an output layer form the neural network (see Figure 4). The input and
output layers have one neuron each since the input for the network is only
one spatial dimension, and the output is the temperature at these points.
By setting the number of iterations to 1000 and introducing the boundary
condition data in TensorFlow tensors, we complete the initial settings of our
model (see Listing 1).
[Figure 4: Architecture of the neural network for the heat equation example: one input neuron (x), three hidden layers with 32 neurons each, and one output neuron (T).]
    # Output is one-dimensional
    model.add(tf.keras.layers.Dense(1))
    return model

# determine the model size (3 hidden layers with 32 neurons each)
model = buildModel(3, 32)
Then we define our loss function, which is composed of two parts, the
boundary loss and the physics loss, as formulated in (30). Here, the loss term
tells us how far away our model is from 'reality'. To measure these loss
terms, we use the mean square error formulation introduced in Section 2.3,
see (11).
\[ \mathcal{L}_{Loss} = \mathcal{L}_{BCs} + \mathcal{L}_{Physics} \tag{30} \]
Constructing the boundary loss is easier compared to the physics loss. Our
model's predictions should be compatible with the prescribed boundary
conditions, which are T(0) = 0 and T(1) = 0 in our case. Thus, our goal is
to minimize the mean square error between our model's temperature prediction
at both ends of the rod and the real temperature values at these points,
which must be 0. The boundary condition loss is given by (31).
\[ \mathcal{L}_{BCs} = \frac{\lambda_1}{N_B} \sum_{j=1}^{N_B = 2} |T_{NN}(x_j) - y_j|^2, \tag{31} \]
where $N_B = 2$ since we have boundary condition data for two points, namely
T(0) = T(1) = 0. The weight term $\lambda_1$ is taken as 1.
We also need to provide information about the interior points to get rea-
sonable results. Although we do not know the temperature data for interme-
diate points on the rod, we know those points have to satisfy some physical
laws that we derived in (28). In other words, our temperature prediction
needs to satisfy (28). When we take the second derivative of the temperature
prediction of the network with respect to x and add the source term q(x)
divided by κ, the result must be 0. Thus, the physics loss for our example
becomes:
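\[ \mathcal{L}_{Physics} = \frac{\lambda_2}{N_P} \sum_{j=1}^{N_P = 100} \left| \left. \frac{d^2 T_{NN}}{dx^2} \right|_{x = x_j} + \frac{q(x_j)}{\kappa} \right|^2, \tag{32} \]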
    return mse_bcs

        # This tape is for derivatives with
        # respect to trainable variables
        tape.watch(model.trainable_variables)
        Loss = loss_func(x, bcs_x_tensor, bcs_T_tensor)
    g = tape.gradient(Loss, model.trainable_variables)
    return Loss, g

# Training loop
for i in range(N + 1):
    loss = train_step()
    # print the loss every 100 epochs
    if i % 100 == 0:
        print("Epoch {:05d}: loss = {:10.8e}".format(i, loss))
Once the training process is completed with the desired loss value, we
can validate the output by performing one forward pass with a test data set,
which is typically formed in the same domain as the training data set. In
our example, the training data consisted of 100 equidistant points between
0 and 1. We can choose our test data set as 200 equidistant points in the
same domain. Figure 5 shows that the model's prediction closely matches the
analytical solution.
Figure 5: Exact solution and the prediction of the model (T(x) plotted
against x). The predicted solution coincides with the ground truth, which is
T(x) = −5x³ + 2x² + 3x.
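As a quick consistency check, the ground truth of Figure 5 can be verified against (28) and the boundary conditions. The sketch assumes κ = 0.5 and q(x) = 15x − 2, the values that appear in the inverse-problem example of this chapter; the function names are illustrative.

```python
def T_exact(x):
    """Ground truth from Figure 5."""
    return -5.0 * x ** 3 + 2.0 * x ** 2 + 3.0 * x


def T_exact_xx(x):
    """Second derivative of the ground truth, computed by hand."""
    return -30.0 * x + 4.0


def source(x):
    """Assumed source term q(x) = 15x - 2."""
    return 15.0 * x - 2.0


kappa = 0.5  # assumed value of the constant in (28)

# 200 equidistant test points on [0, 1]
xs = [i / 199.0 for i in range(200)]
max_residual = max(abs(kappa * T_exact_xx(x) + source(x)) for x in xs)
```

The residual of (28) vanishes up to round-off at all test points, and T(0) = T(1) = 0 as required by the boundary conditions.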
where µ and λ are the Lamé constants, and I is the identity tensor. The
Dirichlet boundary conditions are u(x) = û for x ∈ ΓD and the Neumann
boundary conditions are σn = t̂ for x ∈ ΓN , where n is the normal vector.
For this example (see Figure 6), Ω is a rectangle with corners at (0,0) and
(8,2). Letting x = (x, y), and u = (u, v) the Dirichlet boundary conditions
for x = 0 are:
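\[
\begin{aligned}
u(x, y) &= \frac{P y}{6 E I} (2 + \nu) \left( y^2 - \frac{W^2}{4} \right) \\
v(x, y) &= -\frac{P}{6 E I} \left( 3 \nu y^2 L \right)
\end{aligned}
\tag{36}
\]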
Commonly, a parabolic traction at x = 8
\[ p(x, y) = P \frac{y^2 - y W}{2 I} \tag{37} \]
is applied, where P = 2 is the maximum traction, $E = 10^3$ is Young's modulus,
ν = 0.25 is the Poisson ratio and $I = \frac{b W^3}{12}$ is the second moment of area of the
[Figure 6: Geometry of the beam: length L = 8, width W = 2, thickness b = 1, maximum traction Pmax = 2.]
where Ψ is the strain energy density and ϕ indicates the mapping of points
on the body from the initial/undeformed to the deformed state.
Figure 8: The architecture of the feed-forward neural network for the
Timoshenko beam problem. The network consists of one input layer, one output
layer, and 3 hidden layers with 20 neurons each. The two neurons in the input
layer take the x and y coordinates, and the output neurons give the
displacements u and v. Here a denotes the activation function, which is swish
in this example. Superscripts denote the layer number, and subscripts denote
the neuron number in the relevant layer.
in which $V_\Omega$ is the volume and $N_\Omega$ is the number of data points
within the solid; $N_{\partial\Omega_N}$ and $A_{\partial\Omega_N}$ denote the
number of points on the surface subjected to the force and the surface area,
respectively.
Let us now consider a 3D cuboid of length L = 1.25, width W = 1.0 and
depth H = 1.0. It is fixed at the left surface and twisted 60° counter-clockwise
[Figure 9 panels: (a) Predicted displacements in x-axis; (b) Exact displacements in x-axis]

Figure 10: The difference between the exact solution and the predicted
solution for displacements on the beam in the x and y directions:
(a) estimation error for displacements in x-axis; (b) estimation error for
displacements in y-axis.

Figure 11: The 3D hyperelastic cuboid (L = 1.25, W = 1, H = 1) is fixed at
the left-hand side and twisted 60° counter-clockwise.
rection, 40 equally spaced points, 64000 points in total, are placed over the
whole domain (see Figure 12a). The neural network consists of 3 hidden
layers, and each hidden layer has 30 neurons with a tanh activation function.
The input and output layers have three neurons corresponding to coordinates
of the initial configuration of the designated points and their deformed
coordinates after loading, respectively. The network is trained for 50
iterations and the parameters are optimized by the L-BFGS optimizer.
The predicted deformed shape of the cuboid is given in Figure 12b. A line
passing through two points on the cube A(0.625, 1, 0.5) and B(0.625, 0, 0.5)
is drawn to compare the displacement predictions and the real displacements
on the line. We showed in [55] that a neural network with the same setup
but trained with 25 steps has an error in the L2 norm of 0.13210, whereas
the finite element model has error 0.13275 for estimating the displacements
Figure 12: Training points on the cuboid and its predicted deformed shape
after training: (a) 64000 equidistant points over the domain; (b) deformed
shape of the 3D hyperelastic cuboid.
of the network according to the governing equations will be added to the loss
function.
First, the initial settings are applied (see Listing 4), similar to the
forward heat conduction problem defined in Section 4.1. However, the constant
κ is not known in advance for this problem. We start from an initial guess of
κ = 0.1 for the thermal diffusivity constant. The neural network has three
hidden layers with 32 neurons each, and the tanh function is used as the
activation function. The ADAM optimizer updates the network parameters with a
fixed learning rate of 0.001. The number of epochs is set to 6000.
import tensorflow as tf

# Set the random seed so that the model starts from the same random
# values (e.g. initial weights of the network) and the results are
# reproducible whenever the code is run.
tf.random.set_seed(123)
# Number of iterations
N = 6000

    model = tf.keras.Sequential()
    # Output is one-dimensional
    model.add(tf.keras.layers.Dense(1))
    return model
After defining the model settings, we can proceed with constructing the
loss function. The loss function (43) consists of three parts, namely, boundary
loss, physics loss and data loss.
with
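\[
\begin{aligned}
\mathcal{L}_{BCs} &= \frac{\lambda_1}{N_B} \sum_{i=1}^{N_B} |T_{NN}(x_i) - y_i|^2, \\
\mathcal{L}_{Physics} &= \frac{\lambda_2}{N_P} \sum_{j=1}^{N_P} \left| \left. \frac{d^2 T_{NN}}{dx^2} \right|_{x = x_j} + \frac{q(x_j)}{\kappa} \right|^2, \\
\mathcal{L}_{Data} &= \frac{\lambda_3}{N_D} \sum_{j=1}^{N_D} |T_{NN}(x_j) - y_j|^2
\end{aligned}
\tag{44}
\]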
@tf.function
def boundary_loss(bcs_x_tensor, bcs_T_tensor):
    # Mean square error between predicted and prescribed boundary values
    predicted_bcs = model(bcs_x_tensor)
    mse_bcs = tf.reduce_mean(tf.square(predicted_bcs - bcs_T_tensor))
    return mse_bcs

# Source term
def source_func(x):
    return 15 * x - 2

@tf.function
def physics_loss(x):
    # Residual of kappa * T'' + q(x) = 0 at the interior points
    x = x[1:-1]
    predicted_Txx = second_deriv(x)
    mse_phys = tf.reduce_mean(
        tf.square(predicted_Txx * kappa + source_func(x)))
    return mse_phys

@tf.function
def data_loss(x):
    # Mismatch between observed and predicted interior temperatures
    x = x[1:-1]
    ob_T = solution(x)[:, None]
    mse_data = tf.reduce_mean(tf.square(ob_T - model(x)))
    return mse_data

@tf.function
def loss_func(x):
    # Total loss: boundary + physics + data terms
    bcs_loss = boundary_loss(bcs_x_tensor, bcs_T_tensor)
    phys_loss = physics_loss(x)
    ob_loss = data_loss(x)
    loss = phys_loss + ob_loss + bcs_loss
    return loss
The training and testing procedures are the same as for the forward problem.
Again, the gradients of the loss function with respect to κ and the trainable
variables, i.e. the weights and biases of the network, are determined
with backpropagation. Then, the trainable variables and the value of κ are
updated by the ADAM optimizer using these gradients. This iterative
procedure is repeated for the prescribed number of epochs, after which the
loss is expected to be close to a minimum.
Listing 6: Training
# taking gradients of the loss function w.r.t. trainable variables
# and kappa
@tf.function
def get_grad():
    with tf.GradientTape(persistent=True) as tape:
        # This tape is for derivatives with
        # respect to trainable variables
        tape.watch(model.trainable_variables)
        tape.watch(kappa)
        Loss = loss_func(x)
    g = tape.gradient(Loss, model.trainable_variables)
    g_kappa = tape.gradient(Loss, kappa)
    return Loss, g, g_kappa

# optimizing and updating the weights and biases of the model and
# kappa by using the gradients
@tf.function
def train_step():
    # Compute current loss and gradient w.r.t. parameters
    loss, grad_theta, grad_kappa = get_grad()
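As a self-contained illustration of how an ADAM update can drive κ from the initial guess toward its true value, consider a simplified setting in which the temperature field is fixed and only κ is optimized. This is a sketch under stated assumptions and not the training code of this section: the curvature T''(x) = −30x + 4 is a hypothetical profile consistent with κ = 0.5, the update rule is written out by hand rather than taken from a library, and the hyperparameters other than the learning rate and the number of steps are common default choices.

```python
import math


def source(x):
    """Source term q(x) = 15x - 2."""
    return 15.0 * x - 2.0


def d2T_dx2(x):
    """Assumed second derivative of the temperature, consistent with kappa = 0.5."""
    return -30.0 * x + 4.0


xs = [j / 10 for j in range(1, 10)]  # illustrative interior points


def loss_and_grad(kappa):
    """Mean squared residual kappa*T'' + q and its derivative w.r.t. kappa."""
    n = len(xs)
    loss = sum((kappa * d2T_dx2(x) + source(x)) ** 2 for x in xs) / n
    grad = sum(2.0 * (kappa * d2T_dx2(x) + source(x)) * d2T_dx2(x)
               for x in xs) / n
    return loss, grad


def adam_fit(kappa, steps=6000, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Hand-written Adam iteration for a single scalar parameter."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        _, g = loss_and_grad(kappa)
        m = beta1 * m + (1.0 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)             # bias corrections
        v_hat = v / (1.0 - beta2 ** t)
        kappa -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return kappa


kappa_fit = adam_fit(0.1)  # start from the initial guess kappa = 0.1
print(kappa_fit)
```

In the full PINN the same kind of update is applied simultaneously to κ and to the network weights, so the convergence of κ also depends on how well the network fits the data.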
[Figure: temperature T(x), with predicted κ = 0.5000 and real κ = 0.50, and the predicted κ compared with the real value of κ over iterations 0 to 6000]
\[ \nabla^2 u + k^2 u = 0 \tag{45} \]
where ∇² is the Laplace operator and k is the wave number. The solution
of the problem is u(x, y) for (x, y) ∈ Ω. We investigate an inverse acoustic
duct problem, adopted from [3], which is governed by a complex-valued
Helmholtz equation in which k is unknown and u(x, y) is known at some
points in the domain.
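The role of the wave number in (45) can be checked numerically on a simple complex-valued field. The sketch below is an illustration only: the plane wave exp(ikx), the value k = 6, the evaluation point and the finite-difference step are assumptions made for the example and are not taken from the duct problem itself.

```python
import cmath


def plane_wave(x, y, k):
    """u(x, y) = exp(i k x), a complex-valued solution of the Helmholtz equation."""
    return cmath.exp(1j * k * x)


def helmholtz_residual(u, x, y, k, h=1e-3):
    """Five-point finite-difference approximation of laplace(u) + k^2 u."""
    laplacian = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
                 - 4.0 * u(x, y)) / h ** 2
    return laplacian + k ** 2 * u(x, y)


k_true = 6.0   # assumed wave number for this illustration
k_guess = 1.0  # a deliberately wrong wave number


def u(x, y):
    return plane_wave(x, y, k_true)


res_true = helmholtz_residual(u, 0.7, 0.3, k_true)
res_guess = helmholtz_residual(u, 0.7, 0.3, k_guess)
print(abs(res_true), abs(res_guess))
```

With the correct wave number the residual is zero up to discretization error, whereas a wrong k leaves a residual of magnitude |k_true² − k_guess²|. An inverse PINN exploits this sensitivity by treating k as a trainable variable.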
We can write (45) with domain information and boundary conditions as:
Here L_BCs, L_Physics and L_Data refer to the losses obtained from the boundary
conditions, the governing equation and the measured data, respectively. The
weighting factor λ1 is 100, whereas λ2 and λ3 are 1; N_B is the number of
boundary points, and N_P and N_D are the number of interior collocation points
where the physics loss is computed and the number of points where observed
data is available, respectively. In this problem, 784 equidistant points
(28 × 28) are created, of which N_P = N_D = 676 are interior points and
N_B = 108 are boundary points (see Figure 14).
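The point counts quoted above follow directly from the 28 × 28 grid and can be reproduced with a few lines. This is a sketch only; the rectangle [0, 2] × [0, 1] is an assumption read off the axes of Figure 14.

```python
n = 28  # grid points per direction

# Assumed domain [0, 2] x [0, 1], read off the axes of Figure 14
xs = [2.0 * i / (n - 1) for i in range(n)]
ys = [1.0 * j / (n - 1) for j in range(n)]

interior, boundary = [], []
for i, x in enumerate(xs):
    for j, y in enumerate(ys):
        # A point is a boundary point if it lies on the first or last row/column
        on_edge = i in (0, n - 1) or j in (0, n - 1)
        (boundary if on_edge else interior).append((x, y))

print(len(interior), len(boundary), len(interior) + len(boundary))
```

The interior points form a 26 × 26 block, and the remaining points form the outer ring of the grid.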
The neural network consists of 5 hidden layers with 30 neurons each and
the tanh activation function. The data is normalized to the interval [−1, 1]
before being processed. The loss function is minimized first with the ADAM
optimizer for 5000 iterations and subsequently with the quasi-Newton
method (L-BFGS) for 6200 iterations. The estimated solution for u(x, y)
and the exact solution are shown in Figure 15.
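The normalization step mentioned above can be written as a linear map of each coordinate. The following is a minimal sketch assuming a simple min-max scaling; the exact scaling used for the reported results is not specified here, and the function name and sample values are illustrative.

```python
def normalize(values):
    """Map values linearly so that the minimum goes to -1 and the maximum to +1."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


x_raw = [0.0, 0.5, 1.0, 1.5, 2.0]  # illustrative coordinates
x_scaled = normalize(x_raw)
print(x_scaled)
```

Scaling the inputs to [−1, 1] keeps them in the range where the tanh activation is not saturated, which usually helps the optimizers in the early iterations.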
The initial guess for k was 1, and the value of k estimated by the neural
network after training is 5.999. The relative L2 error norm for the real part
of the solution is 0.0015. A comparison between the predicted solution and
the exact solution can be found in Figure 16.
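For reference, the relative L2 error norm used above is the Euclidean norm of the difference between prediction and exact solution, divided by the norm of the exact solution. The sketch below uses made-up sample values purely to show the computation; they are not results from the Helmholtz example.

```python
import math


def relative_l2_error(predicted, exact):
    """Return ||predicted - exact||_2 / ||exact||_2 for two equally long sequences."""
    num = math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, exact)))
    den = math.sqrt(sum(e ** 2 for e in exact))
    return num / den


u_exact = [3.0, 4.0]  # illustrative values only
u_pred = [3.0, 4.5]
err = relative_l2_error(u_pred, u_exact)
print(err)
```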
Figure 14: Collocation points for the 2D Helmholtz equation. Black points
depict the boundary points where Neumann boundary conditions are imposed,
whereas the red points show the Robin boundary points. The blue points
represent the inner collocation points where the physics loss and data loss
are computed.
(a) Predicted solution for the real part of Helmholtz equation
(b) Exact solution for the real part of Helmholtz equation
(c) Predicted solution for the imaginary part of Helmholtz equation
(d) Exact solution for the imaginary part of Helmholtz equation
Figure 15: Predicted and exact values for real and imaginary parts of the
Helmholtz equation.
(a) Error distribution between predicted and exact solution for the real part
(b) Error distribution between predicted and exact solution for the imaginary part
Figure 16: Error distribution for real and imaginary parts of the Helmholtz
equation.
5 Conclusions
In this chapter, we have introduced some of the main building blocks
for PINNs. The main idea is to cast the process of solving a PDE as an
optimization problem, where either the residual or some energy functional
related to the governing equations is minimized. We showed the imple-
mentation of PINNs for both simple and more advanced inverse problems.
First, a one-dimensional steady state heat conduction problem with a source
term was solved for the unknown thermal diffusivity constant κ. Later, a
complex-valued Helmholtz equation for an inverse acoustic duct problem was
investigated. The wave number k is unknown in the beginning, and it is
approximated by the PINN model. Unlike the forward problems, we have
an additional term in the loss function, which is formed as the mean square
error between the measured data and the model’s prediction.
By taking advantage of modern machine learning libraries, it is possi-
ble to write fairly succinct programs that approximate the solution or some
quantity of interest, while at the same time taking advantage of the built-
in parallelization offered by multi-processor and GPU architectures. Nev-
ertheless, solving PDEs by the optimization of parameters in a “standard”
fully-connected neural network is less efficient than current methods such as
finite elements. More advances seem possible by combining machine learning
algorithms with classical methods for solving PDEs which make use of the
available knowledge for approximating the solutions or quantities of interest.
References
[1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, et al. TensorFlow: Large-
Scale Machine Learning on Heterogeneous Systems. Software available
from tensorflow.org. 2015. url: https://www.tensorflow.org/.
[2] F. Agostinelli, M. Hoffman, P. Sadowski, and P. Baldi. “Learning acti-
vation functions to improve deep neural networks”. In: arXiv preprint
arXiv:1412.6830 (2014).
[3] C. Anitescu, E. Atroshchenko, N. Alajlan, and T. Rabczuk. “Artifi-
cial neural network methods for the solution of second order boundary
value problems”. In: Computers, Materials and Continua 59.1 (2019),
pp. 345–359.
[4] A. Apicella, F. Donnarumma, F. Isgrò, and R. Prevete. “A survey on
modern trainable activation functions”. In: Neural Networks 138 (2021),
pp. 14–32.
[5] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, et al. JAX: compos-
able transformations of Python+NumPy programs. Version 0.2.5. 2018.
url: http://github.com/google/jax.
[6] C. G. Broyden. “The convergence of a class of double-rank minimiza-
tion algorithms: 2. The new algorithm”. In: IMA journal of applied
mathematics 6.3 (1970), pp. 222–231.
[7] Y. Chen, L. Lu, G. E. Karniadakis, and L. Dal Negro. “Physics-informed
neural networks for inverse problems in nano-optics and metamateri-
als”. In: Optics express 28.8 (2020), pp. 11618–11633.
[8] D.-A. Clevert, T. Unterthiner, and S. Hochreiter. “Fast and accurate
deep network learning by exponential linear units (ELUs)”. In: arXiv
preprint arXiv:1511.07289 (2015).
[9] C. De Sa, C. Re, and K. Olukotun. “Global convergence of stochastic
gradient descent for some non-convex matrix problems”. In: Interna-
tional conference on machine learning. PMLR. 2015, pp. 2332–2341.
[10] I. Depina, S. Jain, S. Mar Valsson, and H. Gotovac. “Application of
physics-informed neural networks to inverse problems in unsaturated
groundwater flow”. In: Georisk: Assessment and Management of Risk
for Engineered Systems and Geohazards 16.1 (2022), pp. 21–36.
[11] J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, et al. “Tensorflow dis-
tributions”. In: arXiv preprint arXiv:1711.10604 (2017).
[12] R. Fletcher. “A new approach to variable metric algorithms”. In: The
computer journal 13.3 (1970), pp. 317–322.
[13] L. Floridi and M. Chiriatti. “GPT-3: Its nature, scope, limits, and con-
sequences”. In: Minds and Machines 30.4 (2020), pp. 681–694.
[14] X. Glorot and Y. Bengio. “Understanding the difficulty of training deep
feedforward neural networks”. In: Proceedings of the thirteenth interna-
tional conference on artificial intelligence and statistics. JMLR Work-
shop and Conference Proceedings. 2010, pp. 249–256.
[15] D. Goldfarb. “A family of variable-metric methods derived by varia-
tional means”. In: Mathematics of computation 24.109 (1970), pp. 23–
26.
[16] I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT press,
2016.
[17] S. Goswami, C. Anitescu, and T. Rabczuk. “Adaptive fourth-order
phase field analysis for brittle fracture”. In: Computer Methods in Ap-
plied Mechanics and Engineering 361 (2020), p. 112808.
[18] I. Gühring, G. Kutyniok, and P. Petersen. “Error bounds for
approximations with deep ReLU neural networks in W^{s,p} norms”. In: Analysis
and Applications 18.05 (2020), pp. 803–859.
[19] E. Haghighat, D. Amini, and R. Juanes. “Physics-informed neural net-
work simulation of multiphase poroelasticity using stress-split sequen-
tial training”. In: Computer Methods in Applied Mechanics and Engi-
neering 397 (2022), p. 115141.
[20] J. He, L. Li, J. Xu, and C. Zheng. “Relu deep neural networks and
linear finite elements”. In: Journal of Computational Mathematics 38.3
(2020), pp. 502–527.
[21] K. He, X. Zhang, S. Ren, and J. Sun. “Delving deep into rectifiers: Sur-
passing human-level performance on imagenet classification”. In: Pro-
ceedings of the IEEE international conference on computer vision. 2015,
pp. 1026–1034.
[22] D. Hendrycks and K. Gimpel. “Gaussian error linear units (GELUs)”.
In: arXiv preprint arXiv:1606.08415 (2016).
[23] K. Hornik, M. Stinchcombe, and H. White. “Multilayer feedforward
networks are universal approximators”. In: Neural networks 2.5 (1989),
pp. 359–366.
[24] A. D. Jagtap, K. Kawaguchi, and G. Em Karniadakis. “Locally adaptive
activation functions with slope recovery for deep and physics-informed
neural networks”. In: Proceedings of the Royal Society A 476.2239 (2020),
p. 20200334.
[25] A. D. Jagtap, K. Kawaguchi, and G. E. Karniadakis. “Adaptive acti-
vation functions accelerate convergence in deep and physics-informed
neural networks”. In: Journal of Computational Physics 404 (2020),
p. 109136.
[26] A. D. Jagtap, Y. Shin, K. Kawaguchi, and G. E. Karniadakis. “Deep
Kronecker neural networks: A general framework for neural networks
with adaptive activation functions”. In: Neurocomputing 468 (2022),
pp. 165–180.
[27] N. P. Jouppi, C. Young, N. Patil, D. Patterson, et al. “In-datacenter
performance analysis of a tensor processing unit”. In: Proceedings of the
44th annual international symposium on computer architecture. 2017,
pp. 1–12.
[28] J. Jumper, R. Evans, A. Pritzel, T. Green, et al. “Highly accurate pro-
tein structure prediction with AlphaFold”. In: Nature 596.7873 (2021),
pp. 583–589.
[29] E. Kharazmi, Z. Zhang, and G. E. Karniadakis. “Variational physics-
informed neural networks for solving partial differential equations”. In:
arXiv preprint arXiv:1912.00873 (2019).
[30] D. P. Kingma and J. Ba. “Adam: A method for stochastic optimization”.
In: arXiv preprint arXiv:1412.6980 (2014).
[31] G. Kissas, Y. Yang, E. Hwuang, W. R. Witschey, et al. “Machine learn-
ing in cardiovascular flows modeling: Predicting arterial blood pressure
from non-invasive 4D flow MRI data using physics-informed neural net-
works”. In: Computer Methods in Applied Mechanics and Engineering
358 (2020), p. 112623.
[32] I. E. Lagaris, A. Likas, and D. I. Fotiadis. “Artificial neural network
methods in quantum mechanics”. In: Computer Physics Communica-
tions 104.1-3 (1997), pp. 1–14.
[33] I. E. Lagaris, A. Likas, and D. I. Fotiadis. “Artificial neural networks
for solving ordinary and partial differential equations”. In: IEEE trans-
actions on neural networks 9.5 (1998), pp. 987–1000.
[34] I. E. Lagaris, A. C. Likas, and D. G. Papageorgiou. “Neural-network
methods for boundary value problems with irregular boundaries”. In:
IEEE Transactions on Neural Networks 11.5 (2000), pp. 1041–1049.
[35] K. Levenberg. “A method for the solution of certain non-linear prob-
lems in least squares”. In: Quarterly of applied mathematics 2.2 (1944),
pp. 164–168.
[36] A. Li, R. Chen, A. B. Farimani, and Y. J. Zhang. “Reaction diffusion
system prediction based on convolutional neural network”. In: Scientific
reports 10.1 (2020), pp. 1–9.
[37] Z. Li, F. Liu, W. Yang, S. Peng, et al. “A survey of convolutional neural
networks: analysis, applications, and prospects”. In: IEEE Transactions
on Neural Networks and Learning Systems (2021).
[38] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, et al. “Fourier neural op-
erator for parametric partial differential equations”. In: arXiv preprint
arXiv:2010.08895 (2020).
[39] D. C. Liu and J. Nocedal. “On the limited memory BFGS method for
large scale optimization”. In: Mathematical programming 45.1 (1989),
pp. 503–528.
[40] J. López, C. Anitescu, and T. Rabczuk. “Isogeometric structural shape
optimization using automatic sensitivity analysis”. In: Applied Mathe-
matical Modelling 89 (2021), pp. 1004–1024.
[41] L. Lu, P. Jin, G. Pang, Z. Zhang, et al. “Learning nonlinear opera-
tors via DeepONet based on the universal approximation theorem of
operators”. In: Nature Machine Intelligence 3.3 (2021), pp. 218–229.
[42] A. L. Maas, A. Y. Hannun, A. Y. Ng, et al. “Rectifier nonlinearities
improve neural network acoustic models”. In: Proc. icml. Vol. 30. 1.
Citeseer. 2013, p. 3.
[43] D. W. Marquardt. “An algorithm for least-squares estimation of non-
linear parameters”. In: Journal of the society for Industrial and Applied
Mathematics 11.2 (1963), pp. 431–441.
[44] P. Mertikopoulos, N. Hallak, A. Kavis, and V. Cevher. “On the al-
most sure convergence of stochastic gradient descent in non-convex
problems”. In: Advances in Neural Information Processing Systems 33
(2020), pp. 1117–1128.
[45] D. Misra. “Mish: A self regularized non-monotonic activation function”.
In: arXiv preprint arXiv:1908.08681 (2019).
[46] V. M. Nguyen-Thanh, X. Zhuang, and T. Rabczuk. “A deep energy
method for finite deformation hyperelasticity”. In: European Journal of
Mechanics-A/Solids 80 (2020), p. 103874.
[47] A. D. Otero and F. L. Ponta. “Structural analysis of wind-turbine
blades by a generalized Timoshenko beam model”. In: (2010).
[48] A. Paszke, S. Gross, F. Massa, A. Lerer, et al. “PyTorch: An
Imperative Style, High-Performance Deep Learning Library”. In: Advances in
Neural Information Processing Systems 32. Curran Associates, Inc.,
2019, pp. 8024–8035. url: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
[49] P. Petersen and F. Voigtlaender. “Optimal approximation of piecewise
smooth functions using deep ReLU neural networks”. In: Neural Net-
works 108 (2018), pp. 296–330.
[50] D. Pfau, J. S. Spencer, A. G. D. G. Matthews, and W. M. C. Foulkes.
“Ab initio solution of the many-electron Schrödinger equation with
deep neural networks”. In: Phys. Rev. Research 2 (3 2020), p. 033429.
[51] G. Philipp, D. Song, and J. G. Carbonell. “Gradients explode - Deep
Networks are shallow - ResNet explained”. In: (2018).
[52] M. Raissi, P. Perdikaris, and G. E. Karniadakis. “Physics-informed neu-
ral networks: A deep learning framework for solving forward and inverse
problems involving nonlinear partial differential equations”. In: Journal
of Computational physics 378 (2019), pp. 686–707.
[53] P. Ramachandran, B. Zoph, and Q. V. Le. “Searching for activation
functions”. In: arXiv preprint arXiv:1710.05941 (2017).
[54] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. “Learning rep-
resentations by back-propagating errors”. In: nature 323.6088 (1986),
pp. 533–536.
[55] E. Samaniego, C. Anitescu, S. Goswami, V. M. Nguyen-Thanh, et al.
“An energy approach to the solution of partial differential equations in
computational mechanics via machine learning: Concepts, implemen-
tation and applications”. In: Computer Methods in Applied Mechanics
and Engineering 362 (2020), p. 112790.
[56] D. F. Shanno. “Conditioning of quasi-Newton methods for function
minimization”. In: Mathematics of computation 24.111 (1970), pp. 647–
656.
[57] K. Shukla, P. C. Di Leoni, J. Blackshire, D. Sparkman, et al. “Physics-
informed neural network for ultrasound nondestructive quantification
of surface breaking cracks”. In: Journal of Nondestructive Evaluation
39.3 (2020), pp. 1–20.
[58] K. Shukla, A. D. Jagtap, and G. E. Karniadakis. “Parallel physics-
informed neural networks via domain decomposition”. In: Journal of
Computational Physics 447 (2021), p. 110683.
[59] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, et al. “Master-
ing chess and shogi by self-play with a general reinforcement learning
algorithm”. In: arXiv preprint arXiv:1712.01815 (2017).
[60] J. Sirignano and K. Spiliopoulos. “DGM: A deep learning algorithm
for solving partial differential equations”. In: Journal of computational
physics 375 (2018), pp. 1339–1364.
[61] N. Sukumar and A. Srivastava. “Exact imposition of boundary
conditions with distance functions in physics-informed deep neural
networks”. In: Computer Methods in Applied Mechanics and Engineering
389 (2022), p. 114333.
[62] S. Sun, Z. Cao, H. Zhu, and J. Zhao. “A survey of optimization meth-
ods from a machine learning perspective”. In: IEEE transactions on
cybernetics 50.8 (2019), pp. 3668–3681.
[63] M. Vauhkonen, T. Tarvainen, and T. Lähivaara. “Inverse Problems”. In:
Mathematical Modelling. Ed. by S. Pohjolainen. Springer International
Publishing, 2016.
[64] U. bin Waheed, E. Haghighat, T. Alkhalifah, C. Song, et al. “PINNeik:
Eikonal solution using physics-informed neural networks”. In: Comput-
ers & Geosciences 155 (2021), p. 104833.
[65] C. Wang, V. Tan, and Y. Zhang. “Timoshenko beam model for vibra-
tion analysis of multi-walled carbon nanotubes”. In: Journal of Sound
and Vibration 294.4-5 (2006), pp. 1060–1072.
[66] G.-F. Wang and X.-Q. Feng. “Timoshenko beam model for buckling
and vibration of nanowires with surface effects”. In: Journal of physics
D: applied physics 42.15 (2009), p. 155411.
[67] S. Wang, X. Yu, and P. Perdikaris. “When and why PINNs fail to
train: A neural tangent kernel perspective”. In: Journal of Computa-
tional Physics 449 (2022), p. 110768.
[68] C. L. Wight and J. Zhao. “Solving allen-cahn and cahn-hilliard equa-
tions using the adaptive physics informed neural networks”. In: arXiv
preprint arXiv:2007.04542 (2020).
[69] B. Yu et al. “The deep Ritz method: a deep learning-based numeri-
cal algorithm for solving variational problems”. In: Communications in
Mathematics and Statistics 6.1 (2018), pp. 1–12.
[70] J. Yu, L. Lu, X. Meng, and G. E. Karniadakis. “Gradient-enhanced
physics-informed neural networks for forward and inverse PDE prob-
lems”. In: Computer Methods in Applied Mechanics and Engineering
393 (2022), p. 114823.
[71] X. Zhuang, H. Guo, N. Alajlan, H. Zhu, et al. “Deep autoencoder
based energy method for the bending, vibration, and buckling anal-
ysis of Kirchhoff plates with transfer learning”. In: European Journal of
Mechanics-A/Solids 87 (2021), p. 104225.