

Introduction to Differentiable Physics

Contents
Differentiable operators
Jacobians
Learning via DP operators
A practical example
Implicit gradient calculations
Summary of differentiable physics so far

As a next step towards a tighter and more generic combination of deep learning methods and
physical simulations we will target incorporating differentiable numerical simulations into the
learning process. In the following, we’ll shorten these “differentiable numerical simulations of
physical systems” to just “differentiable physics” (DP).

The central goal of these methods is to use existing numerical solvers, and equip them with
functionality to compute gradients with respect to their inputs. Once this is realized for all
operators of a simulation, we can leverage the autodiff functionality of DL frameworks with
backpropagation to let gradient information flow from a simulator into an NN and vice versa.
This has numerous advantages such as improved learning feedback and generalization, as we’ll
outline below.

In contrast to physics-informed loss functions, it also enables handling more complex solution
manifolds instead of single inverse problems. E.g., instead of using deep learning to solve single
inverse problems as in the previous chapter, differentiable physics can be used to train NNs that
learn to solve larger classes of inverse problems very efficiently.

Fig. 9 Training with differentiable physics means that a chain of differentiable operators provides directions in the form of gradients to steer the learning process.

Differentiable operators
With the DP direction we build on existing numerical solvers. I.e., the approach relies strongly on the algorithms developed in the larger field of computational methods for a vast
range of physical effects in our world. To start with, we need a continuous formulation as model
for the physical effect that we’d like to simulate – if this is missing we’re in trouble. But luckily,
we can tap into existing collections of model equations and established methods for discretizing
continuous models.

Let's assume we have a continuous formulation $\mathcal{P}^*(\mathbf{x}, \nu)$ of the physical quantity of interest $\mathbf{u}(\mathbf{x}, t): \mathbb{R}^d \times \mathbb{R}^+ \rightarrow \mathbb{R}^d$, with model parameters $\nu$ (e.g., diffusion, viscosity, or conductivity constants). The components of $\mathbf{u}$ will be denoted by a numbered subscript, i.e., $\mathbf{u} = (u_1, u_2, \dots, u_d)^T$.

Typically, we are interested in the temporal evolution of such a system. Discretization yields a formulation $\mathcal{P}(\mathbf{x}, \nu)$ that we re-arrange to compute a future state after a time step $\Delta t$. The state at $t + \Delta t$ is computed via a sequence of operations $\mathcal{P}_1, \mathcal{P}_2, \dots, \mathcal{P}_m$ such that $\mathbf{u}(t + \Delta t) = \mathcal{P}_m \circ \dots \mathcal{P}_2 \circ \mathcal{P}_1(\mathbf{u}(t), \nu)$, where $\circ$ denotes function composition, i.e. $f(g(x)) = f \circ g(x)$.

Note that we typically don't need derivatives for all parameters of $\mathcal{P}(\mathbf{x}, \nu)$, e.g., we omit $\nu$ in the following, assuming that this is a given model parameter with which the NN should not interact. Naturally, it can vary within the solution manifold that we're interested in, but $\nu$ will not be the output of an NN representation. If this is the case, we can omit providing $\partial \mathcal{P}_i / \partial \nu$ in our solver. However, the following learning process naturally transfers to including $\nu$ as a degree of freedom.

In order to integrate this solver into a DL process, we need to ensure that every operator $\mathcal{P}_i$ provides a gradient w.r.t. its inputs, i.e. in the example above $\partial \mathcal{P}_i / \partial \mathbf{u}$.

Jacobians

As $\mathbf{u}$ is typically a vector-valued function, $\partial \mathcal{P}_i / \partial \mathbf{u}$ denotes a Jacobian matrix $J$ rather than a single value:

$$\frac{\partial \mathcal{P}_i}{\partial \mathbf{u}} =
\begin{bmatrix}
\partial \mathcal{P}_{i,1} / \partial u_1 & \cdots & \partial \mathcal{P}_{i,1} / \partial u_d \\
\vdots & & \vdots \\
\partial \mathcal{P}_{i,d} / \partial u_1 & \cdots & \partial \mathcal{P}_{i,d} / \partial u_d
\end{bmatrix}$$

where, as above, $d$ denotes the number of components in $\mathbf{u}$. As $\mathcal{P}_i$ maps one value of $\mathbf{u}$ to another, the Jacobian is square here. Of course this isn't necessarily the case for general model equations, but non-square Jacobian matrices would not cause any problems for differentiable simulations.

In practice, we rely on the reverse mode differentiation that all modern DL frameworks provide, and focus on computing a matrix vector product of the Jacobian transpose with a vector $\mathbf{a}$, i.e. the expression $\big(\frac{\partial \mathcal{P}_i}{\partial \mathbf{u}}\big)^T \mathbf{a}$. If we'd need to construct and store all full Jacobian matrices that we encounter during training, this would cause huge memory overheads and unnecessarily slow down training. Instead, for backpropagation, we can provide faster operations that compute products with the Jacobian transpose because we always have a scalar loss function at the end of the chain.

Given the formulation above, we need to resolve the derivatives of the chain of function compositions of the $\mathcal{P}_i$ at some current state $\mathbf{u}^n$ via the chain rule. E.g., for two of them

$$\frac{\partial (\mathcal{P}_1 \circ \mathcal{P}_2)}{\partial \mathbf{u}} \bigg|_{\mathbf{u}^n} =
\frac{\partial \mathcal{P}_1}{\partial \mathbf{u}} \bigg|_{\mathcal{P}_2(\mathbf{u}^n)} \,
\frac{\partial \mathcal{P}_2}{\partial \mathbf{u}} \bigg|_{\mathbf{u}^n}$$

which is just the vector valued version of the "classic" chain rule $f(g(x))' = f'(g(x)) \, g'(x)$, and directly extends for larger numbers of composited functions, i.e. $i > 2$.

Here, the derivatives for $\mathcal{P}_1$ and $\mathcal{P}_2$ are still Jacobian matrices, but knowing that at the "end" of the chain we have our scalar loss (cf. Overview), the right-most Jacobian will invariably be a matrix with 1 column, i.e. a vector. During reverse mode, we start with this vector, and compute the multiplications with the left Jacobians, $\frac{\partial \mathcal{P}_1}{\partial \mathbf{u}}$ above, one by one.

For the details of forward and reverse mode differentiation, please check out external materials such as this nice survey by Baydin et al.
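To make these transposed-Jacobian products concrete, here is a minimal, self-contained sketch with PyTorch; the two operators are toy stand-ins chosen for illustration rather than operators of an actual solver:

```python
import torch

# Two toy "solver operators" P1, P2 standing in for real simulation steps.
def P1(u):
    return torch.sin(u) + 0.1 * u ** 2

def P2(u):
    return torch.roll(u, 1) - u   # a simple finite-difference-like operation

u = torch.randn(8, requires_grad=True)

# forward chain: u -> P1 -> P2 -> scalar loss
loss = (P2(P1(u)) ** 2).sum()

# Reverse mode evaluates (dP1/du)^T (dP2/du)^T (dL/d out) as a sequence of
# vector products, without ever forming a full Jacobian matrix.
(grad_u,) = torch.autograd.grad(loss, u)
print(grad_u.shape)   # torch.Size([8]), same shape as u
```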

Learning via DP operators


Thus, once the operators of our simulator support computations of the Jacobian-vector
products, we can integrate them into DL pipelines just like you would include a regular fully-
connected layer or a ReLU activation.

At this point, the following (very valid) question arises: “Most physics solvers can be broken down
into a sequence of vector and matrix operations. All state-of-the-art DL frameworks support these,
so why don’t we just use these operators to realize our physics solver?”

It’s true that this would theoretically be possible. The problem here is that each of the vector
and matrix operations in tensorflow and pytorch is computed individually, and internally needs
to store the current state of the forward evaluation for backpropagation (the “g(x)” above). For
a typical simulation, however, we’re not overly interested in every single intermediate result our
solver produces. Typically, we’re more concerned with significant updates such as the step from
u(t) to u(t + Δt).

Thus, in practice it is a very good idea to break down the solving process into a sequence of
meaningful but monolithic operators. This not only saves a lot of work by preventing the
calculation of unnecessary intermediate results, it also allows us to choose the best possible
numerical methods to compute the updates (and derivatives) for these operators. E.g., as this
process is very similar to adjoint method optimizations, we can re-use many of the techniques
that were developed in this field, or leverage established numerical methods. E.g., we could
leverage the O(n) runtime of multigrid solvers for matrix inversion.
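To illustrate what such a monolithic operator can look like, the following sketch uses PyTorch's custom-function mechanism; the explicit diffusion step and its hand-written adjoint are stand-ins of our own choosing, not an operator from a specific solver:

```python
import torch

def laplacian_1d(u):
    # discrete 1D Laplacian with periodic boundaries, standing in for a solver kernel
    return torch.roll(u, 1) - 2.0 * u + torch.roll(u, -1)

class DiffusionStep(torch.autograd.Function):
    """One explicit diffusion step u(t) -> u(t + dt), treated as a single
    monolithic operator with a hand-written adjoint instead of autodiff
    through every elementary operation."""

    DT = 0.1

    @staticmethod
    def forward(ctx, u):
        # forward update; this could equally call out to an external numerical solver
        return u + DiffusionStep.DT * laplacian_1d(u)

    @staticmethod
    def backward(ctx, grad_out):
        # The step is linear and the Laplacian is symmetric, so the
        # Jacobian-transpose product is the same operator applied to grad_out.
        return grad_out + DiffusionStep.DT * laplacian_1d(grad_out)

u = torch.randn(16, requires_grad=True)
loss = DiffusionStep.apply(u).square().sum()
loss.backward()
print(u.grad[:4])
```

Because the backward pass is written by hand, no intermediate results of the forward computation need to be stored, which is exactly the saving argued for above.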

The flip-side of this approach is that it requires some understanding of the problem at hand,
and of the numerical methods. Also, a given solver might not provide gradient calculations out
of the box. Thus, if we want to employ DL for model equations that we don’t have a proper
grasp of, it might not be a good idea to directly go for learning via a DP approach. However, if
we don’t really understand our model, we probably should go back to studying it a bit more
anyway…

Also, in practice we should be greedy with the derivative operators, and only provide those
which are relevant for the learning task. E.g., if our network never produces the parameter ν in
the example above, and it doesn’t appear in our loss formulation, we will never encounter a
∂/∂ν derivative in our backpropagation step.

The following figure summarizes the DP-based learning approach, and illustrates the sequence
of operations that are typically processed within a single PDE solve. As many of the operations
are non-linear in practice, this often leads to a challenging learning task for the NN:

Fig. 10 DP learning with a PDE solver that consists of $m$ individual operators $\mathcal{P}_i$. The gradient travels backward through all $m$ operators before influencing the network weights $\theta$.


A practical example
As a simple example let's consider the advection of a passive scalar density $d(x, t)$ in a velocity field $\mathbf{u}$ as physical model $\mathcal{P}^*$:

$$\frac{\partial d}{\partial t} + \mathbf{u} \cdot \nabla d = 0$$

Instead of using this formulation as a residual equation right away (as in v2 of Physical Loss
Terms), we can discretize it with our favorite mesh and discretization scheme, to obtain a
formulation that updates the state of our system over time. This is a standard procedure for a
forward solve. To simplify things, we assume here that u is only a function in space, i.e. constant
over time. We’ll bring back the time evolution of u later on.

Let’s denote this re-formulation as P . It maps a state of d(t) into a new state at an evolved time,
i.e.:

d(t + Δt) = P( d(t), u, t + Δt)

As a simple example of an inverse problem and learning task, let's consider the problem of finding a velocity field $\mathbf{u}$. This velocity should transform a given initial scalar density state $d^0$ at time $t^0$ into a state that's evolved by $\mathcal{P}$ to a later "end" time $t^e$ with a certain shape or configuration $d^{\text{target}}$. Informally, we'd like to find a flow that deforms $d^0$ through the PDE model into a target state. The simplest way to express this goal is via an $L^2$ loss between the two states. So we want to minimize the loss function $L = |d(t^e) - d^{\text{target}}|^2$.

Note that as described here this inverse problem is a pure optimization task: there's no NN involved, and our goal is to obtain $\mathbf{u}$. We do not want to apply this velocity to other, unseen test data, as would be customary in a real learning task.

The final state of our marker density $d(t^e)$ is fully determined by the evolution from $\mathcal{P}$ via $\mathbf{u}$, which gives the following minimization problem:

$$\arg\min_{\mathbf{u}} \; |\mathcal{P}(d^0, \mathbf{u}, t^e) - d^{\text{target}}|^2$$

We’d now like to find the minimizer for this objective by gradient descent (GD), where the
gradient is determined by the differentiable physics approach described earlier in this chapter.
Once things are working with GD, we can relatively easily switch to better optimizers or bring an
NN into the picture, hence it’s always a good starting point. To make things easier to read
below, we’ll omit the transpose of the Jacobians in the following. Unfortunately, the Jacobian is
defined this way, but we actually never need the un-transposed one. Keep in mind that in
practice we’re dealing with tranposed Jacobians ( that are “abbreviated” by .
∂a T ∂a
)
∂b ∂b

As the discretized velocity field $\mathbf{u}$ contains all our degrees of freedom, all we need to do is to update the velocity by an amount $\Delta \mathbf{u} = \partial L / \partial \mathbf{u}$, which is decomposed into $\Delta \mathbf{u} = \frac{\partial d}{\partial \mathbf{u}} \, \frac{\partial L}{\partial d}$.

The $\frac{\partial L}{\partial d}$ component is typically simple enough: we'll get

$$\frac{\partial L}{\partial d} = \frac{\partial \, |\mathcal{P}(d^0, \mathbf{u}, t^e) - d^{\text{target}}|^2}{\partial d} = 2 \, \big(d(t^e) - d^{\text{target}}\big).$$

If $d$ is represented as a vector, e.g., with one entry per cell of a mesh, $\frac{\partial L}{\partial d}$ will likewise be a column vector of equivalent size. This stems from the fact that $L$ is always a scalar loss function, and so the Jacobian matrix will have a dimension of 1 along the $L$ dimension. Intuitively, this vector will simply contain the differences between $d$ at the end time in comparison to the target densities $d^{\text{target}}$.

The evolution of $d$ itself is given by our discretized physical model $\mathcal{P}$, and we use $\mathcal{P}$ and $d$ interchangeably. Hence, the more interesting component is the Jacobian $\partial d / \partial \mathbf{u} = \partial \mathcal{P} / \partial \mathbf{u}$ to compute the full $\Delta \mathbf{u} = \frac{\partial d}{\partial \mathbf{u}} \frac{\partial L}{\partial d}$. We luckily don't need $\partial d / \partial \mathbf{u}$ as a full matrix, but instead only multiplied by $\frac{\partial L}{\partial d}$.


So what is the actual Jacobian for $d$? To compute it we first need to finalize our PDE model $\mathcal{P}$, such that we get an expression which we can differentiate. In the next section we'll choose a specific advection scheme and a discretization so that we can be more specific.

Introducing a specific advection scheme


In the following we’ll make use of a simple first order upwinding scheme on a Cartesian grid in
1D, with marker density d and velocity u for cell i. We omit the (t) for quantities at time t for
i i

brevity, i.e., d i (t) is written as d below. From above, we’ll use our physical model that updates
i

the marker density d i (t + Δt) = P(d i (t), u(t), t + Δt) , which gives the following:

+ −
d i (t + Δt) = d i − Δt[u (d i+1 − d i ) + u (d i − d i−1 )] with
i i

+
u = min(u i /Δx, 0)
i


u = max(u i /Δx, 0)
i

Fig. 11 1st-order upwinding uses a simple one-sided finite-difference stencil that takes into
account the direction of the flow

Thus, for a negative $u_i$, we're using $u_i^+$ to look in the opposite direction of the velocity, i.e., backward in terms of the motion. $u_i^-$ will be zero in this case. For positive $u_i$ it's vice versa, and we'll get a zero'ed $u_i^+$, and a backward difference stencil via $u_i^-$. To pick the former case, for a negative $u_i$ we get

$$\mathcal{P}(d_i(t), \mathbf{u}(t), t + \Delta t) = \Big(1 + \frac{u_i \Delta t}{\Delta x}\Big) d_i - \frac{u_i \Delta t}{\Delta x} d_{i+1} \tag{17}$$

and hence $\partial \mathcal{P} / \partial u_i$ gives $\frac{\Delta t}{\Delta x} d_i - \frac{\Delta t}{\Delta x} d_{i+1}$. Intuitively, the change of the velocity $u_i$ depends on the spatial derivatives of the densities. Due to the first order upwinding, we only include two neighbors (higher order methods would depend on additional entries of $d$).

In practice this step is equivalent to evaluating a transposed matrix multiplication. If we rewrite the calculation above as $\mathcal{P}(d_i(t), \mathbf{u}(t), t + \Delta t) = A \mathbf{u}$, then $(\partial \mathcal{P} / \partial \mathbf{u})^T = A^T$. However, in many practical cases, a matrix-free implementation of this multiplication might be preferable to actually constructing $A$.

Another derivative that we can consider for the advection scheme is that w.r.t. the previous density state, i.e. $d_i(t)$, which is $d_i$ in the shortened notation. $\partial \mathcal{P} / \partial d_i$ for cell $i$ from above gives $1 + \frac{u_i \Delta t}{\Delta x}$. However, for the full gradient we'd need to add the potential contributions from cells $i+1$ and $i-1$, depending on the sign of their velocities. This derivative will come into play in the next section.
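A quick way to sanity-check the analytic $\partial \mathcal{P} / \partial u_i$ derived above is to compare it against autodiff on a vectorized version of the upwind step; this is a self-contained sketch assuming PyTorch, periodic boundaries, and made-up grid values:

```python
import torch

dx, dt = 1.0, 0.5

def advect(d, u):
    # one upwind step for all cells at once, periodic boundaries
    u_plus = torch.clamp(u / dx, max=0.0)    # u_i^+ : active for negative velocities
    u_minus = torch.clamp(u / dx, min=0.0)   # u_i^- : active for positive velocities
    d_fwd = torch.roll(d, -1) - d            # d_{i+1} - d_i
    d_bwd = d - torch.roll(d, 1)             # d_i   - d_{i-1}
    return d - dt * (u_plus * d_fwd + u_minus * d_bwd)

d = torch.rand(32)
u = -torch.rand(32) - 0.1                    # all velocities negative, matching Eq. (17)
u.requires_grad_(True)

loss = advect(d, u).sum()
loss.backward()

# analytic dP/du_i for negative u_i: (dt/dx) d_i - (dt/dx) d_{i+1}
analytic = dt / dx * (d - torch.roll(d, -1))
print(torch.allclose(u.grad, analytic))      # expected: True
```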

Time evolution
So far we’ve only dealt with a single update step of d from time t to t + Δt, but we could of
course have an arbitrary number of such steps. After all, above we stated the goal to advance
the initial marker state $d(t^0)$ to the target state at time $t^e$, which could encompass a long interval of time.

In the expression above for $d_i(t + \Delta t)$, each of the $d_i(t)$ in turn depends on the velocity and density states at time $t - \Delta t$, i.e., $d_i(t - \Delta t)$. Thus we have to trace back the influence of our loss $L$ all the way back to how $\mathbf{u}$ influences the initial marker state. This can involve a large number of evaluations of our advection scheme via $\mathcal{P}$.


This sounds challenging at first: e.g., one could try to insert equation (17) at time $t - \Delta t$ into equation (17) at time $t$ and repeat this process recursively until we have a single expression relating $d^0$ to the targets. However, thanks to the linear nature of the Jacobians, we treat each advection step, i.e., each invocation of our PDE $\mathcal{P}$, as a separate, modular operation. And each of these invocations follows the procedure described in the previous section.

Given the machinery above, the backtrace is fairly simple to realize: for each of the advection steps in $\mathcal{P}$ we compute a Jacobian product with the incoming vector of derivatives from the loss $L$ or a previous advection step. We repeat this until we have traced the chain from the loss with $d^{\text{target}}$ all the way back to $d^0$. Theoretically, the velocity $\mathbf{u}$ could be a function of time like $d$, in which case we'd get a gradient $\Delta \mathbf{u}(t)$ for every time step $t$. However, to simplify things below, let's assume we have a field that is constant in time, i.e., we're reusing the same velocities $\mathbf{u}$ for every advection via $\mathcal{P}$. Now, each time step will give us a contribution to $\Delta \mathbf{u}$ which we accumulate for all steps.

$$\begin{aligned}
\Delta \mathbf{u} =\; & \frac{\partial d(t^e)}{\partial \mathbf{u}} \, \frac{\partial L}{\partial d(t^e)} \\
& + \frac{\partial d(t^e - \Delta t)}{\partial \mathbf{u}} \, \frac{\partial d(t^e)}{\partial d(t^e - \Delta t)} \, \frac{\partial L}{\partial d(t^e)} \\
& + \cdots \\
& + \Big( \frac{\partial d(t^0)}{\partial \mathbf{u}} \cdots \frac{\partial d(t^e - \Delta t)}{\partial d(t^e - 2\Delta t)} \, \frac{\partial d(t^e)}{\partial d(t^e - \Delta t)} \, \frac{\partial L}{\partial d(t^e)} \Big)
\end{aligned}$$

Here the last term above contains the full backtrace of the marker density to time $t^0$. The terms of this sum look unwieldy at first, but looking closely, each line simply adds an additional Jacobian for one time step on the left hand side. This follows from the chain rule, as shown in the two-operator case above. So the terms of the sum contain a lot of similar Jacobians, and in practice can be computed efficiently by backtracing through the sequence of computational steps that resulted from the forward evaluation of our PDE. (Note that, as mentioned above, we've omitted the transpose of the Jacobians here.)

This structure also makes clear that the process is very similar to the regular training process of an NN: the evaluation of these Jacobian vector products from nested function calls is exactly what a deep learning framework does for training an NN (we just have weights $\theta$ instead of a velocity field there). And hence all we need to do in practice is to provide a custom function that computes the Jacobian vector product for $\mathcal{P}$.
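Putting the pieces together, the whole optimization for the time-constant velocity fits in a few lines; a compact sketch assuming PyTorch, with arbitrary choices for the grid, the number of steps, and the Gaussian blobs used as $d^0$ and $d^{\text{target}}$:

```python
import torch

dx, dt, steps = 1.0, 0.2, 8

def advect(d, u):
    # the same upwind step as before, applied to the whole grid
    u_plus = torch.clamp(u / dx, max=0.0)
    u_minus = torch.clamp(u / dx, min=0.0)
    return d - dt * (u_plus * (torch.roll(d, -1) - d) + u_minus * (d - torch.roll(d, 1)))

x = torch.linspace(0.0, 1.0, 64)
d0 = torch.exp(-((x - 0.3) / 0.1) ** 2)        # initial density blob
d_target = torch.exp(-((x - 0.6) / 0.1) ** 2)  # target blob, shifted to the right

u = 0.01 * torch.randn(64)                     # velocity field, constant in time
u.requires_grad_(True)
optimizer = torch.optim.SGD([u], lr=0.5)

for it in range(200):
    d = d0
    for _ in range(steps):                     # unrolled time evolution via P
        d = advect(d, u)
    loss = ((d - d_target) ** 2).sum()         # L = |d(t_e) - d_target|^2
    optimizer.zero_grad()
    loss.backward()                            # accumulates the per-step Jacobian products into u.grad
    optimizer.step()

print(float(loss))                             # should have decreased over the iterations
```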

Implicit gradient calculations


As a slightly more complex example let's consider Poisson's equation $\nabla^2 a = b$, where $a$ is the quantity of interest, and $b$ is given. This is a very fundamental elliptic PDE that is important for a variety of physical problems, from electrostatics to gravitational fields. It also arises in the context of fluids, where $a$ takes the role of a scalar pressure field in the fluid, and the right hand side $b$ is given by the divergence of the fluid velocity $\mathbf{u}$.

For fluids, we typically have $\mathbf{u}^n = \mathbf{u} - \nabla p$, with $\nabla^2 p = \nabla \cdot \mathbf{u}$. Here, $\mathbf{u}^n$ denotes the new, divergence-free velocity field. This step is typically crucial to enforce the hard constraint $\nabla \cdot \mathbf{u} = 0$, and also goes under the name of Chorin Projection, or Helmholtz decomposition. It is a direct consequence of the fundamental theorem of vector calculus.

If we now introduce an NN that modifies $\mathbf{u}$ in a solver, we inevitably have to backpropagate through the Poisson solve. I.e., we need a gradient for $\mathbf{u}^n$, which in this notation takes the form $\partial \mathbf{u}^n / \partial \mathbf{u}$.

In combination, we aim for computing $\mathbf{u}^n = \mathbf{u} - \nabla \big( (\nabla^2)^{-1} \nabla \cdot \mathbf{u} \big)$. The outer gradient (from $\nabla p$) and the inner divergence ($\nabla \cdot \mathbf{u}$) are both linear operators, and their gradients are simple to compute. The main difficulty lies in obtaining the matrix inverse $(\nabla^2)^{-1}$ from Poisson's equation (we'll keep it a bit simpler here, but it's often time-dependent, and non-linear).


In practice, the matrix vector product for $(\nabla^2)^{-1} b$ with $b = \nabla \cdot \mathbf{u}$ is not explicitly computed via matrix operations, but approximated with a (potentially matrix-free) iterative solver. E.g., conjugate gradient (CG) methods are a very popular choice here. Thus, we theoretically could treat this iterative solver as a function $S$, with $p = S(\nabla \cdot \mathbf{u})$. It's worth noting that matrix inversion is a non-linear process, despite the matrix itself being linear. As solvers like CG are also based on matrix and vector operations, we could decompose $S$ into a sequence of simpler operations over the course of all solver iterations as $S(x) = S_n(S_{n-1}(\dots S_1(x)))$, and backpropagate through each of them. This is certainly possible, but not a good idea: it can introduce numerical problems, and will be very slow. As mentioned above, by default DL frameworks store the internal states for every differentiable operator like the $S_i()$ in this example, and hence we'd organize and keep a potentially huge number of intermediate states in memory. These states are completely uninteresting for our original PDE, though. They're just intermediate states of the CG solver.

If we take a step back and look at $p = (\nabla^2)^{-1} b$, its gradient $\partial p / \partial b$ is just $((\nabla^2)^{-1})^T$. And in this case, $(\nabla^2)$ is a symmetric matrix, and so $((\nabla^2)^{-1})^T = (\nabla^2)^{-1}$. This is the identical inverse matrix that we encountered in the original equation above, and hence we re-use our unmodified iterative solver to compute the gradient. We don't need to take it apart and slow it down by storing intermediate states. However, the iterative solver computes the matrix-vector-products for $(\nabla^2)^{-1} b$. So what is $b$ during backpropagation? In an optimization setting we'll always have our loss function $L$ at the end of the forward chain. The backpropagation step will then give a gradient for the output, let's assume it is $\partial L / \partial p$ here, which needs to be propagated to the earlier operations of the forward pass. Thus, we simply invoke our iterative solver during the backward pass to compute $\partial L / \partial b = S(\partial L / \partial p)$. And assuming that we've chosen a good solver as $S$ for the forward pass, we get exactly the same performance and accuracy in the backwards pass.
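A minimal sketch of this idea, assuming PyTorch and SciPy (a small 1D SPD matrix stands in for the discretized Poisson operator; this is not the book's phiflow implementation): the forward pass runs CG, and the backward pass reuses the very same, unmodified solver, exploiting the symmetry of the operator.

```python
import torch
from scipy.sparse import diags
from scipy.sparse.linalg import cg

N = 64
# symmetric positive definite stand-in for the discretized Poisson operator
# (a 1D negative Laplacian with Dirichlet boundaries)
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N)).tocsc()

class PoissonSolve(torch.autograd.Function):
    """p = A^{-1} b via CG; the backward pass reuses the same solver."""

    @staticmethod
    def forward(ctx, b):
        p, _ = cg(A, b.detach().numpy())
        return torch.from_numpy(p)

    @staticmethod
    def backward(ctx, grad_p):
        # dL/db = (A^{-1})^T dL/dp = A^{-1} dL/dp  ->  one more CG solve
        g, _ = cg(A, grad_p.numpy())
        return torch.from_numpy(g)

b = torch.randn(N, dtype=torch.float64, requires_grad=True)
loss = PoissonSolve.apply(b).square().sum()
loss.backward()
print(b.grad[:4])
```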

If you’re interested in a code example, the differentiate-pressure example of phiflow uses exactly
this process for an optimization through a pressure projection step: a flow field that is
constrained on the right side, is optimized for the content on the left, such that it matches the
target on the right after a pressure projection step.

The main take-away here is: it is important not to blindly backpropagate through the forward
computation, but to think about which steps of the analytic equations for the forward pass to
compute gradients for. In cases like the above, we can often find improved analytic expressions
for the gradients, which we then approximate numerically.

Implicit Function Theorem & Time

IFT: The process above essentially yields an implicit derivative. Instead of explicitly
deriving all forward steps, we’ve relied on the implicit function theorem to compute
the derivative.

Time: we can actually consider the steps of an iterative solver as a virtual “time”, and
backpropagate through these steps. In line with other DP approaches, this enables an
NN to interact with an iterative solver. An example is to learn initial guesses of CG
solvers from [UBH+20]. Details and code can be found here.

Summary of differentiable physics so far


To summarize, using differentiable physical simulations gives us a tool to include physical
equations with a chosen discretization into DL. In contrast to the residual constraints of the
previous chapter, this makes it possible to let NNs seamlessly interact with physical solvers.

We’d previously fully discard our physical model and solver once the NN is trained: in the
example from Burgers Optimization with a Physics-Informed NN the NN gives us the solution
directly, bypassing any solver or model equation. The DP approach substantially differs from the

https://physicsbaseddeeplearning.org/diffphys.html#a-practical-example 7/8
7/4/23, 6:59 PM Introduction to Differentiable Physics — Physics-based Deep Learning

physics-informed NNs (v2) from Physical Loss Terms, it has more in common with the controlled
discretizations (v1). They are essentially a subset, or partial application of DP training.

However, in contrast to both residual approaches, DP makes it possible to train an NN alongside a numerical solver, and thus we can make use of the physical model (as represented by the solver) later on at inference time. This allows us to move beyond solving single inverse problems, and yields NNs that quite robustly generalize to new inputs. Let's revisit the example problem from Burgers Optimization with a Physics-Informed NN in the context of DP.

By N. Thuerey, P. Holl, M. Mueller, P. Schnell, F. Trost, K. Um


© Copyright 2021,2022.

