In this work, we present a novel data-based approach to turbulence modelling for Large
Eddy Simulation (LES) by artificial neural networks. We define the exact closure terms
including the discretization operators and generate training data from direct numerical
simulations of decaying homogeneous isotropic turbulence. We design and train artificial
neural networks based on local convolution filters to predict the underlying unknown
non-linear mapping from the coarse grid quantities to the closure terms without a
priori assumptions. All investigated networks are able to generalize from the data and
learn approximations with a cross correlation of up to 47% and even 73% for the inner
elements, leading to the conclusion that the current training success is data-bound. We
further show that selecting both the coarse grid primitive variables as well as the coarse
grid LES operator as input features significantly improves training results. Finally, we
construct a stable and accurate LES model from the learned closure terms. To this end, we
translate the model predictions into a data-adaptive, pointwise eddy viscosity closure and
show that the resulting LES scheme performs well compared to current state of the art
approaches. This work represents the starting point for further research into data-driven,
universal turbulence models.
Key words:
1. Introduction
Machine learning algorithms and in particular deep neural networks (DNN) thrive in
situations where a structural relation between input and output is presumably present
but unknown, when sufficiently many training samples exist and the computing power to
train and deploy these algorithms is available. The confluence of these three conditions
in the last half decade has given rise to an extraordinary interest in these algorithms
and their applications, e.g. from mastering the game of Go (Silver et al. 2016), to object
detection and steering in self-driving cars (Bojarski et al. 2016) to natural language
processing (Bengio et al. 2003). At the centre of each of these applications lies the search
for a non-linear model that approximates the underlying functional relationship without
a priori assumptions or analytical considerations. Based on the reported successes in a
number of fields, this “learning from data” approach could provide a powerful method
for model development in fluid mechanics, wherever the governing equations derived
from first principles need to be augmented by some form of closure term which typically
incorporates information from different physical effects or scales (Kutz 2017).
† A. Beck and D. Flad share first authorship
‡ Email address for correspondence: beck@iag.uni-stuttgart.de
In this work, we develop a data-based closure for the subgrid (SG) terms arising in Large
Eddy Simulation (LES). Without a priori assumptions, the exact SG terms are "learned"
through supervised learning by DNN from Direct Numerical Simulation (DNS) and LES
data of homogeneous isotropic turbulence. The resulting closures thus are not models in
the classical explicit sense, where the effect of unresolved physics on the grid-represented
flow field is expressed by an a priori assumed functional relationship, but in the machine
learning sense as they constitute a data-based approximation of an unknown, but existing
functional relationship. While this data-based reconstruction of the DNS flow field results
in an approximation of the perfect subgrid closure, by design this approach cancels the
influence of the coarse grid numerical operator applied to the discretized field (i.e. the
LES operator) and is thus not directly suitable for practical applications. This is clearly
the case considering the approach for perfect LES with implicit filtering as used in this
work and derived in Nadiga & Livescu (2007). We argue that this is indeed also the case
for explicitly filtered LES when using the exact closure. Thus, to render our models useful
in practical, but imperfect LES, we suggest a first data-informed explicit closure
model based on a fit to the perfect model learned via DNN methods. We demonstrate that
the resulting model is suitable for LES and provides a stable and accurate closure. The
learned perfect closure term and the derived closure can thus be seen as a starting point
towards general data-based SG models and intelligent model learning in fluid dynamics.
$$\frac{\partial \overline{U}}{\partial t} + \overline{R(F(U))} = 0, \quad \text{operator-filtered form, or} \qquad (2.5a)$$

$$\frac{\partial \overline{U}}{\partial t} + R(\overline{F(U)}) = 0, \quad \text{flux-filtered form.} \qquad (2.5b)$$
The two choices for the coarse grid equation are equivalent in the DNS sense, i.e.
as long as the error introduced by the discretization of the divergence operator R is
negligible (which is assumed for DNS), the filtering and the divergence commute. We
will in the following focus on the operator-filtered form (Eqn. 2.5a), and highlight the
reasons for why this form is more suitable for the perfect LES approach.
Note that Eqn. 2.5a already is the LES formulation of the problem, i.e. a constitutive
equation for the coarse grid solution $\overline{U}$. Given the perfect LES model for this equation,
i.e. the exact temporal evolution of the spatial operator on the coarse grid $\overline{R(F(U))}$,
Eqn. 2.5a would revert to an ordinary differential equation in time for $\overline{U}$. We therefore
define a perfect LES by the following two conditions: i) the solution to the corresponding
equation must be $\overline{U}$, and ii) all terms must be computed on a coarse grid. To arrive at
a usable LES formulation that includes the coarse grid operator applied to the filtered
solution field, we augment Eqn. 2.5a with the appropriate temporal and spatial closures:
the spatial closure involves the term $\tilde{R}(F(\overline{U}))$, in which $\tilde{R}$ represents the discretized
spatial operator, i.e. the discrete representation of the divergence, and the flux $F$ is
computed from the filtered solution $\overline{U}$; the temporal closure is given by $\partial \overline{U}/\partial t - \overline{\partial U/\partial t}$.
We thus arrive at

$$\frac{\partial \overline{U}}{\partial t} + \tilde{R}(F(\overline{U})) = \underbrace{\tilde{R}(F(\overline{U})) - \overline{R(F(U))}}_{\text{spatial closure}} + \underbrace{\frac{\partial \overline{U}}{\partial t} - \overline{\frac{\partial U}{\partial t}}}_{\text{temporal closure}}. \qquad (2.6)$$
Under the common assumptions that the timestep of a practical LES is small, that the
time discretization is sufficiently accurate and that the filter commutes with the time
derivative, we neglect the temporal closure term in the following and focus on the spatial
closure only, arriving at the constitutive equation for a perfect LES:
$$\frac{\partial \overline{U}}{\partial t} + \tilde{R}(F(\overline{U})) = \underbrace{\tilde{R}(F(\overline{U})) - \overline{R(F(U))}}_{\text{perfect closure model}}. \qquad (2.7)$$
This is the exact LES formulation. Indeed, as long as the RHS of this equation remains
exact, the solution to Eqn. 2.7 remains $\overline{U}$, regardless of the specific discretization operator
$\tilde{R}$ and the filter $\overline{(\cdot)}$. It also highlights the subtle fact that a SG model always needs to fulfil
a double purpose: to provide a suitable closure for the unknown subgrid terms $\overline{R(F(U))}$
and to account for the discretization operator applied to the grid-resolved terms. For the
ideal closure this means that the latter terms cancel exactly, thereby essentially negating
the discretization effects.
Note that if we had chosen the flux-filtered version of Eqn. 2.5 for an LES, a discretization
of the divergence operator onto the coarse grid, $\tilde{R}$, would have been introduced. Except
in special cases, this discretization does not commute with the filtering, and thus
cannot lead to the perfect LES formulation sought here. Indeed, the closure term often
given for the momentum equations of the incompressible Navier-Stokes equations, e.g.
$\tau_{11} = \bar{u}\bar{u} - \overline{uu}$, can thus in general not recover $\overline{U}$. The exception to this statement is a grid-
converged explicitly filtered LES, in which the discretization itself becomes irrelevant
as the filtered equation is solved for the grid spacing $h \to 0$, so that filtering and
discrete divergence commute. Alternatively, specifically designed discretization operators
which have the commutation property could be used. However, both approaches are
very involved for the perfect LES and - more importantly to our discussion - not a
meaningful basis for our attempt to generate DNN-learned closure models for practical
LES applications. For these reasons, we prefer the operator-filtered form for our
investigations; hence, the closure terms include the discrete operator in order to reproduce
the filtered state $\overline{U}$.
Returning to Eqn. 2.7, note that all the terms occurring must exist only on the coarse
scales, i.e. no DNS grid is required and the equation can be solved on the LES grid.
However, since the RHS depends on the unfiltered solution U , application of this approach
is limited to specifically designed test cases where prior DNS information is available at
every LES time step and temporal integration errors are assumed to be negligible (Nadiga
& Livescu 2007; De Stefano & Vasilyev 2004). In practical LES, Eqn. 2.7 is therefore
replaced by
∂ Û
+ R(F
e (Û )) = M
f(Û , Ck ) , (2.8)
∂t | {z }
imperfect closure model
6 A. D. Beck, D. G. Flad and C.-D. Munz
where a modelling term $\tilde{M}$ (typically dependent on parameters $C_k$) is introduced.
This approach generally leads to a solution $\hat{U} \neq \overline{U}$, since the model $\tilde{M}$ is not exact
and its discretization errors may remain unclosed as well. Obviously, Eqn. 2.8 has
the decisive advantage of being numerically solvable without DNS information. From
a model development standpoint however, Eqn. 2.7 provides the exact closure terms
$\tilde{R}(F(\overline{U})) - \overline{R(F(U))}$, which we use as training data to construct a data-based, discrete
model term $\tilde{M}$.
This completes the construction of the perfect closure model (see Eqn. 2.7), which is
also stored at the corresponding time intervals.
Remark I: All these data preparation steps are performed as a pre-processing step to the
LES computations. Since the full DNS solution needs to be stored at a large number of time
steps, these operations are very expensive in terms of computational and storage costs.
For example, storing the information necessary to compute the DNS-to-LES operation
($\overline{U}$ and $\overline{R(F(U))}$) at $\Delta t = 4 \times 10^{-5}\,T^*$ over $0.2\,T^*$ requires approx. 55 TByte of storage space.
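For orientation, the per-snapshot cost implied by these numbers follows from simple arithmetic; a minimal sketch (the per-snapshot size is derived from the quoted total, not stated in the text):

```python
# Back-of-the-envelope check of the storage cost quoted in Remark I.
dt_store = 4e-5                 # storage interval in units of T*
t_window = 0.2                  # data collection window in units of T*
n_snapshots = t_window / dt_store            # = 5000 snapshots
total_bytes = 55e12                          # approx. 55 TByte (quoted above)
per_snapshot_gb = total_bytes / n_snapshots / 1e9
print(int(n_snapshots), per_snapshot_gb)     # -> 5000 snapshots, ~11 GByte each
```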
Remark II: In defining the perfect closure model, we have made two choices: the selection
of the DNS-to-LES operator and the choice of the LES operator itself. While the first
choice defines the coarse solution $\overline{U}$, the second one is - in a perfect LES approach only -
completely arbitrary, as it cancels out by design. We have confirmed these observations
by numerical experiments, selecting various filter shapes and discretization options
within the DGSEM framework.
Remark III: We have analyzed the contribution of the model term and its components
to the kinetic energy balance by computing the volume integral over the dot product
of the momentum equations from Eqn. 2.7 and the velocity vector. We found that the
full perfect closure term $\tilde{R}(F(\overline{U})) - \overline{R(F(U))}$ and its DNS component are dissipative
as expected. The LES operator in the model term is slightly antidissipative due to our
choice of a dissipative baseline scheme.
With the perfect closure model in place, we can now compute the actual LES. In order
to achieve a consistent discretization for the perfect LES solution, the discretization must
match the operator as described in the preprocessing step above. The stored closure terms
are read from disk and introduced as a source term to the discretization at each timestep.
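Conceptually, this reduces the perfect LES to an ODE integration with a precomputed source term. A minimal sketch with explicit Euler time stepping (function names are illustrative; the actual computation uses the DGSEM operator and time integrator):

```python
def perfect_les_step(u_bar, t, dt, les_operator, load_closure):
    """One explicit Euler step of Eqn. 2.7 (illustrative only).

    u_bar        : coarse grid (filtered) solution array
    les_operator : callable evaluating the coarse grid operator R~(F(.))
    load_closure : callable returning the stored perfect closure term
                   R~(F(U_bar)) - bar(R(F(U))) at time t, read from disk
    """
    return u_bar + dt * (-les_operator(u_bar) + load_closure(t))
```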
Fig. 1 shows the u-velocity on a slice through the turbulent field. The upper left
pane shows the DNS solution, the upper middle pane the filtered solution on the LES
grid after the application of the DNS-to-LES operator. In the upper right hand pane,
the resulting computed perfect LES field obtained by solving Eqn. 2.7 is shown. The
marginal differences can be attributed to numerical round-off and temporal integration
errors. For comparison, in the lower row the corresponding results for a no-model LES
(where the discretization error of the LES operator serves as an implicit closure model)
and two LES with Smagorinsky closure are shown (Smagorinsky 1963). Note that this
LES method using the described scheme along with a Smagorinsky model was shown to
be the state of the art for LES with DG schemes in Flad & Gassner (2017). While the
position and magnitude of the large scale structures remain relatively stable, their extent
and shape is clearly affected by the imperfect model. For the small scales, this effect is
more pronounced, as spurious artefacts occur. This investigation highlights the fact that
only the perfect LES approach can recover $\overline{U}$, while imperfect closure models lead to a
solution $\hat{U} \neq \overline{U}$.
In Fig. 2, the energy spectra and the temporal evolution of the kinetic energy for the
Figure 1. Iso-contours of u-velocity, shown in the x-z slice at y = 3.0 and t = 1.6. Upper
left: DNS; upper middle: filtered DNS; upper right: perfect LES; lower left: no-model LES;
lower middle: LES, Cs = 0.05; lower right: LES, Cs = 0.17. The corresponding grid cells are also shown.
Figure 2. Left: Spectra of kinetic energy at t = 1.6 T ∗ , cut-off frequencies for 3 and 4 points
per wavelength are also shown; Right: Temporal evolution of kinetic energy.
different LES approaches are compared to the filtered DNS result. As expected from
Fig. 1, the perfect model LES is in excellent agreement with the filtered data, while
the no-model LES (using otherwise the same discretization as described above) lacks
sufficient dissipation, which results in a high-frequency build-up. Adding a Smagorinsky
model to this discretization increases the overall dissipation, but leads to the typical
tilted spectral distribution, where low wave numbers are too energy-rich, while those near
the cut-off are damped too strongly.
Figure 3. Left: Temporal evolution of kinetic energy for 4 selected DHIT runs (runs 10, 11, 14
and 15); Right: Temporal evolution of the kinetic energy spectrum of run 11 from t = 0 to
T = 2.0 T∗; A: initial spectrum, B: start of data collection T = 1.0 T∗, C: end of data collection
T = 2.0 T∗, dashed line: $k^{-5/3}$.
To summarize, the discussion herein and the numerical experiments conducted have
established the equations for a perfect LES computation (without the need for a sec-
ond explicit filtering to essentially remove the discretization operator effects) and have
demonstrated how to compute the perfect LES solution with the help of a pre-computed
perfect model. With this method in place, we now employ the established framework to
generate training data to find a DNN approximation to the unknown term $\overline{R(F(U))}$ in
this perfect closure.
to gauge whether the chosen input features have a non-vanishing correlation to the labels,
i.e. whether it is possible for a DNN to find a generalized mapping. From the coefficients listed in
Tbl. 1, computed over all available training data pairs, we can conclude that in particular
the LES operators are reasonable input features for the DNN. During our training
process, we found that reducing the input features to either the velocities or the LES
operators only impeded training success significantly, and we have thus chosen both sets
as input features.
Table 1. Cross correlation coefficients CC(a, b) between candidate input features a and the
closure term labels b.
With these definitions in place, the task of the DNN is then to find a mapping $M$ by
training on the available, designated training data:

$$M : \mathbb{R}^{6 \times p \times p \times p} \rightarrow \mathbb{R}^{3 \times p \times p \times p}, \quad \text{given } 18 \times n_{samples} \times n_{elems} \text{ training pairs } (\hat{x}, \hat{y}). \qquad (2.13)$$

Details about the DNN architecture, training and validation will be discussed in the next
section.
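In terms of array shapes, Eqn. 2.13 couples six coarse grid input fields to the three components of the closure term per element; a small sketch of one possible layout of the training pairs (all sizes illustrative):

```python
import numpy as np

p = 8                              # illustrative number of nodes per direction
n_samples, n_elems = 100, 64       # illustrative data set sizes

# inputs x_hat: 3 coarse grid velocities and 3 LES operator components
x_hat = np.empty((n_samples * n_elems, 6, p, p, p))
# labels y_hat: the three momentum components of the perfect closure term
y_hat = np.empty((n_samples * n_elems, 3, p, p, p))
```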
Figure 4. Left: A single artificial neuron, consisting of a linear mapping of the inputs $X$ to $z(X)$
with the weights $\omega_i$ and bias $b$, followed by a non-linear activation $Y(X) = a(X) = g(z(X))$.
Right: A multilayer perceptron with 2 hidden layers $H_1$ and $H_2$, input vector $X$ and output
vector $Y$. The arrows indicate information flow in a forward pass and represent the trainable
weights.
the network, which means that an optimum does not necessarily exist and that the non-
convex optimization problem becomes more costly. Typically, for ANNs, this optimization
is conducted in two steps (Rumelhart et al. 1986; Werbos 1990):
(i) The backpropagation or backward pass through the network, which computes the
partial derivatives of the cost function w.r.t. all weights in the network via the chain
rule, i.e. $\partial C(\hat{Y}, Y(W))/\partial W$.
(ii) The optimization step updates the weights according to the gradient descent
method, where the specific form of the gradients used depends on the chosen optimization
method. The size of each parameter increment depends on the specified learning rate.
Computing the weight updates is achieved by the mini-batch gradient descent
method (Ioffe & Szegedy 2015), in which the full training set is split into smaller mini-
batches on which the training is conducted. This approach is the most widely used in
ANN training and provides a compromise between memory efficiency and computational
cost. During training, progress is monitored via the cost function of the training set. A
separate validation set, drawn randomly from the training data but not taking part in
the training process itself, is used to check the generalization success of the network and
to detect overfitting.
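A generic realization of this procedure could look as follows (a sketch; grad_fn, update_fn and cost_fn stand in for the backward pass, the optimizer step and the cost evaluation of any ANN framework):

```python
import numpy as np

def train(params, grad_fn, update_fn, cost_fn, x, y, x_val, y_val,
          batch_size=250, epochs=50):
    """Mini-batch gradient descent with per-epoch validation monitoring."""
    n = x.shape[0]
    for epoch in range(epochs):
        perm = np.random.permutation(n)           # reshuffle before each epoch
        for i in range(0, n, batch_size):
            idx = perm[i:i + batch_size]
            grads = grad_fn(params, x[idx], y[idx])   # backward pass
            params = update_fn(params, grads)         # gradient descent update
        # the validation cost monitors generalization and exposes overfitting
        print(epoch, cost_fn(params, x, y), cost_fn(params, x_val, y_val))
    return params
```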
In the following section, we will describe the specific network architecture employed
in this project (Sec. 3.1) and give details on its implementation and hyperparameters
(Sec. 3.2).
$$Z^{l}_{ijk} = \underbrace{\sum_{m=-\Delta_i/2}^{\Delta_i/2} \; \sum_{n=-\Delta_j/2}^{\Delta_j/2} \; \sum_{o=-\Delta_k/2}^{\Delta_k/2} W^{l-1}_{mno} \, A^{l-1}_{i+m,\,j+n,\,k+o} + b^{l-1}}_{\mathrm{conv}^{l}_{l-1}}, \qquad A^{l}_{ijk} = g(Z^{l}_{ijk}), \qquad (3.3)$$
where $\Delta_i, \Delta_j, \Delta_k$ are the sizes of the kernel in the given direction, i.e. the extension
of the local receptive field, and $W^{l-1}_{mno}$ denotes the entries of the filter kernel. Note that
Eq. 3.3 can be formalized as a discrete convolution operation (with added bias term)
of the input tensor $A^{l-1}$ with the filter $W^{l-1}$, with a subsequent application of a point-
wise non-linearity. The choice of the kernel sizes and the treatment of the boundary
regions are open hyperparameters of CNNs. Thus, by design, CNNs are closely related
to MLPs, but observe the dimensionality of the original data and replace the global
matrix multiplication with a local, multi-dimensional convolution filter.
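A direct, unoptimized evaluation of Eq. 3.3 for a single-channel 3D input makes the local receptive field explicit; a minimal sketch that treats interior points only, since the boundary handling is an open design choice:

```python
import numpy as np

def conv3d_layer(A_prev, W, b, g=lambda z: np.maximum(z, 0.0)):
    """Naive evaluation of Eq. 3.3: Z = conv(A_prev, W) + b, A = g(Z).

    A_prev : previous-layer activations, shape (ni, nj, nk)
    W      : filter kernel with odd extents, e.g. shape (3, 3, 3)
    b      : scalar bias; g defaults to a ReLu activation
    """
    di, dj, dk = (s // 2 for s in W.shape)
    ni, nj, nk = A_prev.shape
    Z = np.zeros_like(A_prev)
    for i in range(di, ni - di):
        for j in range(dj, nj - dj):
            for k in range(dk, nk - dk):
                patch = A_prev[i-di:i+di+1, j-dj:j+dj+1, k-dk:k+dk+1]
                Z[i, j, k] = np.sum(W * patch) + b
    return g(Z)
```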
For a given filter kernel $W^{l-1,g}_{mno}$, the activations $A^{l,g}_{ijk}$ computed from Eq. 3.3 are often
termed the feature map associated with the respective kernel. In each layer of a CNN, an
arbitrary number of filters can be applied, i.e. the number of feature maps (each being
determined by one of the filters) increases accordingly. The stacked feature maps $A^{l,g}$
form the activation of the layer $A^{l}$. Note that by adding CNN layers to the network,
the overall receptive field of a single neuron in the deeper layers usually increases, as its
domain of dependence indirectly includes larger and larger input fields. In addition,
in deeper layers a combination of feature maps from the previous layers leads to a
hierarchical representation of the input data, which can then be used to generate an
efficient representation of the input data (so-called autoencoding) (Vincent et al. 2010)
Figure 5. A single convolutional layer for a 2D input activation $A^{l-1}_{ij}$, a filter kernel size of
$3 \times 3$ and feature maps $A^{l,g}$, where $g$ denotes an instance of the filter kernel $W^{l-1,g}$, $g = 1, ..., f$.
The $\otimes$ operator describes the discrete convolution; the addition of the pointwise bias term and
the non-linear activation function have been omitted for clarity.
or to approximate the target function more efficiently (Lee et al. 2009). Fig. 5 gives a
schematic impression of the operations in a single convolutional layer for 2D data.
While related in their design to MLP architectures, CNNs ameliorate a number of
shortcomings of the former. Due to the local connectivity of CNNs, the number of
trainable weights is significantly reduced, which makes model training more efficient
and robust and allows the construction of deeper networks. For a given feature map,
the filter kernels are constant among all neurons, i.e. the same filter function is applied
to the whole input field. This so-called weight sharing makes CNNs shift-invariant and
enables the extraction of hierarchical features.
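The reduction in weights is easily quantified with a small worked example (sizes illustrative): a dense layer connecting two fields of $p^3$ nodes requires $p^6$ weights, whereas one convolutional feature map only requires the kernel entries plus a bias, independent of the field size.

```python
p, k = 8, 3                      # illustrative field and kernel sizes
dense_weights = (p**3) ** 2      # fully connected layer: 262,144 weights
conv_weights = k**3 + 1          # one 3x3x3 kernel plus bias: 28 weights
```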
As for MLPs, a large number of design choices and hyperparameters exist for
CNNs, which require careful algorithm design and experimentation. Nonetheless,
for multidimensional data, CNNs are the current state of the art and have replaced MLP
architectures.
Figure 6. A single residual block. The underlying mapping is conceptually split into two parts,
$G(A^{l-3}) = F(A^{l-3}) + h(A^{l-3})$, where $h$ is a linear function of its input and $F$ a stack of non-
linear convolutional layers. According to He et al. (2016), choosing both the skip connection $h$
and the activation $f$ applied after the addition as identity mappings is optimal.
Figure 7. The RNN architecture used for learning the LES closure terms. The number of
residual blocks, denoted by the dashed box, is variable. For the input and output tensors, the
fourth dimension (of size $M_{\hat{x}}$ and $M_{\hat{y}}$, respectively) denoting the specific feature is omitted,
and both are shown for a mini-batch size of 1 for the sake of clarity. The isotropic kernel size $k$
and the number of feature maps $n_f$ are shown for each 3D convolution operation; BN denotes
batch normalization, and the non-linear activation layers are labelled ReLu.
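As an illustration, a residual block with the Conv3D-BN-ReLu pattern of Fig. 7 can be sketched in Keras (TensorFlow, cf. Abadi et al. 2015); this is our reading of Figs. 6 and 7, not the authors' exact implementation, and it assumes the input already carries n_filters channels so that the skip connection h can be the identity:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, n_filters, kernel_size=3):
    """Residual block in the spirit of Figs. 6 and 7: two 3D convolutions
    with batch normalization and ReLu, added to an identity skip connection."""
    y = layers.Conv3D(n_filters, kernel_size, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv3D(n_filters, kernel_size, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, x])               # h = identity (cf. He et al. 2016)
    return layers.Activation('relu')(y)
```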
The networks investigated in this work are summarized in Tbl. 2. Additional
hyperparameters that complete the network design are:
• Activation function: The optimal choice of the activation function is still an open
research question. The Rectified Linear Unit (ReLu) is the current state of the art (Nair
& Hinton 2010) and avoids the saturation problems of previously favoured asymptotic
functions. It is used exclusively in this work. We briefly investigated optimized variants
(Ramachandran et al. 2018), but found no consistent improvement in network accuracy
for our cases.
• Batch normalization: The input features to all layers in the residual block, i.e. the
activations from the previous layers, are normalized for each training batch. This method
has been shown to increase learning speed and robustness by reducing the sensitivity
of the optimization process to changing input distributions deep within the network
(internal covariate shift, (Ioffe & Szegedy 2015)).
• Cost function: We choose the standard squared error cost function for regression
problems. For a single sample, i.e. for a pair $(y \in Y, \hat{y} \in \hat{Y})$, where $\hat{y}$ is the ground truth
label and $y$ the network prediction, it is given as $C_{\hat{y}} = w_{LGL} \odot (y - \hat{y})^2$,
where the square and the $\odot$ operators denote point-wise operations. The weight matrix
$w_{LGL} \in \mathbb{R}^{p \times p \times p}$ contains the three-dimensional tensor product of Legendre-Gauss-
Lobatto quadrature weights of degree $p$ and is a re-application of the mass matrix of
the DGSEM scheme used as the LES operator. In effect, this rescales the elements of
each sample and avoids the bias introduced by the sampling in physical space due to
the non-uniform position of the LGL nodes. The overall cost $C$ is then determined by
summation of all elements of $C_{\hat{y}_n}$, $n = 1, 2, 3$ and over all samples in a given batch (a
short code sketch follows after Tbl. 2).
• Optimization procedure: The minimization of the cost function is conducted using
the mini-batch stochastic gradient descent method with the optimizing algorithm Adam
presented in Kingma & Ba (2014) and exponential decay learning rate adaptation. The
size of a mini-batch was chosen to be ≈ 250. Before each training epoch, the distribution
of samples to the mini-batches was randomized.
• Data augmentation: To increase the available training samples, we triple the number
Network d nf 1 nf 2
RNN0 0 16 32
RNN1 1 16 32
RNN4 4 16 32
RNN8 8 16 32
MLP100 1 100
Table 2. Network details. For the RNNs, d denotes the number of residual blocks, $n_{f1}$, $n_{f2}$ the
number of feature maps of the convolutional layers. The MLP according to Gamahara & Hattori
(2017) contains 1 hidden layer with 100 neurons.
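As announced above, a direct transcription of the LGL-weighted cost is straightforward (a sketch; the absence of any further normalization is our assumption):

```python
import numpy as np

def batch_cost(Y, Y_hat, w_lgl):
    """Squared-error cost weighted point-wise by the tensor product of
    LGL quadrature weights; Y and Y_hat have shape (batch, 3, p, p, p),
    w_lgl has shape (p, p, p) and broadcasts over batch and component."""
    return np.sum(w_lgl * (Y - Y_hat) ** 2)
```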
Figure 8. Cost function C for different network depths d as a function of iteration number. The
results for the MLP network according to Gamahara & Hattori (2017) are shown for reference.
The validation costs are shown as solid lines, the training costs as dashed lines. Right: Zoomed-in
view of the validation costs.
4. Results
4.1. ANN Training Results
In this section, we report the results of training the network architectures defined in
Sec. 3.2 on the data described in Sec. 2.4. We report on a small number of network
Network                                                  CC         CC (inner)  CC (surface)
RNN0     $\overline{R(F(U))}_1$, $R(F(U))^{ANN}_1$     0.347676   0.712184    0.149090
         $\overline{R(F(U))}_2$, $R(F(U))^{ANN}_2$     0.319793   0.663664    0.134267
         $\overline{R(F(U))}_3$, $R(F(U))^{ANN}_3$     0.326906   0.669931    0.101801
RNN1     $\overline{R(F(U))}_1$, $R(F(U))^{ANN}_1$     0.414848   0.744746    0.164221
         $\overline{R(F(U))}_2$, $R(F(U))^{ANN}_2$     0.397299   0.704188    0.263977
         $\overline{R(F(U))}_3$, $R(F(U))^{ANN}_3$     0.392828   0.707352    0.131613
RNN2     $\overline{R(F(U))}_1$, $R(F(U))^{ANN}_1$     0.443292   0.756434    0.205861
         $\overline{R(F(U))}_2$, $R(F(U))^{ANN}_2$     0.422572   0.718873    0.320142
         $\overline{R(F(U))}_3$, $R(F(U))^{ANN}_3$     0.421324   0.720736    0.185260
RNN4     $\overline{R(F(U))}_1$, $R(F(U))^{ANN}_1$     0.470610   0.766688    0.253925
         $\overline{R(F(U))}_2$, $R(F(U))^{ANN}_2$     0.450476   0.729371    0.337032
         $\overline{R(F(U))}_3$, $R(F(U))^{ANN}_3$     0.449879   0.730491    0.269407
RNN8     $\overline{R(F(U))}_1$, $R(F(U))^{ANN}_1$     0.477211   0.763708    0.290509
         $\overline{R(F(U))}_2$, $R(F(U))^{ANN}_2$     0.458047   0.728010    0.346132
         $\overline{R(F(U))}_3$, $R(F(U))^{ANN}_3$     0.460305   0.732248    0.307202
MLP100   $\overline{R(F(U))}_1$, $R(F(U))^{ANN}_1$     0.254276   0.657802    0.117419
         $\overline{R(F(U))}_2$, $R(F(U))^{ANN}_2$     0.230262   0.605015    0.091826
         $\overline{R(F(U))}_3$, $R(F(U))^{ANN}_3$     0.231645   0.612368    0.065401
Table 3. Network training results. The cross correlation CC is given for the full sample as well
as for the inner elements and the outer or surface elements of the three-dimensional tensor
separately.
and hyperparameter constellations only, as the focus of this work is not finding the
optimal network. For reference, we also report the results for the MLP network as proposed
by Gamahara & Hattori (2017).
All networks were initialized with uniformly distributed random weights and were trained
over approx. 60,000 mini-batch iterations, which corresponds to 50 full training epochs.
After each full epoch, the ANN was evaluated on the validation and training sets to
judge the generalization capabilities of the learned model. Fig. 8 shows the evolution of
the validation and training costs with iteration number. All networks under consideration
are able to learn from the data, i.e. their approximation of the target quantities improves
from the initial random state. In addition, both the validation and training costs continue
to drop or flatten out without showing an increase with higher iteration number. The only
exception can be observed in the right hand pane of Fig. 8 for the RNN8 architecture,
where a positive slope of the validation costs can be observed after approx. 40,000
iterations. Combined with the fact that it is observed for the largest network under
consideration, this behaviour is likely a sign of overfitting occurring during training.
In terms of network architecture, the achievable losses decrease with the number of
Figure 9. Input features, labels and predictions for a pair $(\hat{x}, \hat{y})_{test}$ from the hidden test run.
Shown are iso-contours in the x-y slice at x = 3.0 and t = 1.5 T∗. First row: label $\overline{R(F(U))}_1$
from the test sample, predictions $R(F(U))^{ANN}_1$ of networks RNN4 (≈ 47% CC) and RNN0
(≈ 34% CC). Second and third row: corresponding input features, i.e. the coarse scale velocities
$\overline{U}_i$ and LES operators $\tilde{R}(F(\overline{U}_i))$. The contour levels for each row are shown on the left.
residual blocks in the RNNs, i.e. with the depth of the network. This observation is in
agreement with the findings for other general learning tasks as discussed in the previous
sections: Deep architectures have favourable generalization properties. In addition, the
asymptotic behaviour of the cost functions and the onset of overfitting for the deepest
RNN strongly suggest that a further reduction of the cost function through training is
not inherently limited by the chosen methodology, but by the available data only.
In order to further assess the accuracy of the learned models, Tbl. 3 lists the cross-
correlation between the predicted and ground truth labels for the test data set. As
discussed in Sec. 2.4, each training sample consists of three-dimensional tensors of shape
$p \times p \times p$ (written in index notation as $\{y\} = [0:p-1, 0:p-1, 0:p-1]$). In addition to the
overall cross-correlation, we can thus define an inner and surface cross-correlation, which
are computed from the inner and outer subsets $\{y\}^{inner} = [1:p-2, 1:p-2, 1:p-2]$
and $\{y\}^{surf} = \{y\} \setminus \{y\}^{inner}$ of each sample. In Tbl. 3, we report these two additional
metrics alongside the overall cross-correlation CC. As deduced from the cost functions, the
data in Tbl. 3 supports two important findings: Firstly, the networks are able to learn from
the data by not just reproducing a linear mapping of the inputs, but by generating a (non-
linear) combination of the features. Thereby, the resulting cross-correlation is significantly
higher than that of the input features, see Tbl. 1. Secondly, deeper RNNs learn more
successfully, i.e. produce a higher correlation of their predictions to the actual labels.
As discussed above, the achievable gains in cross-correlation saturate asymptotically for
d > 4 due to overfitting and the limited amount of training data. A third observation
from Tbl. 3 concerns the different approximation accuracies for the inner and
surface points of the training samples. For the inner points, CCs of over 0.7 can be learned
from the data, while the surface correlation is significantly weaker. This is likely due to
the non-isotropy of the data and the filter kernel at the element boundaries, which could
be remedied by additional training on new data sets.
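These per-sample metrics can be evaluated directly on the $p \times p \times p$ tensors; a minimal sketch (in practice, the samples of the test set would be aggregated before computing the coefficients):

```python
import numpy as np

def cross_correlations(y, y_ann):
    """Cross correlation CC of label and prediction (each of shape p x p x p)
    on the full sample and on its inner and surface subsets (cf. Tbl. 3)."""
    def cc(a, b):
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]
    inner = (slice(1, -1),) * 3            # [1:p-2] in every direction
    surf = np.ones(y.shape, dtype=bool)
    surf[inner] = False                    # True only on the surface points
    return cc(y, y_ann), cc(y[inner], y_ann[inner]), cc(y[surf], y_ann[surf])
```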
Fig. 9 gives a visual impression of the learning results and the data involved. For a sample
from the test set, the true label $\overline{R(F(U))}_1$ is shown in the upper left pane. The input
features, the coarse grid velocities $\overline{U}_i$ and the LES operators $\tilde{R}(F(\overline{U}_i))$, are depicted in
the second and third rows, with the contour levels adjusted to reveal their structures. Note
that, in accordance with the analysis reported in Tbl. 1, a weak positive correlation between
$\overline{R(F(U))}_1$ and $\tilde{R}(F(\overline{U}_1))$ exists, which is noticeable visually. The velocities $\overline{U}_i$ appear
uncorrelated to $\overline{R(F(U))}_1$, a notion also supported by Tbl. 1. Returning to the first row
of Fig. 9, the middle and right panes show the predicted closure term $R(F(U))^{ANN}_1$
for the RNN4 (CC ≈ 0.47) and RNN0 (CC ≈ 0.34) architectures. Both predictions are
capable of capturing the general scales of $\overline{R(F(U))}_1$, with the deeper network being in
better agreement than the shallow one.
In order to analyse the influence of the choice of input features on the training, we have repeated
the training of the RNN4 network for the input sets listed in Tbl. 5, where set 1
corresponds to the original input features as discussed in Sec. 2.4 and 4.1. Sets 2 and 3
consider the velocities and the LES operators only, respectively, and yield considerably
lower training accuracy than set 1. From the data in Tbl. 4, the correlation between the
LES and DNS terms suggests that omitting the operator terms (set 2) reduces the learning
success. In addition, from theoretical considerations, the LES term is an approximation
of the low-pass filtered DNS term, so removing it from the input features hinders the
learning of the mapping. In set 3, the input velocities are omitted. Although Tbl. 4 shows
that only a very weak correlation exists between the velocities and the targets, the results
in Tbl. 5 demonstrate that the ANNs can create a considerably better generalization if
these terms are included. This might be an indicator that the network creates some form
of deconvolution of the velocities akin to the work by Maulik & San (2017), but this
requires further research. In set 4, we have included all the available coarse scale quantities
                                 $\overline{R(F(U))}_1$   $\overline{R(F(U))}_2$   $\overline{R(F(U))}_3$
$\tilde{R}(F(\overline{U}_1))$        0.1894         -2.87e-04       -5.24e-04
$\tilde{R}(F(\overline{U}_2))$       -1.14e-03        0.1793          9.73e-05
$\tilde{R}(F(\overline{U}_3))$       -7.83e-04        9.29e-04        0.1787
Table 4. Correlation coefficients between the targets $\overline{R(F(U))}_i$ and the available coarse grid
features.
Set   Features                                                              CC 1     CC 2     CC 3
1     $u_i$, $\tilde{R}(F(\overline{U}_i))$, i = 1, 2, 3                    0.4706   0.4505   0.4499
2     $u_i$, i = 1, 2, 3                                                    0.3665   0.3825   0.3840
3     $\tilde{R}(F(\overline{U}_i))$, i = 1, 2, 3                           0.3358   0.3066   0.3031
4     $\rho$, $p$, $e$, $u_i$, $\tilde{R}(F(\overline{U}_i))$, i = 1, 2, 3  0.4764   0.4609   0.4580
5     $u_1$, $\tilde{R}(F(\overline{U}_1))$                                 0.3913
Table 5. Feature sets and resulting test correlations. CC i with i = 1, 2, 3 denotes the cross
correlation between the targets and network outputs, $CC(\overline{R(F(U))}_i, R(F(U))^{ANN}_i)$. Set 1
corresponds to the original feature choice from Sec. 2.4 and 4.1; set 5 corresponds to the RNN4
architecture, but with features and labels for the u-momentum component only.
Figure 10. LES with direct ANN closure according to Eqn. 2.7. Left: Comparison of the long-
term behaviour of different RNN models. Right: Short-term model behaviour for varying CFL
numbers. The results for an explicit closure with a Smagorinsky model with Cs = 0.17 as in
Eqn. 2.8 are shown as a reference.
In the next section, we will investigate the possibilities of constructing a stable LES
model from the learned data.
Figure 11. Comparison of different LES closures for DHIT. Left: Evolution of kinetic energy.
Solid lines with symbols denote $\mu_{ANN}$-type closures, dashed lines with symbols $\mu_{OP}$-type LES
runs. Results for a Smagorinsky model with Cs = 0.17 and for the no-model LES are also shown.
Right: Spectra of kinetic energy at t = 1.9 (unless stated otherwise).
In this section, we have demonstrated that a direct closure of Eqn. 2.7 with the
ANN-based model terms is not feasible due to the approximate nature of the closure
and the required cancellation of the LES operator. Instead, we have shown how to employ
the learned model to
construct a data-informed, adaptive eddy-viscosity type closure, which results in a stable
and accurate scheme. We note that this is a simple approach to constructing a closure
model, and that more elaborate modelling ideas based on ANN predictions of coarse grid
terms or fine grid reconstructions should be explored in the future.
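While the precise definition of the $\mu_{ANN}$ closure lies outside this excerpt, one plausible realization of a data-adaptive, pointwise eddy viscosity - explicitly not necessarily the authors' formula - matches the local kinetic energy contribution (dot product with the velocity, cf. Remark III) of a unit-viscosity diffusion term to that of the ANN-predicted closure:

```python
import numpy as np

def pointwise_eddy_viscosity(closure_ann, diff_term_unit, u, eps=1e-12):
    """Hypothetical construction of a pointwise eddy viscosity nu(x):
    match the local kinetic energy contribution of the ANN-predicted
    closure to that of a diffusion term evaluated with unit viscosity.
    All arrays have shape (3, p, p, p); nu is clipped to be dissipative."""
    num = np.einsum('i...,i...->...', closure_ann, u)
    den = np.einsum('i...,i...->...', diff_term_unit, u)
    return np.maximum(num / (den + eps), 0.0)
```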
The authors would like to acknowledge the support by the SimTech Cluster of Excel-
lence through project 5-21 and the High Performance Computing Center Stuttgart.
REFERENCES
Abadi, Martín & others 2015 TensorFlow: Large-scale machine learning on heterogeneous
systems. Software available from tensorflow.org.
Barron, A. R. 1993 Universal approximation bounds for superpositions of a sigmoidal function.
IEEE Transactions on Information Theory 39 (3), 930–945.
Bassi, F. & Rebay, S. 1997 A high-order accurate discontinuous finite element method for the
numerical solution of the compressible Navier-Stokes equations. Journal of Computational
Physics 131 (2), 267 – 279.
Batchelor, G. K. & Townsend, A. A. 1948 Decay of isotropic turbulence in the
initial period. Proceedings of the Royal Society of London A: Mathematical, Physical
and Engineering Sciences 193 (1035), 539–558.
Beck, Andrea D., Bolemann, Thomas, Flad, David, Frank, Hannes, Gassner,
Gregor J., Hindenlang, Florian & Munz, Claus-Dieter 2014 High-order
discontinuous Galerkin spectral element methods for transitional and turbulent flow
simulations. International Journal for Numerical Methods in Fluids 76 (8), 522–548.
Beck, Andrea D., Flad, David G., Tonhäuser, Claudia, Gassner, Gregor & Munz,
Claus-Dieter 2016 On the influence of polynomial de-aliasing on subgrid scale models.
Flow, Turbulence and Combustion 97 (2), 475–511.
Ben Driss, S., Soua, M., Kachouri, R. & Akil, M. 2017 A comparison study between
MLP and convolutional neural network models for character recognition. In Real-Time
Image and Video Processing, vol. 10223.
Bengio, Yoshua, Ducharme, Réjean, Vincent, Pascal & Jauvin, Christian 2003 A Neural
Probabilistic Language Model. Journal of Machine Learning Research 3 (Feb), 1137–1155.
Bojarski, Mariusz, Del Testa, Davide, Dworakowski, Daniel, Firner, Bernhard,
Flepp, Beat, Goyal, Prasoon, Jackel, Lawrence D., Monfort, Mathew,
Muller, Urs, Zhang, Jiakai, Zhang, Xin, Zhao, Jake & Zieba, Karol 2016 End
to End Learning for Self-Driving Cars. arXiv:1604.07316 [cs] ArXiv: 1604.07316.
Chasnov, J. R. 1995 The decay of axisymmetric homogeneous turbulence. Physics of Fluids
7 (3), 600–605.
Cybenko, George 1989 Approximation by superpositions of a sigmoidal function. Mathematics
of Control, Signals and Systems 2 (4), 303–314.
De Stefano, Giuliano & Vasilyev, Oleg V. 2004 “Perfect” modeling framework for
dynamic SGS model testing in large eddy simulation. Theoretical and Computational
Fluid Dynamics 18 (1), 27–41.
Flad, David, Beck, Andrea & Munz, Claus-Dieter 2016 Simulation of underresolved
turbulent flows by adaptive filtering using the high order discontinuous Galerkin spectral
element method. Journal of Computational Physics 313, 1–12.
Flad, David & Gassner, Gregor 2017 On the use of kinetic energy preserving DG-schemes
for large eddy simulation. Journal of Computational Physics 350, 782 – 795.
Funahashi, Ken-Ichi 1989 On the approximate realization of continuous mappings by neural
networks. Neural networks 2 (3), 183–192.
Gamahara, Masataka & Hattori, Yuji 2017 Searching for turbulence models by artificial
neural network. Physical Review Fluids 2 (5), 054604.
Garnier, Eric, Adams, Nikolaus & Sagaut, Pierre 2009 Large eddy simulation for
compressible flows. Springer Science & Business Media.
Gassner, Gregor J. & Beck, Andrea D. 2013 On the accuracy of high-order discretizations
for underresolved turbulence simulations. Theoretical and Computational Fluid Dynamics
27 (3), 221–237.
Haykin, Simon 2004 Neural networks: A comprehensive foundation. Neural networks 2 (2004),
41.
He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing & Sun, Jian 2016 Deep residual learning
for image recognition. In Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 770–778.
Hindenlang, Florian, Gassner, Gregor J., Altmann, Christoph, Beck, Andrea,
Staudenmaier, Marc & Munz, Claus-Dieter 2012 Explicit discontinuous Galerkin
methods for unsteady problems. Computers & Fluids 61, 86–93.
Ioffe, Sergey & Szegedy, Christian 2015 Batch normalization: Accelerating deep network
training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 .
Kingma, Diederik P & Ba, Jimmy 2014 Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980 .
Krizhevsky, Alex, Sutskever, Ilya & Hinton, Geoffrey E 2012 ImageNet Classification
with Deep Convolutional Neural Networks. In Advances in Neural Information Processing
Systems 25 (ed. F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger), pp.
1097–1105. Curran Associates, Inc.
Kutz, J. Nathan 2017 Deep learning in fluid dynamics. Journal of Fluid Mechanics 814, 1–4.
LeCun, Yann, Bengio, Yoshua & Hinton, Geoffrey 2015 Deep learning. Nature
521 (7553), 436.
LeCun, Yann, Bengio, Yoshua & others 1995 Convolutional networks for images, speech,
and time series. The handbook of brain theory and neural networks 3361 (10), 1995.
LeCun, Yann, Boser, Bernhard E, Denker, John S, Henderson, Donnie, Howard,
Richard E, Hubbard, Wayne E & Jackel, Lawrence D 1990 Handwritten digit
recognition with a back-propagation network. In Advances in Neural Information
Processing Systems, pp. 396–404.
Lee, Honglak, Grosse, Roger, Ranganath, Rajesh & Ng, Andrew Y 2009 Convolutional
deep belief networks for scalable unsupervised learning of hierarchical representations.
In Proceedings of the 26th Annual International Conference on Machine Learning, pp.
609–616. ACM.
Ling, Julia, Kurzawski, Andrew & Templeton, Jeremy 2016 Reynolds averaged
turbulence modelling using deep neural networks with embedded invariance. Journal of
Fluid Mechanics 807, 155–166.
Maulik, R. & San, O. 2017 A neural network approach for the blind deconvolution of turbulent
flows. Journal of Fluid Mechanics 831, 151–181.
McCulloch, Warren S & Pitts, Walter 1943 A logical calculus of the ideas immanent in
nervous activity. The Bulletin of Mathematical Biophysics 5 (4), 115–133.
Milano, Michele & Koumoutsakos, Petros 2002 Neural network modeling for near wall
turbulent flow. Journal of Computational Physics 182 (1), 1–26.
Minsky, Marvin & Papert, Seymour 1969 Perceptrons (expanded edition).
Nadiga, B. T. & Livescu, D. 2007 Instability of the perfect subgrid model in implicit-filtering
large eddy simulation of geostrophic turbulence. Phys. Rev. E 75, 046303.
Nair, Vinod & Hinton, Geoffrey E 2010 Rectified linear units improve restricted boltzmann
machines. In Proceedings of the 27th International Conference on Machine Learning
(ICML-10), pp. 807–814.
Oßwald, Kai, Siegmund, Alexander, Birken, Philipp, Hannemann, Volker & Meister,
Andreas 2016 L2Roe: a low dissipation version of Roe's approximate Riemann solver for
low Mach numbers. International Journal for Numerical Methods in Fluids 81 (2), 71–86.
Peyrard, Clement, Mamalet, Franck & Garcia, Christophe 2015 A comparison between
multi-layer perceptrons and convolutional neural networks for text image super-resolution.
In VISAPP (1), pp. 84–91.
Ramachandran, Prajit, Zoph, Barret & Le, Quoc V 2018 Searching for activation
functions. arXiv preprint arXiv:1710.05941 .
Rogallo, R.S. 1981 Numerical experiments in homogeneous turbulence. NASA TM-81315 .
Rosenblatt, Frank 1958 The perceptron: a probabilistic model for information storage and
organization in the brain. Psychological review 65 (6), 386.
Rumelhart, David E, Hinton, Geoffrey E & Williams, Ronald J 1986 Learning
representations by back-propagating errors. Nature 323 (6088), 533–536.
Sarghini, F., de Felice, G. & Santini, S. 2003 Neural networks based subgrid scale modeling
in large eddy simulations. Computers & Fluids 32 (1), 97 – 108.
Schmidhuber, Jürgen 2015 Deep learning in neural networks: An overview. Neural Networks
61, 85 – 117.
Silver, David & others 2016 Mastering the game of Go with deep neural networks and tree
search. Nature 529 (7587), 484–489.
Simonyan, Karen & Zisserman, Andrew 2014 Very Deep Convolutional Networks for Large-
Scale Image Recognition. arXiv:1409.1556 [cs] ArXiv: 1409.1556.
Smagorinsky, Joseph 1963 General circulation experiments with the primitive equations: I.
the basic experiment. Monthly Weather Review 91 (3), 99–164.
Szegedy, Christian, Liu, Wei, Jia, Yangqing, Sermanet, Pierre, Reed, Scott,
Anguelov, Dragomir, Erhan, Dumitru, Vanhoucke, Vincent, Rabinovich,
Andrew & others 2015 Going deeper with convolutions. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 1–9.
Tracey, Brendan D, Duraisamy, Karthikeyan & Alonso, Juan J 2015 A machine
learning strategy to assist turbulence model development. In 53rd AIAA Aerospace
Sciences Meeting, p. 1287.
Vincent, Pascal, Larochelle, Hugo, Lajoie, Isabelle, Bengio, Yoshua & Manzagol,
Pierre-Antoine 2010 Stacked denoising autoencoders: Learning useful representations
in a deep network with a local denoising criterion. Journal of Machine Learning Research
11 (Dec), 3371–3408.
Wallach, Izhar, Dzamba, Michael & Heifets, Abraham 2015 Atomnet: A deep
convolutional neural network for bioactivity prediction in structure-based drug discovery.
arXiv preprint arXiv:1510.02855 .
Werbos, P. J. 1990 Backpropagation through time: what it does and how to do it. Proceedings
of the IEEE 78 (10), 1550–1560.