High Performance Computing in Science and Engineering ’16
Wolfgang E. Nagel • Dietmar H. Kröner • Michael M. Resch
Editors
Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2016
Wolfgang E. Nagel
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Technische Universität Dresden
Dresden, Germany
Dietmar H. Kröner
Abteilung für Angewandte Mathematik
Universität Freiburg
Freiburg, Germany
Michael M. Resch
Stuttgart (HLRS)
Universität Stuttgart
Front cover figure: Bag breakup event during the air-assisted atomization of a liquid fuel. The air
flow field is colored by particle IDs which depend on the creation time and their respective release
position at the inlet. Details can be found in “Smoothed Particle Hydrodynamics for Numerical
Predictions of Primary Atomization”, by S. Braun, R. Koch and H.-J. Bauer, Institut für Thermische
Strömungsmaschinen (ITS), Karlsruher Institut für Technologie (KIT), Karlsruhe, Germany on page
Part I Physics
The Illustris++ Project: The Next Generation of Cosmological Hydrodynamical Simulations of Galaxy Formation
Volker Springel, Annalisa Pillepich, Rainer Weinberger, Rüdiger Pakmor, Lars Hernquist, Dylan Nelson, Shy Genel, Mark Vogelsberger, Federico Marinacci, Jill Naiman, and Paul Torrey
Hydrangea: Simulating a Representative Population of Massive Galaxy Clusters
Yannick M. Bahé, for the C-EAGLE collaboration
PAMOP Project: Computations in Support of Experiments and Astrophysical Applications
B.M. McLaughlin, C.P. Ballance, M.S. Pindzola, P.C. Stancil, S. Schippers, and A. Müller
Estimation of Nucleation Barriers from Simulations of Crystal Nuclei Surrounded by Fluid in Equilibrium
Antonia Statt, Peter Koß, Peter Virnau, and Kurt Binder
The Internal Dynamics and Early Adsorption Stages of Fibrinogen Investigated by Molecular Dynamics Simulations
Stephan Köhler, Friederike Schmid, and Giovanni Settanni
Vorticity, Variance, and the Vigor of Many-Body Phenomena in Ultracold Quantum Systems: MCTDHB and MCTDH-X
Ofir E. Alon, Raphael Beinke, Lorenz S. Cederbaum, Matthew J. Edmonds, Elke Fasshauer, Mark A. Kasevich, Shachar Klaiman, Axel U.J. Lode, Nick G. Parker, Kaspar Sakmann, Marios C. Tsatsos, and Alexej I. Streltsov
Peter Nielaba
In this section, eight physics projects are described that achieved important
scientific results using the CRAY XC40 systems (Hornet and Hazel Hen) of the HLRS.
Fascinating new results are presented in the following pages on astrophysical
systems (simulations of galaxy formation, of massive galaxy clusters, and of
photodissociation), soft matter systems (simulations of nucleation in colloidal
systems and of the dynamics and adsorption of fibrinogen), many-body quantum
systems (simulations of ultracold quantum systems) and elementary particle systems
(simulations of nucleon observables and of the anomalous magnetic moment of the
muon).
The studies of astrophysical systems have focused on galaxy formation, massive
galaxy clusters, and the photodissociation of certain molecules.
V. Springel, A. Pillepich, R. Weinberger, R. Pakmor, L. Hernquist, J. Naiman,
D. Nelson, M. Vogelsberger, and F. Marinacci from Heidelberg (V.S., A.P., R.W.,
R.P.), Cambridge USA (L.H., J.N., M.V., F.M.) and Garching (D.N.), in their
project GCS-ILLU present results from a new generation of hydrodynamical
simulations (“Illustris++” project, AREPO code), including new black hole physics
and chemical enrichment models, using more accurate techniques and an enlarged
dynamical range. The authors reproduced the appearance of a red sequence of
galaxies quenched by accreting supermassive black holes, and computed disk galaxy
populations with properties closely matching observational data. In addition,
the authors predicted magnetic field amplifications through small-scale dynamo
processes for galaxies of different sizes and types.
Yannick M. Bahé and the C-EAGLE collaboration from Garching used the GADGET-3
code in their HLRS project GCS-HYDA to simulate 24 galaxy clusters at high
resolution (“Hydrangea” project). The ongoing data analysis is providing new insights
into the physics of galaxy formation in an extreme environment and into the growth
of the massive haloes in which cluster galaxies are embedded.
P. Nielaba
Fachbereich Physik, Universität Konstanz, 78457 Konstanz, Germany
e-mail: peter.nielaba@uni-konstanz.de
B.M. McLaughlin, C.P. Ballance, M.S. Pindzola, P.C. Stancil, S. Schippers and A.
Müller from the Universities of Belfast (B.M.M., C.P.B.), Auburn (M.S.P.), Georgia
(P.C.S.), and Giessen (S.S., A.M.) investigated atomic, molecular and optical
collisions on petaflop machines in their project PAMOP, in order to support
measurements at synchrotron radiation facilities and to study photodissociation for
astrophysical applications. The Schrödinger and Dirac equations were solved with
the R-matrix and R-matrix with pseudo-states approaches, and the time-dependent
close-coupling method was used. The systems and phenomena investigated range
from X-ray and inner-shell photoionization of atomic oxygen and argon ions, as well
as of tungsten ions, and single-photon double ionization of helium, to the
photodissociation of SH⁺.
The studies of the soft matter systems have focused on nucleation barriers in
colloidal systems and on the dynamics and adsorption of fibrinogen.
A. Statt, P. Koß, P. Virnau and K. Binder from the University of Mainz present in
their project colloid a method to estimate the free energy barriers for homogeneous
nucleation of crystals from a fluid phase that is not hampered by the fact that the
fluid-crystal interface tension is in general anisotropic. Using Monte Carlo
simulations in the NpT ensemble with the softEAO model for colloidal systems, they
analyzed a crystal nucleus in equilibrium with the surrounding fluid in a small
simulation box and computed the fluid pressure, the chemical potential and the
volume of the nucleus, from which the nucleation barrier is obtained. Interesting
deviations from classical nucleation theory with its assumption of a spherical
nucleus were discovered and analyzed.
S. Köhler, F. Schmid and G. Settanni from the University of Mainz investigated
in their project Flexadfg the dynamical properties of fibrinogen and the initial
stages of its adsorption on mica and graphite surfaces by atomistic molecular
dynamics simulations. On mica, the protein adsorbed reversibly in a preferred
orientation without large deformations. On graphite, adsorption was irreversible
and formed a large number of protein-surface contacts, which eventually led to
deformations of the protein and the onset of denaturation.
In the last granting period, the quantum mechanical properties of elementary particle
systems were investigated, as well as the quantum many-body dynamics of trapped
bosonic systems.
O.E. Alon, R. Beinke, L.S. Cederbaum, M.J. Edmonds, E. Fasshauer, M.A.
Kasevich, S. Klaiman, A.U.J. Lode, N.G. Parker, K. Sakmann, M.C. Tsatsos, and
A.I. Streltsov from the Universities of Haifa (O.E.A.), Heidelberg (R.B., L.S.C.,
S.K., A.I.S.), Newcastle (M.J.E., N.G.P.), Tromsø (E.F.), Stanford (M.A.K.), Basel
(A.U.J.L.), Vienna (K.S.), and São Paulo (M.C.T.) studied ultracold atomic systems
in their project MCTDHB with their multiconfigurational time-dependent Hartree
for bosons (MCTDHB) method. The principal investigators focused on seven topics:
(a) single-shot imaging of dynamically created quantum many-body vortices,
(b) many-body tunneling dynamics of Bose-Einstein condensates and vortex states
in 2D, (c) the transition from vortices to solitonic vortices in 2D trapped
Bose-Einstein condensates, (d) the variance as a sensitive probe of correlations and
the uncertainty product of an out-of-equilibrium many-particle system, (e) the
development of a multiconfigurational time-dependent Hartree method for fermions
(“MCTDH-X”), (f) the escape of trapped fermions, and (g) composite fragmentation
of multi-component Bose-Einstein condensates.
C. Alexandrou, K. Jansen, G. Koutsou and C. Urbach from Nicosia (C.A., G.K.),
Zeuthen (K.J.) and Bonn (C.U.) investigated the inner structure of the proton and
other hadrons by lattice quantum chromodynamics (QCD) in their project GCS-Nops.
By generating the ensemble directly at the physical values of the pion and nucleon
masses, the principal investigators were able to compute the hadron spectrum, the
axial and tensor charges, moments of parton distribution functions, and the quark
content of the nucleon.
P. Marquard and M. Steinhauser from Zeuthen (P.M.) and Karlsruhe (M.S.)
computed in their project NumFeyn multi-loop Feynman integrals for the electron
contribution to the anomalous magnetic moment of the muon, using the FIESTA
The Illustris++ Project: The Next Generation
of Cosmological Hydrodynamical Simulations
of Galaxy Formation
V. Springel
Astronomisches Recheninstitut, Zentrum für Astronomie der Universität Heidelberg,
Mönchhofstr. 12–14, 69120, Heidelberg, Germany
Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118, Heidelberg, Germany
e-mail: volker.springel@h-its.org
A. Pillepich
Max-Planck Institute for Astronomy, Königstuhl 17, 69117, Heidelberg, Germany
e-mail: pillepich@mpia-hd.mpg.de
R. Weinberger • R. Pakmor
Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118, Heidelberg, Germany
e-mail: rainer.weinberger@h-its.org; ruediger.pakmor@h-its.org
L. Hernquist • J. Naiman
Center for Astrophysics, Harvard University, 60 Garden Street, 02138, Cambridge, MA, USA
e-mail: lars.hernquist@cfa.harvard.edu; jill.naiman@cfa.harvard.edu
D. Nelson
Max-Planck Institute for Astrophysics, Karl-Schwarzschild-Str. 1, 85740, Garching, Germany
e-mail: dnelson@mpa-garching.mpg.de
S. Genel
Department of Astronomy, Columbia University, 550 W. 120th St., 10027, New York, NY, USA
e-mail: shygenelastro@gmail.com
M. Vogelsberger • F. Marinacci • P. Torrey
Kavli Institute for Astrophysics and Space Research, MIT, 02139, Cambridge, MA, USA
e-mail: mvogelsb@mit.edu; fmarinac@mit.edu; ptorrey@mit.edu
1 Introduction
our code AREPO [9]: like in AMR, the volume of space is discretized into many
individual cells, but as in SPH, these cells move with time, adapting to the flow
of gas in their vicinity. As a result, the mesh itself, constructed through a Voronoi
tessellation of space, has no preferred directions or regular grid-like structure. Over
the past few years, we have shown that this new type of approach for simulating
gas has significant advantages over the other two methods, particularly for large
cosmological simulations like Illustris [10–13].
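The defining property of a Voronoi cell, namely that it contains all points closer to its mesh-generating point than to any other, can be illustrated with a short brute-force sketch. This is not how AREPO works internally (the code constructs the mesh geometrically via the Delaunay triangulation); the 2D periodic unit box, the three generator positions and the uniform flow `uniform_flow` are hypothetical choices that only show how moving the generators with the gas moves the cells with it.

```python
def periodic_dist2(a, b, box=1.0):
    """Squared distance between points a and b in a periodic box of side `box`."""
    d2 = 0.0
    for ai, bi in zip(a, b):
        d = abs(ai - bi) % box
        d = min(d, box - d)  # minimum-image convention
        d2 += d * d
    return d2


def voronoi_owner(point, generators, box=1.0):
    """Index of the generator whose Voronoi cell contains `point`, i.e. the nearest one."""
    return min(range(len(generators)),
               key=lambda i: periodic_dist2(point, generators[i], box))


def drift_generators(generators, velocity, dt, box=1.0):
    """Move each generator with the local flow velocity, so the cells follow the gas."""
    return [tuple((x + v * dt) % box for x, v in zip(g, velocity(g)))
            for g in generators]


def uniform_flow(g):
    """Hypothetical flow field: velocity 0.2 in +x, independent of position."""
    return (0.2, 0.0)


gens = [(0.25, 0.25), (0.75, 0.25), (0.5, 0.75)]
probe = (0.55, 0.30)
print(voronoi_owner(probe, gens))   # -> 1

moved = drift_generators(gens, uniform_flow, dt=1.0)
print(voronoi_owner(probe, moved))  # -> 0: the mesh has moved past the fixed probe point
```

A fixed point in space changes its owning cell as the mesh is advected, which is the quasi-Lagrangian behaviour described above. The linear search costs O(N) per query and is meant only as an illustration.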
One of the major achievements of the Illustris simulation is its ability to track the
small-scale evolution of gas and stars within a representative portion of the Universe.
The calculation yielded a population of thousands of well-resolved elliptical and
spiral galaxies, reproduced the observed radial distribution of galaxies in clusters
and the characteristics of hydrogen on large scales, and at the same time matched
the metal and hydrogen content of galaxies on small scales. However, the analysis
of Illustris has also revealed a number of tensions with observational data. In
particular, it has become clear that the physical model used for the so-called radio-
mode feedback [14] of accreting supermassive black holes was too strong and
violent on the scale of galaxy groups and low-mass clusters, depleting their
baryon content. At the same time, this physical model still proved insufficient
to quench the central galaxies in these systems to the required degree, so that
these galaxies became too massive. Other problems we identified were the
normalization of the faint end of the galaxy luminosity function, overly large galaxy
sizes, and the lack of a pronounced bimodality in the galaxy color distribution. In
addition, important physical ingredients such as magnetic fields were still missing.
This provides the motivation for the ‘Illustris++’ project that we currently carry
out as a GCS large-scale project on HazelHen at HLRS. The primary scientific goal
of our project is to calculate new, unprecedentedly large hydrodynamic simulation
models of the universe that improve upon the earlier Illustris project in several
important respects. Most importantly, we aim to improve the feedback models by
using a newly developed model for black hole accretion and its associated energy
release, by employing a considerably improved multi-species chemical enrichment
model, by adjusting the treatment of galaxy winds, and last but not least, by adding
magnetic fields to our simulations, opening up a rich new area of predictions that are
still poorly explored, given that the body of cosmological magneto-hydrodynamic
(MHD) simulations is still very small [15–22]. In particular, we aim to study the
strength of magnetic field amplification through structure formation as a function of
halo and galaxy size. We will also be able to quantify for the first time the expected
distribution of magnetic field properties for galaxies of a given size, and to explore
the role of winds and strong feedback events in “polluting” the intergalactic medium
with magnetic fields. In addition, we aim for a larger number of resolution elements
and a larger simulation volume than realized previously. This is necessary to better
study the regime of galaxy clusters (which are rare and can only be found in a
sufficiently large volume), and to sample the massive end of the galaxy and black
hole mass functions.
At the time of this writing, our project is still running, and only a subset of the
production calculations have finished, with some of the main runs presently being
computed. In this article, we describe some of the developments undertaken for the
project, detail practical and technical aspects of our work, and the status of our runs.
We also describe a few preliminary results in an exemplary fashion.
The simulations carried out in Illustris++ represent a significant challenge not only
in terms of size and spatial dynamic range, but also in terms of the dynamic
range in timescales. In particular, the strong kinetic AGN feedback, which couples
to the densest gas in galaxies, induces very small timesteps for a small fraction
of the mass. Over the course of 13 billion years of cosmic evolution, we need
up to 10⁷ timesteps in total. This would be completely infeasible with time
integration schemes that employ global timesteps, but even for the individual
timestepping we use in AREPO, this represents a formidable problem. It can
only be tackled if the computation of sparsely populated timesteps can be made
extremely fast so that they do not dominate the total CPU time budget. This in turn
requires eliminating any overheads that touch the full particle/cell system on such
timesteps.
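The cost argument can be made concrete with a small bookkeeping sketch. It assumes power-of-two timestep bins, a common organization for individual timesteps, and hypothetical bin populations; it counts element updates during one longest step for a global timestep, for individual timesteps, and for individual timesteps that still make one pass over the whole system on every sub-step.

```python
def update_cost(bin_counts):
    """Element updates during one longest timestep for three timestepping strategies.

    bin_counts[k] is the (hypothetical) number of elements on timestep dt_max / 2**k,
    i.e. elements that are active on 2**k sub-steps.
    """
    n_total = sum(bin_counts)
    n_substeps = 2 ** (len(bin_counts) - 1)  # the finest bin sets the number of sub-steps
    global_dt = n_total * n_substeps         # everybody on the smallest timestep
    individual = sum(n * 2 ** k for k, n in enumerate(bin_counts))
    # individual timesteps, but with one O(N) pass over the full system per sub-step
    with_overhead = individual + n_total * n_substeps
    return global_dt, individual, with_overhead


# 10^6 elements on the longest step, 100 elements three levels deeper
print(update_cost([1_000_000, 0, 0, 100]))  # (8000800, 1000800, 9001600)
```

With only 100 elements on the finest level, a full-system pass per sub-step already raises the total cost to about nine times the useful work. With roughly 2²³ ≈ 8.4×10⁶ sub-steps, comparable to the 10⁷ timesteps quoted above, such an overhead would dominate completely.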
For Illustris++, we have developed a novel hierarchical timestepping scheme
in our AREPO code that solves this in a mathematically clean fashion. This
is done by recursively splitting the Hamiltonian describing the dynamics into
a ‘slow’ and a ‘fast’ system (similar to [26]). One important feature of this
time integration scheme is that the split-off fast system is self-contained, i.e. its
evolution does not rely on any residual coupling with the ‘slow’ part. This means
that our goal, namely that poorly populated short timesteps can be computed
without touching any parts of the system living on longer timesteps, can be
achieved.
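The idea of a self-contained fast subsystem can be illustrated with a small recursive splitting integrator in the spirit of [26]. In this toy (not the AREPO implementation), particles that need a step shorter than dt form the fast set F, the rest the slow set S, and the Hamiltonian is split as H = H_F + H_S with H_F = T_F + V_FF and H_S = T_S + V_SS + V_SF. H_S is advanced with a kick-drift-kick step, and H_F by two recursive half-step calls, before and after, that touch only the fast particles. The 1D all-pairs spring interaction, the constant `K` and the fixed per-particle `dt_req` are hypothetical simplifications; a real code re-evaluates its timestep criteria as the system evolves.

```python
from dataclasses import dataclass

K = 1.0  # hypothetical spring constant of the all-pairs toy interaction


@dataclass
class Particle:
    x: float
    v: float
    m: float
    dt_req: float    # hypothetical fixed timestep requirement of this particle
    drifts: int = 0  # how many drift operations touched this particle


def kick(slow, fast, dt):
    """Kick with V_SS + V_SF: slow particles feel everyone, fast ones feel only slow ones."""
    everyone = slow + fast
    # Forces depend on positions only, so velocities can be updated in place.
    for p in slow:
        p.v += dt * sum(-K * (p.x - q.x) for q in everyone if q is not p) / p.m
    for p in fast:
        p.v += dt * sum(-K * (p.x - q.x) for q in slow) / p.m


def evolve(system, dt):
    """Advance `system` by dt; the fast subset is handled by a self-contained recursion."""
    slow = [p for p in system if p.dt_req >= dt]
    fast = [p for p in system if p.dt_req < dt]
    if fast:
        evolve(fast, dt / 2)  # H_F = T_F + V_FF: touches only the fast particles
    kick(slow, fast, dt / 2)
    for p in slow:            # drift with T_S; fast positions are not changed here
        p.x += p.v * dt
        p.drifts += 1
    kick(slow, fast, dt / 2)
    if fast:
        evolve(fast, dt / 2)


def energy(system):
    kin = sum(0.5 * p.m * p.v ** 2 for p in system)
    pot = sum(0.5 * K * (a.x - b.x) ** 2
              for i, a in enumerate(system) for b in system[i + 1:])
    return kin + pot


system = [Particle(0.0, 0.0, 1.0, 0.1),
          Particle(1.0, 0.3, 1.0, 0.1),
          Particle(2.5, -0.2, 1.0, 0.01)]  # one "fast" particle
e0 = energy(system)
for _ in range(100):
    evolve(system, 0.1)
print([p.drifts for p in system])          # [100, 100, 1600]
print(abs(energy(system) - e0) / e0)       # small, bounded relative energy error
```

Each recursion level halves the step, so the fast particle is drifted 16 times per top-level step while the slow ones are drifted once. Every elementary kick or drift is the exact flow of one part of the Hamiltonian, so the composition is symplectic and the energy error stays bounded.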
For Illustris++, we have also improved our modelling of chemical enrichment, both
by using updated yield tables that account for the most recent results of stellar
evolution calculations, and by making the tracking of different chemical elements
more accurate and informative. The most important technical measure to achieve
this has been the introduction of a fiducial “other chemical elements” mass bin, such
that together with the explicit tracking of 9 chemical elements (H, He, C, N, O, Ne,
Mg, Si, Fe), the metal abundance vector accounts for the full mass content of every
cell or star. Since we reconstruct every element individually, the previous code
could arrive at extrapolated flux vectors at cell interfaces in which the 9 explicitly
tracked elements summed to more than the total, leaving a negative contribution
for the other elements. In our new treatment, the density of these other
elements is reconstructed as well, and the abundance pattern is renormalized after
extrapolation, so that flux exchanges always carry physically viable chemical
compositions.
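A toy version of the two bookkeeping variants follows. In the old residual approach, the ‘other’ bin is whatever remains after the nine explicitly reconstructed species are subtracted from the separately reconstructed total density; in the new approach, ‘other’ is reconstructed as a species of its own and the result is renormalized. Linear extrapolation, the clipping of negative values, the zero gradient assumed for the total density and all numbers are hypothetical illustrations, not AREPO's reconstruction code.

```python
ELEMENTS = ["H", "He", "C", "N", "O", "Ne", "Mg", "Si", "Fe", "other"]


def face_fractions(cell_rho, grad_rho, dx):
    """Extrapolate every species (including 'other') to a face, then renormalize.

    cell_rho, grad_rho: per-species partial densities and gradients, ordered as ELEMENTS.
    """
    face = [r + g * dx for r, g in zip(cell_rho, grad_rho)]
    face = [max(f, 0.0) for f in face]  # assumption: clip negative extrapolations
    total = sum(face)
    return [f / total for f in face]    # mass fractions summing to one


def residual_other(cell_rho, grad_rho, total_rho, total_grad, dx):
    """Old-style bookkeeping: 'other' = extrapolated total minus the 9 explicit species."""
    explicit = sum(r + g * dx for r, g in zip(cell_rho[:9], grad_rho[:9]))
    return (total_rho + total_grad * dx) - explicit


# hypothetical cell: H, He, seven metals at 0.002 each, 'other' at 0.006 (total 1.0)
cell_rho = [0.70, 0.28] + [0.002] * 7 + [0.006]
grad_rho = [0.1] + [0.0] * 9  # only the H density has a (limited) gradient
old = residual_other(cell_rho, grad_rho, total_rho=1.0, total_grad=0.0, dx=0.5)
fracs = face_fractions(cell_rho, grad_rho, dx=0.5)
print(old)                    # negative: the unphysical case described above
print(dict(zip(ELEMENTS, fracs)))
```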
The other significant change we made is that we now separately track, with
dedicated chemical tagging fields, the metals produced by asymptotic giant branch
(AGB) stars, type-II supernovae, and type-Ia supernovae. This has not been done
before in such hydrodynamical simulations, and opens up a rich additional set of
analysis possibilities that are largely unexplored thus far. Given that the metallicity
patterns in the circumgalactic medium are emerging as a critical observational
diagnostic of and constraint on the feedback physics, this is a very timely extension
of our modelling capabilities.
For certain problems, the original implementation of AREPO reached only first-
order convergence in the L1 norm. In [27] we have shown that this can be rectified
by simple modifications in the time integration scheme and the spatial gradient
estimates of the code, both acting together to improve the accuracy of the code. As a
result, the new formulation used for Illustris++ is now second-order accurate under
the L1 norm even in unfavorable situations. As a welcome side effect, conservation
of angular momentum is substantially improved, too. We have found that these
changes can significantly improve the results of smooth test problems. On the
other hand, we also showed that cosmological simulations of galaxy formation
are unaffected for well resolved galaxies, demonstrating that the numerical errors
eliminated by the new formulation do not impact these simulations significantly.
Nevertheless, the improved accuracy of the new formulation is clearly to be
preferred, and we expect that small, poorly resolved galaxies are rendered more
accurately in Illustris++ than before, corroborating the advantage of our moving-
mesh technique compared to SPH or AMR codes in this regime.
We have also made important improvements to the ideal magnetohydrodynamics
(MHD) solver in our code [28], primarily in the form of an additional timestep
criterion that controls the size of the Powell source terms applied for divergence
control. Previously, this was not checked explicitly; instead, the timestep of a
cell was determined only by the Courant condition and a kinematic timestep
constraint. It could thus happen under rare conditions that the source terms
applied order-unity corrections to the magnetic field over the course of a timestep,
leading to a relatively large local error. In our new code used for Illustris++, this
is now safely prevented, increasing both the accuracy and robustness of our MHD
solver.
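The following sketch shows how such an extra criterion can enter the timestep choice. It assumes the standard Powell form of the induction source term, -(∇·B)v, and requires that the change this term induces within one step stays below a fraction `eta` of |B|. The function name, the value of `eta` and the exact form of the criterion are hypothetical and need not match the implementation in AREPO.

```python
import math


def norm(vec):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in vec))


def mhd_timestep(dt_courant, dt_kinematic, B, v, div_B, eta=0.1):
    """Cell timestep with an additional (hypothetical) Powell source-term limit.

    The Powell induction source -(div B) v changes B by |div B| |v| dt in one step;
    requiring this to stay below eta * |B| gives dt_powell = eta |B| / (|div B| |v|).
    """
    rate = abs(div_B) * norm(v)
    dt_powell = eta * norm(B) / rate if rate > 0 else math.inf
    return min(dt_courant, dt_kinematic, dt_powell)


B = (1.0, 0.0, 0.0)
v = (10.0, 0.0, 0.0)
print(mhd_timestep(0.05, 0.1, B, v, div_B=0.5))  # Powell limit is the smallest here
print(mhd_timestep(0.05, 0.1, B, v, div_B=0.0))  # falls back to the Courant step
```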
In the AREPO code, we need to carry out, at multiple places, parallel, distributed
tree walks that serve to calculate, e.g., the short-range gravitational field, the
for the application code on the compute nodes. Initially, we could not use more
than 3900 MB per core in large partition runs without occasionally running into
out-of-memory (OOM) errors caused by the memory needs of the I/O subsystem and MPI buffers. How
to work around this reliably required a lot of experimenting and testing on our
After finalizing our modification of the physical model of Illustris++, we
adjusted our plans for the primary science runs in the project by first carrying out
“IllustrisPrime”, a very demanding simulation with 18 billion resolution elements
in a 75 h⁻¹ Mpc box similar to the original Illustris run, but now using the new
full physics model of Illustris++ with all its improvements, as well as including
magnetohydrodynamics (MHD). We also adopted the latest cosmological
parameters as determined by the Planck satellite. This cosmological simulation is the
first that includes MHD and resolves galaxy formation at high resolution, opening
up many possibilities for novel predictions. Also, IllustrisPrime will be ideal to
convincingly demonstrate that we can solve the problems at the bright end of the
galaxy luminosity functions that have troubled all previous simulations in the field,
including our older Illustris simulation that presently defines the state-of-the-art in
this area.
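Box sizes in this project, such as the 75 h⁻¹ Mpc box above, are quoted in comoving h⁻¹ Mpc. A minimal conversion sketch follows; it assumes h = 0.6774, the Planck 2015 value (the text only states that Planck parameters are used), and 1 pc ≈ 3.0857×10¹⁶ m.

```python
PC_IN_M = 3.0857e16  # metres per parsec


def comoving_mpc(size_mpc_over_h, h):
    """Convert a comoving length quoted in h^-1 Mpc into Mpc for a given h."""
    return size_mpc_over_h / h


h = 0.6774  # assumption: Planck 2015 value of the dimensionless Hubble parameter
for size in (35.0, 75.0, 205.0):  # box sizes in h^-1 Mpc
    mpc = comoving_mpc(size, h)
    print(f"{size:g} h^-1 Mpc = {mpc:.1f} Mpc = {mpc * 1e6 * PC_IN_M:.2e} m")
```

For this choice of h, the 75 h⁻¹ Mpc box corresponds to roughly 110.7 Mpc on a side.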
In Table 1, we give an overview of our primary production simulations, omitting
smaller test calculations. We are currently still in the process of finalizing one of
our main production runs using 10,752 cores on HazelHen, while IllustrisPrime
has already finished. In addition, we are carrying out two further large calculations,
which have just been started. They are substantially larger and excel in volume
or in mass resolution, respectively. We have already transferred more than 240 TB of
production data to the Heidelberg Institute for Theoretical Studies, in part by using
the fast GridFTP services offered by HLRS. From the ongoing runs, we expect about
200 TB of additional data, which we will also transfer to Heidelberg for the scientific
analysis in the coming years.
Table 1 Overview of primary production runs carried out by the Illustris++ project. All simu-
lations use PLANCK cosmological parameters and are carried out with a tracer particle method
that is faithful with respect to the mass flux between all baryonic components of the system. We
typically use two Monte Carlo tracers per Voronoi cell, i.e. N_tracers = 2 N_cells. All simulations
follow more than 13 billion years of cosmic evolution, with smallest timesteps of order a few
thousand years

Symbolic name   Box size        N_dm    N_cells  MPI ranks  Physics                   Run status
L75n1820TNG     75 h⁻¹ Mpc      1820³   1820³    10,752     Final full physics model  Advanced
L75n1820MF      75 h⁻¹ Mpc      1820³   1820³    12,000     Alternative AGN feedback  Finished
L75n910TNG      75 h⁻¹ Mpc      910³    910³     2688       Final full physics model  Finished
L75n455TNG      75 h⁻¹ Mpc      455³    455³     336        Final full physics model  Finished
L205n2500TNG    205 h⁻¹ Mpc     2500³   2500³    24,000     Final full physics model  Started
L35n2160TNG     35 h⁻¹ Mpc      2160³   2160³    16,320     Final full physics model  Started
L25n512TNG      25 h⁻¹ Mpc      512³    512³     1200       Final full physics model  Finished
L12.5n512TNG    12.5 h⁻¹ Mpc    512³    512³     1200       Final full physics model  Finished
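Monte Carlo tracers of the kind mentioned in the caption of Table 1 are commonly moved probabilistically: when a mass Δm flows out of a cell of mass m, each of its tracers follows the flow with probability Δm/m, so that on average the tracers reproduce the mass flux. The sketch below illustrates only this basic rule under assumptions of our own (a single outflow per cell, a hypothetical 30% outflow and a fixed random seed); the production scheme has to handle all faces and all baryonic components.

```python
import random


def exchange_tracers(tracers, cell_mass, mass_out, rng):
    """Move each tracer out of a cell with probability mass_out / cell_mass.

    On average the fraction of tracers that leave equals the fraction of mass that
    leaves, which is what makes such tracers follow the mass flux faithfully.
    """
    p = mass_out / cell_mass
    stay, leave = [], []
    for t in tracers:
        (leave if rng.random() < p else stay).append(t)
    return stay, leave


rng = random.Random(42)  # fixed seed for reproducibility
n_cells = 10_000
n_moved = 0
for _ in range(n_cells):
    # two tracers per cell; hypothetical outflow of 30% of the cell mass
    _, leave = exchange_tracers(["t0", "t1"], cell_mass=1.0, mass_out=0.3, rng=rng)
    n_moved += len(leave)
print(n_moved / (2 * n_cells))  # close to 0.3
```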
Fig. 1 Thin projections through the L75n1820MF simulation, showing the gas density field, the
metallicity, the magnetic field strength, the dark matter density, and the stellar density
Fig. 2 Stellar mass distributions of two disk galaxies in halos of mass 8.3×10¹¹ M⊙ (left) and
2.0×10¹² M⊙ (right), shown in face-on (top) and edge-on (bottom) projections. The stellar
colors are assigned according to their age and metallicity
cosmic rays in the Milky Way, and for analyzing the deflections of ultra-high energy
cosmic rays of extra-galactic origin.
In Fig. 4, we show an analysis of the typical magnetic field strengths reached in
halos of different size. We here plot radial profiles for four different halo masses,
stacking up to 50 halos in a narrow mass range around the virial masses 10¹⁰,
10¹¹, 10¹², and 10¹³ M⊙. We see that field strengths of several μG are reached
in the centres of galaxies in halos of masses 10¹¹ to 10¹² M⊙, in good agreement
with typical observed fields. In smaller halos, the fields are still notably weaker,
presumably because here they have not yet been amplified as efficiently. In larger
halos, of 10¹³ M⊙ and beyond, they are also weaker in the centres, but for a different
reason: here some of the magnetic flux is expelled by strong nuclear outflows
driven by AGN feedback. In any case, the strength of the simulated fields implies a
remarkable amplification relative to the primordial fields that we seeded in the initial
conditions. This initial field strength is empirically largely unconstrained, but the
field strengths we obtain in galactic centres do not depend on the seed value we used
(10⁻¹¹ Gauss in our case) over a wide dynamic range, because the magnetic
amplification stops once the dynamo processes responsible for the exponential
amplification saturate. This happens when the magnetic pressure becomes
comparable to the thermal pressure.
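The saturation condition can be turned into an order-of-magnitude estimate of the field strength: setting the magnetic pressure B²/(8π) (Gaussian units) equal to the thermal pressure n k_B T gives B = (8π n k_B T)^{1/2}. The density and temperature below are illustrative values for warm interstellar gas, not simulation output, and n is taken as the total particle number density.

```python
import math

K_B = 1.380649e-16  # Boltzmann constant in erg/K (CGS)


def equipartition_field(n_cm3, T_K):
    """Field in Gauss whose magnetic pressure B^2/(8 pi) equals the thermal pressure n k_B T."""
    return math.sqrt(8.0 * math.pi * n_cm3 * K_B * T_K)


B = equipartition_field(1.0, 1.0e4)  # hypothetical: n = 1 cm^-3, T = 10^4 K
print(f"{B * 1e6:.1f} microGauss")   # prints 5.9 microGauss
```

For these inputs the estimate is of the same order as the several-μG central field strengths discussed above; it is a rough scale, not a prediction of the simulation.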
Fig. 3 Magnetic field structure in a disk galaxy (the one displayed in the left-hand panel of Fig. 2),
overlaid on a rendering of the gas density structure (color scale in the background). The length
of the drawn vectors depends only weakly on the magnetic field strength (∝ |B|^{1/4}), so that
more of the field structure is visible in the regions with weaker fields
On large scales, however, the magnetic field strengths reached in voids still reflect
the initial field. This is clearly seen in Fig. 5, which gives a phase-space diagram of
gas density versus magnetic field strength. The correlation B ∝ ρ^{2/3} (indicated as
a solid line) reflects adiabatic expansion/compression of the initial field set at the
starting redshift z = 127. At baryonic overdensities of around 100, we see that
much larger fields are created. This is in part due to the amplification of the field
through large-scale shearing flows and in part due to a small-scale dynamo driven
by star formation and galactic wind feedback on small scales.
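The B ∝ ρ^{2/3} scaling follows from flux freezing in ideal MHD. For a fluid element of linear size L that is compressed or expanded isotropically, the magnetic flux through it is conserved while its density scales inversely with its volume:

```latex
\Phi \propto B\,L^{2} = \text{const}, \qquad \rho \propto L^{-3}
\quad\Longrightarrow\quad
B \propto L^{-2} \propto \rho^{2/3},
\qquad\text{i.e.}\qquad
B = B_{0}\left(\frac{\rho}{\rho_{0}}\right)^{2/3},
```

where B₀ and ρ₀ denote the field and density of the element at the starting redshift. Gas lying well above this relation has therefore been amplified by processes other than pure compression.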
In sum, our calculations demonstrate that even an extremely weak magnetic
field left behind by the Big Bang is sufficient to explain the field strengths observed
today, which are orders of magnitude larger. Interestingly, the magnetic field strengths
found in the simulation agree very well with the values measured for the Milky Way
and neighboring galaxies. This is remarkable given that there are no free parameters
influencing the magnetic field amplification that could be tuned to modify the final
field strength reached in our simulated galaxies.
[Fig. 4 panels: log(M) = 10.0, 11.0, 12.0, 13.0; vertical axis B [μG] from 0.01 to 10]
Fig. 4 Spherically averaged profiles of the mean magnetic field strength in halos of different mass.
Each panel shows a stacked set of up to 50 halos in a narrow mass bin around a different virial mass,
as labeled in each panel
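The stacking procedure described in the caption can be sketched as follows: compute a profile of the mean |B| in radial shells for each halo, in units of a halo radius (R_200 is assumed here), and then average the profiles of all halos in a mass bin shell by shell. The cell lists, radii and shell edges are hypothetical, and the unweighted cell average is a simplification; a volume-weighted average would be closer to a true spherical average.

```python
import math


def radial_profile(cells, center, r_halo, edges):
    """Mean |B| of the cells in each shell edges[i] <= r / r_halo < edges[i+1].

    `cells` is a list of (position, B_vector) pairs; empty shells give NaN.
    """
    n_shells = len(edges) - 1
    sums, counts = [0.0] * n_shells, [0] * n_shells
    for pos, B in cells:
        r = math.dist(pos, center) / r_halo
        for i in range(n_shells):
            if edges[i] <= r < edges[i + 1]:
                sums[i] += math.sqrt(sum(b * b for b in B))
                counts[i] += 1
                break
    return [s / c if c else math.nan for s, c in zip(sums, counts)]


def stack(profiles):
    """Average several halo profiles shell by shell, skipping empty (NaN) shells."""
    stacked = []
    for shell in zip(*profiles):
        vals = [v for v in shell if not math.isnan(v)]
        stacked.append(sum(vals) / len(vals) if vals else math.nan)
    return stacked


edges = [0.0, 0.25, 1.0]  # hypothetical shell edges in units of the halo radius
halo_a = [((0.1, 0.0, 0.0), (2e-6, 0.0, 0.0)), ((0.5, 0.0, 0.0), (0.0, 1e-6, 0.0))]
halo_b = [((10.2, 10.0, 10.0), (0.0, 0.0, 4e-6)), ((11.0, 10.0, 10.0), (3e-6, 0.0, 0.0))]
profiles = [radial_profile(halo_a, (0.0, 0.0, 0.0), 1.0, edges),
            radial_profile(halo_b, (10.0, 10.0, 10.0), 2.0, edges)]
stacked = stack(profiles)  # field strengths in Gauss, about [3e-6, 2e-6]
print(stacked)
```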
Fig. 5 Phase-space diagram of the magnetic field strength versus gas overdensity at z = 0 in one
of our Illustris++ simulations. The line shows the locus corresponding to adiabatic compression or
expansion of the initial field strength
5 Conclusions
[Fig. 6 panels: curves for z = 0.0, 1.0, 2.0, 4.0, 7.0; vertical axis dM/dlog(ρ/⟨ρ⟩), horizontal axis ρ/⟨ρ⟩ from 10⁻² to 10⁸]
Fig. 6 Distribution of the gas phase metal content with respect to baryonic overdensity at different
epochs, as labelled. Each distribution is normalized to the total metal mass in the gas at the
corresponding time
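A distribution of this kind can be computed by histogramming the gas metal mass in bins of log₁₀ of the overdensity and dividing by the total gas metal mass and the bin width, so that the curve integrates to one over log overdensity when all gas falls inside the bins. The cell values below are hypothetical toy numbers.

```python
import math


def metal_distribution(overdensity, metal_mass, log_edges):
    """dM/dlog10(delta) per bin, normalized by the total gas metal mass.

    overdensity: rho / <rho> for each gas cell; metal_mass: metal mass per cell.
    If all cells fall inside the bins, the result integrates to one over log10(delta).
    """
    total = sum(metal_mass)
    hist = [0.0] * (len(log_edges) - 1)
    for delta, m in zip(overdensity, metal_mass):
        x = math.log10(delta)
        for i in range(len(hist)):
            if log_edges[i] <= x < log_edges[i + 1]:
                hist[i] += m
                break
    widths = [b - a for a, b in zip(log_edges, log_edges[1:])]
    return [h / (total * w) for h, w in zip(hist, widths)]


# hypothetical toy cells
dist = metal_distribution([0.1, 10.0, 100.0, 1e5], [1.0, 1.0, 2.0, 4.0],
                          log_edges=[-2, 0, 2, 4, 6, 8])
print(dist)
```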
models further. Our simulation predictions for the gas around galaxies, the so-
called circumgalactic medium (CGM), are particularly timely, as the Cosmic Origins
Spectrograph (COS) on board the Hubble Space Telescope (HST) has provided a
wealth of absorption line data probing this phase. In addition, our simulations allow
us to make novel, testable predictions for the magnetic field strength in different
environments and its correlation with other galaxy properties.
Acknowledgements The authors gratefully acknowledge computer time through the project
GCS-ILLU on Hornet/HazelHen at HLRS. We acknowledge financial support through subproject
EXAMAG of the Priority Programme 1648 ‘SPPEXA’ of the German Research Foundation (DFG)
and from the European Research Council through ERC-StG grant EXAGAL-308037, and we would
like to thank the Klaus Tschira Foundation.
1. Springel, V., White, S.D.M., Jenkins, A., Frenk, C.S., Yoshida, N., Gao, L., Navarro, J.,
Thacker, R., Croton, D., Helly, J., Peacock, J.A., Cole, S., Thomas, P., Couchman, H., Evrard,
A., Colberg, J., Pearce, F.: Nature 435, 629 (2005). doi:10.1038/nature03597
2. Boylan-Kolchin, M., Springel, V., White, S.D.M., Jenkins, A., Lemson, G.: MNRAS 398, 1150
(2009). doi:10.1111/j.1365-2966.2009.15191.x
3. Angulo, R.E., Springel, V., White, S.D.M., Jenkins, A., Baugh, C.M., Frenk, C.S.: MNRAS
426, 2046 (2012). doi:10.1111/j.1365-2966.2012.21830.x
4. Kauffmann, G., Colberg, J.M., Diaferio, A., White, S.D.M.: MNRAS 303, 188 (1999).
5. Cole, S., Lacey, C.G., Baugh, C.M., Frenk, C.S.: MNRAS 319, 168 (2000).
6. Vogelsberger, M., Genel, S., Springel, V., Torrey, P., Sijacki, D., Xu, D., Snyder, G., Bird, S.,
Nelson, D., Hernquist, L.: Nature 509, 177 (2014). doi:10.1038/nature13316
7. Vogelsberger, M., Genel, S., Springel, V., Torrey, P., Sijacki, D., Xu, D., Snyder, G., Nelson,
D., Hernquist, L.: MNRAS 444, 1518 (2014). doi:10.1093/mnras/stu1536
8. Genel, S., Vogelsberger, M., Springel, V., Sijacki, D., Nelson, D., Snyder, G., Rodriguez-
Gomez, V., Torrey, P., Hernquist, L.: MNRAS 445, 175 (2014). doi:10.1093/mnras/stu1654
9. Springel, V.: MNRAS 401, 791 (2010). doi:10.1111/j.1365-2966.2009.15715.x
10. Vogelsberger, M., Sijacki, D., Kereš, D., Springel, V., Hernquist, L.: MNRAS 425, 3024
(2012). doi:10.1111/j.1365-2966.2012.21590.x
11. Sijacki, D., Vogelsberger, M., Kereš, D., Springel, V., Hernquist, L.: MNRAS 424, 2999
(2012). doi:10.1111/j.1365-2966.2012.21466.x
12. Torrey, P., Vogelsberger, M., Sijacki, D., Springel, V., Hernquist, L.: MNRAS 427, 2224
(2012). doi:10.1111/j.1365-2966.2012.22082.x
13. Bauer, A., Springel, V.: MNRAS 423, 2558 (2012). doi:10.1111/j.1365-2966.2012.21058.x
14. Sijacki, D., Springel, V., Di Matteo, T., Hernquist, L.: MNRAS 380, 877 (2007).
15. Dolag, K., Bartelmann, M., Lesch, H.: A&A 348, 351 (1999)
16. Dolag, K., Bartelmann, M., Lesch, H.: A&A 387, 383 (2002). doi:10.1051/0004-
17. Dolag, K., Grasso, D., Springel, V., Tkachev, I.: J. Cosmol. Astropart. Phys. 1, 009 (2005).
18. Donnert, J., Dolag, K., Lesch, H., Müller, E.: MNRAS 392, 1008 (2009). doi:10.1111/j.1365-
19. Bonafede, A., Dolag, K., Stasyszyn, F., Murante, G., Borgani, S.: MNRAS 418, 2234 (2011).
20. Kotarba, H., Lesch, H., Dolag, K., Naab, T., Johansson, P.H., Donnert, J., Stasyszyn, F.A.:
MNRAS 415, 3189 (2011). doi:10.1111/j.1365-2966.2011.18932.x
21. Beck, A.M., Lesch, H., Dolag, K., Kotarba, H., Geng, A., Stasyszyn, F.A.: MNRAS 422, 2152
(2012). doi:10.1111/j.1365-2966.2012.20759.x
22. Marinacci, F., Vogelsberger, M., Mocz, P., Pakmor, R.: MNRAS 453, 3999 (2015).
23. Sijacki, D., Vogelsberger, M., Genel, S., Springel, V., Torrey, P., Snyder, G.F., Nelson, D.,
Hernquist, L.: MNRAS 452, 575 (2015). doi:10.1093/mnras/stv1340
24. Yuan, F., Narayan, R.: ARA&A 52, 529 (2014). doi:10.1146/annurev-astro-082812-141003
25. Weinberger, R., Springel, V., Hernquist, L., Pillepich, A., Marinacci, F., Pakmor, R., Nelson,
D., Genel, S., Vogelsberger, M., Naiman, J., Torrey, P.: ArXiv e-prints (2016)
26. Pelupessy, F.I., Jänes, J., Portegies Zwart, S.: New A 17, 711 (2012).
27. Pakmor, R., Springel, V., Bauer, A., Mocz, P., Munoz, D.J., Ohlmann, S.T., Schaal, K., Zhu,
C.: MNRAS 455, 1134 (2016). doi:10.1093/mnras/stv2380
28. Pakmor, R., Bauer, A., Springel, V.: MNRAS 418, 1392 (2011). doi:10.1111/j.1365-
Hydrangea: Simulating a Representative
Population of Massive Galaxy Clusters
Abstract Galaxy clusters are the most massive bound structures in the Universe,
and contain not only up to several thousand galaxies, but also extended haloes of
dark matter and hot gas. Observations show that galaxies in clusters differ from
those living in more isolated parts of the Universe, but the physics of how clusters
shape their galaxies is at present not well understood. Not only does this constitute
a major gap in our understanding of galaxy formation, but it also limits the use of
galaxy clusters as cosmological probes. In the Hydrangea project, we have created
a suite of 24 simulated galaxy clusters at unprecedented resolution, using a
state-of-the-art galaxy formation model developed for the EAGLE project. Detailed
scientific analysis of the simulation outputs, which has only just begun, is expected
to lead to major new insight into the physics of both galaxy formation in an extreme
environment and the growth of the massive haloes in which cluster galaxies are
1 Introduction
The parsec (pc) is the standard unit of length in astronomy, with $1\,\mathrm{pc} = 3.08 \times 10^{16}\,\mathrm{m}$.
Y.M. Bahé
Max Planck Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85748, Garching, Germany
e-mail: ybahe@mpa-garching.mpg.de
galaxy clusters the most massive gravitationally bound structures in the present-day
Scientifically, galaxy clusters are of interest in contemporary astrophysics for at
least three reasons. The first is that the close proximity of their member galaxies to
each other, as well as the presence of the DM and ICM haloes, constitutes an extreme
environment for galaxy formation, a detailed understanding of which is an integral
part of the wider quest to understand how galaxies and larger-scale structures formed
and evolved in our Universe over the last 13 billion years. Secondly, it has become
clear that galaxy clusters are also one of the most promising probes to investigate the
composition and expansion history of the Universe itself, including the as-yet poorly
understood nature of Dark Energy (DE). Finally, the concentration of vast amounts
of DM in galaxy clusters also makes them interesting astrophysical laboratories that
can help us to better understand the nature of this dominant gravitating constituent
of our Universe.
There is ample observational evidence that galaxies in groups and clusters are
different from “field” galaxies that formed in more isolated regions of the Universe,
such as our Milky Way. Their colours are typically red – indicating a lack of
recent and ongoing star formation – whereas field galaxies tend to be blue due to
recently formed young, massive stars. A second key difference is their morphology:
cluster galaxies are biased towards “early” (elliptical) types, whereas many isolated
galaxies show a pronounced “late-type” morphology with a prominent spiral disk
[2]. A multitude of physical mechanisms has been suggested that could explain
these trends as a result of transformations that occur when galaxies fall into a
group or cluster: these include ram pressure stripping of cold [3] and hot gas [4],
tidal stripping by the group/cluster potential [5] and galaxy-galaxy interactions [6].
However, understanding to what extent, and on which timescales, each of these
processes actually affects galaxies has so far proved elusive [7, 8].
Making progress here is one of the most important goals in extragalactic
astrophysics, not only because groups and clusters harbour a significant fraction
(approximately one third) of all galaxies in the local Universe – so that under-
standing their evolution constitutes a key part of understanding galaxy formation
in general – but also because the usefulness of galaxy clusters as precision probes of
DE and cosmology is compromised by systematic effects that include the influence
of cluster galaxies and the hot gaseous intra-cluster medium (ICM) [9], unless
these effects are well understood and accounted for. Several large-scale surveys
are currently under way, or planned for the near future, to study DE with weak
gravitational lensing measurements of galaxy clusters, including the Dark Energy
Survey (DES), the Large Synoptic Survey Telescope (LSST), and the European
Space Agency’s Euclid mission, so that a better understanding of baryonic processes
in galaxy clusters is urgently required.
Major new observational insight is expected in the near future from a number of
large integral-field unit surveys such as KMOS-Cluster, SAMI and MaNGA, as well
as the eRosita X-ray telescope. However, the non-linear nature of galaxy evolution –
several of the above-mentioned transformation mechanisms will likely amplify each
$M_{200}$ is defined as the mass within $r_{200}$, the radius inside which the mean density equals 200 times
the critical density of the Universe ($\rho_\mathrm{crit}$).
Abbreviation for “Cluster-EAGLE”, which also refers to the sea eagle (Haliaeetus pelagicus) as
the most massive member of the eagle family.
24 Y.M. Bahé, for the C-EAGLE collaboration
2 Simulation Code
Our simulations are run with a heavily modified version of the cosmological
TreePM/SPH code GADGET-3, last described by [17], that was developed,
optimized, and extensively tested for the EAGLE project [12]. The code is fully
MPI-parallelized, with a sophisticated domain decomposition scheme that assigns
to each MPI-task the particles in a large number of disjoint cells; this significantly
improves load-balancing in highly clustered systems such as those we have
Gravitational forces are calculated on large scales by Fourier-transforming, in
parallel, a periodic mesh covering the entire simulation volume (implemented
through the FFTW library). On smaller scales, a Barnes-Hut tree algorithm is
used, together with direct summation on the smallest scales. The code also uses
an additional isolated mesh covering only the high-resolution region of zoom
simulations, which results in an orders-of-magnitude speed-up. Particles are
integrated in time on variable time-steps nested hierarchically on up to 20 levels.
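The division of labour between mesh and tree can be illustrated with the Gaussian force split described for GADGET-2 (Springel 2005), in which the mesh carries the long-range part and the tree the short-range remainder. The sketch below assumes that GADGET-3 uses the same split; the split scale `r_s` and the function names are illustrative and are not taken from the code.

```python
import math

def short_range_fraction(r: float, r_s: float) -> float:
    """Fraction of the Newtonian pair force (1/r^2) handled by the tree.

    Assumes a GADGET-2-style Gaussian split, in which the short-range
    potential of a point mass is -G m erfc(r / (2 r_s)) / r.
    """
    x = r / (2.0 * r_s)
    return math.erfc(x) + (r / (r_s * math.sqrt(math.pi))) * math.exp(-x * x)

def long_range_fraction(r: float, r_s: float) -> float:
    """Remaining fraction, supplied by the periodic FFT mesh."""
    return 1.0 - short_range_fraction(r, r_s)

r_s = 1.0  # split scale in arbitrary length units (illustrative)
for r in (0.01, 1.0, 4.5, 10.0):
    print(f"r = {r:5.2f} r_s  tree: {short_range_fraction(r, r_s):.3e}  "
          f"mesh: {long_range_fraction(r, r_s):.3e}")
```

The short-range fraction decreases monotonically and becomes negligible a few $r_s$ away, which is why a tree walk can be truncated there while the mesh supplies the smooth long-range field.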
Unlike “standard” GADGET-3, our code uses a time-step limiter [18], which
ensures that time-steps are kept short after particles experience significant changes
in their internal energy. This significantly improves the accuracy of the treatment of
feedback [19].
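The hierarchical time-stepping and the limiter can be sketched together. Particles are placed on power-of-two subdivisions of a maximum step, and a limiter then prevents any particle from taking a step much longer than those of its neighbours. The maximum ratio of 4 between neighbouring steps (two levels) is an assumption borrowed from commonly used limiters, and the neighbour lists, function names and numbers are hypothetical; the EAGLE implementation may differ in detail.

```python
import math

MAX_LEVELS = 20  # time-steps nested hierarchically on up to 20 levels

def time_bin(dt_desired, dt_max, max_levels=MAX_LEVELS):
    """Smallest level n with dt_max / 2**n <= dt_desired, capped at the deepest level."""
    if dt_desired >= dt_max:
        return 0
    n = math.ceil(math.log2(dt_max / dt_desired))
    return min(n, max_levels - 1)  # capping is a simplification for this sketch

def apply_limiter(bins, neighbours, max_level_gap=2):
    """Raise bins until no particle is more than max_level_gap levels
    (a factor 2**max_level_gap in time-step) coarser than any neighbour."""
    bins = list(bins)
    changed = True
    while changed:
        changed = False
        for i, nbrs in neighbours.items():
            for j in nbrs:
                required = bins[j] - max_level_gap
                if bins[i] < required:
                    bins[i] = required
                    changed = True
    return bins

dt_max = 1.0
desired = [1.0, 0.5, 1e-3, 0.3]  # particle 2 has just received feedback energy
bins = [time_bin(dt, dt_max) for dt in desired]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # hypothetical neighbour lists
limited = apply_limiter(bins, neighbours)
print(bins, limited)  # prints [0, 1, 10, 2] [6, 8, 10, 8]
```

A particle on level n is advanced with step `dt_max / 2**n`, so after limiting, the particles next to the feedback event take steps at most four times longer than their neighbours' and react promptly to the injected energy.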
Hydrodynamical forces are evaluated with the Smoothed Particle Hydrodynamics
(SPH) approach, which is implemented in the entropy-conserving formulation
[20]. The SPH implementation has been modified significantly for the EAGLE
project through a series of measures collectively referred to as “Anarchy” (Dalla
Vecchia, in prep.), which include the conservative pressure-entropy formalism of
[21], the artificial viscosity switch of [22], an artificial conduction switch similar to
that of [23], and the $C^2$ Wendland kernel [24]. These modifications largely eliminate
the inaccuracies related to contact discontinuities and spurious fragmentation
present in older versions of SPH [19].
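For reference, the $C^2$ Wendland kernel in three dimensions can be written as $W(r,h) = \frac{21}{2\pi h^3}(1-q)^4(1+4q)$ for $q = r/h < 1$ and zero otherwise, where $h$ here denotes the compact support radius (conventions for $h$ differ between codes). The sketch checks the normalisation numerically and evaluates the standard SPH density sum; it does not reproduce the pressure-entropy formulation used in Anarchy, and the function names are illustrative.

```python
import math

def wendland_c2(r, h):
    """3D Wendland C2 kernel with compact support radius h (zero for r >= h)."""
    q = r / h
    if q >= 1.0:
        return 0.0
    return 21.0 / (2.0 * math.pi * h**3) * (1.0 - q) ** 4 * (1.0 + 4.0 * q)

def kernel_normalisation(h, n=20000):
    """Midpoint-rule estimate of the integral of W over all space (should be 1)."""
    dr = h / n
    return sum(
        4.0 * math.pi * ((k + 0.5) * dr) ** 2 * wendland_c2((k + 0.5) * dr, h) * dr
        for k in range(n)
    )

def sph_density(i, positions, masses, h):
    """Standard SPH density estimate rho_i = sum_j m_j W(|x_i - x_j|, h)."""
    return sum(
        m_j * wendland_c2(math.dist(positions[i], x_j), h)
        for x_j, m_j in zip(positions, masses)
    )

print(kernel_normalisation(h=0.7))
```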
The most significant modifications of the GADGET-3 code relate to the
implementation of relevant physical processes on unresolved scales (“sub-grid physics”).
We only summarize these briefly here; for details, the reader is referred to the description
of the “AGNdT9” model in [12]. Gas cooling and chemical enrichment are
Like all C-EAGLE runs, the Hydrangea simulations are based upon a very large, low-
resolution (particle mass $m_\mathrm{DM} \approx 8 \times 10^{10}\,\mathrm{M}_\odot$) dark-matter-only simulation
realised in a cubic box of side length 3200 Mpc (comoving). This simulation
contains more than 300,000 dark matter haloes with a mass in excess of $10^{14}\,\mathrm{M}_\odot$,
including almost 3000 extremely massive objects with $M_{200} > 10^{15}\,\mathrm{M}_\odot$. We
discarded those with a relatively close, more massive neighbour (within $20\,r_{200}$
or 30 Mpc, whichever is larger), and those within 200 Mpc of the simulation
box edge. Out of the remaining haloes, 29 objects in the mass range
$14.0 < \log_{10}(M_{200}/\mathrm{M}_\odot) < 15.25$ and distributed uniformly in $M_{200}$ were selected at
random for re-simulation as our ‘core’ sample. Furthermore, one even more massive
object ($M_{200} \approx 10^{15.4}\,\mathrm{M}_\odot$) was selected for comparison to the most massive
observed clusters.
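The selection criteria above can be expressed as a simple filter. In this sketch, the `Halo` record, the brute-force neighbour search and the reading of “distributed uniformly” as one random pick per equal-width bin in $\log_{10} M_{200}$ are all assumptions made for illustration; the actual C-EAGLE selection pipeline is not described here.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Halo:
    name: str
    pos: tuple    # comoving position (x, y, z) in Mpc
    m200: float   # M200 in solar masses
    r200: float   # r200 in Mpc

def is_isolated(halo, all_halos):
    """No more massive neighbour within max(20 r200, 30 Mpc)."""
    radius = max(20.0 * halo.r200, 30.0)
    return not any(
        other.m200 > halo.m200 and math.dist(halo.pos, other.pos) < radius
        for other in all_halos
        if other is not halo
    )

def far_from_edge(halo, box=3200.0, margin=200.0):
    """At least `margin` Mpc from every face of the box (no periodic wrapping)."""
    return all(margin <= x <= box - margin for x in halo.pos)

def candidates(all_halos, logm_min=14.0, logm_max=15.25):
    return [
        h for h in all_halos
        if logm_min < math.log10(h.m200) < logm_max
        and far_from_edge(h)
        and is_isolated(h, all_halos)
    ]

def pick_uniform_in_log_mass(cands, n, logm_min=14.0, logm_max=15.25, seed=1):
    """Pick one random halo from each of n equal-width bins in log10 M200."""
    rng = random.Random(seed)
    width = (logm_max - logm_min) / n
    chosen = []
    for b in range(n):
        lo, hi = logm_min + b * width, logm_min + (b + 1) * width
        in_bin = [h for h in cands if lo <= math.log10(h.m200) < hi]
        if in_bin:
            chosen.append(rng.choice(in_bin))
    return chosen

halos = [  # hypothetical haloes
    Halo("A", (1000.0, 1000.0, 1000.0), 10**14.5, 1.5),
    Halo("B", (1010.0, 1000.0, 1000.0), 10**15.0, 2.0),  # more massive, 10 Mpc from A
    Halo("C", (100.0, 1000.0, 1000.0), 10**14.2, 1.2),   # too close to the box edge
    Halo("D", (2000.0, 2000.0, 2000.0), 10**14.1, 1.1),
]
print([h.name for h in candidates(halos)])  # prints ['B', 'D']
```

Halo A is rejected because a more massive halo lies within 30 Mpc, and C because it sits within 200 Mpc of the box edge.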
For each object selected for re-simulation, high-resolution zoomed initial
conditions (ICs) were generated with the IC_2LPT_GEN code [33], using second-order
Lagrangian perturbation theory. The dark matter particle mass $m_\mathrm{DM} = 9.7 \times 10^6\,\mathrm{M}_\odot$
in the high-resolution region of these ICs is almost 10,000 times smaller than in the
parent simulation; the mass of baryon (star and gas) particles is lower still by a factor
of $\Omega_m/\Omega_b = 6.36$, the inverse of the cosmic baryon fraction $f_b \equiv \Omega_b/\Omega_m$, so that the simulation contains
(initially) equal numbers of DM and baryon particles.⁴ In order to correctly model
the tidal forces acting on the high-resolution region, the remaining volume of the
3200 Mpc simulation box is filled with a relatively small number of very massive
‘boundary’ particles.
⁴ During the course of a simulation, some baryon particles are ‘swallowed’ by black holes, so that
the final number of baryon particles is typically slightly lower.

To test the quality of the high-resolution ICs, each simulation was first run in
N-body-only mode, i.e. without hydrodynamics or subgrid models. The motivation
for doing this is twofold: first, such a simulation incurs only a small fraction
of the computational cost of a full hydrodynamical run and therefore constitutes
an economical way to test the quality of the high-resolution ICs by comparing
the masses of the simulated cluster haloes in the high-resolution run to their low-
resolution counterparts in the parent simulation. Second, a lot of insight
into the effect of baryonic physics can be gained from comparing dark-matter-only
and hydrodynamical simulations started from the same ICs, as any differences can
be ascribed solely to the presence of gas in the latter [9]. We have verified that for
all objects in our sample, the final cluster masses in the high- and low-resolution
runs are virtually identical (differing by less than 10 %), and that the nearest low-resolution
‘boundary’ particle lies more than $12\,r_{200}$ from the cluster centre at $z = 0$.
However, simulating all 30 haloes with full hydrodynamics in a high-resolution
region of $10\,r_{200}$ would have incurred a very high computational cost. To accommodate
the project within the constraints of available resources, we therefore selected a
subsample of 24 objects for the Hydrangea runs, including the very massive cluster
with $M_{200} \approx 10^{15.4}\,\mathrm{M}_\odot$ but otherwise biased towards lower-mass haloes, which
are individually cheaper to simulate but also contain fewer galaxies
and therefore benefit especially from an enlarged sample size. Hydrodynamical
simulations of the remaining six clusters, which are not part of Hydrangea, were
performed only out to $5\,r_{200}$. Eleven Hydrangea haloes have masses $M_{200}$ in the
range $10^{14.0}$ to $10^{14.5}\,\mathrm{M}_\odot$, eight between $10^{14.5}$ and $10^{15.0}\,\mathrm{M}_\odot$, and five between
$10^{15.0}$ and $10^{15.5}\,\mathrm{M}_\odot$.
Table 1 The three largest simulations performed at HLRS. Note that the wallclock time includes
queueing between individual jobs.

Halo ID   Mass [$10^{15}\,\mathrm{M}_\odot$]   N_core   Wallclock time [days]   CPU time [$10^6$ CPU-hr]
22        1.05                       2048     187                     4.8
28        1.70                       2048     164                     4.2
40        2.19                       4096     281                     10.3
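Because the quoted wallclock times include queueing, the ratio of CPU time to N_core × wallclock time gives a rough idea of how much of each campaign was spent actually running. This assumes that the CPU-hour figures count core-hours of running jobs only; that is an assumption about the accounting, not something stated in Table 1.

```python
# (N_core, wallclock days, CPU hours) taken from Table 1
RUNS = {22: (2048, 187, 4.8e6), 28: (2048, 164, 4.2e6), 40: (4096, 281, 10.3e6)}

def busy_fraction(n_core, wallclock_days, cpu_hours):
    """CPU hours divided by the core-hours available over the full wallclock period."""
    return cpu_hours / (n_core * wallclock_days * 24.0)

for halo_id, (n_core, days, cpu) in RUNS.items():
    print(f"halo {halo_id}: {busy_fraction(n_core, days, cpu):.2f}")
```

Under that assumption, the two 2048-core runs were executing for roughly half of their elapsed time (0.52) and the 4096-core run for a little over a third (0.37).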
200 nodes on Hornet/HazelHen. Most of the SUBFIND analysis for the massive
clusters has therefore been performed on Hornet/HazelHen as well.
Production runs at HLRS began on 16 June 2015, following a brief period of
testing our code on the Cray XC40 system after access was granted on 3 June.
The longest-running simulation performed at HLRS, of the cluster with
$M_{200} \approx 10^{15.4}\,\mathrm{M}_\odot$, finished on 29 April 2016. The post-processing of simulation outputs
with SUBFIND was completed on 12 May 2016, thus concluding our calculations
at HLRS slightly ahead of the scheduled project completion date (15 May).
For each simulation, 30 full snapshots were stored between redshift $z = 14$ and
$z = 0$, with a constant gap of 500 Myr between them. In addition, we saved a
larger number of ‘snipshots’ that only contain the most essential and most rapidly
time-varying quantities calculated by the simulation, such as particle positions,
velocities, and SPH-interpolated density. For each simulation, we have stored at
least 178 such snipshots, resulting in a maximum time between any two outputs
of 125 Myr. In total, the simulations run at HLRS have so far produced 350 TB of
data, which has been continuously transferred to the Virgo Data Archive at the Max
Planck Computing and Data Facility (MPCDF) in Garching for scientific analysis.
Table 1 lists the cluster mass, run time, and number of cores used for the three
most demanding simulations performed at HLRS. The entire Hydrangea project
has produced more than 500 TB of raw data, and simulated the formation of more
than 20,000 galaxies with stellar mass $M_\mathrm{star} \geq 10^9\,\mathrm{M}_\odot$.
4 Simulation Results
Fig. 1 Visualisation of halo 28 at redshift $z = 0.0$, one of the most massive Hydrangea clusters
simulated at HLRS. The left column shows, from top to bottom, the projected density of dark
matter, gas, and stars in a cubic box of 40 Mpc side length, centered on the cluster. The nominal
edge of the high-resolution region, at a distance of $10\,r_{200} = 25$ Mpc, is indicated with the dotted
yellow lines visible in the corners of the top-left panel. In the right column, we show the projected
mass-weighted metallicity of the gas, its temperature, and (bulk) velocity. In qualitative agreement
with observations, the central cluster is filled with very hot, metal-enriched gas that shows a
complex dynamical structure. Each point in the stellar density map (bottom left) represents a
simulated galaxy
Fig. 3 Fraction of simulated galaxies within $r_{200,\mathrm{m}}$ that are passive (i.e. have a specific star
formation rate sSFR $< 10^{-11}\,\mathrm{yr}^{-1}$). Differently coloured bands denote different ranges of halo
mass, as indicated in the bottom right corner. In approximate agreement with observations, cluster
galaxies have a much higher passive fraction than field galaxies, the difference being greatest in
the most massive clusters (green)
clusters increases with both stellar mass and the mass of the cluster [38].⁵ As Fig. 3
shows, our simulations reproduce the latter observation well, especially for galaxies
at the lower end of the mass range considered, $M_\mathrm{star} \lesssim 10^{10}\,\mathrm{M}_\odot$.
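The passive fraction shown in Fig. 3 amounts to counting galaxies below the sSFR threshold in each halo-mass bin. The sketch below assumes a flat list of (host $\log_{10} M_{200}$, SFR, stellar mass) tuples with hypothetical values; the selection of galaxies within $r_{200,\mathrm{m}}$ is not modelled, and the function names are illustrative.

```python
import math
from bisect import bisect_right

SSFR_PASSIVE = 1e-11  # yr^-1, threshold quoted in the Fig. 3 caption

def passive_fraction(galaxies):
    """galaxies: list of (sfr [Msun/yr], mstar [Msun]); NaN if the list is empty."""
    if not galaxies:
        return math.nan
    n_passive = sum(1 for sfr, mstar in galaxies if sfr / mstar < SSFR_PASSIVE)
    return n_passive / len(galaxies)

def passive_fraction_by_halo_mass(galaxies, edges):
    """galaxies: list of (log10 M200 of host, sfr, mstar); edges: log10 M200 bin edges."""
    groups = [[] for _ in range(len(edges) - 1)]
    for log_m200, sfr, mstar in galaxies:
        k = bisect_right(edges, log_m200) - 1
        if 0 <= k < len(groups):
            groups[k].append((sfr, mstar))
    return {(edges[k], edges[k + 1]): passive_fraction(g) for k, g in enumerate(groups)}

sample = [  # hypothetical galaxies: (log10 M200, SFR, Mstar)
    (14.2, 0.0, 1e10), (14.2, 1.0, 1e10),
    (15.2, 0.0, 1e10), (15.2, 0.01, 1e10),
]
print(passive_fraction_by_halo_mass(sample, [14.0, 14.5, 15.0, 15.5]))
```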
The full scientific exploitation of the rich dataset produced by the Hydrangea
simulations and related C-EAGLE projects has only just begun, and is expected to
take several years. This analysis includes quantitative comparisons of the simulated
galaxies and the intra-cluster medium to observational data (e.g. [35, 38–40]), as
well as projects aiming to obtain a detailed understanding of the physical processes
operating in galaxy clusters that lead to e.g. the lack of star formation in cluster
galaxies, the change in galaxy morphology from disk-dominated to elliptical, and
the formation of structures in the ICM. The results of these studies will be reported
in the astrophysical literature.
5 Summary
We have produced the Hydrangea simulations of two dozen massive galaxy clusters,
a ground-breaking new tool to study the formation of galaxies in the most extreme
environment in our Universe. The simulations use a well-tested code developed
for the EAGLE project and were in large part run on the HLRS Cray XC40
Hornet/HazelHen system, each using up to 4096 cores. Memory is a key limiting
factor in our simulations, which require up to 10 TB of RAM, making the high-
memory machine at HLRS an ideal system to run them. All simulations have been
completed within the allocated time frame, and the same is true for cataloguing of
outputs with the SUBFIND code. We are now beginning the scientific analysis of
the simulation data, which is expected to lead to more than a dozen publications
over the coming years.

⁵ Given that the age of the Universe is approximately $10^{10}$ yr, such galaxies must have formed stars
at a much higher rate in the past in order to build up their current stellar mass.
1 Introduction
Table 1 Photoionization cross section calculations: timings for the $J = 1$ scattering symmetry
of W$^{2+}$ ions. The scattering model used included 392 states, 1728 coupled channels, and 800,000
energy points. The performance of the R-matrix outer-region module PSTGBF0DAMP on Hazel Hen,
the Cray XC40 at HLRS, is presented for different numbers of cores.

R-matrix module   Number of runs   Speed-up (factor)   Cray XC40 (number of cores)   Total core time (minutes)
PSTGBF0DAMP       1                1.00                1000                          451.525
PSTGBF0DAMP       1                2.01                2000                          224.588
PSTGBF0DAMP       1                3.93                4000                          114.866
PSTGBF0DAMP       1                5.82                8000                          77.523
PSTGBF0DAMP       1                9.77                10,000                        46.193
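Parallel efficiency follows from Table 1 as $E(N) = S(N) \times 1000/N$, where $S$ is the speed-up relative to the 1000-core run. Recomputing $S$ as the ratio of the tabulated times reproduces the speed-up column, which suggests that the time column behaves as an elapsed (wall-clock) time; that reading is our inference, not a statement from the table.

```python
CORES = [1000, 2000, 4000, 8000, 10000]
MINUTES = [451.525, 224.588, 114.866, 77.523, 46.193]  # time column of Table 1

def speedup_and_efficiency(cores, times):
    """Speed-up S(N) = T(N0)/T(N) and efficiency E(N) = S(N) * N0 / N, with N0 = cores[0]."""
    n0, t0 = cores[0], times[0]
    return [(n, t0 / t, (t0 / t) * n0 / n) for n, t in zip(cores, times)]

for n, s, e in speedup_and_efficiency(CORES, MINUTES):
    print(f"{n:6d} cores  speed-up {s:5.2f}  efficiency {e:4.2f}")
```

On these numbers the module scales almost ideally up to 4000 cores and again at 10,000 cores, while the 8000-core run is an outlier at roughly 73 % efficiency.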
such as CLOUDY, CHIANTI, AtomDB and XSTAR, which are necessary for interpreting
experimental and satellite observations of astrophysical objects, as well as for fusion and plasma
modeling for JET and ITER.
The launch of the satellite Astro-H (renamed Hitomi) on February 17, 2016, was
expected to provide x-ray spectra of unprecedented quality and would have required
a wealth of atomic and molecular data on a range of collision processes to assist
with the analysis of spectra from a variety of astrophysical objects. The subsequent
break-up of Hitomi 40 days later, on March 28, 2016, leaves a void in observational
x-ray spectroscopy. Measurements of cross sections for photoionization of atoms and
ions are essential data for testing theoretical methods in fundamental atomic physics
and for modeling many physical systems, for example, terrestrial plasmas, the
upper atmosphere, and a broad range of astrophysical objects (quasi-stellar objects,
the atmospheres of hot stars, proto-planetary nebulae, H II regions, novae, and
supernovae) [12, 13].
36 B.M. McLaughlin et al.
Limited wavelength observations for x-ray transitions were recently made on
atomic oxygen, neon, magnesium and their ions with the High Energy Transmission
Grating (HETG) on board the Chandra satellite [14]. Strong absorption K-shell
lines of atomic oxygen, in its various forms of ionization, have been observed by
the XMM-Newton satellite in the interstellar medium, through x-ray spectroscopy of
low-mass x-ray binaries [15]. The Chandra and XMM-Newton satellite observations
may be used to identify absorption features in astrophysical sources, such as
active galactic nuclei (AGN) and x-ray binaries, and to assist in benchmarking
theoretical and experimental work [16–21].
Absolute cross sections for the K-shell photoionization of Be-like (O$^{4+}$) and Li-
like (O$^{5+}$) atomic oxygen ions were measured (in their respective K-shell regions)
by employing the ion-photon merged-beam technique at the SOLEIL synchrotron-
radiation facility in Saint-Aubin, France. High-resolution spectroscopy with
$E/\Delta E \approx 4000$ (140 meV, FWHM) was achieved with photon energies from 550 eV up to
675 eV. The rich resonance structure observed in the experimental spectra is analyzed
using the R-matrix with pseudostates (RMPS) method.
Detailed spectra for Be-like [O$^{4+}$] and Li-like [O$^{5+}$] atomic oxygen ions in
the vicinity of the K-edge were measured. This work is the culmination of
photoionization cross section measurements on the atomic oxygen isonuclear sequence.
Previous studies on this sequence focused on obtaining photoionization cross
sections for the O$^{+}$ and O$^{2+}$ ions [17] and the O$^{3+}$ ion [16], where differences
of 0.5 eV in the positions of the K$\alpha$ resonance lines relative to prior satellite observations
were found. This will have major implications for astrophysical modeling.
Figure 1 shows the spectra for Be-like atomic oxygen in the region of the
strong $1s \rightarrow 2p$ resonance. To compare directly with the SOLEIL measurements,
the theoretical R-matrix cross sections have been convolved with a Gaussian profile
of 220 meV width at FWHM. For O$^{4+}$, as the $1s^2 2s2p\ {}^3P^o$ metastable state is present
in the ion beam, an admixture of 70 % of the ground-state and 30 % of the
metastable-state cross sections appears to simulate experiment
suitably well. The theoretical cross section results presented in Fig. 1 indicate
excellent agreement with the SOLEIL experimental measurements. Similarly, in
Fig. 2, the SOLEIL spectra for Li-like atomic oxygen in the region of the strong
$1s \rightarrow 2p$ resonance are illustrated. To compare with the SOLEIL measurements,
the theoretical cross sections have been convolved with a Gaussian profile of 350 meV
width at FWHM. We note that for both ions, the theoretical results from
the R-matrix with pseudostates (RMPS) method show suitable agreement with the
SOLEIL measurements [22].
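The comparison procedure described above, mixing ground- and metastable-state cross sections and then convolving with a Gaussian of given FWHM, can be sketched on a uniform energy grid. The function names are hypothetical, the grid is assumed uniform, and contributions beyond the grid edges are simply dropped; this is not the authors' post-processing code.

```python
import math

def fwhm_to_sigma(fwhm):
    """Gaussian standard deviation corresponding to a given FWHM."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def admixture(sigma_ground, sigma_meta, f_ground=0.7):
    """Population-weighted mix of ground- and metastable-state cross sections."""
    return [f_ground * g + (1.0 - f_ground) * m for g, m in zip(sigma_ground, sigma_meta)]

def gaussian_convolve(energies, cross_section, fwhm):
    """Convolve a cross section sampled on a uniform energy grid with a unit-area Gaussian."""
    sigma = fwhm_to_sigma(fwhm)
    de = energies[1] - energies[0]
    half = int(math.ceil(5.0 * sigma / de))   # truncate the kernel at 5 sigma
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k * de / sigma) ** 2) for k in offsets]
    norm = sum(weights)
    weights = [w / norm for w in weights]      # discrete weights sum to 1
    n = len(cross_section)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:                     # points beyond the grid are dropped
                acc += w * cross_section[j]
        out.append(acc)
    return out

# Hypothetical example: a narrow line at 560 eV on a 10 meV grid, broadened to 220 meV FWHM.
energies = [550.0 + 0.01 * i for i in range(2001)]
line = [0.0] * len(energies)
line[1000] = 100.0  # unit area on a 0.01 eV grid
broadened = gaussian_convolve(energies, line, fwhm=0.220)
```

In a real comparison, `admixture` would be applied to the level-resolved theoretical cross sections first and `gaussian_convolve` to the result, with the FWHM set to 220 meV or 350 meV as quoted in the text.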
Computations in Support of Experiments and Astrophysical Applications 37
[Figs. 1 to 3 (plots): photoionization cross section (Mb) versus photon energy (eV). Legend residue: O$^{4+}$ theory weighted 70 % ($^1S$) and 30 % ($^3P^o$), $\Delta E = 220$ meV; $1s2s(^3S)2p$ and $1s2s(^1S)2p$ resonances with Breit-Pauli theory; $\Delta E = 140$ meV for the 250 to 270 eV range.]
Fig. 3 Photoionization cross sections (Mb) as a function of the photon energy (eV) in the Ar$^+$
L-shell region between 250 and 270 eV. The (blue) circles are the experimental measurements
from SOLEIL taken at a band pass of 140 meV at FWHM. The dashed (red) line shows the MCDF
theoretical results and the solid (black) line the DARC (model DARC3) results. The theoretical
results were statistically weighted for the initial ground state and convolved with a Gaussian profile
of 140 meV width at FWHM [29]
270 eV. Comparisons are made between the experimental results from SOLEIL, and
theoretical work, MCDF and DARC. In order to match the SOLEIL experimental
spectrum an energy shift of 7.5 eV to the DARC calculations was necessary [29].
[Fig. 4 (plot): cross section (Mb) versus photon energy (20 to 90 eV) for W$^{2+}$. Legend: 392cc DARC $^5D$ term average, shifted by −1.4 eV; energy scan at 100 meV resolution; absolute measurements; Cowan thresholds; NIST thresholds of $^5D_J$ levels; NIST term-averaged threshold.]
Fig. 4 Photoionization of W$^{2+}$ ions measured at an energy resolution of 100 meV. Energy-scan
measurements (small circles with statistical error bars) were normalized to absolute cross-section
data represented by large circles with total error bars. The black vertical bars at energies below
26 eV represent ionization thresholds of all $5d^4$, $5d^3 6s$, and $5d^2 6s^2$ levels with excitation energies
lower than the excitation energy of the lowest level ($^5G_2$) within the $5d^3 6p$ configuration. These
thresholds were calculated by using the Cowan code [45] as implemented by Fontes and
co-workers [46] and were shifted by about 0.5 eV to match the ground-level ionization threshold
from the NIST tables [47]. The (brown) vertical bars between 25 and 26 eV indicate the NIST
ionization potentials of the levels within the $5d^4\ {}^5D$ ground term. The lowest (green) vertical
bar, which matches the cross-section onset, shows the NIST ground-term-averaged ionization
potential. The solid (red) line with (light red) shading represents the result of the present
392-level DARC calculation (125 $\mu$eV step size) of the ground-term-averaged photoionization cross
section, convolved with a Gaussian of 100 meV width. The theoretical cross sections are shifted
by −1.4 eV to match experiment [43]
of the investigated tungsten ions with levels $5s^2 5p^6 5d^4\ {}^5D_J$ ($J = 0, 1, 2, 3, 4$) for
W$^{2+}$ and $5s^2 5p^6 5d^3\ {}^4F_{J'}$ ($J' = 3/2, 5/2, 7/2, 9/2$) for W$^{3+}$. As illustrated in Fig. 4
for W$^{2+}$ ions, suitable agreement is achieved below 60 eV, but at higher energies
there is a difference of approximately a factor of two between experiment and theory.
In Fig. 5, assuming a statistically weighted distribution of ions in the initial ground-term
levels, good agreement between theory and experiment for W$^{3+}$ ions is achieved
over the energy range investigated [43].
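A statistically weighted distribution over the fine-structure levels of a term assigns each level $J$ the weight $(2J+1)/\sum_J (2J+1)$. The sketch below computes these weights for the $^5D_J$ term of W$^{2+}$ and the $^4F_{J'}$ term of W$^{3+}$ and forms a term-averaged cross section; the function names are illustrative.

```python
from fractions import Fraction

def statistical_weights(j_values):
    """Weights (2J+1) / sum(2J+1) for the fine-structure levels of one term."""
    g = [2 * Fraction(j) + 1 for j in j_values]
    total = sum(g)
    return [gi / total for gi in g]

def term_averaged(level_cross_sections, j_values):
    """Statistically weighted average of level-resolved cross sections on a common energy grid."""
    w = [float(x) for x in statistical_weights(j_values)]
    n_points = len(level_cross_sections[0])
    return [sum(wi * sig[k] for wi, sig in zip(w, level_cross_sections)) for k in range(n_points)]

w_W2 = statistical_weights([0, 1, 2, 3, 4])              # 5D_J term of W2+
w_W3 = statistical_weights(["3/2", "5/2", "7/2", "9/2"])  # 4F_J' term of W3+
print([str(x) for x in w_W2], [str(x) for x in w_W3])
# prints ['1/25', '3/25', '1/5', '7/25', '9/25'] ['1/7', '3/14', '2/7', '5/14']
```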
[Fig. 5 (plot): cross section (Mb) versus photon energy (30 to 90 eV) for W$^{3+}$. Legend: 173cc DARC $^4F$ term average, shifted by −2.0 eV; 379cc DARC $^4F$ term average, not shifted.]
Fig. 5 Comparison of the measured photoionization cross section of W$^{3+}$ with the present
173-level DARC calculation (87 $\mu$eV step size; thin red line with shading) and the present 379-level
DARC result (109 $\mu$eV step size; solid blue line without shading). The theory curves were obtained
by convolution of the original spectra with a Gaussian of 100 meV width. Only the 173-level
calculations are shifted down in energy by 2.0 eV so that the steep rise of the experimental cross
section function at about 40 eV is matched
5 Photodissociation: SH$^+$
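Photodissociation cross sections for the SH$^+$ radical are computed from all
rovibrational (RV) levels of the ground electronic state $X\,{}^3\Sigma^-$ for wavelengths
from threshold to 500 Å. The five electronic transitions, $2\,{}^3\Sigma^- \leftarrow X\,{}^3\Sigma^-$,
$3\,{}^3\Sigma^- \leftarrow X\,{}^3\Sigma^-$, $A\,{}^3\Pi \leftarrow X\,{}^3\Sigma^-$, $2\,{}^3\Pi \leftarrow X\,{}^3\Sigma^-$, and $3\,{}^3\Pi \leftarrow X\,{}^3\Sigma^-$,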
[Fig. 6 (plot): double photoionization of He($1s2p\ {}^3P^o$); cross section versus photon energy (50 to 100 eV).]
Fig. 6 Total cross sections (kbarns) as a function of photon energy using the time-dependent
close-coupling (TDCC) method. Results are shown for the initial individual fine-structure states
of He($1s2p\ {}^3P^o_J$), where $J = 0$, 1 and 2 [52]
[Fig. 7 (plot): double photoionization of He($1s2p\ {}^3P^o$) at 70.0 eV; differential cross section (kbarns/eV) versus ejected energy (0 to 14 eV).]
Fig. 7 Differential cross sections (kilobarns/eV) as a function of the ejected electron energy in eV
using the time-dependent close-coupling (TDCC) method at a photon energy of 70 eV. Results are
shown for the initial individual fine-structure states of He($1s2p\ {}^3P^o_J$), where $J = 0$, 1 and 2 [52]
Fig. 8 (a) Relative electronic energies (eV) for the SH$^+$ molecular cation as a function of bond
separation at the MRCI+Q level of approximation with an AV6Z basis. Energies are relative to
the ground state near equilibrium ($2.6\,a_0$). The states shown are those connected by the
$X\,{}^3\Sigma^- \rightarrow 2\,{}^3\Sigma^-$, $3\,{}^3\Sigma^-$, $A\,{}^3\Pi$, $2\,{}^3\Pi$, $3\,{}^3\Pi$ transitions involved in the photodissociation process. (b)
Dipole transition moments $D(R)$ (a.u.) for the $X\,{}^3\Sigma^- \rightarrow A\,{}^3\Pi$, $2\,{}^3\Sigma^-$, $3\,{}^3\Sigma^-$, $2\,{}^3\Pi$, $3\,{}^3\Pi$
transitions. The MRCI+Q approximation with an AV6Z basis set was used to calculate the
transition dipole moments
[Fig. 9 (plot): photodissociation cross sections versus wavelength (Å) for the $X\,{}^3\Sigma^-$ to $2\,{}^3\Sigma^-$, $3\,{}^3\Sigma^-$, $A\,{}^3\Pi$, $2\,{}^3\Pi$ and $3\,{}^3\Pi$ transitions; the Lyman limit and Lyman α are marked.]
Fig. 9 Comparison of SH$^+$ photodissociation cross sections for $v'' = 0$ and $J'' = 0$ with
estimates from Ref. [55]
[Fig. 10 (plot): LTE photodissociation cross sections versus wavelength (1000 to 3500 Å) for the $2\,{}^3\Pi$, $3\,{}^3\Pi$, $2\,{}^3\Sigma^-$, $A\,{}^3\Pi$ and $3\,{}^3\Sigma^-$ $\leftarrow X\,{}^3\Sigma^-$ transitions; the Lyman limit and Lyman α are marked.]
Fig. 10 Total SH$^+$ LTE photodissociation cross section at 3000 K for all electronic transitions
In Fig. 10, the LTE cross sections for all five transitions are compared at 3000 K.
This should be compared to Fig. 9 for $v'' = 0$, $J'' = 0$, where it is seen that the cross
sections are larger in the LTE case for wavelengths longer than 1500 Å.
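Assuming the usual definition, an LTE cross section at temperature $T$ weights each rovibrational level $i$ by its Boltzmann population, $\sigma_\mathrm{LTE}(\lambda, T) = \sum_i g_i e^{-E_i/k_BT}\sigma_i(\lambda) / \sum_i g_i e^{-E_i/k_BT}$, with $g_i$ the level degeneracy. The sketch below implements that average; the two levels and their cross sections are placeholders, not SH$^+$ data.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def boltzmann_weights(levels, temperature):
    """levels: list of (degeneracy g_i, energy E_i in eV); returns normalised populations."""
    e0 = min(e for _, e in levels)
    raw = [g * math.exp(-(e - e0) / (K_B_EV * temperature)) for g, e in levels]
    partition = sum(raw)
    return [w / partition for w in raw]

def lte_cross_section(levels, cross_sections, temperature):
    """cross_sections[i][k]: cross section of level i at wavelength index k."""
    w = boltzmann_weights(levels, temperature)
    n_wl = len(cross_sections[0])
    return [sum(w[i] * cross_sections[i][k] for i in range(len(levels))) for k in range(n_wl)]

# Placeholder data: two levels, three wavelengths (not SH+ values).
levels = [(1, 0.0), (3, 0.1)]
sigmas = [[1.0, 2.0, 0.5], [4.0, 1.0, 3.0]]
print(lte_cross_section(levels, sigmas, 3000.0))
```

At low temperature the average collapses onto the ground-level cross section, which is consistent with the statement below that the LTE and $v'' = 0$, $J'' = 0$ results hardly differ at 1000 K while they do differ at 3000 K.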
The SH$^+$ radical ion, sulfanylium, was not detected in the interstellar medium
(ISM) until as late as 2010 [56]. It is, however, an important tracer of gas
condensations in dense regions and also probes the warm surface layers of photon-dominated
regions (PDR) [57]. Furthermore, its abundance is expected to be enhanced in
x-ray dominated regions (XDR) [58]. In their model of the Orion Bar PDR, Nagy
et al. [57] find that photodissociation accounts for a maximum of about 4.4 % of
the total destruction rate of SH$^+$, since reactive collisions with H and dissociative
recombination by electrons are more efficient. However, they adopted the estimated
cross section of [55] for $v'' = 0$, $J'' = 0$. We point out that the adoption of the
current cross sections would enhance the photodissociation contribution to greater
than 10 %. We note that the photodissociation rates are not given here, as they are
sensitive to the local radiation field and dust properties. The latter are quite different
in the Orion Bar from those of the average ISM of the Galaxy. The densities and temperatures
($10^5$–$10^6$ cm$^{-3}$ and 1000 K) of the Orion Bar PDR begin to approach the regime
where photodissociation from excited states, currently neglected in all models, might
contribute. Furthermore, LTE conditions are almost satisfied, but at
1000 K there is no significant difference between the LTE and $v'' = 0$, $J'' = 0$
cross sections [59].
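The statement that the new cross sections would push the photodissociation share above 10 % can be put in numbers. If, as a simplifying assumption, the other destruction channels (reactive collisions with H and dissociative recombination) are held fixed, a channel with share $s$ whose rate is multiplied by $f$ ends up with share $fs/(fs + 1 - s)$. Going from about 4.4 % to 10 % then requires $f \approx 2.4$; this factor is an illustrative inference, not a number quoted in the text.

```python
def share_after_scaling(share, factor):
    """Fractional contribution of one destruction channel after its rate is multiplied
    by `factor`, with all other channels held fixed."""
    return factor * share / (factor * share + (1.0 - share))

def factor_for_target_share(share, target):
    """Rate enhancement needed to raise a channel's share from `share` to `target`."""
    return target * (1.0 - share) / (share * (1.0 - target))

f = factor_for_target_share(0.044, 0.10)
print(round(f, 2), round(share_after_scaling(0.044, f), 3))  # prints 2.41 0.1
```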
6 Summary
The predictive power of the R-matrix approach, within either a non-relativistic
or a fully relativistic framework, is illustrated for photoionization cross sections (valence or
inner-shell), resonance energy positions, Auger widths and strengths.
Quantal calculations of photodissociation cross sections and rates for astrophysical
applications require as input accurate potential energy curves and transition dipole
moments. Access to leadership-class architectures, such as the Cray XC40 at HLRS, is
essential to our research work; this system provides an integral contribution to our
computational effort in atomic, molecular and optical collision processes.
1. Hasoglu, M.F., Abdel Naby, S.A., Gorczyca, T.W., Drake, J.J., McLaughlin, B.M.: K-shell
photoabsorption studies of the carbon isonuclear sequence. Astrophys. J. 724, 1296 (2010)
2. McLaughlin, B.M.: Inner-shell photoionization, fluorescence and Auger yields. In: Ferland, G.,
Savin, D.W. (eds.) Spectroscopic Challenges of Photoionized Plasma, Astronomical Society of
the Pacific. ASP Conference Series, vol. 247, p. 87. Astronomical Society of the Pacific, San
Francisco (2001)
3. Kallman, T.R.: Challenges of plasma modelling: current status and future plans. Space Sci.
Rev. 157, 177 (2010)
4. McLaughlin, B.M., Ballance, C.P.: Photoionization, fluorescence and inner-shell processes. In:
McGraw-Hill (ed.) McGraw-Hill Yearbook of Science and Technology, p. 281. McGraw Hill,
New York (2013)
5. McLaughlin, B.M., Ballance, C.P.: Photoionization cross section calculations for the
halogen-like ions Kr$^+$ and Xe$^+$. J. Phys. B: At. Mol. Opt. Phys. 45, 085701 (2012)
6. McLaughlin, B.M., Ballance, C.P.: Photoionization cross-sections for the trans-iron element
Se$^+$ from 18 eV to 31 eV. J. Phys. B: At. Mol. Opt. Phys. 45, 095202 (2012)
7. McLaughlin, B.M., Ballance, C.P.: Petascale computations for large-scale atomic and
molecular collisions, ch 15. In: Resch, M.M., Kovalenko, Y., Focht, E., Bez, W., Kobayashi, H. (eds.)
Sustained Simulation Performance 2014. Springer, New York (2014)
8. McLaughlin, B.M., Ballance, C.P., Pindzola, M.S., Müller, A.: PAMOP: petascale atomic,
molecular and optical collisions, ch 4. In: Nagel, W.E., Kröner, D.H., Resch, M.M. (eds.) High
Performance Computing in Science and Engineering '14. Springer, New York (2015)
9. McLaughlin, B.M., Ballance, C.P., Pindzola, M.S., Schippers, S., Müller, A.: PAMOP: petascale
computations in support of experiments, ch 4. In: Nagel, W.E., Kröner, D.H., Resch, M.M.
(eds.) High Performance Computing in Science and Engineering '15. Springer, New York
10. Ballance, C.P., Griffin, D.C.: Relativistic radiatively damped R-matrix calculation of the
electron-impact excitation of W$^{46+}$. J. Phys. B: At. Mol. Opt. Phys. 39, 3617 (2006)
11. Ballance, C.P., Loch, S.D., Pindzola, M.S., Griffin, D.C.: Electron-impact excitation and
ionization of W$^{3+}$ for the determination of tungsten influx in a fusion plasma. J. Phys. B:
At. Mol. Opt. Phys. 46, 055202 (2013)
12. Kjeldsen, H., Kristensen, B., Brooks, R.L., Folkman, H., Knudsen, H., Andersen, T.: Absolute
state-selected measurements of the photoionization cross section of N+ and O+ ions. Astrophys.
J. Suppl. Ser. 138, 219 (2002)
13. Garcia, J., Mendoza, C., Bautista, M.A., Gorczyca, T.W., Kallman, T.R., Palmeri, P.: K-shell
photoabsorption of oxygen ions. Astrophys. J. Suppl. Ser. 158, 68 (2005)
14. Liao, J.-Y., Zhang, S.-N., Yao, Y.: Wavelength measurements of K transitions of oxygen, neon,
and magnesium with X-ray absorption lines. Astrophys. J. 774, 116 (2013)
15. Pinto, C., Kaastra, J.S., Costantini, E., de Vries, C.: Interstellar medium composition through
X-ray spectroscopy of low-mass X-ray binaries. Astron. Astrophys. 551, 25 (2013)
16. McLaughlin, B.M., Bizau, J.M., Cubaynes, D., Al Shorman, M.M., Guilbaud, S., Sakho,
I., Blancard, C., Gharaibeh, M.F.: K-shell photoionization of B-like (O3+) oxygen ions:
experiment and theory. J. Phys. B: At. Mol. Opt. Phys. 47, 115201 (2014)
17. Bizau, J.M., Cubaynes, D., Guilbaud, S., Al Shorman, M.M., Gharaibeh, M.F., Ababneh, I.Q.,
Blancard, C., McLaughlin, B.M.: K-shell photoionization of O+ and O2+ ions: experiment
and theory. Phys. Rev. A 92, 023401 (2015)
18. Gorczyca, T.W., Bautista, M.A., Hasoglu, M.F., Garcia, J., Gatuzz, E., Kaastra, J.S., Kallman,
T.R., Manson, S.T., Mendoza, C., Raassen, A.J.J., de Vries, C.P., Zatsarinny, O.: A
comprehensive X-ray absorption model for atomic oxygen. Astrophys. J. 779, 78 (2013)
19. Gatuzz, E., Garcia, J., Mendoza, C., Kallman, T.R., Witthoeft, M., Lohfink, A., Bautista, M.A.,
Palmeri, P., Quinet, P.: Photoionization modeling of oxygen K absorption in the interstellar
medium: the Chandra grating spectra of XTE J1817–330. Astrophys. J. 768, 60 (2013)
20. Gatuzz, E., Garcia, J., Mendoza, C., Kallman, T.R., Witthoeft, M., Lohfink, A., Bautista, M.A.,
Palmeri, P., Quinet, P.: Erratum: photoionization modeling of oxygen K absorption in the
interstellar medium: the Chandra grating spectra of XTE J1817–330. Astrophys. J. 778, 83
21. Gatuzz, E., Garcia, J., Mendoza, C., Kallman, T.R., Bautista, M.A., Gorczyca, T.W.: Physical
properties of the interstellar medium using high-resolution Chandra spectra: O K-edge
absorption. Astrophys. J. 790, 131 (2014)
22. Bizau, J.M., Cubaynes, D., Guilbaud, S., Al Shorman, M.M., El Ghazaly, M.O.A., Gharaibeh,
M.F., Sakho, I., McLaughlin, B.M.: K-shell photoionization of O4+ and O5+ ions: experiment
and theory. Mon. Not. R. Astro. Soc. (MNRAS) (2016, in press)
23. Dyall, K.G., Grant, I.P., Johnson, C.T., Plummer, E.P.: GRASP: a general-purpose relativistic
atomic structure program. Comput. Phys. Commun. 55, 425 (1989)
24. Grant, I.P.: Quantum Theory of Atoms and Molecules: Theory and Computation. Springer,
New York (2007)
25. Norrington, P.H., Grant, I.P.: Low-energy electron scattering by Fe XXIII and Fe VII using the
Dirac R-matrix method. J. Phys. B: At. Mol. Opt. Phys. 20, 4869 (1987)
26. R-matrix DARC and BP codes. http://connorb.freeshell.org (2016)
27. Covington, A.M., Aguilar, A., Covington, I.R., Hinojosa, G., Shirley, C.A., Phaneuf, R.A.,
Álvarez, I., Cisneros, C., Dominguez-Lopez, I., Sant’Anna, M.M., Schlachter, A.S., Ballance,
C.P., McLaughlin, B.M.: Valence-shell photoionization of chlorinelike Ar+ ions. Phys.
Rev. A 84, 013413 (2011)
28. Blancard, C., Cossé, Ph., Faussurier, G., Bizau, J.-M., Cubaynes, D., El Hassan, N.,
Guilbaud, S., Al Shorman, M.M., Robert, E., Liu, X.-J., Nicolas, C., Miron, C.: L-shell
photoionization of Ar+ to Ar3+ ions. Phys. Rev. A 85, 043408 (2012)
29. Tyndall, N.B., Ramsbottom, C.A., Ballance, C.P., Hibbert, A.: Valence and L-shell photoion-
ization of Cl-like argon using R-matrix techniques. Mon. Not. Roy. Astro. Soc. (MNRAS) 456,
366 (2016)
30. Müller, A.: Fusion-related ionization and recombination data for tungsten ions in low to
moderately high charge states. Atoms 3, 120 (2015)
31. Rausch, J., Becker, A., Spruck, K., Hellhund, J., Borovik Jr, A., Huber, K., Schippers, S.,
Müller, A.: Electron-impact single and double ionization of W17+. J. Phys. B: At. Mol. Opt.
Phys. 44, 165202 (2011)
32. Stenke, M., Aichele, K., Hathiramani, D., Hofmann, G., Steidl, M., Völpel, R., Salzborn, E.:
Electron-impact single-ionization of singly and multiply charged tungsten ions. J. Phys. B: At.
Mol. Opt. Phys. 28, 2711 (1995)
33. Schippers, S., Bernhardt, D., Müller, A., Krantz, C., Grieser, M., Repnow, R., Wolf, A.,
Lestinsky, M., Hahn, M., Novotný, O., Savin, D.W.: Dielectronic recombination of xenonlike
tungsten ions. Phys. Rev. A 83, 012711 (2011)
34. Krantz, C., Spruck, K., Badnell, N.R., Becker, A., Bernhardt, D., Grieser, M., Hahn, M.,
Novotný, O., Repnow, R., Savin, D.W., Wolf, A., Müller, A., Schippers, S.: Absolute rate
coefficients for the recombination of open f-shell tungsten ions. J. Phys. Conf. Ser. 488, 012051
35. Spruck, K., Badnell, N.R., Krantz, C., Novotný, O., Becker, A., Bernhardt, D., Grieser, M.,
Hahn, M., Repnow, R., Savin, D.W., Wolf, A., Müller, A., Schippers, S.: Recombination of
W18+ ions with electrons: absolute rate coefficients from a storage-ring experiment and from
theoretical calculations. Phys. Rev. A 90, 032715 (2014)
36. Borovik, A. Jr., Ebinger, B., Schury, D., Schippers, S., Müller, A.: Electron-impact single
ionization of W19+ ions. Phys. Rev. A 93, 012708 (2016)
37. Badnell, N.R., Spruck, K., Krantz, C., Novotný, O., Becker, A., Bernhardt, D., Grieser, M.,
Hahn, M., Repnow, R., Savin, D.W., Wolf, A., Müller, A., Schippers, S.: Recombination of
W19+ ions with electrons: absolute rate coefficients from a storage-ring experiment and from
theoretical calculations. Phys. Rev. A 93, 052703 (2016)
38. Fivet, V., Bautista, M.A., Ballance, C.P.: Fine-structure photoionization cross sections of Fe II.
J. Phys. B: At. Mol. Opt. Phys. 45, 035201 (2012)
39. Müller, A., Schippers, S., Esteves-Macaluso, D., Habibi, M., Aguilar, A., Kilcoyne, A.L.D.,
Phaneuf, R.A., Ballance, C.P., McLaughlin, B.M.: High resolution valence shell photoionization
of Ag-like (Xe7+) xenon ions: experiment and theory. J. Phys. B: At. Mol. Opt. Phys. 47,
215202 (2014)
40. Müller, A., Schippers, S., Hellhund, J., Holste, K., Kilcoyne, A.L.D., Phaneuf, R.A., Ballance,
C.P., McLaughlin, B.M.: Single-photon single ionization of W+ ions: experiment and theory.
J. Phys. B: At. Mol. Opt. Phys. 48, 235203 (2015)
41. Müller, A.: Precision studies of deep-inner-shell photoabsorption by atomic ions. Phys. Scr.
90, 054004 (2015)
42. Macaluso, D.A., Aguilar, A., Kilcoyne, A.L.D., Red, E.C., Bilodeau, R.C., Phaneuf, R.A.,
Sterling, N.C., McLaughlin, B.M.: Absolute single-photoionization cross sections of Se2+:
experiment and theory. Phys. Rev. A 92, 063424 (2015)
43. McLaughlin, B.M., Ballance, C.P., Schippers, S., Hellhund, J., Kilcoyne, A.L.D., Phaneuf,
R.A., Müller, A.: Photoionization of tungsten ions: experiment and theory for W2+ and W3+.
J. Phys. B: At. Mol. Opt. Phys. 49, 065201 (2016)
44. Müller, A., Schippers, S., Hellhund, J., Kilcoyne, A.L.D., Phaneuf, R.A., Ballance, C.P.,
McLaughlin, B.M.: Single and multiple photoionization of Wq+ tungsten ions in charge states
q = 1, 2, ..., 5: experiment and theory. J. Phys. Conf. Ser. 488, 022032 (2014)
45. Cowan, R.D.: The Theory of Atomic Structure and Spectra. University of California Press,
Berkeley (1981)
46. Fontes, C.J., Zhang, H.L., Abdallah, J. Jr., Clark, R.E.H., Kilcrease, D.P., Colgan, J.P.,
Cunningham, R.T., Hakel, P., Magee, N.H., Sherrill, M.E.: The Los Alamos suite of relativistic
atomic physics codes. J. Phys. B: At. Mol. Opt. Phys. 48, 144014 (2015)
47. Kramida, A.E., Ralchenko, Y., Reader, J., NIST ASD Team: NIST Atomic Spectra Database
(version 5.2). National Institute of Standards and Technology, Gaithersburg (2014)
48. Hinojosa, G., Covington, A.M., Alna’Washi, G.A., Lu, M., Phaneuf, R.A., Sant’Anna, M.M.,
Cisneros, C., Álvarez, I., Aguilar, A., Kilcoyne, A.L.D., Schlachter, A.S., Ballance, C.P.,
McLaughlin, B.M.: Valence-shell single photoionization of Kr+ ions: experiment and theory.
Phys. Rev. A 86, 063402 (2012)
49. Barthel, M., Flesch, R., Rühl, E., McLaughlin, B.M.: Photoionization of the 3s²3p⁴ ³P and the
3s²3p⁴ ¹D,¹S states of sulfur: experiment and theory. Phys. Rev. A 91, 013406 (2015)
50. Kennedy, E.T., Mosnier, J.-P., Van Kampen, P., Cubaynes, D., Guilbaud, S., Blancard, C.,
McLaughlin, B.M., Bizau, J.-M.: Photoionization cross sections of the aluminumlike Si+ ion
in the region of the 2p threshold (94–137 eV). Phys. Rev. A 90, 063409 (2014)
51. Pindzola, M.S., Robicheaux, F., Loch, S.D., Berengut, J.C., Topcu, T., Colgan, J., Foster, M.,
Griffin, D.C., Ballance, C.P., Schultz, D.R., Minami, T., Badnell, N.R., Witthoeft, M.C., Plante,
D.R., Mitnik, D.M., Ludlow, J.A., Kleiman, U.: The time-dependent close-coupling method for
atomic and molecular collision processes. J. Phys. B: At. Mol. Opt. Phys. 40, R39 (2007)
52. Li, Y., Pindzola, M.S., Colgan, J.P.: Double photoionization of He from the 1s2p ³Pᵒ excited
state. J. Phys. B: At. Mol. Opt. Phys. 49, 19205 (2016)
53. Helgaker, T., Jørgensen, P., Olsen, J.: Molecular Electronic-Structure Theory. Wiley, New York
54. Langhoff, S., Davidson, E.R.: Configuration interaction calculations on the nitrogen molecule.
Int. J. Quantum Chem. 8, 61 (1974)
55. van Dishoeck, E.F., Jonkheid, B., van Hemert, M.C.: Photoprocesses in protoplanetary disks.
Faraday Discuss. 133, 855 (2006)
56. Benz, A.O., et al.: Hydrides in young stellar objects: radiation tracers in a protostar-disk-
outflow system. Astron. Astrophys. 521, A35 (2010)
57. Nagy, Z., et al.: The chemistry of ions in the Orion Bar I. – CH+, SH+, and CF+. Astron.
Astrophys. 550, A96 (2013)
58. Abel, N.P., Federman, S.R., Stancil, P.C.: The effects of doubly ionized chemistry on SH+ and
S2+ abundances in X-ray-dominated regions. Astrophys. J. 675, L81 (2008)
59. McMillan, E.C., Shen, G., McCann, J.F., McLaughlin, B.M., Stancil, P.C.: Rovibrationally
resolved photodissociation of SH+. J. Phys. B: At. Mol. Opt. Phys. 49, 084001 (2016)
Estimation of Nucleation Barriers
from Simulations of Crystal Nuclei Surrounded
by Fluid in Equilibrium
A. Statt • P. Koß
Graduate School Materials Science in Mainz, Staudinger Weg 9, D-55099, Mainz, Germany
e-mail: statt@uni-mainz.de
P. Virnau (corresponding author) • K. Binder
Institut für Physik, Johannes Gutenberg-Universität, Staudinger Weg 7, D-55099, Mainz, Germany
e-mail: virnau@uni-mainz.de
[Fig. 1 plot: free energy versus $R$, showing the interfacial term and the nucleation barrier $\Delta F^*$ at $R = R^*$]
Fig. 1 Formation free energy $\Delta F$ of a nucleus as a function of its linear dimension $R$. In $d = 3$ dimensions, the volume term is negative and scales like $R^3$, but the interfacial term is positive and scales like $R^2$. Thus a nucleation barrier $\Delta F^*$ for a "critical droplet" with linear dimension $R^*$ results
[Fig. 2 plot: panels (a) and (b) show $\beta\tilde{\gamma}(L_xL_y)$ versus $1/(L_xL_y)$ for the (111), (110) and (100) interfaces at several box lengths $L_z$]
Fig. 2 Finite size scaling for the reduced interfacial tension of the soft effective Asakura-Oosawa (softEffAO) model at two reduced interaction strengths, $\eta_p^r = 0.1$ (left part) and $\eta_p^r = 0.2$ (right part), plotted vs inverse interfacial area, using $L_x \times L_y \times L_z$ geometry and periodic boundary conditions. Three orientations of the interface are shown: (111) [i.e. a close-packed interface in the face-centered cubic crystal lattice], (110) and (100) (Part (a) is taken from Ref. [3], Part (b) from Ref. [4])
due to the unfavorable surface free energy contribution. Classical nucleation theory estimates this barrier by making two assumptions: (i) the critical nucleus can be described by a spherical droplet, $R$ being its radius; (ii) the interfacial free energy is simply $4\pi R^2 \gamma$, with $\gamma$ being the interfacial tension of a flat planar interface.
However, while these assumptions look rather reasonable for the nucleation of liquid droplets from supersaturated vapor, they make little sense for crystal nucleation: the spherical shape of the nucleus is not consistent with its regular crystal structure, and furthermore $\gamma$ is not isotropic, but depends somewhat on the orientation of the interface relative to the crystal axes. This is demonstrated in Fig. 2 for the model of attractive colloidal particles studied in the present work [3, 4]. This model will be explained in the following section. Here it suffices to know that for weak attraction between the colloidal particles this anisotropy is rather weak (left part of Fig. 2), and hence an almost spherical droplet shape may be expected, while for stronger attraction (right part of Fig. 2) the anisotropy is more noticeable, and the crystal shape will then deviate from a sphere. Each point in Fig. 2a, b took around 4 × 24 h on 1000 CPUs in parallel. If the interfacial free energy were known for arbitrary interface orientation (and not just for the three choices (111), (110) and (100) displayed in Fig. 2), one could find the equilibrium crystal shape from the Wulff construction [5]. However, this procedure is cumbersome, and with $\gamma$ known for only three orientations $\mathbf{n}$ of the interface, it is possible only approximately. If we could do that, the surface term in Fig. 1 could be written as a function of the nucleus volume $V$ as
$$F_{\mathrm{surf}}(V) = V^{2/3} \oint \gamma(\mathbf{n})\, ds \equiv A_w \bar{\gamma}\, V^{2/3}, \qquad (1)$$
where the surface integral $\oint ds$ extends over a crystal having the Wulff shape and unit volume. The corresponding surface area is $A_w$, and $\bar{\gamma}$ is then an average interfacial tension. Then $\Delta F$ in Fig. 1 becomes ($p_c$ is the pressure in the crystal and $p_\ell$ is the pressure in the liquid)
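$$\Delta F = -(p_c - p_\ell)V + F_{\mathrm{surf}}(V) = -(p_c - p_\ell)V + A_w \bar{\gamma}\, V^{2/3}, \qquad (2)$$
and the condition $\partial \Delta F / \partial V = 0$ at the critical volume $V^*$ yields
$$V^{*1/3} = \frac{2 A_w \bar{\gamma}}{3(p_c - p_\ell)}, \qquad \Delta F^* = \frac{1}{3} A_w \bar{\gamma}\, V^{*2/3}. \qquad (3)$$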
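As a quick consistency check of Eqs. (2) and (3), the Python sketch below maximizes $\Delta F(V)$ numerically (golden-section search, which is valid because $\Delta F$ is concave for $V > 0$) and compares the result with the closed form; it also confirms that the two parts of Eq. (3) combine to $\Delta F^* = (p_c - p_\ell)V^*/2$. The sketch assumes a spherical nucleus ($A_w$ replaced by $A_{\mathrm{iso}} = (36\pi)^{1/3}$), uses $\bar{\gamma} = 1.082$ (the fitted value quoted for $\eta_p^r = 0.1$) and a purely hypothetical $p_c - p_\ell = 0.35$; the function names are illustrative only.

```python
import math

# Sphere value of A_w: surface area of a unit-volume sphere, (36*pi)^(1/3).
A_ISO = (36.0 * math.pi) ** (1.0 / 3.0)


def delta_f(v, dp, area, gamma_bar):
    """Eq. (2): formation free energy of a nucleus of volume v
    (units k_B T = sigma_c = 1; dp = p_c - p_l > 0)."""
    return -dp * v + area * gamma_bar * v ** (2.0 / 3.0)


def barrier_closed_form(dp, area, gamma_bar):
    """Eq. (3): critical volume V* and barrier height Delta F*."""
    v_star = (2.0 * area * gamma_bar / (3.0 * dp)) ** 3
    return v_star, area * gamma_bar * v_star ** (2.0 / 3.0) / 3.0


def barrier_numerical(dp, area, gamma_bar, v_max, rel_tol=1e-10):
    """Golden-section search for the maximum of Delta F(V) on [0, v_max].
    Delta F is concave for V > 0, so the maximum is unique."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 0.0, v_max
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > rel_tol * v_max:
        if delta_f(c, dp, area, gamma_bar) > delta_f(d, dp, area, gamma_bar):
            b, d = d, c                      # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                      # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    v = 0.5 * (a + b)
    return v, delta_f(v, dp, area, gamma_bar)


if __name__ == "__main__":
    dp, gamma_bar = 0.35, 1.082              # dp is a hypothetical value
    v_cf, f_cf = barrier_closed_form(dp, A_ISO, gamma_bar)
    v_num, f_num = barrier_numerical(dp, A_ISO, gamma_bar, v_max=1.0e4)
    print(f"V* = {v_cf:.2f} (closed form) vs {v_num:.2f} (numerical)")
    print(f"dF* = {f_cf:.3f}, (p_c - p_l) V*/2 = {0.5 * dp * v_cf:.3f}")
```

The search bracket `v_max` only has to contain $V^*$; since the closed form is available, the numerical route is purely a check and not part of the analysis described in the text.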
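Now the present study simply exploits the idea [6, 7] to combine both Eqs. (3) as follows
$$\Delta F^* = \frac{1}{2}(p_c - p_\ell)V^*, \qquad (4)$$
and to expand the pressures around the coexistence conditions as
$$p_c \approx p_{\mathrm{coex}} + (6/\pi)\,\eta_m\,[\mu_c(p_c) - \mu_{\mathrm{coex}}],$$
$$p_\ell \approx p_{\mathrm{coex}} + (6/\pi)\,\eta_f\,[\mu_\ell(p_\ell) - \mu_{\mathrm{coex}}], \qquad (5)$$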
with $\eta_m$ ($\eta_f$) being the packing fractions of the (spherical) colloidal particles at which melting (freezing) sets in. Since in equilibrium the chemical potential for the nucleus coexisting with fluid is homogeneous,
$$\mu_c(p_c) = \mu_\ell(p_\ell) = \mu, \qquad (6)$$
Fig. 3 Schematic plot of the chemical potential $\mu$ vs. density $\rho$ (or packing fraction $\eta = \pi\sigma_c^3\rho/6$, $\sigma_c$ being the colloid diameter, respectively), for a system undergoing a phase transition from liquid (at density $\rho_f$) to solid (at density $\rho_m$) at $\mu = \mu_{\mathrm{coex}}$ in the thermodynamic limit (broken horizontal straight line) and in a box of finite volume $V_{\mathrm{box}}$ (full curve). Due to finite size effects, the homogeneous liquid is stable until the density $\rho_1$ where the droplet evaporation/condensation transition occurs. For $\rho_1 < \rho < \rho_2$ a nucleus surrounded by liquid is stable: this is the region of interest, where $\mu_\ell$, $p_\ell$, and $V^*$ need to be extracted. At $\rho_2$, a transition occurs to a cylinder-like nucleus, stabilized by the periodic boundary conditions that are applied throughout. For $\rho = \rho_3$ a transition to a slab-like crystal with two planar interfaces occurs. In the slab region, theory requires $\mu = \mu_{\mathrm{coex}}$ if the periodic boundary condition is commensurable with the crystal periodicity. The different states are illustrated with snapshot pictures of configurations of the model with $\eta_p^r = 0.1$ (particles in the crystal are shown in red, in the fluid in blue, in the interface region in green) (From [6])
As a caveat, we mention that $V_{\mathrm{box}}$ has to be chosen large enough that fluctuations of $\mu$ and $p_\ell$ are relatively small, and one can only work in a restricted range of packing fractions (avoiding both the "droplet-evaporation/condensation" transition [8] and the appearance of cylinder-like droplets or slab structures [9], see Fig. 3).
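To illustrate how Eqs. (4)-(6) turn a measured liquid pressure into a barrier, here is a minimal Python sketch: the liquid branch of Eq. (5) is inverted for $\mu - \mu_{\mathrm{coex}}$, the same $\mu$ is inserted into the crystal branch (Eq. 6), and Eq. (4) then gives $\Delta F^*$. Eliminating $\mu$ amounts to $p_c - p_\ell = (\eta_m/\eta_f - 1)(p_\ell - p_{\mathrm{coex}})$. The coexistence values $p_{\mathrm{coex}} = 1.78$, $\eta_f = 0.374$, $\eta_m = 0.688$ are those quoted for $\eta_p^r = 0.2$; the liquid pressure and $V^*$ are hypothetical inputs (in the text $V^*$ is obtained separately), and the function names are illustrative.

```python
import math


def chemical_potential_shift(p_l, p_coex, eta_f):
    """Invert the liquid branch of Eq. (5): mu - mu_coex from the measured p_l."""
    return (math.pi / 6.0) * (p_l - p_coex) / eta_f


def crystal_pressure(p_l, p_coex, eta_f, eta_m):
    """Crystal branch of Eq. (5), evaluated at the same mu (Eq. 6)."""
    dmu = chemical_potential_shift(p_l, p_coex, eta_f)
    return p_coex + (6.0 / math.pi) * eta_m * dmu


def barrier_from_liquid_pressure(p_l, v_star, p_coex, eta_f, eta_m):
    """Eq. (4): Delta F* = (p_c - p_l) V* / 2, with p_c from the linearized Eqs. (5)-(6)."""
    p_c = crystal_pressure(p_l, p_coex, eta_f, eta_m)
    return 0.5 * (p_c - p_l) * v_star


if __name__ == "__main__":
    # Coexistence data quoted in the text for eta_p^r = 0.2
    p_coex, eta_f, eta_m = 1.78, 0.374, 0.688
    # Hypothetical measurements for one equilibrated nucleus
    p_l, v_star = 1.95, 1000.0
    print("p_c =", crystal_pressure(p_l, p_coex, eta_f, eta_m))
    print("dF* =", barrier_from_liquid_pressure(p_l, v_star, p_coex, eta_f, eta_m))
```

Because $\eta_m > \eta_f$, any liquid pressure above $p_{\mathrm{coex}}$ yields $p_c > p_\ell$ and hence a positive barrier, as expected for a supersaturated fluid.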
[Fig. 4 plot: panels (a) and (b) show $\Delta F^*$ versus $V^{*2/3}$ for $N$ = 6000, 8000 and 10 000, together with the fits and the (111), (100) and (110) predictions; insets magnify the data]
Fig. 4 Nucleation barriers $\Delta F^*(V^*)$ plotted vs $V^{*2/3}$ for $\eta_p^r = 0.1$ (a) and $\eta_p^r = 0.2$ (b). Here units $k_BT = 1$ and $\sigma_c = 1$ are used. Three system sizes are included in each case, containing $N$ = 6000, 8000 or 10 000 particles, respectively. Full straight lines show fits assuming a spherical surface (replacing $A_w$ by $A_{\mathrm{iso}} = (36\pi)^{1/3}$ in Eq. (3)) and then fitting $\bar{\gamma}$, with result $\bar{\gamma} = 1.082$ (a) and $\bar{\gamma} = 2.406$ (b). The dotted lines indicate the predictions when one would take, in case (a), $\gamma_{111} = 1.013$, $\gamma_{110} = 1.044$ and $\gamma_{100} = 1.039$, and in case (b), $\gamma_{111} = 2.078$, $\gamma_{110} = 2.224$ and
Before presenting any details of our procedures, we show the central results, which demonstrate that the strategy outlined above works (Fig. 4). Indeed, apart from small deviations, for all choices of $N$ used there is a broad regime where the proportionality of $\Delta F^*$ to $V^{*2/3}$ holds, and, importantly, these data superimpose onto a common straight line irrespective of $N$ in each case. This property is crucial, because we want to be able to describe nucleation phenomena in bulk materials, not in nanoscopically small boxes with periodic boundary conditions. The use of such boxes is needed to study nuclei in thermal equilibrium: a nucleus on top of the barrier in Fig. 1 is unstable against thermal fluctuations, of course, and cannot be studied straightforwardly.
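The fit behind the full lines of Fig. 4 can be sketched as a one-parameter least-squares problem: with $A_w$ replaced by $A_{\mathrm{iso}}$, Eq. (3) predicts $\Delta F^* = (A_{\mathrm{iso}}\bar{\gamma}/3)\,V^{*2/3}$, a straight line through the origin, so pooling the data from all $N$ and fitting its slope gives $\bar{\gamma}$. The sketch below assumes an unweighted fit through the origin (the text does not state the fitting details), and the data in the usage block are synthetic, not simulation results.

```python
import math

A_ISO = (36.0 * math.pi) ** (1.0 / 3.0)   # A_w replaced by the sphere value


def fit_gamma_bar(v23, barriers, area=A_ISO):
    """Least-squares fit of Delta F* = (area * gamma_bar / 3) * V*^(2/3)
    through the origin (Eq. 3); returns gamma_bar."""
    if not v23 or len(v23) != len(barriers):
        raise ValueError("need matching, non-empty data")
    sxy = sum(x * y for x, y in zip(v23, barriers))
    sxx = sum(x * x for x in v23)
    return 3.0 * (sxy / sxx) / area


if __name__ == "__main__":
    # Synthetic (V*^(2/3), Delta F*) pairs, pooled as if from several N
    v23 = [50.0, 60.0, 75.0, 90.0, 105.0, 120.0]
    noise = [0.4, -0.3, 0.2, -0.5, 0.3, -0.1]
    barriers = [A_ISO * 1.082 / 3.0 * x + e for x, e in zip(v23, noise)]
    print(f"gamma_bar = {fit_gamma_bar(v23, barriers):.4f}")
```

Collapsing all system sizes onto one line before fitting is exactly what makes the single $\bar{\gamma}$ meaningful; a systematic $N$ dependence would show up as residuals that correlate with $N$.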
However, as the insets in Fig. 4a, b show, there are minor deviations from the fit to a common straight line, but these deviations are only of the order of a few percent. These deviations are to a lesser extent statistical errors and to a larger extent systematic. We attribute the systematic errors to the fact that in our geometry the chemical potential in the system is not strictly constant (as tacitly assumed in Fig. 1), but fluctuates; this fluctuation is larger the smaller $N$ is. A second systematic effect comes from the translational entropy of the nucleus in the simulation box, which scales proportional to $\ln(V_{\mathrm{box}})$ and hence $\ln(N)$. More research will be needed to clarify the nature of these systematic corrections quantitatively.
In any case, the deviations due to the choice of a spherical droplet shape combined with any of the interfacial tensions of planar interfaces ($\gamma_{111}$, $\gamma_{110}$ or $\gamma_{100}$, respectively) are distinctly larger than these systematic errors, and would lead to a significant underestimation of the nucleation barrier, in particular for $\eta_p^r = 0.2$. We expect that this discrepancy will increase further for still larger $\eta_p^r$, where ultimately faceted crystals [5] will result. We recall that in the simplistic lattice gas model at low temperatures $T$ the nucleus shape tends to a simple cube, and then the ratio of the actual barrier $\Delta F^*$ to the spherical approximation gradually tends to $6/\pi$ as $T \to 0$ [10].
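The $6/\pi$ limit follows from Eqs. (2) and (3): inserting $V^*$ into $\Delta F^*$ gives $\Delta F^* = 4A^3\bar{\gamma}^3/(27\,\Delta p^2)$, where $A$ is the surface area of the unit-volume nucleus shape, so for equal $\gamma$ and $\Delta p$ the barrier ratio of two shapes is $(A_1/A_2)^3$. For a cube ($A = 6$) versus a sphere ($A = (36\pi)^{1/3}$) this is $216/(36\pi) = 6/\pi \approx 1.91$. The short Python sketch below checks this arithmetic; it assumes, as in the lattice-gas limit, that all cube faces carry the same tension $\gamma$ that is used for the sphere, and the function name is illustrative.

```python
import math


def barrier(area, gamma, dp):
    """Delta F* = 4 A^3 gamma^3 / (27 dp^2), obtained by inserting V* from
    Eq. (3) into Delta F* = A gamma V*^(2/3) / 3; A is the surface area of the
    unit-volume nucleus shape, gamma an isotropic interfacial tension."""
    return 4.0 * area ** 3 * gamma ** 3 / (27.0 * dp ** 2)


A_SPHERE = (36.0 * math.pi) ** (1.0 / 3.0)   # unit-volume sphere
A_CUBE = 6.0                                 # unit cube

if __name__ == "__main__":
    gamma, dp = 1.0, 0.5                     # arbitrary; both cancel in the ratio
    ratio = barrier(A_CUBE, gamma, dp) / barrier(A_SPHERE, gamma, dp)
    print(ratio, 6.0 / math.pi)
```

The cubic dependence on $A$ is why even modest shape anisotropy produces a noticeable change in $\Delta F^*$.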
In the next section, we shall give some details on the model that we have used for
our study, and in the third section, some details of the actual analysis that yielded
Fig. 4 will be given. Section 4 summarizes our conclusions, and gives an outlook to
future work.
[Fig. 5 plot: pressure $p$ versus packing fraction $\eta$ (0.2 to 0.7) for $\eta_p^r$ from 0.00 to 0.20]
Fig. 5 Normalized pressure $p$ versus packing fraction $\eta$ for several choices of attraction strength $\eta_p^r$, as indicated. The branches at the left side represent the liquid phase and the branches at the right side the crystal
while $U_{\mathrm{att}}(r \geq \sigma_c(1+q)) = 0$. The parameter is chosen such that the total potential is smoothly differentiable at $r = \sigma_c$, which yields a value of 0.967118 ($\eta_p^r = 0.1$) or 0.9892 ($\eta_p^r = 0.2$), respectively. For this potential it is straightforward to compute the pressure from the virial formula, unlike for the original AO model.
Figure 5 then shows the phase diagram of this model for different choices of $\eta_p^r$. These data were obtained by sampling the packing fraction in Monte Carlo runs in the $NpT$ ensemble, using $N = 4000$ colloidal particles. The data for the crystal were obtained using a perfect fcc lattice as the initial condition. Ideally, the liquid branch should only occur for pressures $p \leq p_{\mathrm{coex}}$. However, as usually observed for $NpT$ simulations of first-order transitions, this is not the case: there is a regime of pressures where both phases are stable or metastable, respectively, and from the data of Fig. 5 a straightforward estimation of $p_{\mathrm{coex}}$ is not possible. We have determined $p_{\mathrm{coex}}$ by the method proposed by Zykova-Timan et al. [16]. In this method, one studies slab configurations in which, in the initial state, a crystal domain (of volume $V_c = L \times L \times L_c$) and a liquid domain (of volume $V_l = L \times L \times (5L - L_c)$) are present. Periodic boundary conditions are used, and thus the domains are separated by two planar $L \times L$ interfaces ($L$ is chosen such that at the chosen pressure the crystal is not distorted). If the chosen pressure exceeds $p_{\mathrm{coex}}$ and we let the system evolve in the Monte Carlo run, we expect the crystal to grow at the expense of the liquid, while the opposite behavior occurs for $p < p_{\mathrm{coex}}$ (see [16, 17] for more details). Plotting the volume change of the total system versus Monte Carlo "time" for various pressures, we identify $p_{\mathrm{coex}}$ as the pressure where no volume change occurs (Fig. 6). As discussed in [4, 7, 17], this method is not as straightforward as it looks, since one needs both to average over many equivalent runs to reduce the statistical noise in curves such as those shown in Fig. 6, and to study several choices of $L$ (or $n$, respectively) to extrapolate the resulting estimates
[Fig. 6 plot: volume change versus MC cycles (in units of $10^3$), panels (a) and (b)]
Fig. 6 (a) Volume change as a function of the number of Monte Carlo steps for $\eta_p^r = 0.2$, choosing $n = 10$ lattice planes in the $x$ and $y$ directions, and pressures from $p = 0.6$ (red, top) to $p = 3.0$ (magenta, bottom). (b) Same plot as (a) for $\eta_p^r = 0.28$
[Fig. 7 plot: chemical potential $\mu$ versus $p$ (a) and versus $\eta$ (b), with the liquid branch labeled $\mu_{\mathrm{liquid}}$]
Fig. 7 Chemical potential as a function of pressure (a) and packing fraction (b), for the softEffAO model with $\eta_p^r = 0.2$
of $p_{\mathrm{coex}}(n)$ vs $n^{-2}$ in order to obtain an estimate for the true coexistence pressure that applies in the thermodynamic limit. When $p_{\mathrm{coex}}$ is known and the liquid and solid branches $\eta_l(p)$ and $\eta_c(p)$ are known (Fig. 5), we can immediately read off $\eta_f = \eta_l(p_{\mathrm{coex}})$ and $\eta_m = \eta_c(p_{\mathrm{coex}})$. From the estimate of the pressure $p_\ell$ of the liquid coexisting with the nucleus, we can then infer $\mu$ (Eq. 6) from the linear expansion of $p_\ell$ (Eq. 5); using also the expansion for $p_c$ (Eq. 5), we find $p_c$ from $\mu_c(p_c) = \mu$, and $V^*$ can then be inferred from Eq. 8. However, it is advisable to check that one works close enough to coexistence that the linear expansions, Eq. 5, are actually valid. For this purpose, a new method to estimate the chemical potential has been developed [4, 6, 7, 17], since in many cases of interest the standard Widom particle insertion method [18] cannot be applied. Figure 7 shows, as an example, the chemical potential for $\eta_p^r = 0.2$ plotted against both pressure and packing fraction, using the value of $p_{\mathrm{coex}}$ obtained with the method explained in Fig. 6. Indeed, one finds that the curves $\mu$ vs. $p$ for both phases are straight lines in the regime of interest. Using $p_{\mathrm{coex}} = 1.78 \pm 0.02$ we found $\mu_{\mathrm{coex}} = -4.60 \pm 0.04$ in this case, leading to $\eta_f = 0.374 \pm 0.002$ and $\eta_m = 0.688 \pm 0.001$.
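The two-stage coexistence estimate described above (locate the pressure of zero volume drift for each slab width $n$, then extrapolate in $n^{-2}$) can be sketched in Python as follows. The text does not specify how the zero crossing or the extrapolation is fitted, so this sketch assumes linear interpolation of the run-averaged drift rate $dV/dt$ between the two bracketing pressures and an ordinary least-squares line in $n^{-2}$. All numbers in the usage block are synthetic; the true value is set to the quoted 1.78 only for illustration, and the function names are hypothetical.

```python
def zero_crossing(pressures, drift_rates):
    """Pressure at which the mean volume drift dV/dt changes sign, by linear
    interpolation between the two bracketing pressures."""
    pts = sorted(zip(pressures, drift_rates))
    for (p0, r0), (p1, r1) in zip(pts, pts[1:]):
        if r0 == 0.0:
            return p0
        if r0 * r1 < 0.0:
            return p0 - r0 * (p1 - p0) / (r1 - r0)
    if pts and pts[-1][1] == 0.0:
        return pts[-1][0]
    raise ValueError("drift rates do not change sign in the sampled range")


def extrapolate_in_inverse_area(ns, p_coex_n):
    """Ordinary least-squares fit p_coex(n) = p_inf + c / n^2; returns p_inf."""
    xs = [1.0 / n ** 2 for n in ns]
    m = len(xs)
    mx, my = sum(xs) / m, sum(p_coex_n) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, p_coex_n))
    return my - (sxy / sxx) * mx


if __name__ == "__main__":
    # Synthetic data: p_coex(n) = 1.78 + 5/n^2; the volume shrinks (crystal
    # grows) above p_coex(n), so the drift rate is -(p - p_coex(n)).
    pressures = [1.0, 1.5, 2.0, 2.5, 3.0]
    estimates = []
    for n in (8, 10, 12):
        p_true = 1.78 + 5.0 / n ** 2
        rates = [-(p - p_true) for p in pressures]
        estimates.append(zero_crossing(pressures, rates))
    print(estimates, extrapolate_in_inverse_area((8, 10, 12), estimates))
```

In practice each drift rate would itself be an average over many independent runs, as the text stresses, and the interpolation error shrinks as the pressure grid is refined around the crossing.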
Fig. 8 Different crystalline seeds (left column, parts (a), (b)) lead to very similar shapes of the equilibrated crystalline nuclei (next column, parts (c), (d)) and almost identical distributions of pressure (parts (e), (f)) and density (parts (g), (h)) of the surrounding fluid. All data refer to the case $N = 10\,000$, $\eta = 0.48$, $\eta_p^r = 0.2$. The equilibrated nucleus shapes were obtained after about $10^{10}$ Monte Carlo cycles, with each cycle comprising $N$ Monte Carlo trial moves
Fig. 9 Pressure $p_\ell$ in the liquid surrounding a crystalline nucleus, as shown in Fig. 8, plotted vs. the total packing fraction $\eta$ for the softEffAO model choosing $\eta_p^r = 0.2$. Three choices of $N$ are shown, $N$ = 6000, 8000, and 10 000, respectively. The region of interest is shown on the right with strongly magnified scales. Values for the packing fraction of the fluid $\eta_l$ are included and lie on top of the bulk equation of state for the liquid branch. From [4]
4 Conclusions
A method to study the free energy barrier for homogeneous nucleation of crystals from a fluid phase has been presented, which is not hampered by the fact that the fluid-crystal interfacial tension is in general anisotropic. In the softEffAO model, varying the parameter $\eta_p^r$, which controls the strength of the effective attraction between the colloidal particles, allows one to control this anisotropy (Fig. 2), and indeed deviations from the standard (inappropriate) assumption of a spherical nucleus shape were found (Fig. 4). In the present report, several steps of the analysis of the simulation data have been explained. We also emphasize the need for access to a fast supercomputer such as HORNET at the HLRS Stuttgart for this research: typical system sizes involve $10^4$ colloidal particles, and for obtaining data such as shown in Fig. 6, averages over 100 runs carried out in parallel need to be taken.
While the present work has addressed a simple model system appropriate for colloidal suspensions, future work should address interparticle potentials that are relevant for materials science, since crystal nucleation is a very relevant problem there. An application to the formation of ice nuclei in the atmosphere is also planned, since this problem is of central importance in the context of climate modeling. In all cases, complementary studies of the kinetic aspects of nucleation phenomena will be needed.
Acknowledgements We would like to thank the DFG for funding in the framework of the priority program on heterogeneous nucleation (SPP 1296, grant No. VI 237/4-3). P. K. is a recipient of a DFG fellowship/DFG-funded position through the Excellence Initiative by the Graduate School Materials Science in Mainz (GSC 266). We thank the HLRS Stuttgart for generous grants of computer time on the HORNET supercomputer. The authors gratefully acknowledge the computing time granted on the supercomputer Mogon at Johannes Gutenberg University Mainz (www.hpc.
1 Introduction
Fig. 1 The fibrinogen molecule. (a) Schematic representation of the fibrinogen molecule. The three chains of Fg, Aα, Bβ and γ, are shown in blue, red and green, respectively. (b) Van der Waals representation of the crystallographic structure (pdb 3GHG) of Fg, color coded as in (a). Carbohydrates are in orange. The αC region and the FpA and FpB peptides were not resolved in the crystal structure (Reprinted from Ref. [6]. Copyright (2015) Köhler et al. under the terms of the Creative Commons Attribution License)
Bβ and γ, which depart from their N-terminal region (E region), form an elongated coiled-coil region, and end in two globular domains forming the D region (Fig. 1). The C-terminal segment of the Aα chain, i.e. the αC region, as well as the N-terminal parts of chains Aα and Bβ, including FpA and FpB, are mostly disordered (and thus not resolved in the crystal).
The D region contains several integrin binding sites, including the P1 and P2 sites (residues γ190–202 and γ377–395, respectively), which are known to bind the leukocyte integrin αMβ2 [7, 8], and site H12 (residues γ392–411), which binds to the platelet integrin receptor αIIbβ3 [9]. In particular, P1 is partly located in a cleft between the γC and βC domains (binding cleft). Additionally, the D region contains the a- and b-"holes", which are the binding sites of the "knobs" at the end of the Fp tethers of the E region and play a major role in fibrin formation.
Although the available crystallographic structures of Fg show relatively limited variability, atomic force microscopy images of Fg adsorbed on several surfaces reveal a large degree of conformational flexibility. Indeed, the typical trinodular structure of Fg observed in adsorption studies, where the three nodules correspond to the two D regions and the central E region, is very variable [10], and the angle formed by the three nodules has a wide distribution [11, 12]. The source of this conformational flexibility at the molecular level is not well understood. Early sequence analysis [13] and comparison of several crystallographic structures of Fg [5, 14, 15] suggested the presence of a hinge point in the middle of the coiled-coil regions connecting the E and D regions. With the help of the simulations described
Fibrinogen Dynamics and Adsorption 63
below, we have suggested a possible role of this hinge point and the extent of
flexibility that it confers to the Fg molecule.
Two surfaces often used in Fg adsorption experiments are mica and graphite [10, 11, 16–20]. They represent an ideal charged/hydrophilic and a non-polar surface, respectively, and their sheet structure allows for the production of atomically flat surfaces. The techniques used for these experiments, however, cannot spatially resolve the mechanism behind the flexibility of Fg or the atomic-scale details of the adsorbed state. Simulations, which have been used to study protein adsorption on mica [21–23] and graphite/graphene [24–28], can help to fill this spatial resolution gap. The adsorption of the γC domain of Fg on various self-assembled monolayer surfaces has been investigated using atomistic molecular dynamics (MD) simulations, which showed rolling motions but no deformations [29]. The adsorption of Fg on graphene has also been investigated using atomistic MD simulations [28], which showed slow equilibration, possibly driven by the formation of hydrophobic contacts and conformational rearrangements. Further simulations of Fg explored its mechanical response to external forces [30, 31], as well as its flexibility in solution [6]. Fg adsorption has also been studied using simplified models [32–34], where Fg is replaced by one or a small number of interacting objects that represent the whole molecule or the globular regions. In these models, as well as in models of fibrin polymerization [35], the internal flexibility of Fg is either ignored or treated approximately, although it may play a very important role, especially in the characterization of its hydrodynamic properties.
After presenting the methodological tools used in our work, we show how we have addressed two key aspects of Fg behavior, namely its internal dynamics and its adsorption properties.
In Sect. 3, we report the results of extensive molecular dynamics (MD) simula-
tions performed on Fg in solution. The simulations allow for the identification of
large bending motions centered at a hinge point on the coiled-coil region of Fg.
We also show how these bending motions may provide a conserved mechanism
facilitating the action of plasmin in fibrinolysis.
In Sect. 4 we report on atomistic molecular dynamics simulations of the initial
stages of Fg adsorption on mica and graphite. In these simulations we address the
speed, strength and reversibility of the adsorption process on both surfaces, as well
as the emergence of preferential adsorption orientations for the Fg protomer. We also
address the change in the flexibility of Fg upon adsorption as well as the possible
onset of deformation/denaturation.
2 Simulation Methods
The simulations are based on the crystal structure of human Fg (PDB ID: 3GHG)
[5]. The carbohydrate groups that are only partly resolved in the crystal have been
modelled using VMD and introduced in some of the simulations. The unresolved
parts of the protein structure (the αC domain and the N-terminal segments of all the
64 S. Köhler et al.
chains) have not been included in the calculations. Several molecular constructs have been prepared to assess the role of the different components of the Fg molecule. Rectangular periodic simulation boxes with explicit TIP3P water [36] and physiological ion concentration (150 mM NaCl) were prepared using VMD [37] (see Tables 1 and 2 for the box sizes of the solution and adsorption simulations, respectively).
Isobaric-isothermal simulations were set up at a temperature of 310 K and a
pressure of 1 atm using NAMD [38] with a Langevin thermostat and a Langevin
piston barostat [39, 40], using 200 and 100 ps⁻¹ as decay parameters, respectively. The
covalent bonds involving hydrogen atoms were fixed in length and a 2 fs time step
was used. The CHARMM22 force field with CMAP corrections [41] was used together with
its recent extension to carbohydrates [42], in combination with ParamChem
(http://www.paramchem.org) and the CHARMM General Force Field (CGenFF) [43].
Van der Waals interactions were cut off at 1.2 nm, while PME was used for long-range
electrostatic interactions with a grid spacing of 1 Å. After energy minimization of the
hydrogen atoms and water molecules (NAMD's conjugate gradient algorithm, 15,000 steps),
the systems were heated and equilibrated for 10 ns. Production run
statistics are given in Tables 1 and 2 for the solution and adsorption simulations,
respectively.
In the adsorption simulations, mica and graphite were chosen as model solid
surfaces. The graphite surface was built as a six-layer graphene sheet using the
Carbon Nanostructure Builder within VMD [37] and modeled using standard
CHARMM aromatic carbon parameters. The mica surface was constructed according
to a recently published model [44], which has been used successfully to simulate
peptide adsorption [45, 46]. The mica surface consists of a two-layer sheet with
realistic surface defects: point defects where an aluminum atom substitutes for a
silicon atom. Potassium ions are evenly distributed on the two sides of the mica slab.
The solid surfaces were made continuous in the xy-plane by defining covalent bonds
that wrap around the periodic boundaries. A 12 nm high water box was then constructed
on top of the solid surface using VMD. After equilibration of the solvated
surface box, the first protomer and residues α27–65, β58–95 and γ14–40 of the
second protomer (see Fig. 2) were added to the simulation box. The protein was
placed so that its minimal distance to the solid surface was at least
0.8 nm. Three different initial orientations (labeled 0, 120 and 240) were constructed by
rotating Fg around its long axis in steps of 120°. This procedure limits the sampled space to
so-called side-on adsorption, which is known to be the dominant adsorption mode, at
least in the dilute regime [11, 33]. The surface systems were then further
minimized and equilibrated for 0.75 ns before starting the production runs listed in
Table 2.
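The construction of the three starting orientations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' setup procedure (which used VMD): the long axis is taken as the principal axis of largest spread of the coordinates, the protein is assumed to be pre-aligned so that this axis lies parallel to the surface (the xy-plane), and the point cloud is a hypothetical stand-in for the Fg coordinates. The rotation uses Rodrigues' formula, and the final shift enforces the 0.8 nm minimal distance to the surface.

```python
import numpy as np

def long_axis(coords: np.ndarray) -> np.ndarray:
    """Unit vector along the direction of largest spread, i.e. the eigenvector
    of the gyration tensor with the largest eigenvalue."""
    centered = coords - coords.mean(axis=0)
    gyration = centered.T @ centered / len(coords)
    _, eigvecs = np.linalg.eigh(gyration)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def rotate_about_axis(coords: np.ndarray, axis: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate coords by angle_deg about `axis` through their centroid (Rodrigues' formula)."""
    k = axis / np.linalg.norm(axis)
    theta = np.deg2rad(angle_deg)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    center = coords.mean(axis=0)
    return (coords - center) @ R.T + center

def place_above_surface(coords: np.ndarray, surface_z: float, min_gap: float = 0.8) -> np.ndarray:
    """Translate along z so that the lowest atom lies min_gap (nm) above the surface plane."""
    return coords + np.array([0.0, 0.0, surface_z + min_gap - coords[:, 2].min()])

# Hypothetical elongated point cloud standing in for Fg coordinates (nm),
# with its long axis along x, i.e. parallel to a surface in the xy-plane.
rng = np.random.default_rng(0)
protein = rng.normal(size=(500, 3)) * np.array([20.0, 2.0, 2.0])
axis = long_axis(protein)
starts = {label: place_above_surface(rotate_about_axis(protein, axis, label), surface_z=0.0)
          for label in (0, 120, 240)}
```

Because the rotation is about the long axis, the three starting structures differ only in which face of the molecule points toward the surface, which is what restricts sampling to side-on adsorption.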
To identify the collective motions of the whole Fg molecule and of its sub-
domains we performed several principal component analyses (PCA) [47] using
wordom [48] and GROMACS utilities [49]. DynDom [50] was used to identify
rigid domains and hinges of motion. The overlap between spaces spanned by the
dominant PCA modes of different simulations was used to quantify the similarity of
the observed dynamics [51].
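The PCA and the subspace overlap used above can be illustrated with NumPy. The sketch assumes the normalized sum of squared inner products, s = (1/n) Σ_i Σ_j (v_i · w_j)², as the overlap measure; this is one common definition associated with [51], but the text does not state which variant was used. The trajectories are synthetic, and a real analysis (e.g. with wordom or the GROMACS tools) first fits every frame to a reference structure.

```python
import numpy as np

def pca_modes(traj: np.ndarray, n_modes: int):
    """PCA of a trajectory with shape (frames, atoms, 3).

    The frames are assumed to be fitted to a reference structure already, so
    that overall translation and rotation are removed. Returns the n_modes
    largest variances and the corresponding eigenvectors as columns."""
    X = traj.reshape(len(traj), -1)
    X = X - X.mean(axis=0)
    cov = X.T @ X / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_modes]
    return eigvals[order], eigvecs[:, order]

def subspace_overlap(V: np.ndarray, W: np.ndarray) -> float:
    """Normalized overlap (1/n) * sum_ij (v_i . w_j)^2 between the subspaces
    spanned by the orthonormal columns of V and W (n columns each).
    1 = identical subspaces, 0 = orthogonal subspaces."""
    n = V.shape[1]
    return float(np.sum((V.T @ W) ** 2) / n)

# Two synthetic "runs" that share three dominant directions of motion.
rng = np.random.default_rng(1)
frames, atoms = 400, 10
scales = np.ones(atoms * 3)
scales[:3] = [10.0, 6.0, 4.0]
run_a = (rng.normal(size=(frames, atoms * 3)) * scales).reshape(frames, atoms, 3)
run_b = (rng.normal(size=(frames, atoms * 3)) * scales).reshape(frames, atoms, 3)
_, modes_a = pca_modes(run_a, 3)
_, modes_b = pca_modes(run_b, 3)
print(f"overlap of the 3 dominant modes: {subspace_overlap(modes_a, modes_b):.2f}")
```

The overlap is invariant to rotations within each subspace, so it compares the space of dominant motions even when individual modes mix or swap ranks between runs.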
Fig. 2 (a) Schematic representation of Fg near a solid surface. The simulated part of the protein
is colored in black. (b), (c) Close-up views of the D and E regions, respectively, where the
vectors used to characterize the orientation of the regions with respect to the surface are indicated with
red arrows. See main text for a detailed description (Reprinted with permission from Köhler et al.
[60]. Copyright 2015 American Chemical Society)
In the adsorption simulations, the orientations of the D and E regions are
investigated separately. To characterize the different adsorption orientations, we
defined a tilt angle describing the tilting of a relevant axis of the region with respect
to the surface, and a rotation angle describing the rotation around that axis
(Fig. 2b, c). Both angles are defined separately for the D and E regions. A contact
between the protein and the solid surface is counted when a heavy protein atom
comes closer than 0.5 nm to the surface. If such a contact is formed in the globular
D and E regions and persists for longer than 1 ns, we call this an adsorbed state. A
contact between a given residue and the solid surface is called persistent if it forms
in all sets of simulations, independent of the starting configuration. As a reference
for an unbiased adsorption process, we also measured the fraction of heavy atoms on
the protein surface contributed by each residue type: charged residues contributed
52 % of the surface atoms, polar uncharged residues 33 %, hydrophobic residues 11.5 % and the
carbohydrate groups 3.5 %. Here, protein atoms were considered to be on the
protein surface if they were within 0.2 nm of water atoms. To detect spreading of
the globular regions of the protein during an adsorption event, we monitored the
"domain height", which we define as the z-component of the distance between the
center of mass and the protein atom closest to the surface of the material.
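The contact and adsorption criteria above translate into a few lines of Python. This is a hedged sketch rather than the analysis code behind the paper: it assumes that the per-frame minimum heavy-atom distance to the surface is already available for each globular region, that frames are spaced by dt, and that the surface is a plane parallel to xy lying below the protein. The function names and the example distance series are hypothetical.

```python
import numpy as np

CONTACT_CUTOFF = 0.5  # nm: heavy atom closer than this to the surface = contact
MIN_LIFETIME = 1.0    # ns: a contact must persist longer than this to count as adsorbed

def adsorbed_episodes(min_dist: np.ndarray, dt: float) -> list:
    """(start, end) times in ns of uninterrupted contact episodes that last
    longer than MIN_LIFETIME.

    min_dist: per-frame minimum heavy-atom/surface distance (nm) of one
    globular region; dt: time between frames (ns)."""
    in_contact = list(min_dist < CONTACT_CUTOFF) + [False]  # sentinel closes a trailing episode
    episodes, start = [], None
    for i, contact in enumerate(in_contact):
        if contact and start is None:
            start = i
        elif not contact and start is not None:
            if (i - start) * dt > MIN_LIFETIME:
                episodes.append((start * dt, i * dt))
            start = None
    return episodes

def domain_height(coords: np.ndarray, masses: np.ndarray) -> float:
    """z-distance between a region's center of mass and its atom closest to a
    planar surface parallel to xy below the protein (i.e. its lowest atom)."""
    com_z = np.average(coords[:, 2], weights=masses)
    return float(com_z - coords[:, 2].min())

dt = 0.1  # ns per frame (hypothetical output interval)
dist = np.array([0.9] * 5 + [0.4] * 20 + [0.8] * 5 + [0.3] * 5 + [0.7] * 5)
print(adsorbed_episodes(dist, dt))  # the 0.5 ns contact at frames 30-34 is too short to count
```

A decrease of the domain height during an adsorbed episode would then signal spreading (flattening) of the region on the surface.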
The simulations have been carried out in part on Hornet/Hazel Hen at the
High Performance Computing Center Stuttgart. The simulations were carried out
using NAMD [38], which is parallelized using MPI and can be specifically
compiled for the XC40 architecture. The large classical MD systems studied here
are particularly well suited to the Cray XC40 architecture: thanks to its high-performance
interconnect, they scale well up to 4000 cores. A typical job for these
calculations used 50–100 nodes and lasted less than 3 h, collecting
about 1 ns of trajectory per job, depending on system size.
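For orientation, the job sizes quoted above can be converted into a rough cost per nanosecond of trajectory. The sketch assumes 24 cores per Cray XC40 node (two 12-core processors, as we understand the Hornet/Hazel Hen configuration) and treats 3 h and 1 ns per job as round figures, so the results are order-of-magnitude estimates only.

```python
CORES_PER_NODE = 24  # assumption: two 12-core CPUs per Cray XC40 node

def core_hours_per_ns(nodes: int, wall_hours: float, ns_per_job: float) -> float:
    """Core-hours spent per ns of trajectory for one job."""
    return nodes * CORES_PER_NODE * wall_hours / ns_per_job

for nodes in (50, 100):
    print(f"{nodes} nodes = {nodes * CORES_PER_NODE} cores, "
          f"up to about {core_hours_per_ns(nodes, 3.0, 1.0):.0f} core-hours per ns")

# Node count that corresponds to the ~4000-core scaling limit quoted above.
print(f"4000 cores is about {4000 / CORES_PER_NODE:.0f} nodes")
```

Under this assumption, the typical 50–100 node jobs (1200–2400 cores) stay well below the quoted scaling limit.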
Fig. 3 Characterization of the large bending motions of fibrinogen. (a)–(c) Dominant PCA
modes of the Fg protomer with the hinge region highlighted in yellow (chains colored according to
Fig. 1). For each PCA mode, the two structures with the largest (solid) and smallest (transparent)
projection along the PCA mode are represented. An illustration of the bending angle and the
torsion angle φ is superimposed on the first PCA mode. The three groups of atoms used to define
the bending angle are the E region (α50–58, β82–90, γ23–31), the hinge region (α99–110, β130–155,
pure torsion of the coiled coil along its axis (not shown). The motions are reversible,
as shown by the time series of the PCA projections (Fig. 3d). Lower-ranking PCA
modes contribute less to the overall variance and are not
analyzed further.
The program DynDom [50], applied to the extremal structures observed along
the first PCA mode (Fig. 3a) of the Fg protomer, was used to identify the
regions of the molecule that are more rigid in our simulations, as well as the
connecting hinge regions. DynDom reports two relatively rigid
regions separated by a hinge. The E region and the N-terminal part of the coiled-coil
region form one of the two rigid domains, while the C-terminal part of the
coiled-coil region together with the D region forms the second. The hinge region is
located approximately in the middle of the coiled-coil region and includes the break
in the α-helical structure of the γ chain, which gives rise to a flexible loop (residues
γ70–78), along with the neighboring residues on the Aα and Bβ chains (Fig. 3a–c). The
break in the α-helical structure of the γ chain is facilitated by two proline residues.
The bending around the identified hinge can be described by a bending angle
and a torsion angle φ, defined using groups of atoms from the E region, the hinge
region and the D region (Fig. 3a). Both angles strongly correlate with the
projections along the dominant PCA modes. Our simulation data show consistent
and significant bending at the hinge region, reaching bending angles
below 90° (Fig. 3e). The time it takes for the Fg structure to reach a bending angle
below 110° from conformations similar to the crystal structure (bending angle above
150°) is 19 ± 1 ns along the trajectories, averaged over the 12 observed events (see
Fig. 3d). The reverse process occurs twice in the simulations, taking 20 and 26 ns.
The simulations of the full Fg dimer do not show significant correlations between
the angle values observed at the two hinges.
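A minimal sketch of how the bending angle and the bending times could be computed. We assume the bending angle is the angle at the centroid of the hinge group between the vectors pointing to the E-region and D-region centroids (180° for a straight molecule), and that the bending time is a first-passage time measured from the first frame above 150° to the next frame below 110°. The paper does not spell out either convention, the torsion angle (which needs an additional reference direction) is not reproduced, and the inputs are hypothetical.

```python
import numpy as np

def bending_angle(e_region: np.ndarray, hinge: np.ndarray, d_region: np.ndarray) -> float:
    """Angle (degrees) at the hinge centroid between the vectors pointing to the
    E-region and D-region centroids; 180 degrees = straight coiled coil."""
    c_e, c_h, c_d = (g.mean(axis=0) for g in (e_region, hinge, d_region))
    u, v = c_e - c_h, c_d - c_h
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def bending_times(theta, dt: float, straight: float = 150.0, bent: float = 110.0) -> list:
    """First-passage times (ns): the clock starts at the first frame with
    theta > straight and stops at the next frame with theta < bent."""
    times, start = [], None
    for i, angle in enumerate(theta):
        if start is None and angle > straight:
            start = i
        elif start is not None and angle < bent:
            times.append((i - start) * dt)
            start = None
    return times

# Hypothetical group coordinates: straight and right-angled arrangements.
e = np.array([[-1.0, 0.0, 0.0]])
h = np.array([[0.0, 0.0, 0.0]])
print(bending_angle(e, h, np.array([[1.0, 0.0, 0.0]])),
      bending_angle(e, h, np.array([[0.0, 1.0, 0.0]])))
print(bending_times([170, 160, 140, 120, 100, 130, 155, 105], dt=1.0))
```

Measuring instead from the last frame above 150° would give shorter, transition-path-like times, so the convention matters when comparing with the 19 ± 1 ns quoted above.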
Comparison of the crystallographic structures of Fg coiled-coil regions from
various organisms already hinted at the presence of a flexible hinge [5]. This
hypothesis is also supported by hydrogen-deuterium exchange experiments [52].
The latter are in good agreement with our simulations: amino acids from the coiled-coil
region with lower helical probability in the simulations (Fig. 4b) correspond to
amino acids with low protection factors in the experiments. The hinge is positioned
on the non-helical segment of the γ chain (γ70–78), most probably because of the
resulting reduction in the stiffness of the coiled coil. This segment is also non-helical
in the other crystallized Fg structures [14, 15]. In addition, this segment
has marked helix-breaking features in most of the available Fg sequences from
vertebrates that we have analyzed, showing a high density of proline and glycine
residues as well as a high probability of being a disordered/hot loop, as revealed by the
program DisEMBL [53] (Fig. 4a). This analysis supports the idea that the non-helical
segment of the γ chain provides a function that is strongly conserved across
vertebrates, possibly linked to the bending motion of the coiled-coil region. Besides
providing flexibility to the individual Fg molecules as well as to the fibrin fibers [5],
the bending at the hinge may help expose the plasmin cleavage sites located nearby
on the coiled-coil region [13]. Our simulations support this hypothesis,
showing that the α-helical structure around the plasmin cleavage sites Aα104–105
and Bβ133–134 is partly disrupted by the bending motions, and that the solvent exposure
of the involved peptide bonds increases (Fig. 4c, d). The twisting of fibrin
fibers [54] compresses molecules in the center of the fiber and stretches them on the
perimeter. If the hinge bending is necessary to accommodate such deformation, it is
reasonable to believe that the bending motions at the hinge may actually be reduced
by tension applied along the fiber axis. Thus, fibrinolysis assisted by the bending
motions at the hinge may selectively take place on fibrin molecules subject to
reduced tension. This hypothesis is supported by experimental evidence indicating
reduced plasmin fibrinolytic effectiveness on fibrin fibers subject to mechanical
tension [55].
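The correlation between bending and helicity summarized in Fig. 4c can be estimated from per-frame secondary-structure assignments. In the sketch below, the secondary-structure strings are assumed to come from a DSSP-like tool with 'H' denoting α-helix, the six-residue window mirrors the three residues on either side of a cleavage site, and all input values are hypothetical.

```python
import numpy as np

def helical_fraction(ss_per_frame, residue_idx) -> np.ndarray:
    """Per-frame fraction of the selected residues in an alpha-helical state.

    ss_per_frame: one secondary-structure string per frame, one character per
    residue, with 'H' marking alpha helix (DSSP-style code, assumed)."""
    return np.array([np.mean([ss[i] == "H" for i in residue_idx]) for ss in ss_per_frame])

def joint_distribution(theta, helix_frac, theta_bins: int = 18, frac_bins: int = 7):
    """Normalized 2D histogram P(bending angle, helical fraction)."""
    hist, theta_edges, frac_edges = np.histogram2d(
        theta, helix_frac, bins=[theta_bins, frac_bins],
        range=[[0.0, 180.0], [0.0, 1.0]])
    return hist / hist.sum(), theta_edges, frac_edges

# Hypothetical 6-residue window (three residues on either side of a cleavage site).
frames = ["HHHHHH", "HHHCCH", "HCCCCH"]
theta = [165.0, 120.0, 85.0]
frac = helical_fraction(frames, range(6))
prob, _, _ = joint_distribution(theta, frac)
print(frac, np.corrcoef(theta, frac)[0, 1])
```

With seven fraction bins, each of the seven possible helical fractions of a six-residue window (0, 1/6, ..., 1) falls into its own bin.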
Fig. 4 Functional role of the bending motions in the coiled-coil region of fibrinogen. (a)
DisEMBL "hot coil" predictions for the sequences of the γ chain from several vertebrates,
highlighting that the flexibility of the non-helical loop is a conserved feature. The hinge
region is shaded and, within that region, the non-helical loop segment is dark shaded. The inset
legend reports the sequence alignment of the non-helical loop region across the same vertebrates,
highlighting the content of glycine and proline residues. (b) Cartoon representation of the
coiled-coil region of Fg colored according to the fraction of the simulation time spent in an α-helical
conformation (red = 0, green = 0.85, blue = 1). The N-termini of the segments are on the left. The
regions with lower helical fraction are in good agreement with regions with lower protection factors
as determined in H/D exchange experiments [52]. (c) Probability distribution of the fraction of
helical residues around the Aα104–105 and Bβ133–134 plasmin cleavage sites as a function of
the bending angle. Dark shades correspond to high probability. The three residues preceding
and following the cleavage sites (i.e., Aα102–107 and Bβ131–136) have been included in the
calculation of the helicity. Larger bending (lower bending angle) correlates with lower helical content. (d)
Snapshot of the conformation of the bent coiled-coil region (chains colored as in Fig. 1) showing
the disrupted secondary structure around the plasmin cleavage sites (rendered yellow and cyan
inside the dashed circle) (Reprinted from Ref. [6]. Copyright (2015) Köhler et al. under the terms
of the Creative Commons Attribution License)
Mica The dominant large-scale motions of Fg on the mica surface (as identified
using PCA) are bending motions at the hinge, which closely resemble those
previously described for Fg in Sect. 3. A large essential dynamics (ED) overlap
(0.69–0.71) is observed between the three largest PCA modes measured in the
different sets of simulations. The overlap with the previously reported solution simulations
is also high (0.63). Furthermore, the sampled distribution of hinge conformations
(Fig. 5a), as well as the observed bending time of (16 ± 6) ns, are in reasonable
agreement with the corresponding results in solution. In some instances, hinge
bending coupled with protein-surface contact formation can lead to sliding and
rolling motions on the surface. A rolling motion has been observed in a previous
simulation of the γC domain [29] of Fg. To our knowledge, the sliding motion has
not been observed previously in this system.
Fig. 5 (a) Distribution of the hinge bending and dihedral angles at a mica surface. The definition
of the angles is shown in the inset. (b) Maximally bent state of Fg at the mica surface (highlighted
in (a)). A collision of the D and E regions prevents further bending at the hinge. The Aα, Bβ and
γ chains of the whole protomer are rendered in blue, red and green, respectively, and the carbohydrates
in orange. For clarity, the solvent is not shown (Reprinted with permission from Köhler et al. [60].
Copyright 2015 American Chemical Society)
In simulations on mica, the total number of contacts formed with the surface
reaches a plateau at 15 contacts (average over the simulations) after about
30 ns (Fig. 6a). The fraction of contacts formed by the different types of residues
essentially matches the one expected from the surface distribution of residues,
the only exception being that positively charged residues contribute more than
expected, while negatively charged and polar residues contribute slightly less
than expected. This is explained by the negative charge of the mica surface.
Another significant feature of the simulations is the observation of frequent
desorption and reorientation events. On average, a globular region was adsorbed
for only (14 ± 3) ns before leaving the surface again.
We observed a total of 51 adsorption events in the D region and 45 in the E region.
The great flexibility provided by the hinge allows us to treat these events separately.
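The comparison between observed contact fractions and the surface composition can be expressed as an enrichment ratio (observed fraction divided by expected fraction). The expected fractions below are the values quoted in the methods section; the contact counts are hypothetical, and splitting the charged class into positive and negative residues, as discussed above, would require surface fractions that are not given in the text.

```python
# Fractions of surface heavy atoms per residue class, as quoted in the methods.
EXPECTED = {"charged": 0.52, "polar": 0.33, "hydrophobic": 0.115, "carbohydrate": 0.035}

def enrichment(contact_counts: dict) -> dict:
    """Observed/expected contact fraction per class; > 1 = over-represented."""
    total = sum(contact_counts.values())
    return {cls: (contact_counts[cls] / total) / EXPECTED[cls] for cls in EXPECTED}

# Hypothetical contact counts, for illustration only.
counts = {"charged": 600, "polar": 280, "hydrophobic": 100, "carbohydrate": 20}
for cls, ratio in enrichment(counts).items():
    print(f"{cls:>12s}: {ratio:.2f}")
```

A ratio close to 1 for every class corresponds to the unbiased reference described in the methods; deviations indicate surface-specific selectivity, such as the preference of the negatively charged mica for positive residues.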
Fig. 6 Average total number of contacts and fraction contributed by the different types of residues
during adsorption on (a) mica and (b) graphite. The straight horizontal darkened lines represent the
expected fractions according to the exposed surface area in the crystal structure (after equilibration)
(Reprinted with permission from Köhler et al. [60]. Copyright 2015 American Chemical Society)
Similarly, the adsorption orientations of the D- and E-regions can be analyzed
separately. Several different adsorption orientations have been identified for both
globular regions by dividing the space of the adsorption angles into a small number
of adsorption orientation states (Fig. 7a, b). For all simulated systems, the adsorption
orientation states of the D- and E-regions overlap significantly (Table 3), although
a bias towards the initial orientation state is clearly visible. These data clearly
indicate that transitions from one orientation state to another occur frequently. We
observed 68 reorientation events (changes in the adsorption orientation state) for the
D-region and 22 for the E-region. Neglecting trajectories where the D- (E-) region
never contacted the surface, this gives an average time of 27 ns (14 ns) between
reorientation events.
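The bookkeeping behind Table 3 and the reorientation counts can be sketched as follows. The state boundaries in classify_e are hypothetical placeholders for the partition shown in Fig. 7b, the angle series are invented, and frames from different adsorption episodes are simply concatenated; how transitions across desorption events were treated is not stated in the text. The average time between events is computed as total adsorbed time divided by the number of state changes, which is our reading of the text.

```python
from collections import Counter

def classify_e(tilt: float, rotation: float) -> str:
    """Hypothetical partition of the (tilt, rotation) angle space into three
    E-region states; the real state boundaries follow Fig. 7b."""
    if tilt < 45.0:
        return "E1"
    return "E2" if rotation < 180.0 else "E3"

def state_series(tilt, rotation, classify) -> list:
    """Discrete orientation state for every adsorbed frame."""
    return [classify(t, r) for t, r in zip(tilt, rotation)]

def time_fractions(states: list) -> dict:
    """Fraction of the (adsorbed) time spent in each state, as in Table 3."""
    counts = Counter(states)
    return {s: c / len(states) for s, c in sorted(counts.items())}

def reorientations(states: list) -> int:
    """Number of changes of the adsorption orientation state."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)

def mean_time_between(states: list, dt: float) -> float:
    """Total adsorbed time divided by the number of reorientation events."""
    n = reorientations(states)
    return len(states) * dt / n if n else float("inf")

tilt = [10, 20, 60, 70, 80, 30]
rotation = [0, 0, 90, 200, 200, 0]
states = state_series(tilt, rotation, classify_e)
print(states, time_fractions(states), reorientations(states))
```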
More specifically, the E region shows three distinct adsorption orientation states.
The orientation E1 is significantly populated in all sets of simulations. Simulations
starting in this orientation never leave it, while simulations starting with the other
orientations often reorient towards E1. These data support the idea that E1 is a
preferred adsorption orientation. In the E1 orientation the flexible Fp tethers point
toward the surface. The presence of many charged residues in this region likely
explains the preference for this adsorption orientation.
In simulations starting from orientation E1 (Mica/240), 60 % of the Fg-surface
contacts are provided by the E region. In Mica/0 and Mica/120 simulations, these
numbers are significantly lower (30 % and 19 %), further supporting the conclusion
that E1 is indeed the preferred adsorption state.
Fig. 7 (a), (b) Distribution of the adsorption angles for the D- and E-region, respectively, and
schematic representation of the corresponding adsorption orientation states. Chains are color
coded as in Fig. 5. The orange patch between the βC and γC domains indicates the binding cleft,
while the purple region identifies the P2 and H12 binding sites. In orientation D4 the P2 and H12
binding sites face away from the surface and are available for binding. (c) The residues α27–28,
α38, α92, β345–348, β361–363, β365, γ361 and carbohydrates 479–480
from the first whole protomer and α27–30, α37–38, α63–65, β91 and γ38–40 from the second
truncated protomer form persistent contacts on mica regardless of initial orientation (red licorice).
The carbohydrate cluster (glycans) attached to the βC domain is shown in grey licorice, the P1
site is rendered in orange and the P2 and H12 sites in purple. (d) Example snapshot of a pair
of oppositely charged amino acids (lysine in blue, aspartic acid in green) anchoring the E region
in an Fp-down orientation. The aspartic acid interacts with a sodium ion (gray) that has replaced
potassium (pink) from the counter ion layer. Hydrogen bonds between the lysine and the silicate
ring are indicated by the black dotted lines. For clarity, only the topmost atoms of the mica surface
are shown (Reprinted with permission from Köhler et al. [60]. Copyright 2015 American Chemical
Society)
Table 3 Fraction of the time spent in each of the adsorption orientation states defined in
Fig. 7a, b. States containing the initial orientation are in bold
System       D1     D2     D3     D4     E1     E2     E3
Mica, 0      33 %   28 %    5 %   34 %   46 %   44 %   10 %
Mica, 120     0 %   60 %   37 %    3 %   13 %   28 %   59 %
Mica, 240     8 %    0 %   42 %   50 %  100 %    0 %    0 %
All          16 %   32 %   25 %   27 %   56 %   21 %   23 %
The D-region shows several populated adsorption orientation states. In this
case, a preference for the orientation D4 is detectable. D4 is observed in all sets
of simulations, although no set of simulations starts from it. This orientation
Fig. 8 (a) The adsorbed conformation of Fg on graphite shows a noticeable flattening of the
domains. Coloring as in Fig. 5b. Histograms of the change in domain height for (b) mica and (c)
graphite (Reprinted with permission from Köhler et al. [60]. Copyright 2015 American Chemical
Society)
5 Conclusions
Acknowledgements The authors thank Prof. H. Heinz for providing the structure of the mica
surface and for helpful discussions. SK gratefully acknowledges financial support from the
Graduate School Materials Science in Mainz. GS gratefully acknowledges financial support from
the Max-Planck Graduate Center with the University of Mainz. We gratefully acknowledge
support with computing time from the HPC facility Mogon at the University of Mainz, the Jülich
Supercomputing Center and the High Performance Computing Center Stuttgart. This work was
partially supported by the German Science Foundation within SFB 1066 (project Q1).
20. Agnihotri, A., Siedlecki, C.A.: Langmuir 20(20), 8846 (2004). doi:10.1021/la049239+.
21. Heinz, H.: J. Comput. Chem. 31(7), 1564 (2010). doi:10.1002/jcc.21421.
22. Starzyk, A., Cieplak, M.: J. Chem. Phys. 139(4), 045102 (2013). doi:10.1063/1.4813854.
23. Kubiak-Ossowska, K., Burley, G., Patwardhan, S.V., Mulheran, P.A.: J. Phys. Chem. B
117(47), 14666 (2013). doi:10.1021/jp409130s. http://dx.doi.org/10.1021/jp409130s
24. Raffaini, G., Ganazzoli, F.: Langmuir 19(8), 3403 (2003). doi:10.1021/la026853h.
25. Utesch, T., Daminelli, G., Mroginski, M.A.: Langmuir 27(21), 13144 (2011).
doi:10.1021/la202489w. http://pubs.acs.org/doi/abs/10.1021/la202489w
26. Kang, S.G., Huynh, T., Xia, Z., Zhang, Y., Fang, H., Wei, G., Zhou, R.: J. Am. Chem. Soc.
135(8), 3150 (2013). doi:10.1021/ja310989u. http://pubs.acs.org/doi/abs/10.1021/ja310989u
27. Baweja, L., Balamurugan, K., Subramanian, V., Dhawan, A.: Langmuir 29(46), 14230 (2013).
doi:10.1021/la4033805. http://dx.doi.org/10.1021/la4033805
28. Chong, Y., Ge, C., Yang, Z., Garate, J.A., Gu, Z., Weber, J.K., Liu, J., Zhou, R.: ACS Nano
9(6), 5713 (2015). doi:10.1021/nn5066606. http://dx.doi.org/10.1021/nn5066606
29. Agashe, M., Raut, V., Stuart, S.J., Latour, R.A.: Langmuir 21(3), 1103 (2005).
doi:10.1021/la0478346. http://pubs.acs.org/doi/abs/10.1021/la0478346
30. Lim, B.B., Lee, E.H., Sotomayor, M., Schulten, K.: Structure 16(3), 449 (2008). doi:10.1016/
j.str.2007.12.019. http://www.sciencedirect.com/science/article/pii/S0969212608000476
31. Zhmurov, A., Brown, A.E., Litvinov, R.I., Dima, R.I., Weisel, J.W., Barsegov, V.: Structure
19(11), 1615 (2011). doi:10.1016/j.str.2011.08.013
32. Adamczyk, Z., Barbasz, J., Cieśla, M.: Langmuir 26(14), 11934 (2010).
doi:10.1021/la101261f. http://pubs.acs.org/doi/abs/10.1021/la101261f
33. Adamczyk, Z., Barbasz, J., Cieśla, M.: Langmuir 27(11), 6868 (2011). doi:10.1021/la200798d.
34. Vilaseca, P., Dawson, K.A., Franzese, G.: Soft Matter 9, 6978 (2013). doi:10.1039/
C3SM50220A. http://dx.doi.org/10.1039/C3SM50220A
35. Rocco, M., Molteni, M., Ponassi, M., Giachi, G., Frediani, M., Koutsioubas, A., Profumo, A.,
Trevarin, D., Cardinali, B., Vachette, P., Ferri, F., Pérez, J.: J. Am. Chem. Soc. 136(14), 5376
(2014). doi:10.1021/ja5002955. http://dx.doi.org/10.1021/ja5002955
36. Jorgensen, W.L., Chandrasekhar, J., Madura, J.D., Impey, R.W., Klein, M.L.: J. Chem. Phys.
79(2), 926 (1983). doi:10.1063/1.445869. http://link.aip.org/link/?JCP/79/926/1
37. Humphrey, W., Dalke, A., Schulten, K.: J. Mol. Graph. 14, 33 (1996)
38. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Villa, E., Chipot, C., Skeel, R.D., Kale, L.,
Schulten, K.: J. Comput. Chem. 26, 1781 (2005)
39. Martyna, G.J., Tobias, D.J., Klein, M.L.: J. Chem. Phys. 101(5), 4177 (1994).
doi:10.1063/1.467468. http://link.aip.org/link/?JCP/101/4177/1
40. Feller, S.E., Zhang, Y., Pastor, R.W., Brooks, B.R.: J. Chem. Phys. 103(11), 4613 (1995).
doi:10.1063/1.470648. http://link.aip.org/link/?JCP/103/4613/1
41. Mackerell, A.D., Feig, M., Brooks, C.L.: J. Comput. Chem. 25(11), 1400 (2004).
doi:10.1002/jcc.20065. http://dx.doi.org/10.1002/jcc.20065
42. Guvench, O., Mallajosyula, S.S., Raman, E.P., Hatcher, E., Vanommeslaeghe, K.,
Foster, T.J., Jamison, F.W., MacKerell, A.D.: J. Chem. Theory Comput. 7(10), 3162 (2011).
doi:10.1021/ct200328p. http://pubs.acs.org/doi/abs/10.1021/ct200328p
43. Vanommeslaeghe, K., Hatcher, E., Acharya, C., Kundu, S., Zhong, S., Shim, J., Darian, E.,
Guvench, O., Lopes, P., Vorobyov, I., Mackerell, A.D.: J. Comput. Chem. 31(4), 671 (2010).
doi:10.1002/jcc.21367. http://dx.doi.org/10.1002/jcc.21367
44. Heinz, H., Koerner, H., Anderson, K.L., Vaia, R.A., Farmer, B.L.: Chem. Mater. 17(23), 5658
(2005). doi:10.1021/cm0509328. http://pubs.acs.org/doi/abs/10.1021/cm0509328
45. Bertran, O., Curcó, D., Zanuy, D., Alemán, C.: Faraday Discuss. 166, 59 (2013)
46. Maity, S., Zanuy, D., Razvag, Y., Das, P., Alemán, C., Reches, M.: Phys. Chem. Chem. Phys.
17(23), 15305 (2015). doi:10.1039/c5cp00088b. http://dx.doi.org/10.1039/c5cp00088b
47. Kitao, A., Hirata, F., Go, N.: Chem. Phys. 158, 447 (1991). doi:10.1016/0301-0104(91)87082-7.
http://www.sciencedirect.com/science/article/pii/0301010491870827
48. Seeber, M., Cecchini, M., Rao, F., Settanni, G., Caflisch, A.: Bioinformatics 23(19), 2625 (2007)
49. Spoel, D.V.D., Lindahl, E., Hess, B., Groenhof, G., Mark, A.E., Berendsen, H.J.C.: J. Comput.
Chem. 26(16), 1701 (2005). doi:10.1002/jcc.20291. http://dx.doi.org/10.1002/jcc.20291
50. Poornam, G.P., Matsumoto, A., Ishida, H., Hayward, S.: Proteins: Struct. Funct. Bioinf. 76(1),
201 (2009). doi:10.1002/prot.22339. http://dx.doi.org/10.1002/prot.22339
51. Hess, B.: Phys. Rev. E 62, 8438 (2000). doi:10.1103/PhysRevE.62.8438.
52. Marsh, J.J., Guan, H.S., Li, S., Chiles, P.G., Tran, D., Morris, T.A.: Biochemistry 52(32), 5491
(2013). doi:10.1021/bi4007995. http://pubs.acs.org/doi/abs/10.1021/bi4007995
53. Linding, R., Jensen, L.J., Diella, F., Bork, P., Gibson, T.J., Russell, R.B.: Structure 11(11), 1453
(2003). doi:10.1016/j.str.2003.10.002.
54. Weisel, J.W., Nagaswami, C., Makowski, L.: Proc. Natl. Acad. Sci. U. S. A. 84(24), 8991 (1987)
55. Varjú, I., Sótonyi, P., Machovich, R., Szabó, L., Tenekedjiev, K., Silva, M.M.C.G., Longstaff,
C., Kolev, K.: J. Thromb. Haemost. 9(5), 979 (2011). doi:10.1111/j.1538-7836.2011.04203.x.
56. Yermolenko, I.S., Fuhrmann, A., Magonov, S.N., Lishko, V.K., Oshkadyerov, S.P., Ros, R.,
Ugarova, T.P.: Langmuir 26(22), 17269 (2010). doi:10.1021/la101791r.
57. Podolnikova, N.P., Yermolenko, I.S., Fuhrmann, A., Lishko, V.K., Magonov, S., Bowen,
B., Enderlein, J., Podolnikov, A.V., Ros, R., Ugarova, T.P.: Biochemistry 49(1), 68 (2010).
doi:10.1021/bi9016022. http://pubs.acs.org/doi/abs/10.1021/bi9016022
58. Patwardhan, S.V., Emami, F.S., Berry, R.J., Jones, S.E., Naik, R.R., Deschaume, O., Heinz, H.,
Perry, C.C.: J. Am. Chem. Soc. 134(14), 6244 (2012). doi:10.1021/ja211307u.
59. Sivaraman, B., Latour, R.A.: Biomaterials 31(5), 832 (2010). doi:10.1016/
j.biomaterials.2009.10.008. http://dx.doi.org/10.1016/j.biomaterials.2009.10.008
60. Köhler, S., Schmid, F., Settanni, G.: Langmuir 31(48), 13180–13190 (2015).
doi:10.1021/acs.langmuir.5b03371. PMID: 26569042.
Vorticity, Variance, and the Vigor of Many-Body
Phenomena in Ultracold Quantum Systems:
Abstract During the past year of the MCTDHB project at the HLRS, we continued
to pursue further applications, developments, and extensions of the
MultiConfigurational Time-Dependent Hartree for Bosons (MCTDHB) method in
the context of ultracold atomic systems. We also announce the MCTDH-X package,
the Multiconfigurational Time-Dependent Hartree for Indistinguishable Particles
X package, which is able to treat identical bosons and fermions alike, with or without
spin/internal degrees of freedom. Here we report on a plethora of results
and versatile applications, which include: (i) single-shot imaging of fluctuating
vortices in a fragmented Bose-Einstein condensate (BEC); (ii) the many-body
O.E. Alon
Department of Physics, University of Haifa at Oranim, 36006, Tivon, Israel
R. Beinke • L.S. Cederbaum • S. Klaiman • A.I. Streltsov
Theoretische Chemie, Physikalisch-Chemisches Institut, Universität Heidelberg,
Im Neuenheimer Feld 229, D-69120, Heidelberg, Germany
e-mail: Alexej.Streltsov@pci.uni-heidelberg.de
M.J. Edmonds • N.G. Parker
Joint Quantum Centre (JQC) Durham-Newcastle, School of Mathematics and Statistics,
Newcastle University, NE1 7RU, Newcastle upon Tyne, England, UK
E. Fasshauer
Department of Chemistry, University of Tromsø – The Arctic University of Norway,
Centre for Theoretical and Computational Chemistry, N-9037, Tromsø, Norway
M.A. Kasevich
Department of Physics, Stanford University, 94305, Stanford, CA, USA
A.U.J. Lode
Department of Physics, University of Basel, Klingelbergstrasse 82, CH-4056, Basel, Switzerland
K. Sakmann
Vienna Center for Quantum Science and Technology, Atominstitut TU Wien, Stadionallee 2,
1020, Vienna, Austria
M.C. Tsatsos
Instituto de Física de São Carlos, Universidade de São Paulo, Caixa Postal 369, 13560-970,
São Carlos, São Paulo, Brazil
1 Introductory Remarks
In the field of many-body quantum physics, the theoretically most easily accessible
quantities are low-order correlation functions, such as the single-particle density
and the single-particle momentum distribution, as well as the respective two-body
correlation functions.
Unlike many other subfields of physics, the field of ultracold quantum gases offers
the rare possibility of investigating high-order correlation functions of many-body
quantum systems, essentially at no cost. Usually, an absorption image of the atomic
cloud is taken at the end of an experimental run, which measures the positions of the
particles. According to the postulates of quantum mechanics, the positions $\mathbf{r}_1, \ldots, \mathbf{r}_N$
where, e.g., $P(\mathbf{r}_2|\mathbf{r}_1)$ denotes the conditional probability of finding a particle at $\mathbf{r}_2$,
given that another one is at $\mathbf{r}_1$. Using Eq. (2), the authors have recently developed an
algorithm that allows the simulation of single shots from arbitrary many-body
wave functions [16]. This generalizes previous work in this direction for special
cases [24–27]. A powerful method for obtaining highly accurate many-body wave
functions of dynamic many-boson systems is the MCTDHB method [2, 4, 9, 28],
which we use here as a tool to obtain the wave function of a rotating condensate. In
the following we review some of the results on fluctuating many-body vortices; see
[16] for details.
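The factorization referred to as Eq. (2) is not reproduced in this excerpt; judging from the conditional probabilities mentioned above, we assume it is the chain rule P(r_1, ..., r_N) = P(r_1) P(r_2|r_1) ... P(r_N|r_1, ..., r_{N-1}). The sketch below draws single shots this way from an explicitly tabulated, discretized joint distribution. This is feasible only for tiny toy systems and is not the algorithm of [16], which works with MCTDHB wave functions; the four-site example distribution is hypothetical.

```python
import numpy as np

def sample_single_shot(joint: np.ndarray, rng: np.random.Generator) -> tuple:
    """Draw one single shot (x_1, ..., x_N) from a discretized joint distribution
    P(x_1, ..., x_N), given as an N-dimensional array summing to 1, using
    P = P(x_1) P(x_2|x_1) ... P(x_N|x_1, ..., x_{N-1})."""
    shot, cond = [], joint
    for _ in range(joint.ndim):
        # Distribution of the next coordinate given the ones already drawn:
        # sum out all remaining coordinates and renormalize.
        marginal = cond.reshape(cond.shape[0], -1).sum(axis=1)
        marginal = marginal / marginal.sum()
        x = int(rng.choice(len(marginal), p=marginal))
        shot.append(x)
        cond = cond[x]  # condition on the value just drawn
    return tuple(shot)

# Toy two-particle distribution on a 4-site grid: both particles always sit on
# the same site, either site 0 or site 3 (perfectly correlated positions).
joint = np.zeros((4, 4))
joint[0, 0] = joint[3, 3] = 0.5
rng = np.random.default_rng(0)
shots = [sample_single_shot(joint, rng) for _ in range(1000)]
density = np.bincount([x for shot in shots for x in shot], minlength=4) / (2 * len(shots))
print(density)  # single-particle density averaged over all shots
```

Every individual shot shows both particles on the same site, while the averaged density is spread over sites 0 and 3: the single-particle density alone does not reveal the correlations that single shots expose.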
Quantized vortices are a hallmark of Gross-Pitaevskii mean-field theory and
typically display a density node [29]. Their appearance is related to a critical
rotation velocity. It was recently discovered that stirring a BEC can lead to many-body
vortices below the mean-field critical velocity [27, 30, 31]. In contrast to mean-field
vortices, the single-particle density of many-body vortices has a finite, nonzero value
at the vortex core. However, it is important to distinguish between the single-particle
density $\rho(\mathbf{r}) = N P(\mathbf{r})$, which is the average over many single shots, and the single shots
themselves, which are random deviates of $P(\mathbf{r}_1, \ldots, \mathbf{r}_N)$.
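A toy calculation illustrates why the averaged density can have a finite value at the vortex core even though every single shot has a node. It is not a many-body calculation: each "shot" is simply a Gaussian cloud with one vortex whose core position is drawn at random, and the spread of 0.5 for the core positions is a hypothetical choice.

```python
import numpy as np

def vortex_shot(x, y, x0, y0):
    """Toy single-shot density: one singly quantized vortex at (x0, y0) in a
    Gaussian cloud, |(x - x0) + i (y - y0)|^2 exp(-(x^2 + y^2)).
    It vanishes exactly at the vortex core."""
    return ((x - x0) ** 2 + (y - y0) ** 2) * np.exp(-(x ** 2 + y ** 2))

grid = np.linspace(-3.0, 3.0, 61)
X, Y = np.meshgrid(grid, grid)
origin = (int(np.abs(grid).argmin()),) * 2  # grid index closest to (0, 0)

rng = np.random.default_rng(2)
cores = rng.normal(scale=0.5, size=(500, 2))  # hypothetical random core positions
average = np.mean([vortex_shot(X, Y, x0, y0) for x0, y0 in cores], axis=0)

x0, y0 = cores[0]
print("density at the core of one shot:", vortex_shot(x0, y0, x0, y0))
print("averaged density at the trap center:", average[origin])
```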
Consider the ground state of a repulsively interacting BEC of N = 10,000
bosons in a 2D harmonic trap with $\omega_x = \omega_y = 1$ at an interaction strength
of 17. The many-body ground state computed using M = 2 orbitals is practically fully
condensed, with 99.99 % of the bosons in the first natural orbital, and is therefore
well described by Gross-Pitaevskii mean-field theory. We then switch on a time-dependent
stirring potential $V_s(\mathbf{r}, t) = \tfrac{1}{2}\,\eta(t)\,[x(t)^2 - y(t)^2]$
that imparts angular momentum onto the BEC. Here $x(t)$ and
$y(t)$ vary harmonically, and the amplitude $\eta(t)$ is linearly ramped up from zero to a
finite value until time t = 80, kept constant until t = 300 and ramped back
down again until t = 380.
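The stirring protocol can be written down explicitly. In the sketch, η(t) denotes the stirring amplitude (our notation), its maximum value eta_max and the stirring frequency omega are hypothetical parameters, and x(t), y(t) are assumed to be the coordinates in a frame rotating at omega, which is one way of letting them vary harmonically; only the ramp times 80, 300 and 380 are taken from the text.

```python
import numpy as np

# Ramp schedule from the text (dimensionless time units).
T_RAMP_UP, T_HOLD_END, T_RAMP_DOWN_END = 80.0, 300.0, 380.0

def amplitude(t: float, eta_max: float) -> float:
    """Stirring amplitude: linear ramp 0 -> eta_max until t = 80, constant until
    t = 300, linear ramp back to 0 until t = 380, zero afterwards."""
    return float(np.interp(t, [0.0, T_RAMP_UP, T_HOLD_END, T_RAMP_DOWN_END],
                           [0.0, eta_max, eta_max, 0.0]))

def stirring_potential(x, y, t: float, eta_max: float, omega: float):
    """V_s = 1/2 * eta(t) * [x(t)^2 - y(t)^2], with x(t), y(t) taken as the
    coordinates in a frame rotating at angular frequency omega (assumption)."""
    c, s = np.cos(omega * t), np.sin(omega * t)
    xt = x * c + y * s
    yt = -x * s + y * c
    return 0.5 * amplitude(t, eta_max) * (xt ** 2 - yt ** 2)

for t in (0.0, 40.0, 200.0, 340.0, 400.0):
    print(t, amplitude(t, eta_max=1.0))
```

The anisotropic x(t)² − y(t)² term is what transfers angular momentum; an isotropic x² + y² term would be invariant under the rotation and could not stir the cloud.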
Fig. 1 Fluctuating many-body vortices. A repulsive condensate in the ground state of a harmonic
trap is stirred by a rotating potential in two spatial dimensions. Over the course of time the
system fragments, and in single shots vortices appear at random positions. (a) First column: single-particle
density at different times. Second to fourth columns: single shots at the same times. (b)
Fragmentation of the condensate as a function of time. Starting from a condensed state, the system
of bosons fragments as it is stirred. While the system is condensed, single shots and the
single-particle density look alike. When the system is fragmented, vortices appear at random positions.
Parameter values: N = 10,000; interaction strength 17. See text for details. All quantities
shown are dimensionless (Figure from Ref. [16])
The first column of Fig. 1a shows the single-particle density at different times.
The remaining three columns show single shots taken at the same times as shown for
the density. Figure 1b shows the evolution of the natural occupations as a function
of time. The system is condensed as long as only a single natural orbital is occupied. As expected from the discussion above, single shots of such a BEC merely reproduce the single-particle density. However, over the course of time an additional natural orbital becomes occupied, i.e., the BEC becomes fragmented [32]. Each single shot then shows a clear vortex with no particles at its core. In contrast to their mean-field counterparts, these vortices appear at random locations in each shot. The average over many single shots reproduces the single-particle density. Because the vortices appear at different random locations in each shot, the single-particle density (the average over many single shots) remains finite at the vortex core.
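A deliberately crude 1D toy model shows why random core positions produce a filled-in core in the average. Every shot has a density zero at a randomly placed "core", yet the average over shots stays finite there. This is not the 2D MCTDHB calculation. The profile, the uniform distribution of core positions and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-4.0, 4.0, 401)                  # grid spacing 0.02

def shot_density(x0):
    """Toy single-shot profile: Gaussian envelope with a density zero ('core') at x0."""
    return np.exp(-x**2) * (x - x0) ** 2

core_positions = rng.uniform(-1.0, 1.0, size=2000)    # random core position in each shot
shots = np.array([shot_density(x0) for x0 in core_positions])
average = shots.mean(axis=0)                          # analogue of the single-particle density
center = int(np.argmin(np.abs(x)))                    # index of x = 0
```

Here `average[center]` approaches the mean of $x_0^2$ over the core positions. For the assumed uniform distribution on $[-1, 1]$ that mean is 1/3, even though every individual shot vanishes at its own core.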
Many-Body Phenomena in Ultracold Quantum Systems 83
Quantized vortices and dark solitons are the fundamental nonlinear excitations of
atomic Bose-Einstein condensates in two/three dimensions and one dimension,
respectively [38]. Quantized vortices are defects in the quantum phase about
which the superfluid flows with quantized circulation, while dark solitons are non-
dispersive waves characterized by a density depression and phase slip. Since the
early days of atomic condensates, both vortex structures and dark solitons have
been experimentally generated and studied. More recently, experiments have reported intriguing structures called solitonic vortices [39–41]. These excitations were first predicted by Brand and Reinhardt [42]. They lie at the crossover between one and two/three dimensions, where dark solitons are dimensionally unstable yet the transverse confinement hinders the formation of conventional vortices. Motivated by these observations
Fig. 3 Vortex-solitonic vortex transition. (a) The vortex oscillation frequency $\omega_v$ increases as the trap ratio $\omega_y/\omega_x$ is increased, saturating to the dark soliton prediction $\omega_x/\sqrt{2}$. Results are shown for different interaction strengths (blue, red, pink), from simulations (points) and an analytical prediction (lines). (b) When plotted as a function of the condensate width $l_y$ divided by the healing length $\xi$, the data fall on a common curve. (c) Evolution of the condensate density during an example trap deformation cycle (circular → elongated → circular) (Figure panels adapted from Ref. [18])
a dark soliton. When plotted as a function of the condensate width divided by the
healing length, the oscillation frequencies fall on a common curve (Fig. 3b).
Next we examine the hysteresis of the system in traps with a time-dependent aspect ratio. Starting from a circular shape, the trap is elongated and then returned to its initial shape. When the solitonic regime is probed during the
hysteresis cycle, angular momentum is lost from the system but, remarkably, the
vortex can re-emerge (Fig. 3c).
an expanding initially-trapped BEC, after the trap is released, still exhibits 100 %
condensation [52]. Furthermore, the condensate density evolves according to that
predicted by the time-dependent Gross-Pitaevskii equation. These results imply that the fraction of depleted particles vanishes in the infinite particle limit. Nevertheless, the absolute number of non-condensed particles is always non-zero in the interacting system. This latter point is central to the present investigations.
In [19] we analyze the ground state of a trapped BEC and demonstrate that, even
in the infinite particle limit when the BEC is 100 % condensed, the variance of a
many-particle operator can differ substantially from that predicted by the Gross-Pitaevskii theory. The existence of many-body effects beyond those predicted by the mean-field Gross-Pitaevskii theory stems from a requirement on the order of operations. The infinite particle limit must be taken only after the quantum mechanical observable is evaluated, not before. This is essential, since otherwise any trace of many-body correlations is washed out before the quantum mechanical observable can be evaluated. This is explained at length, both analytically and numerically, in Ref. [19]; see Fig. 4 for an example.
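The order of limits can be illustrated with a toy model. It is unrelated to MCTDHB and chosen only for transparency. Write $x_j = z + e_j$, where the $e_j$ are independent with variance $\sigma^2$ and $z$ is a shared fluctuation with variance $c/N$. Each pair of particles then has a covariance $c/N$ that vanishes as $N \to \infty$, so the one-particle variance tends to $\sigma^2$. Yet $\frac{1}{N}\mathrm{Var}(\hat X) = \sigma^2 + c/N + (N-1)c/N = \sigma^2 + c$ for every N. Taking $N \to \infty$ before evaluating $\hat X$ would therefore lose the contribution $c$. The values $\sigma^2 = 0.5$ and $c = 0.2$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def variances(N, sigma2=0.5, c=0.2, samples=20_000):
    """Toy model x_j = z + e_j: independent e_j (variance sigma2), shared z (variance c/N).

    Returns the exact one-particle variance Var(x_j) and a Monte Carlo estimate of
    Var(X)/N for the many-particle operator X = sum_j x_j.
    """
    z = rng.normal(0.0, np.sqrt(c / N), size=samples)
    # sum_j e_j is Gaussian with variance N * sigma2, so it is sampled directly.
    sum_e = rng.normal(0.0, np.sqrt(N * sigma2), size=samples)
    X = N * z + sum_e
    one_particle = sigma2 + c / N
    return one_particle, X.var() / N

for N in (1_000, 1_000_000):
    print(N, variances(N))
```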
Fig. 4 The variance of the many-particle position operator, $\hat X = \sum_{j=1}^{N} \hat x_j$, of a weakly interacting BEC held in a symmetric trap for different barrier heights. Results for $N = 1000$ (in green), 10,000 (in blue), 100,000 (in magenta), and 1,000,000 (in red) bosons are shown. The interaction parameter is $\lambda = \lambda_0 (N-1) = 0.1$. (a) Shown are the variance $\frac{1}{N}\Delta^2_{\hat X}$ (full curves) and the Gross-Pitaevskii variance $\Delta^2_{\hat x,\mathrm{GP}}$ (in black; dashed curve). Large differences arise starting from a certain barrier height. All four curves for the different numbers of particles lie atop each other. (b) The many-body energy and (c) the depletion are seen to approach and coincide with the Gross-Pitaevskii results, as is expected from the literature. Note the small values on the y-axes of panels (b) and (c). In contrast, the variance converges to a value different from the Gross-Pitaevskii result for not too shallow barriers. See [19] for more details. The quantities shown are dimensionless (Figure from Ref. [19])
In [20] we generalize our result for the ground state to the dynamics of an out-of-
equilibrium BEC. Dynamics is generally more intricate than statics, and involves
(sometimes many) excitations. We show, analytically and numerically, that the
evolution in time of the uncertainty product of two operators can deviate from that
of the Gross-Pitaevskii dynamics, even in the infinite particle limit. We explicitly
demonstrate this deviation for the center-of-mass position–momentum uncertainty product of a freely expanding BEC (see Fig. 5), as well as for the dynamics of a trapped BEC [20]. The uncertainty product is an example of a BEC observable that does not depend on the depleted fraction, which vanishes in the infinite particle limit. Instead, it depends on the total number of depleted particles, which is always non-zero in the interacting system. Our results thus indicate that a many-body propagation theory, such as MCTDHB, is needed to describe the out-of-equilibrium dynamics of observables like the uncertainty product of BECs. This holds even in the limit of an infinite number of particles, where the system becomes 100 % condensed.
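The analytical curve $\frac{1}{4}(1+t^2)$ has the same form as the position–momentum uncertainty product of a freely expanding Gaussian packet. This assumes $\hbar = m = 1$ and a packet that starts in the ground state of a trap with frequency 1. The sketch below checks this numerically with a standard FFT (spectral) free propagation. It is a single-particle illustration under these assumptions, not an MCTDHB or Gross-Pitaevskii computation.

```python
import numpy as np

def free_uncertainty_product(t, L=60.0, n=2048):
    """Delta x^2 * Delta p^2 at time t for a free Gaussian packet (hbar = m = 1).

    The packet starts as the ground state of a unit-frequency harmonic trap,
    psi(x, 0) = pi^(-1/4) exp(-x^2 / 2), and is propagated exactly in momentum space.
    """
    x = np.linspace(-L / 2, L / 2, n, endpoint=False)
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)               # momenta in FFT ordering
    psi0 = np.pi ** -0.25 * np.exp(-x**2 / 2)
    psi_t = np.fft.ifft(np.exp(-0.5j * k**2 * t) * np.fft.fft(psi0))

    def variance(values, weights):
        weights = weights / weights.sum()
        mean = np.sum(weights * values)
        return np.sum(weights * values**2) - mean**2

    var_x = variance(x, np.abs(psi_t) ** 2)
    var_p = variance(k, np.abs(np.fft.fft(psi_t)) ** 2)
    return var_x * var_p

for t in (0.0, 1.0, 2.0):
    print(t, free_uncertainty_product(t), 0.25 * (1 + t**2))
```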
$x_2) = \lambda_0\,\delta(x_1 - x_2)$. Shown and compared as a function of time are the Gross-Pitaevskii results for the interaction parameters $\lambda = \lambda_0(N-1) = 1$ (in red; dashed), $\lambda = 10$ (in green; dashed–dotted), and $\lambda = 100$ (in blue; dashed–double-dotted), and the analytical, many-body result $\frac{1}{4}(1+t^2)$ valid $\forall \lambda$ (in black; full curve). The position–momentum uncertainty product computed at the Gross-Pitaevskii level differs from the analytical, many-body result. The difference increases with increasing $\lambda$, meaning that the pace of growth of the uncertainty at the mean-field level depends on the interaction parameter (the y-axis is plotted in logarithmic scale). The many-body uncertainty product grows as $t^2$, and the mean-field uncertainty product is seen to grow in a similar manner in time (see Ref. [53] for the mean-field analysis of the expansion). The uncertainty product constitutes a macroscopic probe of the time-dependent correlations of a BEC, even when the system becomes 100 % condensed in the limit of an infinite number of particles. See [20] for more details. The quantities shown are dimensionless (Figure from Ref. [20])
The tunneling of a single trapped fermion, such as an electron, a proton, or even certain atoms or molecules, through a barrier can in many cases be solved even analytically. But how do several trapped fermions behave? Do they tunnel through the barrier together, or one by one? This question has been investigated experimentally [54], revealing a sequential tunneling process, but it is not yet fully understood theoretically. We therefore use the MCTDHF approach [3, 55–57] as implemented in the MCTDH-X program package [12]. We first confine two fermions in a parabolic potential (see Fig. 6a) and then propagate the wavefunction. As time evolves, we observe fermions with two different kinetic energies, characterized here by their momenta k. This can be explained by sequential tunneling. The first fermion feels both the external potential and the interaction with the second fermion in the potential. The energy of the second fermion, by contrast, is influenced only by the potential and is therefore lower (see Fig. 6b).
For stronger interparticle interactions, a third kinetic energy appears, which corresponds to the collective motion of two fermions. Figure 7 shows the two-body correlation function for the momenta $k_1$ and $k_2$. It is larger when the two fermions have different momenta than when they share the same momentum, which gives maxima on the off-diagonal. For the momentum $k_{\mathrm{red}}$, by contrast, the other fermion is observed with the same momentum, which manifests as a peak on the diagonal in Fig. 7. We therefore conclude that the fermions escape over the barrier together, while they tunnel through the barrier one at a time [21].
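The two mechanisms leave distinct fingerprints in a normalized two-body momentum correlation function. This can be sketched with a hypothetical two-particle momentum distribution: a symmetrized mixture of sequential events (one fermion at $k_1$, the other at $k_2$) and joint events (both near $k_{\mathrm{red}}$). The sketch uses the ratio $P(p_1,p_2)/[P(p_1)P(p_2)]$, which differs from $g^{(2)}$ only by a constant normalization factor. All momenta, weights and widths below are invented for illustration; only the qualitative pattern mirrors Fig. 7.

```python
import numpy as np

K1, K2, KRED = 2.0, 1.0, 3.0   # hypothetical momenta: first escaper, second escaper, joint escape
W_SEQ = 0.8                     # hypothetical weight of sequential tunneling events
WIDTH = 0.2                     # hypothetical momentum spread of each peak

def gauss(p, k):
    """Normalized Gaussian peak of width WIDTH centred at momentum k."""
    return np.exp(-(p - k) ** 2 / (2 * WIDTH**2)) / (np.sqrt(2 * np.pi) * WIDTH)

def joint(p1, p2):
    """Two-particle momentum distribution: symmetrized sequential events plus joint events."""
    seq = 0.5 * (gauss(p1, K1) * gauss(p2, K2) + gauss(p1, K2) * gauss(p2, K1))
    pair = gauss(p1, KRED) * gauss(p2, KRED)
    return W_SEQ * seq + (1 - W_SEQ) * pair

def marginal(p):
    """One-particle distribution, i.e. joint() integrated over the other momentum."""
    return W_SEQ * 0.5 * (gauss(p, K1) + gauss(p, K2)) + (1 - W_SEQ) * gauss(p, KRED)

def g2(p1, p2):
    """Normalized two-body correlation P(p1, p2) / [P(p1) P(p2)]."""
    return joint(p1, p2) / (marginal(p1) * marginal(p2))
```

Evaluating `g2` on a grid, for example `p = np.linspace(0, 4, 401)` and `g2(p[:, None], p[None, :])`, gives off-diagonal maxima at $(k_1, k_2)$ and $(k_2, k_1)$, a suppressed diagonal at $k_1$ and $k_2$, and a diagonal peak at $k_{\mathrm{red}}$.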
Fig. 6 (a) Potential in which the fermions are trapped. Initially, the fermions are confined to the parabolic potential $V(x, t < 0)$, and their wavefunction has the density $\rho(x, t = 0)$ (sketched). The potential $V(x, t \geq 0)$ is then opened to allow the fermions to escape. (b) When several fermions are confined in space and repel each other, the energy in the system is higher than for one single fermion. When the first fermion leaves the confined space by tunneling, it takes the energy stemming from its interaction with the other fermions and removes it from them. Hence, their energy is lowered. All quantities are dimensionless (Figure material from Ref. [21])

Much like humans in a society may form groups that have different opinions and interests, ultracold bosonic atoms in a Bose-Einstein condensate may exhibit “social behavior”. A system of indistinguishable bosons with two internal degrees of freedom is naturally divided into two groups, + and −. As a whole, these two groups may have “the same interests”, i.e., be in a coherent, condensed state. Alternatively, they may “have different interests”, i.e., be in an incoherent, fragmented state.
Here we investigate the ground state of two-component bosons trapped in harmonic potentials whose minima are spatially split. We study it as a function of the distance between the minima, the splitting. To this end, we use the MultiConfigurational Time-Dependent Hartree for indistinguishable particles (MCTDH-X) software [12]. As a first step, we show how the two groups + and − react to the distance between their parabolic confinements. To do so, we plot the component densities $\rho_+(x) = \sum_{kq} \rho_{kq}\,\phi_k^{+,*}(x)\,\phi_q^{+}(x)$ and $\rho_-(x) = \sum_{kq} \rho_{kq}\,\phi_k^{-,*}(x)\,\phi_q^{-}(x)$, and the composite density $\rho(x) = \rho_+(x) + \rho_-(x)$, in the left part of Fig. 8. Quite intuitively, the densities $\rho_+(x)$ and $\rho_-(x)$ of the + and − groups center themselves around the minima of their respective potentials. When the splitting becomes sufficiently large (≳ 4), fragmentation emerges in the system (bottom panel in the left part of Fig. 8). The system of the two groups of atoms loses its coherence, and the “society” starts to host different interests.
Fig. 7 Two-body momentum correlation function
$$g^{(2)}(p_1, p_2) = \frac{\sum_{kqsl} \rho_{kqsl}\, \phi_k^*(p_1)\, \phi_q^*(p_2)\, \phi_s(p_1)\, \phi_l(p_2)}{\left[\sum_{kq} \rho_{kq}\, \phi_k^*(p_1)\, \phi_q(p_1)\right]\left[\sum_{kq} \rho_{kq}\, \phi_k^*(p_2)\, \phi_q(p_2)\right]}.$$
For sequential tunneling, the probability that the two momenta differ is pronounced (i.e., maxima on the off-diagonal), while two fermions traveling together have the same momentum (i.e., maxima on the diagonal). The latter feature increases with the interparticle interaction among the fermions. This process can therefore be assigned to the joint escape over the barrier. All quantities are dimensionless (Figure material from Ref. [21])

To understand how the different interests are distributed between the groups, we plot the correlation function of the system in the right part of Fig. 8. Quite intuitively, the coherence within the different components or groups is maintained, while it is lost between them. When the splitting is large enough, one finds
$$|g^{(1),-}(x',x)|^2 = \left| \frac{\sum_{kq} \rho_{kq}\, \phi_k^{-,*}(x')\, \phi_q^{-}(x)}{\sqrt{\left(\sum_{kq} \rho_{kq}\, \phi_k^{-,*}(x')\, \phi_q^{-}(x')\right)\left(\sum_{kq} \rho_{kq}\, \phi_k^{-,*}(x)\, \phi_q^{-}(x)\right)}} \right|^2 \approx 1$$
and
$$|g^{(1),+}(x',x)|^2 = \left| \frac{\sum_{kq} \rho_{kq}\, \phi_k^{+,*}(x')\, \phi_q^{+}(x)}{\sqrt{\left(\sum_{kq} \rho_{kq}\, \phi_k^{+,*}(x')\, \phi_q^{+}(x')\right)\left(\sum_{kq} \rho_{kq}\, \phi_k^{+,*}(x)\, \phi_q^{+}(x)\right)}} \right|^2 \approx 1,$$
while
$$|g^{(1),+/-}(x',x)|^2 = \left| \frac{\sum_{\alpha}\sum_{kq} \rho_{kq}\, \phi_k^{\alpha,*}(x')\, \phi_q^{\alpha}(x)}{\sqrt{\left(\sum_{\alpha}\sum_{kq} \rho_{kq}\, \phi_k^{\alpha,*}(x')\, \phi_q^{\alpha}(x')\right)\left(\sum_{\alpha}\sum_{kq} \rho_{kq}\, \phi_k^{\alpha,*}(x)\, \phi_q^{\alpha}(x)\right)}} \right|^2 \to 0.$$
Members of the same group in our “society” of atoms share interests, while members of different groups have different interests. We term this distribution of interests among groups “composite fragmentation”. Composite fragmentation, as opposed to component fragmentation, can only occur in systems of ultracold bosonic atoms with (multiple) internal degrees of freedom [22].
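The density and correlation formulas above can be evaluated directly once the orbitals and the one-body density matrix $\rho_{kq}$ are known. The sketch below does so for an idealized, hand-built state rather than an MCTDH-X result. It uses two Gaussian spinor orbitals: one holds "+" atoms in the left well, the other holds "−" atoms in the right well, and each has occupation N/2. The splitting, particle number and grid are assumptions made for illustration. The fragmentation measure $F = 1 - n_1/N$ is likewise an assumption.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 801)
dx = x[1] - x[0]
SPLIT = 5.5          # hypothetical distance between the two trap minima
N = 1000             # hypothetical particle number

def ho(x0):
    """Normalized harmonic-oscillator ground state centred at x0."""
    return np.pi ** -0.25 * np.exp(-(x - x0) ** 2 / 2)

# Spinor orbitals phi[k, alpha, :] with alpha = 0 ('+') and alpha = 1 ('-').
# Assumed fragmented state: orbital 0 carries '+' atoms in the left well,
# orbital 1 carries '-' atoms in the right well, each with occupation N/2.
phi = np.zeros((2, 2, x.size))
phi[0, 0] = ho(-SPLIT / 2)
phi[1, 1] = ho(+SPLIT / 2)
rho = np.diag([N / 2, N / 2])          # one-body density matrix elements rho_kq

def component_density(alpha):
    """rho_alpha(x) = sum_kq rho_kq conj(phi_k^alpha(x)) phi_q^alpha(x)."""
    return np.einsum("kq,kx,qx->x", rho, phi[:, alpha].conj(), phi[:, alpha]).real

def rho1(ia, ib, comps):
    """One-body density matrix rho(x_a | x_b), summed over the internal states in comps."""
    a = phi[:, comps, ia].conj()
    b = phi[:, comps, ib]
    return np.einsum("kq,ka,qa->", rho, a, b)

def g1_sq(ia, ib, comps):
    """|g^(1)(x_a, x_b)|^2 for the chosen component(s)."""
    num = abs(rho1(ia, ib, comps)) ** 2
    return num / (rho1(ia, ia, comps).real * rho1(ib, ib, comps).real)

composite_density = component_density(0) + component_density(1)
occupations = np.linalg.eigvalsh(rho)[::-1] / N     # natural occupations n_k / N
fragmentation = 1.0 - occupations[0]                # assumed measure F = 1 - n_1 / N

i_left = int(np.argmin(np.abs(x + SPLIT / 2)))
i_right = int(np.argmin(np.abs(x - SPLIT / 2)))
```

For this state, the "+" component coherence between two points in the left well is 1. The composite coherence between the wells is essentially 0, and F = 0.5. This is the pattern described above for large splittings.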
In the sections above, continuing the tradition of previous years [13–15], we have presented our scientific work within the MCTDHB project supported by the HLRS over the last year. Our work spans a variety of applications and systems. In Ref. [16] we dealt with single-shot simulations of dynamic quantum many-body systems and their implications for imaging fluctuating vortices in rotating fragmented BECs. In Ref. [17] we explored the many-body tunneling dynamics of BECs and vortex states in 2D circular traps and the resulting emergence
Fig. 8 Left: Ground-state density and fragmentation as a function of the splitting. The top panel shows the composite density $\rho_{+/-}(x) = \sum_\alpha \sum_{kq} \rho_{kq}\, \phi_k^{\alpha,*}(x)\, \phi_q^{\alpha}(x)$ of the two components $\alpha = +$ and $\alpha = -$. The second and third panels depict the component densities $\rho_\alpha(x) = \sum_{kq} \rho_{kq}\, \phi_k^{\alpha,*}(x)\, \phi_q^{\alpha}(x)$ of the system in the respective internal states $\alpha = +$ and $\alpha = -$. The bottom panel shows the fragmentation of the system. Fragmentation is energetically favorable as soon as the overlap of the densities of the internal states becomes small (cf. top and bottom panels). Right: Signatures of composite fragmentation in the one-body correlation function as a function of the splitting. The rows of panels correspond to splittings of 0, 1, and 5.5, from top to bottom. The values of fragmentation are F = 0.004, F = 0.006, and F = 0.490, respectively. The first column shows the composite correlation function of both internal states, $|g^{(1),+/-}|^2$. The middle column shows the correlation function of the $\alpha = +$ state, $|g^{(1),+}|^2$, and the right column that of the $\alpha = -$ state, $|g^{(1),-}|^2$. The correlations are only plotted for coordinates $(x, x')$ where the component (composite) one-body density is larger than 0.05. This avoids analyzing component (composite) correlations where there are practically no particles. The component correlations exhibit full coherence, i.e., $|g^{(1),\alpha}|^2 \approx 1$ in the middle and right columns. The composite correlation function, by contrast, shows a quick loss of coherence between the components, i.e., $|g^{(1),+/-}|^2 \approx 0$ on the off-diagonals in the left column: the fragmentation in the system is of “composite” type. All quantities shown are dimensionless, see text for further discussion (Figure material from Ref. [22])
1. Streltsov, A.I., Alon, O.E., Cederbaum, L.S.: General variational many-body theory with
complete self-consistency for trapped bosonic systems. Phys. Rev. A 73, 063626 (2006)
2. Streltsov, A.I., Alon, O.E., Cederbaum, L.S.: Role of excited states in the splitting of a trapped
interacting Bose-Einstein condensate by a time-dependent barrier. Phys. Rev. Lett. 99, 030402 (2007)
3. Alon, O.E., Streltsov, A.I., Cederbaum, L.S.: Unified view on multiconfigurational time
propagation for systems consisting of identical particles. J. Chem. Phys. 127, 154103 (2007)
21. Fasshauer, E., Lode, A.U.J.: Multiconfigurational time-dependent Hartree method for
fermions: implementation, exactness, and few-fermion tunneling to open space. Phys. Rev.
A 93, 033635 (2016)
22. Lode, A.U.J.: The multiconfigurational time-dependent Hartree method for bosons with
internal degrees of freedom: theory and composite fragmentation of multi-component Bose-
Einstein condensates. Phys. Rev. A 93, 063601 (2016)
23. Penrose, O., Onsager, L.: Bose-Einstein condensation and liquid helium. Phys. Rev. 104, 576 (1956)
24. Javanainen, J., Yoo, S.M.: Quantum phase of a Bose-Einstein condensate with an arbitrary
number of atoms. Phys. Rev. Lett. 76, 161 (1996)
25. Castin, Y., Dalibard, J.: Relative phase of two Bose-Einstein condensates. Phys. Rev. A 55,
4330 (1997)
26. Dziarmaga, J., Karkuszewski, Z.P., Sacha, K.: Images of the dark soliton in a depleted
condensate. J. Phys. B 36, 1217 (2003)
27. Dagnino, D., Barberán, N., Lewenstein, M.: Vortex nucleation in a mesoscopic Bose superfluid
and breaking of the parity symmetry. Phys. Rev. A 80, 053611 (2009)
28. Streltsov, A.I., Alon, O.E., Cederbaum, L.S.: General mapping for bosonic and fermionic
operators in Fock space. Phys. Rev. A 81, 022124 (2010)
29. Fetter, A.L.: Rotating trapped Bose-Einstein condensates. Rev. Mod. Phys. 81, 647 (2009)
30. Dagnino, D., Barberán, N., Lewenstein, M., Dalibard, J.: Vortex nucleation as a case study of
symmetry breaking in quantum systems. Nat. Phys. 5, 431 (2009)
31. Weiner, S.E., Tsatsos, M.C., Cederbaum, L.S., Lode, A.U.J.: Angular momentum in interacting
many-body systems hides in phantom vortices. arXiv:1409.7670
32. Nozières, P., James, D.S.: Particle vs. pair condensation in attractive Bose liquids. J. Phys. (Fr.)
43, 1133 (1982)
33. Martin, A.M., Scott, R.G., Fromhold, T.M.: Transmission and reflection of Bose-Einstein
condensates incident on a Gaussian tunnel barrier. Phys. Rev. A 75, 065602 (2007)
34. Arovas, D.P., Auerbach, A.: Quantum tunneling of vortices in two-dimensional superfluids.
Phys. Rev. B 78, 094508 (2008)
35. Salgueiro, J.R., Zacarés, M., Michinel, H., Ferrando, A.: Vortex replication in Bose-Einstein
condensates trapped in double-well potentials. Phys. Rev. A 79, 033625 (2009)
36. Fialko, O., Bradley, A.S., Brand, J.: Quantum tunneling of a vortex between two pinning
potentials. Phys. Rev. Lett. 108, 015301 (2012)
37. Garcia-March, M.A., Carr, L.D.: Vortex macroscopic superpositions in ultracold bosons in a
double-well potential. Phys. Rev. A 91, 033626 (2015)
38. Kevrekidis, P.G., Frantzeskakis, D.J., Carretero-González, R. (eds.): Emergent Nonlinear
Phenomena in Bose-Einstein Condensates. Springer, Berlin (2008)
39. Becker, C., Sengstock, K., Schmelcher, P., Kevrekidis, P.G., Carretero-González, R.: Inelastic
collisions of solitary waves in anisotropic Bose-Einstein condensates: sling-shot events and
expanding collision bubbles. New J. Phys. 15, 113028 (2013)
40. Donadello, S., Serafini, S., Tylutki, M., Pitaevskii, L.P., Dalfovo, F., Lamporesi, G., Ferrari, G.:
Observation of solitonic vortices in Bose-Einstein condensates. Phys. Rev. Lett. 113, 065302 (2014)
41. Ku, M.J.H., Ji, W., Mukherjee, B., Guardado-Sanchez, E., Cheuk, L.W., Yefsah, T., Zwierlein,
M.W.: Motion of a solitonic vortex in the BEC-BCS crossover. Phys. Rev. Lett. 113, 065301 (2014)
42. Brand, J., Reinhardt, W.P.: Solitonic vortices and the fundamental modes of the snake
instability: possibility of observation in the gaseous Bose-Einstein condensate. Phys. Rev. A
65, 043612 (2002)
43. Cohen-Tannoudji, C., Diu, B., Laloë, F.: Quantum Mechanics, vol. 1. Wiley, New York (1977)
44. Dalfovo, F., Giorgini, S., Pitaevskii, L.P., Stringari, S.: Theory of Bose-Einstein condensation
in trapped gases. Rev. Mod. Phys. 71, 463 (1999)
45. Leggett, A.J.: Bose-Einstein condensation in the alkali gases: some fundamental concepts. Rev.
Mod. Phys. 73, 307 (2001)
46. Bloch, I., Dalibard, J., Zwerger, W.: Many-body physics with ultracold gases. Rev. Mod. Phys.
80, 885 (2008)
47. Pitaevskii, L., Stringari, S.: Bose-Einstein Condensation. Oxford University Press, Oxford (2003)
48. Leggett, A.J.: Quantum Liquids: Bose Condensation and Cooper Pairing in Condensed Matter
Systems. Oxford University Press, Oxford (2006)
49. Pethick, C.J., Smith, H.: Bose-Einstein Condensation in Dilute Gases, 2nd edn. Cambridge
University Press, Cambridge (2008)
50. Lieb, E.H., Seiringer, R., Yngvason, J.: Bosons in a trap: a rigorous derivation of the Gross-
Pitaevskii energy functional. Phys. Rev. A 61, 043602 (2000)
51. Lieb, E.H., Seiringer, R.: Proof of Bose-Einstein condensation for dilute trapped gases. Phys.
Rev. Lett. 88, 170409 (2002)
52. Erdős, L., Schlein, B., Yau, H.-T.: Rigorous derivation of the Gross-Pitaevskii equation. Phys.
Rev. Lett. 98, 040404 (2007)
53. Brazhnyi, V.A., Kamchatnov, A.M., Konotop, V.V.: Hydrodynamic flow of expanding Bose-
Einstein condensates. Phys. Rev. A 68, 035603 (2003)
54. Serwane, F., Zürn, G., Lompe, T., Ottenstein, T.B., Wenz, A.N., Jochim, S.: Deterministic
preparation of a tunable few-fermion system. Science 332, 6027 (2011)
55. Caillat, J., Zanghellini, J., Kitzler, M., Koch, O., Kreuzer, W., Scrinzi, A.: Correlated
multielectron systems in strong laser fields: a multiconfiguration time-dependent Hartree-Fock
approach. Phys. Rev. A 71, 012712 (2005); Zanghellini, J., Kitzler, M., Fabian, C., Brabec, T.,
Scrinzi, A.: An MCTDHF approach to multielectron dynamics in laser fields. Laser Phys. 13,
1064 (2003)
56. Kato, T., Kono, H.: Time-dependent multiconfiguration theory for electronic dynamics of
molecules in an intense laser field. Chem. Phys. Lett. 392, 533 (2004)
57. Nest, M., Klamroth, T., Saalfrank, P.: The multiconfiguration time-dependent Hartree-Fock
method for quantum chemical calculations. J. Chem. Phys. 122, 124102 (2005)
58. Grond, J., Streltsov, A.I., Lode, A.U.J., Sakmann, K., Cederbaum, L.S., Alon, O.E.: Excitation
spectra of many-body systems by linear response: general theory and applications to trapped
condensates. Phys. Rev. A 88, 023606 (2013)
59. Alon, O.E., Streltsov, A.I., Cederbaum, L.S.: Unified view on linear response of interacting
identical and distinguishable particles from multiconfigurational time-dependent Hartree methods. J. Chem. Phys. 140, 034108 (2014)
60. Alon, O.E.: Many-body excitation spectra of trapped bosons with general interaction by linear
response. J. Phys. Conf. Ser. 594, 012039 (2015)
61. Tsatsos, M.C., Tavares, P.E.S., Cidrim, A., Fritsch, A.R., Caracanhas, M.A., dos Santos,
F.E.A., Barenghi, C.F., Bagnato, V.S.: Quantum turbulence in trapped atomic Bose-Einstein
condensates. Phys. Rep. 622, 1 (2016)
62. Lode, A.U.J., Chakrabarti, B., Kota, V.K.B.: Many-body entropies, correlations, and emer-
gence of statistical relaxation in interaction quench dynamics of ultracold bosons. Phys. Rev.
A 92, 033622 (2015)
63. Gring, M., Kuhnert, M., Langen, T., Kitagawa, T., Rauer, B., Schreitl, M., Mazets, I., Adu
Smith, D., Demler, E., Schmiedmayer, J.: Relaxation and prethermalization in an isolated
quantum system. Science 337, 1318 (2012)
64. von Stecher, J., Greene, C.H.: Spectrum and dynamics of the BCS-BEC crossover from a few-
body perspective. Phys. Rev. Lett. 99, 090402 (2007)
Nucleon Observables as Probes for Physics
Beyond the Standard Model
Abstract We discuss the results of our ongoing project on Hazel Hen at HLRS concerning observables that shed light on the inner structure of the proton and other hadrons. We use techniques from lattice quantum chromodynamics to evaluate these observables on a gluon field ensemble of a $48^3 \times 96$ lattice at a lattice spacing of $a = 0.094(1)$ fm. The novelty of this ensemble is that it is generated directly at the physical value of the pion mass. Any extrapolation from heavier pion masses can therefore be avoided, which eliminates this systematic uncertainty. By employing state-of-the-art lattice QCD algorithms, we were able to compute the hadron spectrum, the axial and tensor charges, moments of parton distribution functions, and the quark content of the nucleon.
1 Introduction
The project is embedded in the field of high-energy particle physics. In particular, it addresses the strong interaction, which binds quarks and gluons together. This leads to the formation of the observed hadronic matter, the proton and neutron being the most prominent examples.
C. Alexandrou
Department of Physics, University of Cyprus, P.O. Box, 20537, 1678, Nicosia, Cyprus
e-mail: alexandrou@cyi.ac.cy
K. Jansen ()
NIC, DESY, Platanenallee 6, 17538, Zeuthen, Germany
e-mail: karl.jansen@desy.de
G. Koutsou
Computation-Based Science and Technology Research Center, The Cyprus Institute,
20 Kavafi Str., 2121, Nicosia, Cyprus
e-mail: g.koutsou@cyi.ac.cy
C. Urbach
Helmholtz-Institut für Strahlen-und Kernphysik (Theorie) and Bethe Center for Theoretical
Physics, Universität Bonn, 53115, Bonn, Germany
e-mail: urbach@hiskp.uni-bonn.de
The theory describing the interaction of quarks and gluons is quantum chromodynamics (QCD). It is a local quantum field theory that can be written down in a most elegant and compact way. Still, this theory must describe the phenomena of the strong interaction from very small distances up to scales of O(1 fm). For distances much below 1 fm, the quarks behave almost freely; this is called asymptotic freedom. When the distance reaches O(1 fm), the interaction between quarks and gluons becomes strong, and they bind together to form the observed hadrons. In fact, the force between quarks and gluons becomes so strong that they cannot be detected in experiment as asymptotic particles. We speak of the confinement of quarks and gluons.
Because the interaction becomes very strong, the theory cannot be evaluated in perturbation theory: there is no small expansion parameter. Thus, in order to test QCD as the correct theory of the strong interaction, non-perturbative methods have to be employed. The approach followed here is to formulate the theory on a discrete four-dimensional space-time grid. This leads to the notion of lattice QCD (LQCD). By rotating the standard Minkowski time to Euclidean time, LQCD can be interpreted as a statistical mechanics model (in the sense of the Ising model). This makes it possible to perform numerical simulations of the system, and hence to evaluate the theory from first principles and non-perturbatively.
Following this approach, the present project has concentrated on the computation of important hadronic observables. We started with the benchmark calculation of the hadronic spectrum, in particular the baryon masses. We then computed a number of quantities that characterize the properties of hadrons: their spin, their angular momentum, and their quark content. The novelty of the project is that all calculations have been performed directly at the physical values of the nucleon and pion masses (the physical point), see below. This is substantial progress for such lattice calculations. It avoids the demanding and often poorly controlled extrapolation from heavier-than-physical pion masses to the physical one. Working directly at the physical point therefore eliminates an often dominant systematic error and opens a new road to understanding the strong force. The observables computed in this project are the charges, the neutron electric dipole moment and the quark content of the nucleon. Besides improving our understanding of nuclear matter, they play a significant role as input to the interpretation of ongoing and planned experiments worldwide that search for new physics beyond the standard model.
The main results of our last project period are summarized in two recent publications, Refs. [1, 2]. In addition, there is a recent work on the σ-terms in QCD [3], which are related to the quark content of the nucleon, and a number of papers are in preparation. These publications are listed in the bibliography, from where the links to their arXiv entries can be followed. The last project period allowed us to make substantial progress in our analysis on the $48^3 \times 96$ lattice.
As a first important step, the lattice spacing of the ensemble used here has to be determined. We calculated the lattice spacing from the nucleon mass itself. To do so, we combined older results at unphysical values of the pion mass with the lattice data for the pion and nucleon masses directly at the physical point. The pion mass dependence of the ratio of the nucleon mass to the pion mass, as a function of the pion mass squared, has been fitted to heavy baryon chiral perturbation theory. As can be seen in Fig. 1, this dependence is very well described by the theoretical formulae. In particular, we find the ratio $m_N/m_{\pi^\pm} = 7.6(3)$, which is consistent with the physical value. Taking this ensemble as being at the physical pion mass, we set $m_N = 0.938$ GeV. From this physical value we determine the lattice spacing as $a = 0.094(1)$ fm. Note that this value is fully consistent with a scale-setting procedure using pure gauge quantities, see [1].
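The conversion from the dimensionless lattice nucleon mass $a m_N$ to the spacing $a$ uses $a = (a m_N)\,\hbar c / m_N$, with $\hbar c \approx 0.19733$ GeV fm. In the sketch below, the input value `am_n = 0.4468` is hypothetical. It was chosen only so that the output lands near the quoted 0.094 fm. The box sizes follow from the $48^3 \times 96$ lattice.

```python
HBARC_GEV_FM = 0.1973269804   # hbar * c in GeV * fm

def lattice_spacing_fm(am_n, m_n_gev=0.938):
    """Lattice spacing a = (a m_N) * hbar c / m_N, with a m_N the nucleon mass in lattice units."""
    return am_n * HBARC_GEV_FM / m_n_gev

a = lattice_spacing_fm(0.4468)   # hypothetical lattice-unit nucleon mass
box_fm = 48 * a                  # spatial extent L = 48 a
time_fm = 96 * a                 # temporal extent T = 96 a
print(f"a = {a:.4f} fm, L = {box_fm:.2f} fm, T = {time_fm:.2f} fm")
```

With a ≈ 0.094 fm, the spatial extent is roughly 48 a ≈ 4.5 fm and the temporal extent roughly 96 a ≈ 9.0 fm.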
100 C. Alexandrou et al.
Fig. 1 The ratio of the nucleon mass to the pion mass as a function of the pion mass squared. For
determining the pion mass squared the scale is set using the nucleon mass at the physical point as
described in the text. The fit includes the points with heavier than physical pion masses (circles,
diamonds and squares) but not the ensemble with the clover term (filled triangle)
4 Three-Point Functions
Having computed the hadronic masses as benchmark quantities, we now turn to more complicated observables that derive from three-point functions. To this end, on each gauge configuration we compute three-point functions originating from 16 randomly chosen source positions. For each of these positions, we compute the nucleon three-point function with the single unpolarized and the three polarized projections, at three sink-source time separations. This means that we needed to invert the twisted mass clover operator 4992 times per gauge-field configuration.
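The count of 4992 inversions can be reproduced with the following bookkeeping, which is our assumption and is not spelled out in the text. Each propagator needs 12 spin-colour solves. Per source position, there is a forward propagator for each of two light flavors. There is also one sequential propagator per flavor, projector and sink-source separation. All of this is repeated for 16 source positions, i.e. 16 × (2·12 + 2·4·3·12) = 4992.

```python
def inversions_per_config(n_sources=16, n_projectors=4, n_separations=3,
                          n_flavors=2, spin_colour=12):
    """Hypothetical solver count per gauge configuration.

    Per source: one forward propagator per flavor, plus one sequential propagator per
    (flavor, projector, separation); each propagator needs spin_colour = 12 solves.
    """
    forward = n_flavors * spin_colour
    sequential = n_flavors * n_projectors * n_separations * spin_colour
    return n_sources * (forward + sequential)

print(inversions_per_config())
```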
These calculations are only feasible when using multiple right-hand-side (rhs) methods. Examples are the polynomially accelerated Arnoldi method used in this project and multi-grid, which we plan to employ in the future. Both methods yield speed-ups of between 30 and 100 times compared to standard Conjugate Gradients (CG).
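For reference, the baseline that these speed-ups are measured against can be sketched as plain conjugate gradients on the normal equations (CGNR), solving each right-hand side from scratch. This is a generic textbook sketch. A small, random, well-conditioned complex matrix stands in for the twisted mass clover operator. It is not the project's solver and does not illustrate the Arnoldi or multi-grid methods.

```python
import numpy as np

def cgnr(M, b, tol=1e-10, maxiter=500):
    """Solve M x = b with conjugate gradients on the normal equations M^H M x = M^H b.

    Returns the solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - M x for the initial guess x = 0
    z = M.conj().T @ r                # residual of the normal equations
    p = z.copy()
    zz = np.vdot(z, z).real
    b_norm = np.linalg.norm(b)
    for it in range(1, maxiter + 1):
        w = M @ p
        alpha = zz / np.vdot(w, w).real
        x += alpha * p
        r -= alpha * w
        if np.linalg.norm(r) < tol * b_norm:
            return x, it
        z = M.conj().T @ r
        zz_new = np.vdot(z, z).real
        p = z + (zz_new / zz) * p
        zz = zz_new
    return x, maxiter

# Hypothetical stand-in operator: a small, well-conditioned complex matrix.
rng = np.random.default_rng(4)
n = 200
M = 4.0 * np.eye(n) + (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(n)
# 12 right-hand sides (e.g. spin x colour), each solved independently from scratch.
rhs = rng.normal(size=(n, 12)) + 1j * rng.normal(size=(n, 12))
results = [cgnr(M, rhs[:, j]) for j in range(rhs.shape[1])]
solutions = [sol for sol, _ in results]
iters = [count for _, count in results]
```

Every right-hand side pays the full Krylov cost again here. Multiple right-hand-side methods avoid this by reusing information, such as approximate low modes, across solves.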
An important outcome of the past allocation period was the calculation of the nucleon light, strange and charm σ-terms. The precise values of these quantities are important for Dark Matter searches. They enter the cross-sections of nuclei scattering with Weakly Interacting Massive Particles (WIMPs) through Higgs boson exchange. These are examples of quantities for which lattice QCD can provide predictions. This applies in particular to the strange and charm σ-terms, where direct measurements are not possible.
A first calculation at a pion mass of around 370 MeV [7] was used to develop the required methodology. Following it, we used Hazel Hen resources to obtain the σ-terms directly from the nucleon matrix elements, for the first time at the physical values of the light quark masses. Our results have been reported in Ref. [3]. There we were guided by the analysis at 370 MeV, which showed large contaminations by excited states. We therefore computed multiple sink-source separations in order to reliably identify the ground-state matrix element.
Having results at multiple sink-source separations and O(10^4) measurements, we were able to apply different analyses to our data, which carry different systematics in terms of the excited-state contaminations. This is shown in Fig. 2, where we use three methods to obtain the σ-terms: fits to the standard plateau, which ignores excited states; fits which include an excited state (“two-state”); and fits to the ratio summed over the insertion time coordinate (“summation”). Using these three methods to ensure that systematic uncertainties were under control, we were able to obtain values for the σ-terms to a precision which can be used in phenomenological studies such as in Ref. [8]. Furthermore, this is one of the few calculations of the charm σ-term and one of the most accurate predictions of the strange σ-term.
Fig. 2 Connected and disconnected contributions to the light σ-term (σ_πN, top two rows), the strange σ-term (σ_s, third row), and the charm σ-term (σ_c, bottom row). Results obtained using the standard plateau method are shown in the first column (red circles) as a function of the sink-source separation, those using a two-state fit in the center column (blue squares) as a function of the smallest sink-source separation included in the fit, and those using the summation method in the right column (green triangles) as a function of the smallest sink-source separation included in the fit. The red band indicates the final value taken, which is the plateau value when all three methods agree
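To illustrate how the three analyses respond to excited-state contamination, the sketch below builds a synthetic, noise-free ratio with one excited state and applies a plateau estimate, a two-state fit and the summation method to it. The form R(t_s, τ) = g + c[e^(−ΔE τ) + e^(−ΔE (t_s−τ))] + d e^(−ΔE t_s) is a common simplified two-state model; all parameter values (lattice units) and function names are hypothetical, and the two-state fit uses a simple grid scan over ΔE with linear least squares for the amplitudes instead of a full nonlinear fit.

```python
import numpy as np

# Hypothetical ground-state value g and excited-state parameters (lattice units)
g_true, c_true, d_true, dE_true = 1.2, -0.3, 0.1, 0.5
seps = np.array([10, 12, 14, 16])                 # sink-source separations t_s

def ratio(ts, tau, g, c, d, dE):
    """Two-state model of the three-point/two-point ratio at insertion time tau."""
    return g + c * (np.exp(-dE * tau) + np.exp(-dE * (ts - tau))) + d * np.exp(-dE * ts)

# Insertion times tau = 1 .. t_s - 1 for every separation
data = {ts: ratio(ts, np.arange(1, ts), g_true, c_true, d_true, dE_true) for ts in seps}

# 1) Plateau: value at the midpoint tau = t_s/2, ignoring excited states.
plateau = {int(ts): data[ts][ts // 2 - 1] for ts in seps}   # index tau-1 since tau starts at 1

# 2) Summation: S(t_s) = sum_tau R grows like g*t_s + const; the slope estimates g.
sums = np.array([data[ts].sum() for ts in seps])
g_summation = np.polyfit(seps, sums, 1)[0]

# 3) Two-state fit: for fixed dE the model is linear in (g, c, d) -> least squares.
def two_state_fit(dE_grid):
    best = None
    for dE in dE_grid:
        rows, ys = [], []
        for ts in seps:
            tau = np.arange(1, ts)
            rows.append(np.column_stack([np.ones_like(tau, dtype=float),
                                         np.exp(-dE * tau) + np.exp(-dE * (ts - tau)),
                                         np.full(tau.shape, np.exp(-dE * ts))]))
            ys.append(data[ts])
        X, y = np.vstack(rows), np.concatenate(ys)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        chi2 = np.sum((X @ coef - y) ** 2)
        if best is None or chi2 < best[0]:
            best = (chi2, coef[0], dE)
    return best[1], best[2]

g_two_state, dE_fit = two_state_fit(np.linspace(0.1, 1.0, 181))
print(plateau)
print(g_summation, g_two_state, dE_fit)
```

In this toy the plateau value approaches g only slowly as t_s grows, the summation slope removes most of the contamination, and the two-state fit recovers g when the model matches the data. With real, noisy data the trade-off is the one described above: the methods that suppress excited states more strongly come with larger statistical errors.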
We note that Hazel Hen resources were used for computing the connected contribution to σ_πN. They were complemented by GPU resources for the calculation of the disconnected contributions to the σ-terms, making optimal use of our available computing resources.
Apart from the dedicated calculation of the σ-terms highlighted above, our production runs yield all local and one-derivative matrix elements of the nucleon. These results were reported in Ref. [2] for the initial set of statistics available at the time. Our current goal is a high-statistics evaluation of these matrix elements at multiple sink-source separations in order to reliably investigate excited-state effects. Currently we have obtained 6400 measurements and expect about 10,000 by the end of the allocation period. For an error of about 2 % we need O(10^5) statistics up to sink-source separations of 1.5 fm, which is made possible by our new multi-grid method, as explained in the current proposal.
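The statistics estimate rests on the 1/√N scaling of the statistical error, combined with the well-known (Parisi-Lepage) signal-to-noise problem of nucleon correlators: the noise-to-signal ratio grows roughly like exp[(m_N − 3m_π/2) t], so the number of measurements needed for a fixed error grows like the square of that factor with the sink-source separation t. A minimal bookkeeping sketch follows; the current error of 8 % used in the example is a hypothetical placeholder, not a number from this project, and the function names are illustrative.

```python
import math

def required_measurements(n_now, err_now_percent, err_target_percent):
    """Measurements needed to reach a target error, assuming error ∝ 1/sqrt(N)."""
    return math.ceil(n_now * (err_now_percent / err_target_percent) ** 2)

def noise_growth_factor(extra_separation_fm, m_N=0.938, m_pi=0.135, hbar_c=0.1973):
    """Extra statistics factor exp[2 (m_N - 1.5 m_pi) dt] (masses in GeV, dt in fm)."""
    return math.exp(2.0 * (m_N - 1.5 * m_pi) * extra_separation_fm / hbar_c)

# Hypothetical: if 6400 measurements gave 8 % at some separation, 2 % would need ~1e5.
print(required_measurements(6400, 8.0, 2.0))
# Rough extra cost of pushing the sink-source separation out by 0.2 fm at fixed error.
print(noise_growth_factor(0.2))
```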
In Figs. 3 and 4 we show the current status of two observables, the nucleon
axial charge and the nucleon quark helicity fraction for the physical point ensemble.
Fig. 3 Nucleon axial charge of the cA2.09.48 ensemble as a function of the distance between insertion and source time (t_ins) for three sink-source separations. The band is the result of the summation method
Fig. 4 Nucleon quark helicity fraction calculated on the physical point ensemble (triangles)
compared to results obtained at heavier than physical pion masses. The upwards pointing triangle
is the result of a fit to the plateau, while the rightwards pointing triangle is the result of the summation method
The axial charge is shown as a function of the insertion-to-source time separation for three sink-source separations. Excited states are suppressed exponentially with both of these separations. We also show the summation method, in which t_ins is summed over and a two-parameter linear fit is performed whose slope is g_A; this, however, leads to a larger error. The helicity is shown as a function of the pion mass, compared to various lattice results from the literature at heavier than physical pion masses and to experiment. At the physical point we show the result of the plateau method at a separation of 1.3 fm and, slightly shifted, the result of the summation method. As can be seen, in both cases there is a trend towards the experimentally measured values of both quantities, but a tension still remains, as mentioned above.
Furthermore, the errors at larger sink-source separations, and especially for the summation method, are large and need to be reduced in order to decide on the excited-state effects. These results clearly demonstrate the need for more statistics, especially at larger sink-source separations, in order to allow definite conclusions on such quantities to be drawn and to decide whether there is a real tension. If it turns out that there is a discrepancy between our lattice data and experimental/phenomenological results, it becomes necessary to investigate which other systematic effect of our lattice calculation could be responsible for it. In our opinion, finite volume effects would then be the cause of the tension, and hence a calculation on a larger volume at fixed action parameters would become mandatory. As mentioned above, we have in any case already initiated the generation of gluon field configurations on a 64^3 × 128 lattice, which will allow us to provide a definite answer to this question.
5 Conclusion
With the computing time allocated to us on Hazel Hen at HLRS, we could achieve substantial progress in the evaluation of physical quantities related to the structure of hadrons and the search for new physics beyond the standard model. With these resources, we were able to analyze gluon field configurations that were generated on a 48^3 × 96 lattice at a lattice spacing of a ≈ 0.09 fm and, most importantly, at the physical values of the pion and nucleon masses.
The latter fact avoids the extrapolation from unphysically large pion masses to the physical one and thus eliminates an often dominating systematic uncertainty. In this way, we could not only compute the spectrum of many hadrons observed in nature and reproduce their physical values, but we could also address more complicated observables that are important to shed light on the inner structure of nucleons and that serve as input for the interpretation of experiments looking for physics beyond the standard model: by computing the first moments of the quark and gluon distribution functions, we can determine the average quark and gluon momentum in the proton; having the axial and tensor charges at our disposal, we can also make a complete analysis of the proton spin; the neutron electric dipole moment is important to understand charge and parity violation from the strong interaction; finally, having results for the so-called σ-terms, we can provide important input for dark matter searches.
Interestingly, and somewhat unexpectedly, we still see tensions between some quantities and experimental or phenomenological analyses. It will be most important to understand this tension by increasing the statistics of the present calculation and by also using a larger lattice of size 64^3 × 128 to address finite volume effects.
It would be most reassuring to find agreement of these quantities with experimen-
tal/phenomenological results, which would open the road for a controlled analysis
of further quantities such as generalized form factors [9, 10], the neutron electric
dipole moment [11] or even the parton distribution functions themselves [12].
Acknowledgements We would like to thank all members of the European Twisted Mass Collab-
oration (ETMC) in which this work is embedded for a most enjoyable and fruitful collaboration.
Without this common effort it would not have been possible to obtain the interesting and important
results described in this report.
1. Abdel-Rehim, A., et al.: Simulating QCD at the Physical Point with N_f = 2 Wilson Twisted
Mass Fermions at Maximal Twist (2015). 1507.05068
2. Abdel-Rehim, A., et al.: Nucleon and pion structure with lattice QCD simulations at physical
value of the pion mass. Phys. Rev. D92, 114513 (2015). 1507.04936
3. Abdel-Rehim, A., et al. [ETM Collaboration]: Direct evaluation of the quark content of
nucleons from lattice QCD at the physical point. Phys. Rev. Lett. 116(25), 252001 (2016).
doi:10.1103/PhysRevLett.116.252001. [arXiv:1601.01624 [hep-lat]]
4. Frezzotti, R., Rossi, G.C.: Chirally improving Wilson fermions. I: O(a) improvement. JHEP
08, 007 (2004). hep-lat/0306014
5. Frezzotti, R., Rossi, G.C.: Twisted-mass lattice QCD with mass non-degenerate quarks. Nucl.
Phys. Proc. Suppl. 128, 193–202 (2004). hep-lat/0311008
6. Frezzotti, R., Rossi, G.C.: Chirally improving Wilson fermions. II: four-quark operators. JHEP
10, 070 (2004). hep-lat/0407002
7. Abdel-Rehim, A., et al.: Disconnected quark loop contributions to nucleon observables in
lattice QCD. Phys. Rev. D89, 034501 (2014). 1310.6339
8. Hoferichter, M., de Elvira, J.R., Kubis, B., Meißner, U.-G.: Remarks on the pion-nucleon
sigma-term. arXiv:1602.07688 [hep-lat]
9. Alexandrou, C., et al.: Nucleon electromagnetic form factors in twisted mass lattice QCD.
Phys. Rev. D 83, 094502 (2011). doi:10.1103/PhysRevD.83.094502. [arXiv:1102.2208 [hep-lat]]
10. Alexandrou, C., Constantinou, M., Dinter, S., Drach, V., Jansen, K., Kallidonis, C.,
Koutsou, G.: Nucleon form factors and moments of generalized parton distributions
using N_f = 2+1+1 twisted mass fermions. Phys. Rev. D 88(1), 014509 (2013).
doi:10.1103/PhysRevD.88.014509. [arXiv:1303.5979 [hep-lat]]
11. Alexandrou, C., Athenodorou, A., Constantinou, M., Hadjiyiannakou, K., Jansen,
K., Koutsou, G., Ottnad, K., Petschlies, M.: Phys. Rev. D 93(7), 074503 (2016).
doi:10.1103/PhysRevD.93.074503. [arXiv:1510.05823 [hep-lat]]
12. Alexandrou, C., Cichy, K., Drach, V., Garcia-Ramos, E., Hadjiyiannakou, K., Jansen, K.,
Steffens, F., Wiese, C.: Lattice calculation of parton distributions. Phys. Rev. D 92, 014502
(2015). doi:10.1103/PhysRevD.92.014502. [arXiv:1504.07455 [hep-lat]]
Numerical Evaluation of Multi-loop Feynman Integrals
Abstract The main focus of our activities in the period from July 2015 to June
2016 was on the numerical computation of multi-dimensional integrals needed for
the electron contribution to the anomalous magnetic moment of the muon.
1 Introduction
Here p_j and k_i are momenta in Minkowski space, where the k_i are linear combinations of the p_j and the external momenta q_l. The m_i are scalar quantities which can be identified with the masses of the involved particles. The peculiarity of Eq. (1) is the non-integer dimensionality of each of the momentum integrals, since d = 4 − 2ε. Here ε is a regularization parameter which has been introduced since, in general, the integrals diverge in four space-time dimensions. Eventually, we are interested in the limit ε → 0, where the divergences manifest themselves as poles in ε. To obtain a physical quantity, these poles are removed with the help of the renormalization procedure.
The main aim of this project is the numerical evaluation of integrals as given in Eq. (1), where L = 4, the masses m_i are either 0 or m, and for the only external momentum we have q^2 = m^2. Such integrals are the building blocks for the relation between quark masses defined in the $\overline{\rm MS}$ and on-shell schemes (as described in the previous report) and for the anomalous magnetic moment, which was the main focus in the period from July 2015 to June 2016.
P. Marquard
Deutsches Elektronen-Synchrotron DESY, Platanenallee 6, 15738, Zeuthen, Germany
M. Steinhauser
Institut für Theoretische Teilchenphysik, Karlsruher Institut für Technologie,
76128, Karlsruhe, Germany
e-mail: Matthias.Steinhauser@kit.edu
The anomalous magnetic moment of the muon, a_μ, is among the most precisely measured quantities in particle physics. It is measured to a precision of 0.54 parts per million, which matches the precision of the Standard Model theory prediction [1, 2]. However, for many years a discrepancy of about three to four standard deviations has been observed, which has persisted through all improvements on both the theory and the experimental side.
The theory prediction can be split into hadronic, electroweak and QED contributions (see Ref. [3] for further references). The non-perturbative hadronic contribution is further subdivided into the vacuum polarization and light-by-light contributions and has reached next-to-next-to-leading order accuracy [4]. It is nevertheless the main source of uncertainty in the theory prediction. On the other hand, the electroweak part is known up to two-loop order and is thus well under control. The numerically largest contribution arises from QED corrections. In this respect the four-loop corrections play a special role, since their numerical impact is of the same order of magnitude as the discrepancy between theory and experiment. Note that the four-loop corrections have only been computed by one group, using entirely numerical methods [3]. In the approach of our group, the calculation proceeds analytically up to the point where a_μ is expressed as a linear combination of about 380 integrals of the type described above. A large fraction of them have been computed at the HLRS.
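Since a_μ ends up as a linear combination Σ c_i I_i of numerically evaluated integrals, the Monte Carlo uncertainties of the individual integrals have to be propagated to the final result. A minimal sketch, assuming the integration errors are statistically independent (so that they add in quadrature); the coefficients and values below are made up purely for illustration, and the function name is hypothetical:

```python
import math

def combine(coeffs, values, errors):
    """Return (sum c_i*I_i, sqrt(sum (c_i*sigma_i)^2)) for independent errors sigma_i."""
    central = sum(c * v for c, v in zip(coeffs, values))
    sigma = math.sqrt(sum((c * e) ** 2 for c, e in zip(coeffs, errors)))
    return central, sigma

# Hypothetical example with three integrals (the real calculation involves about 380).
print(combine([2.0, -0.5, 1.0], [1.25, 3.0, 0.1], [0.003, 0.004, 0.0]))
```

Because each error enters multiplied by its coefficient, integrals with large analytic coefficients dominate the error budget and are the ones worth integrating more precisely.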
As already mentioned in the previous reports, the workhorse for our calculations
performed at the HLRS is the program package FIESTA [5–7]. For convenience
we repeat the main features of FIESTA.
FIESTA has been developed since 2008 with the participation of the Institute for Theoretical Particle Physics (TTP) at KIT. FIESTA stands for Feynman Integral Evaluation by a Sector decomposiTion and applies the method of sector decomposition [8] to obtain finite expressions for the coefficients of the Laurent series of Eq. (1) in ε = (4 − d)/2. These finite expressions are multi-dimensional parameter integrals with, in general, large integrands of a few hundred MB up to a GB in size.
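A textbook toy example (chosen here for illustration, not one of the four-loop integrals) shows what sector decomposition does. The integral I(ε) = ∫₀¹dx ∫₀¹dy (x+y)^(−2−ε) is singular at the origin as ε → 0. Splitting the unit square into the sectors x > y and y > x and substituting y = xt (or x = yt) factorizes the singularity: by symmetry each sector gives ∫₀¹dx x^(−1−ε) ∫₀¹dt (1+t)^(−2−ε) = −(1/ε) F(ε), where the x-integral is continued analytically from ε < 0. The Laurent coefficients are therefore finite one-dimensional integrals of F(0) and F′(0). The sketch evaluates them with plain Monte Carlo, a stand-in for the more sophisticated integrators used in production codes, and compares with the exact expansion I(ε) = −1/ε + 1 − ln 2 + O(ε). Function names are hypothetical.

```python
import math
import random

def mc_integrate(f, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0, 1] and its standard error."""
    samples = [f(rng.random()) for _ in range(n)]
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, math.sqrt(var / n)

# After sector decomposition: I(eps) = -(2/eps) * [F(0) + eps * F'(0) + O(eps^2)],
# with F(eps) = integral over t in [0, 1] of (1 + t)^(-2 - eps).
def F0_integrand(t):
    return (1.0 + t) ** -2                          # integrand of F(0)

def dF0_integrand(t):
    return -math.log(1.0 + t) * (1.0 + t) ** -2     # integrand of F'(0)

rng = random.Random(1)
F0, F0_err = mc_integrate(F0_integrand, 200_000, rng)
dF0, dF0_err = mc_integrate(dF0_integrand, 200_000, rng)

pole = -2.0 * F0       # coefficient of 1/eps
finite = -2.0 * dF0    # coefficient of eps^0
print(f"1/eps: {pole:.4f} +- {2 * F0_err:.4f}   (exact -1)")
print(f"eps^0: {finite:.4f} +- {2 * dF0_err:.4f}   (exact {1 - math.log(2):.4f})")
```

In realistic multi-loop integrals the same mechanism produces many sectors with large, multi-dimensional integrands, which is why the Monte Carlo stage dominates the computing time.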
In practice, the preparation of the integrand is performed within Mathematica on the local cluster. The expressions are transferred in the form of a database to the HLRS, where the time-consuming Monte Carlo integration is performed. FIESTA uses a simple master-slave model for the parallelization, where the integrands are distributed from the master to the slaves using MPI and each term is integrated by a slave using a single core.
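The master-slave distribution can be sketched in a few lines. In the sketch below, Python threads and a `queue.Queue` stand in for MPI ranks and messages (an assumption made to keep the example self-contained; it shows only the dispatch logic, not parallel speed-up, and it is not FIESTA's code). The master enqueues integration tasks, each worker repeatedly takes the next task, integrates it and returns the result, so faster workers automatically pick up more terms. The toy integrands and names are hypothetical.

```python
import queue
import threading

def integrate_midpoint(f, n=10_000):
    """Stand-in for the per-term Monte Carlo integration: midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def worker(tasks, results):
    # Each worker pulls the next available term until it receives the stop signal (None).
    while True:
        item = tasks.get()
        if item is None:
            break
        idx, f = item
        results.put((idx, integrate_midpoint(f)))

def master(integrands, n_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in enumerate(integrands):    # distribute all terms
        tasks.put(item)
    for _ in threads:                     # one stop signal per worker, queued after the tasks
        tasks.put(None)
    for t in threads:
        t.join()
    partial = dict(results.get() for _ in range(len(integrands)))
    return sum(partial[i] for i in range(len(integrands)))

# Hypothetical terms of a decomposed integrand: x^k for k = 0..7 (exact sum: 1 + 1/2 + ... + 1/8).
terms = [lambda x, k=k: x ** k for k in range(8)]
total = master(terms)
print(total, sum(1.0 / (k + 1) for k in range(8)))
```

Dynamic task pulling of this kind is what makes the scaling close to linear as long as there are many more terms than workers and no single term dominates the run time.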
The scaling behavior of FIESTA is almost linear in the number of cores, as has been demonstrated in the previous report.
The numerically most important four-loop contribution comes from the Feynman diagrams which contain a closed electron loop, since in that case numerically large logarithms log(m_μ/m_e) ≈ log(206) ≈ 5.3 are present, which in some cases are even raised to the fourth power. In the following we concentrate on this kind of correction. Sample Feynman diagrams are shown in Fig. 1.
The Feynman integrals underlying the Feynman diagrams in Fig. 1 contain two mass scales: m_e and m_μ. Since m_e ≪ m_μ, it is natural to perform an expansion in the mass ratio. From the appearance of the logarithmic contributions mentioned above it is obvious that a naive Taylor expansion is not appropriate and would lead to wrong results. Instead we apply a so-called asymptotic expansion, which is an algorithmic prescription to factorize the original expressions into integrals which depend either on m_e or on m_μ.
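A one-dimensional toy integral, chosen here purely for illustration (it is not one of the four-loop integrals), shows why a naive Taylor expansion fails and how logarithms of the mass ratio arise. With two mass scales m ≪ M and x = m/M,

$$
I(m,M)=\int_0^\infty \frac{dt}{(t+m^2)\,(t+M^2)}=\frac{\ln(M^2/m^2)}{M^2-m^2}=-\frac{2\ln x}{M^2}\sum_{n\ge 0}x^{2n}.
$$

Setting m = 0 in the integrand, i.e. Taylor expanding it in m, gives ∫dt/[t(t+M²)], which diverges logarithmically at t → 0 because the expansion is not valid in the region t ≲ m². In the asymptotic expansion one instead adds the contributions of the hard region t ~ M², where the integrand is expanded in m², and of the soft region t ~ m², where it is expanded in t/M². With an analytic regulator t^(−ε) in the measure, mimicking dimensional regularization, the leading terms are

$$
-\frac{(M^2)^{-\varepsilon}}{\varepsilon\,M^2}+\frac{(m^2)^{-\varepsilon}}{\varepsilon\,M^2}+\mathcal{O}(\varepsilon)=\frac{\ln(M^2/m^2)}{M^2}+\mathcal{O}(\varepsilon),
$$

so the poles cancel between the regions and the logarithm of the mass ratio remains, in the same way as the ℓ_x terms arise in the four-loop result.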
Fig. 1 Four-loop example Feynman diagrams contributing to a_μ containing at least one closed electron loop. The external solid lines represent muons, the solid loops denote electrons or muons and the wavy lines represent photons
As an example, let us present our results for diagram class III (cf. Fig. 1). Leaving out the overall factor (α/π)^4, the contribution to a_μ reads [9]

$$
\begin{aligned}
A_2^{\rm III} &= \big[1.15444 \pm 0.00446 - 1.80996\,\ell_x\big]
 + x\,\big[-0.849197\big]
 + x^2\big[-1.95556 \pm 0.00400 - 1.25333\,\ell_x\big]
 + x^3\big[-20.2365 - 15.3527\,\ell_x\big] \\
&= \big[1.1544 \pm 0.0045 + 9.6500\big]
 + \big[-0.004107\big]
 + \big[-0.00004574 \pm 0.00000009 + 0.00015630\big]
 + \big[-0.000002289 + 0.000009260\big] \\
&= \big[10.8044 \pm 0.0045\big]
 + \big[-0.004107\big]
 + \big[0.00011056 \pm 0.00000009\big]
 + \big[0.000006970\big] \\
&= 10.8004 \pm 0.0045\,, \qquad (2)
\end{aligned}
$$
where in the first row the expansion in x = m_e/m_μ and its dependence on ℓ_x = log(x) is shown. After the first equality sign, the numerical values of x and ℓ_x ≈ −5.33... are inserted, but the resulting summands are kept separate, which shows the relative size of the constant and the logarithmic terms. Afterwards, the sums for every order in x are evaluated to demonstrate the convergence of the asymptotic series. Finally, the total contribution of the diagram class is shown.
Note that our result in Eq. (2) is in perfect agreement with the result of Ref. [3], which reads 10.7934 ± 0.0027.
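The numerical rows of Eq. (2) can be reproduced directly from the coefficients in its first row. The sketch below assumes x = m_e/m_μ with m_μ/m_e ≈ 206.768 (an input value chosen here, consistent with the log(206) ≈ 5.3 quoted above) and treats the quoted uncertainties as independent, adding them in quadrature; the function name is hypothetical.

```python
import math

# Coefficients of Eq. (2): order n -> (constant, uncertainty, coefficient of l_x)
COEFFS = {
    0: (1.15444, 0.00446, -1.80996),
    1: (-0.849197, 0.0, 0.0),
    2: (-1.95556, 0.00400, -1.25333),
    3: (-20.2365, 0.0, -15.3527),
}

def evaluate_series(mass_ratio=206.768):
    """Evaluate each order x^n [c + c_l * l_x] of the asymptotic series and the total."""
    x = 1.0 / mass_ratio           # x = m_e / m_mu
    lx = math.log(x)               # l_x = log(x), about -5.33
    total, var, terms = 0.0, 0.0, []
    for n, (c, dc, cl) in sorted(COEFFS.items()):
        term = x ** n * (c + cl * lx)
        terms.append(term)
        total += term
        var += (x ** n * dc) ** 2  # independent uncertainties added in quadrature
    return terms, total, math.sqrt(var)

terms, total, err = evaluate_series()
for n, t in enumerate(terms):
    print(f"x^{n}: {t:+.9f}")
print(f"sum: {total:.4f} +- {err:.4f}")
```

The rapid fall-off from one order to the next is what justifies truncating the expansion at x^3.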
Our final result for A_2(m_μ/m_e) is given by [9–11]
Note that the uncertainty in Eq. (4) receives approximately equal contributions from experiment and theory (i.e. the hadronic contribution). Even after a projected reduction of the uncertainty by a factor of four in both a_μ(exp) and a_μ(SM), our numerical precision is a factor of ten below the uncertainty of the difference.
4 Outlook
In the following years we plan to consider the universal contribution to the leptonic
anomalous dimension where only one lepton flavor is present. Furthermore, we plan
to extend the results of Ref. [12] and to present the relation between the $\overline{\rm MS}$ and on-shell
quark masses for a generic number of colors and massless quark flavors. Both
applications require more precise master integrals which we plan to compute on the
Hazel Hen cluster at the HLRS.
1. Bennett, G.W., et al., [Muon G-2 Collaboration]: Final report of the Muon E821 anomalous
magnetic moment measurement at BNL. Phys. Rev. D 73, 072003 (2006). [hep-ex/0602035]
2. Roberts, B.L.: Status of the Fermilab Muon (g − 2) experiment. Chin. Phys. C 34, 741 (2010).
[arXiv:1001.2898 [hep-ex]]
3. Aoyama, T., Hayakawa, M., Kinoshita, T., Nio, M.: Complete tenth-order QED contribution to
the Muon g-2. Phys. Rev. Lett. 109, 111808 (2012). [arXiv:1205.5370 [hep-ph]]
4. Kurz, A., Liu, T., Marquard, P., Steinhauser, M.: Hadronic contribution to the Muon
anomalous magnetic moment to next-to-next-to-leading order. Phys. Lett. B 734, 144 (2014).
doi:10.1016/j.physletb.2014.05.043. [arXiv:1403.6400 [hep-ph]]
5. Smirnov, A.V., Tentyukov, M.N.: Feynman integral evaluation by a sector decomposition
approach (FIESTA). Comput. Phys. Commun. 180, 735 (2009). [arXiv:0807.4129 [hep-ph]].
Preprint No. TTP08-32
This result is taken from Ref. [3].
6. Smirnov, A.V., Smirnov, V.A., Tentyukov, M.: FIESTA 2: parallelizeable multiloop numerical
calculations. Comput. Phys. Commun. 182, 790 (2011). [arXiv:0912.0158 [hep-ph]]. Preprint
No. TTP09-39
7. Smirnov, A.V.: FIESTA 3: cluster-parallelizable multiloop numerical calculations in physical
regions. Comput. Phys. Commun. 185, 2090 (2014). [arXiv:1312.3186 [hep-ph]]
8. Heinrich, G.: Sector decomposition. Int. J. Mod. Phys. A 23, 1457 (2008). [arXiv:0803.4177 [hep-ph]]
9. Kurz, A., Liu, T., Marquard, P., Smirnov, A., Smirnov, V., Steinhauser, M.: Electron contri-
bution to the Muon anomalous magnetic moment at four loops. Phys. Rev. D 93(5), 053017
(2016). doi:10.1103/PhysRevD.93.053017. [arXiv:1602.02785 [hep-ph]]
10. Kurz, A., Liu, T., Marquard, P., Smirnov, A.V., Smirnov, V.A., Steinhauser, M.: Light-by-light-
type corrections to the muon anomalous magnetic moment at four-loop order. Phys. Rev. D
92(7), 073019 (2015). doi:10.1103/PhysRevD.92.073019. [arXiv:1508.00901 [hep-ph]]
11. Kurz, A., Liu, T., Marquard, P., Smirnov, A.V., Smirnov, V.A., Steinhauser, M.: Higher order
hadronic and leptonic contributions to the Muon g − 2. arXiv:1511.08222 [hep-ph]
12. Marquard, P., Smirnov, A.V., Smirnov, V.A., Steinhauser, M.: Quark mass relations to four-loop
order. Phys. Rev. Lett. 114(14), 142002 (2015). [arXiv:1502.01030 [hep-ph]]
Part II
Molecules, Interfaces, and Solids
The following chapter reveals that chemistry, materials science and solid state physics have profited substantially from the computational resources provided by both the High Performance Computing Center Stuttgart and the Steinbuch Centre for Computing Karlsruhe. A particular challenge was the multi-scale character of the problems addressed in the simulations.
From the broad range of the field, seven contributions have been selected for
presentation. First-principles DFT and MD calculations have been used to study
molecular systems such as molecules bound to gold surfaces and water structures
at the water-mineral interface, and to study processes that could be termed
‘molecular engineering’ of semiconductors. As to the modelling of structural,
electronic and transport properties of solids, such methods have been used to
study rare earth silicide thin films, lithium ion diffusion through NZP structures,
and laser ablation in covalently bonded materials. The last contribution presents a
time-dependent DMRG treatment of quantum transport in nano-devices attached to
metallic leads.
The work by D. Marx, M. Wollenhaupt and M. Z. Michoff from the University of Bochum applies first-principles molecular dynamics methods to study in detail what happens when mechanical forces act on a molecular system, how the molecules stretch and, finally, how bond rupture takes place. They compared aliphatic and aromatic thiolates bound to a gold surface, and the interesting observation was that, although aliphatic thiolates bind more strongly to the gold surface (higher thermal desorption energy), the mechanical force required to pull the molecule off the surface is lower than for aromatic thiolates. The reason is a different mechanochemical pathway, namely Au–Au bond rupture for aliphatic thiolates vs. Au–S bond rupture for aromatic thiolates. In a related study, the authors investigated how thioctic acid with a PEG chain stretches under mechanical force, and how its chemical reactivity is modulated by the mechanical force. It was found that stretching the molecule facilitates nucleophilic attack at one of the sulfur atoms. The QuantumESPRESSO and CPMD programs have been used to carry out these calculations. Using several levels of parallelization, these codes offer good scaling up to several thousands of cores.
K. Remi and M. Sulpizi from the University of Mainz investigate the interface
between water and fluorite (CaF2) at a microscopic level. Such water-mineral interactions are of great importance in materials science, but they also play a role in environmental and medical issues. Experimentally, vibrational sum frequency generation (VSFG)
is used to obtain spectroscopic signatures from the interface area, since only an
assembly of oriented water molecules and not bulk (isotropic) water generates
a VSFG signal. In the simulations, such a signal is obtained from a molecular
dynamics run using quantum mechanical forces (Born-Oppenheimer molecular
dynamics using density functional methods, as implemented in the CP2K program),
and calculating the contribution of (only) the interfacial water molecules to the time-
dependent dipole moment. This has been done for different setups (low, neutral and
high pH), and the molecular origin of signatures observed in experimental VSFG
spectra could be clarified.
The group of R. Tonner at the University of Marburg studies ‘molecular engineering’ of functional semiconductors. Experimentally, thin films are grown on materials such as silicon using metal-organic vapour phase epitaxy. The calculations assess the thermodynamic stability of such phases. For example, it is demonstrated that Ga(NAsP) is stabilized (with respect to segregation into GaN, GaAs and GaP islands) if grown on Si(001), because of the strain imposed by the epitaxial growth process. It was further found that bismuth atoms in dilute Ga(AsBi) are not distributed evenly but tend to cluster, and this has some effect on how the band gap of the host GaAs is modified by Bi doping. Finally, a new mechanism by which in-plane vibrations of planar molecules adsorbed on a metal surface acquire infrared intensity has been studied. These calculations used the VASP program, which allows efficient use of modern supercomputers in highly parallel calculations.
The materials physics group at the University of Paderborn, led by W. G. Schmidt, combines density functional theory (DFT) calculations with ab initio thermodynamics to provide a microscopic structural model for the rare-earth silicide thin film 5×2 phase observed in the sub-monolayer regime, where the lack of theoretical investigations is particularly severe. Within their DFT framework, the generalised gradient approximation is used in the Perdew-Burke-Ernzerhof formulation, as implemented in the VASP package. The authors demonstrate that the 5×2 structure is characterised by alternating Si honeycomb and Seiwatz chains, with rare earth atoms located in the interjacent channels. The simulated data agree impressively with measured STM images of the sub-monolayer structure on a Si(111) surface.
Materials crystallising in the NZP structure, named after NaZr2(PO4)3, with Na exchanged by Li, are promising candidates for solid-state electrolytes, primarily because they form a 3D diffusion network for Li ions. C. Elsässer and collaborators from the University of Freiburg and the Fraunhofer Institute for Mechanics of Materials analysed the diffusion of Li through various NZP compounds by combining DFT simulations based on the Quantum ESPRESSO PWscf code with static energy calculations using bond valence potentials. In this way, important structure-property relationships could be identified, which made it possible, e.g., to predict the migration barrier heights directly from crystal structure characteristics. For LISICON, which possesses high ionic Li conductivities under special conditions, Ti and P were substituted by a variety of isovalent elements in order to discuss the influence on Li-ion diffusion. Calculating the activation energies for the migration of a Li vacancy, the authors found that the Li ion can escape the cage through the bottleneck much more easily when the coordination polyhedron around the Li ion is larger.
An outstanding example of the predictive power of large-scale (multi-million particle) molecular dynamics (MD) simulations is the work by A. Kiselev, J. Roth, and H.-R. Trebin from the Institute for Functional Matter and Quantum Technologies at the University of Stuttgart. Within the framework of a self-consistent continuum-atomistic two-temperature model (TTM) approach, the authors model carrier-lattice interaction and electron-hole recombination processes in covalently bonded materials subject to strong laser radiation fields. To this end, the existing MD simulations for metals with high charge carrier concentration are developed further to treat semiconductors, where charge carriers first have to be created by the laser pulse. The focus is on laser ablation in silicon. Here, dynamical interactions depending on the electron temperature have to be taken into account. Compared to a simple rescaling model (with laser energy introduced by rescaling the kinetic energy of the particles), the ablation thresholds are much higher and the material is completely vaporised. The results demonstrate the importance of the combined MD-TTM algorithm with temperature-adapted potentials. Such an approach undoubtedly paves the way for the numerical treatment of general non-equilibrium phenomena in highly excited covalent systems.
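For orientation, the continuum part of such an approach builds on the two-temperature model, in which electrons and lattice carry separate temperatures coupled through an energy-exchange term: C_e dT_e/dt = −G(T_e − T_l) + S(t) and C_l dT_l/dt = G(T_e − T_l). The sketch integrates this zero-dimensional form with explicit Euler steps, constant heat capacities and coupling, and a Gaussian source term. All parameter values are arbitrary illustrative numbers (not silicon data), and the sketch deliberately omits the spatial heat diffusion, the carrier generation and recombination, and the coupling to MD that the authors' semiconductor treatment requires.

```python
import math

def two_temperature_model(C_e=1.0, C_l=10.0, G=0.5, fluence=5.0,
                          t0=2.0, width=0.5, dt=1e-3, t_end=30.0, T0=1.0):
    """Explicit-Euler integration of a 0D two-temperature model (arbitrary units)."""
    def source(t):
        # Gaussian laser pulse depositing `fluence` units of energy into the electrons
        return fluence * math.exp(-((t - t0) / width) ** 2) / (width * math.sqrt(math.pi))

    T_e = T_l = T0
    history = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        exchange = G * (T_e - T_l)                # energy flow from electrons to lattice
        T_e += dt * (source(t) - exchange) / C_e
        T_l += dt * exchange / C_l
        history.append((t, T_e, T_l))
    return history

hist = two_temperature_model()
_, T_e, T_l = hist[-1]
peak_Te = max(h[1] for h in hist)
print(f"peak T_e = {peak_Te:.3f}, final T_e = {T_e:.3f}, final T_l = {T_l:.3f}")
```

The electrons heat up first and then hand their energy to the lattice until both temperatures equilibrate; the total energy C_e T_e + C_l T_l grows exactly by the deposited pulse energy, which is a convenient consistency check for any such scheme.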
The challenging problems of non-linear quantum transport and transient dynamics of currents in strongly interacting nano-structures were addressed by B. Schoenauer from the Center for Extreme Matter and Emergent Phenomena at Utrecht University and P. Schmitteckert from the Institute for Theoretical Physics and Astrophysics at the University of Würzburg. By means of an elaborate time-dependent density matrix renormalisation group (td-DMRG) technique, the authors studied a paradigmatic model system in which an interacting ring structure is sandwiched between two metallic leads. To prepare the system in a non-equilibrium state, it is quenched by applying an external potential to the leads (which is switched off for all times t > 0). Krylov subspace methods are used to facilitate the calculation of the matrix exponential function needed to describe the time evolution. This
way, the authors try to answer the question of whether local observables, such
116 C. van Wüllen and H. Fehske
as the currents through the links, always relax to a steady state. However, it
turned out that oscillating ring-currents, which are orders of magnitude larger than
transmitted currents, dominate the transport, in particular for strong interactions.
Results obtained for different voltages show that the frequency of this oscillation is independent of the bias and that the oscillation does not decay on the time scales accessible to the calculations (which, however, does not completely rule out a relaxation of the currents
inside the ring structure in the very long-time limit). A detailed (td-DMRG) analysis
of the time dependence of the reduced density matrix sheds light on the states that
contribute to the current oscillation.
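The Krylov approach to the time evolution can be illustrated on a small dense Hermitian matrix: m Lanczos steps build an orthonormal basis V of the Krylov space spanned by v, Hv, …, H^(m−1)v together with a tridiagonal matrix T = V†HV, and exp(−iHt)v is approximated by ‖v‖ V exp(−iTt) e₁, so that only a small m×m exponential has to be computed. The sketch below is a generic textbook Lanczos variant with full re-orthogonalization, assuming ħ = 1; it is not the authors' td-DMRG code, in which H acts on the DMRG-truncated basis, and the function name is hypothetical.

```python
import numpy as np

def krylov_expm_multiply(H, v, t, m=30):
    """Approximate exp(-1j * t * H) @ v for Hermitian H with an m-step Lanczos Krylov space."""
    n = v.shape[0]
    m = min(m, n)
    beta0 = np.linalg.norm(v)
    V = np.zeros((n, m), dtype=complex)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    V[:, 0] = v / beta0
    k = m                                                # Krylov dimension actually used
    for j in range(m):
        w = H @ V[:, j]
        alpha[j] = np.real(np.vdot(V[:, j], w))
        w = w - alpha[j] * V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        w = w - V[:, :j + 1] @ (V[:, :j + 1].conj().T @ w)   # full re-orthogonalization
        if j == m - 1:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:                              # invariant subspace: Krylov result is exact
            k = j + 1
            break
        V[:, j + 1] = w / beta[j]
    T = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    theta, S = np.linalg.eigh(T)                         # small k x k eigenproblem
    small = S @ (np.exp(-1j * t * theta) * S[0, :])      # exp(-i t T) e_1
    return beta0 * (V[:, :k] @ small)

# Usage: compare with exact evolution from a full diagonalization.
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2
H /= np.linalg.norm(H, 2)                                # spectrum scaled into [-1, 1]
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
t = 5.0
approx = krylov_expm_multiply(H, v, t, m=30)
w, U = np.linalg.eigh(H)
exact = U @ (np.exp(-1j * t * w) * (U.conj().T @ v))
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

Only matrix-vector products with H are needed, which is why the approach suits large sparse or DMRG-represented Hamiltonians where forming the full exponential is impossible.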
Finally, we would like to emphasise that all projects introduced by the reviewers have in common, besides high scientific quality, the need for powerful computers to
achieve their results. That is why the leading-edge systems at the HLRS and KIT
SCC are a prerequisite for such ambitious research.
Mechanochemistry of Ring-Opening Reactions:
From Cyclopropane in the Gas Phase to Thiotic
Acid on Gold in the Liquid Phase
1 Scientific Background
years; taking for example the procedure of cooking, the effect of adding heat was
efficiently used even though no one knew what was happening on a molecular level.
Nowadays, recently developed methods such as atomic force microscopy [2] or sonochemical processes in an ultrasonic bath [3] make it possible to apply mechanical forces to single molecules while monitoring the stress and the resulting change in the structure of the molecules.
In the field of theoretical chemistry, codes for calculating geometries, energies,
and even reaction pathways as a function of constant force rather than upon
imposing structural constraints (i.e. isotensional versus isometric stretching, see
Ref. [1] for a comprehensive discussion) have been implemented and successfully
used. All this research is related to the field of (covalent) mechanochemistry,
which deals with the influence of external mechanical forces on molecules and,
in particular, on their reactions, see Refs. [1, 4, 5] for reviews.
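The isotensional picture can be made concrete with the force-modified potential energy surface V_F(r) = V(r) − F r for a single bond under a constant pulling force F. The sketch below assumes a Morse potential V(r) = D(1 − e^(−a(r−r₀)))² as a stand-in for an ab initio surface, with parameters in arbitrary units chosen here for illustration. The minimum and the barrier of V_F follow from V′(r) = F; the rupture barrier shrinks with increasing force and vanishes at the maximum restoring force F_max = Da/2, where the bond breaks without thermal activation.

```python
import math

D, a, r0 = 4.0, 2.0, 1.0           # hypothetical Morse parameters (arbitrary units)
F_max = D * a / 2.0                # maximum of V'(r): force at which the barrier disappears

def morse(r):
    return D * (1.0 - math.exp(-a * (r - r0))) ** 2

def barrier(F):
    """Barrier of V_F(r) = V(r) - F*r for 0 < F < F_max; returns (dE, r_min, r_ts)."""
    if not 0.0 < F < F_max:
        raise ValueError("force must lie in (0, F_max)")
    # V'(r) = 2 D a u (1 - u) with u = exp(-a (r - r0)); solve V'(r) = F for u.
    disc = math.sqrt(1.0 - F / F_max)
    u_min, u_ts = (1.0 + disc) / 2.0, (1.0 - disc) / 2.0   # u > 1/2: minimum, u < 1/2: barrier top
    r_min = r0 - math.log(u_min) / a
    r_ts = r0 - math.log(u_ts) / a
    dE = (morse(r_ts) - F * r_ts) - (morse(r_min) - F * r_min)
    return dE, r_min, r_ts

for frac in (0.1, 0.5, 0.9, 0.99):
    dE, r_min, r_ts = barrier(frac * F_max)
    print(f"F = {frac:.2f} F_max: barrier = {dE:.4f}, r_min = {r_min:.3f}, r_ts = {r_ts:.3f}")
```

As the force grows, the minimum moves outwards, the barrier top moves inwards and the two merge at F_max; this is the single-bond analogue of the force-dependent rupture pathways discussed below.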
Thiolate–gold interfaces have been intensely studied for many decades using
a wide array of experimental and computational methodologies [6, 7]. The pro-
nounced interest in these particular hybrid molecule/metal junctions and interfaces
is due to a multitude of potential applications such as tailoring the properties
of surfaces [8–12], chemical anchors for molecular electronics applications [13–15] or coating agents for the stabilization of gold nanoparticles [16, 17]. Thus, the results obtained by this study will be of great relevance to diverse research fields, such as tailored surfaces for electrochemical processes, molecular electronics and medicinal applications, to mention just a few. For instance, the potential use of gold nanoparticles, AuNPs, in several medical applications, such as X-ray imaging, photothermal therapy, radiotherapy and targeted drug delivery, has attracted enormous attention in the last few years [18–28].
To prevent agglomeration and to increase their circulation time in the blood stream, nanoparticles are typically coated with a biocompatible polymer such as polyethylene glycol, PEG [29]. The chemical modification of AuNPs with PEG ligands has usually been performed by covalently attaching PEG chains to the metallic surface through a thiolate linkage. In the first studies on PEG-AuNPs, the PEG ligands employed were terminated by a monothiol functional group [30]. However, in most recent works, multivalent thiol linkers are being used. One of the most common examples is PEG ligands appended to thioctic acid, TA (or dihydrolipoic acid, DHLA, in its reduced form) [31–36]. This functional group provides bidentate anchoring to gold surfaces. Experimental studies have shown that TA-terminated PEG ligands provide enhanced colloidal stability to AuNPs under a wide range of experimental conditions with respect to their monothiolated counterparts [33–36].
Despite the steeply increasing interest in exploiting multidentate thiols for functionalization, all experimental and theoretical studies performed so far have dealt only with the structure and thermodynamic stability of these adsorbates. Information about how such multidentate anchoring functionalities respond to external stress is scarce, whereas monothiol-based interfaces and point contacts have been intensely studied [37–47]. To the best of our knowledge, only one experimental study dealing with the mechanical rupture of a dithiolate linkage to
Mechanochemistry of Ring-Opening Reactions 119
gold has been reported [48]. Interestingly, the measured force required to remove a single TA molecule from the gold substrate was about 3.4 times smaller than the rupture force of a simple Au–S bond. This suggests that SAMs of multidentate thiols may be less stable under tensile stress than anticipated from their thermal properties, as probed by thermal desorption experiments and computed binding energies.
To shed some light on this open question, we have carried out a comprehensive computational study on the thermal and mechanical detachment of a series of bidentate thiolates adsorbed on a gold surface [49]. In that work, the effect of the chain length separating the sulfur atoms was studied. Thermal desorption was found to always yield cyclic disulfides. In contrast, mechanochemical desorption leads to cyclic gold complexes, in which metal atoms are extracted from the surface and held in tweezer-like arrangements by the sulfur atoms. Interestingly, the flexibility of the chain was shown to crucially impact the mechanical strength of the junction. Given these insights, it remained to be explored to what extent solvent effects might affect the rupture scenario and thus the mechanical strength of the junction.
We have carried out a systematic computational study comparing mono- and bidentate thiolate ligands, which expands our knowledge and provides key information for understanding the mechanochemical behavior of the thiolate–gold interface at a detailed level.
On one hand, we have investigated the mechanical and thermal desorption of prototypical aliphatic alkanethiolates, such as ethylthiolate (Et-S) and butylthiolate (Bu-S), and of a series of substituted p-methyl thiophenolates adsorbed on gold surfaces (see Fig. 1). Since the detailed adsorption structure of thiolate ligands on gold is still being debated [6, 7, 50], we have considered two different models for the adsorption of the molecules: a perfect flat Au(111) surface and a surface with a vacancy defect at the adsorption site.
Fig. 1 (a) Molecular structure of the substituted thiophenols studied. (b) Adsorbate of Ar–NO2–S on a perfect Au(111) surface, shown as an illustrative example of the initial structure for the thermal and mechanical desorption. For clarity, only the two top layers of the Au slab are shown
120 M. Zoloff Michoff et al.
The surface has been modeled using 4 layers of a 5 × 6 slab, with ca. 15 Å of vacuum in the direction perpendicular to the surface to avoid spurious interactions between periodic images. This setup is well established in the Marx group, including in mechanochemical studies. To explore the mechanical desorption of the proposed systems, we have used the “isometric” approach, in which the carbon atom of the methyl group common to all molecular species was constrained to move in a plane parallel to the bottom gold layer of the slab. The atoms in this layer were kept fixed at their bulk positions throughout the simulations. The distance between these two planes defines the “stretching parameter” (D), which was increased stepwise in increments of 0.2 Å until final breakage of the molecule–metal junction was observed.
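A toy version of the isometric protocol can make the force–distance logic concrete. The sketch below is not the DFT setup used here: it assumes a hypothetical one-dimensional model in which a Morse “anchor” bond is stretched through a harmonic “molecule”, all parameter values are invented for illustration, and plain gradient descent stands in for a geometry optimizer. At each fixed value of the stretching parameter D the free coordinate is relaxed (warm-started from the previous step), the force is dE/dD (here the tension in the harmonic segment), and Frup is taken as the largest force along the scan.

```python
import math

EV_PER_ANG_IN_NN = 1.602176634  # 1 eV/Å = 1.602176634 nN

# Hypothetical toy parameters, NOT taken from the study
DE, A, R0 = 2.0, 1.5, 2.4   # Morse anchor bond: depth (eV), range (1/Å), rest length (Å)
K, L0 = 10.0, 5.0           # harmonic "molecule": stiffness (eV/Å^2), rest length (Å)


def morse_dvdr(r):
    """Derivative dV/dr of the Morse anchor bond (positive when stretched)."""
    e = math.exp(-A * (r - R0))
    return 2.0 * DE * A * (1.0 - e) * e


def relax_anchor(d, x, lr=0.05, tol=1e-10, max_iter=10_000):
    """Minimise E(x) = V(x) + K/2 (d - x - L0)^2 at fixed stretching parameter d."""
    for _ in range(max_iter):
        grad = morse_dvdr(x) - K * (d - x - L0)
        if abs(grad) < tol:
            break
        x -= lr * grad
    return x


def isometric_scan(step=0.2, n_steps=30):
    """Increase d stepwise, relax the free coordinate, return (d, force) pairs."""
    d, x = R0 + L0, R0          # start from the force-free geometry
    curve = []
    for _ in range(n_steps):
        d += step
        x = relax_anchor(d, x)  # warm start from the previous relaxed geometry
        curve.append((d, K * (d - x - L0)))  # force = dE/dd = tension in the chain
    return curve


curve = isometric_scan()
d_rup, f_rup = max(curve, key=lambda p: p[1])
print(f"F_rup ~ {f_rup * EV_PER_ANG_IN_NN:.2f} nN at D = {d_rup:.1f} Å")
```

In this toy the combined potential stays convex in the free coordinate, so the force passes smoothly through its maximum (DE·A/2 for a Morse bond) without the discontinuous plastic events seen in the real pathways.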
On the other hand, we have also explored the mechanochemistry of such hybrid interfaces under more realistic conditions, i.e. including the effects of finite temperature and those arising from a fully explicit solvation environment. In this part we have focused on PEG ligands attached to a gold surface by means of a bidentate thiolate linking functionality, namely thioctic acid (TA, see Fig. 2).
As already mentioned, the PEG-TA system is widely used in experiments to coat gold surfaces. There is thus high interest in this system, but little is yet known about its response to external mechanical stress.
Fig. 2 (a) Illustration of the system studied using AIMD simulations, showing the Au slab representing the metallic surface, a model of the PEG-TA conjugate adsorbed on it, and the solvation environment described explicitly at the atomistic level. An OH⁻ ion is highlighted using larger spheres. (b) Molecular structure of the PEG-TA conjugate. (c) Scheme showing possible reaction pathways to be explored in the presence of OH⁻. The thick arrow indicates the external force, explicitly included in the AIMD simulations, which acts on the carbon of the terminal methyl group of the molecule and is directed perpendicular to the metallic surface
On one hand, it can be seen that aliphatic thiolates have a higher desorption energy than the thiophenolates. Among the aromatic derivatives, there is no definite trend with respect to the nature of the substituent. This was also observed for p-substituted thiophenolates adsorbed on gold surfaces [55] and can be explained in terms of the nature of the S–Au bonding interactions. Although the Cl-Ph-S and F-Ph-S derivatives display a mild decrease in Edes, this could also be attributed to steric effects from the ortho substituents.
Interestingly, on flat Au(111) all aromatic thiophenolates display a similar mechanical desorption mechanism regardless of the nature of the substituent, which differs notably from that of the aliphatic derivatives. For the latter, the breakage always occurs at an Au–Au bond, with an Frup of 1.6 nN, whereas for the aromatic disubstituted molecules the final rupture takes place at an S–Au bond, with Frup values of 2.2 nN. Thus, the aliphatic thiolates display a higher thermal desorption energy but are mechanically detached with an Frup that is about 27 % lower than that of the aromatic thiolates (equivalently, the aromatic Frup is about 37 % higher).
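The relative difference between two rupture forces depends on which value is taken as the reference. A minimal sketch using only the Frup values quoted in the text (the helper name is illustrative):

```python
# Rupture forces (nN) quoted in the text
F_ALIPHATIC = 1.6         # Au–Au rupture, flat Au(111)
F_AROMATIC_FLAT = 2.2     # S–Au rupture, flat Au(111)
F_AROMATIC_VACANCY = 1.8  # Au–Au rupture, vacancy at the adsorption site


def percent_change(value, reference):
    """Relative change of `value` with respect to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


print(f"aliphatic vs aromatic (flat): {percent_change(F_ALIPHATIC, F_AROMATIC_FLAT):+.1f} %")
print(f"aromatic vs aliphatic (flat): {percent_change(F_AROMATIC_FLAT, F_ALIPHATIC):+.1f} %")
print(f"aromatic, vacancy vs flat:    {percent_change(F_AROMATIC_VACANCY, F_AROMATIC_FLAT):+.1f} %")
```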
To determine whether this detachment scenario depends on the detailed structure of the S–Au junction, additional computational pulling experiments were carried out in which Bu-S and H-Ph-S were swapped, at an early stage, into each other's mechanical desorption pathway. This is illustrated in Fig. 3. As can be seen,
Fig. 3 Illustrative scheme of the two different mechanical detachment pathways observed for aliphatic (Pathway A, top) and aromatic (Pathway B, bottom) thiolates on flat Au(111). Only selected key structures along each pathway are shown. In each case, the S–Au junction of the second relevant structure along the pathway is kept and the carbon skeleton of the molecule is exchanged. Illustrative structures of the mechanical detachment pathways originating from these starting points are shown
the aliphatic molecule detaches via an Au–Au rupture, whereas the aromatic thiolate does so through the breakage of an S–Au bond, regardless of the detailed structure of the Au–S junction. Several parameters derived from the electronic structure, such as charges and bond orders, correlate well with these observations.
Finally, the presence of a vacancy defect at the adsorption site provides less coordinated Au atoms, to which the molecule attaches more strongly, as reflected in the Edes values in Table 1. This does not change the mechanical detachment scenario for the aliphatic thiolates, but it does for the thiophenolates. The aromatic derivatives now display an Au–Au breakage with an Frup of 1.8 nN, which means that a stronger adsorption interaction leads to a lower mechanical stability.
We are currently performing further analyses to gain a deeper understanding of this phenomenon, and we expect these results to be relevant for the design of such hybrid molecule/metal interfaces.
To define the minimum model for the PEG-TA conjugate, we have studied the properties under stress of a molecule composed of thioctic acid with two ethylene glycol units appended (LONG). This is the largest molecular system we consider feasible to treat with AIMD in a fully solvated simulation box. We have also determined the mechanical properties of two shorter models of the PEG-TA conjugate: one with one ethylene glycol unit fewer (SHORT-1), and another using a modified version of thioctic acid with two CH2 units fewer (SHORT-2). The comparison of the mechanical properties of the three proposed molecular models is shown in Fig. 4.
As can be observed, most of the relevant mechanical features displayed by the largest model are retained by the model labeled SHORT-2. This will be the molecular model chosen to represent the PEG-TA conjugate in our AIMD simulations.
In order to study the possible adsorption structures of PEG-TA on gold surfaces, many different adsorption sites and geometries have been probed using the cyclic portion of the thioctic acid molecule as a model. To account for different bonding scenarios, three types of surfaces were considered: flat Au(111) and two types of point defects, a vacancy and an adatom. The most stable structures found for each type of surface are shown in Fig. 5. The corresponding desorption energies to give the cyclic disulfide are 1.27 eV for the flat Au(111) surface, 1.76 eV for the surface with a vacancy and 1.46 eV for the surface with a gold adatom.
We then studied the mechanical detachment of the previously determined minimum PEG-TA model adsorbed on the Au(111) surface with an adatom. The results are shown in Fig. 6a. For comparison purposes, we have also included the results
Fig. 4 Mechanical properties of the proposed molecular models for PEG-TA as a function of the applied external force. Left panel, from top to bottom: stretching coordinate q (distance between the atoms to which the force is applied); C(H2)–O–C(H3) angle; and C–C–C(=O)–N torsion angle. The molecular structures are illustrated in the right panel. The arrows indicate the atoms to which the external force is applied
Fig. 5 Most stable structures found for the adsorption of the cyclic portion of thioctic acid on different gold surfaces. From left to right: adsorbates on flat Au(111), on a surface with a vacancy and on a surface with an adatom
Fig. 6 Pathway of mechanical desorption of (a) the PEG-TA model on a gold surface with an adatom, and (b) the cyclic moiety of TA on a gold surface with a vacancy. In both cases the following are plotted as a function of the stretching parameter D: total electronic energy along the mechanical desorption pathway (filled black circles, left axis) and force versus distance curves for regions of elastic deformation (solid red lines, right axis); the connecting broken red lines are merely guides to the eye through discontinuous plastic deformation events. The filled red circle indicates the Frup value. Some relevant structures along each stretching pathway are shown at the top of each plot
Two points are worth noting. First, the mechanical stretching pathways do not differ much between these two defective surfaces. Second, as expected, the PEG chain does not greatly influence the mechanical stability of the molecule–metal junction. The main differences appear at the initial stages of the stretching of the PEG-TA model, which correspond to the unfolding of its soft dihedral degrees of freedom. Most noticeably, at the last stage just before the final breakage, the detailed geometry of the molecule–metal contact is very similar in both cases: the molecule is bonded to the surface by one sulfur atom, with one metal atom complexed by both sulfur atoms in a tweezer-like arrangement. Further stretching leads to the detachment of the final product, a cyclic complex with one gold atom extracted from the surface, through an S–Au bond rupture with very similar rupture forces, Frup: 2.05 nN for the cyclic moiety initially adsorbed on the surface with a vacancy, and 1.99 nN for the PEG-TA model on the surface with an adatom.
From the evolution of the atomic charges during the stretching of the PEG-TA model, we could determine that the external force has the largest impact on the
sulfur atoms. These atoms become more positive as the molecule–metal junction is stretched during the first stages, making them prone to attack by a nucleophilic species such as OH⁻(aq). Because the attachment point of the chain to which the force is applied is not symmetric with respect to the two S–Au bonds, the external stress can be expected not to be transduced equally to both thiolate–gold linkages. We therefore examined the effect of the external force on the free energy pathway for the attack of OH⁻(aq) on both sulfur sites.
For this purpose, our starting point was a pre-stretched PEG-TA structure on gold, obtained from our preliminary “in vacuo” study. After solvating the selected structure with water and one OH⁻ impurity and equilibrating it, we proceeded with AIMD simulations at constant force. We started from a somewhat lower force than the one used to pre-stretch PEG-TA and then increased the constant force stepwise. Using this procedure, we have so far covered a range of 1.2–2.6 nN. Simulations at higher forces are still running, since we have not yet observed detachment of the molecule from the surface.
Notably, at a force of 2.2 nN a structural change at the gold–thiolate interface is observed, as illustrated in Fig. 7. At low forces, both anchoring sulfur atoms are bonded to two gold atoms, one of which is shared by both S atoms. As the force increases, the S–Au bond between the S atom labeled S1 in Fig. 7 and the central Au atom becomes notably elongated.
This observation suggests that 2.2 nN could be a threshold beyond which the relative reactivity of the anchoring S atoms may change dramatically. We then proceeded to assess this hypothesis by determining the free
Table 2 Activation free energies (ΔA‡) for the attack of OH⁻ on S1 and S2 at F = 1.2 nN and F = 2.0 nN

Force (nN)   ΔA‡(S1) (kcal/mol)   ΔA‡(S2) (kcal/mol)
1.2          29.5                 19.7
2.0          27.2                 18.0
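To put the barriers of Table 2 in perspective, a rough transition-state-theory estimate can be sketched. It assumes identical prefactors for both sites, a hypothetical temperature of 300 K (not stated in this section), and a linear (Bell-type) dependence of the barrier on force between the two tabulated forces; it is an illustration under those assumptions, not a result of the study.

```python
import math

R_KCAL = 1.987204e-3             # gas constant, kcal/(mol K)
ANG_PER_KCAL_MOL_NN = 0.0694770  # (1 kcal/mol) / (1 nN) expressed in Å

# Activation free energies (kcal/mol) from Table 2, keyed by constant force (nN)
TABLE2 = {1.2: {"S1": 29.5, "S2": 19.7}, 2.0: {"S1": 27.2, "S2": 18.0}}


def rate_ratio(barrier_a, barrier_b, temperature=300.0):
    """TST estimate of k_a / k_b, assuming identical prefactors (hypothetical T = 300 K)."""
    return math.exp((barrier_b - barrier_a) / (R_KCAL * temperature))


def apparent_activation_length(site, f_low=1.2, f_high=2.0):
    """Bell-type slope -d(barrier)/dF between two forces, in Å (linear assumption)."""
    slope = (TABLE2[f_low][site] - TABLE2[f_high][site]) / (f_high - f_low)
    return slope * ANG_PER_KCAL_MOL_NN


for force, barriers in TABLE2.items():
    print(f"F = {force} nN: k(S2)/k(S1) ~ {rate_ratio(barriers['S2'], barriers['S1']):.1e}")
for site in ("S1", "S2"):
    print(f"{site}: apparent activation length ~ {apparent_activation_length(site):.2f} Å")
```

Under these assumptions attack at S2 remains strongly favoured at both forces, while the barrier at S1 responds somewhat more steeply to the applied force.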
Acknowledgements Partial financial support is provided by the DFG Koselleck Grant “Under-
standing Mechanochemistry” to D.M. We wish to thank Przemyslaw Dopieralski and Martin
Krupička for their contributions to this work.
38. Krüger, D., Fuchs, H., Rousseau, R., Marx, D., Parrinello, M.: J. Chem. Phys. 2001(115),
39. Keel, J.M., Yin, J., Guo, Q., Palmer, R.E.: J. Chem. Phys. 2002(116), 7151–7157
40. Krüger, D., Fuchs, H., Rousseau, R., Marx, D., Parrinello, M.: Phys. Rev. Lett. 2002(89),
41. Krüger, D., Rousseau, R., Fuchs, H., Marx, D.: Angew. Chem. Int. Ed. 2003(42), 2251–2253
42. Xu, B., Tao, N.J.: Science 2003(301), 1221–1223
43. Konôpka, M., Rousseau, R., Štich, I., Marx, D.: J. Am. Chem. Soc. 2004(126), 12103–12111
44. Chen, F., Zhou, A., Yang, H.: Appl. Surface Sci. 2009(255), 6832–6839
45. Seema, P., Behler, J., Marx, D.: Phys. Chem. Chem. Phys. 2013(15), 16001–16011
46. Xue, Y., Li, X., Li, H., Zhang, W.: Nat. Commun. 2014, 5 (2014)
47. Seema, P., Behler, J., Marx, D.: Phys. Rev. Lett. 2015(115), 036102
48. Langry, K.C., Ratto, T.V., Rudd, R.E., McElfresh, M.W.: Langmuir 2005(21), 12064–12067
49. Zoloff Michoff, M.E., Ribas-Arino, J., Marx, D.: Phys. Rev. Lett. 2015(114), 075501
50. Pei, Y., Zeng, X.C.: Nanoscale 2012(4), 4054–4072
51. Dopieralski, P., Ribas-Arino, J., Marx, D.: Angew. Chem. Int. Ed. 2011(50), 7105–7108
52. Wollenhaupt, M., Krupička, M., Marx, D.: ChemPhysChem 2015(16), 1565–1565
53. Dopieralski, P., Ribas-Arino, J., Anjukandi, P., Krupička, M., Kiss, J., Marx, D.: Nat. Chem.
2013(5), 685–691
54. Dopieralski, P., Ribas-Arino, J., Anjukandi, P., Krupička, M., Marx, D.: Angew. Chem. Int. Ed.
2015(55), 1304–1308
55. Miranda-Rojas, S., Muñoz Castro, A., Arratia-Pérez, R., Mendizábal, F.: Phys. Chem. Chem.
Phys. 2013(15), 20363–20370
56. Giannozzi, P., et al.: J. Phys. Condens. Matter 2009(21), 395502
57. Car, R., Parrinello, M.: Phys. Rev. Lett. 1985(55), 2471–2474
58. Marx, D., Hutter, J.: Ab Initio Molecular Dynamics. Cambridge University Press, Cambridge
59. Raiteri, P., Laio, A., Gervasio, F.L., Micheletti, C., Parrinello, M.: J. Phys. Chem. B 2006(110),
Microscopic Insights into the Fluorite/Water
Interfaces from Vibrational Sum Frequency
Generation Spectroscopy
1 Introduction
fluorite/water interface, not only as a function of pH but also as a function of the ion concentration in solution, addressing fluorite/water interfaces with saturated and supersaturated solutions. At high pH, surface adsorbates are detected and attributed to calcium hydroxo complexes [12]. At low pH, atomic-scale disorder was observed, which could be attributed either to partial dissolution of the topmost layer through the creation of F⁻ vacancies or to proton adsorption at the interface. However, the experiments do not yet seem able to distinguish between these two scenarios [12].
As another surface-sensitive technique, Vibrational Sum Frequency Generation Spectroscopy (VSFG) can selectively address the nanometric interfacial water layer, and it has indeed contributed substantially to our understanding of the physical and chemical properties of the CaF2/water interface [2, 3]. VSFG is rather unique in its ability to provide the vibrational spectrum of water molecules specifically at the interface, since the VSFG selection rule requires broken inversion symmetry, i.e. no VSFG signal can be generated from the adjacent centrosymmetric bulk. Previous VSFG investigations of water at the CaF2/water interface by the Richmond group [2, 3] revealed dramatic changes in the interfacial hydrogen-bonding structure upon changing the pH of the aqueous phase. In particular, at low pH the VSFG experiments suggested that positive charge develops on the surface, orienting water molecules into highly ordered, tetrahedrally coordinated states. At near-neutral pH, the VSFG signal vanishes, which has been interpreted as the result of a more random orientation of the interfacial water molecules at a near-neutral surface. Finally, in the basic pH regime, dissociative adsorption was hypothesised to take place on the solid surface, resulting in the formation of Ca-OH species. Open questions remain: how do these OH groups contribute to the VSFG spectrum? What type of order is established in the interfacial water region?
Here we review a recent simulation study aimed at answering these questions and at providing a new microscopic understanding of the CaF2/water interface as a function of pH [11]. We explore the effect of surface termination on the interfacial water arrangement, and we show how strongly the local electric field generated by ions in solution near the surface affects water orientation. Such a detailed analysis is now possible thanks to recent advances in computational techniques. In particular, we use Density Functional Theory (DFT)-based molecular dynamics (MD) simulations, which provide an accurate description of the structure and dynamics of hydrogen bonding in highly heterogeneous environments, including electronic polarisation. A newly developed approach is used to calculate the VSFG spectra [11]; it requires only the atomic positions and velocities, avoiding the cost of additionally calculating molecular dipoles and polarizabilities. At the same time, the appropriate VSFG selection rules are taken into account. The spectra are calculated using velocity-velocity correlation functions (VVCF) over time scales of several hundred picoseconds. This is possible thanks to the use
Microscopic Insights into the Fluorite/Water Interfaces from VSFG Spectroscopy 133
of massively parallel architectures, such as the Cray XE6 (Hermit, HLRS) and the Cray XC40 (Hazel Hen, HLRS) used in the present work. This allows us to build several models, each including about 500 atoms, which span e.g. different surface
2 Methodology
Several models are used to describe the fluorite/water interface over a wide range of pH. The reference system, an interface between CaF2(111) and water at neutral pH, is composed of 88 water molecules and 60 formula units of CaF2 contained in an 11.59 × 13.38 × 34.0 Å³ cell periodically repeated in the (X, Y, Z) directions. All the other models have similar compositions and sizes to allow inter-system comparisons. The thickness of the water slabs is around 20 Å along the z-axis, which is a reasonable compromise between the need to achieve bulk-like properties far from the surface and the computational cost. Simulations were carried out with the CP2K/Quickstep package [25], using Born-Oppenheimer MD (BOMD) with the BLYP functional [1, 13], Grimme's D3 correction for dispersion [7], GTH pseudopotentials [6, 8], and a combined plane-wave (280 Ry density cutoff) and TZV2P basis set representation. All BOMD simulations are performed in the NVT ensemble, with a Nosé-Hoover thermostat controlling the average temperature at 330 K. Trajectories are accumulated for at least 50 ps (of which 10 ps are equilibration) with a time step of 0.5 fs.
The starting equation for calculating the VSFG response function from molecular dynamics simulations was introduced by Morita [9, 15–17]:
$$\chi^{(2),R}_{PQR} = \frac{i}{k_B T\,\omega}\int_0^{+\infty}\mathrm{d}t\; e^{i\omega t}\,\big\langle \dot{A}_{PQ}(t)\,\dot{M}_R(0)\big\rangle \qquad (1)$$
Here $\chi^{(2),R}$ is the resonant part of the second-order susceptibility tensor, $(P, Q, R)$ are any directions of the laboratory frame $(\hat{X}, \hat{Y}, \hat{Z})$, $\omega$ is the frequency of the IR beam, $A_{PQ}$ and $M_R$ are, respectively, the components of the total polarizability tensor and of the total dipole moment, and the dot denotes the time derivative.
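Eq. (1) amounts to a one-sided Fourier transform of a cross-correlation function. A minimal NumPy sketch, up to constant prefactors (k_B set to 1, arbitrary units): the correlation ⟨Ȧ_PQ(t) Ṁ_R(0)⟩ is averaged over time origins, multiplied by an exponential damping window (an assumption commonly used to tame truncation ripples; the windowing used in the study is not described here), and integrated with a simple rectangle rule. Function names and the synthetic input are hypothetical.

```python
import numpy as np

C_CM_PER_FS = 2.99792458e-5  # speed of light in cm/fs


def time_correlation(a, b, max_lag):
    """C(t) = <a(t) b(0)>, averaged over all available time origins."""
    n = len(a)
    return np.array([np.dot(a[lag:], b[:n - lag]) / (n - lag) for lag in range(max_lag)])


def chi2_resonant(a_dot, m_dot, dt_fs, temperature, wavenumbers, max_lag, tau_fs):
    """Resonant chi^(2)_PQR of Eq. (1) up to constant prefactors (k_B = 1).

    a_dot, m_dot : time series of dA_PQ/dt and dM_R/dt sampled every dt_fs
    tau_fs       : exponential damping window applied to C(t) (assumption)
    """
    t = np.arange(max_lag) * dt_fs
    corr = time_correlation(a_dot, m_dot, max_lag) * np.exp(-t / tau_fs)
    omega = 2.0 * np.pi * C_CM_PER_FS * np.asarray(wavenumbers)   # rad/fs
    integral = np.exp(1j * np.outer(omega, t)) @ corr * dt_fs       # rectangle rule
    return 1j * integral / (temperature * omega)


# Usage with a synthetic single-frequency signal (not simulation data)
dt = 0.5                                            # fs, the MD time step quoted above
t = np.arange(20_000) * dt
signal = np.cos(2.0 * np.pi * C_CM_PER_FS * 3400.0 * t)
wn = np.arange(2800.0, 4000.0, 5.0)                 # cm^-1
chi = chi2_resonant(signal, signal, dt, 330.0, wn, max_lag=2000, tau_fs=200.0)
print(wn[np.argmax(np.abs(chi) ** 2)])
```

Fed with a synthetic 3400 cm⁻¹ cosine, |χ(2)|² peaks at that wavenumber and Im χ(2) is positive there; with real data, the inputs would be the bond sums of Eq. (2) accumulated along a trajectory.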
134 R. Khatib and M. Sulpizi
If we assume that, at the frequencies of interest, only the O-H stretching contributes to the spectra, the total polarizability and dipole moment of the system ($A_{PQ}$, $M_R$) can be decomposed into individual O-H bond contributions ($\alpha_{mn,PQ}$, $\mu_{mn,R}$), where the sum runs over all the $N_m$ bonds of the $M$ molecules:

$$\begin{cases} \dot{A}_{PQ}(t) = \displaystyle\sum_{m=1}^{M}\sum_{n=1}^{N_m} \dot{\alpha}_{mn,PQ}(t) \\[4pt] \dot{M}_{R}(t) = \displaystyle\sum_{m=1}^{M}\sum_{n=1}^{N_m} \dot{\mu}_{mn,R}(t) \end{cases} \qquad (2)$$
Moreover, from basic geometric considerations, the dipole moment of the A-B bond can be transformed from the molecular (bond) frame ($\mu^{b}$) to the laboratory frame ($\mu^{l}$):

$$\mu^{l} = D\,\mu^{b} \qquad (3)$$
where $D$ is the direction cosine matrix projecting the bond frame onto the laboratory frame. In the following, we assume that (1) the bond elongations are small enough for a first-order Taylor expansion and (2) the stretching mode of the bond is much faster than the modes involving bond reorientation, for example libration. The second assumption means that $\dot{D}_{Ri} \approx 0$ and that $\frac{\mathrm{d}r_z}{\mathrm{d}t} \gg \frac{\mathrm{d}r_x}{\mathrm{d}t}, \frac{\mathrm{d}r_y}{\mathrm{d}t}$.
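Therefore, for a single bond, $\dot{\mu}_R$ can be simplified into:

$$\dot{\mu}_R(0) \approx \sum_{i}^{x,y,z} D_{Ri}(0)\,\dot{\mu}_i(0) = \sum_{i}^{x,y,z} D_{Ri}(0)\left(\sum_{j}^{x,y,z} \frac{\partial \mu_i}{\partial r_j}\left.\frac{\mathrm{d}r_j}{\mathrm{d}t}\right|_{t=0}\right) \approx \sum_{i}^{x,y,z} D_{Ri}(0)\,\frac{\partial \mu_i}{\partial r_z}\,v_z(0) \qquad (4)$$

where $v_z(0) = \left.\frac{\mathrm{d}r_z}{\mathrm{d}t}\right|_{t=0}$ corresponds to the projection of the velocity on the bond axis.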
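With the same methodology for the polarizability, one deduces that:

$$\dot{\alpha}_{PQ}(t) \approx \left[\sum_{i}^{x,y,z}\sum_{j}^{x,y,z} D_{Pi}(t)\,D_{Qj}(t)\,\frac{\partial \alpha_{ij}}{\partial r_z}\right] v_z(t) \qquad (5)$$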
Table 1 Calculated derivatives of the dipole moment (D·Å⁻¹) and polarizability (Å²) of the O-H bond in bulk water, in hydronium and in the CaFOH monomer. The results are given within the bond frame

        ∂μx/∂r  ∂μy/∂r  ∂μz/∂r  ∂αxx/∂r  ∂αyy/∂r  ∂αzz/∂r  ∂αxy/∂r  ∂αxz/∂r  ∂αyz/∂r
H2O     0.15    0.0     2.1     0.40     0.53     1.56     0.0      0.02     0.0
H3O+    0.11    0.0     1.7     0.47     0.40     1.50     0.0      0.0      0.0
HO      0.0     0.0     1.6     0.5      0.5      2.3      0.0      0.0      0.0
Substituting Eqs. (4) and (5) into Eq. (2) brings important computational advantages. Indeed, the velocities and the direction cosine matrix ($v_z$, $D$) can be readily obtained from the DFT-MD trajectories, while $\partial \alpha_{ij}/\partial r_z$ and $\partial \mu_i/\partial r_z$ can be parametrized [4]. Our approach avoids the additional direct calculation of the bond dipole moments and polarizabilities which, at an ab initio level, requires a considerable additional computational cost, e.g. the cost of the Wannier centre localisation [23]. Finally, with the splitting of the dipole moment and polarizability into bond contributions, it is easy to decompose the signal into its auto-, intramolecular and intermolecular parts.
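Eqs. (4) and (5) reduce each bond's contribution to a rotation of fixed bond-frame derivatives followed by multiplication by v_z. A NumPy sketch for one water O-H bond, using the bulk-water values of Table 1; the bond-frame construction (x axis in the plane of a reference atom) and all positions and velocities are illustrative assumptions, not the conventions of the study.

```python
import numpy as np

# Bond-frame derivatives for the O-H bond of bulk water, values as listed in Table 1
# (z is the bond axis)
DMU_DR = np.array([0.15, 0.0, 2.1])            # dmu_i/dr_z, D/Å
DALPHA_DR = np.array([[0.40, 0.0, 0.02],
                      [0.0, 0.53, 0.0],
                      [0.02, 0.0, 1.56]])      # dalpha_ij/dr_z, Å^2


def bond_frame(o_pos, h_pos, ref_pos):
    """Direction-cosine matrix D: columns are the bond-frame axes in lab coordinates.

    z points from O to H; x lies in the plane containing ref_pos (hypothetical choice,
    e.g. the other H of the molecule); y = z x x completes a right-handed frame.
    """
    z = (h_pos - o_pos) / np.linalg.norm(h_pos - o_pos)
    x = (ref_pos - o_pos) - np.dot(ref_pos - o_pos, z) * z
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    return np.column_stack((x, y, z))


def bond_velocity(v_o, v_h, D):
    """v_z: relative H-O velocity projected on the bond axis (third column of D)."""
    return float(np.dot(v_h - v_o, D[:, 2]))


def lab_frame_rates(D, v_z):
    """Eqs. (4)-(5): lab-frame time derivatives of one bond's dipole and polarizability."""
    mu_dot = D @ DMU_DR * v_z                 # sum_i D_Ri dmu_i/dr_z v_z
    alpha_dot = D @ DALPHA_DR @ D.T * v_z     # sum_ij D_Pi D_Qj dalpha_ij/dr_z v_z
    return mu_dot, alpha_dot


# Usage with made-up positions (Å) and velocities (Å/fs) for a single water molecule
o, h1, h2 = np.array([0.0, 0.0, 0.0]), np.array([0.97, 0.0, 0.0]), np.array([-0.24, 0.94, 0.0])
D = bond_frame(o, h1, h2)
v_z = bond_velocity(np.zeros(3), np.array([0.02, 0.0, 0.0]), D)
mu_dot, alpha_dot = lab_frame_rates(D, v_z)
print(mu_dot[2], alpha_dot[0, 0])  # this bond's contributions to dM_Z/dt and dA_XX/dt
```

Summing such per-bond terms over all bonds gives the Ȧ_XX and Ṁ_Z series entering the XXZ correlation function.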
The parametrization of $\partial \alpha_{ij}/\partial r_z$ and $\partial \mu_i/\partial r_z$ is based on the calculation of maximally localised Wannier functions (MLWF) [14] and follows the methodology developed by Salanne et al. [21]. The values are obtained by 2-point numerical differentiation: a single O-H bond is elongated by ±0.02 Å. For the O-H bond of water molecules, a trajectory of 128 H2O in a cubic box (c = 15.6404 Å) was simulated, and the average involves more than 4000 bonds distributed over 40 ps of dynamics. One formula unit of HCl was added to the same box in order to sample the O-H bond of the hydronium ion in the same way. Finally, for the O-H bond of the grafted hydroxide ions, the derivatives are those obtained for a linear CaFOH monomer. All these values are summarized in Table 1.
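The 2-point (central) differentiation and the averaging over sampled bonds can be sketched as follows. In the actual workflow each elongated geometry requires a Wannier-function calculation; here hypothetical analytic dipole models stand in for it, and the bond length and slopes are made up.

```python
def two_point_derivative(f, r0, h=0.02):
    """Central (2-point) finite difference of f at r0, using r0 + h and r0 - h (Å)."""
    return (f(r0 + h) - f(r0 - h)) / (2.0 * h)


def mean_bond_derivative(models, r0, h=0.02):
    """Average the finite-difference derivative over many sampled bonds."""
    values = [two_point_derivative(f, r0, h) for f in models]
    return sum(values) / len(values)


# Hypothetical quadratic bond-dipole models mu_z(r) for three bond environments
R_OH = 0.97  # Å, illustrative O-H bond length
models = [lambda r, s=s: 1.8 + s * (r - R_OH) + 0.5 * (r - R_OH) ** 2 for s in (2.0, 2.1, 2.2)]
print(f"{mean_bond_derivative(models, R_OH):.3f} D/Å")
```

The central difference is exact for quadratic models, so any residual error in practice comes from higher-order anharmonicity and from the sampling of bond environments.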
$$(\mathrm{CaF_2})_{\mathrm{surf}} + \mathrm{H^{+}_{aq}} \rightarrow (\mathrm{CaF^{+}})_{\mathrm{surf}} + \mathrm{HF_{aq}} \qquad (6)$$
Fluoride ions dissolving into the water solution leave positively charged vacancies on the surface, which are responsible for the alignment of the water molecules. As the VSFG signal increases with increasing interfacial order in the system, a large VSFG
Fig. 1 (a) Random snapshot of the system used to describe the CaF2/H2O interface at neutral pH. Miniatures highlight the differences between neutral pH and (b) low pH with excess protons in the form of dissociated HCl, (c) low pH with partial dissolution of fluoride ions, and (d) high pH with 6 substitutions of fluorides by hydroxides per surface. In (b–d), the water molecules are transparent in order to highlight the positions of the ions. Hydrogen atoms are coloured white, oxygen red, fluorine pink, chlorine green and calcium turquoise
signal is detected [2, 3]. For low pH, model systems resembling the final equilibrium state can be built with various concentrations of fluoride vacancies on the surface, corresponding to different extents of positive surface charge (Fig. 1). In particular, our model consists of a CaF2 slab in contact with water, with two equivalent interfaces present. Fluoride counterions are added to the solution to compensate the positive surface charge, i.e. to obtain an overall neutral system. We find that the F⁻ ions prefer to be solvated by water and form a diffuse layer in the near-surface region. Overall, the surface-localised positive charge and the near-surface negative counterions generate a double layer, giving rise to a rather strong electric field at the solid/liquid interface. We have considered more extreme conditions with 2.58 vacancies nm⁻² (4 vacancies on each surface) and milder conditions with 1.29 vacancies nm⁻² (2 vacancies) or 0.64 vacancies nm⁻² (1 vacancy).
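The quoted surface densities follow from the lateral cell dimensions given in the Methodology section, assuming that the low-pH models share the lateral size of the reference cell (the text states only that all models have close sizes). A minimal sketch; the helper name is illustrative:

```python
# Lateral dimensions (Å) of the reference cell quoted in the Methodology section
CELL_X, CELL_Y = 11.59, 13.38


def areal_density(n_defects, cell_x=CELL_X, cell_y=CELL_Y):
    """Defects per nm^2 on one surface of the periodic slab (1 nm^2 = 100 Å^2)."""
    return n_defects / (cell_x * cell_y / 100.0)


for n in (1, 2, 4):
    print(f"{n} vacancies per surface -> {areal_density(n):.2f} nm^-2")
```

The same helper gives 3.87 nm⁻² for 6 defects per surface, the value quoted for the high pH models with 6 OH substitutions.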
At high pH, the excess hydroxide ions are expected to react with the CaF2 surface, leading to the following substitution:
The Ca-OH groups on the surface have been suggested to be responsible for the narrow band at 3645 cm⁻¹ [2, 3]. For high pH, we have constructed a model in which a surface modification of CaF2 has taken place in response to the increased concentration of OH⁻ in the solution. In the topmost fluorite layer, F⁻ ions were partially or totally replaced by OH⁻ (Fig. 1). Different OH⁻ concentrations have been considered in order to establish a relation between the VSFG signal intensity and the pH: 1, 6 and 12 substitutions out of the 12 available sites per surface.
Using the described models, we have calculated the spectral responses from the surface-sensitive vibrational density of states using surface-specific VVCF (see the Methodology section for details) for the XXZ polarization (the indices of χ(2) will be
In the case of low pH, the spectra for the different vacancy densities are reported in the top row of Fig. 2. The common feature for all concentrations of surface vacancies is a broad negative band in the Im χ(2) spectrum, which, for the 1 and 2 vacancy systems, is located around 3300 cm⁻¹. As the charge concentration increases to 4 positive charges, the intensity of the band increases and the band moves towards lower frequencies, with a maximum at 3100 cm⁻¹. Comparing the calculated spectra with the experimental ones [11], such a strong red shift for the 4 vacancy system is not consistent with the experiment. Better agreement is found for the 1 and 2 vacancies
Fig. 2 Comparison of Im χ(2), Re χ(2) and |χ(2)|² for different values of the surface defect concentration (solid lines). Top panels: low pH. Bottom panels: high pH. To facilitate the comparison, the spectra with 2 HCl per surface are plotted as dotted lines together with the spectra with 2 vacancies per surface
Fig. 3 Re χ(2), Im χ(2) and |χ(2)|² obtained from simulations (blue, red and black, respectively). Low pH (1+), neutral (111) and high pH (6 OH) systems are considered
Fig. 4 Im χ(2) (top) and Re χ(2) (bottom) as a function of the layer thickness included in the calculation. Left panels: low pH (1 defect per surface); right panels: high pH (6 substitutions per surface)
Fig. 5 Density profiles of H3O+ and Cl⁻ along the Z-axis. As a guide to the eye, the position of the CaF2 interface is represented by a dashed grey line
model without fluoride vacancies, but instead with an excess of protons in the form of dissociated HCl (4 HCl, 2.5 M solution). Such a system is shown in Fig. 1b and corresponds to 2 excess protons per surface. The proton distribution at the interface is reported in Fig. 5.
The calculated VSFG spectra for this system are shown in the last panel of the top row of Fig. 2. The first striking result is that the overall signal is much weaker than that obtained for the model with two fluoride vacancies per surface, which exhibits the same overall positive charge at the interface. Moreover, the main peak in Im χ(2) is located at 3500–3600 cm⁻¹, quite far from the peak location in the experimental spectra. This analysis suggests that the excess protons alone cannot be responsible for the measured spectra, which instead originate from the water aligned by the positively charged fluoride vacancies.
Let’s now move to the analysis of the high pH conditions. The imaginary and
real parts of the VSFG spectrum, together with the intensity spectrum calculated
from the surface-selective VVCF analysis, are presented in the bottom row of Fig. 2
for the three different OH concentrations on the surface. For 1 and
6 OH substitutions, two main features can be observed in the imaginary part: the first
is a positive band between 3280 and 3400 cm⁻¹, the second is a negative feature
between 3400 and 3700 cm⁻¹. In the case of 12 OH substitutions, the overall profile
of $\mathrm{Im}\,\chi^{(2)}$ is very different, with a broad negative band extending up to 3200 cm⁻¹,
where a crossing to positive values is finally observed. The real part and the intensity
spectrum have a very high intensity below 3600 cm⁻¹ (Fig. 3), which is not present
in the experiment [11]. The best agreement between calculated and experimental
spectra is found for the models with 1 or 6 OH substitutions. From this we can
set, for the experimental pH = 13, an upper limit of 6 OH substitutions per surface,
corresponding to 3.87 substitutions nm⁻².
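The two numbers quoted for the 6-OH model fix the area of one slab face; back-computing it (an inferred value, not stated in the text) also gives the areal densities of the 1-OH and 12-OH models:

```latex
\[
A = \frac{6}{3.87\ \mathrm{nm^{-2}}} \approx 1.55\ \mathrm{nm^{2}}, \qquad
\frac{1}{A} = \frac{3.87}{6}\ \mathrm{nm^{-2}} = 0.645\ \mathrm{nm^{-2}}, \qquad
\frac{12}{A} = 2 \times 3.87\ \mathrm{nm^{-2}} = 7.74\ \mathrm{nm^{-2}}
\]
```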
As for the low pH, for the high pH conditions we can also decompose
the overall signal into molecular contributions, thus providing a microscopic
interpretation of the experimental spectra. In particular, the peak between 3600 and
3700 cm⁻¹ is associated only with the OH groups on the surface, namely those
OH groups which replace F in the topmost layer, as is clear from the purple
spectrum in the bottom panel of Fig. 4. This frequency is very close to that of the
“free OH” [20, 26]; indeed, such an OH group does not form any hydrogen bond
with water. This is clearly shown by the radial distribution function of the Ca-OH
hydrogen with the water oxygens: the distance between the proton of the Ca-OH and
the oxygen of water (red curve, Fig. 6) is much larger than the distance between
the proton of one water molecule and the oxygen of the neighbouring water molecule
(black curve, Fig. 6). The presence of a “free OH” signal at solid/liquid interfaces
is not uncommon. A similar high-frequency peak has also been observed for
the alumina/water interface [24], where no hydrogen bond is formed between the
surface OH groups and the water molecules.
In addition to the “free OH” peak, the high pH spectra, also present a band
between 3280 and 3400 cm1 , which is instead associated with hydrogen bonded
water molecules at the interface. These hydrogen bonded waters have an opposite
orientation with respect to that of the OH groups, as evident from the opposite sign
of Im.2/ for the two different peaks. The water ordering is not very pronounced
and saturates with a distance of 2 Å (Fig. 4).
Finally, let’s briefly comment on the neutral pH conditions. The neutral pH model
is given by a fluorine terminated surface in contact with neutral water (no excess of
Microscopic Insights into the Fluorite/Water Interfaces from VSFG Spectroscopy 141
4 Conclusions
Acknowledgements This work was supported by the DFG Research Grant SU 752/2-1. All the
dynamics were simulated on the supercomputers of the High Performance Computing Center
(HLRS) of Stuttgart (Grant 2DSFG).
22. Saxena, V., Ahmed, S.: Dissolution of fluoride in groundwater: a water-rock interaction study.
Environ. Geol. 40(9), 1084–1087 (2001)
23. Sulpizi, M., Salanne, M., Sprik, M., Gaigeot, M.-P.: Vibrational sum frequency generation
spectroscopy of the water liquid-vapor interface from density functional theory-based molecu-
lar dynamics simulations. J. Phys. Chem. Lett. 4(1), 83–87 (2013)
24. Tong, Y., Wirth, J., Kirsch, H., Wolf, M., Saalfrank, P., Campen, R.K.: Optically probing Al-O
and O-H vibrations to characterize water adsorption and surface reconstruction on α-alumina:
an experimental and theoretical study. J. Chem. Phys. 142(5), 054704 (2015)
25. VandeVondele, J., Krack, M., Mohamed, F., Parrinello, M., Chassaing, T., Hutter, J.: Quickstep:
fast and accurate density functional calculations using a mixed gaussian and plane waves
approach. Comput. Phys. Commun. 167(2), 103–128 (2005)
26. Walrafen, G.E., Douglas, R.T.W.: Raman spectra from very concentrated aqueous NaOH and
from wet and dry, solid, and anhydrous molten, LiOH, NaOH, and KOH. J. Chem. Phys.
124(11), 114504 (2006)
Growth, Structural and Electronic Properties
of Functional Semiconductors Studied by First Principles
1 Introduction
Many semiconductor devices, such as solar cells and logic devices (transistors), are based on silicon
(Si) crystal substrates. One potential pathway for increasing device efficiencies beyond
the so-called red brick wall (the physical limits of miniaturization and device
fabrication) is to employ optically active compound semiconductors within Si-based devices.
The device can then make use of specifically designed optoelectronic
properties that enable optical telecommunication or even nonlinear optical effects.
Because of the indirect band gap of pure Si, the functionality of conventional devices is
limited by excitation inefficiency and various loss mechanisms [1, 2].
The project reported here is part of a research program that investigates the fabrication
and properties of new semiconductor materials. One class is the III/V materials, which
comprise chemical elements from groups 13 and 15 at various relative concentrations;
this allows the electronic band gaps and the atomic structure to be adjusted for
integration in Si-based devices [3, 4].
For defect-free growth and integration, highly specific deposition techniques have
been developed [5]. The growth of thin films on the nanometer scale is possible
2.1 Methods
Total energies were determined by DFT methods applying the VASP 5.3.5 [11–14]
software with a plane-wave basis set and the projector-augmented wave (PAW)
procedure [15, 16]. The plane-wave basis was truncated at a cutoff energy of
350 eV, while electronic energies and atomic forces were converged to 10⁻⁵ eV and
10⁻² eV/Å in the electronic and structural relaxations, respectively.
The lattice constant of Si was optimized to a = 5.421 Å applying the exchange-
correlation functional by Perdew, Burke and Ernzerhof (PBE) [17, 18] and the D3
[19, 20] correction for long-range, attractive van der Waals interactions. Further
calculations were performed with the HSE06 hybrid functional [21, 22] with D3 as
indicated in the sections below.
The D3 parameters for the PBE0 hybrid functional [20] were used for HSE06
calculations while 50.2 and 21.2 Å were used as cutoffs for the interaction radius
and for the determination of coordination numbers, respectively, for all calculations.
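The convergence and D3 settings above map onto standard VASP INCAR tags (GGA, ENCUT, EDIFF, EDIFFG, IVDW, VDW_RADIUS, VDW_CNRADIUS; LHFCALC and HFSCREEN for HSE06). The sketch below only renders them as INCAR text. The choice IVDW = 11 (D3 with zero damping) is an assumption, since the text does not state the damping variant, and the PBE0-specific D3 parameters used for the HSE06 runs are not set here.

```python
def format_incar(tags):
    """Render a dict of VASP INCAR tags as 'TAG = value' lines."""
    lines = []
    for key, value in tags.items():
        if isinstance(value, bool):
            value = ".TRUE." if value else ".FALSE."
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


# Settings quoted in Sect. 2.1 for the PBE-D3 runs.
pbe_d3 = {
    "GGA": "PE",           # PBE exchange-correlation functional
    "ENCUT": 350,          # plane-wave cutoff in eV
    "EDIFF": 1e-5,         # electronic convergence in eV
    "EDIFFG": -1e-2,       # negative value: stop when all forces < 0.01 eV/Å
    "IVDW": 11,            # ASSUMPTION: D3 with zero damping
    "VDW_RADIUS": 50.2,    # D3 pair-interaction cutoff radius in Å
    "VDW_CNRADIUS": 21.2,  # cutoff for coordination numbers in Å
}
# HSE06 on top of the same settings (PBE0-type D3 parameters not set here)
hse06 = dict(pbe_d3, LHFCALC=True, HFSCREEN=0.2)

print(format_incar(hse06))
```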
Momentum space was sampled with a Γ-centered grid derived via the Monkhorst-Pack
method [23] with a (4×2×1) division for the Si(001)c(4x2) surface cell containing
four dimers. For other cell sizes, the k-mesh division was scaled according to the inverse
lattice vectors. The asymmetric slabs contain eight Si layers, with the two bottom
layers frozen at bulk positions and hydrogen saturation at the bottom.
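One reading of "scaled according to the inverse lattice vectors" is that the number of k-point divisions along each direction scales inversely with the real-space cell length, keeping the k-point density constant. A minimal sketch under that assumption; the function name and the cell lengths in the usage line are hypothetical.

```python
def scale_kmesh(ref_divisions, ref_lengths, new_lengths):
    """Scale Monkhorst-Pack divisions inversely with the real-space cell lengths.

    Keeps the k-point spacing ~ 1 / (n * L) roughly constant when the surface
    cell is enlarged or reduced; at least one division per direction.
    """
    return tuple(
        max(1, round(n * l_ref / l_new))
        for n, l_ref, l_new in zip(ref_divisions, ref_lengths, new_lengths)
    )


# hypothetical lengths (Å): doubling the cell in x and y halves the divisions
mesh = scale_kmesh((4, 2, 1), (15.0, 7.5, 40.0), (30.0, 15.0, 40.0))
```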
Model cells for Si(001)c(4x2) at different coverages are presented in Fig. 1.
The coverage $\theta$ is defined such that $\theta = 1$ corresponds to 1 hydrogen atom per surface Si atom, i.e. the
monohydride configuration H/Si(001). For lower coverages, it was assumed that each Si-Si
dimer is either fully hydrogenated or fully pristine, owing to the stabilization of the
Si(001)c(4x2) reconstruction with buckled dimers for $\theta = 0$. Thus, it can easily be
understood that both the fully covered ($\theta = 1$) and the uncovered ($\theta = 0$) Si(001) surfaces are stable.
A coverage of $\theta = 0.5$ seems to be stabilized by adjacent fully covered and uncovered
dimer rows.
For the vibrational calculations, the Phonopy 1.8.2 code [24, 25] was used. A 2×2
supercell containing 16 dimers led to converged results (PBE-D3) and was used as the
standard for phonon density of states calculations with an 8×4×1 q-mesh
in reciprocal space, centered at the Γ-point.
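The desorption free energies compared in Fig. 2(b) rest on vibrational contributions obtained from phonon frequencies. The standard harmonic expression, F = Σ [hν/2 + kT ln(1 − exp(−hν/kT))], is sketched below, together with the Einstein model as the special case of a single shared frequency. This illustrates the textbook formula only; it does not reproduce the IP or AITD approaches, and the function names are hypothetical. Frequencies are taken in THz, the unit Phonopy reports.

```python
import math

H_EV_PER_THZ = 4.135667696e-3  # Planck constant in eV per THz
KB_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K


def harmonic_free_energy(freqs_thz, temperature):
    """Harmonic vibrational free energy (eV) from mode frequencies in THz.

    F = sum_i [h*nu_i/2 + kT * ln(1 - exp(-h*nu_i / kT))], temperature > 0.
    Non-positive (imaginary) frequencies, e.g. from frozen slab layers, are skipped.
    """
    kt = KB_EV_PER_K * temperature
    total = 0.0
    for nu in freqs_thz:
        if nu <= 0.0:
            continue
        e = H_EV_PER_THZ * nu
        total += 0.5 * e + kt * math.log1p(-math.exp(-e / kt))
    return total


def einstein_free_energy(freq_thz, n_modes, temperature):
    """Einstein model: all n_modes share one characteristic frequency."""
    return harmonic_free_energy([freq_thz] * n_modes, temperature)


f_einstein = einstein_free_energy(10.0, 3, 300.0)  # hypothetical 10 THz mode
```

At low temperature the result approaches the zero-point energy Σ hν/2; with rising temperature the logarithmic term lowers F.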
2.2 Results
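neglecting the volume change of the surface (with respect to [26]) and applying the
reaction energies $\Delta E$ of the desorption reaction

$$[\mathrm{Si{-}H}] \rightarrow [\mathrm{Si}] + \tfrac{1}{2}\,\mathrm{H_2} \qquad (2)$$

$$\Delta E = E_{0\mathrm{ML}} + \tfrac{1}{2}\,E_{\mathrm{H_2}} - E_{1\mathrm{ML}} \qquad (3)$$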
Fig. 2 (a) Phonon density of states (DOS) of the hydrogenated and pristine Si surfaces. The negative
frequencies stem from the frozen bottom layers. (b) Free energy of complete desorption, compared for
the Einstein model [29, 30], the interpolated phonon (IP) approach and the AITD approach. The
electronic energies were computed with HSE06-D3 (Figure reprinted with permission from
Ref. [9])
Fig. 3 Temperature dependence of the coverage $\theta$
for the IP and AITD approaches. Binding energies
computed with PBE-D3 and HSE06-D3. The grey-shaded area indicates the range of the graph
with a partially hydrogenated surface ($0.95 > \theta > 0.1$ ML) (Figure reprinted with permission from
Ref. [9])
Ga(NAsP) exhibits a direct band gap in the Vis/IR range, which makes the material promising for
efficient light-emitting devices that perform efficiently even at room temperature [34–36].
It is almost lattice-matched to Si(001) and GaP(001), and can thus be epitaxially
A minimal strain energy decreases the probability of dislocation defects forming
during growth. Defect-free samples of Ga(NAsP)/GaP/Si(001) were realized at
moderate and high growth temperatures between 575 and 700 °C [33]. While the
material was found to be homogeneous at high temperatures (decreasing compositional
disorder inside the bulk film), the roughness at the QW interfaces was smallest at
low temperatures. The QW roughness also increases with the thickness of the grown
film [37–41].
These results point towards the thermodynamic stability of Ga(NAsP) grown
on GaP(001), although dilute nitride III/V materials were expected to be
metastable [33].
In the following, the composition $\mathrm{Ga(N}_x\mathrm{As}_{0.85-x}\mathrm{P}_{0.15})$ is investigated with an
N concentration $x$ between 0 and 0.25, and its stability with respect to different
substrates is presented.
3.1.1 Methods
The calculations were performed as described in Sect. 2.1, with the following
modifications. The PBE-D3 functional was used with a cut-off energy of 500 eV for
the plane-wave basis expansion and a Γ-centered (6×6×6) k-grid for primitive cells.
For (2×4×5) supercells, a (4×3×2) k-grid division was used. Mimicking the epitaxial
nature of Ga(NAsP) growth on Si(001) and GaP(001), the x/y cell parameters were
constrained, and the cells were relaxed stepwise in z towards hypothetical bulk-like
Ga(NAsP) (theoretical epitaxy).
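The strain relaxation energy (SRE) and the phase separation energy (PSE) are
defined as

$$\mathrm{SRE} = E^{\mathrm{strained}}\big(\mathrm{Ga(N}_x\mathrm{As}_y\mathrm{P}_z)\big) - E^{\mathrm{relaxed}}\big(\mathrm{Ga(N}_x\mathrm{As}_y\mathrm{P}_z)\big) \qquad (4)$$

$$\mathrm{PSE} = E\big(\mathrm{Ga(N}_x\mathrm{As}_y\mathrm{P}_z)\big) - \left(x\,E_{\mathrm{GaN}} + y\,E_{\mathrm{GaAs}} + z\,E_{\mathrm{GaP}}\right) \qquad (5)$$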
The former is the energy difference between the strained and the relaxed bulk
film of the compound material. The latter is given by the energy difference between
the strained compound material and the strained binary materials with respect to the
substrate. All materials exhibit zinc blende structure with the respective substrate’s
lateral lattice constant.
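Following the verbal definitions above, both quantities are simple energy differences; the sketch below encodes them. The function names and all energies in the usage lines are hypothetical; a negative PSE means the mixed compound is favoured over separation into the strained binaries.

```python
def strain_relaxation_energy(e_strained, e_relaxed):
    """SRE: film strained to the substrate lattice minus the relaxed bulk film."""
    return e_strained - e_relaxed


def phase_separation_energy(e_compound, fractions, e_binaries):
    """PSE: strained compound minus composition-weighted strained binaries.

    fractions:  group-V fractions, e.g. {"GaN": x, "GaAs": y, "GaP": z}
    e_binaries: energies per formula unit of the binaries, each strained to
                the same substrate lateral lattice constant as the compound
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("group-V fractions must sum to 1")
    reference = sum(f * e_binaries[name] for name, f in fractions.items())
    return e_compound - reference


# Hypothetical energies (eV per formula unit) for x = 0.15, i.e. y = 0.70, z = 0.15
fractions = {"GaN": 0.15, "GaAs": 0.70, "GaP": 0.15}
binaries = {"GaN": -10.0, "GaAs": -8.0, "GaP": -9.0}
pse = phase_separation_energy(-8.5, fractions, binaries)
```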
152 A. Stegmüller et al.
3.1.2 Results
Figure 4 shows the SRE of Ga(NAsP) on Si and Ga(AsP) substrates, as well as the
PSE of Ga(NAsP) on a Si lattice and at its equilibrium lattice constant.
The SRE of $\mathrm{Ga(N}_x\mathrm{As}_{0.85-x}\mathrm{P}_{0.15})$ decreases from 0 % to 20 % N content, as
the material becomes less compressively strained with respect to a Si substrate
(Fig. 4a, black filled dots). Above an N content of 20 %, the material becomes
tensilely strained and the SRE rises. In contrast, on a hypothetical Ga(AsP) substrate,
a deposited film becomes increasingly tensilely strained when nitrogen is added.
Low SRE values support the hypothesis of thermodynamic stability for a given
compound material-substrate system.
The tendency for phase separation into the binary materials in different strain
environments was studied via the PSE of $\mathrm{Ga(N}_x\mathrm{As}_{0.85-x}\mathrm{P}_{0.15})$ (Fig. 4b). The PSE of
the quaternary material at silicon’s lattice constant (black dots) is negative for
low N contents and is further stabilized for concentrations up to 15 %. For higher
concentrations, it increases drastically and becomes positive at 25 % N incorporation.
Then there is a thermodynamic driving force for separation into the binary components GaN,
GaP and GaAs, presumably dominated by the contribution of GaN, which is highly
strained in the respective environments. This is reflected in the behaviour of
the PSE of $\mathrm{Ga(N}_x\mathrm{As}_{0.85-x}\mathrm{P}_{0.15})$ at its equilibrium lattice constant, which increases
monotonically with N concentration (open orange symbols, Fig. 4b).
Fig. 4 Computational results for (a) strain relaxation energy (SRE) of $\mathrm{Ga(N}_x\mathrm{As}_{0.85-x}\mathrm{P}_{0.15})$ on Si
(filled black symbols) and virtual $\mathrm{Ga(As}_{0.85}\mathrm{P}_{0.15})$ (open red symbols) substrates. In (b) the values
of the phase separation energy (PSE) of $\mathrm{Ga(N}_x\mathrm{As}_{0.85-x}\mathrm{P}_{0.15})$ strained to the Si lattice constant (filled
black symbols, left-hand axis) and at its equilibrium lattice constant (open orange symbols,
right-hand axis) are plotted. Energy values refer to the (2×4×5) supercell (Figure reprinted
with permission from Ref. [33])
Properties of Functional Semiconductors Studied by First Principles 153
These results support the experimental findings and the hypothesis drawn from them: depending
on composition and substrate, Ga(NAsP) is thermodynamically stabilized at
certain N concentrations (15–20 %). In this composition range, the lattice match to
Si is optimal. QW layers of Ga(NAsP) can be grown at higher temperatures, resulting
in high-quality films in terms of roughness and compositional disorder.
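As a rough, independent cross-check of the 15–20 % window, Vegard's law (linear interpolation of the binary lattice constants, a much simpler model than the DFT supercells above) gives the N fraction at which $\mathrm{Ga(N}_x\mathrm{As}_{0.85-x}\mathrm{P}_{0.15})$ is lattice-matched to Si. The lattice constants are approximate room-temperature literature values, not taken from this report, and the function names are hypothetical.

```python
# Approximate room-temperature lattice constants in Å (literature values,
# not from this report); zinc-blende GaN is metastable, roughly 4.50 Å.
A_GAN, A_GAAS, A_GAP, A_SI = 4.50, 5.653, 5.451, 5.431


def vegard_lattice_constant(x):
    """Vegard's-law lattice constant of Ga(N_x As_{0.85-x} P_{0.15})."""
    return x * A_GAN + (0.85 - x) * A_GAAS + 0.15 * A_GAP


def lattice_matched_x(a_target=A_SI):
    """N fraction x at which the (linear) Vegard estimate equals a_target."""
    a0 = vegard_lattice_constant(0.0)
    slope = A_GAN - A_GAAS
    return (a_target - a0) / slope


x_si = lattice_matched_x()
print(f"lattice match to Si at x = {x_si:.3f}")
```

With these inputs the estimate falls near x ≈ 0.17, inside the window identified above; bowing of the lattice parameter and DFT lattice-constant errors are ignored.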
Dilute Ga(AsBi) materials are used for band gap engineering of III/V
semiconductors. Starting from GaAs, the band gap can be reduced linearly by adding
bismuth (Bi) at low concentrations (<1.9 %), so that light-emitting devices in the IR
region have been realized. This band gap reduction originates from an up-shift of the valence
bands accompanied by a down-shift of the spin-orbit split-off band [43–49].
At higher Bi concentrations, the band gap narrowing becomes non-linear and,
furthermore, the band gap becomes dependent on the internal Bi bonding
arrangements. Bismuth may form local clusters with multiple Bi atoms in
close vicinity, which makes a chemical perspective on the stability and bonding
nature of these materials worthwhile. The homogeneity of the Bi distribution in
$\mathrm{Ga(As}_{1-x}\mathrm{Bi}_x)$ was studied with a periodic DFT approach, and the results are
presented in the following [42, 43].
3.2.1 Methods
Fig. 6 Local $\mathrm{Ga}_n\mathrm{Bi}_m$ configurations classified as (a) dispersed, with concentrations of x = 0.03125,
0.04688 and 0.0625, and clustered, containing (b) $\mathrm{Bi}_2$, (c) $\mathrm{Bi}_3$ and (d) $\mathrm{Bi}_4$ units. Averaged
iCOHP values are given for equivalent bonds in eV per bond together with the standard deviation.
For the clustered arrangements, bonds to Ga in the same (in-plane, shaded) and in other
crystallographic planes (out-of-plane) can be distinguished (Figure from Ref. [42])
The total energies of the different configurations studied do not differ significantly
[42]. However, the stabilities of local Bi arrangements can clearly be distinguished
on the basis of the COHP analysis, as evaluated by the energies gained by integration over
the COHP elements (iCOHP). The iCOHP values for Ga-Bi bonds for several local
configurations next to the bonding types in bulk GaAs and hypothetical GaBi are
presented in Fig. 6.
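The iCOHP values discussed here are integrals of the crystal orbital Hamilton population up to the Fermi level; in the convention of Dronskowski and Blöchl [54], bonding contributions to the COHP are negative. A minimal trapezoidal sketch of that integration on a discrete energy grid, with a hypothetical function name and toy input (real COHP curves would come from a tool such as LOBSTER, whose output format is not assumed here):

```python
def integrate_cohp(energies, cohp, e_fermi):
    """iCOHP: trapezoidal integral of COHP(E) from the lowest grid energy up to E_F.

    energies: ascending energy grid in eV; cohp: COHP values on that grid.
    The interval containing E_F is integrated only up to E_F, with the COHP
    interpolated linearly to E_F.
    """
    total = 0.0
    for i in range(1, len(energies)):
        e0, e1 = energies[i - 1], energies[i]
        if e0 >= e_fermi:
            break
        c0, c1 = cohp[i - 1], cohp[i]
        if e1 > e_fermi:
            # last, partial interval: interpolate the COHP linearly to E_F
            c_f = c0 + (c1 - c0) * (e_fermi - e0) / (e1 - e0)
            total += 0.5 * (c0 + c_f) * (e_fermi - e0)
            break
        total += 0.5 * (c0 + c1) * (e1 - e0)
    return total


# toy bonding COHP of -1 per eV on a coarse grid, Fermi level at 0.5 eV
icohp = integrate_cohp([-2.0, -1.0, 0.0, 1.0], [-1.0, -1.0, -1.0, -1.0], 0.5)
```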
In pure GaAs (−4.508 eV/bond) and GaBi (−3.892 eV/bond), the iCOHP bond
strengths serve as references for the maximum and minimum bonding interaction,
respectively. It becomes obvious that the III-V bonds in GaAs are much stronger
than those in binary GaBi, a compound that is not known experimentally. For all dilute
bismide configurations considered, the bond strengths are close to the iCOHP of ideal
GaAs, in agreement with the structures found in experiment and the negligible total
energy differences.
For the dispersed configurations, the Ga-Bi bond strength (iCOHP =
−4.438, −4.440, −4.446 eV/bond) increases with Bi concentration x;
for the clustered arrangements, however, stronger bonds were found. As in the
dispersed situations, the iCOHP values also increase in magnitude with Bi concentration in
the clustered arrangements, where a Ga-Bi-Ga unit forms a plane, as indicated by
the shaded area in Fig. 6a–d. In-plane Ga-Bi bonds tend to be more favourable
(iCOHP = −4.489, −4.490, −4.492 eV/bond) than out-of-plane bonds (iCOHP =
−4.439, −4.438 eV/bond). The strongest Ga-Bi bond was found for a $\mathrm{GaBi}_4$
tetrahedral arrangement (at the highest x studied, clustered arrangement), with
an iCOHP of −4.554 eV/bond. Remarkably, this value is larger in magnitude than the
iCOHP of the GaAs bond in the ideal binary material.
Thus, it was concluded that clustered, i.e. heterogeneously dispersed, arrangements
of Bi atoms forming Ga-Bi bonds in a dilute bismide GaAs have stronger
bonds and are more likely to occur than homogeneously dispersed Bi atoms.
This is, of course, only a thermodynamic view, and growth processes (kinetics,
defect formation, etc.) might influence the formation of certain configurations. This
conclusion was, however, in good agreement with the quantitative analysis of high-resolution
high-angle annular dark-field (HAADF) transmission electron microscopy images
of Ga(AsBi) samples at similar Bi concentrations, grown under metal-organic vapour
phase epitaxy conditions for photonics applications [42].
The band gap narrowing due to this localized character of clustered Bi atom
arrangements in dilute $\mathrm{Ga(As}_{1-x}\mathrm{Bi}_x)$ alloys was investigated, and the results are
presented in the following [43].
In dilute Ga(NAs), a band gap reduction was found, which was explained as an
anticrossing of localized, empty s(N) orbitals with the conduction band of GaAs.
It was shown that the electron mobility is affected by the N concentration, which
hints at effects on the conduction band in dilute Ga(NAs).
Fig. 7 Band-decomposed charge density of the heavy hole band for two-atom [111] chain (a)
and cluster (b) arrangements. The charge density results from integration over the whole Brillouin
zone. Every isovalue is set to 10 % of the respective maximum (Figure reprinted with permission
from Ref. [43])
In contrast, in dilute Ga(AsBi) the hole mobility is decreased compared to
pure GaAs [59–62]. This is an effect in the valence band; more precisely, as will be
shown, it is due to a hybridization of p(Bi) orbitals that depends on the Bi concentration
and the local configuration in Ga(AsBi).
Figure 7 shows the partial charge density of the heavy hole band for a dispersed
(left) and a clustered (right) arrangement at a Bi concentration of x = 0.047 (2 Bi
atoms per supercell). The band decomposition clearly displays the delocalized (a)
(dispersed) and localized (b) (clustered) character of the valence band for the two
The band structures of pristine GaAs and four dilute Ga(AsBi) supercell models
were calculated by DFT and a band unfolding technique [63]. The formation of a
tail in the valence band is highly dependent on the relative local arrangement of the
Bi atoms, as measured by the Bi-Bi separation inside the supercell.
This behaviour can be explained by the tendency of the p(Bi) orbitals to hybridize
as their separation along the [111] axis (the supercell’s diagonal) decreases [43]. In
Fig. 8, the band gap of the host GaAs material is plotted next to the localized Bi
levels for four different Bi concentrations in dilute Ga(AsBi).
Furthermore, the band gap narrowing effect was found to increase with Bi
concentration (compressive strain). A comparison with experimental photoluminescence
measurements shows that the determined bowing rates indicate a two-scale disorder effect at
high Bi concentrations.
Thus, three effects determine the band gap narrowing of dilute Ga(AsBi):
chemical arrangements of the dopant atoms, strain and macroscopic disorder.
Fig. 8 Band gaps of pristine GaAs and dilute Ga(AsBi) with clustered arrangements of Bi atoms
in an 8×8×8 supercell. The evolution of Bi defect levels is shown as a function of Bi cluster size
The following section extends the research on the interplay of chemical and
electronic properties of semiconductor materials. The behaviour of wide-gap
aromatic molecules on metal substrates at finite temperatures was studied by
DFT methods, and the electronic response of vibrational modes was investigated.
As the character of semiconductor interfaces determines device performance, the
insights into electron-vibron coupling gained from this study will guide further
investigations of III/V-Si systems.
Infrared spectroscopy experiments on NTCDA adsorbates on Ag(111) show
active in-plane molecular modes, which should be inactive according to the surface
selection rule for adsorbed molecules [64]. These modes do not change the molecule’s
dipole moment orthogonal to the surface. A dynamical dipole moment emerges at
the interface between the adsorbate layer and the substrate. For this quantity, no
theoretical or experimental proof beyond heuristic models has been provided [65].
Here, the interfacial dynamical charge transfer (IDCT) was studied for a sub-monolayer
NTCDA on Ag(111) system, and its importance relative to
nuclear motion at the interface was quantified. A rationale for the amount of
IDCT for specific vibrational modes at the interface is provided, going beyond
an empirical evaluation presented earlier [66] and without applying a time-dependent
treatment [67], which is unfeasible for systems as large as the one studied. Electron-vibron
coupling has been investigated for similar systems before, leading to the conclusion
that all the prerequisites for IDCT (strong adsorption, dynamical partial occupation
of the lowest unoccupied molecular orbital) are fulfilled for the system under
investigation [68, 69]. Schöll et al. found that the IDCT dipole moment of the
NTCDA/Ag(111) system is affected by strong electron-vibron coupling [70].
4.1 Methods
The IR intensity is proportional to the square of the change in dipole moment, $\mu_{\mathrm{dyn}}^2$, for a
given mode. The dipole moment is affected by nuclear motion as well as by dynamic
charge transfer from the metal substrate to the molecule across the adsorbate-substrate
interface.
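The proportionality $I \propto \mu_{\mathrm{dyn}}^2$ translates directly into relative intensities once they are normalized to the strongest mode, which is how the intensity column of Table 1 is scaled according to its footnote. A minimal sketch with a hypothetical function name and hypothetical dipole values:

```python
def relative_ir_intensities(mu_dyn):
    """Relative IR intensities, I ∝ mu_dyn**2, scaled so the strongest mode is 1."""
    squares = [m * m for m in mu_dyn]
    strongest = max(squares)
    return [s / strongest for s in squares]


# hypothetical dynamic dipole moments (Debye) of three modes
intensities = relative_ir_intensities([1.0, 2.0, -0.5])
```

Because the dipole change enters squared, its sign does not affect the intensity, and halving $\mu_{\mathrm{dyn}}$ reduces the intensity fourfold.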
4.2 Results
The NTCDA molecules are bound to Ag(111) via attractive van der Waals interactions.
However, the bent geometry of the adsorbate molecule suggests a more
Fig. 9 Schematic of most stable adsorption geometry of NTCDA on a bridge position on Ag(111)
in top (a) and side (b) view; (c) contributions to dynamic dipole moment (Figure reprinted with
permission from Ref. [64])
covalent character of the Ag-O bonds. The molecule is distorted, with shorter
Ag-O than Ag-C distances, which lifts the planarity of the free molecule
(type-averaged bond lengths d(O-Ag) = 2.577 ± 0.004 Å, d(C-Ag) = 2.905 ± 0.145 Å).
This reduces the symmetry from $D_{2h}$ to approximately $C_{2v}$ [65]. The adsorption
energy is large ($E_{\mathrm{ads}}$ = 2.09 eV) and the molecule’s LUMO is partially filled.
This fulfills important prerequisites for IDCT [74, 75].
The calculated and the experimentally measured IR spectra agree very
well in both intensities and mode energies. As can be seen in Table 1, the in-plane
molecular modes with $a_g$ symmetry become IR active [64, 65] in the adsorbate-substrate
system due to IDCT, which makes a large contribution to $\mu_{\mathrm{dyn}}$.
The IDCT was quantified by the mode-specific charge transfer from the substrate
to the adsorbate, q, which was found to correlate with $\mu_{\mathrm{dyn}}$ (Fig. 10a). Depending on
the mode symmetry, nuclear motion contributes less to $\mu_{\mathrm{dyn}}$ (e.g. for $a_g$ modes). Those
modes are dominated by the IDCT dipole moment, which can be derived as $\mu_{\mathrm{IDCT}}$
from a linear correlation with q (Fig. 10a). The quantity $\mu_{\mathrm{nucl}}$ given in Table 1 is the difference
between $\mu_{\mathrm{dyn}}$ and $\mu_{\mathrm{IDCT}}$.
For this value, in turn, a good correlation was found with Q computed
by finite-difference displacements, as shown in Fig. 10b. The dipole moment from
nuclear motion, $\mu'_{\mathrm{nucl}}$, can then be derived directly from $\delta Q$, as given in Table 1.
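The decomposition described above (a linear fit of $\mu_{\mathrm{dyn}}$ against q for IDCT-dominated modes, then $\mu_{\mathrm{IDCT}} = k\,q$ and $\mu_{\mathrm{nucl}} = \mu_{\mathrm{dyn}} - \mu_{\mathrm{IDCT}}$) can be sketched as follows. Whether the fit in Fig. 10a includes an intercept is not stated; a zero intercept (no charge transfer, no IDCT dipole) is an assumption here, and the function names and data are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope k of y ≈ k * x with zero intercept."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def split_dynamic_dipole(mu_dyn, q, slope):
    """Split mu_dyn into an IDCT part (slope * q) and the remainder mu_nucl."""
    mu_idct = slope * q
    return mu_idct, mu_dyn - mu_idct


# hypothetical (q in e, mu_dyn in Debye) pairs for IDCT-dominated modes
k = fit_through_origin([0.1, 0.2, 0.4], [0.45, 0.9, 1.8])
mu_idct, mu_nucl = split_dynamic_dipole(1.0, 0.2, k)
```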
Table 1 Computed properties of vibrational modes. See Fig. 9c for definition of terms. Table
reused with permission from Ref. [64]
No.  ν̃ᵃ      sym.ᵇ  int.ᶜ   μ_dynᵈ  μ_IDCTᵈ  μ′_nuclᵈ  qᵉ    Qᵈ
1    648.1   ag     0.004   0.18    0.08     +0.11     0.01  +0.14
2    665.4   b2g    0.000   0.00    –        –         0.00  0.00
3    717.9   b3u    0.058   0.59    0.52     1.08      0.11  1.14
4    720.8   au     0.000   0.00    –        –         0.00  +0.01
5    813.0   b3u    0.002   0.11    0.22     0.18      0.04  0.17
6    987.0   ag     0.002   0.12    0.13     +0.06     0.02  +0.09
7    1104.2  ag     0.021   0.37    0.22     +0.02     0.04  +0.04
8    1256.9  ag     0.038   0.53    0.35     +0.05     0.07  +0.08
9    1345.5  ag     0.581   1.95    1.88     +0.07     0.42  +0.10
10   1404.8  ag     0.133   0.94    0.92     +0.09     0.20  +0.12
11   1435.8  b1u    0.000   0.02    –        –         0.00  0.00
12   1509.8  b2u    0.000   0.00    –        –         0.00  0.00
13   1565.6  ag     1.000   2.51    2.44     +0.17     0.55  +0.21
14   1625.7  ag     0.264   1.27    1.84     0.62      0.41  0.64
15   1628.9  b1u    0.024   0.37    0.57     0.19      0.12  0.18
ᵃ Vibrational modes in cm⁻¹
ᵇ Mode symmetries
ᶜ IR intensities normalized to highest value
ᵈ Dipole moments in Debye
ᵉ Charges in e
Fig. 10 (a) Correlation of charge transfer (q) and $\mu_{\mathrm{dyn}}$ to determine $\mu_{\mathrm{IDCT}}$; the open rectangles
refer to the first-order correction of $\mu_{\mathrm{dyn}}$ due to nonzero $\mu_{\mathrm{nucl}}$. (b) Correlation of Q with $\mu_{\mathrm{nucl}}$ to
determine $\mu'_{\mathrm{nucl}}$ (Figure reprinted with permission from Ref. [64])
As can be seen by comparing the dipole contributions $\mu_{\mathrm{nucl}}$ and $\mu_{\mathrm{IDCT}}$ in Table 1,
IDCT dominates the dynamic dipole moments $\mu_{\mathrm{dyn}}$ and the associated
infrared activities. The contribution of nuclear motion (out-of-plane bending) is
less important throughout the 15 modes investigated. For $a_g$ symmetric modes, the
magnitude of $\mu_{\mathrm{IDCT}}$ can be derived from the NPA-derived atomic charges q.
According to Eq. 6, contributions from nuclear motion
can be determined reliably if IDCT is the main contribution.
We derived unequivocal evidence for the dominating role of IDCT for dynamic
dipole moments, associated IR activities and thus electron-vibron coupling.
49. Lewis, R., Beaton, D., Lu, X., Tiedje, T.: J. Cryst. Growth 311(7), 1872 (2009)
50. Becke, A.D., Johnson, E.R.: J. Chem. Phys. 124(22) (2006)
51. Tran, F., Blaha, P.: Phys. Rev. Lett. 102(22), 5 (2009)
52. Kim, Y.S., Hummer, K., Kresse, G.: Phys. Rev. B Condens. Matter Mater. Phys. 80(3), 1 (2009)
53. Kim, Y.S., Marsman, M., Kresse, G., Tran, F., Blaha, P.: Phys. Rev. B Condens. Matter Mater.
Phys. 82(20), 1 (2010)
54. Dronskowski, R., Blöchl, P.E.: J. Phys. Chem. 97(33), 8617 (1993)
55. Deringer, V.L., Tchougréeff, A.L., Dronskowski, R.: J. Phys. Chem. A 115(21), 5461 (2011)
56. Maintz, S., Deringer, V.L., Tchougréeff, A.L., Dronskowski, R.: J. Comput. Chem. 34(29),
2557 (2013)
57. Koga, T., Kanayama, K., Watanabe, S., Thakkar, A.J.: Int. J. Quantum Chem. 71(6), 491 (1999)
58. Koga, T., Kanayama, K., Watanabe, T., Imai, T., Thakkar, A.J.: Theor. Chem. Acc. 104(5), 411
59. Kini, R.N., Ptak, A.J., Fluegel, B., France, R., Reedy, R.C., Mascarenhas, A.: Phys. Rev. B
Condens. Matter Mater. Phys. 83(7), 1 (2011)
60. Nargelas, S., Jarasiunas, K., Bertulis, K., Pacebutas, V.: Appl. Phys. Lett. 98(8), 1 (2011)
61. Beaton, D.A., Lewis, R.B., Masnadi-Shirazi, M., Tiedje, T.: J. Appl. Phys. 108(8), 2 (2010)
62. Cooke, D.G., Hegmann, F.A., Young, E.C., Tiedje, T.: Appl. Phys. Lett. 89(12), 83 (2006)
63. Rubel, O., Bokhanchuk, A., Ahmed, S.J., Assmann, E.: Phys. Rev. B Condens. Matter Mater.
Phys. 90(11), 1 (2014)
64. Rosenow, P., Jakob, P., Tonner, R.: J. Phys. Chem. Lett. 7, 1422 (2016)
65. Tonner, R., Rosenow, P., Jakob, P.: Phys. Chem. Chem. Phys. 18, 6316 (2016)
66. Braatz, C.R., Öhl, G., Jakob, P.: J. Chem. Phys. 136(13), 134706/1-134706/8 (2012)
67. Weigel, A., Dobryakov, A., Klaumünzer, B., Sajadi, M., Saalfrank, P., Ernsting, N.P.: J. Phys.
Chem. B 115(13), 3656 (2011)
68. Langreth, D.C.: Phys. Rev. Lett. 54(2), 126 (1985)
69. Chabal, Y.J.: Phys. Rev. Lett. 55(8), 845 (1985)
70. Schöll, A., Zou, Y., Kilian, L., Hübner, D., Gador, D., Jung, C., Urquhart, S.G., Schmidt, T.,
Fink, R., Umbach, E.: Phys. Rev. Lett. 93(14), 93 (2004)
71. Reed, A.E., Weinstock, R.B., Weinhold, F.: J. Chem. Phys. 83(2), 735 (1985)
72. Dunnington, B.D., Schmidt, J.R.: J. Chem. Theory Comput. 8(6), 1902 (2012)
73. Bader, R.: Atoms in Molecules – A Quantum Theory. Oxford University Press (1990)
74. Tautz, F.S.: Prog. Surf. Sci. 82(9–12), 479 (2007)
75. Bendounan, A., Forster, F., Schöll, A., Batchelor, D., Ziroff, J., Umbach, E., Reinert, F.: Surf.
Sci. 601(18), 4013 (2007)
Submonolayer Rare Earth Silicide Thin Films
on the Si(111) Surface
1 Introduction
Metallic rare earth (RE) silicides can be grown epitaxially as thin films on the
Si(111) substrate by rare earth deposition and thermal treatment [1–3]. The resulting
metal/semiconductor interface is characterized by an extraordinarily low Schottky
barrier height of 0.3–0.4 eV on n-type substrates. Due to the small lattice
mismatch [4] between substrate and thin film, the interface furthermore has a low
defect concentration and is very stable. Therefore, rare earth silicides on n-type
silicon are considered ideal candidates for Ohmic contacts [5, 6]. The relatively high
barrier height on p-type substrates makes them interesting for infrared detectors and
photovoltaic applications [7]. For submonolayer coverage, a variety of structures
with different periodicities was found [8–12].
Despite the large and growing interest in silicide thin films on Si(111), our
knowledge of these systems is still fragmentary. A multitude of surface reconstructions
or nanostructures with different periodicities has been observed, depending on
the rare earth species and rare earth coverage [8–19]. The observed structures are
characterized by different stoichiometries and heights. For the case of dysprosium
silicide, e.g., a full monolayer results in a film with 1×1 periodicity and $\mathrm{DySi}_2$
hexagonal structure, multilayer silicides grow in a film with $\sqrt{3}\times\sqrt{3}$ periodicity
and $\mathrm{Dy}_3\mathrm{Si}_5$ composition, while submonolayer coverage results in structures with
$2\sqrt{3}\times 2\sqrt{3}$ or 5×2 periodicity [10].
Computational studies of two-dimensional rare-earth silicides are rare and
limited to the simplest yttrium and erbium silicide structures [15, 18–23]. The lack
of theoretical investigations is particularly severe in the submonolayer range, where,
to the best of our knowledge, no theoretical investigations are available.
The present paper aims at providing microscopic structural models for the silicide
thin film 5×2 phase observed in the submonolayer regime. To this end, we
combine total energy density functional theory (DFT) calculations with ab initio
thermodynamics. Calculations are performed using Tb (atomic number 65) and Dy
(atomic number 66) as prototypical trivalent rare earths.
2 Methodology
Total-energy density functional theory (DFT) calculations are performed within the
generalized gradient approximation [24] (GGA) in the Perdew-Burke-Ernzerhof
formulation [25] as implemented in the Vienna ab initio simulation package
(VASP) [26, 27]. Projector augmented wave [28, 29] (PAW) potentials with projectors
up to l = 1 for H, l = 2 for Si and l = 3 for the rare earth atoms, as well as
a plane wave cutoff of 400 eV, have been used. As no valence state other than RE³⁺
has been observed for the rare earth ions in the silicide structures, we constrain the
valence state of the investigated rare earth ions by treating the f electrons as core states.
This approach, commonly referred to as the frozen-core method, allows for a proper
treatment of the lanthanides within DFT [30–32].
Six Si bilayers stacked along the [111] crystallographic direction model the
substrate. The periodic supercell contains in addition the silicide thin film of variable
structure and height, and a vacuum region of at least 15 Å. The dangling bonds at
the bottom of the slab are saturated by H atoms. The atomic positions are relaxed
until the residual Hellmann-Feynman forces are lower than 0.001 eV/Å, while
three Si bilayers and the hydrogen atoms are kept frozen. Test calculations show that
adding further substrate layers does not result in noticeable changes of the calculated
geometries and band structures. Dipole-correction algorithms have been used to
correct the spurious interactions of the slabs with their periodic images [33, 34].
Simulated constant-current STM images are calculated within the Tersoff-Hamann
approach [35, 36] on the basis of the local densities of states (LDOS).
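In the Tersoff-Hamann picture, the tunnelling current is proportional to the LDOS at the tip position integrated over the bias window, so a constant-current image is an isosurface of that integrated LDOS. The sketch below extracts the tip height for each lateral grid point by scanning down from the vacuum; it assumes the energy-integrated LDOS is already available on a real-space grid, and the function name and synthetic input are hypothetical.

```python
import numpy as np

def constant_current_heights(ldos, z, isovalue):
    """Tip height z(x, y) where the LDOS first reaches `isovalue`, scanning down from vacuum.

    ldos: (nx, ny, nz) energy-integrated LDOS on a real-space grid
    z:    (nz,) increasing heights of the grid planes
    Points where the isovalue is never reached are returned as NaN.
    """
    nx, ny, nz = ldos.shape
    heights = np.full((nx, ny), np.nan)
    for i in range(nx):
        for j in range(ny):
            column = ldos[i, j]
            for k in range(nz - 1, 0, -1):
                below, above = column[k - 1], column[k]
                if above < isovalue <= below:
                    # linear interpolation between the two grid planes
                    t = (below - isovalue) / (below - above)
                    heights[i, j] = z[k - 1] + t * (z[k] - z[k - 1])
                    break
    return heights


# synthetic density decaying exponentially into the vacuum
z = np.linspace(0.0, 5.0, 51)
ldos = np.broadcast_to(np.exp(-z), (2, 2, z.size)).copy()
heights = constant_current_heights(ldos, z, np.exp(-2.0))
```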
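In order to compare the formation energy of silicide films with different
composition, we use the Landau potential $\Omega$, approximated as [37, 38]

$$\Omega(\mu_{\mathrm{Si}}, \mu_{\mathrm{RE}}) \approx E_{\mathrm{DFT}}(N_{\mathrm{Si}}, N_{\mathrm{RE}}) - \sum_i N_i\,\mu_i \qquad (1)$$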
Silicide Thin Films on Si(111) 165
In this equation, $E_{\mathrm{DFT}}(N_{\mathrm{Si}}, N_{\mathrm{RE}})$ is the DFT total energy of a slab containing $N_{\mathrm{Si}}$
silicon atoms and $N_{\mathrm{RE}}$ rare earth atoms. $\mu_{\mathrm{Si}}$ and $\mu_{\mathrm{RE}}$ are the corresponding chemical
potentials and represent the experimental growth conditions. The sum in Eq. 1 also
extends to the H atoms employed to saturate the Si dangling bonds at the bottom side
of the slabs. The use of the total rather than the free energy for the calculation of
$\Omega$ is an approximation. It is justified as long as the entropic contributions are of
similar magnitude for the different silicide films.
The Landau potential Ω in Eq. 1 is expressed as a function of the chemical
potentials $\mu_{\mathrm{Si}}$ and $\mu_{\mathrm{RE}}$. Their thermodynamically allowed range is constrained by
several conditions. The upper limits are given by the bulk phases,

$$\mu_i \le \mu_i^{\mathrm{bulk}}, \qquad i = \mathrm{Si}, \mathrm{RE}. \tag{2}$$
Furthermore, the silicide films are in equilibrium with the Si substrate, which
represents an infinite reservoir of Si atoms. This pins $\mu_{\mathrm{Si}}$ to $\mu_{\mathrm{Si}}^{\mathrm{bulk}}$ and
allows the Landau potential to be expressed as $\Omega = \Omega(\mu_{\mathrm{RE}})$. The RE chemical potential
can be controlled experimentally through the amount of rare earth deposited on the Si
substrate before annealing. If we restrict our investigation to silicide phases with a
given stoichiometry $\mathrm{Si}_\alpha\mathrm{RE}_\beta$, the lower limit of $\mu_{\mathrm{RE}}$ is given by

$$\alpha\, \mu_{\mathrm{Si}}^{\mathrm{bulk}} + \beta\, \mu_{\mathrm{RE}} = \mu_{\mathrm{Si}_\alpha \mathrm{RE}_\beta}^{\mathrm{bulk}}$$
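The two bounds can be combined into a small phase-diagram calculation. With μ_Si pinned to its bulk value, each Ω is linear in μ_RE with slope −N_RE. The stable structure at a given μ_RE is the one with the lowest Ω, which is how a diagram like Fig. 4 is read. The helpers below are hypothetical. The labels "h" and "m" are arbitrary, and all energies are invented for illustration.

```python
import numpy as np


def mu_re_window(mu_si_bulk, mu_re_bulk, mu_silicide_bulk, alpha, beta):
    """Allowed range (lower, upper) of mu_RE in eV for Si-rich conditions.

    upper: mu_RE <= mu_RE^bulk (Eq. 2)
    lower: alpha * mu_Si^bulk + beta * mu_RE = mu^bulk(Si_alpha RE_beta)
    """
    lower = (mu_silicide_bulk - alpha * mu_si_bulk) / beta
    return lower, mu_re_bulk


def most_stable(structures, mu_re_grid, mu_si):
    """Label of the structure with the lowest Omega at each mu_RE.

    structures : dict label -> (E_DFT, N_Si, N_RE). All slabs are assumed to share
    the surface cell and H passivation, so the H term is a common constant
    and is dropped. With mu_Si pinned, Omega is linear in mu_RE (slope -N_RE).
    """
    labels = list(structures)
    omega = np.array([[e - mu_si * n_si - mu * n_re for mu in mu_re_grid]
                      for e, n_si, n_re in structures.values()])
    return [labels[i] for i in np.argmin(omega, axis=0)]


# Hypothetical numbers only (eV); a DySi2-like bulk phase -> alpha = 2, beta = 1
low, high = mu_re_window(-5.5, -4.5, -16.0, alpha=2, beta=1)
phases = most_stable({"h": (-100.0, 10, 2), "m": (-104.75, 10, 3)},
                     mu_re_grid=[-5.0, -4.9, -4.6, -4.5], mu_si=-5.5)
```

In this toy case, the structure with more RE atoms takes over towards the RE-rich end of the window.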
Fig. 1 CPU time on the HLRS CRAY XE6 for the self-consistent calculation of the electronic
structure of a LiNbO3 slab within different parallelization schemes. See text for details
requirements. Numerical results for the test configuration are shown in Fig. 1 (red
line). This setup results in a roughly linear scaling up to 768 cores.
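A statement such as "roughly linear scaling up to 768 cores" can be quantified by the speed-up and parallel efficiency relative to the smallest run. Below is a minimal sketch. The timings are invented placeholders, not the data of Fig. 1.

```python
def scaling_table(timings):
    """Speed-up and parallel efficiency relative to the smallest core count.

    timings : dict cores -> wall-clock time (any unit)
    Returns dict cores -> (speed_up, efficiency); efficiency 1.0 = ideal linear scaling.
    """
    n_ref = min(timings)
    t_ref = timings[n_ref]
    table = {}
    for n in sorted(timings):
        speed_up = t_ref / timings[n]
        table[n] = (speed_up, speed_up * n_ref / n)
    return table


# Placeholder timings in seconds (hypothetical)
table = scaling_table({96: 800.0, 192: 400.0, 384: 220.0, 768: 125.0})
```

An efficiency that stays close to 1 as the core count grows corresponds to the linear scaling described in the text.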
Within the described approach to parallel computing, the data distribution mode
can be steered. In particular, the plane-wise data distribution in real
space can be activated. This considerably reduces communication during the
fast Fourier transforms (FFTs), but worsens load balancing.
Therefore, the suitability of the plane-wise data distribution must be tested for
the particular computational architecture and as a function of the number of
processors. The results of our tests are shown in Fig. 1 (orange line). As expected,
the plane-wise data distribution is advantageous for relatively small numbers
of processors, but its advantages are outweighed by load-balancing problems for calculations that
employ 768 cores.
The computational techniques for the modeling of atomic systems discussed here
rely on mathematical libraries, in particular the Linear Algebra Package LAPACK
or its distributed-memory implementation ScaLAPACK. The calculations discussed
above have been performed with ScaLAPACK. This speeds up the calculations by
up to a factor of two compared to LAPACK calculations, which are shown by the black line
in Fig. 1. Even if only 96 cores are used, ScaLAPACK yields a noticeable speed-up.
Starting with VASP version 5.3.2, it is additionally possible to parallelize over
the k points used to sample the Brillouin zone in reciprocal space. The number of
k points to be treated in parallel can be specified. Within the group of cores that
share the work for an individual k point, the electronic states and/or plane wave
coefficients are also treated in parallel.
The results of the corresponding tests are shown in Fig. 1 (blue line). The
k point parallelization leads to an additional saving of computer time,
provided more than 192 cores are used. The speed-up with respect to calculations
without k point parallelization amounts to up to a factor of two. Moreover, this approach
extends the roughly linear scaling to 1536 cores.
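A benchmark sweep like the one in Fig. 1 can be scripted by generating one set of parallelization settings per run. The sketch below assumes the standard VASP INCAR tags LPLANE (plane-wise real-space data distribution) and KPAR (number of k-point groups). The core counts, the k-point number and the helper names are hypothetical. The tag values actually used for Fig. 1 are not given in the text.

```python
from itertools import product


def incar_variants(core_counts, kpar_values, n_kpoints):
    """Yield (cores, cores_per_kgroup, settings) for a parallelization sweep.

    Assumed VASP INCAR tags: LPLANE switches the plane-wise real-space data
    distribution on or off, KPAR sets the number of k-point groups. Inside each
    group, bands and plane-wave coefficients are distributed over the group's
    cores. Runs whose cores cannot be split evenly into KPAR groups, or that ask
    for more groups than there are irreducible k-points, are skipped.
    """
    for cores, kpar, lplane in product(core_counts, kpar_values, (True, False)):
        if cores % kpar or kpar > n_kpoints:
            continue
        settings = {"LPLANE": ".TRUE." if lplane else ".FALSE.", "KPAR": kpar}
        yield cores, cores // kpar, settings


def to_incar(settings):
    """Render a settings dict as INCAR lines (TAG = value)."""
    return "\n".join(f"{tag} = {value}" for tag, value in settings.items())


runs = list(incar_variants([96, 192], [1, 2, 4], n_kpoints=3))
```

Each entry of `runs` corresponds to one benchmark job. Comparing their timings per core count is the kind of test the text recommends before choosing a data distribution.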
3 Results
nor the size of the surface unit cell are considered for this particular calculation,
the calculated values do not correspond to the absolute formation energies of the
structures, and the Landau potential is labeled Ω′. However, as both the surface
unit cell and the number of hydrogen atoms used to passivate the dangling
bonds at the bottom side of the slabs are the same for all configurations, a relative
comparison of the different structures is possible. Although the 5×2 phase has
been observed for different rare earth silicides, we limit our investigation to Dy
silicide because of the high demand for computational resources. However, based on our
experience with the other silicide structures discussed above, the results may again
be extrapolated to all trivalent rare earths.
The phase diagram in Fig. 4 shows that structures with rare earth atoms in
the channels between honeycomb and Seiwatz chains (h, n) are favored. Indeed,
for most values of the chemical potential relevant to submonolayer
coverage, the structures labeled h and n are the most stable configurations, while
under strongly Dy-rich conditions the models m and i can be formed. These structures
are less relevant, however, since monolayer or multilayer silicides are formed at
these values of the rare earth chemical potential.
Thus, the energetically almost degenerate models h and n (energy difference of
18 meV per 5×2 unit cell), each with two rare earth atoms per unit cell, describe the
observed phase. Considering that four and eight Si atoms per 5×2 unit cell form the
Seiwatz and honeycomb chains, respectively, the stoichiometry of the silicide layer
at the Si(111) surface can be expressed as RESi6. The difference between the two
models h and n lies in the different alignment of neighboring rare earth atom
rows on both sides of the honeycomb chains. The slightly more stable model h, in
which the rare earth atoms are aligned in phase, is shown in more detail in Fig. 5. It
is known that on Si(111) a honeycomb chain is stabilized by one electron per 3×1
unit cell, while a zigzag Seiwatz chain requires two electrons per 2×1 unit cell [11].
Thus, the 5×2 phase is built of two 3×1 surface units with honeycomb chains and
two 2×1 surface units containing Seiwatz chains. The structure is stabilized by two
trivalent rare earth atoms, which provide six electrons per unit cell.

Fig. 4 Calculated phase diagram for the dysprosium-adsorbed Si(111) surface with 5×2
periodicity as a function of the dysprosium chemical potential $\mu_{\mathrm{Dy}}$. Two representative values
of $\mu_{\mathrm{Dy}}$, corresponding to Dy in its metallic hcp bulk phase and to Dy in the hexagonal DySi2 phase, are
indicated. Si-rich conditions are assumed

Fig. 5 Side (a) and top view (b) of the thermodynamically stable rare earth induced surface
reconstruction with 5×2 periodicity on the Si(111) surface. The termination corresponds to
structure h in Figs. 3 and 4 and consists of alternating Si Seiwatz and honeycomb chains. The
surface unit cell is highlighted

Fig. 6 Calculated surface band structure for model h in Figs. 3 and 4. Projected bulk bands are
shown in grey. The inset shows the surface Brillouin zone of the 5×2 structure
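The electron-counting argument can be checked with a few lines of arithmetic. First, the building blocks must tile the 5×2 cell (10 surface 1×1 units). Second, the electrons required by the chains must match those donated by the rare earth atoms. Third, the Si-to-RE ratio gives the RESi6 stoichiometry. The counts are taken from the text. The helper itself is a hypothetical illustration.

```python
from fractions import Fraction


def count_5x2_cell(n_honeycomb=2, n_seiwatz=2, n_re=2, re_valence=3,
                   si_honeycomb=8, si_seiwatz=4):
    """Area, electron demand/supply and Si:RE ratio of the 5x2 cell.

    Honeycomb chain: one electron per 3x1 unit; Seiwatz chain: two electrons
    per 2x1 unit. si_honeycomb / si_seiwatz are Si atoms per 5x2 cell.
    """
    area = 3 * n_honeycomb + 2 * n_seiwatz    # in 1x1 units, 10 for a 5x2 cell
    demand = 1 * n_honeycomb + 2 * n_seiwatz  # electrons needed by the chains
    supply = re_valence * n_re                # electrons donated by RE atoms
    si_per_re = Fraction(si_honeycomb + si_seiwatz, n_re)
    return area, demand, supply, si_per_re


area, demand, supply, si_per_re = count_5x2_cell()
```

Calling `count_5x2_cell(re_valence=2)` gives a supply of only four electrons. The same chain arrangement would therefore not be charge-balanced by two divalent atoms.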
Figure 6 shows the calculated electronic band structure of model h. Note
that the Si bulk band gap calculated here is about 0.67 eV, i.e. smaller than the
measured value, owing to the underestimation of band gaps in DFT calculations [37].
Almost no surface-localized electronic states are present in the bulk gap region. The
fundamental electronic gap (direct, at $\Gamma$) is only slightly smaller than the calculated
Si bulk gap. This confirms that the submonolayer silicide with 5×2 periodicity on
the Si(111) surface is semiconducting. The inset of Fig. 6 shows the 5×2 surface
Brillouin zone employed for the calculations. The atomic chains are parallel to the
long sides of the surface Brillouin zone.

Fig. 7 Calculated density of states for model h in Figs. 3 and 4. The total density of states is
shown in black, while the silicide contribution is shown in red
The (local) density of states of the slab modeling the DySi6 silicide with 5×2
periodicity on the Si(111) surface is shown in Fig. 7. The total density of states
is represented by the black curve, while the red curve represents the local DOS
of the silicide layer. The dotted lines indicate the valence and conduction band
edges of bulk Si. The calculated (L)DOS again shows that the silicide layer is
semiconducting. The overall appearance of the total DOS is very similar to the Si
bulk DOS, except for a minor reduction of the fundamental band gap, which is
caused by electronic states close to the conduction band minimum.
Otherwise, the DySi6 layer does not strongly affect the band gap
region of the substrate.
Knowing the thermodynamically stable structural model also
allows for the interpretation of the STM images and the identification of the
observed features. In the filled state images (Fig. 8a, c), the bright spots are assigned
to the honeycomb (broad rows) and Seiwatz (thin rows) chains, which capture the
electrons from the rare earth atoms. The rare earth atoms are thus not visible at this bias. In
contrast, in the empty state images (Fig. 8b, d), the rare earth atoms donating their
electrons are visible, while the dark rows show the location of the honeycomb
chains. As several equivalent lattice sites between the chains are available to
the rare earth atoms, different STM patterns are possible. These correspond to an
in-phase alignment between neighboring rare earth rows (model h) or a zigzag
alignment (model n).
In contrast to monovalent and divalent ions, for which other n×2 phases
with odd n ≠ 5 have also been observed, 5×2 is the only possible n×2 periodicity for

Fig. 8 (a), (b) Measured STM images of the 5×2 Tb silicide submonolayer structure on the
Si(111) surface [12] in comparison with simulated data in (c), (d). Experimental STM images
refer to voltages of −1.5 V [(a), (c) filled states] and 1.5 V [(b), (d) empty states], and tunneling
currents of 100 pA. The 5×2 surface unit cell is indicated
4 Conclusions
It is also worth noting that the adsorption of divalent metals on Si(111) typically leads
to an n×2 surface reconstruction, with n an odd integer. Thus, as suggested by Battaglia et al. [11],
the 5×2 phase might be induced by divalent lanthanides such as Yb, Eu, Sm or Tm. In this case,
they would give rise to completely different structures, similar to the reconstructions formed by
deposition of the divalent alkaline earth metals (Mg, Ca, Sr, Ba). These are not investigated in this
work, as we only consider lanthanides in the trivalent state (Dy3+, Tb3+).
of the silicide structures with 5 2 periodicity does not strongly affect the electronic
properties of the substrate, but slightly reduces the band gap.
1. Paki, P., Kafader, U., Wetzel, P., Pirri, C., Peruchetti, J.C., Bolmont, D., Gewinner, G.: Phys.
Rev. B 45, 8490 (1992)
2. d’Avitaya, F.A., Perio, A., Oberlin, J.C., Campidelli, Y., Chroboczek, J.A.: Appl. Phys. Lett.
54(22), 2198 (1989)
3. Knapp, J.A., Picraux, S.T.: Appl. Phys. Lett. 48(7), 466 (1986)
4. Wetzel, P., Pirri, C., Paki, P., Peruchetti, J., Bolmont, D., Gewinner, G.: Solid State Commun.
82(4), 235 (1992)
5. Tu, K.N., Thompson, R.D., Tsaur, B.Y.: Appl. Phys. Lett. 38(8), 626 (1981)
6. Vandré, S., Preinesberger, C., Busse, W., Dähne, M.: Appl. Phys. Lett. 78(14), 2012 (2001)
7. Vandré, S., Kalka, T., Preinesberger, C., Dähne-Prietsch, M.: Phys. Rev. Lett. 82, 1927 (1999)
8. Lohmeier, M., Huisman, W.J., ter Horst, G., Zagwijn, P.M., Vlieg, E., Nicklin, C.L., Turner,
T.S.: Phys. Rev. B 54, 2004 (1996)
9. Roge, T., Palmino, F., Savall, C., Labrune, J., Pirri, C.: Surf. Sci. 383(2–3), 350 (1997)
10. Engelhardt, I., Preinesberger, C., Becker, S., Eisele, H., Dähne, M.: Surf. Sci. 600(3), 755
11. Battaglia, C., Cercellier, H., Monney, C., Garnier, M.G., Aebi, P.: EPL (Europhys. Lett.) 77(3),
36003 (2007)
12. Franz, M., Große, J., Kohlhaas, R., Dähne, M.: Surf. Sci. 637–638, 149 (2015)
13. Wanke, M., Franz, M., Vetterlein, M., Pruskil, G., Höpfner, B., Prohl, C., Engelhardt, I.,
Stojanov, P., Huwald, E., Riley, J., Dähne, M.: Surf. Sci. 603(17), 2808 (2009)
14. Wanke, M., Franz, M., Vetterlein, M., Pruskil, G., Prohl, C., Höpfner, B., Stojanov, P., Huwald,
E., Riley, J.D., Dähne, M.: J. Appl. Phys. 108(6), 064304 (2010)
15. Stauffer, L., Mharchi, A., Pirri, C., Wetzel, P., Bolmont, D., Gewinner, G., Minot, C.: Phys.
Rev. B 47, 10555 (1993)
16. Kitayama, H., Tear, S., Spence, D., Urano, T.: Surf. Sci. 482–485(Part 2), 1481 (2001)
17. Bonet, C., Spence, D., Tear, S.: Surf. Sci. 504, 183 (2002)
18. Rogero, C., Koitzsch, C., González, M.E., Aebi, P., Cerdá, J., Martín-Gago, J.A.: Phys. Rev. B
69, 045312 (2004)
19. Rogero, C., Martín-Gago, J.A., Cerdá, J.I.: Phys. Rev. B 74, 121404 (2006)
20. Koitzsch, C., Bovet, M., Garnier, M., Aebi, P., Rogero, C., Martín-Gago, J.: Surf. Sci. 566–
568(Part 2), 1047 (2004) (Proceedings of the 22nd European Conference on Surface Science)
21. Magaud, L., Reinisch, G., Pasturel, A., Mallet, P., Dupont-Ferrier, E., Veuillen, J.Y.: EPL
(Europhys. Lett.) 69(5), 784 (2005)
22. Wetzel, P., Saintenoy, S., Pirri, C., Bolmont, D., Gewinner, G.: Phys. Rev. B 50, 10886 (1994)
23. Cocoletzi, G.H., de la Cruz, M.R., Takeuchi, N.: Surf. Sci. 602(2), 644 (2008)
24. Perdew, J.P., Chevary, J.A., Vosko, S.H., Jackson, K.A., Pederson, M.R., Singh, D.J., Fiolhais,
C.: Phys. Rev. B 46, 6671 (1992)
25. Perdew, J.P., Burke, K., Ernzerhof, M.: Phys. Rev. Lett. 77, 3865 (1996)
26. Kresse, G., Furthmüller, J.: Comput. Mater. Sci. 6, 15 (1996)
27. Kresse, G., Furthmüller, J.: Phys. Rev. B 54, 11169 (1996)
28. Bloechl, P.E.: Phys. Rev. B 50(24), 17953 (1994)
29. Kresse, G., Joubert, D.: Phys. Rev. B 59, 1758 (1999)
30. Anisimov, V.I., Aryasetiawan, F., Lichtenstein, A.I.: J. Phys. Condensed Matter 9(4), 767
31. Sanna, S., Schmidt, W.G., Frauenheim, T., Gerstmann, U.: Phys. Rev. B 80, 104120 (2009)
32. Sanna, S., Frauenheim, T., Gerstmann, U.: Phys. Rev. B 78, 085201 (2008)
33. Neugebauer, J., Scheffler, M.: Phys. Rev. B 46(24), 16067 (1992)
34. Bengtsson, L.: Phys. Rev. B 59(19), 12301 (1999)
35. Tersoff, J., Hamann, D.R.: Phys. Rev. Lett. 50, 1998 (1983)
36. Tersoff, J., Hamann, D.R.: Phys. Rev. B 31, 805 (1985)
37. Bechstedt, F.: Principles of Surface Physics. Advanced Texts in Physics. Springer, Berlin/
Heidelberg (2003)
38. Sanna, S., Schmidt, W.G.: Phys. Rev. B 81(21), 214116 (2010)
39. Lüth, H.: Surfaces and Interfaces of Solid Materials. Springer Study Edition. Springer,
Berlin/Heidelberg (1995)
40. Kirakosian, A., McChesney, J., Bennewitz, R., Crain, J., Lin, J.L., Himpsel, F.: Surf. Sci.
498(3), L109 (2002)
41. Perkins, E., Scott, I., Tear, S.: Surf. Sci. 578(1–3), 80 (2005)
42. Wetzel, P., Pirri, C., Gewinner, G., Pelletier, S., Roge, P., Palmino, F., Labrune, J.C.: Phys. Rev.
B 56, 9819 (1997)
Computational Analysis of Li Diffusion
in NZP-Type Materials by Atomistic Simulation
and Compositional Screening
Abstract Solid state electrolytes (SSEs) can become a key component in the
development of novel reliable, safe, and highly efficient Li-ion batteries. This work
focuses on the vacancy-mediated diffusion of Li ions through solid compounds
with the NZP crystal structure [e.g. LiTi2(PO4)3 (LTP); NZP stands for NaZr2(PO4)3],
which is a promising class of materials for application as SSEs. Since this crystal
structure is known to be stable for many combinations of elements on the cation
positions, the activation energies for vacancy jumps were calculated in this work
for a variety of NZP-type compounds with different compositions. First-principles
calculations based on density functional theory were performed to determine the
migration barrier heights and to correlate their values with structural characteristics.
In addition, the bond valence method was applied to the NZP-type compounds.
This method not only helps to identify diffusion networks and transition points, but
can also be valuable for predicting qualitative trends by systematic compositional
screening.
1 Introduction
In recent years, there has been a rapidly growing industrial demand for energy
storage materials combining large specific energy and power densities with high
safety, harmlessness to health, and abundant availability of the processed elements.
The safety of current Li-ion batteries could be increased by replacing
the easily flammable and toxic liquid electrolytes with ion-conducting solid
compounds. Materials crystallizing in the NASICON [2, 3, 6, 8] or NZP crystal
structure, named after the compound NaZr2(PO4)3, with Na replaced by
Li, can be regarded as promising candidates for solid-state electrolytes (SSEs),
mainly due to their three-dimensional diffusion network for Li ions [10] and their
capability of accommodating many different combinations of elements on the Zr
and P sublattices while maintaining the NZP crystal structure [18, 19]. The variety of
possible elemental substitutions allows for a systematic screening of many different
compositions with the goal of finding novel materials with desired properties such
as high ionic conductivity and low thermal expansion. In this work, the diffusion of
Li through various NZP-type compounds is analyzed and screened by combining
quantum-mechanical ab-initio simulations with static energy calculations based
on bond valence potentials. Structure–property relationships were identified [12],
making it possible to predict migration barriers qualitatively from
crystal structure characteristics. After describing the NZP crystal structure in detail,
the employed computational methods are explained concisely. Results for vacancy-mediated
Li-ion migration in LiX2(LO4)3, where X and L denote ions substituting
Ti and P, respectively, in LiTi2(PO4)3 (LTP or LISICON [11]), are presented and
discussed. Finally, a summary of the usage of computational resources on the
ForHLR I supercomputer is given.
The general structure of NZP or LTP compounds can be described by the formula
$(\mathrm{M1})^{[6]}(\mathrm{M2})_3^{[8]}\mathrm{X}_2^{[6]}(\mathrm{L}^{[4]}\mathrm{O}_4)_3$ [15, 16, 18, 19]. M1 and M2 denote interstitial
positions which are fully or partly occupied by Li; X and L are the positions of
Zr/Ti and P, respectively. The oxygen coordination numbers of the cations are given
by superscripts in square brackets. NZP compounds crystallize in a rhombohedral
structure with the space group $R\bar{3}c$. Two XO6 octahedra and three LO4 tetrahedra,
connected via oxygen atoms, form the basic X2(LO4)3 units of the structural
framework, which are called 'lanterns' due to their characteristic shape (see Fig. 1).
Between the connected lanterns, three-dimensional migration paths exist for the
Li ions from M1 to M2 positions and vice versa. From the site multiplicities of
M1 (Wyckoff position 6b) and M2 (18e), it follows that there are three times as
many M2 as M1 positions. Each M1 is surrounded by six M2 sites, and each M2
connects two M1 sites, leading to the three-dimensional network. The occupation
of M1 and M2 positions with Li ions depends on the oxidation states of the X and L
ions such that charge neutrality is ensured, e.g. LiTi2(PO4)3 with Ti(+IV) and P(+V),
Li4Zr2(SiO4)3 with Zr(+IV) and Si(+IV), or Li3Al2(VO4)3 with Al(+III) and V(+V). In
the case of mixed occupation of X sites with atoms of valencies +IV and +V, some
of the M1 sites have to be empty, leading to the possibility of vacancy-mediated
diffusion without having to incorporate additional vacancies.
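The charge-neutrality rule can be made explicit. For Li_a X2(LO4)3 with oxygen at −2, a = 24 − 2v_X − 3v_L, which reproduces the three examples above. The same formula shows how mixed +IV/+V occupation of X leaves M1 sites empty. The sketch also counts sites in a cell of six formula units, as in the 108-atom cell used below. The helper names and the mixed composition are hypothetical, and filling M1 before M2 is a simplifying assumption.

```python
from fractions import Fraction

O_VALENCE = -2


def li_per_formula_unit(v_x, v_l):
    """Li content a of Li_a X2 (L O4)3 from charge neutrality.

    a + 2 v_X + 3 v_L + 12 * (-2) = 0; v_X may be an average valence
    for mixed X occupation (use Fraction for exact arithmetic).
    """
    return -(2 * v_x + 3 * v_l + 12 * O_VALENCE)


def site_filling(a, n_fu=6):
    """(Li on M1, Li on M2, empty M1) for n_fu formula units.

    Assumption: M1 (one per f.u., Wyckoff 6b) is filled before M2
    (three per f.u., 18e).
    """
    m1_sites, m2_sites = n_fu, 3 * n_fu
    assert 6 * m1_sites == 2 * m2_sites  # six M2 around each M1, two M1 per M2
    li = a * n_fu
    on_m1 = min(li, m1_sites)
    on_m2 = li - on_m1
    if on_m2 > m2_sites:
        raise ValueError("more Li than M1 + M2 sites")
    return on_m1, on_m2, m1_sites - on_m1


# One of 12 Ti(+IV) in the 6-f.u. cell replaced by a pentavalent ion (hypothetical)
v_mixed = Fraction(11 * 4 + 5, 12)
```

With this substitution, `li_per_formula_unit(v_mixed, 5)` gives 5/6 Li per formula unit. `site_filling` then leaves one M1 site per cell empty, which is the intrinsic vacancy mentioned in the text.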
Fig. 1 Hexagonal supercell of the rhombohedral NZP structure, shown for LiTi2(PO4)3 (LTP).
One ‘lantern’ configuration is visualized by coordination tetrahedra of oxygen (red spheres) around
P atoms (violet spheres) and by octahedra around Ti atoms (blue spheres). In LTP, Li ions occupy
the M1 positions (green spheres), and the M2 positions are empty (yellow spheres). Only M2 sites
surrounding one Li ion are shown for clarity
3 Computational Methods
Transition paths and activation energies (migration barrier heights) in the material
class described above were calculated by means of ab-initio methods based on the
density functional theory (DFT) [7] and the nudged elastic band (NEB) method [9],
and by static energy calculations using bond valence (BV) potentials [1].
The DFT code Quantum ESPRESSO PWscf [7] was applied to obtain ground-state
configurations and energies of perfect NZP structures, structures with vacancies,
and structures with migrating Li atoms at transition points. Since compounds with
more than one Li atom per formula unit were not considered (no occupation of
M2 sites), the perfect supercells contain 108 atoms (6 Li, 12 X, 18 L, and 72
O), arranged as depicted in Fig. 1. The wavefunctions of the valence electrons
were expanded in a plane-wave basis with a cutoff energy of 476 eV, and their
interaction with the ionic cores was described by ultrasoft pseudopotentials [20].
The exchange-correlation contribution to the total energy was taken into account by
the generalized gradient approximation (GGA) in the formulation of Perdew, Burke, and
Ernzerhof [17]. Brillouin-zone integration was performed on a 3×3×1 grid, set
up by the scheme of Monkhorst and Pack [14]. Cell-volume relaxation was done
by total-energy minimization, and atomic relaxation was stopped when the
force acting on each atom became less than 10⁻³ eV/Å. For the NEB calculations,
5 intermediate images were chosen along an initially straight path between the
previously relaxed initial and final states. At each ionic relaxation
step, the component of the force on an atom along this path is replaced
by the forces of elastic springs between the atom and its counterparts in the neighboring images [9].
This prevents atoms at energetically unfavorable positions along a
path between two energy minima from relaxing into one of these minima. The
force components perpendicular to the path are not changed, so the
correlated relaxation of all images yields the path of minimal energy
across the saddle point.
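The force projection described above can be illustrated on a two-dimensional model surface instead of a DFT energy landscape. The sketch implements a plain NEB with a central-difference tangent and steepest-descent updates. It does not use the climbing-image variant of [9]. Five intermediate images start on a straight path. The model potential, spring constant, step size and iteration count are arbitrary assumptions chosen so that the toy problem converges. The saddle point of the model surface lies at (0, 0.5) with energy 1, whereas the straight initial path crosses the ridge at energy 2.25.

```python
import numpy as np


def potential(p):
    """Toy surface with a curved valley: minima at (+-1, 0), saddle at (0, 0.5)."""
    x, y = p
    u = y - 0.5 * (1.0 - x**2)
    return (x**2 - 1.0) ** 2 + 5.0 * u**2


def gradient(p):
    x, y = p
    u = y - 0.5 * (1.0 - x**2)
    return np.array([4.0 * x * (x**2 - 1.0) + 10.0 * u * x, 10.0 * u])


def neb(start, end, n_images=5, k_spring=5.0, step=0.01, n_iter=5000):
    """Plain NEB: perpendicular true force plus parallel spring force per image."""
    path = np.linspace(start, end, n_images + 2)  # straight initial path, fixed ends
    for _ in range(n_iter):
        new = path.copy()
        for i in range(1, n_images + 1):
            tau = path[i + 1] - path[i - 1]  # central-difference tangent
            tau /= np.linalg.norm(tau)
            g = gradient(path[i])
            f_perp = -(g - np.dot(g, tau) * tau)  # true force, perpendicular part only
            f_spring = k_spring * (np.linalg.norm(path[i + 1] - path[i])
                                   - np.linalg.norm(path[i] - path[i - 1])) * tau
            new[i] = path[i] + step * (f_perp + f_spring)
        path = new
    return path, np.array([potential(p) for p in path])


path, energies = neb(np.array([-1.0, 0.0]), np.array([1.0, 0.0]))
```

After relaxation, the images follow the curved valley, and the highest image sits at the saddle point. The barrier is then read off as `energies.max() - energies[0]`.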
$$V(A) = \sum_{i=1}^{N} v_i = \sum_{i=1}^{N} \exp\left[(R_0 - R_i)/b\right], \tag{1}$$

with distances $R_i$ between the bonded atoms. The parameters $R_0$ and $b$ can be
adjusted to achieve minimal mismatch of $V(A)$ with the oxidation state of atom A in
known stable configurations. Higher sum mismatches should therefore correspond
to energetically less favorable and less stable structures [5]. The total energy is
linked to the bond valences by:

$$E_{\mathrm{bv}} = D_0 \left[ \sum_{i=1}^{N} \left( \frac{v_i - v_{\mathrm{min}}}{v_{\mathrm{min}}} \right)^2 - N \right], \tag{2}$$
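Eqs. 1 and 2 translate directly into code. The sketch computes individual bond valences, the bond valence sum of an atom, and a Morse-type BV energy. For Eq. 2 it assumes the normalized form E_bv = D0[Σ((v_i − v_min)/v_min)² − N]. The Li–O parameters R0 = 1.466 Å and b = 0.37 Å are illustrative inputs only, not the parameter set used in this work.

```python
import math


def bond_valence(r, r0, b):
    """Individual bond valence v_i = exp[(R0 - R_i) / b] (Eq. 1)."""
    return math.exp((r0 - r) / b)


def bond_valence_sum(distances, r0, b):
    """Bond valence sum V(A) over all bonds of atom A."""
    return sum(bond_valence(r, r0, b) for r in distances)


def bv_energy(valences, v_min, d0):
    """Assumed normalized form E = D0 * [sum_i ((v_i - v_min) / v_min)^2 - N]."""
    return d0 * (sum(((v - v_min) / v_min) ** 2 for v in valences) - len(valences))


# Illustrative (assumed) Li-O parameters in angstrom
R0, B = 1.466, 0.37
# Six equal Li-O bonds at R = R0 + b ln 6 each carry 1/6 v.u., so V(Li) = 1
r_ideal = R0 + B * math.log(6)
v_sum = bond_valence_sum([r_ideal] * 6, R0, B)
```

A bond valence sum equal to the formal oxidation state (here +1 for Li) corresponds to zero mismatch. In the energy expression, every bond at v_min contributes −D0.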
In order to identify the paths for vacancy-mediated Li-ion diffusion, the energy of a
Li ion at each position in the cell was calculated with the BV approach in the LTP
structure. The lowest energy values are found, as expected, for Li at M1 positions.
At a certain higher energy value, which can be regarded as the activation energy
within this model, the isosurface becomes interconnected and forms a continuous
network throughout the whole system, as depicted in Fig. 2. A vacancy at an M1
site, visualized by the empty circle in the upper Li layer, can then move through the
crystal by successive jumps, e.g. along the dashed line, thereby effectively enabling
Li-ion movement in the opposite direction. Since the energies result from a static
calculation with an effective interaction potential, they cannot be considered
quantitatively accurate. They can, however, be valuable for predicting qualitative trends of
activation energies when, e.g., incorporating defects or screening a large variety of
elemental substitutions.
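The criterion used for Fig. 2, i.e. the lowest energy at which the isosurface forms a connected network, is a percolation threshold. Below is a minimal sketch. It assumes the BV energy of a Li probe has already been evaluated on a regular three-dimensional grid. The isovalue is raised level by level, and a breadth-first search checks whether the region below the level connects opposite faces of the grid. Periodic boundary conditions are ignored for brevity, and the channel used in the example is synthetic.

```python
from collections import deque

import numpy as np

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def percolates(mask, axis):
    """True if True cells of a 3D mask connect the two faces normal to `axis`
    (face-connected neighbours, non-periodic grid)."""
    seen = np.zeros_like(mask, dtype=bool)
    queue = deque()
    for idx in zip(*np.nonzero(mask)):
        if idx[axis] == 0:
            seen[idx] = True
            queue.append(idx)
    last = mask.shape[axis] - 1
    while queue:
        idx = queue.popleft()
        if idx[axis] == last:
            return True
        for step in STEPS:
            nb = tuple(i + d for i, d in zip(idx, step))
            inside = all(0 <= n < s for n, s in zip(nb, mask.shape))
            if inside and mask[nb] and not seen[nb]:
                seen[nb] = True
                queue.append(nb)
    return False


def percolation_energy(energy, axis=2):
    """Smallest isovalue E such that the region {energy <= E} percolates along `axis`."""
    for level in np.unique(energy):
        if percolates(energy <= level, axis):
            return float(level)
    return None


# Synthetic channel along z with a 0.8 eV bottleneck in the middle
energy = np.full((3, 3, 8), 5.0)
energy[1, 1, :] = 0.0
energy[1, 1, 4] = 0.8
threshold = percolation_energy(energy, axis=2)
```

Applied to a real BV energy map, the returned level would play the role of the model activation energy discussed above.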
Activation energies for the migration of a Li vacancy were calculated with DFT for
several substitutions at the Ti and P positions of LTP. To this end, defect-free systems
were set up, followed by a relaxation of the cell volume and atomic positions. The
volume was then kept constant for the simulations of the structures containing one
vacancy at an M1 site (initial state), and for those structures with two vacancies at
adjacent M1 sites and an interstitial Li at the intermediate M2 site (i.e., the transition
state). The activation energy was obtained as the energy difference between these
two configurations. In order to calculate the energy along the migration paths, NEB
runs were performed for a few of the considered systems (see Fig. 3), yielding
mirror-symmetric barriers.

Fig. 2 Hexagonal supercell of rhombohedral LTP with an energy isosurface of constant E(Li) as
calculated with the BV method. The energy value was chosen as the minimum value at which
the isosurface formed an interconnected network (+0.8 eV relative to the energy of Li at the M1
positions). The Li vacancy in the supercell is visualized by the empty sphere in the uppermost Li layer
[Plot: activation energy (eV) vs. migration coordinate for the Li ion in LiX2(PO4)3 for different X]
Fig. 3 Minimum energy paths for the migration of a Li ion from a M1 to an adjacent M1 position
across the transition point (M2) in LiX2 (PO4 )3 for 5 different tetravalent elements X [12]
[Plot: activation energy vs. polyhedron volume (Å³)]
Fig. 4 Dependence of the activation energy for vacancy-mediated Li-ion migration on the volume
of the LiO6 octahedron for a variety of NZP compounds
Fig. 5 Oxygen octahedra around adjacent M1 positions. A possible diffusion direction, which
is indicated by arrows, crosses the bottleneck formed by three oxygen atoms at the face of the
position, it has to cross one of the triangular faces of the octahedron spanned by three
O atoms, and so the energy barrier is higher the smaller this bottleneck is (see Fig. 5).
For systems containing Ge, Ti, Sn, and Hf, a similar relationship between activation
energy and bottleneck size in NZP structures was experimentally observed by
Martinez et al. using X-ray diffraction and electrical impedance spectroscopy [13].
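The two structural descriptors discussed above can be computed directly from the six O positions. These are the volume of the LiO6 octahedron (Fig. 4) and the size of the triangular O3 bottleneck faces (Fig. 5). The sketch assumes a convex, moderately distorted octahedron whose opposite vertices can be paired by largest distance. The face area is only one possible measure of bottleneck size, and the function is a hypothetical helper rather than the analysis code of this work.

```python
import itertools

import numpy as np


def octahedron_volume_and_faces(vertices):
    """Volume of a convex octahedron and the areas of its eight triangular faces.

    vertices : six 3D points, e.g. the O positions around an M1 site. Opposite
    vertices are paired by largest distance; each face takes one vertex from
    each of the three pairs. The volume is summed from eight tetrahedra that
    share the centroid.
    """
    verts = [np.asarray(v, dtype=float) for v in vertices]
    remaining = list(range(6))
    pairs = []
    while remaining:
        i = remaining.pop(0)
        j = max(remaining, key=lambda k: np.linalg.norm(verts[k] - verts[i]))
        remaining.remove(j)
        pairs.append((i, j))
    center = sum(verts) / 6.0
    volume, areas = 0.0, []
    for a, b, c in itertools.product(*pairs):
        va, vb, vc = verts[a], verts[b], verts[c]
        volume += abs(np.dot(va - center, np.cross(vb - center, vc - center))) / 6.0
        areas.append(0.5 * np.linalg.norm(np.cross(vb - va, vc - va)))
    return volume, areas


regular = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
volume, faces = octahedron_volume_and_faces(regular)
```

For a regular octahedron with unit vertex distance, the function returns a volume of 4/3 and eight faces of area √3/2. A smaller face corresponds to a narrower bottleneck for the migrating ion.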
The trend is also maintained when, instead of a total replacement of Ti by
other elements in LTP, only a partial substitution of just one Ti atom in the cell
is considered, leading to LiX0.2Ti1.8(PO4)3. This was shown earlier in this project
for a large set of tri-, tetra-, and pentavalent elements X [12]. In the case of
LiSi0.2Ti1.8(PO4)3, it was found that the migration barriers along a complete path
through the cell, as indicated for example by the dashed line in Fig. 2, vary by up to
0.2 eV depending on the distance of the moving ion from the substituted atom.
5 Computer Resources
For the DFT calculations, the MPI-parallelized pwscf code of the Quantum
ESPRESSO software was run on the ForHLR I single-node queues using 20 processors. K-point
parallelization was not used (specified by the option -npool 1
on the mpirun command line), since this was found to give the shortest execution
times for 20 cores. Figure 6 shows the decrease of computation time with the number of
employed cores for two representative systems of LTP with 107 atoms:
system S1 containing one vacancy on an M1 position, leading to 5 irreducible k-points
(resulting from symmetry operations on the original 3×3×1 grid of k-points),
[Plot: execution time (min) vs. number of cores (8, 16, 32, 48, 64) for npool = 1, 2, 3, 4 and Nk_irr = 5 and 4]
Fig. 6 Dependence of execution time of a single electronic self-consistency run on the number
of cores. Different settings for the mpirun option npool were applied, which corresponds to k-
point parallelization, and two systems with different numbers of irreducible k-points (Nk_irr ) were
considered. The calculations using 8 and 16 cores were performed on the singlenode queue, and
those using 32, 48, and 64 cores on the multinode queue with 2, 3, and 4 nodes, respectively, and
16 cores in each case
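One factor that can be estimated before running such a scan is the load balance of the k-point pools. With -npool p, the irreducible k-points are distributed over p pools, and the slowest pool, which handles ⌈Nk_irr/p⌉ of them, sets the pace. The sketch below computes this idealized balance. It ignores communication and the parallel efficiency within a pool. It is therefore a rough, assumed model, not an explanation of the measured timings in Fig. 6.

```python
import math


def pool_balance(n_kpoints, npool):
    """Idealized load balance of k-point pools (1.0 = perfectly balanced).

    Assumes k-points are spread as evenly as possible over the pools, so the
    busiest pool handles ceil(n_kpoints / npool) of them.
    """
    busiest = math.ceil(n_kpoints / npool)
    return n_kpoints / (npool * busiest)


balance = {(nk, p): pool_balance(nk, p) for nk in (4, 5) for p in (1, 2, 3, 4)}
```

With 5 irreducible k-points, for example, two pools must process 3 and 2 k-points, so part of the allocated resources idles. With 4 k-points, two or four pools divide the work evenly.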
[Plot: computation time (h) vs. number of electronic + ionic steps for systems S0, S1 and S2]
Fig. 7 Dependence of the computation time on the sum of electronic self-consistency and ionic
relaxation steps for the three characteristic systems considered in this work: a defect-free cell with
108 atoms leading to 4 irreducible k-points (S0), a cell with one Li vacancy at M1 (107 atoms) leading
to 5 irreducible k-points (S1), and a cell with two vacancies at M1 sites and one interstitial Li atom
at the transition point M2 (107 atoms) leading to 4 irreducible k-points (S2). The different points
belong to different substitutions of Ti in LiTi2(PO4)3. Lines are linear fits
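The linear fits in Fig. 7 can be turned into a cost per electronic + ionic step, which is convenient for budgeting further runs. Below is a minimal sketch with numpy.polyfit. The data points are invented placeholders lying exactly on a line, not the measured values.

```python
import numpy as np


def fit_cost_per_step(steps, hours):
    """Least-squares line hours = slope * steps + offset; returns (slope, offset)."""
    slope, offset = np.polyfit(steps, hours, 1)
    return slope, offset


steps = np.array([200, 400, 600, 800])
hours = 0.03 * steps + 1.0  # placeholder data, not the values of Fig. 7
slope, offset = fit_cost_per_step(steps, hours)
predicted = slope * 1000 + offset  # estimated hours for a 1000-step run
```

Separate fits per system (S0, S1, S2) would capture the different numbers of irreducible k-points.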
In this work, static energy calculations based on bond valence potentials and density
functional theory calculations were applied to study migration paths and activation
energies for Li ions in ion-conducting NZP-type SSE materials. Starting from the
compound LiTi2(PO4)3, Ti and P atoms were substituted by a variety of isovalent
elements to analyze their influence on Li-ion diffusion. The larger the coordination
polyhedron around a Li ion, the more easily the Li ion can escape its cage through
the bottleneck. In the next step of this project, a systematic compositional screening
approach will be applied, combining qualitative results of bond valence calculations
with molecular dynamics and ab-initio simulations in order to discover hitherto
unknown combinations of elements in NZP compounds leading to stable crystal
structures with low migration barriers and therefore high conductivities for Li ions.
Acknowledgements This work was funded by the German Research Foundation (DFG Grant no.
El 155/26-1). The DFT calculations were performed on the computational resource ForHLR Phase
I funded by the Ministry of Science, Research, and Arts Baden-Württemberg and DFG (“Deutsche
1. Adams, S., Prasada Rao, R.: High power lithium ion battery materials by computational design.
Phys. Stat. Solidi A 208, 1746 (2011)
2. Alamo, J.: Chemistry and properties of solids with the [NZP] skeleton. Solid State Ion. 63, 547
3. Anantharamulu, N., Rao, K.K., Rambabu, G., Kumar, B.V., Radha, V., Vithal, M.: A wide-
ranging review of NASICON type materials. J. Mater. Sci. 46, 2821 (2011)
4. Brown, I.D.: Chemical and steric constraints in inorganic solids. Acta Crys. B48, 553 (1992)
5. Brown, I.D., Poeppelmeier, R. (eds.): Bond Valences. Springer, Berlin (2014)
6. Delmas, C., Nadiri, A., Soubeyroux, J.: The NASICON-type titanium phosphates ATi2(PO4)3
(A = Li, Na) as electrode materials. Solid State Ion. 28, 419 (1988)
7. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D.,
Chiarotti, G.L., Cococcioni, M., Dabo, I.: Quantum ESPRESSO: a modular and open-source
software project for quantum simulations of materials. J. Phys. Condens. Matter 21, 395502
8. Hagman, L.O., Kierkegaard, P.: The crystal structure of NaMe^IV_2(PO4)3; Me^IV = Ge, Ti, Zr.
Acta Chem. Scand. 22, 1822 (1968)
9. Henkelman, G., Uberuaga, B.P., Jonsson, H.: A climbing image nudged elastic band method
for finding saddle points and minimum energy paths. J. Chem. Phys. 113, 9901 (2000)
10. Kamaya, N., Homma, K., Yamakawa, Y., Hirayama, M., Kanno, R., Yonemura, M., Kamiyama,
T., Kato, Y., Hama, S., Kawamoto, K.: A lithium superionic conductor. Nat. Mater. 10, 682
11. Knauth, P.: Inorganic solid Li ion conductors: an overview. Solid State Ion. 180, 911 (2009)
12. Lang, B., Ziebarth, B., Elsässer, C.: Lithium ion conduction in LiTi2(PO4)3 and related
compounds based on the NASICON structure: a first-principles study. Chem. Mater. 27, 5040
13. Martinez, A., Pecharroman, C., Iglesias, J.E., Rojo, J.M.: Relationship between activation
energy and bottleneck size for Li+ ion conduction in NASICON materials of composition
LiMM’(PO4 )3 ; M,M’ = Ge, Ti, Sn, Hf. J. Phys. Chem. B 102, 372 (1998)
14. Monkhorst, H.J., Pack, J.D.: Special points for Brillouin-zone integrations. Phys. Rev. B 13,
5188 (1976)
15. Orlova, A.I.: Isomorphism in phosphates of the NaZr2(PO4)3 structural type and radiochemical
properties. Radiochemistry 44, 423 (2002)
16. Orlova, A.I., Koryttseva, A.K.: Phosphates of pentavalent elements: structure and properties.
Crystallogr. Rep. 49, 724 (2004)
17. Perdew, J.P., Burke, K., Ernzerhof, M.: Generalized gradient approximation made simple. Phys.
Rev. Lett. 77, 3865 (1996)
18. Pet’kov, V.I., Asabina, E.A., Markin, A.V., Smirnova, N.N.: Synthesis, characterization and
thermodynamic data of compounds with NZP structure. J. Therm. Anal. Calorim. 91, 155
19. Pet’kov, V.I., Orlova, A.I.: Crystal-chemical approach to predicting the thermal expansion of
compounds in the NZP family. Inorg. Mater. 39, 1013 (2003)
20. Vanderbilt, D.: Soft self-consistent pseudopotentials in a generalized eigenvalue formalism.
Phys. Rev. B 41, 7892 (1990)
Molecular Dynamics Simulations of Silicon:
The Influence of Electron-Temperature
Dependent Interactions
Abstract The well-known two-temperature model for solids with highly excited electrons is extended from metals to semiconductors. It is combined with classical molecular dynamics simulations to study laser ablation in semiconductors, where charge carriers are created by the absorption of the laser light. The model is improved by extending the static modified Tersoff potential to a dynamical interaction that depends on the electron temperature of the material. Results are presented for single and double pulses in silicon and are compared to a simple rescale model in which the laser energy is added as kinetic energy to the atoms.
1 Introduction
Tersoff potentials. The results demonstrate the importance of the combined MD-
TTM approach and electron-temperature adapted potentials.
We first present the continuum two-temperature model and then couple it to molecular dynamics simulations. We introduce electron-temperature dependent interactions and show how to determine them. We end up with results obtained by the new
2 Continuum-Atomistic Modeling
We begin with the modeling of laser pulse propagation. The power density of a laser pulse on the surface of the sample can be given as a Gaussian function in space and time:
$$I(r,t) = \sqrt{\frac{4\ln 2}{\pi}}\,\frac{(1-R)\,\Phi}{\pi b^2 t_p}\,e^{-r^2/b^2}\,e^{-4\ln 2\,(t-t_0)^2/t_p^2}, \tag{1}$$
where $R$ is the reflectivity, $\Phi$ is the laser fluence, $b$ is the laser pulse width, $t_p$ is the laser pulse duration and $t_0$ is the time at the maximal laser intensity.
For a spatially homogeneous laser power density this equation simplifies to
$$I(t) = \sqrt{\frac{4\ln 2}{\pi}}\,\frac{(1-R)\,\Phi}{t_p}\,\exp\left[-4\ln 2\,\frac{(t-t_0)^2}{t_p^2}\right]. \tag{2}$$
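As a quick consistency check of Eq. (2), the sketch below evaluates the temporal profile and integrates it numerically. Assuming $t_p$ is the full width at half maximum (as used in Sect. 4), the integral should return the absorbed fluence $(1-R)\Phi$. The parameter values are illustrative assumptions, not the values used in this work.

```python
import math

def pulse_intensity(t, t0, tp, fluence, reflectivity):
    """Absorbed power density I(t) of Eq. (2); tp is the FWHM of the pulse."""
    norm = math.sqrt(4.0 * math.log(2.0) / math.pi) * (1.0 - reflectivity) * fluence / tp
    return norm * math.exp(-4.0 * math.log(2.0) * (t - t0) ** 2 / tp ** 2)

def absorbed_fluence(t0, tp, fluence, reflectivity, n_steps=20000):
    """Trapezoidal integral of I(t) over t0 +/- 5 tp (the tails beyond are negligible)."""
    a, b = t0 - 5.0 * tp, t0 + 5.0 * tp
    h = (b - a) / n_steps
    total = 0.5 * (pulse_intensity(a, t0, tp, fluence, reflectivity)
                   + pulse_intensity(b, t0, tp, fluence, reflectivity))
    for k in range(1, n_steps):
        total += pulse_intensity(a + k * h, t0, tp, fluence, reflectivity)
    return total * h

# Illustrative (hypothetical) values: 100 fs FWHM, 0.1 J/cm^2 = 1000 J/m^2, R = 0.33
tp, phi, refl = 100e-15, 1000.0, 0.33
print(absorbed_fluence(0.0, tp, phi, refl))      # close to (1 - R) * phi = 670 J/m^2
peak = pulse_intensity(0.0, 0.0, tp, phi, refl)
print(pulse_intensity(0.5 * tp, 0.0, tp, phi, refl) / peak)  # about 0.5, i.e. tp is the FWHM
```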
The propagation of the laser pulse is given by the rate equation [1]
$$\frac{\partial I(x,t)}{\partial x} = -(\alpha + \sigma n)\,I - \beta I^2 \tag{3}$$
if the direction of the laser beam is the x-axis. Here $\alpha$ and $\beta$ are the one- and two-photon absorption coefficients, respectively, and $\sigma$ is the free-carrier absorption cross section. This model extends the Lambert-Beer law widely used for metals and takes into account a variable number density $n$ of free carriers in covalent materials.
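For a carrier density that does not vary with depth, Eq. (3) is a Bernoulli equation with a closed-form solution, which makes the interplay of linear and two-photon absorption explicit. The sketch below assumes a depth-independent $n$ (in the simulations $n$ varies with $x$, so Eq. (3) has to be integrated numerically there); all parameter values are hypothetical.

```python
import math

def intensity_at_depth(x, i0, alpha, sigma, n, beta):
    """Closed-form solution of dI/dx = -(alpha + sigma*n) I - beta I^2 for constant n.

    Assumes a = alpha + sigma*n > 0; for beta = 0 it reduces to the Lambert-Beer law.
    """
    a = alpha + sigma * n
    decay = math.exp(-a * x)
    return a * i0 * decay / (a + beta * i0 * (1.0 - decay))

# Hypothetical values in SI units: alpha [1/m], sigma [m^2], n [1/m^3], beta [m/W], i0 [W/m^2]
alpha, sigma, n, beta, i0 = 1.0e5, 5.0e-22, 1.0e26, 1.0e-11, 1.0e16
for depth in (0.0, 1e-6, 5e-6):
    print(depth, intensity_at_depth(depth, i0, alpha, sigma, n, beta))
```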
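The dynamics of the electron-hole pair density is given by
$$\frac{\partial n}{\partial t} = \frac{\alpha I}{h\nu} + \frac{\beta I^2}{2h\nu} - \gamma n^3 + \theta n, \tag{4}$$
where $\nu$ is the photon frequency and the last two terms, with coefficients $\gamma$ and $\theta$, correspond to Auger recombination and impact ionization processes, respectively.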
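Eq. (4) can be integrated in time at a fixed depth once $I(t)$ is known. The sketch below uses forward Euler and the pulse of Eq. (2). With the two-photon, Auger and impact-ionization terms switched off, the generated density must equal $\alpha(1-R)\Phi/(h\nu)$, which serves as a check. Function names and parameter values are illustrative assumptions, and the coupling to Eq. (3) through the depth dependence of $I$ is omitted.

```python
import math

H_PLANCK = 6.62607015e-34  # J s
C_LIGHT = 2.99792458e8     # m/s

def evolve_carrier_density(intensity, n0, t_start, t_end, dt,
                           alpha, beta, gamma, theta, wavelength):
    """Forward-Euler integration of Eq. (4) at a fixed depth; intensity(t) in W/m^2."""
    h_nu = H_PLANCK * C_LIGHT / wavelength  # photon energy h*nu in J
    n = n0
    for k in range(int(round((t_end - t_start) / dt))):
        i = intensity(t_start + k * dt)
        rate = (alpha * i / h_nu + beta * i * i / (2.0 * h_nu)
                - gamma * n ** 3 + theta * n)
        n += dt * rate
    return n

# Gaussian pulse of Eq. (2) with hypothetical values: tp = 100 fs, (1 - R) * Phi = 670 J/m^2
tp, absorbed = 100e-15, 670.0
pulse = lambda t: (math.sqrt(4 * math.log(2) / math.pi) * absorbed / tp
                   * math.exp(-4 * math.log(2) * t ** 2 / tp ** 2))
n_end = evolve_carrier_density(pulse, 1e16, -5 * tp, 5 * tp, tp / 100,
                               alpha=1e5, beta=0.0, gamma=0.0, theta=0.0,
                               wavelength=775e-9)
print(n_end)  # about alpha * 670 / (h nu), i.e. roughly 2.6e26 m^-3
```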
The total energy density of the electron-hole pairs can be treated as a sum of potential and kinetic energy densities,
$$U = n\,(E_g + 3k_B T_c), \tag{5}$$
with band-gap energy $E_g$ and carrier temperature $T_c$. Based on this equation, the following model for the energy transport process has been suggested [1, 2]:
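$$3nk_B\frac{\partial T_c}{\partial t} = \nabla\cdot(k_c\nabla T_c) - \frac{3nk_B}{\tau_c}(T_c - T_l) + (\alpha + \sigma n)I - (E_g + 3k_B T_c)\frac{\partial n}{\partial t} - n\frac{\partial E_g}{\partial t}. \tag{6}$$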
The corresponding energy transport equation for the lattice (index $l$) can be written as
$$C_l\frac{\partial T_l}{\partial t} = \nabla\cdot(k_l\nabla T_l) + \frac{3nk_B}{\tau_c}(T_c - T_l). \tag{7}$$
These two coupled partial differential equations, well known as the Two-Temperature Model (TTM), have been established for continuum modeling of the electronic and lattice subsystems after ultrashort laser irradiation.
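Since Eqs. (6) and (7) are solved with a finite-difference scheme in this work, the following is a minimal sketch of one explicit Euler step on a 1D grid, not the IMD implementation. It assumes constant coefficients, writes `cc` for $3nk_B$ and `g` for $3nk_B/\tau_c$ (hypothetical names), drops the laser source and the $\partial n/\partial t$ and $\partial E_g/\partial t$ terms, and uses insulated ends, so the total energy must be conserved and both temperatures must relax to a common value.

```python
def ttm_step(tc, tl, dx, dt, cc, cl, kc, kl, g):
    """One explicit Euler step of a simplified 1D version of Eqs. (6)-(7).

    Assumptions: constant cc = 3 n k_B, cl, kc, kl and g = 3 n k_B / tau_c;
    no laser source, no dn/dt and dEg/dt terms; zero-flux boundaries.
    Stability requires dt < c dx^2 / (2 k) for both subsystems.
    """
    m = len(tc)

    def conduction(temp, k):
        # div(k grad T) in conservative form: fluxes through interior cell faces
        div = [0.0] * m
        for i in range(m - 1):
            flux = k * (temp[i + 1] - temp[i]) / dx
            div[i] += flux / dx
            div[i + 1] -= flux / dx
        return div

    qc, ql = conduction(tc, kc), conduction(tl, kl)
    new_tc = [tc[i] + dt * (qc[i] - g * (tc[i] - tl[i])) / cc for i in range(m)]
    new_tl = [tl[i] + dt * (ql[i] + g * (tc[i] - tl[i])) / cl for i in range(m)]
    return new_tc, new_tl

# Hypothetical, dimensionless parameters chosen inside the explicit stability limit
cc, cl, kc, kl, g, dx, dt = 1.0, 2.0, 0.5, 0.1, 0.3, 1.0, 0.1
tc = [1300.0] * 3 + [300.0] * 7   # hot carriers near the surface
tl = [300.0] * 10
energy0 = sum(cc * a + cl * b for a, b in zip(tc, tl)) * dx
for _ in range(20000):
    tc, tl = ttm_step(tc, tl, dx, dt, cc, cl, kc, kl, g)
energy = sum(cc * a + cl * b for a, b in zip(tc, tl)) * dx
print(energy0, energy)   # equal up to rounding: the scheme conserves total energy
print(tc[0], tl[-1])     # both close to the common equilibrium temperature of 400
```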
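For a more realistic description of the non-equilibrium lattice dynamics, we replace Eq. 7 by the Molecular Dynamics (MD) equation of motion
$$m_i\frac{d^2\mathbf{r}_i}{dt^2} = \mathbf{F}_i - \xi\, m_i\mathbf{v}_i^T, \tag{8}$$
with the coupling factor
$$\xi = \frac{\frac{1}{N_{FD}}\sum_{m=1}^{N_{FD}} V\,\frac{3nk_B}{\tau_c}\,(T_l - T_c^m)}{\sum_{k=1}^{N_v} m_k\,(\mathbf{v}_k^T)^2}. \tag{9}$$
Here $m_k$ and $\mathbf{v}_k^T$ are the mass and the thermal velocity of the $k$-th atom, $T_c^m$ is the carrier temperature at the $m$-th electronic iteration, $N_{FD}$ is the number of electronic iterations within a single MD step and $N_v$ is the number of atoms in a volume $V$.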
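Eq. (9) can be read as an energy balance: with the friction-like term of Eq. (8), the kinetic energy of the atoms in a cell changes at the rate $-\xi\sum_k m_k (v_k^T)^2$, which by construction equals $V\,(3nk_B/\tau_c)(\bar T_c - T_l)$, the energy the carriers lose in Eq. (6), averaged over the $N_{FD}$ electronic sub-steps. The sketch checks this. Treating thermal velocities as velocities relative to the centre-of-mass velocity of the cell is our assumption, and `g` is a hypothetical shorthand for $3nk_B/\tau_c$.

```python
def coupling_factor(masses, thermal_velocities, tc_history, tl, volume, g):
    """Velocity-scaling factor xi of Eq. (9) for one FD cell.

    masses: atomic masses m_k; thermal_velocities: (vx, vy, vz) of each atom,
    assumed relative to the cell's centre-of-mass velocity;
    tc_history: carrier temperatures T_c^m of the N_FD electronic sub-steps;
    g: shorthand for 3 n k_B / tau_c (hypothetical name).
    """
    source = sum(volume * g * (tl - tc) for tc in tc_history) / len(tc_history)
    two_ekin = sum(m * (vx * vx + vy * vy + vz * vz)
                   for m, (vx, vy, vz) in zip(masses, thermal_velocities))
    return source / two_ekin

masses = [1.0, 2.0, 1.5]
vels = [(0.1, 0.0, -0.2), (0.0, 0.3, 0.1), (-0.2, 0.1, 0.0)]
xi = coupling_factor(masses, vels, [1000.0, 1100.0, 1200.0], 300.0, 2.0, 0.5)
# Power delivered by the term -xi m_i v_i^T of Eq. (8): -xi * sum m v^2
power = -xi * sum(m * sum(c * c for c in v) for m, v in zip(masses, vels))
print(xi, power)  # power equals V * g * (mean T_c - T_l) = 800
```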
In this work the heat transport equation for the carriers has been solved by a
Finite-Difference (FD) method and the molecular dynamics simulations have been
carried out with IMD, the ITAP Molecular Dynamics simulation package [3, 4].
For modeling the reflectivity and the laser field absorption we use the Drude formula for the dielectric function [1, 5]:
$$\varepsilon = \varepsilon_r - \left(\frac{\omega_p}{\omega_L}\right)^2\frac{1}{1 + i\Gamma/\omega_L}, \tag{10}$$
Fig. 1 Absorption coefficient and reflectivity as functions of the carrier number density [m−3]
$$\alpha = \frac{2\omega_L\,\mathrm{Im}(\tilde n)}{c}, \tag{12}$$
where $\omega_L$ is the laser frequency, $\Gamma$ is the collision frequency, $c$ is the speed of light, $\omega_p$ is the plasma frequency, $\tilde n$ is the complex refractive index and $\varepsilon_r$ is the intrinsic dielectric constant. Figure 1 shows the dependence of the absorption coefficient and the reflectivity on the carrier number density for a laser wavelength of 775 nm.
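The sketch below evaluates Eqs. (10) and (12) as functions of the carrier density. It assumes two standard relations that are not spelled out in the text above: the plasma frequency $\omega_p^2 = n e^2/(\varepsilon_0 m^*)$ and the normal-incidence reflectivity $R = |(\tilde n - 1)/(\tilde n + 1)|^2$ with $\tilde n = \sqrt{\varepsilon}$. The values of $\varepsilon_r$, the effective mass $m^*$ and the collision frequency $\Gamma$ are hypothetical placeholders, not the material parameters of this work.

```python
import cmath
import math

E_CHARGE = 1.602176634e-19  # C
EPS0 = 8.8541878128e-12     # F/m
M_E = 9.1093837015e-31      # kg
C_LIGHT = 2.99792458e8      # m/s

def drude_optics(n, wavelength=775e-9, eps_r=13.6, m_eff=0.18, gamma_coll=1.0e15):
    """Return (reflectivity, absorption coefficient in 1/m) for carrier density n in 1/m^3."""
    omega_l = 2.0 * math.pi * C_LIGHT / wavelength
    omega_p2 = n * E_CHARGE ** 2 / (EPS0 * m_eff * M_E)      # assumed plasma frequency^2
    eps = eps_r - (omega_p2 / omega_l ** 2) / (1.0 + 1j * gamma_coll / omega_l)  # Eq. (10)
    n_tilde = cmath.sqrt(eps)                                 # complex refractive index
    refl = abs((n_tilde - 1.0) / (n_tilde + 1.0)) ** 2        # assumed normal incidence
    absorption = 2.0 * omega_l * n_tilde.imag / C_LIGHT       # Eq. (12)
    return refl, absorption

for density in (1e24, 1e26, 1e27, 1e28):
    print(density, drude_optics(density))
```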
3 Interatomic Potentials
During the last decades a significant number of interatomic potentials for silicon
have been developed and widely used in molecular dynamics simulations:
• Modified Embedded Atom Method (MEAM)
• Stillinger-Weber Potential (SW)
• Tersoff Potential (T3)
• Modified Tersoff Potential (MOD)
• Environment-Dependent Interatomic Potential (EDIP).
A comparison of modeled physical properties with experimental data is presented in Table 1. Except for the melting temperature, the listed properties agree reasonably well with experiment for all potentials. However, since the melting temperature is a major thermodynamic property for modeling laser ablation, and the MOD potential reproduces it best (1681 K versus 1683 K), the molecular dynamics simulations in this work were performed using the MOD potential.
Laser Ablation of Silicon 193
Table 1 Elastic constants, bulk modulus and melting temperatures of silicon using different interatomic potentials compared with experimental data [6]
Property            Exp    MEAM   SW     T3     MOD    EDIP
C11 (GPa)           166    167    162    143    166    175
C12 (GPa)           64     65     82     75     65     62
C44 relaxed (GPa)   80     80     60     69     77     71
B (GPa)             99     99     108    98     99     100
Tm (K)              1683   2990   1691   2547   1681   1520
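In the original Tersoff interaction the total potential energy $V$ is modeled as a sum of pair-like repulsive $V_R$ and attractive $V_A$ interactions with the environment-dependent coefficient $b_{ij}$:
$$V = \frac{1}{2}\sum_{i}\sum_{j\neq i} f_C(r_{ij})\left[V_R(r_{ij}) - b_{ij}\,V_A(r_{ij})\right], \tag{13}$$
where $f_C$ is a smooth cutoff function.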
were introduced in the MOD potential to improve the melting temperature value.
The parameters for this potential are listed in Table 2.
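The sketch below spells out the double sum of Eq. (13) for a small cluster. The exponential forms $V_R = A e^{-\lambda_1 r}$ and $V_A = B e^{-\lambda_2 r}$ are those of the Tersoff family; the cosine cutoff is written in the form we assume for the MOD potential (Kumagai et al.), and the bond order $b_{ij}$ is passed in as a function because its angular part is not reproduced here. All numerical parameters are illustrative, not the fitted MOD or MOD* values of Table 2.

```python
import math

def cutoff(r, r1, r2):
    """Smooth cutoff f_C: 1 up to r1, 0 from r2 on (assumed MOD/Kumagai form)."""
    if r <= r1:
        return 1.0
    if r >= r2:
        return 0.0
    x = math.pi * (r - r1) / (r2 - r1)
    return 0.5 + 9.0 / 16.0 * math.cos(x) - 1.0 / 16.0 * math.cos(3.0 * x)

def tersoff_energy(positions, bond_order, a, b, lam1, lam2, r1, r2):
    """Eq. (13): V = 1/2 sum_i sum_{j != i} f_C(r_ij) [V_R(r_ij) - b_ij V_A(r_ij)]."""
    energy = 0.0
    for i, p_i in enumerate(positions):
        for j, p_j in enumerate(positions):
            if i == j:
                continue
            r = math.dist(p_i, p_j)
            fc = cutoff(r, r1, r2)
            if fc == 0.0:
                continue
            v_rep = a * math.exp(-lam1 * r)   # repulsive pair term V_R
            v_att = b * math.exp(-lam2 * r)   # attractive pair term V_A
            energy += 0.5 * fc * (v_rep - bond_order(i, j) * v_att)
    return energy

# Illustrative parameters (eV, Angstrom), NOT the fitted MOD/MOD* values of Table 2
A, B, LAM1, LAM2, R1, R2 = 3000.0, 120.0, 3.2, 1.35, 2.7, 3.3
dimer = [(0.0, 0.0, 0.0), (2.35, 0.0, 0.0)]
print(tersoff_energy(dimer, lambda i, j: 1.0, A, B, LAM1, LAM2, R1, R2))
```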
Under strong laser irradiation, the anti-bonding states of covalent materials become occupied. As a consequence, the potential energy surface, and thus the interatomic interactions, change nearly instantaneously. The resulting interatomic forces can induce non-thermal processes in the lattice, such as melting or phase transformations, analogous to ordinary thermal processes. To take these effects into account, the MOD potential for silicon was parameterized as a function of the electronic temperature (denoted MOD*) using finite-temperature density functional theory (FTDFT) calculations.
First we prepared a set of silicon configurations: simple cubic (sc), body-
centered cubic (bcc), face-centered cubic (fcc) and cubic diamond crystal structures
194 A. Kiselev et al.
Fig. 2 Dependence of the cohesive energy of silicon on the carrier temperature and the lattice constant
$$P(T_c) = \sum_n a_n T_c^n.$$
Fig. 3 Dependence of elastic constants Cij and bulk modulus B on carrier temperature Tc
Here the coefficients $a_0$, which correspond to the MOD* parameters for silicon at zero temperature (Table 2), differ from the original MOD parameters, since we used DFT data instead of experimental data of elastic constants for the fitting (Fig. 3).
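Evaluating the parameters $P_k(T_c)$ and their derivatives $\partial P_k/\partial T_c$, which enter the force expression below, is a polynomial evaluation. A minimal sketch with Horner's scheme, assuming the coefficients $a_n$ are stored in ascending order; the coefficient values are hypothetical.

```python
def parameter_and_slope(coeffs, tc):
    """Return P(Tc) = sum_n a_n Tc**n and dP/dTc, with coeffs = [a_0, a_1, ...]."""
    p, dp = 0.0, 0.0
    for a in reversed(coeffs):
        dp = dp * tc + p   # derivative uses the value accumulated so far
        p = p * tc + a
    return p, dp

coeffs = [3.0, -0.2, 0.05]               # hypothetical a_0, a_1, a_2 for one MOD* parameter
print(parameter_and_slope(coeffs, 0.0))  # (3.0, -0.2): a_0 is the zero-temperature value
print(parameter_and_slope(coeffs, 2.0))  # P = 3 - 0.4 + 0.2 = 2.8, dP/dTc = -0.2 + 0.2 = 0.0
```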
For the electron-temperature dependent potentials used in molecular dynamics simulations, the force calculation has to be extended:
$$\mathbf{F} = -\nabla V - \sum_k \frac{\partial V}{\partial P_k}\frac{\partial P_k}{\partial T_c}\nabla T_c,$$
where $P_k$ are the temperature-dependent potential parameters, which were calculated at each atomic position $\mathbf{r}_i$ by using a trilinear interpolation method.
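The force expression above needs $T_c$ and $\nabla T_c$ at atomic positions, which lie between FD grid nodes. A minimal sketch of trilinear interpolation on a regular grid plus the extra force term follows; the grid layout (nodes at integer multiples of the spacing, origin at 0) and the function names are assumptions. A grid resolved only along x, as in this work, can be handled by repeating the values along y and z.

```python
import math

def trilinear(grid, spacing, pos):
    """Trilinear interpolation of grid[i][j][k] (value at node (i*dx, j*dy, k*dz)) at pos."""
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for d in range(3):
        s = pos[d] / spacing[d]
        i = max(0, min(int(math.floor(s)), dims[d] - 2))  # clamp to the last full cell
        base.append(i)
        frac.append(s - i)
    (i, j, k), (fx, fy, fz) = base, frac
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1.0 - fx) * (fy if dj else 1.0 - fy)
                          * (fz if dk else 1.0 - fz))
                value += weight * grid[i + di][j + dj][k + dk]
    return value

def extra_force(dv_dp, dp_dtc, grad_tc):
    """Force contribution -sum_k (dV/dP_k)(dP_k/dTc) grad Tc from T_c-dependent parameters."""
    s = sum(a * b for a, b in zip(dv_dp, dp_dtc))
    return tuple(-s * g for g in grad_tc)

# Hypothetical T_c field, linear in x, y, z, on a 4 x 4 x 4 grid with spacing 0.5
f = lambda x, y, z: 1.0 + 2.0 * x + 3.0 * y - z
grid = [[[f(0.5 * i, 0.5 * j, 0.5 * k) for k in range(4)] for j in range(4)] for i in range(4)]
print(trilinear(grid, (0.5, 0.5, 0.5), (0.7, 1.1, 0.3)))  # about 5.4, exact for a linear field
print(extra_force([1.0, 2.0], [3.0, 4.0], (1.0, 0.0, -2.0)))
```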
4 Results
The molecular dynamics simulations of laser ablation were performed for a box of constant size 1124 × 4.34 × 4.34 nm³ containing approximately $10^6$ silicon atoms, with a time step of 0.101806 fs. This represents a 1-µm thick silicon film. The simulation domain was divided into 750 FD cells along the x-axis, which corresponds to the [1 0 0] crystallographic direction, so that each FD cell contains nearly 1333 atoms. Periodic boundary conditions were applied in the y- and z-directions, whereas open boundaries were assumed along the x-direction. The electron-temperature dependent MOD* potential was used throughout this work.
The two-temperature model for the electronic system was solved on a regular finite-difference grid with a time step of 0.0051 fs, i.e., 200 electronic iterations within a single MD step. The material parameters for silicon were chosen according to [1] and are listed in Table 4.
The laser pulses were modeled with a Gaussian temporal profile with a full width at half maximum of 100 fs and a wavelength of 775 nm. This wavelength corresponds to a photon energy of 1.6 eV, which is higher than the band gap, so that two-photon absorption can be neglected. For the double-pulse simulations, a delay of 0.25 ps between the two pulses was chosen. Laser field absorption and reflectivity were described by the Drude model.
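The photon energy quoted above follows from $E = hc/\lambda$. A short check, comparing it with the room-temperature indirect band gap of silicon (about 1.12 eV, a literature value assumed here rather than taken from Table 4):

```python
H_PLANCK = 6.62607015e-34  # J s
C_LIGHT = 2.99792458e8     # m/s
EV = 1.602176634e-19       # J per eV

def photon_energy_ev(wavelength_m):
    """Photon energy E = h c / lambda in eV."""
    return H_PLANCK * C_LIGHT / (wavelength_m * EV)

e_photon = photon_energy_ev(775e-9)
print(round(e_photon, 3))  # 1.6
print(e_photon > 1.12)     # True: a single photon suffices to bridge the gap
```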
First we performed molecular dynamics simulations at constant room temperature (300 K) and zero pressure for a few thousand steps in order to reach an equilibrium atomic configuration, while the electronic temperature was also kept constant according to the Fermi-Dirac distribution.
Then we investigated the evolution of the carrier and lattice temperatures after laser irradiation with single and double pulses for fluences between 0.02 and 0.15 J/cm².
Figure 4 shows the carrier density at the front film surface for single pulses with laser fluences of 0.075, 0.1 and 0.12 J/cm², and for double pulses with laser fluences of 0.05, 0.075 and 0.1 J/cm². As expected from the carrier number rate equation (4), the maxima depend linearly on the laser fluence for both pulse sequences. The maxima after the second pulse are higher, because the carrier density has already been raised by the first pulse. Auger recombination, on the other hand, reduces the carrier density.
The temporal evolution of the carrier temperature for single and double pulses is plotted in Fig. 5. Here, too, the maxima depend nearly linearly on the fluence. The maxima of the carrier temperature are shifted with respect to those of the carrier density. The rapid increase of the carrier temperature while the carrier density decreases is a consequence of the term $-(E_g + 3k_B T_c)\,\partial n/\partial t$ on the right-hand side of the energy balance equation (6), which is proportional to the negative time derivative of the density $n$. Here the potential energy of carriers is
Fig. 4 Carrier density at the front film surface for single and double pulses at several laser fluences
Fig. 5 Carrier temperature at the front film surface for single and double pulses at several laser fluences
Fig. 6 Lattice temperature at the front film surface for single and double pulses at several laser fluences
Fig. 7 Spatial and temporal evolution of the atomic density for single pulses with a laser fluence of 0.12 J/cm², above the damage threshold
of atomic densities from single-pulse and double-pulse simulations, respectively, is plotted. The calculated results for single pulses are comparable to the experimental values for the ablation threshold in silicon reported by Pronko et al. [12]: E = 0.17 J/cm² ($\lambda = 800$ nm, $t_p = 100$ fs) and E = 0.108 J/cm² ($\lambda = 786$ nm, $t_p = 90$ fs).
Fig. 8 Spatial and temporal evolution of the atomic density for double pulses with a laser fluence of 0.10 J/cm², above the damage threshold
5 Benchmark Numbers
As a benchmark for the performance of IMD on the HLRS system Hazel Hen, we report the results of shear simulations, since the overall performance of IMD and its performance in ablation simulations have been given in previous reports. The system studied was a block of Ag-Cu alloy of size 52 × 52 × 70 nm³ containing 16 million atoms. Several simulations with different shear rates were carried out, together with a minimized shearing (row “minim” in Table 5). The most interesting column in Table 5 is the last one, labeled “quotient”. It contains the number of MD steps per node hour, i.e. the number of steps normalized by the number of nodes and the wall-clock time. This number varies strongly and not very systematically. The reason is that the performance depends strongly on the specific type of defects generated at a given shear rate. This also shows that no single case can serve as a representative benchmark for optimization.
In the row labeled “minim” in Table 5, the shearing is carried out in such a way that the energy of the sample is minimized at each time step. This explains the much lower performance in this case.
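For clarity, the “quotient” normalizes the number of MD steps by the consumed node hours, so that runs on different node counts can be compared. A trivial sketch with hypothetical numbers (Table 5 itself is not reproduced here):

```python
def md_steps_per_node_hour(md_steps, nodes, wall_seconds):
    """MD steps per node hour = steps / (nodes * wall-clock hours)."""
    return md_steps / (nodes * wall_seconds / 3600.0)

# Hypothetical run: 10,000 MD steps on 100 nodes in 30 minutes
print(md_steps_per_node_hour(10_000, 100, 1800.0))  # 200.0
```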
6 Conclusions
7. Kresse, G., Hafner, J.: Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558–561 (1993)
8. Kresse, G., Joubert, D.: From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999)
9. Brommer, P., Gähler, F.: Potfit: effective potentials from ab-initio data. Model. Simul. Mater. Sci. Eng. 15, 295–304 (2007)
10. Brommer, P., Gähler, F.: Effective potentials for quasicrystals from ab-initio data. Philos. Mag. 86, 753–758 (2006)
11. Shokeen, L., Schelling, P.K.: Thermodynamics and kinetics of silicon under conditions of strong electronic excitation. J. Appl. Phys. 109, 073503 (2011)
12. Pronko, P., VanRompay, P., Horvath, C., Loesel, F., Juhasz, T., Liu, X., Mourou, G.: Avalanche ionization and dielectric breakdown in silicon with ultrafast laser pulses. Phys. Rev. B 58, 2387 (1998)
Non-linear Quantum Transport in Interacting Nanostructures
1 Introduction
B. Schoenauer
Center for Extreme Matter and Emergent Phenomena, Institute for Theoretical Physics, Utrecht
University, Princetonplein 5, 3584 CE Utrecht, The Netherlands
e-mail: b.m.schonauer@uu.nl
P. Schmitteckert
Lehrstuhl für Theoretische Physik I, Physikalisches Institut, Am Hubland, Universität Würzburg,
97074 Würzburg, Germany
e-mail: peter.schmitteckert@physik.uni-wuerzburg.de
initial state with a charge imbalance and is reviewed in [7]. In cooperation with Edouard Boulat and Hubert Saleur we have been able to show that our approach is in excellent agreement with analytical calculations in the framework of the Bethe ansatz [4]. This agreement is remarkable, as the numerics is carried out on a lattice model, while the analytical result is based on field theoretical methods in the continuum limit. Most strikingly, we proved the existence of a negative differential conductance (NDC) regime even in this simple model of a single resonant level with interaction on the contact links. In an extension of this approach we presented results for current-current correlations, including shot noise, based on our real-time simulations in an earlier HPC report [8], see also [5, 6]. We then managed to include a counting field into our time-dependent simulations, which allowed us to obtain the full counting statistics (FCS) via the direct simulation of the cumulant generating function [10, 21]. Finally, we have been able to obtain the subleading corrections to the FCS of charge transport in the inverse measuring time [11, 12]. Despite this success in obtaining steady-state transport properties from quenches of the charge imbalance, it is an open question whether a system will always relax to the true steady state. In classical mechanics it is well known that driven systems can end up in an oscillatory or even chaotic state. To address this issue, we extend our impurity systems to interacting structures with an internal degree of
The initial state of our spinless fermion systems is given by the lowest eigenstate of the Hamilton matrix. The time evolution, as a solution of the linear Schrödinger equation, is then given by the action of the matrix exponential of the Hamiltonian governing the time evolution on the initial state. The numerical problem arises from the fact that the dimension of the corresponding Hilbert space is $2^M$, with $M$ the number of lattice sites, typically ranging from a few tens to a few hundreds. The DMRG is an iterative procedure that searches for a subspace of the complete Hilbert space that is sufficient to represent the system. Exploiting particle number conservation allows us to implement a block structure for the Hamilton matrix, and by employing a dyadic representation of the vector Hilbert space, the matrix-vector product needed for the sparse matrix diagonalization and for the matrix exponentials can be represented by a set of matrix multiplications; for details see [22]. Our parallelization strategy consists of a master-worker queue in which the work chunks are BLAS-3 and LAPACK function calls. Each of these matrix operations is then evaluated single-threaded [22]. In addition, we enhance the concurrency of the code by an asynchronous evaluation of the scalar products. Specifically, in the evaluation of the matrix exponential we can schedule several steps concurrently, i.e. we can fill the worker queue from the main thread while the worker threads are still occupied with a previous scalar product. We can parallelize the sweeping procedure and distribute it over several nodes. However, to obtain a single I/V characteristic we have to perform a complete DMRG simulation for each voltage value of interest. Since we have to run many DMRG jobs, we use an embarrassingly parallel strategy here, as a distributed calculation would lead to decreased performance. By default we have allocated an entire node consisting of 20 processors to each DMRG calculation. These processors were used
Non-linear Quantum Transport in Interacting Nanostructures 205
by four master threads and sixteen worker threads as described in [22]. Each of the processors was assigned 3200 MB of memory. Using these default resources, our DMRG calculations had an average runtime of two weeks. We typically used a few hundred GB of scratch data on the local disk $TMP, which was written to the $WORK directory for a possible restart of the jobs.
As pointed out in the introduction, the intention of this work is to establish and investigate a simple model system that comprises a small number of additional degrees of freedom compared to the commonly studied single-impurity systems. The model is also suitable for approximating benzene-like molecules and structures. From the model system we expect new insights into the non-equilibrium behavior of interacting structures. From its similarity to benzene we hope to better understand the charge transport behavior of graphene, the periodic extension of the benzene structure. The obvious system that fulfills our prerequisites is a preferably small interacting ring structure. A similar model system has already been used by Bohr and Schmitteckert [1] to study interaction and interference effects in benzene. We extend this model to also incorporate asymmetries, which occur, for example, in doped benzene or graphene sheets and which, according to recent density functional theory (DFT) studies by Walz et al. [25], give rise to large circular currents.
where the three individual Hamiltonians describe the three different regions of the
First, we present the ring structure. It is a lattice-discretized ring of square shape consisting of four localized orbitals. We will subsequently refer to these orbitals as sites to emphasize their localized nature. Between neighboring sites there is a tunneling matrix element $J'$, also called the hopping element. To decrease the complexity of the models we only allow for spinless fermions, which is equivalent to a strong external magnetic polarization that enforces the same spin polarization
206 B. Schoenauer and P. Schmitteckert
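$$H_{\rm ring} = U\sum_{\langle x,y\rangle}\left(n_x - \tfrac{1}{2}\right)\left(n_y - \tfrac{1}{2}\right) - J'\sum_{\langle x,y\rangle}\left(d_x^\dagger d_y + \mathrm{h.c.}\right) + T\,n_2,$$
where $\langle x,y\rangle$ denotes neighboring sites in the ring, $d_x$ is the annihilation operator for a fermion on site $x$ and $n_x = d_x^\dagger d_x$ is the occupation number operator at site $x$. The gate potential $T$ is applied only to site $x = 2$. The structure of the interaction term preserves particle-hole symmetry.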
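Because the isolated ring has only four sites, $H_{\rm ring}$ can be diagonalized exactly in its $2^4$-dimensional Fock space, which is a useful cross-check when setting up the DMRG calculations. The sketch builds the matrix with Jordan-Wigner signs for the fermionic hopping. The site numbering (1 = left, 2 = top with the gate, 3 = right, 4 = bottom, bonds around the square) is our assumption based on Fig. 1. For $U = T = 0$ the one- and two-particle sectors must reproduce the free-fermion levels $-2J'\cos k$ of a four-site ring.

```python
import numpy as np

def ring_hamiltonian(u, j_ring, gate, n_sites=4, gate_site=2):
    """H_ring in the Fock basis; bit (x - 1) of a basis index is the occupation of site x."""
    dim = 2 ** n_sites
    h = np.zeros((dim, dim))
    bonds = [(x, (x + 1) % n_sites) for x in range(n_sites)]  # 0-based nearest neighbours
    occ = lambda s, x: (s >> x) & 1
    for s in range(dim):
        # diagonal part: U (n_x - 1/2)(n_y - 1/2) on every bond plus the gate term T n_2
        h[s, s] = gate * occ(s, gate_site - 1) + sum(
            u * (occ(s, x) - 0.5) * (occ(s, y) - 0.5) for x, y in bonds)
        # hopping -J' (d_a^dagger d_b + h.c.) with Jordan-Wigner sign
        for x, y in bonds:
            for a, b in ((x, y), (y, x)):
                if occ(s, b) and not occ(s, a):
                    lo, hi = sorted((a, b))
                    between = ((1 << hi) - 1) ^ ((1 << (lo + 1)) - 1)  # sites strictly between
                    sign = -1.0 if bin(s & between).count("1") % 2 else 1.0
                    h[s ^ (1 << a) ^ (1 << b), s] += -j_ring * sign
    return h

def sector(h, n_particles, n_sites=4):
    """Block of h with a fixed particle number."""
    states = [s for s in range(2 ** n_sites) if bin(s).count("1") == n_particles]
    return h[np.ix_(states, states)]

h = ring_hamiltonian(u=1.0, j_ring=1.0, gate=0.5)  # U = 1.0 J, T = 0.5 J, J' = 1 J
print(np.linalg.eigvalsh(sector(h, 2)))            # half filling, N = 2
```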
The metallic leads are modeled as half-infinite chains of sites, often referred to as tight-binding chains because they describe materials in which the electrons are tightly bound to the ions of the crystal lattice and can only tunnel from orbital to orbital. The leads are also assumed to be sufficiently large that the electron-electron interaction is completely screened, so that the interaction term can be omitted. Furthermore, since the ring structure is small, one assumes that only a single transport channel of the leads couples to it, which makes it possible to model the leads as two one-dimensional chains. This fits well with the preference of the density-matrix renormalization group for one-dimensional systems. The electrons in the leads are also spinless, and the Hamiltonian is given by
$$H_{\rm lead} = -J\sum_{x}\left(c_x^\dagger c_{x-1} + \mathrm{h.c.}\right), \tag{2}$$
where $J$ denotes the tunneling matrix element in the leads and $c_x$ the annihilation operator at a site in the leads. Here, the index $x$ runs from $\pm 2$ to $\pm\infty$.
The non-interacting tight-binding chain can be solved analytically by a plane-wave ansatz with wave vector $k$. This ansatz yields the dispersion relation of the system, which reads
$$\epsilon(k) = -2J\cos k, \tag{3}$$
with the lattice constant set to one.
Fig. 1 Schematic representation of the model system. The four ring sites making up the structure are shown as red and green circles. The red bonds indicate sites with nearest-neighbor interaction. An on-site gate potential $T$ is applied to the green site in the upper half of the ring. Blue sites belong to the non-interacting tight-binding leads. The coupling $J_L$ between leads and ring structure is represented by a dashed line
The energy band of the system consequently has the shape of a cosine, and the bandwidth is $D = 4J$.
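The cosine band can be checked numerically: a finite open chain with hopping $J$ has the eigenvalues $-2J\cos(\pi q/(L+1))$, $q = 1,\dots,L$, which fill the band $[-2J, 2J]$ of width $D = 4J$ as $L$ grows. A short sketch, assuming the hopping enters the Hamiltonian with a negative sign:

```python
import numpy as np

def chain_spectrum(n_sites, hopping=1.0):
    """Eigenvalues of an open tight-binding chain with nearest-neighbour hopping -J."""
    h = np.zeros((n_sites, n_sites))
    for x in range(n_sites - 1):
        h[x, x + 1] = h[x + 1, x] = -hopping
    return np.linalg.eigvalsh(h)

n_sites = 200
levels = chain_spectrum(n_sites)
q = np.arange(1, n_sites + 1)
analytic = np.sort(-2.0 * np.cos(np.pi * q / (n_sites + 1)))
print(np.max(np.abs(levels - analytic)))  # agreement to machine precision
print(levels[-1] - levels[0])             # just below the bandwidth D = 4 J
```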
The ring structure is symmetrically coupled to the leads, as shown in Fig. 1. The tunneling element $J_L$ is chosen to be smaller than the tunneling in the ring and in the leads in order to accentuate the features induced by the interacting nanostructure. In the limit of very small coupling, a perturbation theory in the coupling is possible; this is the subject of future work. The coupling Hamiltonian is given by
$$H_{\rm coupling} = -J_L\left(d_1^\dagger c_{-1} + d_{N_R}^\dagger c_1 + \mathrm{h.c.}\right), \tag{4}$$
where the operator $d_1^\dagger$ creates a particle on the left site of the ring structure and $d_{N_R}^\dagger$ creates a particle on the right site of the ring.
Fig. 2 Different quench setups. (a) Schematic representation of the initial conditions corresponding to a Hamiltonian where the bias voltage $\pm V_{SD}/2$ is applied to the left and right lead, respectively, at $t = 0$ and switched off afterwards. The voltage is applied exclusively to the leads and not to the ring structure. The bandwidth $4J$ of the leads originates from the cosine band of the tight-binding chain. (b) Representation of the quench setup where the bias voltage $\pm V_{SD}/2$ is applied to the leads in the Hamiltonian that is used for the time evolution. Half of the states in both leads are occupied and the bands are shifted against each other for all times $t > 0$. If $V_{SD} > 2J/e$, electrons at the Fermi level of the left lead and holes close to the Fermi level in the right lead have no state in the opposite lead to tunnel into. As a result the current decreases, which is unphysical
Fig. 3 Initial setup of the system. (a) Electron density of the ground state $|\psi_0\rangle$ when a bias $V_{SD} = 0.4\,J/e$ is applied at $t = 0$. The electron density is below $n = 0.5$ in the left lead and above $n = 0.5$ in the right lead. In both leads we observe Friedel oscillations of the electron density. The particle number on the sites in the ring structure adds up to exactly $N = 2$, because the nearest-neighbor interaction forces half filling in the ring structure. The second effect of the interaction is a clear particle density wave in the ring structure, where the left and the right site of the ring are almost completely filled and the top and bottom sites are nearly empty. (b) I-V characteristics resulting from the two different quench setups shown in Fig. 2. If the bias voltage is applied only at $t = 0$, the current agrees with the analytical results for all values of $V_{SD}$. If the bias voltage is added to the time evolution Hamiltonian, the current decreases for $V_{SD} > 2J/e$, because the two bands are shifted against each other so far that certain particles or holes close to the Fermi level cannot tunnel into the opposite lead
The ground state $|\psi_0\rangle$ of this Hamiltonian $H_{\rm quench}$, which is calculated using the standard finite-lattice DMRG, is characterized by a charge imbalance, which is depicted in Fig. 3a. The energy difference between the highest occupied energy levels in the right lead (source) and the left lead (drain) should correspond to the bias or source-drain voltage $V_{SD}$.
For all times $t > 0$, the external potential is switched off, and one calculates the time-resolved expectation values of a chosen set of observables for the state $|\psi_0\rangle$, while the dynamics is governed by $H_0$.
It is equally possible to determine the initial state $|\psi_0\rangle$ for the unperturbed system $H_0$ and to apply the external potential for all times $t > 0$, so that the time-dependent behavior of $|\psi_0\rangle$ is determined by $H_{\rm quench}$. This is equivalent to an energy shift of the entire band of each lead by $\pm V_{SD}/2$. As shown in Fig. 3b, this leads to unphysical results for the current for bias voltages $V_{SD} > 2J/e$. For $V_{SD} \geq 4J/e$, i.e. when $eV_{SD}$ reaches the bandwidth $D = 4J$, the current vanishes completely, because energy conservation prohibits the tunneling of particles or holes from one energy band to the other.
We refer to these two options as instantaneous quenches, since the bias voltage is switched abruptly. Because of the obvious drawbacks of the latter option, the former has been our standard quench procedure and is applied unless stated otherwise. In both cases the leads also act as particle reservoirs.
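For $U = 0$ the first quench protocol can be reproduced without DMRG, since a Slater determinant remains one under the quadratic $H_0$: the single-particle correlation matrix evolves as $G(t) = e^{-iH_0 t} G(0) e^{iH_0 t}$. The sketch below prepares the half-filled ground state of $H_0$ plus $\pm V_{SD}/2$ on the left and right lead (which lead gets the plus sign is our choice), then evolves with $H_0$ alone and records the particle current from the left lead into the ring. It is a non-interacting illustration under these assumptions, not the td-DMRG of this work; the lead length and function names are hypothetical, and $\hbar = e = 1$.

```python
import numpy as np

def build_h0(n_lead, j_lead=1.0, j_ring=1.0, j_couple=0.5, gate=0.5):
    """Single-particle matrix of H0 for U = 0: left lead, 4-site ring, right lead."""
    n = 2 * n_lead + 4
    h = np.zeros((n, n))

    def hop(a, b, t):
        h[a, b] = h[b, a] = -t

    left = list(range(n_lead))             # left[-1] is coupled to the ring
    ring = [n_lead + r for r in range(4)]  # order: left, top, right, bottom
    right = list(range(n_lead + 4, n))     # right[0] is coupled to the ring
    for lead in (left, right):
        for a, b in zip(lead[:-1], lead[1:]):
            hop(a, b, j_lead)
    for r in range(4):
        hop(ring[r], ring[(r + 1) % 4], j_ring)
    hop(left[-1], ring[0], j_couple)
    hop(ring[2], right[0], j_couple)
    h[ring[1], ring[1]] = gate             # gate potential T on the top site
    return h, left, ring, right

def quench_current(n_lead, v_sd, times):
    """Current from the left lead into the ring after the bias is switched off at t = 0."""
    h0, left, ring, right = build_h0(n_lead)
    h_init = h0.copy()
    h_init[left, left] += 0.5 * v_sd       # left lead raised by +V_SD/2 (our choice)
    h_init[right, right] -= 0.5 * v_sd     # right lead lowered by V_SD/2
    n_part = h0.shape[0] // 2              # half filling
    _, orbitals = np.linalg.eigh(h_init)
    occ = orbitals[:, :n_part]
    g0 = occ @ occ.conj().T                # g0[i, j] = <c_j^dagger c_i>
    evals, evecs = np.linalg.eigh(h0)
    a, b = left[-1], ring[0]
    currents = []
    for t in times:
        u = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
        g = u @ g0 @ u.conj().T            # G(t) = U G(0) U^dagger
        # with hopping amplitude J_ab = -H0[a, b]: I_{a->b} = -2 J_ab Im g[a, b]
        currents.append(2.0 * h0[a, b] * g[a, b].imag)
    return np.array(currents)

times = np.arange(0.0, 20.0, 0.4)          # time step 0.4 / J as in the text
currents = quench_current(30, 0.4, times)  # V_SD = 0.4 J/e
print(currents[:10])
```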
5 Numerical Dynamics
For a numerical calculation of the time evolution, time is discretized into steps $\Delta t = t_j - t_{j-1} = 0.4\,J^{-1}$. The time evolution is performed within the td-DMRG calculation in the fashion explained in detail in [7]. We apply Krylov subspace methods to evaluate the time evolution via a matrix exponential, as this rigorously preserves unitarity. Depending on the choice of the quench procedure, the time evolution is either
$$|\psi(t_j)\rangle = e^{-iH_0(t_j - t_{j-1})}\,|\psi(t_{j-1})\rangle \tag{6}$$
or
$$|\psi(t_j)\rangle = e^{-i(H_0 + H_{SD})(t_j - t_{j-1})}\,|\psi(t_{j-1})\rangle. \tag{7}$$
The maximum number of time steps is limited by the system size because after a
finite time T the fastest wave packets have reached the hard wall boundary of the
system and are reflected.
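The Krylov approach mentioned above can be illustrated on a small dense matrix: $m$ Lanczos vectors $V_m$ span a subspace in which $H$ is represented by a tridiagonal matrix $T_m$, and $e^{-iH\Delta t}|\psi\rangle \approx \|\psi\|\,V_m e^{-iT_m\Delta t} e_1$. Because $V_m$ is orthonormal and $e^{-iT_m\Delta t}$ is unitary, the norm is preserved for any $m$. This is a generic sketch (dense NumPy arrays, full re-orthogonalization, $\hbar = 1$), not the block-structured td-DMRG implementation of [7, 22].

```python
import numpy as np

def krylov_time_step(h, psi, dt, m=20):
    """Approximate exp(-1j*h*dt) @ psi in an m-dimensional Lanczos (Krylov) subspace."""
    norm0 = np.linalg.norm(psi)
    m = min(m, psi.shape[0])
    basis = np.zeros((psi.shape[0], m), dtype=complex)
    alpha = np.zeros(m)
    beta = np.zeros(m)
    basis[:, 0] = psi / norm0
    k = m
    for j in range(m):
        w = h @ basis[:, j]
        alpha[j] = np.vdot(basis[:, j], w).real
        for _ in range(2):  # full re-orthogonalization, done twice for stability
            w = w - basis[:, :j + 1] @ (basis[:, :j + 1].conj().T @ w)
        if j + 1 == m:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:  # invariant subspace reached: the projection is exact
            k = j + 1
            break
        basis[:, j + 1] = w / beta[j]
    t_mat = np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    evals, evecs = np.linalg.eigh(t_mat)
    # exp(-1j*T*dt) applied to the first unit vector via the small tridiagonal eigenproblem
    coeffs = evecs @ (np.exp(-1j * evals * dt) * evecs[0, :])
    return norm0 * (basis[:, :k] @ coeffs)

rng = np.random.default_rng(1)
a_mat = rng.normal(size=(50, 50)) + 1j * rng.normal(size=(50, 50))
h = (a_mat + a_mat.conj().T) / 2           # random Hermitian test matrix
psi = rng.normal(size=50) + 1j * rng.normal(size=50)
phi = krylov_time_step(h, psi, 0.4, m=20)
print(np.linalg.norm(phi), np.linalg.norm(psi))  # equal: the Krylov step preserves the norm
```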
To properly explore our model numerically, we vary the parameters of our Hamiltonian. To simultaneously ensure a certain degree of comparability between different calculations, we have defined a set of standard parameters, which is used for most of the calculations. In one set of calculations only a single parameter is varied, while the other parameters assume their standard values. The standard parameters help to reduce the overall computational costs.
The standard system size is set to $M = 72$ sites. The number $M = 72$ is a compromise between a system size large enough to study the dynamics for a reasonable time $t < T$ and the calculation time needed for the system. With the chosen standard system size, a basic DMRG calculation has a reasonable runtime of at most two days on the ForHLR I computer cluster.
As a result of the chosen system size, we adopt a bias voltage of $V_{SD} = 0.4\,J/e$. The frequency of the oscillation artifacts that result from this bias voltage is large enough to obtain reasonable results from our fitting procedure, while the bias voltage is still small enough to avoid the additional errors that occur for bias voltages close to the bandwidth. From early calculations we deduce that $V_{SD} = 0.4\,J/e$ is the most suitable choice for the standard parameter, since it is equal to the actual effective bias voltage for our chosen standard system size and thus reflects well the behavior of an infinite system to which the same bias voltage is applied.
The standard value of the on-site gate potential, $T = 0.5\,J$, derives from our first calculations, in which we found a suitable oscillation frequency of the ring currents in response to the gate potential.
We define three different regimes for the interaction strength. The regime of weak interaction is by default calculated with $U = 0.1\,J$, the strong interaction regime uses a default value of $U = 1.0\,J$, and very strong interaction is commonly calculated using $U = 2.0\,J$. Calculations have been done for a wide range of interaction strengths to confirm that the chosen values properly represent the particular regimes.
For the coupling between leads and ring structure we have chosen a standard value of $J_L = 0.5\,J$ to model that ring and leads consist of different materials with an imperfect coupling.
Early calculations used a time discretization of $\Delta t = 0.25\,J^{-1}$. Later calculations used a coarser discretization of $\Delta t = 0.4\,J^{-1}$. Calculations for both discretization widths have been compared to ensure that they yield the same results.
The DMRG calculations kept a minimum of $N = 700$ and a maximum of $N = 2800$ states per block. Calculations that consistently needed the maximum of 2800 states per block were rerun with a minimum of $N = 900$ and a maximum of $N = 3600$.
By default we have calculated the current through four distinct bonds of the system. The particular bonds and the direction of positive current are indicated by the purple arrows in Fig. 4. We will subsequently denote the current through the bonds located in the leads as the transmitted current and the currents through the indicated bonds in the ring as upper link current and lower link current.
Fig. 4 The model system and the positions of the measured currents. A gate potential is marked by the green circle and is given the standard value $T = 0.5\,J$. For the nearest-neighbor interaction between the sites of the ring (marked by the red links) we distinguish three regimes. The standard interaction strength for weak interaction is $U = 0.1\,J$, for strong interaction $U = 1.0\,J$ and for very strong interaction $U = 2.0\,J$. The hopping elements in the ring are $J' = 1\,J$, while the couplings between ring and leads are $J_L = 0.5\,J$. The transmitted current is the mean value of the currents measured in the left and right leads, marked by the purple arrows. The upper link current is measured at the position of the upper purple arrow in the ring, the lower link current at the position of the lower arrow. The purple arrows also indicate the direction of positive current
We now turn our focus to the main objective of this work, the investigation of the time-dependent behavior of the currents in the model system. We obtain this behavior from our td-DMRG calculations and examine it for local transient and local steady-state regimes. In this way we try to answer our initial question of whether local observables, e.g. the currents through the studied links, always relax to a steady state. Persistent non-equilibrium effects in the currents would indicate that the assumption of relaxation is in fact false. The properties of the currents in the ring structure, which will subsequently be called ring currents, for finite interaction are moreover interesting in connection with the work by Walz et al. [25], who find ring currents orders of magnitude larger than the transmitted current in their DFT studies of hydrogen-doped graphene.
For a thorough analysis of the dynamics of the system, we perform td-DMRG calculations for a wide range of interaction strengths $0\,J < U \le 6.0\,J$ and bias voltages $0.1\,J/e \le V_{SD} \le 4\,J/e$. Several calculations have been done for varying system sizes to detect whether particular features of the time-dependent currents are caused by the finite system size. The data are analyzed manually and checked for transient and steady-state regimes. If oscillations are found in the currents, they are first fitted with a cosine of fixed frequency $\omega = V_{SD}$, since an oscillation with this frequency would indicate a finite size effect. If the fit does not match the oscillation, a second cosine fit with variable frequency is performed to obtain the amplitude and the frequency of the oscillation. Figures 5, 6, 7, 8, and 9 display the time-dependent currents (as labelled in Fig. 4) for different interaction strengths $U$ and the standard parameters $V_{SD} = 0.4\,J/e$ and $T = 0.5\,J$.
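The two-stage fitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the current is assumed to be sampled on a uniform time grid, the cosine is written as $c + a\cos\omega t + b\sin\omega t$ so that a fit at fixed $\omega$ becomes a linear least-squares problem, and the variable-frequency fit is replaced by a simple scan over a hypothetical frequency grid. The function names and the synthetic test signal are illustrative only.

```python
import numpy as np


def fit_cosine_fixed(t, current, omega):
    """Least-squares fit of current(t) ~ c + A*cos(omega*t + phi) at fixed omega.

    Returns (offset c, amplitude A, phase phi, rms residual).
    """
    design = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    coef, *_ = np.linalg.lstsq(design, current, rcond=None)
    offset, a, b = coef
    residual = current - design @ coef
    # a*cos + b*sin = A*cos(omega*t + phi) with A = hypot(a, b), phi = atan2(-b, a)
    return offset, np.hypot(a, b), np.arctan2(-b, a), float(np.sqrt(np.mean(residual**2)))


def fit_cosine_free(t, current, omegas):
    """Variable-frequency fit: scan trial frequencies and keep the smallest residual."""
    fits = [(omega, fit_cosine_fixed(t, current, omega)) for omega in omegas]
    return min(fits, key=lambda item: item[1][3])


# Hypothetical test signal: steady current plus an oscillation at omega = 0.30
t = np.linspace(5.0, 30.0, 501)
current = 0.1 + 0.02 * np.cos(0.30 * t + 0.4)
v_sd = 0.4

rms_fixed = fit_cosine_fixed(t, current, v_sd)[3]  # stage 1: omega fixed to V_SD
omega_fit, (_, amp_fit, _, rms_free) = fit_cosine_free(t, current, np.linspace(0.05, 1.0, 951))
print(f"fixed-frequency rms = {rms_fixed:.2e}, free fit: omega = {omega_fit:.3f}, A = {amp_fit:.3f}")
```

A large residual at $\omega = V_{SD}$ combined with a small residual at a different frequency is what separates a candidate novel oscillation from the finite size artifact. A production analysis would more likely refine the frequency with a nonlinear fit than with a grid scan.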
212 B. Schoenauer and P. Schmitteckert
Fig. 5 Time-dependent currents of the quadratic ring system for $U = 0.1\,J$. The plot displays the measured currents as a function of time $t$ for a system with $M = 72$ sites and parameters $T = 0.5\,J$ and $V_{SD} = 0.4\,J/e$. The currents are labelled according to Fig. 4. The oscillation frequency of the transmitted current for $t > 10\,J^{-1}$ is $\omega = 0.398 \approx 0.4 = V_{SD}$. The oscillation of the currents is therefore regarded as a finite size effect.
Fig. 6 Time-dependent currents of the quadratic ring system for $U = 0.5\,J$. The plot displays the measured currents as a function of time $t$ for a system with $M = 72$ sites and parameters $T = 0.5\,J$ and $V_{SD} = 0.4\,J/e$. The currents are labelled according to Fig. 4. The oscillation frequency of the transmitted current for $t > 10\,J^{-1}$ is $\omega = 0.388 \approx 0.4 = V_{SD}$. The oscillation of the currents is therefore regarded as a finite size effect.
From Fig. 5 we conclude that weak interaction does not visibly change the time dependence of the currents. For times $t > 10\,J^{-1}$ no further transient effects can be observed. The remaining oscillations have frequency $\omega = V_{SD}$ and are solely a result of the finite system size. The oscillations inside the ring show a phase difference of $\varphi = \pi/2$ compared to the oscillations of the transmitted current.
Fig. 7 Time-dependent currents of the quadratic ring system for $U = 1.0\,J$. The image shows the measured currents as a function of time $t$ for a system with $M = 72$ sites and parameters $T = 0.5\,J$ and $V_{SD} = 0.4\,J/e$. The currents are labelled according to Fig. 4. The oscillation frequencies of the upper ring current, $\omega_u = 0.302$, and of the lower ring current, $\omega_l = 0.301$, at times $t \ge 5\,J^{-1}$ are noticeably smaller than the bias voltage $V_{SD} = 0.4\,J/e$. The oscillation of the ring currents is therefore assumed to be a novel effect that is not caused by the finite system size. The transmitted current still exhibits an oscillation with frequency $\omega \approx V_{SD}$, which corresponds to the familiar finite size effect.
Fig. 8 Time-dependent currents of the quadratic ring system for $U = 2.0\,J$. The plot shows the measured currents as a function of time $t$ for a system with $M = 72$ sites and parameters $T = 0.5\,J$ and $V_{SD} = 0.4\,J/e$. The currents are labelled according to Fig. 4. The oscillation frequencies of the upper ring current, $\omega_u = 0.361$, and of the lower ring current, $\omega_l = 0.360$, at times $t \ge 5\,J^{-1}$ differ from the bias voltage $V_{SD} = 0.4\,J/e$. The oscillation of the ring currents is assumed to be a novel effect that is not caused by the finite system size. The oscillation of the transmitted current with frequency $\omega \approx V_{SD}$ has become barely noticeable.
Calculations for different bias voltages yield qualitatively identical results, from which we conclude that the studied currents reliably relax to the steady state in the regime of weak interaction.
Fig. 9 Time-dependent currents of the quadratic ring system for $U = 4.0\,J$. The figure displays the measured currents as a function of time $t$ for a system with $M = 72$ sites and parameters $T = 0.5\,J$ and $V_{SD} = 0.4\,J/e$. The currents are labelled according to Fig. 4. The oscillation frequencies of the upper ring current, $\omega_u = 0.426$, and of the lower ring current, $\omega_l = 0.425$, at times $t \ge 5\,J^{-1}$ have now become larger than the bias voltage $V_{SD} = 0.4\,J/e$. The oscillation of the ring currents is still assumed to be a novel effect that is not caused by the finite system size. An oscillation of the transmitted current with frequency $\omega \approx V_{SD}$ is no longer visible; the finite size oscillation artifacts seem to be increasingly suppressed with growing interaction strength $U$.
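The classification rule applied in the captions of Figs. 5 to 9 can be made explicit using the fitted frequencies they quote. This is a minimal sketch: the text only states $\omega \approx V_{SD}$, so the 5 % relative tolerance below is our assumption, chosen because it separates the quoted values in the same way as the captions do. The function name is hypothetical.

```python
# Fitted oscillation frequencies quoted in the captions of Figs. 5-9
# (M = 72, T = 0.5 J, V_SD = 0.4 J/e); omega in units of J.
FITTED_OMEGA = {
    "Fig. 5, U = 0.1 J, transmitted current": 0.398,
    "Fig. 6, U = 0.5 J, transmitted current": 0.388,
    "Fig. 7, U = 1.0 J, upper ring current": 0.302,
    "Fig. 8, U = 2.0 J, upper ring current": 0.361,
    "Fig. 9, U = 4.0 J, upper ring current": 0.426,
}


def is_finite_size_artifact(omega, v_sd, rel_tol=0.05):
    """Heuristic: attribute an oscillation to the finite system size if omega ~= V_SD.

    rel_tol is a hypothetical tolerance; the text only states omega ~= V_SD.
    """
    return abs(omega - v_sd) <= rel_tol * v_sd


for label, omega in FITTED_OMEGA.items():
    verdict = "finite size artifact" if is_finite_size_artifact(omega, 0.4) else "novel oscillation"
    print(f"{label}: omega = {omega:.3f} -> {verdict}")
```

A fixed tolerance is only a bookkeeping aid; in the text the decisive evidence comes from varying $V_{SD}$ and the system size, not from the distance to $V_{SD}$ alone.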
For interaction strengths $0.1\,J < U \le 1.0\,J$, the currents behave increasingly differently from the case of vanishing interaction. Figure 6 shows the time-dependent currents for interaction strength $U = 0.5\,J$ and Fig. 7 those for $U = 1.0\,J$. The former shows a significant decrease of the amplitude of the finite size induced oscillations and, in the ring currents only, a single oscillation period of deviating frequency for times $t < 15\,J^{-1}$. With increasing interaction, the finite size oscillation is more and more suppressed and effectively vanishes for $U = 1.0\,J$. Meanwhile, another oscillation with a relatively large amplitude and frequency $\omega \neq V_{SD}$ becomes the dominant feature of the ring currents. Calculations for different bias voltages show that the frequency of this oscillation is completely independent of the applied bias. The amplitude of the oscillation is also large enough for the upper link current to temporarily change direction. From Fig. 7 one can deduce the following dynamics of the currents in the ring structure:
1. $t < 13\,J^{-1}$: A relatively large current flows through the lower half of the ring from the right side to the left side of the ring structure. At the left ring site only about one third of the electrons leave the ring structure and tunnel into the lead, while the other two thirds flow from the left ring site to the top site of the ring.
2. $13\,J^{-1} \lesssim t < 16\,J^{-1}$: Two small currents flow from the top and bottom sites of the ring structure towards the left ring site, from where the current flows into the lead.
3. $16\,J^{-1} \le t \lesssim 20\,J^{-1}$: Current flows only from the top site to the left site. The current from the left ring site into the leads equals the upper link current. At $t = 20\,J^{-1}$ the system changes back to the behavior of the previous time frame and subsequently oscillates between states 1 and 3, with state 2 being the intermediate state of the oscillation.
At least for very strong interaction, the oscillation of the ring currents does not decay on the time scales accessible to our calculations. Because of this permanent oscillation one could suspect that the currents of the system might not relax to the steady state for interaction strengths $U > 1.0\,J$.
At this point, we would like to remind the reader that the primary intention of this work is to examine whether the local ring currents in the chosen model system demonstrably remain in a transient state or eventually relax to a steady state. The observation of a permanent transient state would effectively falsify the “steady state relaxation assumption” that is made when the Landauer formula [14, 15] is employed. The Landauer formula is widely used to calculate electronic properties of materials, particularly molecular materials.
The results shown so far suggest that the interacting quadratic ring structure might already fit the bill. However, one first has to determine the origin of the seemingly permanent oscillation of the ring currents to confirm that the quadratic ring structure indeed has the desired properties.
Since oscillating currents are already known to arise from finite system sizes, it stands to reason that the finite system size may also cause the newly found ring current oscillation. We have therefore performed additional calculations designed to reveal a possible finite-size origin. In a first approach we vary the system size and search for shifts in the frequency, amplitude or phase of the oscillation. A second approach makes use of damped boundary conditions, see [2, 7, 20]. Such boundary conditions in the leads reduce the energy gaps in the vicinity of the Fermi level and should also change the properties of the oscillation if it arises from the finite level discretization.
The results so far indicate that the frequency of the ring current oscillation depends solely on the interaction strength and the on-site gate potential. From this we conclude that the oscillation is not equivalent to the known finite size oscillation artifact. This raises the question of how the two parameters actually influence the oscillation, which is addressed in Sect. 9.
To further rule out the finite system size as the origin of the ring current oscillation, we perform td-DMRG calculations for systems with modified leads. The last ten sites of each lead are coupled by exponentially decreasing tunneling elements. These so-called damped boundary conditions (DBC) are explained in detail in [2, 7, 20]. The purpose of the damped boundary conditions is to reduce the energy gap at the Fermi level. This energy gap is responsible for several finite size effects, such as the additional oscillations on top of the steady state [7].
If an effect is modified in response to damped boundary conditions, this is a good indicator of a finite size effect. Damped boundary conditions have a significant drawback, though: they drastically reduce the transit time $\mathcal{T}$, i.e. the time before reflected wave packets return to the interacting structure. The size of the system for which we illustrate the results in Fig. 10 is thus effectively reduced from $M = 72$ sites to $M = 52$ sites, resulting in a time $\mathcal{T} = 52/(2J) = 26\,J^{-1}$.

Fig. 10 Time-dependent currents of the quadratic ring system with damped boundary conditions for $U = 1.0\,J$. The image shows the measured currents as a function of time $t$ for a system with parameters $T = 0.5\,J$ and $V_{SD} = 0.4\,J/e$. The size of the system is originally $M = 72$ sites. Because of the damped boundary conditions the effective system size is $M \approx 52$ sites. Wave packets are already reflected at the first link with a decreased tunnel matrix element $J_{DBC} = \Lambda^{n} J$, which results in a significantly smaller transit time $\mathcal{T}$. The currents are labelled according to Fig. 4. The oscillation frequencies of the upper ring current, $\omega_u = 0.298$, and of the lower ring current, $\omega_l = 0.248$, at times $5\,J^{-1} \le t \le 20\,J^{-1}$ are noticeably smaller than the bias voltage $V_{SD} = 0.4\,J/e$. The oscillations persist despite the fine energy resolution at the Fermi level provided by the damped boundary conditions. This suggests that the oscillation of the ring currents is not a finite size effect.
Regular td-DMRG calculations were performed for a system of $M = 72$ sites, where the tunneling matrix elements of the ten outermost sites of both leads were set to $J_{DBC} = \Lambda^{n} J$. We choose $\Lambda = 0.5$ and $n = 1, \dots, 10$, where $n = 10$ for the last site of each lead. For the other parameters of the system we chose a set of values for which we had observed ring current oscillations in prior calculations.
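The damped boundary conditions and the resulting loss of simulated time can be made concrete with a short sketch. It assumes the damping factor is a single constant $\Lambda$ (`lam` below), lists the hopping bonds of one lead from the ring outwards, and estimates the transit time as $\mathcal{T} \approx M/(2J)$ from the maximal group velocity $2J$ of a tight-binding chain (with $\hbar$ and the lattice spacing set to one). This estimate reproduces the $26\,J^{-1}$ quoted above for the effective size $M = 52$; for $M = 96$ it gives $48\,J^{-1}$, slightly above the $\approx 45\,J^{-1}$ stated later in the text, so it should be read as a rough upper estimate. Both helper names are hypothetical.

```python
def dbc_hoppings(n_bonds, J=1.0, lam=0.5, n_damped=10):
    """Hopping elements of one lead, listed from the ring outwards (hypothetical helper).

    The outermost n_damped bonds carry lam**n * J with n = 1, ..., n_damped,
    so the last bond of the lead has lam**n_damped * J.
    """
    if n_damped > n_bonds:
        raise ValueError("more damped bonds than bonds in the lead")
    return [J] * (n_bonds - n_damped) + [lam**n * J for n in range(1, n_damped + 1)]


def transit_time(m_sites, J=1.0):
    """Rough estimate T ~ M / (2J) in units of 1/J.

    Assumes wave packets travel at most with the tight-binding band velocity
    v = 2J (hbar = lattice spacing = 1) and cover about M sites before they
    are back at the ring.
    """
    return m_sites / (2.0 * J)


M, N_DAMPED = 72, 10
m_eff = M - 2 * N_DAMPED  # the damped sites of both leads no longer count
print(dbc_hoppings(30)[-3:])       # [0.00390625, 0.001953125, 0.0009765625]
print(m_eff, transit_time(m_eff))  # 52 26.0
```

The exponential profile makes the outermost bond about three orders of magnitude weaker than $J$, which is why wave packets are effectively reflected at the start of the damped region.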
In Fig. 10 we show the results of a calculation that exemplifies the set of calculations employing damped boundary conditions. For parameters for which an oscillation of the ring currents is found in a regular system, one also finds these oscillations in a system modified by damped boundary conditions. Comparing Figs. 7 and 10, however, one finds that the amplitude of the oscillation is approximately 40 % smaller for the system with damped boundary conditions. Although this might hint at a finite size effect, calculations for regular systems with $M = 96$ and $M = 150$ sites show no reduction in the amplitude, whereas for an oscillation caused by the finite system size one would expect the amplitude to be reduced by 25 % and 52 %, respectively.
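The expected reductions of 25 % and 52 % quoted above are consistent with a finite-size amplitude that scales inversely with the number of sites, relative to the $M = 72$ reference system. The text does not state this scaling explicitly, so the sketch below is our reading of the numbers rather than the authors' derivation; the function name is hypothetical.

```python
def expected_amplitude_reduction(m_ref, m_new):
    """Fractional amplitude reduction if the finite-size amplitude scaled as 1/M."""
    return 1.0 - m_ref / m_new


for m_new in (96, 150):
    print(m_new, f"{100 * expected_amplitude_reduction(72, m_new):.0f} %")
# 96 25 %
# 150 52 %
```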
From the calculations employing damped boundary conditions one can again conclude that the oscillation of the ring currents is not related to the familiar finite size oscillation artifacts. They neither depend on the bias voltage $V_{SD}$ nor do they
So far, we have found the ring structure model to be a promising candidate for a model system in which the relaxation of local observables to the steady state occurs at times $t \to \infty$. The calculations presented have mainly aimed at studying a wide range of system parameters, targeting only system sizes of $M \approx 72$ sites and times $t \le 30\,J^{-1}$. Such short time frames do not allow us to determine whether the oscillation of the ring currents and transmitted currents is actually permanent or decays after $t = 30\,J^{-1}$. We have therefore performed a second series of calculations specifically targeting longer simulated times.
Calculating longer times requires larger system sizes, as pointed out in Sect. 5, which means an enormous increase in computer time for a small increase in simulated time. Therefore only a few calculations exploring the long-time limit were done in a first series of calculations, performed before access to the ForHLR I computer cluster was obtained. The system size chosen for these first calculations was $M = 96$ sites, resulting in a transit time $\mathcal{T} \approx 45\,J^{-1}$. Since this is a rather small increase in simulated time, additional measures were taken to obtain information about even longer times. The system parameters were chosen such that the frequency of the ring current oscillation is large enough to observe several oscillation periods but small enough that each period contains sufficient data points to properly determine the amplitude of the oscillation. In Fig. 11 we show the results of a calculation for the quadratic ring structure using the parameters $U = 4.0\,J$ and $T = 0.5\,J$. For these parameters we obtain an oscillation frequency of $\omega \approx 0.43\,J$, which meets the requirements.
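The trade-off behind this choice can be quantified with simple arithmetic: a higher frequency fits more periods into the accessible time window, a lower one gives more samples per period. In the sketch below, $\omega \approx 0.43$ and the window $5\,J^{-1} \le t \le 50\,J^{-1}$ are taken from Fig. 11, while the time step `dt = 0.1/J` is purely hypothetical, since the text does not quote one; the function name is likewise illustrative.

```python
import math


def oscillation_budget(omega, t_start, t_end, dt):
    """Period, number of periods in [t_start, t_end] and samples per period (units of 1/J)."""
    period = 2.0 * math.pi / omega
    return period, (t_end - t_start) / period, period / dt


# omega and the time window from Fig. 11; dt = 0.1/J is a hypothetical time step
period, n_periods, samples = oscillation_budget(0.43, 5.0, 50.0, 0.1)
print(f"period = {period:.1f}/J, periods = {n_periods:.2f}, samples per period = {samples:.0f}")
```

With these numbers roughly three full periods, i.e. about six half-waves, fit into the window, which matches the caption's requirement of "sufficiently many half-waves".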
Fitting a cosine to both ring currents allows one to estimate from the amplitude of the data points whether, and how fast, the oscillation decays. A close look at Fig. 11 reveals that all data points lie on the fitted cosine function for times $30\,J^{-1} \le t \le \mathcal{T}$. No decay in amplitude can be recognized, neither exponential nor linear. Instead, the oscillation becomes cleaner as time progresses, suggesting that other transient effects have already decayed on the calculated time scale. The decay of this ring current oscillation is therefore either not taking place at all or extremely slow. More recent calculations for system sizes of $M = 120$ sites confirm this finding.
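A visual decay check like the one for Fig. 11 can also be automated: fit a cosine of known frequency in consecutive one-period windows and fit a straight line to the logarithm of the window amplitudes. The following is a minimal sketch on synthetic data. It assumes $\omega$ is already known from a global fit and tests only for exponential decay; a linear decay would call for fitting the amplitudes themselves rather than their logarithm. All names and signal parameters are hypothetical.

```python
import numpy as np


def windowed_amplitudes(t, current, omega):
    """Fit c + a*cos(omega t) + b*sin(omega t) in consecutive one-period windows."""
    period = 2.0 * np.pi / omega
    n_windows = int((t[-1] - t[0]) // period)
    centers, amplitudes = [], []
    for k in range(n_windows):
        start = t[0] + k * period
        mask = (t >= start) & (t < start + period)
        tw = t[mask]
        design = np.column_stack([np.ones_like(tw), np.cos(omega * tw), np.sin(omega * tw)])
        (_, a, b), *_ = np.linalg.lstsq(design, current[mask], rcond=None)
        centers.append(start + 0.5 * period)
        amplitudes.append(np.hypot(a, b))
    return np.array(centers), np.array(amplitudes)


def exponential_decay_rate(t, current, omega):
    """Decay rate gamma from a straight-line fit of log(amplitude) versus time."""
    centers, amplitudes = windowed_amplitudes(t, current, omega)
    slope, _ = np.polyfit(centers, np.log(amplitudes), 1)
    return -slope


# Hypothetical signals with omega = 0.43: one damped (gamma = 0.02), one undamped
t = np.linspace(5.0, 200.0, 19501)
damped = 0.1 + 0.03 * np.exp(-0.02 * (t - 5.0)) * np.cos(0.43 * t + 0.3)
undamped = 0.1 + 0.03 * np.cos(0.43 * t + 0.3)
print(exponential_decay_rate(t, damped, 0.43), exponential_decay_rate(t, undamped, 0.43))
```

Because every window spans exactly one period, each window sees the same oscillation phase, so any bias of the per-window amplitude is a common factor and drops out of the slope.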
Our results do not completely rule out an eventual relaxation of the local currents
inside the interacting ring structure to a local steady state, but they strongly suggest
that a relaxation is at the very least taking place on time scales orders of magnitude
larger than the time we can simulate using td-DMRG.
Fig. 11 Limit of long times for the quadratic ring system and $U = 4.0\,J$. The image displays the measured currents as a function of time $t$ for a system with $M = 72$ sites and parameters $T = 0.5\,J$ and $V_{SD} = 0.4\,J/e$. The currents are labelled according to Fig. 4. The oscillation frequency of the ring currents, $\omega_u \approx 0.43$, is chosen such that sufficiently many half-waves fit into the time frame $5\,J^{-1} \le t \le 50\,J^{-1}$, while being small enough that the amplitude of the half-waves can be clearly resolved. The oscillation amplitude of the ring currents does not noticeably decay for times $t_{\mathrm{trans}} \le t \le \mathcal{T}$. A decay of the amplitudes is therefore expected to happen on time scales orders of magnitude larger than those that can be simulated with td-DMRG. Our results do not exclude that the ring current oscillation does not decay at all.
The td-DMRG calculations for the quadratic ring structure show that the frequency of the ring current oscillation is independent of the applied bias voltage and instead depends on the interaction strength $U$ and, in particular, on the size of the gate potential $T$. As can be seen from Figs. 6 and 7, one also needs a large interaction strength to observe the ring current oscillation for the quadratic structure in the first place. This motivates the investigation of the isolated interacting ring structure to determine how the interaction influences the eigenstates of the ring structure. The investigation focuses in particular on the spectrum of the ring structure as a function of the interaction strength. Energy differences in the spectrum that are comparable to the frequency of the oscillation could hint at states involved in the oscillation process. If such energy differences are found, one can then check in td-DMRG calculations whether the oscillation is indeed connected to the corresponding eigenstates of the ring.
To obtain the spectrum, we construct the Hamiltonian of the ring structure in the complete many-body Hilbert space. A complete diagonalization of the Hamiltonian matrix yields every energy eigenvalue and eigenvector. We repeat this calculation for a wide range of interaction strengths and various gate potentials. For selected eigenstates we calculate the expectation value $\langle \hat n_x \rangle$ of the particle density on each ring site $x$. This local particle density gives additional insight into the spatial structure of the eigenstates.
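The exact diagonalization described above can be illustrated with a minimal sketch of the isolated four-site ring. It assumes spinless fermions with hopping $-J'$ on each ring bond, a nearest-neighbor interaction $U \hat n_x \hat n_{x+1}$ and a gate term $T \hat n_g$ on one site. The site labelling, the sign conventions and the choice to diagonalize sector by sector in particle number (the full many-body Hilbert space is the union of these sectors) are our assumptions; the Hamiltonian actually used for Figs. 12 and 13 may differ, e.g. by a particle-hole symmetric form of the interaction. All names are hypothetical.

```python
import itertools

import numpy as np

L = 4     # ring sites in the order left = 0, top = 1, right = 2, bottom = 3 (assumed labelling)
GATE = 1  # site carrying the gate potential T (assumed: the top site)


def hop_sign(occ, src, dst):
    """Fermionic sign of c_dst^+ c_src on the state |occ>, operators ordered by site index."""
    n_src = sum(1 for s in occ if s < src)
    n_dst = sum(1 for s in occ if s < dst and s != src)
    return (-1) ** (n_src + n_dst)


def ring_sector(n_particles, U, T, J=1.0):
    """Basis and Hamiltonian of the isolated ring for a fixed number of spinless fermions."""
    basis = [frozenset(c) for c in itertools.combinations(range(L), n_particles)]
    index = {occ: i for i, occ in enumerate(basis)}
    H = np.zeros((len(basis), len(basis)))
    for i, occ in enumerate(basis):
        H[i, i] += T * (GATE in occ)  # on-site gate potential
        H[i, i] += U * sum((s in occ) and ((s + 1) % L in occ) for s in range(L))  # NN interaction
        for s in range(L):  # hopping -J on every ring bond, both directions
            for src, dst in ((s, (s + 1) % L), ((s + 1) % L, s)):
                if src in occ and dst not in occ:
                    new = (occ - {src}) | {dst}
                    H[index[new], i] += -J * hop_sign(occ, src, dst)
    return basis, H


def spectrum_and_densities(n_particles, U, T):
    """Eigenvalues and local densities <n_x> (one row per eigenstate) of one sector."""
    basis, H = ring_sector(n_particles, U, T)
    energies, vectors = np.linalg.eigh(H)
    occupation = np.array([[x in occ for x in range(L)] for occ in basis], dtype=float)
    densities = (np.abs(vectors) ** 2).T @ occupation
    return energies, densities


E, n = spectrum_and_densities(2, U=2.0, T=0.5)
print("Delta E =", E[1] - E[0])
print("ground state <n_x>:", np.round(n[0], 3))
print("first excited <n_x>:", np.round(n[1], 3))
```

In this toy model the splitting of the two lowest two-electron states is close to $T/2$ at $U = 0$ and approaches $T$ for large $U$, where the ground state has the left and right sites filled and the gate site empty. This is in line with the limits stated for Fig. 12, but the sketch is not claimed to reproduce the full figure.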
Fig. 12 Low energy spectrum of the uncoupled interacting quadratic ring structure. The plot shows the lowest energy eigenvalues of the uncoupled quadratic ring as a function of the interaction strength $U$ for $T = 0.5\,J$. The ground state energy $E_0$ is subtracted from each energy eigenvalue, so that $\Delta E = E - E_0$. The black data points indicate the oscillation frequency obtained by td-DMRG for a given value of $U$. The oscillation frequencies agree with the energy difference between the ground state and the particular excited state, which is also the lowest excited state for interaction strength $U \ge 0.5\,J$. The excited state corresponds to the electron density shown in Fig. 13b.
Subtracting the ground state energy from each eigenvalue of the quadratic ring structure yields the spectrum depicted in Fig. 12. For $U = 0$, one finds a twofold degeneracy of both the ground state energy and the first excited level. A finite interaction strength lifts both degeneracies. A further increase in interaction strength leads to a steep growth in energy for one state of each previously degenerate pair. One particular excited state, however, grows only slowly in energy as a function of $U$ and becomes the lowest excited state for $U > 0.5\,J$. This dependence on interaction strength is reminiscent of the frequency of the ring current oscillation. A comparison of the two indeed shows good agreement, indicating that this state is the main contributor to the oscillation phenomenon. The energy of the excited state asymptotically approaches $\Delta E = 0.5\,J = T$ for $U \to \infty$ and is $\Delta E = 0.25\,J = T/2$ in the limit of vanishing interaction. From this we conclude that, for $U \to \infty$, the ring site to which the gate voltage is applied is completely empty in the ground state and completely occupied in the particular excited state. In Fig. 13 we show the local electron density on the sites of the ring for interaction strength $U = 2.0\,J$ and gate potential $T = 0.5\,J$ for the four eigenstates of the quadratic ring structure with the lowest energies. One can see that the two lowest energies correspond to states that represent a charge density wave. In the ground state the left and right ring sites are filled while the top and bottom sites are empty. The lowest excited state can be described by the opposite picture: now the top and bottom ring sites are filled while the other two sites are empty. The other two eigenstates contain a different number of electrons. With increasing interaction, half filling becomes increasingly favorable, which explains why these two states lie far higher in energy than the ground state for strong interaction.

Fig. 13 Local electron density on the sites of the uncoupled interacting quadratic ring structure. The images display the local electron density on the sites of the quadratic ring structure for $U = 2.0\,J$ and $T = 0.5\,J$. The local particle density is given as the expectation value $\langle \hat n_x \rangle$ in the eigenstates corresponding to the four lowest energy eigenvalues. The two lowest energies are characterized by a particle density wave, shown in (a) and (b). The ground state (a) is the particle density wave with small electron density on the site of the gate potential $T$. In state (c) the ring is occupied by a single electron and in state (d) by three electrons. States (c) and (d) become energetically less favorable compared to (a) and (b) for increasing interaction strength $U$.
From the spectrum of the interacting ring structure we have identified the two states that are likely to be involved in the ring current oscillation. The ground state of the system corresponds to a density wave with a small local electron density on several ring sites. It thus seems reasonable that at least one other state is involved in the charge transport through the ring structure. If this state were the particular excited state, an oscillation between the ground state and the excited state might cause the ring current oscillation. We have therefore performed an additional series of td-DMRG calculations in which the time-dependent reduced density matrix of the ring structures has been measured. From these calculations one can determine whether such an oscillation between the eigenstates takes place.
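Once the reduced density matrix $\rho_{\mathrm{ring}}(t)$ of the ring is available at a given time, the probability of finding the ring in an eigenstate $|\psi_k\rangle$ of the uncoupled ring is $p_k(t) = \langle \psi_k | \rho_{\mathrm{ring}}(t) | \psi_k \rangle$. The sketch below evaluates this projection. It is a generic illustration, not the td-DMRG measurement itself, and assumes both objects are given as dense arrays in the same 16-dimensional Fock basis of the four ring sites; the function name and the random test data are hypothetical.

```python
import numpy as np


def eigenstate_probabilities(rho_ring, eigvecs):
    """Occupation p_k = <psi_k| rho_ring |psi_k> of each eigenvector (column) psi_k.

    rho_ring and eigvecs must be expressed in the same ring basis.
    """
    return np.real(np.einsum("ik,ij,jk->k", eigvecs.conj(), rho_ring, eigvecs))


# Hypothetical check: a mixed state built from known weights in a random eigenbasis
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))  # 16 = 2**4 ring Fock states
_, vecs = np.linalg.eigh(A + A.conj().T)  # orthonormal eigenvectors of a Hermitian matrix
weights = rng.random(16)
weights /= weights.sum()
rho = vecs @ np.diag(weights) @ vecs.conj().T
print(np.allclose(eigenstate_probabilities(rho, vecs), weights))  # True
```

Evaluating $p_k$ at every measured time step gives curves like those in Fig. 14, whose oscillation frequencies can then be compared with the ring current oscillation.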
Studying the uncoupled rings, we have seen that two particular low-lying eigenstates of the ring structures have an energy difference that coincides with the frequency of the ring current oscillation. We are consequently interested in how these two

10.1 $V_{SD} \le T/e$
In Fig. 14a we observe an oscillation of both the ground state and the first excited state. However, the two oscillations differ in frequency. The ground state oscillation frequency equals the bias voltage, so we conclude that this oscillation is due to the finite size effect discussed in [7]. The excited state oscillates with the frequency we expect from our calculation for the uncoupled ring, which equals the frequency of the ring current oscillation. This is another indicator that this excited state contributes to the ring current oscillation. All other eigenstates of the uncoupled ring have been studied in the same fashion. Only a few of them exhibit oscillatory behavior, and none has a frequency that matches the frequency of the ring current oscillation. The most notable of the other eigenstates is the one that corresponds to Fig. 13c. It also oscillates with frequency $\omega = V_{SD}$, with the same amplitude as the ground state and a phase shift of $\pi$ relative to the ground state.

The probabilities of the ground state and the particular excited state as a function of time change significantly for bias voltages larger than the on-site gate potential.
Fig. 14 Time-dependent reduced density matrices of the interacting ring structures. The figures show the probability of the ground state (right axis) and the first excited state (left axis) as a function of time. Figures (a) and (b) show the probabilities for the quadratic ring structure ($M = 72$ sites) in the cases (a) $V_{SD} \le T/e$ and (b) $V_{SD} > T/e$. In (a) we have chosen $T = 1.0\,J$ and $U = 2.0\,J$, for which we find an oscillation frequency of the ring currents $\omega \approx 0.7$. The probability of the first excited state oscillates with the same frequency, which indicates that this excited state is involved in the oscillation effect. The ground state probability oscillates with $\omega = V_{SD}$. In (b) we find a similar behavior of the ground state probability and the excited state probability for the hexagonal ring structure and parameters $U = 2.0\,J$ and $T = 0.5\,J$. (b) shows an increasing probability of the excited state, modulated by a frequency $\omega$ that does not match the ring current oscillation frequency for $U = 2.0\,J$ and $T = 0.5\,J$.
One can no longer observe a distinct oscillation of the ground state with frequency $\omega = V_{SD}$ or any other frequency. The probability of the first excited state as a function of time is now qualitatively different as well: it increases seemingly linearly, with an oscillation merely modulated onto this linear function. The frequency of this oscillation does not match the frequency of the ring current oscillation either. Due to its linear growth, the probability of the excited state now reaches values of up to $10^{-3}$, whereas the probability of the ground state is slightly smaller than before. Among the other eigenstates of the uncoupled quadratic ring structure we find none that oscillates with the same frequency as the ring currents.
The study of the time-dependent reduced density matrix also points to the particular excited state as a substantial contributor to the ring current oscillation. However, it also raises further questions. The occupation probability of the particular excited state is of order $10^{-4}$, which is small considering that the ring currents have oscillation amplitudes of order $10^{-2}\,eJ/h$. A time evolution calculation for the uncoupled ring using exact diagonalization and the occupation probabilities of the ground state and the excited state taken from the reduced density matrix yields ring current oscillations with the right frequency and phase, but with amplitudes of only order $10^{-5}\,eJ/h$. This discrepancy has yet to be understood. A second question concerns the probability of the particular excited state for bias voltages larger than the on-site gate potential $T$. The occupation probability increases monotonically, while the ring current oscillation retains the behavior seen for smaller voltages. This is also inconsistent with an explanation that assumes switching between ground state and excited state as the cause of the ring current oscillation.
Acknowledgements This work was performed on the computational resource ForHLR I funded
by the Ministry of Science, Research and the Arts Baden-Württemberg and DFG ("Deutsche
Forschungsgemeinschaft") within project QWHISTLE.
1. Bohr, D., Schmitteckert, P.: The dark side of benzene: interference vs. interaction. Ann. Phys.
524(3–4), 199–204 (2012)
2. Bohr, D., Schmitteckert, P., Wölfle, P.: DMRG evaluation of the Kubo formula – conductance of strongly interacting quantum systems. Europhys. Lett. 73, 246 (2006)
3. Bohr, D., Schmitteckert, P.: Strong enhancement of transport by interaction on contact links.
Phys. Rev. B 75(24), 241103(R) (2007)
4. Boulat, E., Saleur, H., Schmitteckert, P.: Twofold Advance in the Theoretical Understanding
of Far-From-Equilibrium Properties of Interacting Nanostructures. Phys. Rev. Lett. 101(14),
140601 (2008)
5. Branschädel, A., Boulat, E., Saleur, H., Schmitteckert, P.: Numerical evaluation of shot noise
using real-time simulations. Phys. Rev. B 82, 205414 (2010)
6. Branschädel, A., Boulat, E., Saleur, H., Schmitteckert, P.: Shot noise in the self-dual interacting
resonant level model. Phys. Rev. Lett. 105, 146805 (2010)
7. Branschädel, A., Schneider, G., Schmitteckert, P.: Conductance of inhomogeneous systems:
real-time dynamics. Ann. Phys. 522(9), 657–678 (2010)
8. Branschädel, A., Schmitteckert, P.: Conductance of correlated nanostructures. In: High Performance Computing in Science and Engineering’10. Springer, Berlin (2010)
9. Branschädel, A., Ulbricht, T., Schmitteckert, P.: Conductance of correlated nanostructures. In: Nagel, W.E., Kröner, D.H., Resch, M. (eds.) High Performance Computing in Science and Engineering’09, pp. 123–137. Springer, Berlin (2009)
10. Carr, S.T., Bagrets, D.A., Schmitteckert, P.: Full counting statistics in the self-dual interacting
resonant level model. Phys. Rev. Lett. 107(20), 206801 (2011)
11. Carr, S.T., Schmitteckert, P., Saleur, H.: Transport through nanostructures: finite time vs. finite
size. Phys. Rev. B 89, 081401 (2014)
12. Carr, S.T., Schmitteckert, P., Saleur, H.: Full counting statistics in the not-so-long-time limit.
Phys. Scr. T 165, 014009 (2015)
13. Hallberg, K.A.: New trends in density matrix renormalization. Adv. Phys. 55(5–6), 477–526 (2006)
14. Landauer, R.: Spatial variation of currents and fields due to localized scatterers in metallic
conduction. IBM J. Res. Dev. 1(3), 223–231 (1957)
15. Meir, Y., Wingreen, N.S.: Landauer formula for the current through an interacting electron
region. Phys. Rev. Lett. 68(16), 2512–2515 (1992)
16. Noack, R.M., Manmana, S.R.: Diagonalization- and numerical renormalization-group-based
methods for interacting quantum systems. AIP Conf. Proc. 789, 93–163. AIP Publishing (2005)
17. Peschel, I., Wang, X., Kaulke, M., Hallberg, K. (eds.): Density Matrix Renormalization – A
New Numerical Method in Physics. Springer, Berlin (1999)
18. Schmitteckert, P.: Nonequilibrium electron transport using the density matrix renormalization
group method. Phys. Rev. B 70(12), 121302 (2004)
19. Schmitteckert, P.: Signal transport in and conductance of correlated nanostructures. In: Nagel, W.E., Kröner, D.H., Resch, M. (eds.) High Performance Computing in Science and Engineering’07, pp. 99–106. Springer, Berlin (2007)
20. Schmitteckert, P.: Calculating Green functions from finite systems. J. Phys. Conf. Ser. 220,
012022 (2010)
21. Schmitteckert, P.: Obtaining the full counting statistics of correlated nanostructures from time
dependent simulations. In: High Performance Computing in Science and Engineering’11.
Springer, Berlin (2011)
22. Schmitteckert, P., Schneider, G.: Signal transport and finite bias conductance in and through
correlated nanostructures. In: Nagel, W.E., Jäger, W., Resch, M. (eds.) High Performance
Computing in Science and Engineering’06, pp. 113–126. Springer, Berlin (2006)
23. Schollwöck, U.: The density-matrix renormalization group. Rev. Mod. Phys. 77(1), 259–315 (2005)
24. Ulbricht, T., Schmitteckert, P.: Signal transport in and conductance of correlated nanostructures. In: Nagel, W.E., Kröner, D.H., Resch, M. (eds.) High Performance Computing in Science and Engineering’08, pp. 71–82. Springer, Berlin (2008)
25. Walz, M., Wilhelm, J., Evers, F.: Current patterns and orbital magnetism in mesoscopic dc
transport. Phys. Rev. Lett. 113(13), 136602 (2014)
26. White, S.R.: Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett.
69(19), 2863–2866 (1992)
27. White, S.R.: Density-matrix algorithms for quantum renormalization groups. Phys. Rev. B
48(14), 10345–10356 (1993)
Part III
Reactive Flows
Dietmar Kröner
The four contributions in the section “Reactive Flows” show how the numerical simulation of reactive flows can be further improved with respect to accuracy, efficiency, parallelization and scalability. The first two papers and the last one are based on the OpenFOAM software, the third one on the in-house code TASCOM3D. All four projects were supported by the German Research Council.
In the first contribution, “DNS Analysis of the Correlation of Heat Release Rate with Chemiluminescence Emissions in Turbulent Combustion” by F. Zhang, T. Zirwes, P. Habisreuther and H. Bockhorn, the authors perform DNS of methane-air combustion in turbulent flow. The flow is modeled by the compressible Navier-Stokes equations with gravity and additional equations for species transport, diffusion and reaction. The pressure is given by the ideal gas law, and dynamic viscosity and heat conductivity seem to be constant. The chemical reaction mechanism consists of 18 species and 69 fundamental reactions, including the optically active OH radical. The main goal is to investigate the correlation between the heat release rate and the luminescent species in turbulent flames. The implementation uses the open-source software OpenFOAM for the CFD part and Cantera for the chemical reactions. The aim of this work is to validate a correlation between the presence of the OH radical, which can be measured optically, and the heat release of the chemical reaction. Such a correlation is assumed in practice for the technical optimization of combustion chambers.
The underlying grid for the numerical simulations contains 16 million finite
volumes and the parallel implementation uses 3,600 processor cores from the Hazel
Hen cluster.
D. Kröner
Abteilung für Angewandte Mathematik, Universität Freiburg, Hermann-Herder-Str. 10, 79104
Freiburg, Germany
e-mail: dietmar.kroener@mathematik.uni-freiburg.de
228 D. Kröner
1 Introduction
Heat release is the main purpose of combustion processes. The heat can be used directly for heating, e.g. in heat exchangers, or converted into mechanical or electrical energy, e.g. in internal combustion engines or power plants. The heat release rate is used to assess the efficiency of the combustion process and to identify the location of the reaction zone of the flame, which is influenced by the interaction of fluid flow and chemical reactions. It is a fundamental property of great importance for the theoretical and experimental investigation of combustion processes. Traditionally, high-speed imaging of the chemiluminescence of excited hydroxyl radicals (OH*) or methylidyne radicals (CH*) with intensified cameras is used to characterize the unsteady heat release in turbulent flames [1, 2]. Because this is a line-of-sight technique with limited spatial resolution, only the integral or total heat release rate can be determined with it. The correlation between heat release and chemiluminescence has been determined empirically in previous work [2, 3], and proportionality is commonly assumed. This assumption is not based on an understanding of the underlying transport and chemical processes but is rather justified by the results obtained. There is therefore a need to justify this general linear correlation of heat release and chemiluminescence emission in more detail. In particular, this work analyzes the influence of turbulence and unsteady effects on this correlation, which has not yet been investigated in the literature.
To accomplish this, direct numerical simulations (DNS) using complex reaction kinetics with the full reaction paths of the electronically excited OH* radical are applied in the present work to a synthetic, three-dimensional, freely propagating flame front, which is perturbed artificially by different turbulent inflow conditions. DNS relies on the numerical solution of the governing balance equations without any modeling simplifications: the full range of time and length scales of the turbulent flow and of the chemical reaction system is resolved to a large extent. This fine-grained representation of the interaction between turbulent flow, molecular transport and complex chemistry provides greater insight and quantitative predictability, complementing measurements and coarser turbulence and combustion models such as Reynolds-averaged Navier-Stokes (RANS) or large eddy simulation (LES). In the present work, DNS is used to provide a quantitative statement on the correlation of heat release and chemiluminescence emission in turbulent combustion, whereas this is only qualitatively accessible in
DNS for Correlation of Heat Release and Chemiluminescence in Turbulent Combustion 231
2 Computational Methods
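The conservation equations for total mass, species masses, momentum and energy, together with the equation of state, form the basis for the detailed description of chemically reacting flows [4, 5]:

$$\frac{\partial \rho}{\partial t} = -\nabla\cdot(\rho\mathbf{v}) \qquad (1)$$

$$\frac{\partial (\rho\mathbf{v})}{\partial t} = -\nabla\cdot(\rho\mathbf{v}\mathbf{v}) - \nabla p + \nabla\cdot\boldsymbol{\tau} + \rho\mathbf{g} \qquad (2)$$

$$\frac{\partial (\rho Y_k)}{\partial t} = -\nabla\cdot(\rho Y_k\mathbf{v}) - \nabla\cdot\mathbf{j}_k + \dot r_k, \quad k = 1,\dots,N-1 \qquad (3)$$

$$\frac{\partial (\rho h_s)}{\partial t} = -\nabla\cdot(\rho h_s\mathbf{v}) - \nabla\cdot\dot{\mathbf{q}} + \frac{Dp}{Dt} + \dot q_r \qquad (4)$$

$$p = \rho R T \qquad (5)$$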
where $\rho$ and $\mathbf{v}$ are the density and the velocity vector, $p$ and $T$ denote the static pressure and temperature, and $h_s$ is the sensible enthalpy. $Y_k$ and $\dot r_k$ are the mass fraction and the reaction rate of species $k$, and $R$ is the specific gas constant. The gravitational force $\rho\mathbf{g}$ acts as an external force on the cell volume. Heat sources from viscous dissipation and radiation are neglected.
The mixture-averaged diffusion flux $\mathbf{j}_k$, the viscous stress $\boldsymbol{\tau}$ of a Newtonian fluid and the diffusive heat flux $\dot{\mathbf{q}}$ are given by:
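The closure relations themselves (Eqs. (6)–(8) in this section's numbering) are not reproduced in this copy. As an assumption, a standard mixture-averaged formulation that is consistent with the symbols defined in the next paragraph is given below. The authors' exact form may differ, for example in whether the diffusion flux uses mass- or mole-fraction gradients or includes a correction velocity:

$$\mathbf{j}_k = -\rho D_{km} \nabla Y_k$$

$$\boldsymbol{\tau} = \mu\left[\nabla\mathbf{v} + (\nabla\mathbf{v})^{\mathsf T} - \tfrac{2}{3}(\nabla\cdot\mathbf{v})\,\mathbf{I}\right]$$

$$\dot{\mathbf{q}} = -\lambda \nabla T + \sum_{k=1}^{N} h_k\,\mathbf{j}_k$$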
Here $D_{km}$ is the mixture-averaged diffusion coefficient between the $k$-th species and the mixture, $\mu$ is the dynamic viscosity, $\lambda$ is the thermal conductivity and $h_k$ is the specific enthalpy of the $k$-th species. The reaction rate $\dot r_k$ in Eq. (3) is given by the rate law of reaction kinetics, with the rate coefficient described by the extended Arrhenius law. Because sensible enthalpies are used, the heat released by chemical reactions appears as a source term in Eq. (4), where $h^0_k$ is the specific enthalpy of formation of species $k$:

$$\dot q_r = -\sum_{k=1}^{N} \dot r_k\, h^0_k \qquad (9)$$
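The following minimal Python sketch illustrates the source term in Eq. (9) and the extended Arrhenius law mentioned above for a hypothetical two-species toy case. The function names, the units (mass production rates in kg/(m³ s), formation enthalpies in J/kg, activation energy in J/mol) and the numbers are assumptions chosen for illustration. In the authors' solver these quantities are supplied by Cantera.

```python
import math

R_UNIVERSAL = 8.314462618  # universal gas constant [J/(mol K)]


def arrhenius_rate_coefficient(A: float, b: float, Ea: float, T: float) -> float:
    """Extended Arrhenius law k(T) = A * T**b * exp(-Ea / (R_u * T)).

    Ea is taken in J/mol (assumed unit convention)."""
    return A * T**b * math.exp(-Ea / (R_UNIVERSAL * T))


def heat_release_rate(rates, h0):
    """Eq. (9): q_r = -sum_k r_k * h0_k.

    rates -- net mass production rates r_k of all species [kg/(m^3 s)]
    h0    -- specific enthalpies of formation h0_k [J/kg]
    Returns the chemical heat source [W/m^3]; positive means exothermic.
    """
    if len(rates) != len(h0):
        raise ValueError("rates and h0 must have the same length")
    return -sum(r * h for r, h in zip(rates, h0))


# Toy example: species A (h0 = 0) converted to B (h0 = -1 MJ/kg) at 2 kg/(m^3 s).
q_r = heat_release_rate([-2.0, 2.0], [0.0, -1.0e6])
print(q_r)  # 2000000.0
```

Because the product has a lower formation enthalpy than the reactant, the sign convention of Eq. (9) yields a positive (exothermic) heat source.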
The open-source code OpenFOAM [6] has been used to perform the three-dimensional DNS of turbulent combustion. The detailed calculation of chemistry and molecular transport has been implemented in addition to its general capabilities for CFD modeling of non-reactive flows [7]. The code solves the compressible reactive flow Eqs. (1), (2), (3), (4), and (5) with the finite volume method on unstructured grids. The detailed description of the chemistry (i.e. the reaction rates) and of transport (i.e. the diffusion coefficients) is obtained by coupling with the open-source chemical kinetics library Cantera [8]. In the current work, the mixture-averaged model is used for the diffusive mass flux, the viscous stress of a Newtonian fluid and the diffusive heat flux. A detailed reaction mechanism with 18 species and 69 fundamental reactions has been applied for premixed methane/air combustion. It consists of the methane/air mechanism by Kee et al. [9] (17 species and 58 reactions), extended by the full reaction chain of the short-lived OH* radical [10] (1 species and 11 reactions). A general operator splitting technique is used to evaluate the chemical source terms, i.e. the system of chemical reactions is solved decoupled from the flow equations. For this purpose, a zero-dimensional batch reactor is created for each discrete cell volume, and the resulting kinetic equations are integrated numerically over the flow time step, thereby resolving the smallest time scales of the chemical reactions. The solver employs a fully implicit second-order scheme for the time derivative and a fourth-order interpolation scheme for the discretization of the convective term. All diffusive terms are likewise discretized with an unbounded fourth-order scheme. The pressure-implicit split-operator (PISO) algorithm is used for pressure correction. The reader is referred to [4, 5, 11] for a detailed description of the governing equations and numerical procedures. Information on code validation can be found in [12, 13] and the references therein.
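The operator splitting described above can be sketched as follows. Within one flow time step, the transport step is performed first. Every cell is then advanced as an independent zero-dimensional batch reactor over the same time step, using several implicit substeps. This is a minimal Python sketch under strong assumptions:

- a hypothetical one-step reaction Fuel → Products with first-order rate $k(T)\,Y_{\mathrm{fuel}}$,
- temperature held fixed during the chemistry substep,
- backward Euler as the stiff integrator,
- an identity placeholder for the transport step.

The authors' solver instead integrates the full 18-species mechanism via Cantera.

```python
import math

R_UNIVERSAL = 8.314462618  # [J/(mol K)]


def integrate_batch_reactor(y_fuel, T, dt_flow, n_sub=20, A=1.0e8, Ea=1.2e5):
    """Advance one cell (zero-dimensional batch reactor) over the flow step.

    Hypothetical one-step chemistry dY/dt = -k(T) * Y with k = A exp(-Ea/(R T)),
    integrated with n_sub backward-Euler substeps, which stay stable for stiff k.
    """
    k = A * math.exp(-Ea / (R_UNIVERSAL * T))
    dt = dt_flow / n_sub
    for _ in range(n_sub):
        y_fuel = y_fuel / (1.0 + k * dt)  # implicit Euler update
    return y_fuel


def split_step(cells, dt_flow, transport_step):
    """One split time step: transport first, then chemistry cell by cell."""
    cells = transport_step(cells, dt_flow)  # flow equations without chemistry
    return [(integrate_batch_reactor(y, T, dt_flow), T) for y, T in cells]


def no_transport(cells, dt):
    """Placeholder transport step that leaves the cells unchanged."""
    return cells


# A cold cell (300 K) and a hot cell (2000 K), both with Y_fuel = 1, dt = 0.5 us.
cells = split_step([(1.0, 300.0), (1.0, 2000.0)], 0.5e-6, no_transport)
print(cells)
```

The cold cell is practically unchanged, while the hot cell reacts. The implicit substeps keep the update bounded even when $k\,\Delta t$ becomes large.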
Fig. 1 Heat release rate and concentration of OH* along the flame coordinate for a methane/air mixture at $\phi = 0.9$, a temperature of 300 K and a pressure of 1 bar
concentration of OH* $c_{\mathrm{OH^*}}$ along the flame axis for a fixed equivalence ratio $\phi = 0.9$. Clearly, $c_{\mathrm{OH^*}}$ starts to increase only after a considerable amount of heat has been released. However, $c_{\mathrm{OH^*}}$ rises more rapidly, so that the peak values of $\dot q$ and $c_{\mathrm{OH^*}}$ are located very close together, displaced by only approximately 10–20 μm. Thereafter, both quantities decline sharply to zero at a similar rate. Similar results have been reported in [10] from simulations of one-dimensional methane/air flames employing the GRI-3.0 mechanism [14], where the appearance of OH* is found to be very close to the heat release location at different equivalence ratios.
Although the evolution of $c_{\mathrm{OH^*}}$ follows the generation of $\dot q$ quite well, there is no unique correlation between the two quantities. This becomes more evident in Fig. 2, where local values of $c_{\mathrm{OH^*}}$ are plotted against those of $\dot q$, giving a closed envelope curve. The arrows in the figure show the reaction path from the unburnt to the burnt state. In general, two values of $c_{\mathrm{OH^*}}$ are assigned to a fixed value of $\dot q$, and vice versa. On the right-hand side of Fig. 2, $\dot q$ and $c_{\mathrm{OH^*}}$ are scaled by their peak values and plotted against each other. The envelope curves coincide for lean flames with $\phi < 1$, indicating a similar correlation of $\dot q$ and $c_{\mathrm{OH^*}}$ in this range. The correlation weakens for higher $\phi$ values. This can be seen from the increasing distance between the lower and upper branches of the envelope curve, for example by comparing the curves for $\phi = 1.1$ and $\phi = 1.2$ in Fig. 2 (right). Although $c_{\mathrm{OH^*}}$ and $\dot q$ are not directly proportional, the generation of OH* is strongly coupled with heat release, as shown in Figs. 1 and 2. A quasi-linear relationship can even be identified on the upper branch of the envelope curve, where $c_{\mathrm{OH^*}}$ and $\dot q$ decrease from their maximum values.
234 F. Zhang et al.
[Fig. 2 plots: OH* concentration versus heat release rate [W/m3] (left) and normalized OH* concentration versus normalized heat release rate [–] (right) for $\phi$ = 0.7, 0.8, 0.9, 1.1 and 1.2]
Fig. 2 Envelope curves of heat release rate and concentration of OH* obtained from one-dimensional flame calculations at different stoichiometries
[Fig. 3 plot: correlation coefficient Corr($\dot q$, $c_{\mathrm{OH^*}}$) versus equivalence ratio $\phi$ from 0.6 to 1.6]
In Fig. 3, the correlation coefficients $R$ are evaluated from data pairs of $c_{\mathrm{OH^*}}$ and $\dot q$ for different equivalence ratios. They are particularly high (> 0.9) for $\phi < 1$. This behavior can also be seen in the envelope curves in Fig. 2 (left), where the upper and lower trajectories lie closer to each other for lean flames. The correlation coefficient decreases in fuel-rich flames because intermediate species with higher hydrocarbons are formed, which substantially lowers the amount of heat released. Hence, a quasi-proportional relation between $c_{\mathrm{OH^*}}$ and $\dot q$ can generally be stated only for lean premixed flames. Because chemiluminescence measurements in experiments collect light only along one viewing direction, the integral or line-of-sight summed correlation of $c_{\mathrm{OH^*}}$ and $\dot q$ is studied in the following by three-dimensional DNS.
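The correlation coefficient $R$ in Fig. 3 is presumably the standard Pearson coefficient of the data pairs $(\dot q, c_{\mathrm{OH^*}})$. The text does not state the estimator, so this is an assumption. The Python sketch below computes it with NumPy. It also uses synthetic data to show why a closed envelope curve with two branches lowers $R$ below one: a phase-shifted loop gives $R = \cos\delta$ instead of 1.

```python
import numpy as np


def correlation_coefficient(q, c):
    """Pearson correlation coefficient of paired samples q and c."""
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=float)
    return float(np.corrcoef(q, c)[0, 1])


theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
q = 1.0 + np.cos(theta)

# Single-valued, linear relation: R = 1.
r_linear = correlation_coefficient(q, 3.0 * q + 0.5)
# Closed loop (two branches), c lags q by a phase delta = 0.5 rad: R = cos(0.5).
r_loop = correlation_coefficient(q, 1.0 + np.cos(theta - 0.5))
print(r_linear, r_loop)
```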
A synthetic flame front is considered that propagates freely in a cubic domain of size 5 × 5 × 5 mm³. Along the streamwise direction, fresh methane/air mixture enters the domain at the inlet with $\phi = 0.9$, $T = 300$ K and $p = 1$ bar. The product gas leaves the domain on the opposite side (outlet). The lateral faces are defined as symmetry planes to avoid a loss of mass. The turbulence is prescribed by an inflow generator, which provides a spatially and temporally correlated velocity field at the inlet boundary for each time step [15, 16]. The bulk flow velocity and the turbulence properties at the inlet are adjusted so that the flame front does not propagate out of the computational domain. This setup may be considered a small segment of a real flame with a continuous counterflow of fresh gas towards the flame front, as shown in Fig. 4.
Two turbulent Reynolds numbers, $\mathrm{Re}_t = 15$ and $\mathrm{Re}_t = 69$, are considered for the inflow condition. They are based on the integral length scale in the streamwise direction ($l_x = 0.5$ and 2.5 mm) and the turbulence intensity ($u' = 0.5$ and 0.75 m/s). The length scales in the lateral directions $l_r$ are set to 1 mm. These turbulence parameters are used as input for the inflow generator. A partially non-reflecting boundary condition (NRBC) proposed by Poinsot and Lele [17] is applied at the inlet and outlet boundaries to avoid spurious reflections of pressure waves. The computational domain is discretized into 16 million finite volume cells with an equidistant resolution of 20 μm in each direction. This resolution is smaller than the Kolmogorov microscale estimated by $\eta = l_{x,r}/\mathrm{Re}_t^{3/4}$ [4] and resolves the planar unstrained reaction zone with approx. 20 cells. A uniform flow field and the chemical scalars obtained from the corresponding one-dimensional laminar flame calculation are used to initialize the simulation. The DNS have been run for 40 ms with a time step of 0.5 μs, corresponding to a maximum CFL number of approx. 0.1.
[Fig. 4 schematic: 5 mm cube with fresh mixture entering at the inlet, the flame front inside the domain and symmetry planes at the lateral faces]
Fig. 4 Schematic illustration of the computational domain and boundary conditions used for DNS of a synthetically turbulent flame front
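The resolution statements above can be checked with a few lines of arithmetic. This Python sketch makes two assumptions:

- the classical estimate $\eta = l/\mathrm{Re}_t^{3/4}$, applied for each $\mathrm{Re}_t$ with both its streamwise scale $l_x$ and the lateral scale $l_r = 1$ mm;
- the convective CFL number $u\,\Delta t/\Delta x$.

The velocity implied by CFL ≈ 0.1 is derived here and is not stated in the text.

```python
DX = 20e-6          # grid spacing [m]
DT = 0.5e-6         # time step [s]
T_END = 40e-3       # simulated time [s]
L_R = 1.0e-3        # lateral integral length scale [m]
CASES = {15: 0.5e-3, 69: 2.5e-3}  # Re_t -> streamwise integral length l_x [m]


def kolmogorov_scale(l_integral, re_t):
    """Classical estimate eta = l / Re_t**(3/4)."""
    return l_integral / re_t**0.75


etas = {(re_t, l): kolmogorov_scale(l, re_t)
        for re_t, l_x in CASES.items() for l in (l_x, L_R)}
for (re_t, l), eta in etas.items():
    print(f"Re_t={re_t:3d}, l={l*1e3:.1f} mm: eta={eta*1e6:6.1f} um, eta/dx={eta/DX:.2f}")

cells_per_direction = round(5e-3 / DX)   # 250 cells across the 5 mm domain
n_cells = cells_per_direction**3         # 15,625,000, i.e. about 16 million
u_max = 0.1 * DX / DT                    # velocity giving CFL = 0.1, about 4 m/s
n_steps = round(T_END / DT)              # 80,000 time steps
print(n_cells, u_max, n_steps)
```

The smallest estimate (about 42 μm, for $\mathrm{Re}_t = 69$ with $l_r$) still exceeds the 20 μm cell size.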
3.2.2 Performance
In previous work [12, 13, 18], the DNS solver implemented in OpenFOAM has shown excellent parallel scalability on different supercomputers. Examples are the Cray XE6 (HERMIT) cluster operated by the High Performance Computing Center Stuttgart (HLRS) and the JUQUEEN cluster with IBM Blue Gene/Q architecture at the Jülich Supercomputing Centre (JSC) [19]. Figure 5 shows a scalability analysis of the DNS solver performed on the later-installed Cray XC40 machine (HORNET) at HLRS. The test case is a three-dimensional hydrogen/air flame at laboratory scale on a computational grid of 144 million cells [12]. Very good parallel performance is confirmed for this case with up to 14,400 processor cores. Even super-linear behavior can be observed, indicating that the code is able to exploit the full capacity of the HPC machine. The DNS solver therefore speeds up efficiently when running in parallel on a large number of processor cores. The DNS of the present work have been conducted on the Cray XC40 (HAZEL HEN) system [20]. For each of the cases $\mathrm{Re}_t = 15$ and $\mathrm{Re}_t = 69$, the DNS have been run on 3,600 processor cores for 3 days of computing time, consuming approx. 520,000 core hours in total.
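The metrics in Fig. 5 and the quoted computing budget follow from simple relations. The sketch below assumes the usual definitions relative to the 1800-core reference run: incremental speed-up $S = T_{1800}/T_n$ and efficiency $E = S/(n/1800)$. The timing values in the example are hypothetical, not the measured data.

```python
def incremental_speedup(t_ref: float, t_n: float) -> float:
    """Speed-up of a run with wall time t_n relative to the reference run."""
    return t_ref / t_n


def parallel_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Speed-up divided by the ideal speed-up n / n_ref (1.0 = ideal, >1 = super-linear)."""
    return incremental_speedup(t_ref, t_n) / (n / n_ref)


# Hypothetical wall times per time step [s] for 1800 and 14,400 cores.
e_ideal = parallel_efficiency(100.0, 1800, 12.5, 14400)   # exactly 8x faster
e_super = parallel_efficiency(100.0, 1800, 12.0, 14400)   # faster than ideal

# Computing budget of the present DNS: 2 cases x 3600 cores x 3 days x 24 h.
core_hours = 2 * 3600 * 3 * 24
print(e_ideal, e_super, core_hours)  # core_hours = 518400, i.e. approx. 520,000
```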
[Fig. 5 plots: measured incremental speed-up [−] with ideal speed-up (left) and measured efficiency [−] with ideal efficiency (right)]
Fig. 5 Incremental speed-up (left) and efficiency (right) obtained from running the OpenFOAM-based DNS code on the HPC platform Cray XC40 (HORNET) from HLRS [20] (normalized to 1800 processor cores)
3.2.3 Results
Figure 6 (left and middle) shows instantaneous contours of the heat release rate $\dot q$ and the OH* concentration $c_{\mathrm{OH^*}}$ in a slice through the centerline axis of the domain for $\mathrm{Re}_t = 15$ (top) and $\mathrm{Re}_t = 69$ (bottom). The flame front is corrugated due to the non-uniform inflow condition. It is more wrinkled for the higher $\mathrm{Re}_t$ because of the more intense turbulent fluctuations. As in the one-dimensional simulations in Sect. 3.1, $\dot q$ and $c_{\mathrm{OH^*}}$ are found to correlate strongly for both $\mathrm{Re}_t$ values, as the very similar contours of $\dot q$ and $c_{\mathrm{OH^*}}$ show. Figure 6 (right) depicts the joint probability density function (PDF) of $\dot q$ and $c_{\mathrm{OH^*}}$ computed from data pairs extracted from the entire flame volume ($\dot q > 0$). The solid lines indicate the evolution $c_{\mathrm{OH^*}}(\dot q)$ obtained from the corresponding one-dimensional simulation shown in Fig. 2, which remains representative of the correlation of $\dot q$ and $c_{\mathrm{OH^*}}$ under turbulent conditions. However, the turbulent flames show scatter around the envelope curve of the laminar flame. This is mainly attributed to the fact that the turbulent flow locally alters the intrinsic flame structure (i.e., the profiles of the chemical scalars along the flame normal coordinate) via stretching. Moreover, the flame needs a relaxation time to respond to the unsteady flow, which leads to a time-history effect. The scatter is broader for the higher turbulence level with $\mathrm{Re}_t = 69$. In this case the turbulence intensity and the turbulent time scale are larger, leading to increased stretching and a longer flame response time.
Fig. 6 Instantaneous contours of heat release rate (left) and OH* concentration (middle) as well as joint PDF of these parameters (right) for two different turbulent Reynolds numbers: $\mathrm{Re}_t = 15$ at the top and $\mathrm{Re}_t = 69$ at the bottom
Fig. 7 Iso-surface of temperature and decomposition of the domain into finite rays along one line-of-sight direction
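A joint PDF such as the one in Fig. 6 (right) can be estimated by a normalized two-dimensional histogram of the data pairs within the flame volume ($\dot q > 0$). The following NumPy sketch uses `numpy.histogram2d` with `density=True`. The bin count and the synthetic input data are assumptions, and the authors' binning or smoothing is not specified.

```python
import numpy as np


def joint_pdf(q, c, bins=50):
    """Joint PDF of (q, c) over all cells with positive heat release."""
    q = np.ravel(q)
    c = np.ravel(c)
    mask = q > 0.0  # restrict to the flame volume
    pdf, q_edges, c_edges = np.histogram2d(q[mask], c[mask], bins=bins, density=True)
    return pdf, q_edges, c_edges


rng = np.random.default_rng(0)
q = rng.normal(1.0e9, 5.0e8, size=(32, 32, 32))     # synthetic heat release [W/m^3]
c = 1.0e-16 * q + rng.normal(0.0, 1.0e-8, q.shape)  # correlated synthetic OH*
pdf, q_edges, c_edges = joint_pdf(q, c)
cell_area = np.outer(np.diff(q_edges), np.diff(c_edges))
print((pdf * cell_area).sum())  # the PDF integrates to 1
```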
Figure 7 presents the three-dimensional flame front defined by the $T = 1500$ K isotherm for the case with $\mathrm{Re}_t = 69$. To analyze the line-of-sight correlation of $\dot q$ and $c_{\mathrm{OH^*}}$, the computational domain is decomposed into a number of rays defined by a fixed viewing direction and a cross-section area $\Delta A$, as illustrated in Fig. 7. The heat release rate and the concentration are then integrated along these rays, yielding their area-specific integral values:

$$\tilde\psi = \int \psi\,\mathrm{d}s = \frac{1}{\Delta A}\sum_i \psi_i V_i, \qquad \psi = \dot q,\ c_{\mathrm{OH^*}} \qquad (10)$$

In Eq. (10), the discrete values of $\dot q$ and $c_{\mathrm{OH^*}}$ of each cell volume $V_i$ located within a single ray are summed up. For the viewing direction and the instantaneous flame front shown in Fig. 7, Fig. 8 shows the line-of-sight summed values $\tilde{\dot q}$ and $\tilde c_{\mathrm{OH^*}}$ calculated from Eq. (10) for two different averaging areas. Comparing the contours of $\tilde{\dot q}$ and $\tilde c_{\mathrm{OH^*}}$ in Fig. 8 reveals a strong correlation of these integral values. As expected, thinner rays (i.e. a smaller $\Delta A$) give a sharper image. The wrinkling of the flame front caused by flame-turbulence interaction leads to larger integral values of $\dot q$ and $c_{\mathrm{OH^*}}$, because a ray may cross the flame surface more than once. As illustrated at the top right of Fig. 9, a single ray crosses the turbulent flame surface three times, producing a triple reaction zone along this ray.
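On a uniform Cartesian grid, Eq. (10) reduces to summing the cell values along the viewing direction and over the $m \times m$ cells that make up one ray cross-section $\Delta A = (m\,\Delta x)^2$. This NumPy sketch assumes such a grid, a viewing direction aligned with a grid axis, and lateral cell counts divisible by $m$. The function and parameter names are hypothetical.

```python
import numpy as np


def line_of_sight_sum(field, dx, m, axis=0):
    """Area-specific ray integrals of Eq. (10): sum_i psi_i V_i / dA.

    field -- cell values psi_i on a uniform grid with spacing dx
    m     -- ray width in cells, so dA = (m*dx)**2
    axis  -- grid axis aligned with the viewing direction
    """
    f = np.moveaxis(np.asarray(field, dtype=float), axis, 0)
    _, n_a, n_b = f.shape
    if n_a % m or n_b % m:
        raise ValueError("lateral cell counts must be divisible by m")
    column = f.sum(axis=0)  # sum of psi_i along each line of cells
    rays = column.reshape(n_a // m, m, n_b // m, m).sum(axis=(1, 3))
    cell_volume = dx**3
    ray_area = (m * dx) ** 2
    return rays * cell_volume / ray_area


# Constant field psi = 1 in a 10 x 4 x 4 cell box with dx = 0.1:
# every ray integral equals the path length 10 * 0.1 = 1.0.
print(line_of_sight_sum(np.ones((10, 4, 4)), dx=0.1, m=2))
```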
Fig. 8 Line-of-sight summed heat release rate (left) and OH* concentration (right) for two different specific areas $\Delta A = 0.1 \times 0.1$ mm² (top) and $\Delta A = 0.2 \times 0.2$ mm² (bottom)
Fig. 9 Profiles of heat release rate and OH* concentration along one single ray passing through
the flame surface
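[Fig. 10 plots: line-of-sight summed OH* concentration versus line-of-sight summed heat release rate $\tilde{\dot q}$ [W/m²] (×10⁶) for $\Delta A = 0.1 \times 0.1$ mm² (left) and $\Delta A = 0.2 \times 0.2$ mm² (right), with ray data and total-volume values ("Total") for $\mathrm{Re}_t = 15$ and 69. Fitted curves for $\Delta A = 0.1 \times 0.1$ mm²: $\tilde c = 7.181\times10^{-20}\,\tilde{\dot q}^{1.080}$ ($\mathrm{Re}_t = 69$) and $\tilde c = 1.322\times10^{-19}\,\tilde{\dot q}^{1.039}$ ($\mathrm{Re}_t = 15$). Fitted curves for $\Delta A = 0.2 \times 0.2$ mm²: $\tilde c = 8.390\times10^{-20}\,\tilde{\dot q}^{1.070}$ ($\mathrm{Re}_t = 69$) and $\tilde c = 1.357\times10^{-19}\,\tilde{\dot q}^{1.037}$ ($\mathrm{Re}_t = 15$)]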
Fig. 10 Correlation of integral heat release rate and OH* concentration for different cross-section
areas of rays and turbulent Reynolds numbers
Figure 10 plots the line-of-sight summed $\dot q$ and $c_{\mathrm{OH^*}}$ for the different averaging areas against each other. Summing along the viewing direction reduces the data from three to two dimensions, so the integrated values show a higher correlation than the local values, as the correlation coefficients in Fig. 10 demonstrate. Consequently, fitting the ray value pairs with a power function of the form $\tilde c_{\mathrm{OH^*}} = a\,\tilde{\dot q}^{\,b}$ yields exponents $b$ very close to unity (the fitting coefficients are given in the legend), indicating a quasi-linear relationship between $\tilde{\dot q}$ and $\tilde c_{\mathrm{OH^*}}$. The "*" symbols in Fig. 10 depict the total volume integrals of $\dot q$ and $c_{\mathrm{OH^*}}$, obtained by considering a single ray spanning the whole domain; they also lie close to the fitted curves. The quasi-linear correlation is even stronger for lower turbulence levels and larger cross-section areas $\Delta A$, where both the fitted exponent and the correlation coefficient are closer to unity. Although not shown here, similar results have been found for other viewing angles. Therefore, a proportional correlation between the line-of-sight summed concentration of chemiluminescent species and heat release is a reasonable assumption for turbulent combustion. This result also applies to other lean equivalence ratios, as long as a strong local correlation exists for them (see Fig. 3).
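The power-law fit $\tilde c_{\mathrm{OH^*}} = a\,\tilde{\dot q}^{\,b}$ can be obtained, for example, by linear least squares in log-log space, as in the sketch below. The text does not say which fitting procedure was used. A log-log fit weights relative deviations, so a nonlinear least-squares fit in linear space may give slightly different coefficients. The test data reuse the legend values only as a synthetic example.

```python
import numpy as np


def fit_power_law(q, c):
    """Fit c = a * q**b by linear least squares on log(c) = log(a) + b log(q)."""
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=float)
    mask = (q > 0) & (c > 0)  # the logarithm requires positive values
    b, log_a = np.polyfit(np.log(q[mask]), np.log(c[mask]), deg=1)
    return float(np.exp(log_a)), float(b)


q_sum = np.linspace(1.0e5, 5.0e6, 50)   # synthetic ray values [W/m^2]
c_sum = 1.322e-19 * q_sum**1.039        # exact power law, no noise
a, b = fit_power_law(q_sum, c_sum)
print(a, b)
```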
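be formulated by summing up the integral intensities of each ray or pixel of the chemiluminescence image:

$$\dot Q_t = \Delta A \sum_{k=1}^{N} \tilde{\dot q}_k = \Delta A\, F \sum_{k=1}^{N} \tilde I_k \qquad (11)$$

where $N$ is the total number of pixels. The thermal load $\dot Q_{th}$ represents the time-mean value of the overall heat release rate, which is known from the operating conditions or from the set mass flow of the fuel stream. For a time series of chemiluminescence snapshots, $\dot Q_{th}$ can be expressed as

$$\dot Q_{th} = \overline{\dot Q_t} = F\,\Delta A \sum_{k=1}^{N} \overline{\tilde I_k} \qquad (12)$$

so that the proportionality factor follows as

$$F = \frac{\dot Q_{th}}{\Delta A \sum_{i=1}^{N} \overline{\tilde I_i}} \qquad (13)$$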
In Eqs. (11), (12), and (13), the tilde and the overbar denote line-of-sight summed and time-averaged values, respectively. $\Delta A$ is the pixel size of the camera in the experiment, which represents the cross-section area of the rays discussed in Sect. 3.2.3. The line-of-sight summed light intensity $\tilde I$ is measured directly in the experiment. In this way, Eq. (11) predicts that the heat release rate is proportional to the chemiluminescence emission, with a constant factor given by the ratio of their overall time-mean values.
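Equations (11)–(13) amount to calibrating the camera signal with the known thermal load. The minimal NumPy sketch below assumes a stack of line-of-sight summed intensity images with shape (snapshots, pixels) and uses hypothetical numbers for $\dot Q_{th}$ and $\Delta A$. It computes $F$ from Eq. (13) and the pixel-wise heat release $\tilde{\dot q}_k = F\,\tilde I_k$ implied by Eq. (11).

```python
import numpy as np


def calibration_factor(q_thermal, intensities, pixel_area):
    """Eq. (13): F = Q_th / (dA * sum_i mean_t(I_i)).

    intensities -- array of shape (n_snapshots, n_pixels) with the
                   line-of-sight summed intensities I_i of each snapshot
    """
    mean_intensity = np.asarray(intensities, dtype=float).mean(axis=0)
    return q_thermal / (pixel_area * mean_intensity.sum())


rng = np.random.default_rng(1)
images = rng.uniform(0.0, 1.0, size=(200, 64 * 64))  # hypothetical camera data
Q_TH = 10.0e3          # hypothetical thermal load [W]
PIXEL_AREA = 1.0e-8    # hypothetical pixel area dA [m^2] (0.1 mm x 0.1 mm)

F = calibration_factor(Q_TH, images, PIXEL_AREA)
q_pixels = F * images                       # Eq. (11): q_k = F * I_k per pixel
Q_t = PIXEL_AREA * q_pixels.sum(axis=1)     # instantaneous total heat release
print(Q_t.mean())  # reproduces Q_TH by construction, cf. Eq. (12)
```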
4 Conclusions
Acknowledgements The authors wish to acknowledge the financial support by the German
Research Council (DFG) through the Research Unit DFG-BO693/27 “Combustion Noise”. This
study has used computing resources from the High Performance Computing Center Stuttgart
(HLRS) at the University of Stuttgart, Germany. The authors gratefully acknowledge assistance
from these communities.
1. Weyermann, F., Hirsch, C., Sattelmayer, T.: Influence of boundary conditions on the noise
emission of turbulent premixed swirl flames. In: Schwarz, A., Janicka, J. (eds.) Combustion
Noise, pp. 151–178. Springer, Berlin/Heidelberg (2009)
2. Copeland, C., Friedman, J., Renksizbulut, M.: Planar temperature imaging using thermally
assisted laser induced fluorescence of OH in a methane-air flame. Exp. Therm. Fluid Sci. 31,
221–236 (2007)
3. Lauer, M.R.W.: Determination of the heat release distribution in turbulent flames by chemilu-
minescence imaging. Ph.D. thesis, Technical University Munich (2011)
4. Poinsot, T., Veynante, D.: Theoretical and Numerical Combustion, 2nd edn. Edwards Inc.,
Philadelphia (2005)
5. Kee, R.J., Coltrin, M.E., Glarborg, P.: Chemically Reacting Flow: Theory and Practice. John
Wiley & Sons Inc., Hoboken (2003)
6. OpenCFD Ltd.: OpenFOAM User Guide, Version 2.3.0 (2014)
7. Komen, E., Shams, A., Camilo, L., Koren, B.: Quasi-DNS capabilities of OpenFOAM for
different mesh types. Comput. Fluids 96, 87–104 (2014)
8. Goodwin, D.G.: Cantera C++ User’s Guide. California Institute of Technology, California
9. Kee, R.J., Grcar, J.F., Smooke, M.D., Miller, J.A.: A Fortran Program for Modeling Steady
Laminar One-Dimensional Premixed Flames. Report No. SAND85–8240. Sandia National
Laboratories, Albuquerque (1985)
10. Kathrotia, T., Riedel, U., Seipel, A., Moshammer, K., Brockhinke, A.: Experimental and
numerical study of chemiluminescent species in low-pressure flames. Appl. Phys. B 107, 571–
584 (2012)
11. Ferziger, J., Perić, M.: Computational Methods for Fluid Dynamics. Springer, Berlin/New York
12. Zhang, F., Bonart, H., Zirwes, T., Habisreuther, P., Bockhorn, H., Zarzalis, N.: Direct
numerical simulation of chemically reacting flows with the public domain code OpenFOAM.
In: Nagel, W.E., Kröner, D., Resch, M. (eds.) High Performance Computing in Science and
Engineering’14, pp. 221–236. Springer, Berlin/Heidelberg (2015)
13. Zirwes, T.: Weiterentwicklung und Optimierung eines auf OpenFOAM basierten DNS Lösers
zur Verbesserung der Effizienz und Handhabung [Further development and optimization of an
OpenFOAM-based DNS solver to improve efficiency and usability]. Bachelor’s thesis, Karlsruhe
Institute of Technology, Karlsruhe (2013). http://digbib.ubka.uni-karlsruhe.de/volltexte/1000037538
14. Smith, G.P., Golden, D.M., Frenklach, M., Moriarty, N.W., Eiteneer, B., Goldenberg, M.,
Bowman, C.T., Hanson, R.K., Song, S., Gardiner, W.C., Lissianski, V.V., Qin, Z.: GRI-Mech 3.0 (1999). http://
15. Klein, M., Sadiki, A., Janicka, J.: A digital filter based generation of inflow data for spatially
developing direct numerical or large eddy simulations. J. Comput. Phys. 186, 652–665 (2003)
16. Zhang, F., Habisreuther, P., Bockhorn, H.: Application of the unified turbulent flame-speed
closure (UTFC) combustion model to numerical computation of turbulent gas flames. In:
Nagel, W.E., Kröner, D., Resch, M. (eds.) High Performance Computing in Science and
Engineering’12, pp. 187–206. Springer, Berlin/Heidelberg (2012)
17. Poinsot, T., Lele, S.: Boundary conditions for direct simulation of compressible viscous flows.
J. Comput. Phys. 101, 104–129 (1992)
18. Zhang, F., Bonart, H., Habisreuther, P., Bockhorn, H.: Impact of grid refinement on turbulent
combustion and combustion noise modeling with large eddy simulation. In: Nagel, W.E.,
Kröner, D., Resch, M. (eds.) High Performance Computing in Science and Engineering’13,
pp. 259–274. Springer, Berlin/Heidelberg (2013)
19. IBM: IBM Blue Gene/Q – JUQUEEN. http://www.fz-juelich.de/ias/jsc/EN/Expertise/
20. Cray Inc.: Cray XC40 – HAZEL HEN. http://www.hlrs.de/systems/cray-xc40-hazel-hen/
Direct Numerical Simulation of Non-premixed
Syngas Combustion Using OpenFOAM
Abstract A direct numerical simulation (DNS) solver for turbulent reacting flows
is developed using libraries and functions from the open-source computational fluid
dynamics package OpenFOAM. The solver serves as a reference for developing
sub-grid scale models for the large eddy simulation (LES) of turbulent flames.
DNS typically requires spatial and temporal discretisation schemes of high order,
which are not readily available in OpenFOAM. We validate our OpenFOAM solver
by performing direct numerical simulations of a well-defined DNS case featuring
non-premixed syngas combustion in a double shear layer. This configuration has
previously been studied by Hawkes et al. (Proc Combust Inst 31:1633–1640, 2007)
using a purpose-built, high-order DNS solver. Despite the lower-order discretisation
schemes of OpenFOAM, the simulation results agree very well with the reference DNS
data. Local extinction and re-ignition of the syngas flame are captured, and effects
of differential diffusion are highlighted. Parallel scaling results on the Hazel Hen
architecture of HLRS Stuttgart are reported.
1 Introduction
2 Governing Equations
The governing equations for DNS of incompressible turbulent reacting flow are

$$\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u_j)}{\partial x_j} = 0, \qquad (1)$$
The investigated set-up is a temporally evolving double shear layer burning syngas between two counterflowing streams of hot oxidizer, as shown in Fig. 1. This configuration is identical to case L described in [10], with a jet Reynolds number of 2510. Fuel and oxidizer move in opposite directions across the domain with a characteristic velocity $U = U_{\mathrm{fuel}} - U_{\mathrm{oxidizer}} = 145$ m/s. To trigger the onset of turbulence from the initially laminar conditions, velocity perturbations with an amplitude of $0.05U$ and an integral length scale of $H/3$ are superimposed within the fuel stream, where $H = 0.72$ mm is the width of the jet at $t = 0$. The dimensions of the computational domain are $L_x \times L_y \times L_z = 8.64 \times 10.065 \times 5.76$ mm³. The flame is initialized by setting a laminar mixture fraction profile and retrieving the initial species distributions from a pre-computed flamelet table. A reduced, non-stiff
248 S. Vo et al.
Fig. 1 Computational domain, illustrated by the instantaneous mixture fraction field at $t/t_j = 20$
extinction ($t/t_j = 20$, where $t_j = H/U$), the Kolmogorov scale was resolved by a minimum of 1.2 cells. The flame structure was resolved by at least 10 grid points, based on the half-width of the OH reaction rate profile of a steady diffusion flame at half the extinction strain rate. It was also reported that cases run at half the resolution gave first and second moments of the solution variables in good agreement with the full-resolution case [10]. For our OpenFOAM simulations we use the identical uniform grid of 576 × 672 × 384 ≈ 150M cells, which allows a direct comparison of S3D and OpenFOAM on the same grid. In addition, we run OpenFOAM simulations at half the original resolution in every coordinate direction, resulting in 288 × 336 × 192 ≈ 18M cells.
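The grid figures and the jet time can be checked quickly. This Python snippet only reproduces arithmetic from the quoted values (domain size, cell counts, $H$ and $U$). It also shows that the reference grid is uniform with a spacing of about 15 μm.

```python
H = 0.72e-3                        # initial jet width [m]
U = 145.0                          # characteristic velocity [m/s]
L = (8.64e-3, 10.065e-3, 5.76e-3)  # domain size Lx, Ly, Lz [m]
FINE = (576, 672, 384)
COARSE = tuple(n // 2 for n in FINE)  # half resolution in every direction

t_j = H / U                                    # jet time, approx. 4.97 us
n_fine = FINE[0] * FINE[1] * FINE[2]           # 148,635,648 (about 150M)
n_coarse = COARSE[0] * COARSE[1] * COARSE[2]   # 18,579,456 (about 18M)
spacing = [l / n for l, n in zip(L, FINE)]     # approx. 15 um in x, y, z
print(t_j, n_fine, n_coarse, spacing)
```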
For our solver evaluation we compare the results from a set of four different DNS
calculations. The datasets “ITV-OF-Le1 150M” (black lines, see Fig. 2) and “ITV-
OF-Le1 18M” (red) are OpenFOAM calculations assuming unity Lewis number and
using the 150M and 18M grid, respectively. The dataset “ITV-OF-DD 18M” (green)
also uses OpenFOAM, but accounts for differential diffusion and is calculated on
the 18M grid. The label “SAN-S3D-DD 150M” (blue) refers to the reference DNS
from [10] using S3D, 150M and including realistic thermodynamic properties. In
the following we evaluate the DNS resolution requirements first, followed by a
discussion of the major flame characteristics and the level to which they are captured
by the different DNS calculations.
The resolution requirements for our DNS are evaluated by comparing statistics of the scalar dissipation rate $\chi$. The scalar dissipation rate is proportional to the square of the mixture fraction gradient and is therefore a sensitive indicator of grid resolution effects. In addition, $\chi$ plays an important role in local extinction and re-ignition of turbulent non-premixed flames. Figure 2 shows cross-stream profiles of the mean scalar dissipation rate at normalized jet times $10 \le t/t_j \le 40$. In this temporally evolving double shear layer configuration, statistics are calculated by averaging across the homogeneous x-z plane to obtain mean and RMS values
[Fig. 2 plots: normalized mean scalar dissipation rate versus $y/H$ in panels (a)–(d)]
Fig. 2 Cross-stream profiles of the normalized mean scalar dissipation rate at (a) $t/t_j = 10$, (b) $t/t_j = 20$, (c) $t/t_j = 30$, (d) $t/t_j = 40$. The dissipation rate is normalized by its value at extinction of the corresponding laminar flamelet ($\chi_q = 2194$ 1/s)
at each fixed location y/H. It can be seen that all OpenFOAM calculations are
in reasonable agreement with the S3D reference data, with only minor deviations
becoming apparent at the late times t/t_j = 30 and 40, where the 18M calculation
assuming unity Lewis number shows the strongest (yet acceptable) discrepancies.
A similar trend can be observed for the scalar dissipation rate RMS shown in Fig. 3.
Here, small deviations from the reference dissipation RMS can already be observed
at t/t_j = 10 (when turbulence develops). They increase, within acceptable bounds,
until the latest time, t/t_j = 40. Even at this stage, after all four simulations have
evolved independently from each other for 40 jet times, the scalar dissipation rate
RMS profiles are in close agreement with each other, with the most pronounced
deviations again for the 18M unity Lewis number run. Note that only axisymmetric
cross-stream profiles from S3D are available, whereas the full y/H coordinate
is plotted from the OpenFOAM simulations, explaining the perfect symmetry of
the S3D results. Figure 4 shows PDFs of the scalar dissipation rate conditional
on mixture fraction being near stoichiometric at t/t_j = 20, when the resolution
requirements are most critical for capturing local extinction.
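The plane-averaged statistics described above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' post-processing: it assumes the field is held as a nested list indexed [i_x][j_y][k_z] with x and z homogeneous, and the function names are hypothetical. The normalization uses the laminar-flamelet extinction value χ_q = 2194 1/s quoted in the figure captions.

```python
import math

def plane_statistics(field):
    """Mean and RMS of field[i][j][k] over the homogeneous x (i) and z (k)
    directions, returned as one value per cross-stream index j (y)."""
    nx, ny, nz = len(field), len(field[0]), len(field[0][0])
    means, rms = [], []
    for j in range(ny):
        samples = [field[i][j][k] for i in range(nx) for k in range(nz)]
        mean = sum(samples) / len(samples)
        # RMS of the fluctuation about the plane mean (i.e. standard deviation)
        var = sum((s - mean) ** 2 for s in samples) / len(samples)
        means.append(mean)
        rms.append(math.sqrt(var))
    return means, rms

def normalize(values, chi_q=2194.0):
    """Normalize dissipation rates [1/s] by the laminar-flamelet extinction value."""
    return [v / chi_q for v in values]

# Tiny synthetic field (2 x 3 x 2 points), chi in 1/s
field = [
    [[100.0, 300.0], [2194.0, 2194.0], [0.0, 0.0]],
    [[100.0, 300.0], [2194.0, 2194.0], [0.0, 0.0]],
]
means, rms = plane_statistics(field)
print(normalize(means), rms)
```

In a real DNS the same reduction would be applied to each output time, giving one mean and one RMS profile per t/t_j.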
It can be seen that the scalar dissipation rate PDFs agree very well for a wide
range of χ, with the high-order S3D simulation capturing extreme dissipation rate
[Fig. 3, panels (a)–(d): normalized dissipation RMS versus y/H]
Fig. 3 Cross-stream profiles of the normalized scalar dissipation rate RMS at (a) t/t_j = 10, (b)
t/t_j = 20, (c) t/t_j = 30, (d) t/t_j = 40. The dissipation rate is normalized by its value at extinction
of the corresponding laminar flamelet (χ_q = 2194 1/s)
[Fig. 4: PDF versus χ [1/s], 0 to 60,000 1/s; curves ITV-OF-Le1 150M, ITV-OF-Le1 18M, ITV-OF-DD 18M]
Fig. 4 PDF of the scalar dissipation rate conditional on mixture fraction being in the interval
f_st ± 0.2 (main reaction zone) at t/t_j = 20
events of the order of 70,000 1/s, followed by the 150M OpenFOAM simulation
with a peak at 60,000 1/s, and the two 18M OpenFOAM runs recovering slightly
smaller scalar dissipation rate peaks. A closer inspection shows that the discrepancies
for extreme scalar dissipation events affect considerably less than 1 %
of the total number of dissipation rate samples. Overall, despite the significantly
lower order of spatial and temporal discretisation available in OpenFOAM, scalar
dissipation rate profiles are well resolved, and even simulations using half the
reference resolution should provide adequate flame predictions.
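The conditional PDF of Fig. 4 and the check on how many samples are extreme events can be illustrated as below. This is a hypothetical sketch: the conditioning window f_st ± 0.2 follows the Fig. 4 caption, while the bin width, the value of f_st in the usage lines and the sample data are made up for illustration.

```python
def conditional_pdf(f_samples, chi_samples, f_st, half_width, bin_width, chi_max):
    """Histogram PDF of chi over samples with |f - f_st| <= half_width.

    Samples above chi_max are lumped into the last bin.
    """
    selected = [chi for f, chi in zip(f_samples, chi_samples)
                if abs(f - f_st) <= half_width]
    if not selected:
        raise ValueError("no samples inside the conditioning window")
    n_bins = int(round(chi_max / bin_width))
    counts = [0] * n_bins
    for chi in selected:
        counts[min(int(chi // bin_width), n_bins - 1)] += 1
    # Normalize so that sum(pdf) * bin_width == 1
    pdf = [c / (len(selected) * bin_width) for c in counts]
    return pdf, selected

def tail_fraction(selected, threshold):
    """Fraction of conditioned samples with chi above threshold [1/s]."""
    return sum(1 for chi in selected if chi > threshold) / len(selected)

# Hypothetical samples: mixture fraction f and chi [1/s]
f = [0.1, 0.4, 0.5, 0.9]
chi = [50.0, 1000.0, 61000.0, 10.0]
pdf, selected = conditional_pdf(f, chi, f_st=0.45, half_width=0.2,
                                bin_width=15000.0, chi_max=60000.0)
frac = tail_fraction(selected, 60000.0)
```

Applied to the full DNS sample set, tail_fraction is the kind of quantity behind the statement that the extreme-event discrepancies affect well under 1 % of the samples.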
[Figure: temperature [K]; curves ITV-OF-Le1 150M, ITV-OF-Le1 18M]
[Fig. 6: mixture fraction (a) mean and (b) RMS at t/t_j = 20, (c) mean and (d) RMS at t/t_j = 40, versus y/H; curves ITV-OF-Le1 150M, ITV-OF-Le1 18M, ITV-OF-DD 18M]
Fig. 6 Cross-stream profiles of the (a), (c) mean and (b), (d) RMS mixture fraction at t/t_j = 20
and t/t_j = 40
[Fig. 7, panels (a) and (b): CO mass fraction mean and RMS versus y/H]
Fig. 7 Cross-stream profiles of the CO mass fraction at t/t_j = 20, (a) mean, (b) RMS
at t/t_j = 40, the mean profile of the coarse unity Lewis number simulation
shows a slightly decreased peak value and a mild over-prediction of the mixture
fraction RMS near the centre of the domain, while the RMS deviations from
the reference data near y/H = 0 decrease when more cells are used or differential
diffusion is accounted for in OpenFOAM. Figure 7 shows cross-stream profiles of the
[Fig. 8, panels (a) and (b): H2 mass fraction mean and RMS versus y/H]
Fig. 8 Cross-stream profiles of the H2 mass fraction at t/t_j = 20, (a) mean, (b) RMS
[Fig. 9, panels (a) and (b): HO2 mass fraction mean and RMS versus y/H]
Fig. 9 Cross-stream profiles of the HO2 mass fraction at t/t_j = 20, (a) mean, (b) RMS
considering differential diffusion improves the results, albeit not uniformly across
mixture fraction space, but mainly on the lean side of stoichiometric. This is likely
because including differential diffusion allows H2 to diffuse faster from the centre of
the domain towards the top and bottom boundaries, i.e. into the oxidizer streams. At
the late time t/t_j = 40 this effect is less pronounced, as the scalar fields are generally
more homogeneous and differential diffusion plays a less dominant role. Hence,
all OpenFOAM simulations yield similar predictions, mildly under-predicting the
conditional mean temperature of the reference DNS.
5 Parallel Performance
Strong and weak scaling tests are performed in order to assess the parallel perfor-
mance of the unity Lewis number OpenFOAM solver on the HazelHen architecture
of HLRS. For the strong scaling analysis the total number of CFD cells is kept
constant at 150M and the number of requested computer cores is increased by
constant factors of two from 128 up to 2048. Figure 11a plots the strong scaling
efficiency (based on the 128 core run) versus the number of computer cores. It can
be observed that the strong scaling efficiency drops significantly when moving from
128 to 256 and 512 cores, but it remains at a constant level of approximately 50 %
when the number of cores is further increased to 1024 or 2048. Weak scaling was
assessed by keeping a constant number of CFD cells per core (about 86,000) and performing
DNS with 64, 216 and 1728 computer cores, which resulted in total problem sizes
of 5M, 18M and 150M CFD cells, respectively. A similar analysis was carried out
with increased numbers of CFD cells per core (about 172,000 and 344,000), but the results
did not change significantly. Figure 11b presents the results of the weak scaling
study, based on the 64 core 5M cell run. It can be observed that the weak scaling
efficiency remains high, only decreasing to 93 % for 216 and 90 % for 1728 cores.
Standard computations of 150M cells have been carried out on 1024 cores at a
cost of approximately 20,000 CPU hours. Parallel efficiencies are higher for current
computations of two-phase flows, presumably because the more complex chemical
kinetics required for a realistic description of particle synthesis increase the
computational work per cell relative to the communication overhead.
[Fig. 11, panels (a) and (b): parallel efficiency versus number of cores]
Fig. 11 Parallel performance of the OpenFOAM DNS solver (for unity Lewis number): Parallel
efficiency for (a) strong and (b) weak scaling
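The strong and weak scaling efficiencies quoted above follow the usual definitions, sketched here. The timings in the usage lines are hypothetical placeholders, not measurements from the HazelHen runs; only the core counts and cell numbers come from the text.

```python
def strong_scaling_efficiency(cores, wall_times, ref=0):
    """E_N = (T_ref * N_ref) / (T_N * N) for a fixed total problem size."""
    n_ref, t_ref = cores[ref], wall_times[ref]
    return [(t_ref * n_ref) / (t * n) for n, t in zip(cores, wall_times)]

def weak_scaling_efficiency(wall_times, ref=0):
    """E_N = T_ref / T_N for a fixed problem size per core."""
    return [wall_times[ref] / t for t in wall_times]

def cells_per_core(total_cells, cores):
    return total_cells / cores

# Hypothetical wall-clock times per time-step [s]; not the measured HazelHen data
strong = strong_scaling_efficiency([128, 256, 512], [100.0, 80.0, 50.0])
weak = weak_scaling_efficiency([9.0, 10.0])
load = cells_per_core(150e6, 1728)   # about 86,800 cells per core
```

With the reference run as baseline, an efficiency of 0.5 in the strong scaling test means that doubling the cores only reduces the run time by a factor of about 1.4 relative to ideal halving.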
6 Conclusions
1 Introduction
The in-house CFD code TASCOM3D is used for numerical simulations of rocket
combustion chambers. The main aspects to be considered for CFD simulations of
rocket combustors are:
1. turbulence phenomena,
2. combustion processes,
3. thermodynamics and molecular transport properties,
4. grid resolution and discretization.
For an accurate simulation of rocket combustors it is essential to predict fluid
properties and flow phenomena with sufficient accuracy. The fluid properties may
differ significantly from an ideal gas behavior due to the extreme conditions in
rocket engines. Pressures of 100 bar and more and temperatures from below
100 K for the injected propellants up to about 4000 K within the reaction zone in
the combustion chamber make these simulations very challenging. The fluids in the
combustion chamber can be in different states of matter (gas-like or liquid-like)
depending on the pressure and temperature. If a propellant is injected at cryogenic
temperature and pressure below the thermodynamic critical pressure of the fluid,
a discontinuous phase transition from liquid to gas will occur during heat up. The
liquid and gaseous phases must be handled separately.
The focus of this work is on injection at cryogenic temperatures and pressures
above the critical pressure of the fluid, where a continuous transition from a liquid-like
state to a gaseous state is observed. The liquid-like and gaseous phases can
no longer be distinguished and a combined treatment is required. This is achieved
within a single-fluid model based on real gas thermodynamics.
This report is organized as follows: First, a brief introduction of the
applied CFD code TASCOM3D is given in Sect. 2, followed by a description of the
implemented real gas thermodynamics model in Sect. 3. Then, results of two test
cases are presented in Sects. 4 and 5: a nonreactive liquid nitrogen jet injected into
a warm nitrogen environment and a liquid oxygen/gaseous hydrogen model rocket
combustor. Code performance issues are addressed in Sect. 6.
2 Numerical Method
The scientific in-house code TASCOM3D (Turbulent All Speed Combustion Multi-
grid Solver 3D) has been applied successfully during the last two decades to
simulate reacting and non-reacting super- and subsonic flows. Reacting flows are
described by solving the fully compressible Navier-Stokes, turbulence and species
transport equations. Additionally, an assumed PDF (probability density function)
approach is available to take turbulence-chemistry-interaction into account, though
for ideal gas simulations only. The two-dimensional conservative form of the
Reynolds-averaged Navier-Stokes equations in this work is given by
$$\frac{\partial Q}{\partial t} + \frac{\partial (F - F_v)}{\partial x} + \frac{\partial (G - G_v)}{\partial y} = S \qquad (1)$$
S_k and S_ω are the source terms of the turbulence variables and S_{Y_i} the source terms of
the species mass fractions due to combustion. For turbulence closure, two-equation
models are used, namely the q-ω model of Coakley [3], the k-ω model of
Wilcox [18], and Menter's SST k-ω model [12].
The spatial discretization is performed on block structured grids based on a finite
volume scheme. For the reconstruction of the cell interface values, MLP_ld (Multi-dimensional
Limiting Process – low diffusion) [7] of up to fifth order of accuracy is used
to prevent oscillations at sharp gradients and discontinuities. MLP uses diagonal
values to improve the TVD (Total Variation Diminishing) limiting behavior [20].
Using these interface values, the AUSM+-up flux vector splitting [11] is employed
to calculate the inviscid fluxes. The unsteady set of Eq. (1) is solved with an
implicit Lower-Upper Symmetric Gauss-Seidel (LU-SGS) [8] algorithm. Fur-
thermore, finite-rate chemistry is treated in a fully coupled manner. The code
is parallelized with Message Passing Interface (MPI). More details concerning
TASCOM3D may be found in Refs. [6, 8, 16].
For sufficiently low pressures and high temperatures, the thermodynamic relation
between pressure, temperature, and density can accurately be described by the well-
known ideal gas (IG) equation of state (EOS)
$$p = \rho \frac{R_u}{M_w} T \qquad (4)$$

$$p = \frac{\rho R_u T}{M_w - b\rho} - \frac{a \rho^2}{M_w \left(M_w + b\rho\right)} \qquad (5)$$
262 M. Seidl et al.
Fig. 1 Density of oxygen versus temperature for three different pressure levels
The parameters a (temperature dependent) and b for a mixture are obtained via
mixing and combining rules from their pure species counterparts. The SRK EOS is
generally applicable to any pure fluid or mixture and continuously describes the p-ρ-T
relation for gases, liquids, and multi-phase regimes with remarkable accuracy
over a wide range of thermodynamic states. For a more detailed description and
a general introduction to real fluid properties, the interested reader is referred to
textbooks on thermodynamics, e.g. [14]. Figure 1 shows the density-temperature
relation for three different pressures for pure oxygen, which is used as oxidizer in
many rocket engines. Values from the NIST database are plotted together with values
predicted by the SRK EOS. A reasonable accuracy is achieved with this model and
similar ones. Depending on the pressure level in the combustor and the injection
temperatures, oxygen may be injected in a gas-like state (low density) or a liquid-
like state (high density).
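Equations (4) and (5) can be compared directly for pure oxygen. The following is a sketch, not the TASCOM3D implementation: it uses the textbook SRK constants (0.42748, 0.08664 and Soave's α(T) function), literature critical data for oxygen (T_c ≈ 154.58 K, p_c ≈ 50.43 bar, acentric factor ω ≈ 0.022, to be checked before use), rewrites Eq. (5) as a cubic in the compressibility factor Z, and simply picks the largest or smallest admissible root instead of performing a proper Gibbs-energy phase check. Mixing rules for mixtures are omitted.

```python
import math

R_U = 8.314462618  # universal gas constant [J/(mol K)]

# Pure oxygen: literature critical data and acentric factor (check before use)
O2 = {"Tc": 154.58, "pc": 50.43e5, "omega": 0.022, "M": 31.999e-3}

def real_cubic_roots(c2, c1, c0):
    """Real roots of Z^3 + c2*Z^2 + c1*Z + c0 = 0 (Cardano / trigonometric form)."""
    shift = -c2 / 3.0
    p = c1 - c2 * c2 / 3.0
    q = 2.0 * c2 ** 3 / 27.0 - c2 * c1 / 3.0 + c0
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0.0:                       # one real root
        s = math.sqrt(disc)
        u = math.copysign(abs(-q / 2.0 + s) ** (1.0 / 3.0), -q / 2.0 + s)
        v = math.copysign(abs(-q / 2.0 - s) ** (1.0 / 3.0), -q / 2.0 - s)
        return [u + v + shift]
    if p == 0.0:                         # triple root
        return [shift]
    r = 2.0 * math.sqrt(-p / 3.0)        # three real roots
    arg = max(-1.0, min(1.0, 3.0 * q / (2.0 * p) * math.sqrt(-3.0 / p)))
    phi = math.acos(arg) / 3.0
    return [r * math.cos(phi - 2.0 * math.pi * k / 3.0) + shift for k in range(3)]

def ideal_gas_density(T, p, fluid=O2):
    """Eq. (4) solved for density [kg/m^3]."""
    return p * fluid["M"] / (R_U * T)

def srk_density(T, p, fluid=O2, phase="vapor"):
    """SRK density [kg/m^3]; phase picks the largest ('vapor') or smallest root."""
    Tc, pc, w, M = fluid["Tc"], fluid["pc"], fluid["omega"], fluid["M"]
    m = 0.480 + 1.574 * w - 0.176 * w * w
    alpha = (1.0 + m * (1.0 - math.sqrt(T / Tc))) ** 2
    a = 0.42748 * (R_U * Tc) ** 2 / pc * alpha   # molar a(T) [Pa m^6 / mol^2]
    b = 0.08664 * R_U * Tc / pc                   # molar b [m^3 / mol]
    A = a * p / (R_U * T) ** 2
    B = b * p / (R_U * T)
    # Eq. (5) rewritten for Z = p M / (rho R_u T):
    # Z^3 - Z^2 + (A - B - B^2) Z - A B = 0
    roots = [z for z in real_cubic_roots(-1.0, A - B - B * B, -A * B) if z > B]
    Z = min(roots) if phase == "liquid" else max(roots)
    return p * M / (Z * R_U * T)

for T in (100.0, 150.0, 200.0, 300.0):
    print(T, srk_density(T, 63e5), ideal_gas_density(T, 63e5))
```

Above p_cr,O2 the cubic has a single admissible root, so the phase argument only matters for subcritical states such as the lower-pressure curves in Fig. 1. The printout shows how far the ideal gas law overshoots or undershoots at cryogenic, liquid-like conditions while both models converge at high temperature and low pressure.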
For chamber pressures below the thermodynamic critical pressure of oxygen
at p_cr,O2 = 50.43 bar and cryogenic injection temperatures below the saturation
temperature, the liquid oxygen (LOX) will undergo a discontinuous phase transition
during heat up in the chamber. Surface tension between the liquid and gaseous
phase leads to an abrupt and distinct separation of both phases. Associated flow
phenomena are primary and secondary atomization of the LOX jet into small
ligaments and droplets and their final evaporation into the gas phase.
In contrast, for pressures above the critical pressure or sufficiently high tem-
peratures of the injected oxygen, only a single phase will occur and a continuous
transition from the cool injection conditions to the hot reaction zone is observed.
For a consistent implementation of a real fluid EOS into a CFD code, it is
important to use Eq. (5) in combination with fundamental thermodynamic relations
Numerical Simulations of Rocket Combustion Chambers with Supercritical Injection 263
Apart from thermodynamic and transport properties, certain flow phenomena may
become non-negligible for high-pressure and low-temperature conditions present in
rocket engines. For example, the Soret effect (mass diffusion due to a temperature
gradient) or the reciprocal Dufour effect (energy flux due to species concentration
gradients) may become important in locally confined regions. Oefelein [13],
however, observed that for injection of propellants with shear-coaxial injectors
(typical for many rocket engines) their contribution may be neglected.
[Fig. 2: viscosity [µPa s] (left) and thermal conductivity [W/(m K)] (right) versus temperature [K], 100 to 300 K; legend: 1, 10, 40, 80 and 200 bar (NIST)]
Fig. 2 Viscosity (left) and thermal conductivity (right) of oxygen versus temperature for various
pressure levels
The non-reactive RCM-1-A test case presented at the 2nd International Workshop
on Rocket Combustion Modeling [17] was chosen as a validation test case for the
implemented real gas model. Cryogenic nitrogen at a temperature of about 120 K is
injected through a circular duct of d = 2.2 mm diameter into a pressurized chamber
at 40 bar. The chamber is filled with gaseous nitrogen at room temperature and has a
diameter of 122 mm and a length of 1000 mm. There is some uncertainty concerning
the actual injection temperature, which is supposed to lie between 120.9 and
126.9 K. At these injection conditions, close to the pseudo-boiling point [1],
the density is very sensitive to small changes in temperature.
Axial density distributions were measured in the experiment by Raman imaging
(case 5 in [2]). Steady-state RANS simulations were performed in this study. The
following setup is chosen for the presented simulation:
1. hexahedral grid with 95,000 elements and y+ ≈ 1 resolution at walls;
2. 2nd order spatial discretization of inviscid fluxes with low diffusion multi-dimensional
limiting process (MLP_ld) [7];
3. Menter's k-ω SST turbulence model [12];
4. turbulent Prandtl number Pr_t = 0.9;
5. adiabatic walls for injector and faceplate, isothermal chamber wall (T = 297 K);
6. injection temperature T = 126.9 K.
Figure 3 displays the density and temperature distribution close to the injector.
The high sensitivity of density with respect to temperature, as discussed before, is reflected
in these contours. An increase of 10 K roughly halves the density right after the
injection. A comparison of axial density profiles at the centerline is plotted in Fig. 4.
The RANS simulation reproduces the experimental data very well. The density at
the centerline remains constant until x/d ≈ 10. Further downstream, the liquid-like
cold nitrogen core dissolves into the warm surrounding nitrogen and density
Fig. 4 Comparison of axial density profiles at the centerline between simulation and experiment
The DLR model rocket combustor investigated by Smith et al. [15] was studied
numerically. Liquid oxygen and gaseous hydrogen are injected at cryogenic temper-
atures of 96 and 67 K and a pressure of 63 bar into a circular chamber with 50 mm
diameter. Hydrogen is also used to cool the chamber walls. A 2D-axisymmetric
simulation with a very fine grid (about 325,000 cells) is performed. Figure 5 displays
the grid resolution close to the injector superimposed with contours of the oxygen
radical. Though the flame zone is thin, it is well resolved. The following setup is
chosen for the presented simulation:
1. 5th order spatial discretization of inviscid fluxes with low diffusion multi-dimensional
limiting process (MLP_ld) [7];
2. k-ω turbulence model of Wilcox [18];
3. turbulent Prandtl and Schmidt numbers Pr_t = Sc_t = 0.7;
4. adiabatic walls.
In the experiment, a highly turbulent and unsteady flame was observed. This
is confirmed in the present simulation. Figure 6 presents contours of water mass
fraction in the entire chamber. Instabilities in the mixing layer between the liquid
oxygen and the hydrogen jet induce vortex roll-up, which improves mixing and thus
combustion efficiency. In contrast to the experiment, no flame lift-off or even blow-off
close to the injector is observed in the simulation. Instead, the flame is stably anchored at
the post-tip.
The interaction of large scale turbulent fluctuations with the flame leads to
pulsations in the heat release, which in turn induces pressure oscillations and
leads to unsteady injection conditions. This feedback mechanism can cause serious
mechanical failure of the chamber structure when the frequencies of this physical
phenomenon coincide with eigenfrequencies of the combustor geometry. In the
Fig. 5 Computational grid and contours of oxygen radical close to the injector in the DLR model
rocket combustor
Fig. 6 Water mass fraction contours in the DLR model rocket combustor (compressed by factor
of 2 in axial direction)
experiment, pressure oscillations with an amplitude of about 0.5 bar to 1.0 bar have
been observed. This could also be confirmed in the simulation.
Temperature contours up to 500 K are plotted in Fig. 7. The highly turbulent
and unsteady nature of the flame especially close to the injector is obvious. Low
temperatures within the nozzle at the centerline indicate that some unburnt oxygen
exits the combustor in the simulation. At the chamber wall, temperatures remain
rather cool (below 300 K). This is consistent with the effective cooling by the cryogenic
hydrogen film used in the experiment.
Fig. 7 Temperature contours in the DLR model rocket combustor (compressed by factor of 2 in
axial direction)
Supercritical injection conditions are present in most main stage rocket engines
and therefore are very interesting for future research. Due to the complexity of the
employed models and the resulting high computing times, the use of high
performance computing systems is indispensable. Consequently, it is crucial to examine
the performance of the code on the platforms used.
In the last two HLRS reports, comprehensive performance analyses of TASCOM3D
on the CRAY XE6 (HERMIT) and CRAY XC40 (HORNET) were performed [9, 10].
A very good scaling performance (strong and weak scaling) was
During the last period, the focus for performance improvements in TASCOM3D
was on parallel I/O using MPI. All data are now stored in a single binary file and
handled by MPI I/O library routines.
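Writing all data into a single binary file requires each MPI rank to know where its block starts. A common pattern, assumed here because the report does not describe TASCOM3D's file layout, is an exclusive prefix sum of the per-rank counts (what MPI_Exscan returns) followed by a collective write at that offset (e.g. MPI_File_write_at_all). The sketch below simulates the ranks with a loop in plain Python so that it runs without MPI; the file name and block contents are hypothetical.

```python
import os
import struct
import tempfile

DOUBLE = 8  # bytes per little-endian float64 value

def exclusive_prefix_sum(counts):
    """offsets[r] = sum(counts[:r]), i.e. what MPI_Exscan would return to rank r."""
    offsets, running = [], 0
    for c in counts:
        offsets.append(running)
        running += c
    return offsets

def write_shared_file(path, blocks):
    """Write every rank's block of doubles at its own offset in one binary file.

    The loop stands in for the MPI ranks; in an MPI code each rank would pass
    its byte offset to a collective call such as MPI_File_write_at_all.
    """
    offsets = exclusive_prefix_sum([len(b) for b in blocks])
    with open(path, "wb") as fh:
        # Ranks may finish in any order; the offsets make the layout independent of it
        for rank in reversed(range(len(blocks))):
            fh.seek(offsets[rank] * DOUBLE)
            fh.write(struct.pack(f"<{len(blocks[rank])}d", *blocks[rank]))
    return offsets

def read_all(path):
    with open(path, "rb") as fh:
        data = fh.read()
    return list(struct.unpack(f"<{len(data) // DOUBLE}d", data))

# Hypothetical per-rank cell data for three ranks
blocks = [[0.0, 1.0], [2.0, 3.0, 4.0], [5.0]]
path = os.path.join(tempfile.mkdtemp(), "solution.bin")
offsets = write_shared_file(path, blocks)
values = read_all(path)
```

Because the global layout depends only on the offsets, the file can later be read back with a different number of ranks, which is one practical advantage of a single shared file over one file per process.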
7 Conclusion
Acknowledgements The presented work was performed within the framework of the SFBTR 40
funded by the Deutsche Forschungsgemeinschaft (DFG). This support is greatly appreciated. All
simulations were performed on the Cray XE6 (HERMIT) and XC40 (HORNET/HAZEL HEN)
clusters at the High Performance Computing Center Stuttgart (HLRS) under the grant number
scrcomb. The authors wish to thank HLRS for the computing time and the technical support.
1 Introduction
Supercomputing of Two-Zone Fluidized Bed Reactor 271
The importance of butadiene synthesis lies in the fact that 1,3-butadiene is one of the
basic petrochemical products. Today's processes based on converting n-butane are
exclusively implemented as two-stage procedures. The direct dehydrogenation of
n-butane to 1,3-butadiene delivers very small yields, which is why the main products
have to undergo a second dehydrogenation to attain 1,3-butadiene. For economic
reasons, a single-stage process to produce 1,3-butadiene from n-butane is pursued.
So far, no single-stage process is known for the synthesis of 1,3-butadiene
in which the yield and selectivity are large enough for an economic application.
Currently, two-zone fluidized bed reactors (see Fig. 1 left) show the highest yields
as well as the highest selectivity. Feeding oxygen and n-butane separately allows the
creation of separated oxidation and reduction zones in the same reaction vessel,
between which the catalyst is circulated, thus circumventing problems associated
with the transfer of the catalyst between reactor and regenerator. The conversion
takes place in the reaction zone located above the n-butane inlet using the lattice
oxygen of the catalyst particles. In the regeneration zone at the lower part of the
fluidized bed, coke deposits on the particles are burned and the lattice oxygen of
the catalyst is replenished. After regeneration, the catalyst particles re-enter the
reaction zone due to particle mixing inside the fluidized bed.
In recent years, a two-zone fluidized bed reactor (TZFBR) was designed, built
and experimentally investigated at the authors' institute (ITCP) [5]. Figure 1 (left)
shows a sketch of the reactor. The reactor consists of a 40 cm long quartz tube with
an inner diameter of 28 mm. At the bottom of the reactor, a frit (pore size 160–250 μm)
holds the particles and homogenizes the incoming gas flow. Inside the
reactor, a quartz tube with two holes at the end of the T-junction serves as the n-butane
inlet. The product stream leaves the reactor at a side outlet.
Fig. 1 Left: sketch of the two-zone fluidized bed reactor, investigated experimentally [5]. Right:
calculation domain and boundaries
272 M. Hettel et al.
The CFD-DEM method applied for the calculation of the two-phase flow is a
synthesis of CFD (Computational Fluid Dynamics) and DEM (Discrete Element
Method) to model coupled fluid-granular systems and is based on the Euler-Lagrange
approach.
Fluid (gas or liquid) flows are governed by partial differential equations which
represent conservation laws for the mass and momentum (Navier-Stokes Equations)
and for additional scalar quantities, e.g. energy. Computational Fluid Dynamics
(CFD) is the art of replacing such systems of partial differential equations by a set of
algebraic equations which can be solved applying numerical methods using digital
computers. Within the Eulerian approach, the gas phase is modeled as a continuum.
The conservation laws (transport equations) are formally integrated over a finite
volume and discretized on a numerical grid. Each node of the grid holds the
volume-averaged values of a small section of the flow field. The solution
procedure is always iterative and approximate. The smaller the finite volumes (the larger
the number of cells or grid points), the smaller the discretisation error.
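The statement that smaller finite volumes reduce the discretisation error can be checked on a one-dimensional model problem. This sketch is a generic cell-centred finite-volume discretisation of −u'' = f with a tridiagonal (Thomas) solver, chosen here for illustration; it is not part of OpenFOAM or CFDEM®coupling. For this second-order scheme, halving the cell size should reduce the maximum error by roughly a factor of four.

```python
import math

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def fv_poisson_error(n):
    """Cell-centred finite-volume solution of -u'' = pi^2 sin(pi x) on [0, 1]
    with u(0) = u(1) = 0; returns the max error against u = sin(pi x)."""
    h = 1.0 / n
    xc = [(i + 0.5) * h for i in range(n)]          # cell centres
    a, b, c = [-1.0] * n, [2.0] * n, [-1.0] * n     # flux balance times h^2
    d = [h * h * math.pi ** 2 * math.sin(math.pi * x) for x in xc]
    # Boundary cells: the wall value sits h/2 from the centre, so the wall
    # flux doubles the diagonal contribution (u_wall = 0 adds nothing to d)
    b[0], b[-1] = 3.0, 3.0
    a[0], c[-1] = 0.0, 0.0
    u = thomas(a, b, c, d)
    return max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xc))

for n in (20, 40, 80):
    print(n, fv_poisson_error(n))
```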
The modeling of the particle phase is based on the Lagrangian approach. The
motion of each particle of the system is calculated by integrating Newton's equation
of motion. Various forces act on a single particle in a gas flow. The dominant forces
in the present application are drag, contact and gravity. To describe the collision
dynamics in the particulate flow the soft-sphere approach is used. The contact
forces between particles are incorporated with mechanical models consisting of
combinations of springs, dash-pots and sliders. The actual forces are calculated
based on the small overlap between particles and allow direct integration of the
particle displacement based on the contact forces. For a DEM simulation no
numerical grid is necessary. The calculation domain is restricted by geometrical
surfaces, with which the particles can interact (walls) or where they can leave the
domain (openings).
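A minimal illustration of the soft-sphere approach: the normal contact force is modelled as a linear spring (stiffness k_n) in parallel with a dash-pot (damping γ_n) acting on the overlap of two spheres; the tangential slider (friction) is left out. The stiffness, damping, mass and velocity below are hypothetical values, not the LIGGGHTS® settings used in this study. The simulated restitution coefficient is compared with the analytical result for a damped linear oscillator, and the contact time illustrates why the DEM time-step has to be a small fraction of the collision duration.

```python
import math

def normal_contact_force(overlap, overlap_rate, k_n, gamma_n):
    """Linear spring-dashpot normal force on the pair (positive = repulsive).

    Zero when the spheres do not overlap. The dashpot term may turn the force
    slightly attractive just before separation; it is deliberately not clipped
    so that the result can be compared with the analytical solution below.
    """
    if overlap <= 0.0:
        return 0.0
    return k_n * overlap + gamma_n * overlap_rate

def head_on_collision(m, radius, v0, k_n, gamma_n, dt, max_steps=1_000_000):
    """Two equal spheres approaching along x with relative speed v0.

    Returns the coefficient of restitution (separation / approach speed).
    """
    x1, x2 = 0.0, 2.0 * radius          # just touching at t = 0
    v1, v2 = 0.5 * v0, -0.5 * v0
    touched = False
    for _ in range(max_steps):
        overlap = 2.0 * radius - (x2 - x1)
        if touched and overlap <= 0.0:
            break                        # contact has ended
        touched = touched or overlap > 0.0
        f = normal_contact_force(overlap, v1 - v2, k_n, gamma_n)
        # Semi-implicit Euler: velocities from forces first, then positions
        v1 -= f / m * dt
        v2 += f / m * dt
        x1 += v1 * dt
        x2 += v2 * dt
    return (v2 - v1) / v0

def damping_ratio(m, k_n, gamma_n):
    m_eff = 0.5 * m                      # reduced mass of two equal spheres
    return gamma_n / (2.0 * math.sqrt(k_n * m_eff))

def analytic_restitution(m, k_n, gamma_n):
    zeta = damping_ratio(m, k_n, gamma_n)
    return math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))

def contact_time(m, k_n, gamma_n):
    omega0 = math.sqrt(k_n / (0.5 * m))
    zeta = damping_ratio(m, k_n, gamma_n)
    return math.pi / (omega0 * math.sqrt(1.0 - zeta ** 2))

# Hypothetical parameters (not the LIGGGHTS settings of this study)
m = 1.0e-8                 # particle mass [kg]
radius = 102.5e-6          # half of the 205 um mean diameter [m]
k_n = 1000.0               # normal stiffness [N/m]
gamma_n = 2.0 * 0.3 * math.sqrt(k_n * 0.5 * m)  # damping ratio 0.3
dt = contact_time(m, k_n, gamma_n) / 2000.0
e_num = head_on_collision(m, radius, 0.1, k_n, gamma_n, dt)
e_ana = analytic_restitution(m, k_n, gamma_n)
```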
We used the CFDEM®coupling software [1] (version 2.3.1) for the calculations. This
open-source platform couples the DEM engine LIGGGHTS® (version 3.3.0) [4] to
the open-source CFD code OpenFOAM® (version 2.3.1) [7], see also Fig. 2. The name
of the OpenFOAM solver is cfdemSolverPiso.
4 Calculation Procedure
The calculation domain (Fig. 1 right) comprises a cylinder with a length of 160 mm
and a diameter of 28 mm. The frit acts as a wall for the particles and is positioned
40 mm above the gas inlet. The T-junction is positioned at h_inlet = 55 mm above
the frit.
The height of the bed is about 90 mm. The size of the particles in the real system
is in the range of 160–250 μm. We used an average diameter of 205 μm, leading to
a particle number of ca. 4.8E6.
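The quoted particle number can be related to the bed geometry through the average solids volume fraction. The sketch below treats the bed as a 90 mm high cylinder of 28 mm diameter filled with 205 μm spheres and ignores the volume of the T-junction; these simplifications are assumptions of this illustration. Under them, 4.8E6 particles correspond to an average solids volume fraction of roughly 0.39.

```python
import math

def bed_volume(height, diameter):
    """Volume of a cylindrical bed [m^3]."""
    return math.pi * (0.5 * diameter) ** 2 * height

def particle_volume(d_p):
    """Volume of one spherical particle [m^3]."""
    return math.pi / 6.0 * d_p ** 3

def particle_count(height, diameter, d_p, solids_fraction):
    """Number of particles for a given average solids volume fraction."""
    return solids_fraction * bed_volume(height, diameter) / particle_volume(d_p)

def implied_solids_fraction(n_particles, height, diameter, d_p):
    """Average solids volume fraction implied by a particle count."""
    return n_particles * particle_volume(d_p) / bed_volume(height, diameter)

phi = implied_solids_fraction(4.8e6, height=0.090, diameter=0.028, d_p=205e-6)  # about 0.39
```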
The calculations were done for isothermal conditions (T = 500 K) without
reaction (air only). The n-butane inlet was closed. The velocity at the lower inlet
was 0.23 m/s. Under these conditions the flow is laminar and no turbulence model
is needed. The gas leaves the calculation domain at the upper outlet.
Firstly, the reactor has to be filled with the particles which requires solely a DEM-
calculation. After settling of the particles, the coupled CFD-DEM calculation can
be started. On the one hand, the time-step of the particle movement has to follow
the high frequency of the particle collision dynamics. On the other hand, the time-
step has to be small enough to resolve the characteristic time in which the particles
respond to a variation in the velocity of the surrounding flow. These restrictions lead
typically to small time-steps for DEM calculations, 2.5E-6 s in our application.
The CFD-solver applies the PISO (Pressure-Implicit Split-Operator) approach
for the pressure velocity coupling. Therefore, the CFD time-step has to keep the
Courant number (Co = time-step × velocity / cell size) smaller than
one. We used a time-step of 2.5E-5 s.
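A quick check of the Courant condition with the numbers given in the text: the CFD time-step of 2.5E-5 s, the inlet velocity of 0.23 m/s and the smallest cell size of about 0.5 mm reported for the grid. The inlet value is only a superficial velocity; local gas velocities inside bubbles or near the T-junction may be higher, so this is an optimistic estimate. The function names are hypothetical.

```python
def courant_number(dt, velocity, dx):
    """Co = dt * |u| / dx for one cell."""
    return dt * abs(velocity) / dx

def max_time_step(velocity, dx, co_max=1.0):
    """Largest time-step keeping Co <= co_max for the given velocity and cell size."""
    return co_max * dx / abs(velocity)

co = courant_number(dt=2.5e-5, velocity=0.23, dx=0.5e-3)   # about 0.0115
dt_limit = max_time_step(velocity=0.23, dx=0.5e-3)          # about 2.2e-3 s
```

The large margin suggests that, for the CFD part, the chosen time-step is dictated by the coupling to the DEM rather than by the Courant limit itself.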
The two codes calculate sequentially in cycles with a user-defined coupling time
of 2.5E-4 s. Within one cycle, the CFD code solves ten time-steps (2.5E-4 s
physical time in total), afterwards the DEM code calculates 100 time-steps (again 2.5E-4 s
physical time). This is done alternately. After each cycle (2.5E-4 s physical time), the
data necessary to capture the forces between the gas phase and the solid
phase are exchanged between the codes. The information flux between the codes is
depicted in Fig. 2. An analysis of the calculation time showed that 53 % of the time
is spent in the DEM code and 47 % in the CFD code.
After an initial period of typically five seconds of physical time to bring the system
into operation, the calculation has to be continued to obtain time-averaged quantities
which can be analyzed and compared with experimental data. Therefore, a physical
time of ca. 50 s is envisaged.
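The coupling schedule described above can be written down as a small bookkeeping function. Only the time-step sizes, the coupling interval and the envisaged 50 s of physical time come from the text; the function itself is a hypothetical illustration, not part of the CFDEM®coupling interface. For 50 s it gives 200,000 coupling cycles, i.e. 2·10^6 CFD and 2·10^7 DEM time-steps.

```python
def coupling_plan(t_couple, dt_cfd, dt_dem, t_total):
    """Sub-steps per coupling cycle and total step counts for a physical time."""
    def steps(interval, dt):
        n = round(interval / dt)
        if abs(n * dt - interval) > 1e-9 * interval:
            raise ValueError("time-step does not divide the interval")
        return n
    cfd = steps(t_couple, dt_cfd)      # CFD steps per cycle
    dem = steps(t_couple, dt_dem)      # DEM steps per cycle
    cycles = steps(t_total, t_couple)  # coupling cycles (data exchanges)
    return {"cfd_steps_per_cycle": cfd, "dem_steps_per_cycle": dem,
            "cycles": cycles, "cfd_steps": cycles * cfd, "dem_steps": cycles * dem}

plan = coupling_plan(t_couple=2.5e-4, dt_cfd=2.5e-5, dt_dem=2.5e-6, t_total=50.0)
```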
Figure 3 shows a snapshot of a calculation result. On the left side the fluidized bed
including the T-junction is shown. The bed is divided in half vertically to illustrate
the processes inside the bed. The colour represents the velocity magnitude of the
particles. The regions with higher velocity (green/yellow/red color) indicate bubbly
structures where the density of the particles per volume is smaller than in regions
with lower velocity (blue and cyan color). In these structures gas is transported
vertically through the bed. This process contributes to the mixing of the particles
between the two regions of the bed. If a bubble reaches the surface of the bed, an
eruption of particle clusters can be identified (right picture). To give an impression of
the grid resolution, the right picture shows some surface cells of the reactor wall. As
the geometrical data is in STL-format, each rectangular surface cell is divided with
a diagonal line into two triangular subsections for the graphical representation.
Fig. 3 Snapshot of results: fluidized bed as a whole (left) and detail near surface (right)
5 Computer Specifications
Two different supercomputers are used for the simulations. Their main features are
briefly described below.
JUSTUS, the research cluster of the state of Baden-Württemberg, is located
at the Communication and Information Center of the University of Ulm and is
specialized in computational and theoretical chemistry. It is a high-performance,
massively parallel compute resource. Its intended use is mainly for chemistry-related
jobs with high memory requirements (RAM and/or HDD). The supercomputer
JUSTUS is suitable for user jobs which have medium to low requirements on the
node-interconnecting InfiniBand network.
In the present study computing nodes of JUSTUS with 128 GB DDR4-RAM
have been used. Each node consists of two Intel Xeon E5-2630v3 (Haswell)
processors (8 cores per processor, i.e. 16 cores per node) with a 2.4 GHz
frequency and 20 MB cache per chip. The operating system is Red Hat Enterprise
Linux 7. The interested reader can get further details from [3, 6].
OpenFoam 2.3.1 on this cluster was compiled with the Intel® compiler 15.0 and
the corresponding MPI-library, version 5.0.3.
The massively parallel supercomputer ForHLR-I (recently expanded by its
second-stage complement ForHLR-II) has 512 nodes with 64 GB RAM, of which up to
32 have been used for the present study. Each node consists of two deca-core
Intel Xeon E5-2670 v2 processors (Ivy Bridge) (10 cores per processor, i.e. 20
cores per node). The processors have a 2.5 GHz frequency (max. turbo frequency
3.3 GHz).
Each deca-core processor has 25 MB L3 cache and operates
the system bus with a frequency of 1866 MHz. Each core has 64 KB L1 cache and
256 KB L2 cache, see also [2]. The network is an InfiniBand 4X FDR
interconnect. The operating system is Red Hat Enterprise Linux 6.x. For further
details, please refer to [2].
OpenFoam 2.3.1 on ForHLR-I was compiled with the GNU compiler. Currently
(April 2016), the default version of this compiler on the ForHLR-I supercomputer is
version 4.9 and the default version of the Open MPI software is version 1.8.4.
The limitations to be discussed in the following originate from different sources: the
physical modelling, the software implementation, the supercomputers’ architecture
or combinations of them.
The CFD-DEM model requires that the particles be smaller than a certain
fraction of the fluid cell size. Optimally, the volume of the particles should not
be larger than 30 % of the volume of a fluid cell. This is because the physical
assumptions of the CFD-DEM method are only satisfied if each cell contains a
certain portion of fluid. If the cells are too small, a whole calculation cell could
be filled with solid material. For a particle size of 205 μm, the
minimal cell size has to be ca. 600 μm. For the modeling of the reactor (Fig. 1), a
block-structured hexahedral grid with ca. 460,000 cells was generated. The smallest
cell size in the grid was about 0.5 mm.
The number of fluid cells per computing core governs the relation between the
amount of computational work on that core and the amount of MPI information
exchange: a small number of cells per core leads to a small amount of
work and a relatively large demand for information exchange. On the other hand, a
large number of cells per core increases the total duration of the computations.
A rule of thumb says that for a CFD calculation about 10,000–50,000 cells per core
usually lead to a satisfactory ratio between computing time and communication,
thus ensuring an efficient use of the parallel resources with good scalability.
However, if the total number of fluid cells is limited, the number of cores
that can be utilized for the simulations necessarily becomes limited as well. With
460,000 cells, the above rule returns a number between 9 and 46 cores. Since
this number of cores is quite low, larger numbers of cores have also been
utilized in the following tests. Thus, for the largest possible number of computing
cores (256), the number of cells per core decreased to about 1800.
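The cells-per-core arithmetic above, as a short sketch (function names hypothetical): the 10,000–50,000 rule of thumb applied to 460,000 cells, and the resulting load per core at 256 cores.

```python
import math

def core_range(total_cells, cells_min=10_000, cells_max=50_000):
    """Core counts suggested by a cells-per-core rule of thumb (min, max)."""
    return math.floor(total_cells / cells_max), math.floor(total_cells / cells_min)

def cells_per_core(total_cells, cores):
    return total_cells / cores

low, high = core_range(460_000)               # (9, 46)
per_core_256 = cells_per_core(460_000, 256)   # about 1797
```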
Another restriction that needs to be considered here is the restriction on the
physical time step, which increases the overall wall-clock time of the
simulations. In the present simulations, two time-steps are coupled together:
one for the fluid flow and one for the particle tracking algorithm. The fluid flow time-step
is restricted by the Courant condition, the corresponding time-step for the particle
tracking by the model approach for the particle collisions (see also
Sect. 4).
Another, probably more severe restriction comes from the way the
Lagrangian particle tracking algorithm is coupled to OpenFOAM®. Currently, each
domain for the Eulerian fluid flow contains the complete information about all
particles. Because of the relatively high number of particles, the memory required
per core remains nearly constant and decreases only very slightly with the number of
cores. For the current calculations, each core needed 7 GB of RAM. This considerably
increases the overall RAM requirements per node, which quickly reach the limits of
the available RAM per node. For example, if a node of a supercomputer has 64 GB
RAM (ForHLR-I, see Chap. 5), then at most 64 GB/7 GB ≈ 9 cores fit into its memory,
and in practice only 8 cores per node can be used. The rest of the cores (i.e. 8 out
of 16 for ForHLR-I) are reserved, but not used. For this reason, the calculations
shown later were performed on a maximum of 8 cores per node on ForHLR-I. The other
supercomputer, JUSTUS (see Chap. 5), has no such limitation (128 GB RAM per node).
For reasons of comparability of the results, however, tests with the same number
of cores per node (8) have also been made.
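The memory argument reduces to a small calculation. The per-node figures come from the text: 64 GB on ForHLR-I, 128 GB on JUSTUS, 16 cores per node and 7 GB per core. The helper names are hypothetical. Plain floor division yields 9 cores for ForHLR-I, while the runs use 8, i.e. half of the 16 cores of a node.

```python
import math


def cores_fitting_in_memory(ram_per_node_gb, ram_per_core_gb, cores_per_node):
    """How many cores of one node can run before the node's RAM is exhausted."""
    return min(cores_per_node, int(ram_per_node_gb // ram_per_core_gb))


def reserved_cores(active_cores, active_per_node, cores_per_node):
    """Cores blocked for other users when only part of each node is used."""
    nodes = math.ceil(active_cores / active_per_node)
    return nodes * cores_per_node


if __name__ == "__main__":
    print(cores_fitting_in_memory(64, 7, 16))   # ForHLR-I node
    print(cores_fitting_in_memory(128, 7, 16))  # JUSTUS node
    print(reserved_cores(256, 8, 16))           # 256 active cores at 8 per node
```

At 8 active cores per node, 256 computing cores occupy 32 nodes and therefore 512 reserved cores.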
The CFDEM®coupling software ensures a good distribution of the workload
between the cores. The statistics given at the end of the computations show that the
cores are almost equally loaded: e.g. for the run on JUSTUS with 128 cores, the
Supercomputing of Two-Zone Fluidized Bed Reactor 277
largest load ratio between any two cores is 1.005. Therefore, the load balance is not
regarded as a factor that can decrease the efficiency of the computations in the present study.
In the following, the results from the strong scaling tests on ForHLR-I and on
JUSTUS are presented. The number of grid cells (control volumes) was kept
constant, while the number of cores used for the computations was varied. Each
simulation ran for a wall-clock time of 12 h. The results are presented as the
physical time in [s] advanced during these 12-h simulations, see Fig. 4.
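Every run had the same 12-h wall-clock budget, so the physical time advanced is proportional to the computing speed. Speedup and parallel efficiency therefore follow from ratios of advanced physical time. The sketch also computes the load ratio quoted above (largest over smallest per-core work). All numbers in the example are hypothetical placeholders and were not read from Fig. 4.

```python
def load_ratio(work_per_core):
    """Largest ratio of work between any two cores (max / min)."""
    return max(work_per_core) / min(work_per_core)


def strong_scaling(phys_time_by_cores):
    """Speedup and efficiency from physical time advanced in a fixed wall-clock window.

    phys_time_by_cores maps a core count to the physical time [s] simulated
    within the same wall-clock budget; the smallest core count is the reference.
    """
    p_ref = min(phys_time_by_cores)
    t_ref = phys_time_by_cores[p_ref]
    results = {}
    for p, t in sorted(phys_time_by_cores.items()):
        speedup = t / t_ref
        results[p] = (speedup, speedup * p_ref / p)
    return results


if __name__ == "__main__":
    hypothetical = {32: 1.0, 64: 1.8, 128: 3.0, 256: 4.4}  # placeholder data
    for cores, (s, e) in strong_scaling(hypothetical).items():
        print(f"{cores:4d} cores: speedup {s:4.2f}, efficiency {e:4.2f}")
    print(load_ratio([100.0, 100.5, 100.2]))  # placeholder per-core loads
```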
The first performance test consists of using different numbers of nodes and
different numbers of cores per node, while keeping the total number of cores
constant. Using fewer cores per node has two opposite effects. On the one hand,
it increases MPI communication through the node-interconnecting network. On the
other hand, the demand for RAM access within each node decreases, which might be
beneficial on the global level. This performance test was made only on JUSTUS,
because the ForHLR-I does not have enough memory per node to carry it out. Figure 4
reveals that for the present tests, using 8 cores per node (instead of 16) increases
the physical time advanced on JUSTUS. Thus, the increased efficiency on the node
level outweighs the increased communication need on the network level.
Fig. 4 Results from the strong scaling: physical time advanced vs. the number of cores which
actually took part in the computations
278 M. Hettel et al.
Fig. 5 Results from the strong scaling: physical time advanced vs. the total number of cores for a
given simulation. These statistics include all cores reserved for the particular simulation, although
only a part of them took part in the computations. During the simulation, all of the reserved cores are
unavailable to other users
is used (64 or 128). The two modes on JUSTUS (8 cores per node and 16 cores per
node) become almost identical in terms of advanced physical time when 256
cores (total cores used) are reserved for the simulations. Unfortunately, the
limitations on the maximum number of computing cores (see Sect. 6) did not allow us
to follow the scaling trend on JUSTUS any further. The RAM limitations per node
prevented a similar test (8 vs. 16 cores per node) from being carried out on the ForHLR-I cluster.
As a whole, the scaling tests provided a first insight into the
parallel efficiency of the simulations for the two-zone fluidized bed reactor. The
limitations of each of the two massively parallel supercomputers have been identified,
and with this knowledge the actual production runs can be continued effectively.
Using up to 200 computing nodes on ForHLR-I leads to a reasonable scaling
of the simulations. However, there is a large overhead in terms of cores which are
reserved but do not take part in the computations. Therefore, moving the bulk of
the simulations from ForHLR-I to JUSTUS is the best solution, as it allows the
efficient use of up to 256 computing cores without any additional overhead.
8 Conclusions
of particles (4.8 × 10⁶), each core requires 8 GB of RAM, independent of the
number of cores used. The bottleneck for the RAM usage lies in the coupling of
the particle code to the CFD code. Much could be gained here if every
CFD subdomain only needed the data for the particles which are inside this subdomain.
For the present investigation, the JUSTUS supercomputer turns out to be more
suitable than the ForHLR-I: the larger RAM per node on JUSTUS allows the
efficient use of all cores in the nodes and good scaling up to 256 computing cores.
However, it is planned to integrate a third software package to complement
the existing two packages. This third package allows chemical reactions and
intra-particle transport phenomena to be considered, but it also requires additional
tests regarding the performance of the combined software.
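The conclusion that each CFD subdomain should hold only its own particles can be illustrated with a minimal NumPy sketch. It assumes a simple slab decomposition along x. This is a hypothetical stand-in and not the decomposition used by OpenFOAM® or CFDEM®coupling. With replication, every core stores all N particles; with ownership by subdomain, each core stores roughly N/p.

```python
import numpy as np


def owner_rank(x, x_min, x_max, n_ranks):
    """Slab decomposition along x: index of the subdomain that owns each particle."""
    frac = (x - x_min) / (x_max - x_min)
    return np.clip((frac * n_ranks).astype(int), 0, n_ranks - 1)


def local_particles(positions, rank, x_min, x_max, n_ranks):
    """Particles a rank would store if it kept only those inside its own slab."""
    owners = owner_rank(positions[:, 0], x_min, x_max, n_ranks)
    return positions[owners == rank]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_particles, n_ranks = 100_000, 16
    pos = rng.uniform(0.0, 1.0, size=(n_particles, 3))  # hypothetical unit box
    counts = [len(local_particles(pos, r, 0.0, 1.0, n_ranks)) for r in range(n_ranks)]
    print("replicated per rank:", n_particles)
    print("owned per rank (min/max):", min(counts), max(counts))
```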
Acknowledgements The simulations for the present work were partly supported by the bwHPC
initiative and the bwHPC-C5 project [A1] provided through associated compute services of the
JUSTUS HPC facility at the University of Ulm. The grant of supercomputing resources on the
ForHLR-I supercomputer at the Steinbuch Centre for Computing of the Karlsruhe Institute of
Technology for the project with acronym “butadiene” is highly appreciated.
The authors would like to thank Jürgen Salk from the Communication and Information Center
of the University of Ulm (Competence Center for Computational chemistry), Alexandru Saramet
from the University of Applied Sciences Esslingen (Competence Center for Engineering sciences)
and Dr. Stefan Radl from the Graz University of Technology for their valuable help and advice
during the software installation and the software adjustment processes. The authors would also
like to thank their colleagues Dr. Holger Obermaier and Richard Walter from SCC/SCS for the
fruitful discussions.
The support of the Helmholtz programme “Supercomputing and Big Data” [A2] is also highly appreciated.
[A1] bwHPC and bwHPC-C5 (http://www.bwhpc-c5.de) funded by the Ministry of Science,
Research and the Arts Baden-Württemberg (MWK) and the German Research Foundation
[A2] The Programme “Supercomputing & Big Data” https://www.helmholtz.de/en/research/
Ewald Krämer
A great number of CFD research projects of excellent scientific quality
were run on the supercomputers of the HLRS in Stuttgart and of the SCC in
Karlsruhe during the reporting period. The simulation results yielded valuable
fundamental as well as application-oriented knowledge, which was attainable only
through the extensive use of High Performance Computing. It goes without saying
that access to supercomputers is crucial for successful research in Fluid Dynamics,
today and even more so in the future. This year, 37 annual reports were submitted
and underwent a peer review process. Due to limited space, only 17 contributions
could be selected for publication in this book, which means that a number of
high-quality reports had to be rejected. Even though the presented collection cannot
entirely represent an area this vast, the selected papers demonstrate the
state-of-the-art use of high-performance computing in Germany.
The spectrum of the projects is wide in several respects. Fundamental as well
as application-oriented problems of industrial relevance were addressed using
in-house, commercial, and open-source codes (the latter two of which have made up
ground with respect to massively parallel performance). Established numerical
methods such as Finite Volume and Lattice Boltzmann methods were employed, as
well as methods that are relatively new (at least in the context of CFD), such as
Smoothed Particle Hydrodynamics or Discontinuous Galerkin methods. All CFD simulations
presented in this book were run either on the Cray XC40 Hornet/Hazel Hen in Stuttgart
(Europe’s fastest supercomputer according to the HPCG benchmark) or on the ForHLR I in Karlsruhe.
E. Krämer
Institut für Aerodynamik und Gasdynamik, Universität Stuttgart, Pfaffenwaldring 21, 70550
Stuttgart, Germany
e-mail: kraemer@iag.uni-stuttgart.de
282 E. Krämer
For many years, the working group of Munz at the Institute of Aerodynamics and
Gas Dynamics (IAG), University of Stuttgart, has been developing a high-order
simulation framework based on the Discontinuous Galerkin (DG) method. DG methods
can be considered a combination of a finite-element scheme (with a continuous
higher-order polynomial in each grid cell) and a finite-volume scheme (allowing for
discontinuities at the cell faces, which are handled by a Riemann solver). If
implemented appropriately, they provide superior parallel performance. The latest
fluid dynamics code from this framework, FLEXI, uses a discontinuous Galerkin
spectral element method (DGSEM). In recent years, it has increasingly been employed
for real industrial applications. One example is given by Hempert, Boblest, Hoffmann,
Offenhäuser, Sadlo, Glass, Munz, Ertl, and Iben. They simulated a high-pressure
throttle and jet flow, which serves as a simplified model for a gas injector in
automotive combustion engines. Their studies assess the transient development and
penetration of the gaseous jet. Because shocks appear in such cases, a shock-capturing
technique based on a Finite Volume subcell method was applied to suppress oscillations
near shocks and at under-resolved scales. An efficient load-balancing strategy was
implemented to remove the imbalances caused by the shock capturing and to maintain
the high parallel efficiency of the code. The ongoing work is a cooperation between
the IAG, the Robert Bosch GmbH, the Visualisation Research Center of the University
of Stuttgart, the HLRS, and the Interdisciplinary Center for Scientific Computing of
Heidelberg University. The simulations were performed on the Hazel Hen.
The next two contributions are from the Institute of Thermal Turbomachines
(ITS) of the Karlsruhe Institute of Technology (KIT). An in-house code based on
a Lagrangian, mesh-free Smoothed Particle Hydrodynamics (SPH) method has been
developed there over the last few years. In such methods, which are relatively
new in the context of computational fluid dynamics, the computational domain is
discretized by so-called particles, each of which represents a certain volume of the
fluid. These Lagrangian particles move within the domain with the local flow velocity.
The simulations were run on the ForHLR I cluster at the SCC, and the code showed a
very good parallel performance. In the first report, Wieth, Braun, Chassonnet, Dauch,
Keller, Höfler, Koch, and Bauer simulated the temporal evolution of droplet deformation
at low aerodynamic loads, which plays a significant role in liquid fuel atomization
processes. They investigated the deformation dynamics of single-fluid droplets as well
as of fuel droplets with water added to their inside. To validate the SPH code for this
type of application, the results for the pure liquid droplets were compared to
well-known empirical findings, showing excellent agreement. The authors conclude that
the SPH code is capable of predicting droplet deformation dynamics in a physically
correct way.
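The particle idea can be illustrated with the standard SPH density summation: each particle's density is a kernel-weighted sum of its neighbours' masses. The sketch uses the common 1D cubic-spline kernel. It is a textbook example and not the ITS in-house code. The particle spacing, mass and smoothing length are hypothetical.

```python
import numpy as np


def cubic_spline_kernel_1d(r, h):
    """Standard 1D cubic-spline SPH kernel with support radius 2h."""
    q = np.abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization constant
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q) ** 3, 0.0))
    return sigma * w


def sph_density(x, mass, h):
    """Density at every particle: rho_i = sum_j m_j W(x_i - x_j, h)."""
    r = x[:, None] - x[None, :]
    return (mass[None, :] * cubic_spline_kernel_1d(r, h)).sum(axis=1)


def advect(x, u, dt):
    """Lagrangian step: particles move with the local flow velocity."""
    return x + u * dt


if __name__ == "__main__":
    dx, rho0 = 0.01, 1000.0               # hypothetical spacing [m] and density [kg/m^3]
    x = np.arange(50) * dx
    mass = np.full(x.size, rho0 * dx)     # 1D "mass" per particle
    rho = sph_density(x, mass, h=dx)
    print(rho[25], rho[0])                # interior vs. free-end particle
```

With h equal to the particle spacing, an interior particle recovers the reference density. A particle at the free end sees fewer neighbours and gets a lower value.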
The second SPH application, presented by Braun, Koch, and Bauer, deals with
the numerical prediction of primary atomization, which takes place e.g. in air-assisted
atomizer nozzles of jet engines. The focus is on the liquid disintegration processes,
i.e. on the breakup behavior and the spray characteristics. The test case is derived
from an experimentally investigated set-up and consists of up to 1.2 billion particles
with a spatial resolution of roughly 5 µm. A total of 2560 cores were used in parallel
for these simulations. Comparisons to the experimentally observed features as well as to the
IV Computational Fluid Dynamics 283
results of established CFD tools using Volume-of-Fluid (VoF) solvers show good
agreement, demonstrating that the SPH method is an adequate tool for predicting
multi-phase flows.
The aim of the work described by Förster, Mink, and Krause from the Institute
of Mechanical Process Engineering and Mechanics of the KIT is a more accurate
characterization of the flow domain and flow dynamics, especially in complex
geometries. This is to be done by coupling existing (lower-resolution) experimental
data with numerical simulations, e.g. data obtained from Phase Contrast Magnetic
Resonance Imaging (PC-MRI) in medical applications. The idea is to formulate this
fluid flow domain identification problem as an optimization problem that minimizes
the difference between a given and a simulated flow field. The proposed
gradient-based solution strategy makes use of an adjoint lattice Boltzmann method
(ALBM). The authors’ novel sensitivity-based, so-called first-optimize-then-discretize
approach first derives an adjoint equation on a continuous basis and then discretizes
it. This preserves the excellent parallel efficiency known from LB methods in general.
Using the open-source software OpenLB, developed by the working group Computational
Process Engineering at the KIT, a very good efficiency on massively parallel HPC
systems has been achieved, and the single-core performance could also be improved
significantly. In the article, preliminary results are shown for a generic domain
identification test case.
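The principle of adjoint-based domain identification can be shown on a deliberately small toy problem. The sketch is a swapped-in stand-in and not the authors' ALBM. It uses a 1D linear diffusion model with a hypothetical Brinkman-type penalization field alpha, where large alpha marks "solid". It also follows the discretize-then-optimize route, i.e. the opposite order to the authors' first-optimize-then-discretize approach. One state solve and one adjoint solve give the gradient of the misfit J = ½‖u − u_target‖² with respect to every alpha_i at once.

```python
import numpy as np


def system_matrix(alpha):
    """Discrete 1D diffusion operator plus a penalization term diag(alpha)."""
    n = alpha.size
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return K + np.diag(alpha)


def misfit_and_gradient(alpha, f, u_target):
    """J(alpha) = 0.5*||u - u_target||^2 and dJ/dalpha via one adjoint solve."""
    A = system_matrix(alpha)
    u = np.linalg.solve(A, f)        # state equation   A(alpha) u = f
    r = u - u_target
    lam = np.linalg.solve(A.T, r)    # adjoint equation A^T lam = u - u_target
    grad = -lam * u                  # dJ/dalpha_i = -lam_i * u_i, since dA/dalpha_i = e_i e_i^T
    return 0.5 * r @ r, grad


if __name__ == "__main__":
    n = 20
    f = np.ones(n)
    alpha_true = np.zeros(n)
    alpha_true[8:12] = 50.0          # hypothetical "solid" region to be identified
    u_target = np.linalg.solve(system_matrix(alpha_true), f)  # synthetic "measurement"

    alpha = np.zeros(n)
    J, g = misfit_and_gradient(alpha, f, u_target)
    step = 1e-4 / np.max(np.abs(g))  # tiny normalized step, for illustration only
    J_new, _ = misfit_and_gradient(alpha - step * g, f, u_target)
    print(J, J_new)                  # the misfit decreases along -grad
```

A finite-difference check of the gradient is the usual way to confirm such an adjoint before using it in an optimization loop.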
Ye and Tiedje from the Institute of Industrial Manufacturing and Management,
University of Stuttgart, analyze in their contribution the dynamics of paint drops
impacting onto dry surfaces. The special focus is on the air entrapment at the
droplet-solid interface. Both Newtonian and non-Newtonian droplets are simulated,
showing different results with respect to the creation of air discs and air bubbles
during drop spreading. The VoF method implemented in the commercial CFD code
ANSYS-FLUENT was used for a comprehensive parametric study performed on
the CRAY XC40 of the HLRS. The results of the investigations provide new
insight into the mechanism of air entrapment during drop impact onto solid surfaces.
Reitzle, Roth, and Weigand from the Institute of Aerospace Thermodynamics,
University of Stuttgart, also investigated the impact of droplets on dry solid walls. In
their case, liquids with a non-Newtonian shear-thinning behavior were used.
Due to their different viscosities, their spreading behavior is slightly different. The
simulations were performed on the CRAY XC40 using the in-house code Free
Surface 3D (FS3D), which predicts incompressible multiphase flows based on the
Volume-of-Fluid method. Scaling tests revealed that the speed-up of the code is
not ideal due to the multigrid solver used to solve the pressure Poisson equation.
However, a new multigrid solver library is being implemented, which is expected to
significantly improve both the serial and the parallel performance of the code.
The reduction of viscous drag, especially turbulent skin-friction drag, is desirable
for many fluid mechanical applications. During the last decades, various flow
control strategies based on near-wall forcing have emerged, which show promising
potential, at least for relatively low Reynolds numbers. However, due to a lack of
experimental and numerical data, the efficiency of such control technologies at the
higher Reynolds numbers relevant for most industrial applications is still an open
question. Davide Gatti from the Institute of Fluid Mechanics at the KIT therefore
addressed the effect of increasing Reynolds number on the achievable skin-friction
drag reduction for a channel flow, using streamwise travelling waves of spanwise
wall velocity as the control strategy. By means of Direct Numerical
Simulations (DNS), he performed a comprehensive parameter study in two steps.
First, 4020 cases were simulated in a small domain for different parameters of the
spanwise forcing at two Reynolds numbers. These computations were performed
as concurrent serial runs, partly on the Blue Gene/Q system at the CINECA
computing center in Bologna and partly on the ForHLR I at the SCC in Karlsruhe.
Additionally, a second set of computations for a few representative cases was
conducted within a large domain. The results of both datasets are discussed in detail,
and maximum net saving rates are given. The author also derives an equation for
extrapolating the drag reduction to higher Reynolds numbers. Based on this
equation, he states that the decrease in drag reduction efficiency at higher Reynolds
numbers is notably smaller than the available purely low-Reynolds-number databases suggest.
Control strategies for turbulent boundary layers have also been the focus of
Alexander Stroh of the same institute, who performed DNS computations for a
turbulent channel flow. Gatti applied spanwise forcing in a fully developed boundary
layer along the whole wall area. In contrast, Stroh focused on localized control,
which is easier to realize in industrial applications. In the work presented, he
investigates two different drag reduction control methods for a spatially developing
turbulent boundary layer and analyzes in particular the flow behavior downstream of
the control region. He also compares the efficiency of the flow control in a fully
developed turbulent channel flow and in a developing boundary layer, finding
significant differences in the mechanisms behind the drag reduction. Up to 240 million
grid nodes were used for his main configuration. Each simulation was performed on
256 parallel cores, with different simulations running concurrently on the ForHLR I.
The next three contributions describe the results of different projects running
under the biannual “Call for Large-Scale Projects” of the Gauss Centre for
Supercomputing (GCS). Projects considered in these calls require more than 35 million
core hours per year. The first paper, by Axtmann and Rist from the IAG in Stuttgart,
presents a study of the scalability and MPI characteristics of OpenFOAM on the
CRAY XC40 Hazel Hen at the HLRS. Direct Numerical Simulations of a
three-dimensional laminar cavity flow and Large Eddy Simulations of a backward-facing
step were performed. Strong and weak scaling speedups as well as imbalance rates
are displayed for two different compilers. In addition, the performance of the MPI
routines Isend, Recv, and Waitall is compared. The tool CrayPat was employed for
profiling and tracing during the runs. The study gives insight into the parallel
behavior of OpenFOAM version 2.3.0 on massively parallel computers.
Wilke and Sesterhenn, TU Berlin, Fachgebiet Numerische Fluiddynamik, summarize
the work performed within two separate projects, both dealing with subsonic
and supersonic jets impinging on a flat plate. The first part is dedicated to heat
transfer enhancement, whereas the second part concerns sound source mechanisms.
In both cases, an in-house code was used that directly solves the governing
Navier-Stokes equations in a characteristic pressure-velocity-entropy formulation. To
avoid Gibbs oscillations in the vicinity of shocks, an adaptive shock-capturing filter was
used. An excellent scaling behavior of the code on the Hazel Hen is demonstrated
up to 16,384 cores. For the highest Reynolds number investigated, a mesh with more
than one billion grid points was used. Impinging jets are known to be an effective
means of cooling, and the amount of heat transfer can be increased even further with
pulsating inlets. The aim of the first project was to gain insight into the physics
underlying this increase. Earlier investigations of a non-pulsating jet had revealed that
periodically occurring vortex rings are responsible for additional heat transfer.
The authors show that the pulsation strongly amplifies these vortices. The second
part addresses the open question of how the sound waves in the feedback loop that is
responsible for the generation of impinging tones are produced. The authors
could observe the feedback loop in their direct numerical simulations. They show that
the interaction between vortices and stand-off shocks produces the sound
waves by two different mechanisms, either by shock-vortex- or by shock-vortex-
At the Institute of Aerodynamics of RWTH Aachen, a high-fidelity, massively
parallelized flow solver using the MILES (monotone integrated LES) approach has
been applied very successfully to various aerodynamic and aeroacoustic problems
over many years. The code runs on locally refined Cartesian hierarchical meshes.
In their present contribution, Pogorelov, Cetin, Moghadam, Meinke, and Schröder
describe the latest results of their simulations of the flow fields and the acoustic
fields of a ducted axial fan and a helicopter engine jet. For this purpose, they chose a
hybrid method. The flow fields, including the aeroacoustic sources, were predicted
by a highly resolved LES computation, and the acoustic near and far fields were
subsequently determined by solving the acoustic perturbation equations. The focus
of the rotating fan simulations lay on the evaluation of the effect of the tip-gap
size. In accordance with measurements, a larger tip-gap size is shown to produce
stronger tip vortices and a higher broadband noise level. In the second part, jets
from helicopter nozzles with different built-in components are compared to each
other. The components have a strong impact on the acoustic near field, which is
explained by their effects on the turbulent wake structures.
Not least owing to long-lasting, very successful research work performed
in the helicopter group of the IAG, the structured Finite-Volume code FLOWer,
originally developed by the German Aerospace Center (DLR), has established itself as a
very reliable, high-fidelity CFD code for helicopter flow simulations. Many useful
features have been implemented during the last years, among others a high-order
reconstruction scheme, which is necessary for vortex conservation in vortex-dominated
flows. In their present contribution, Kowarsch, Hofmann, Keßler, and Krämer report on
the latest enhancement, the implementation of unstructured grid handling into the code.
This hybrid mesh approach allows for easier grid generation in the near-body regions,
whereas off-body regions can still be resolved with structured, preferably Cartesian
meshes, in combination with computationally efficient higher-order numerical
schemes. Validation was performed with a forward-facing step, and results are
shown for a complete helicopter in forward flight. Additional effort has been spent
on further optimizing the code with regard to its application on HPC systems.
The code uses multi-blocking and an efficient load balancing that takes into account
the mesh type and numerical scheme of the individual blocks. Furthermore, thanks
to valuable support from the teams of HLRS and CRAY, the parallel performance
on the CRAY XC40 Hazel Hen could be improved, facilitating the efficient use of
more than 1000 nodes.
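Load balancing that accounts for mesh type and numerical scheme can be sketched as a weighted greedy assignment. Each block gets a cost of cells × per-cell weight. The blocks are then handed, largest first, to the currently least-loaded rank (the classic longest-processing-time heuristic). The weights and block list are hypothetical, and this is not claimed to be FLOWer's actual algorithm.

```python
import heapq

# Hypothetical relative cost per cell for each (mesh type, scheme) combination.
COST_PER_CELL = {
    ("structured", "2nd_order"): 1.0,
    ("structured", "high_order"): 1.6,
    ("unstructured", "2nd_order"): 2.5,
}


def block_cost(n_cells, mesh_type, scheme):
    """Estimated work of one block: cell count times a per-cell weight."""
    return n_cells * COST_PER_CELL[(mesh_type, scheme)]


def balance(costs, n_ranks):
    """Greedy largest-first assignment of weighted blocks to the least-loaded rank."""
    heap = [(0.0, rank) for rank in range(n_ranks)]  # (current load, rank)
    heapq.heapify(heap)
    assignment = {}
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)
        assignment[name] = rank
        heapq.heappush(heap, (load + cost, rank))
    loads = {rank: load for load, rank in heap}
    return assignment, loads


if __name__ == "__main__":
    blocks = {
        "near_body_1": block_cost(40_000, "unstructured", "2nd_order"),
        "near_body_2": block_cost(30_000, "unstructured", "2nd_order"),
        "background_1": block_cost(120_000, "structured", "high_order"),
        "background_2": block_cost(100_000, "structured", "high_order"),
        "far_field": block_cost(150_000, "structured", "2nd_order"),
    }
    assignment, loads = balance(blocks, n_ranks=2)
    print(assignment)
    print(loads)
```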
Chu and Laurien from the Institute of Nuclear Technology and Energy Systems,
University of Stuttgart, investigated the heat transfer problem arising in the cooling
systems of nuclear power plants or heavy-duty coolers. Direct numerical simulations
of supercritical carbon dioxide flow in a heated vertical pipe including buoyancy
effects were performed for low-Mach-number flows with varying density using the
open-source code OpenFOAM. Bulk properties, the average flow field and secondary
flow, and turbulence statistics are analyzed in detail. Scaling tests reveal a good
speedup up to 1400 cores on the Hazel Hen. The findings of this work can help to
develop new turbulence models for this kind of practical application.
OpenFOAM was also used by Stens and Riedelbauch from the Institute of Fluid
Mechanics and Hydraulic Machinery, University of Stuttgart. They simulated a fast
transition from pump mode to generating mode in a model-scale reversible pump
turbine. Such machines are used in pumped storage power plants, which are an
efficient way to store energy on a large scale. However, the current procedures for
changing from one operating mode to the other are still time-consuming. The aim
of the project is to understand the flow mechanisms during a change of operating
modes in order to develop faster maneuvers that do not damage the machine.
Results for two different mesh sizes are presented at different monitor points.
Furthermore, the flow field in the runner is analyzed at different points in time. The
simulations were run on the ForHLR I at the SCC in Karlsruhe. Adequate speedups
were achieved with 40 cores for the coarse mesh and 120 cores for the fine mesh.
The next contribution is from the same institute. Here, Krappel and Riedelbauch
present the results of their transient flow simulations in a Francis turbine at part
load conditions. The flow field in the draft tube of the turbine at these conditions is
dominated by the vortex rope phenomenon, which requires a very high resolution
in space and time and an appropriate turbulence model. The authors applied the
commercial code ANSYS CFX (in different versions) with two different turbulence
models (the RANS-SST model and the scale-resolving SST-SAS model). The
meshes used in the study ranged from 16 to 300 million nodes. The
differences in the resolved flow structures are displayed for the various meshes
and/or turbulence models. Additionally, different numerical schemes were used
for the spatial discretization, which also have an effect on the predictions. The
strong scaling behavior is shown for the different versions of ANSYS CFX, clearly
indicating a significant parallel performance improvement from V16.0 to V17.0.
Mansour, Kaltenbach, and Laurien from the Institute of Nuclear Technology and
Energy Systems, University of Stuttgart, present an application oriented CFD model
for predicting the heat and mass transfer between large droplets and gas during
the spray cooling process in a nuclear reactor containment with an Euler-Euler
1 Introduction
We use the computational fluid dynamics (CFD) code FLEXI, which we develop
with applications in industrial environments in mind [6]. It is based on the
discontinuous Galerkin spectral element method (DG SEM), a method of high-order
accuracy with great potential for parallel scaling [3, 8]. For a detailed
description of DG SEM, together with a discussion of its parallelization efficiency,
the reader is referred to Hindenlang et al. [11].
Our calculations are based on the compressible Navier-Stokes equations
(NSE). To close the NSE, an equation of state (EOS) is needed. This can be achieved
either with an analytical formulation or with a tabulated approach, which we use in the
present paper. Our ansatz is based on the idea of Dumbser et al. [9]. We generate the
data for our tabulated EOS with the CoolProp library [5]. This allows us to represent
the EOS over a wide range of all thermodynamic variables with excellent accuracy.
We consider methane, as it is the main component of natural gas [12].
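A tabulated EOS replaces a closed-form relation by interpolation in a pre-computed table. The minimal sketch below performs bilinear interpolation of p(ρ, e) on a rectilinear table. In the setting described above, the table entries would come from CoolProp for methane. Here the table is filled with ideal-gas placeholder data so that the result can be checked, because bilinear interpolation reproduces p = (γ − 1)ρe exactly. The class name, the interpolation order and the placeholder data are assumptions. They do not reproduce FLEXI's implementation or the exact approach of Dumbser et al.

```python
import numpy as np


class TabulatedEOS:
    """Bilinear interpolation of p(rho, e) on a rectilinear table."""

    def __init__(self, rho_axis, e_axis, p_table):
        self.rho, self.e, self.p = rho_axis, e_axis, p_table

    def pressure(self, rho, e):
        # Locate the table cell; out-of-range queries are extrapolated from the edge cell.
        i = np.clip(np.searchsorted(self.rho, rho) - 1, 0, self.rho.size - 2)
        j = np.clip(np.searchsorted(self.e, e) - 1, 0, self.e.size - 2)
        tx = (rho - self.rho[i]) / (self.rho[i + 1] - self.rho[i])
        ty = (e - self.e[j]) / (self.e[j + 1] - self.e[j])
        p = self.p
        return ((1 - tx) * (1 - ty) * p[i, j] + tx * (1 - ty) * p[i + 1, j]
                + (1 - tx) * ty * p[i, j + 1] + tx * ty * p[i + 1, j + 1])


if __name__ == "__main__":
    gamma = 1.3                              # placeholder ideal-gas data, NOT real-gas methane data
    rho_axis = np.linspace(1.0, 200.0, 64)   # density [kg/m^3]
    e_axis = np.linspace(1e5, 1e6, 64)       # specific internal energy [J/kg]
    table = (gamma - 1.0) * rho_axis[:, None] * e_axis[None, :]
    eos = TabulatedEOS(rho_axis, e_axis, table)
    print(eos.pressure(57.3, 4.2e5), (gamma - 1.0) * 57.3 * 4.2e5)
```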
To avoid Gibbs-type oscillations near shocks or at under-resolved scales, we
use a detector proposed by Persson and Peraire [18] to identify regions where
such oscillations might occur, and then apply the finite volume (FV) subcell
method [19, 20] in a slightly modified form to prevent their occurrence. The original
FV method uses Gaussian-distributed subcells, while in our case they are distributed
equidistantly for increased accuracy.
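A minimal 1D sketch of the modal-decay idea behind the Persson–Peraire indicator follows. The nodal solution of an element is converted to orthonormal Legendre modes. The indicator is the base-10 logarithm of the fraction of energy carried by the highest mode. Smooth data give strongly negative values, and discontinuous data give values much closer to zero. The 1D setting, the Gauss-node choice and the function name are assumptions. The code does not reproduce FLEXI's 3D implementation or the threshold used to switch an element to FV subcells.

```python
import numpy as np
from numpy.polynomial import legendre


def persson_peraire_indicator(nodal_values, nodes):
    """log10 of the energy fraction in the highest Legendre mode on [-1, 1]."""
    n = nodes.size - 1
    V = legendre.legvander(nodes, n)                        # V[i, k] = P_k(x_i)
    V = V * np.sqrt((2.0 * np.arange(n + 1) + 1.0) / 2.0)   # orthonormal basis
    modes = np.linalg.solve(V, nodal_values)                # modal coefficients
    total = max(np.sum(modes**2), 1e-300)                   # guard against zero data
    return np.log10(modes[-1] ** 2 / total + 1e-300)        # guard against log10(0)


if __name__ == "__main__":
    x, _ = legendre.leggauss(8)                             # 8 Gauss nodes, degree N = 7
    print(persson_peraire_indicator(np.sin(x), x))          # smooth: strongly negative
    print(persson_peraire_indicator(np.sign(x), x))         # jump: much larger value
```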
We developed a reader plugin for ParaView, which runs on the Hazel Hen, to visualize
our simulations [6]. In recent years, several methods have been developed
to directly visualize data from high-order CFD solvers without resampling [7, 13,
Within that plugin, however, we still use a resampling method with user-defined
resolution for DG elements and a fixed resolution for FV subcells. By also
integrating our EOS tables into the plugin, we can visualize all simulation variables
in ParaView without the need to store all of them in our state files. This strategy
keeps our storage requirements low.
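Resampling a DG element for visualization amounts to evaluating its interpolation polynomial at a user-chosen number of equidistant points. The 2D tensor-product sketch below assumes Gauss–Legendre solution nodes; the node type used in the plugin is not stated here, so this is an assumption. It is not the plugin's code. A 3D element would apply the same 1D matrix along the third direction as well.

```python
import numpy as np
from numpy.polynomial import legendre


def lagrange_matrix(x_from, x_to):
    """M[i, j] = l_j(x_to[i]), the j-th Lagrange basis polynomial on the nodes x_from."""
    M = np.ones((x_to.size, x_from.size))
    for j, xj in enumerate(x_from):
        for m, xm in enumerate(x_from):
            if m != j:
                M[:, j] *= (x_to - xm) / (xj - xm)
    return M


def resample_element_2d(u_nodal, x_nodes, n_vis):
    """Interpolate u_nodal[i, j] = u(x_i, y_j) to an n_vis x n_vis equidistant grid."""
    x_vis = np.linspace(-1.0, 1.0, n_vis)
    M = lagrange_matrix(x_nodes, x_vis)
    return M @ u_nodal @ M.T  # apply the 1D interpolation in both directions


if __name__ == "__main__":
    x, _ = legendre.leggauss(4)              # polynomial degree N = 3
    u = x[:, None] ** 2 * x[None, :] + 3.0 * x[None, :] ** 3
    u_vis = resample_element_2d(u, x, n_vis=7)
    print(u_vis.shape)
```

Because the resampling only evaluates the element polynomial, any field of degree N or lower in each direction is reproduced exactly at the visualization points.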
3 Simulation Strategy
We use a throttle geometry with diameter D = 0.5 × 10⁻³ m and length L = 4D.
This geometry is a simplified representation of a gas injector. The simulation
domain is discretized with an unstructured hexahedral mesh. An overview of the
simulation domain, together with a section view of the mesh, is given in Fig. 1. The
mesh has the highest resolution at the throttle walls, around the throttle exit, and
downstream of the throttle. Downstream of the throttle, the gas is injected into the
open domain and forms a jet. The computational mesh consists of 83,732
elements. For the current assessment, we apply a 4th-order spatial discretization
and a 4th-order low-storage Runge-Kutta scheme for the temporal discretization.
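The size of the discrete problem follows from the element count. The sketch assumes that "4th-order" refers to polynomial degree N = 3 (order N + 1). This is a common DGSEM convention, but the text does not state it explicitly. It also assumes five conservative variables for the 3D Navier-Stokes equations. Both are assumptions.

```python
def dgsem_dofs(n_elements, poly_degree, n_vars=5):
    """Nodal degrees of freedom of a 3D hexahedral DGSEM discretization."""
    nodes_per_element = (poly_degree + 1) ** 3   # tensor-product nodes per hexahedron
    n_nodes = n_elements * nodes_per_element
    return n_nodes, n_nodes * n_vars


if __name__ == "__main__":
    nodes, unknowns = dgsem_dofs(83_732, poly_degree=3)
    print(f"{nodes:,} solution nodes, {unknowns:,} unknowns")
```

Under these assumptions, the mesh carries about 5.4 million solution nodes and about 26.8 million unknowns.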
292 F. Hempert et al.
Fig. 1 (a) Overview of simulation domain. (b) Sectional view of the simulation mesh, with the
high-resolution region in red
Fig. 2 Mach number on a section through the center of the nozzle at different pressure ratios
Rp and corresponding subsonic (a) and supersonic (b) conditions. Please note the different Mach
Figure 2 shows the Mach number contours for a fully subsonic jet at Rp = 1.25
(Fig. 2a) and a supersonic jet at Rp = 5.00 (Fig. 2b). The subsonic jet exhibits a
turbulent boundary layer, although it is not fully developed because the throttle
is too short. The throttle flow and the resulting jet are fully turbulent at
this pressure ratio. For the supersonic jet, the shock systems are visible within
the throttle, and a slightly under-expanded jet occurs downstream of the throttle.
The flow is choked and the critical cross section is at the inlet of the throttle.
Increasing the pressure ratio Rp between inlet and outlet increases the Reynolds
number of the throttle flow and the jet. In the current investigation, we used the same
computational mesh for all Reynolds numbers. For higher Reynolds numbers, this
makes the simulation under-resolved, especially in the jet region. However, this is not
a real problem in our case, as we focus on the mass flow behavior through a throttle
with a jet. For the higher pressure ratios, i.e. supersonic flow, the flow becomes
choked and therefore the throttle inlet limits the mass flow. Consequently, at higher
pressure ratios, the representation of the jet is less important for the determination
of the mass flow, and the resolution used is a reasonable trade-off between accuracy
and computational cost.
For the design process of gas injectors, the accurate prediction of the mass flow
is essential. While there are some analytical relations at lower pressures [4], the
behavior at higher pressures is much less clear.
In the following, we focus on the mass flow in the quasi-stationary flow and
on the transient behavior of the mass flow. The quasi-stationary mass flow of the
investigated throttle is shown in Fig. 3. For higher pressure ratios, the flow becomes
choked and the mass flow is independent of Rp. For an ideal gas, the critical
pressure ratio of a restriction within a pipe is Rp < 2.44 [4]. In our case, however,
we find a larger value of Rp ≈ 2.86. This is due to the sharp edges of the geometry
and the real-gas effects, which have a non-negligible effect at these conditions.
The transient behavior of the mass flow can also be important, since gas is
commonly injected at high frequencies. The temporal development of the mass
flow rate for the different Rp is shown in Fig. 4. For Rp = 1.25, the flow is fully
subsonic and the mass flow reaches a quasi-constant value already for
t > 23 µs. At Rp = 1.67, the overall temporal behavior is similar to the
Rp = 1.25 case, apart from the significantly higher final value of the mass flow rate.
At an even higher pressure ratio, Rp = 2.50, the mass flow rate initially rises
similarly to the fully subsonic cases, but it continues to rise until
about t ≈ 120 µs. The flow is not fully choked, but the mass flow is no longer
limited by the conditions at the throttle exit but instead by those at the throttle inlet.
Finally, the flow becomes choked at the inlet at the even higher pressure ratios
Rp = 2.86, 3.33, and 5.00. Until t ≈ 40 µs, the mass flow rate here is lower
than for Rp = 2.50, and it takes significantly longer until the maximum value is
reached. This maximum is virtually independent of Rp in these three cases. All cases
show a very dynamic initial mass flow behavior that strongly depends on whether the
flow is sub- or supersonic. At later times, t > 100 µs, the mass flow rate is nearly
constant for all Rp.
High-Pressure Real-Gas Jet as a Simplified Gas Injector Model 295
Fig. 4 Mass flow rate for different pressure ratios over time
Fig. 5 Mach number along the centerline downstream of the throttle exit
The position of the shocks, especially during the early stages, is very dynamic.
In the following, we focus on the transient behavior of the first shock. The Mach
number along the centerline downstream of the throttle exit is depicted in Fig. 5 at
different times. At t = 10 s, the initial jet tip with a strong gradient is present at
x/D = 1. At t = 20 s, a shock starts to form, which grows in strength over time
while it moves upstream. Notably, for t = 30–60 s, the maximum Mach number
reaches a plateau, which indicates that the flow enters the two-phase region.
In the two-phase region, the velocity increase is reduced and the speed of sound
remains nearly constant. Therefore, the Mach number increases only marginally in
the two-phase region very close to the shock jump. Once the flow has developed, no
normal shock is present anymore, since the jet is no longer under-expanded; only
weak oblique shocks still exist under these flow conditions.
Fig. 6 Finite-volume subcell locations at one time instance for a subsonic jet (a) and a supersonic
jet (b). In the supersonic jet, the locations of the FV cells reflect the typical criss-cross pattern of
an under-expanded jet
The DG SEM needs stabilization for under-resolved scales and shocks, which we
achieve by employing the aforementioned combination of the detector by Persson
and Peraire [18] and the FV subcell method [19, 20]. Figure 6a shows the FV
subcells for the subsonic jet at a given point in time. Even though no shocks are
present, these subcells are used to stabilize the simulation at under-resolved scales,
i.e., here mainly in the shear layer of the jet. For the supersonic jet with shocks, the
FV subcells accomplish shock capturing, see Fig. 6b. Here, the distribution of the
subcells shows the typical criss-cross pattern of an under-expanded jet.
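To illustrate how a modal smoothness detector in the spirit of Persson and Peraire can decide which elements switch to FV subcells, the sketch below measures, for a single 1D element, the share of L2 energy carried by the highest Legendre mode and flags the element when the base-10 logarithm of that share exceeds a threshold. This is a simplified, hypothetical version: the indicator variable, the 3D tensor-product evaluation and the threshold actually used in FLEXI are not given here, and the threshold of −3 in the example is an assumption. It relies on NumPy's `numpy.polynomial.legendre` module.

```python
import numpy as np
from numpy.polynomial import legendre


def persson_peraire_indicator(u_nodal: np.ndarray) -> float:
    """log10 of the share of L2 energy in the highest Legendre mode of a 1D element.

    u_nodal holds the N + 1 solution values at the Legendre-Gauss nodes of [-1, 1].
    """
    n = len(u_nodal) - 1                               # polynomial degree N
    x, _ = legendre.leggauss(n + 1)                    # matching Gauss nodes
    coeffs = legendre.legfit(x, u_nodal, n)            # modal coefficients of P_0 ... P_N
    mode_norms = 2.0 / (2.0 * np.arange(n + 1) + 1.0)  # ||P_k||^2 on [-1, 1]
    energy = coeffs**2 * mode_norms
    total = energy.sum()
    if total == 0.0:
        return float("-inf")                           # zero state: perfectly smooth
    ratio = max(energy[-1] / total, np.finfo(float).tiny)
    return float(np.log10(ratio))


def needs_fv_subcells(u_nodal: np.ndarray, threshold: float) -> bool:
    """Flag the element for FV subcell treatment if the indicator exceeds the threshold."""
    return persson_peraire_indicator(u_nodal) > threshold


if __name__ == "__main__":
    x, _ = legendre.leggauss(6)          # N = 5, i.e. 6 nodes per direction
    smooth = np.sin(0.5 * x)             # well-resolved data
    jump = np.where(x > 0.0, 1.0, 0.0)   # shock-like jump inside the element
    for name, u in (("smooth", smooth), ("jump", jump)):
        print(name, round(persson_peraire_indicator(u), 2), needs_fv_subcells(u, threshold=-3.0))
```

For well-resolved data the energy in the highest mode is tiny, so the indicator is strongly negative; a jump inside the element spreads energy into all modes and pushes the indicator towards zero.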
By construction, the DG SEM is very well suited to parallelization [11]. Supplementing
the DG SEM with the FV subcell method enables the numerical simulation of complex
flows, e.g., flows with shocks [10]. Without further measures, however, it also causes
significant load imbalances, because an FV subcell element is computationally more
expensive than a DG element. Hence, if the mesh elements are distributed equally
among all cores, the cores with many FV subcells take longer for their computation
and thus slow down the entire simulation. A first step to reduce load imbalances is
to take the higher computational cost of FV subcells into account in the initial
distribution of mesh elements at the beginning of the simulation. However, this is
not sufficient, because both the number of FV subcells and their locations within
the simulation domain
depend heavily on the flow conditions, and thus may change rapidly during the
simulation, for example, during the emergence of shocks, see Fig. 6.
Hence, a more sophisticated load-balancing strategy is needed to maintain the
excellent parallel scalability of the DG SEM in such complex flow simulations. We
have developed such a new technique for dynamic load balancing and implemented
it in FLEXI. For the initial element-to-core distribution, this technique accounts for
the difference in computational cost between FV subcells and normal DG elements
by assigning a weight w > 1 to FV subcells and w = 1 to DG elements. The elements
are then distributed such that the weight sums on all cores are as close to the
average value as possible. During the simulation, when new FV subcells emerge or
old ones revert to DG elements, the distribution of elements among the cores is
adapted. To do so, we shift elements from cores with a high weight sum to cores
with a low weight sum until all cores have a weight sum as close to the average
as possible. On each node, we use the shared-memory windows introduced in
MPI 3.0 to make this element shifting as efficient as possible; the communication
between nodes is performed with standard MPI routines.
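The weight-based distribution described above can be sketched as a one-dimensional partitioning problem. The code assumes that the mesh elements are already arranged in a fixed global order (for instance along a space-filling curve, which is an assumption here) so that every core receives a contiguous range, and it cuts that range where the cumulative weight is closest to multiples of the average load. The FV weight w = 3 is hypothetical, and the partition is recomputed from scratch rather than by incrementally shifting elements through MPI-3 shared memory. An imbalance metric max/mean − 1 is included because a threshold on such a metric is one natural way to trigger the adaptation instead of using fixed intervals.

```python
from bisect import bisect_left
from itertools import accumulate


def partition_by_weight(weights, n_cores):
    """Cut the ordered element list into n_cores contiguous chunks whose weight sums
    are as close as possible to the average. Core r owns elements offsets[r]:offsets[r + 1]."""
    n = len(weights)
    prefix = [0.0] + list(accumulate(weights))   # prefix[k] = sum(weights[:k])
    total = prefix[-1]
    offsets = [0]
    for rank in range(1, n_cores):
        target = rank * total / n_cores          # ideal cumulative load before this core
        k = bisect_left(prefix, target)          # first cut whose prefix sum reaches the target
        if k > 0 and target - prefix[k - 1] < prefix[k] - target:
            k -= 1                               # the cut one element earlier is closer
        offsets.append(min(max(k, offsets[-1]), n))
    offsets.append(n)
    return offsets


def core_loads(weights, offsets):
    """Weight sum per core for a given list of chunk offsets."""
    return [sum(weights[a:b]) for a, b in zip(offsets, offsets[1:])]


def imbalance(loads):
    """max/mean - 1; zero means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads)) - 1.0


if __name__ == "__main__":
    W_FV = 3.0                                  # hypothetical FV-subcell weight (w > 1)
    weights = [1.0, 1.0, 1.0, 1.0, W_FV, W_FV, 1.0, 1.0]
    equal_count = [0, 4, 8]                     # same number of elements per core
    balanced = partition_by_weight(weights, 2)  # weighted cut
    print(core_loads(weights, equal_count), imbalance(core_loads(weights, equal_count)))
    print(core_loads(weights, balanced), imbalance(core_loads(weights, balanced)))
```

In the example, equal element counts give loads of 4 and 8 (imbalance 0.33), whereas the weighted cut hands one FV element to the first core, giving loads of 7 and 5 (imbalance 0.17) with unequal element counts per core.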
In our current implementation, the adaptation of the element distribution is
performed at fixed time-step intervals. We are currently investigating techniques
to measure the load imbalance and to trigger the adaptation once the load imbalance
reaches a certain threshold.
For a case study of the efficiency of our load-balancing strategy in its present
form, we performed test calculations with 216 DOF per DG element. We scaled
the number of cores from 96 to 1536 and executed the load balancing every 1000
time steps. Table 1 shows the reduction of wall time compared to simulations
without load balancing. The reduction grows with the number of cores: the more
cores are used, the higher the load imbalance, because the probability that one core
receives a large number of FV subcells rises. With our dynamic load balancing, we
gain a significant reduction in wall time. As further illustration, Fig. 7a, b show the
effect of our load-balancing technique for one given time step on 96 cores. In this
example, 4.1 % of the elements are FV subcells. In Fig. 7a, the mesh elements are
evenly distributed among all cores (256 elements per core), ignoring the difference
in numerical cost between DG and FV-subcell elements. This causes large load
imbalances and therefore a serious performance drop, which can be removed by
shifting elements between cores so that the load is evenly distributed (Fig. 7b); the
number of elements then differs significantly between cores (between 203 and 262,
Fig. 7c).
Table 1 Reduction of wall time achieved with our load balancing strategy for different numbers
of cores. In all cases, we performed a simulation with 24,576 elements and 6³ = 216 DOF per element
# Cores    DOF per core    Wall time reduction [%]
96         55,296          3.4
192        27,648          4.9
384        13,824          6.4
768        6,912           7.3
1536       3,456           12.1
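The entries of Table 1 follow from simple arithmetic, and a wall-time reduction can equivalently be expressed as a speedup. The sketch below assumes that the 216 DOF per element correspond to hexahedral tensor-product elements of polynomial degree N = 5, i.e., (N + 1)³ = 216; it also recovers the 256 elements per core quoted for the 96-core run.

```python
ELEMENTS = 24_576
DOF_PER_ELEMENT = 6 ** 3   # 216 = (N + 1)^3 for a hexahedral element of degree N = 5 (assumed)


def dof_per_core(cores: int) -> int:
    """Degrees of freedom per core for an even element distribution."""
    return ELEMENTS * DOF_PER_ELEMENT // cores


def speedup_from_reduction(reduction_percent: float) -> float:
    """A wall-time reduction of r % corresponds to a speedup of 1 / (1 - r / 100)."""
    return 1.0 / (1.0 - reduction_percent / 100.0)


# (cores, wall-time reduction [%]) as listed in Table 1
TABLE_1 = [(96, 3.4), (192, 4.9), (384, 6.4), (768, 7.3), (1536, 12.1)]

for cores, reduction in TABLE_1:
    print(f"{cores:5d} cores: {ELEMENTS // cores:4d} elements, {dof_per_core(cores):6d} DOF per core, "
          f"speedup {speedup_from_reduction(reduction):.3f}")
```

The 12.1 % reduction on 1536 cores thus corresponds to a speedup of about 1.14 over the unbalanced run.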
Fig. 7 Load distribution for one given timestep on 96 cores. (a) Load distribution without load
balancing. (b) Load distribution with load balancing. (c) Number of elements per core with load
balancing. The number of elements varies between 203 and 262, instead of being constant at 256
on each core if no load balancing is employed
5 Conclusion
Acknowledgements This work is supported by the Federal Ministry of Education and Research
(BMBF) within the HPC III project HONK “Industrialization of high-resolution numerical
analysis of complex flow phenomena in hydraulic systems”. We also thank the Gauss Centre for
Supercomputing (GCS) which provided us with the necessary computing resources on the Hazel
1. Adolf, M., Bargende, M., Becker, M., Bender, T.B., Budde, M., Ebner, A., Feix, F., Figer, G.,
Heine, P., Jauss, A., Kehler, T., Keskin, M.T., Köhler, E., Kufferath, A., Langer, W., Lejsek, D.,
Petersen, C., Philipp, U., Sarikaya, A., Sauerstein, R., Schaarschmidt, M., Schenk, A., Volz,
P., Weiske, S., Winke, F., Winkelmann, H., Wollenhaupt, H., Wunderlich, K.: Natural gas and
renewable methane for powertrains: future strategies for a climate-neutral mobility. In: Vehicle
Development for Natural Gas and Renewable Methane, pp. 229–458. Springer, Cham (2016)
2. Allgeier, T., Haug, M., Frehoff, R., Weikert, M., Kröger, K., Langer, W., Förster, J., Thurso,
J., Wörsinger, J.: Gasoline engine management: systems and components. In: Operation of
Gasoline Engines on Natural Gas, pp. 122–135. Springer, Wiesbaden (2015)
3. Altmann, C., Beck, A.D., Hindenlang, F., Staudenmaier, M., Gassner, G.J., Munz, C.-D.: An
efficient high performance parallelization of a discontinuous Galerkin spectral element method.
Lect. Notes Comput. Sci. 7686, 37–47 (2013)
4. Beater, P.: Pneumatic Drives System Design, Modeling and Control. Springer, Berlin/London
5. Bell, I.H., Wronski, J., Quoilin, S., Lemort, V.: Pure and pseudo-pure fluid thermophysical
property evaluation and the open-source thermophysical property library CoolProp. Ind. Eng.
Chem. Res. 53(6), 2498–2508 (2014)
6. Boblest, S., Hempert, F., Hoffmann, M., Offenhäuser, P., Sonntag, M., Sadlo, F., Glass, C.W.,
Munz, C.-D., Ertl, T., Iben, U.: Toward a discontinuous Galerkin fluid dynamics framework
for industrial applications. In: High Performance Computing in Science and Engineering ’15,
pp. 531–545. Springer, Berlin/New York (2016)
7. Bolemann, T., Üffinger, M., Sadlo, F., Ertl, T., Munz, C.-D.: Direct visualization of piecewise
polynomial data. In: IDIHOM: Industrialization of High-Order Methods – A Top-Down
Approach, pp. 535–550. Springer, Cham (2015)
8. de Wiart, C., Hillewaert, K.: Development and validation of a massively parallel high-order
solver for DNS and LES of industrial flows. In: Kroll, N., Hirsch, C., Bassi, F., Johnston,
C., Hillewaert, K. (eds.) IDIHOM: Industrialization of High-Order Methods – A Top-Down
Approach. Volume 128 of Notes on Numerical Fluid Mechanics and Multidisciplinary Design,
pp. 251–292. Springer, Cham (2015)
9. Dumbser, M., Iben, U., Munz, C.-D.: Efficient implementation of high order unstructured
WENO schemes for cavitating flows. Comput. Fluids 86, 141–168 (2013)
10. Hempert, F., Hoffmann, M., Iben, U., Munz, C.-D.: On the simulation of industrial gas dynamic
applications with the discontinuous Galerkin spectral element method. J. Therm. Sci. 25(3), 1–
8 (2016)
11. Hindenlang, F., Gassner, G., Altmann, C., Beck, A., Staudenmaier, M., Munz, C.-D.: Explicit
discontinuous Galerkin methods for unsteady problems. Comput. Fluids 61, 86–93 (2012)
12. Huang, J., Crookes, R.: Assessment of simulated biogas as a fuel for the spark ignition engine.
Fuel 77(15), 1793–1801 (1998)
13. Martin, T., Cohen, E., Kirby, R.M.: Direct isosurface visualization of hex-based high-order
geometry and attribute representations. IEEE Trans. Vis. Comput. Graph. 18(5), 753–766
14. McTaggart-Cowan, G., Mann, K., Huang, J., Singh, A., Patychuk, B., Zheng, Z.X., Munshi, S.:
Direct injection of natural gas at up to 600 bar in a pilot-ignited heavy-duty engine. SAE Int.
J. Engines 8(3), 981–996 (2015)
15. Nelson, B., Kirby, R.M., Haimes, R.: GPU-based interactive cut-surface extraction from high-
order finite element fields. IEEE Trans. Vis. Comput. Graph. 17(12), 1803–1811 (2011)
16. Nelson, B., Liu, E., Kirby, R.M., Haimes, R.: ElVis: a system for the accurate and interactive
visualization of high-order finite element solutions. IEEE Trans. Vis. Comput. Graph. 18(12),
2325–2334 (2012)
17. Pagot, C., Osmari, D., Sadlo, F., Weiskopf, D., Ertl, T., Comba, J.: Efficient parallel vectors
feature extraction from higher-order data. Comput. Graph. Forum 30(3), 751–760 (2011)
18. Persson, P.-O., Peraire, J.: Sub-cell shock capturing for discontinuous Galerkin methods. In:
Proceedings of the American Institute of Aeronautics and Astronautics, Keystone, vol. 112
19. Sonntag, M., Munz, C.-D.: Shock capturing for discontinuous Galerkin methods using finite
volume subcells. In: Finite Volumes for Complex Applications VII-Elliptic, Parabolic and
Hyperbolic Problems. Volume 78 of Springer Proceedings in Mathematics & Statistics,
pp. 945–953. Springer, Cham (2014)
20. Sonntag, M., Munz, C.-D.: Efficient parallelization of a shock capturing for discontinuous
Galerkin methods using finite volume sub-cells. J. Sci. Comput. 1–28 (2016)
21. Vuorinen, V., Yu, J., Tirunagari, S., Kaario, O., Larmi, M., Duwig, C., Boersma, B.: Large-
eddy simulation of highly underexpanded transient gas jets. Phys. Fluids 25(1),
016101 (2013)
22. Westerhoff, M., Holtmeier, G.: Erdgas Die greifbare Chance. MTZ – Motortechnische
Zeitschrift 77(2), 8–13 (2016)
23. Yu, J., Vuorinen, V., Kaario, O., Sarjovaara, T., Larmi, M.: Visualization and analysis of the
characteristics of transitional underexpanded jets. Int. J. Heat Fluid Flow 44, 140–154 (2013)
Modeling of the Deformation Dynamics
of Single and Twin Fluid Droplets
Exposed to Aerodynamic Loads
Abstract Droplet deformation and breakup play a significant role in liquid fuel
atomization processes. The droplet behavior needs to be understood in detail
in order to derive simplified models for predicting the different processes in
combustion chambers. Therefore, the behavior of single droplets at low aerodynamic
loads was investigated using the Lagrangian, mesh-free Smoothed Particle
Hydrodynamics (SPH) method. The simulations presented in this paper focus on
the deformation dynamics of pure liquid droplets and of fuel droplets with water
added to their interior. The simulations have been run at two different relative
velocities.
As SPH is relatively new to Computational Fluid Dynamics (CFD), the pure
liquid droplet simulations are used to verify the SPH code against empirical
correlations available in the literature. Furthermore, an enhanced characteristic
deformation time is proposed, leading to a good description of the initial temporal
deformation behavior for all investigated test cases. Subsequently, the deformation
behavior of twin-fluid droplets is compared to the corresponding single-fluid droplet
simulations. The results show an influence of the added water on the deformation
history. However, it is found that the droplet behavior can be characterized by the
pure fuel Weber number.
1 Introduction
in a jet-in-crossflow configuration, for example. Hence, they are crucial for the
subsequent evaporation and combustion processes. Various experimental investigations
of the behavior of droplets under aerodynamic loads have been conducted in the past,
e.g. [11, 12, 14]. However, the experimental setups, which rely either on a shock
tube or on a free-falling droplet in a crossflow, do not allow for a detailed insight
into the phenomena involved in the deformation and breakup process. Therefore,
numerical investigations have been conducted to gain insight into the underlying
physics, e.g. [17, 25, 30].
In order to predict all processes occurring in combustion chambers, from the
liquid fuel injection to the combustion, Euler-Lagrange methods are commonly
used. These methods predict the air flow on an Eulerian mesh, while the liquid
fuel is inserted as Lagrangian parcels. To describe the behavior of the liquid fuel
droplets, simplified models were derived from experimental and detailed numerical
investigations. The most common models describing the initial deformation phase
are the Normal-Mode (NM) model and the Non-linear Taylor Analogy Breakup
(NLTAB) model [27, 28], a nonlinear extension of the well-known TAB model
proposed by O’Rourke [24]. All models assume that, after reaching a critical
deformation, the Lagrangian parcel undergoes secondary breakup, which is
described by empirical models as well (e.g. [2]).
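For orientation, the TAB model mentioned above treats droplet deformation as a forced, damped spring-mass system. The sketch below integrates the standard TAB equation for a suddenly applied load with a semi-implicit Euler scheme and returns the peak normalized deformation y; the model predicts breakup once y exceeds 1. The constants C_F = 1/3, C_k = 8, C_d = 5 and C_b = 1/2 are the values commonly quoted for the original TAB model and are assumed here; the NLTAB extension and the NM model are not represented, and the droplet properties in the example are hypothetical.

```python
# Constants of the standard TAB model as commonly quoted (assumed here, not taken from [24]).
C_F, C_K, C_D, C_B = 1.0 / 3.0, 8.0, 5.0, 0.5


def tab_peak_deformation(rho_g, rho_l, u_rel, r, sigma, mu_l, t_end, dt=1e-8):
    """Integrate y'' = C_F/C_B rho_g u^2/(rho_l r^2) - C_K sigma/(rho_l r^3) y - C_D mu_l/(rho_l r^2) y'
    for a suddenly applied load and return the peak normalized deformation y
    (the TAB model predicts breakup once y exceeds 1)."""
    forcing = C_F / C_B * rho_g * u_rel**2 / (rho_l * r**2)
    stiffness = C_K * sigma / (rho_l * r**3)
    damping = C_D * mu_l / (rho_l * r**2)
    y, dy, y_peak = 0.0, 0.0, 0.0              # initially spherical droplet at rest
    for _ in range(int(t_end / dt)):
        ddy = forcing - stiffness * y - damping * dy
        dy += ddy * dt                         # semi-implicit Euler: update velocity first,
        y += dy * dt                           # then position with the new velocity
        y_peak = max(y_peak, y)
    return y_peak


if __name__ == "__main__":
    # Hypothetical droplet: r = 30 um, rho_l = 800 kg/m^3, sigma = 0.025 N/m, gas density 10 kg/m^3
    args = dict(rho_g=10.0, rho_l=800.0, u_rel=20.0, r=30e-6, sigma=0.025, t_end=1e-4)
    print(tab_peak_deformation(mu_l=0.0, **args), tab_peak_deformation(mu_l=1e-3, **args))
```

With zero viscosity the peak is twice the static deflection, y_peak = ρ_g u² r/(6σ), so y reaches 1 at ρ_g u² r/σ = 6, i.e., at a diameter-based Weber number of 12. Viscosity lowers the peak, which is why the damped run returns a smaller value.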
Such empirical models assume that the droplet is exposed to a quasi-steady
aerodynamic load. Therefore, the history and the temporal evolution of the droplet
deformation are not considered, which may lead to unphysical predictions of the
droplet drag. In the present paper, the temporal evolution of droplet deformation is
investigated at low aerodynamic loads using the Lagrangian, mesh-free Smoothed
Particle Hydrodynamics (SPH) method. The weakly compressible SPH code in use
was developed and validated in order to predict the atomization process in gas
turbine engines [13]. The main advantage of SPH over mesh-based methods is the
inherent advection of the interface without the need for an interface-capturing
algorithm.
Furthermore, the effect of water added to the interior of the liquid fuel droplet is
investigated. Preliminary tests performed in heavy-duty gas turbines showed that
the addition of water has a positive effect on the thermal NOx emissions [18]. The
addition of water to the fuel oil not only decreases the combustion temperature due
to the heat of evaporation, but also has a positive effect on the atomization process
[8].
Therefore, the deformation of single, emulsified fuel droplets with an initial
diameter of d0 ≈ 60 µm and different water volume fractions φ = V_W/(V_W + V_Oil)
(φ = 0, φ = 0.23, φ = 1), exposed to different air velocities (|v_Air| = 22.5 m/s and
|v_Air| = 24.34 m/s), is investigated. Furthermore, the placement of the water inside
the droplet is varied to determine its influence on the droplet
Modeling of Single and Twin Fluid Droplets Exposed to Aerodynamic Loads 303
2 Methodology
The fact that every spatial function f(x) can be exactly reproduced by the
convolution of the function itself with the Dirac delta function δ(x − x′) is the basis
of the SPH interpolation:
$$f(\mathbf{x}) = \int f(\mathbf{x}')\,\delta(\mathbf{x}-\mathbf{x}')\,d\mathbf{x}' \qquad (1)$$
where V is the volume, m the mass and ρ the density of the particles.
304 L. Wieth et al.
This is only valid if the domain of interpolation of a particle is not truncated by the
boundary of the computational domain.
This formulation conserves mass exactly and prevents a non-physical density gradient
over the interface of multi-phase flows, in contrast to other formulations [16].
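The discrete counterpart of Eq. (1) replaces the Dirac delta by a smoothing kernel W and the integral by a sum over neighboring particles, f(x_i) ≈ Σ_j V_j f_j W(x_i − x_j, h). The 1D sketch below uses the cubic B-spline kernel as an example; the kernel used in the ITS code is not stated here, so this choice is an assumption. On a uniform particle lattice the kernel sum is 1 in the interior and a linear field is reproduced, whereas at the edge of the lattice the truncated support gives a sum below 1. This illustrates why the interpolation is only valid away from the domain boundary.

```python
def cubic_spline_1d(r: float, h: float) -> float:
    """Cubic B-spline kernel W(r, h) in 1D: support radius 2h, normalization 2/(3h)."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_interpolate(x_i, positions, volumes, values, h):
    """Discrete form of Eq. (1): f(x_i) ~ sum_j V_j f_j W(x_i - x_j, h), with V_j = m_j / rho_j."""
    return sum(v * f * cubic_spline_1d(x_i - x, h) for x, v, f in zip(positions, volumes, values))


if __name__ == "__main__":
    dx = 0.1
    xs = [k * dx for k in range(21)]                     # uniform particles on [0, 2]
    vols = [dx] * len(xs)
    ones = [1.0] * len(xs)
    linear = [2.0 * x + 1.0 for x in xs]
    print(sph_interpolate(1.0, xs, vols, ones, h=dx))    # interior: kernel sum ~ 1
    print(sph_interpolate(1.0, xs, vols, linear, h=dx))  # reproduces f(1.0) = 3
    print(sph_interpolate(0.0, xs, vols, ones, h=dx))    # boundary: truncated support, < 1
```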
Various approaches for the approximation of gradients, such as the pressure gradient
term ∇p/ρ in the momentum equation, are available in the literature. An
The viscous term of the momentum equation, ∇·τ/ρ, contains second-order
derivatives. Its approximation was introduced to SPH by Morris et al. [23]. It is
derived from the inter-particle averaged shear stress using a combined viscosity:
$$\left(\frac{\nabla\cdot\boldsymbol{\tau}}{\rho}\right)_i = \sum_j m_j\,\frac{(\mu_i+\mu_j)\,\mathbf{r}_{ij}\cdot\nabla W(\mathbf{x}_i-\mathbf{x}_j,h)}{\rho_i\,\rho_j\,(r_{ij}^2+\eta^2)}\,\mathbf{v}_{ij} \qquad (6)$$
Here μ denotes the dynamic viscosity, r_ij is the distance vector between the
particles, V denotes the particle volume, v_ij denotes the velocity difference, and η
is a small parameter that avoids singularities.
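A 1D version of Eq. (6) is enough to check its consistency. The sketch below uses a cubic B-spline kernel (an assumed choice) and assumes η = 0.1 h with uniform density and viscosity; function names are illustrative. For a linear velocity field the viscous term vanishes at an interior particle, and for v = x² it returns approximately (μ/ρ) ∂²v/∂x² = 2μ/ρ, with a deficit of about 1 % caused by η.

```python
def cubic_spline_grad_1d(r: float, h: float) -> float:
    """dW/dr of the 1D cubic B-spline kernel (odd in r, zero beyond 2h)."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)
    if q < 1.0:
        dw_dq = -3.0 * q + 2.25 * q**2
    elif q < 2.0:
        dw_dq = -0.75 * (2.0 - q) ** 2
    else:
        return 0.0
    sign = 1.0 if r >= 0.0 else -1.0
    return sign * sigma * dw_dq / h


def morris_viscous_term(i, x, v, m, rho, mu, h, eta=None):
    """Eq. (6) in 1D: sum_j m_j (mu_i + mu_j) r_ij dW/dx_i / (rho_i rho_j (r_ij^2 + eta^2)) (v_i - v_j)."""
    eta2 = (0.1 * h) ** 2 if eta is None else eta**2   # eta = 0.1 h assumed as default
    total = 0.0
    for j in range(len(x)):
        if j == i:
            continue
        r_ij = x[i] - x[j]
        grad_w = cubic_spline_grad_1d(r_ij, h)         # gradient of W(x_i - x_j, h) w.r.t. x_i
        total += (m[j] * (mu[i] + mu[j]) * r_ij * grad_w
                  / (rho[i] * rho[j] * (r_ij**2 + eta2)) * (v[i] - v[j]))
    return total


if __name__ == "__main__":
    dx, n = 0.1, 21
    xs = [k * dx for k in range(n)]
    rho, mu = [1000.0] * n, [1e-3] * n
    m = [1000.0 * dx] * n                              # m = rho * V in 1D
    print(morris_viscous_term(10, xs, [2.0 * x for x in xs], m, rho, mu, dx))   # ~ 0
    print(morris_viscous_term(10, xs, [x * x for x in xs], m, rho, mu, dx))     # ~ 2 mu / rho = 2e-6
```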
In our SPH approach, the weakly compressible SPH scheme is used, meaning
that incompressible liquids are modeled as weakly compressible. Therefore, the
pressure p and the density ρ are linked through an equation of state. In this approach,
the density fluctuations are limited to Δρ/ρ = 1 % by imposing an artificial sound
speed c that is approximately ten times higher than the maximum velocity |v_max|.
In general, this artificial sound speed is much lower than the physical one. In this
way, the very small time steps that the Courant-Friedrichs-Lewy (CFL) criterion
would require with the physical sound speed are avoided.
For the present investigation, the equation of state in use is a modified Tait
equation, which was originally derived for water [3]:
$$p = \frac{\rho_{\mathrm{nom}}\,c^2}{\gamma}\left[\left(\frac{V_{\mathrm{nom}}}{V}\right)^{\gamma} - 1\right] \qquad (7)$$
Here ρ_nom is the reference density, V_nom is the reference volume and γ is the
polytropic exponent.
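The interplay of artificial sound speed, equation of state and time step can be sketched in a few lines. The code evaluates Eq. (7), picks c as ten times the maximum velocity, and compares the acoustic CFL time step with the one obtained for a physical sound speed of about 1480 m/s (water). The polytropic exponent γ = 7, the CFL number 0.25, the smoothing length h ≈ dx = 2.5 µm and the choice of v_max equal to the higher inlet velocity are assumptions for illustration. At 1 % compression the Tait pressure is about 1.03 % of ρ_nom c², consistent with the Δρ/ρ ≈ Ma² scaling behind the factor of ten.

```python
def artificial_sound_speed(v_max: float, factor: float = 10.0) -> float:
    """WCSPH choice c ~ 10 |v_max|: Mach ~ 0.1, density fluctuations of order Ma^2 = 1 %."""
    return factor * v_max


def tait_pressure(rho_nom, c, V_nom, V, gamma=7.0):
    """Eq. (7): p = rho_nom c^2 / gamma * ((V_nom / V)^gamma - 1); gamma = 7 is assumed."""
    return rho_nom * c**2 / gamma * ((V_nom / V) ** gamma - 1.0)


def cfl_time_step(h, c, v_max, cfl=0.25):
    """Acoustic CFL limit dt <= cfl * h / (c + |v_max|); cfl = 0.25 is an assumed constant."""
    return cfl * h / (c + v_max)


if __name__ == "__main__":
    v_max, h = 24.34, 2.5e-6                  # higher inlet velocity, h ~ dx (both assumed)
    c_art = artificial_sound_speed(v_max)
    print(f"c = {c_art:.1f} m/s")
    print(f"dt (artificial c) = {cfl_time_step(h, c_art, v_max):.2e} s")
    print(f"dt (c = 1480 m/s) = {cfl_time_step(h, 1480.0, v_max):.2e} s")
    print(f"p at 1 % compression = {tait_pressure(800.0, c_art, 1.0, 1.0 / 1.01):.0f} Pa")
```

Under these assumptions, the artificial sound speed enlarges the admissible time step by a factor of roughly 5.6 compared to the physical sound speed.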
For the prediction of the complex physics leading to liquid atomization, the
modeling of surface tension effects plays a crucial role. This is because droplet
deformation and disintegration are mainly determined by the force balance between
the microscopic surface force acting in tangential and normal direction at the
liquid-air interface and the shear forces induced on the droplet by the air flow.
This balance is described by the dimensionless Weber number We, which can be
used to estimate whether a droplet is exposed to a super- or subcritical
aerodynamic load.
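A minimal sketch of the Weber number follows, assuming the common diameter-based definition We = ρ_g u_rel² d0/σ (the text does not spell out the definition here). For the same fluids and droplet size, We scales with the square of the relative velocity, so going from 22.5 m/s to 24.34 m/s raises We by a factor of about 1.17. This is consistent with the Weber numbers of about 10 and 12 (fuel) and 4.1 and 4.8 (water) given for the two velocities in the numerical setup. The air density helper assumes an ideal gas with R = 287 J/(kg K), which need not match the property values used in the simulations.

```python
R_AIR = 287.0  # specific gas constant of air [J/(kg K)], ideal-gas assumption


def air_density(p: float, T: float) -> float:
    """Ideal-gas density [kg/m^3] at pressure p [Pa] and temperature T [K]."""
    return p / (R_AIR * T)


def weber(rho_gas: float, u_rel: float, d0: float, sigma: float) -> float:
    """Diameter-based Weber number: aerodynamic pressure vs. surface tension."""
    return rho_gas * u_rel**2 * d0 / sigma


def weber_ratio(u_a: float, u_b: float) -> float:
    """Same fluids and droplet size: We scales with the square of the relative velocity."""
    return (u_b / u_a) ** 2


if __name__ == "__main__":
    print(f"rho_air(2 MPa, 700 K) ~ {air_density(2e6, 700.0):.2f} kg/m^3")
    print(f"We(24.34 m/s) / We(22.5 m/s) = {weber_ratio(22.5, 24.34):.3f}")
```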
In our SPH code the surface tension is represented by the Continuum Surface
Force (CSF) model, originally introduced by Brackbill et al. [4] in the framework of
the VoF method. The CSF model adopted in our approach was proposed by Adami
et al. [1]. The surface tension force is represented as a continuous force acting over
a volume adjacent to the interface instead of a force acting directly on the surface
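of the droplet. Therefore, the surface tension is converted into a volumetric force F_SF
using a normalized delta function δ_S, which has its peak at the interface:
$$\mathbf{F}_{SF} = \sigma\,\kappa\,\hat{\mathbf{n}}\,\delta_S \qquad (8)$$
$$\mathbf{n}_i = \frac{1}{V_i}\sum_j \left(V_i^2 + V_j^2\right)\frac{\rho_i}{\rho_i+\rho_j}\,\nabla W(\mathbf{x}_i-\mathbf{x}_j,h) \qquad (9)$$
$$\langle \mathbf{f}\rangle_{i,SF} = \frac{\mathbf{F}_{i,SF}}{\rho_i} = \frac{\sigma_i\,\kappa_i\,\hat{\mathbf{n}}_i\,|\mathbf{n}_i|}{\rho_i} = -\frac{\sigma_i}{\rho_i}\left(\nabla\cdot\hat{\mathbf{n}}_i\right)\mathbf{n}_i \qquad (10)$$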
Wetting effects, which for example highly influence the primary atomization, are
accounted for by using the model presented by Wieth et al. [29].
undercuts a certain cutoff distance, a repulsive force is applied [22]. This additional
force resembles a Lennard-Jones potential and acts along the line connecting the
centers of the fluid and wall particles under consideration.
Due to the Lagrangian nature of SPH, permeable boundary conditions cannot be
handled as straightforwardly as in Eulerian methods. Particles have to be generated
at the inlet and removed at the outlet at a rate equivalent to the physical flow
rate. This is achieved by extending the numerical domain by so-called buffer zones.
The buffer zones are filled with particles that take part in the approximation. The
desired boundary conditions for the velocity u, the pressure p and the temperature T
are imposed onto these particles. The particles in the buffer zones are controlled by
markers, which do not take part in the approximation and are positioned right at
the boundary surface. This procedure is suitable for arbitrarily shaped boundaries
and enables the generation of particles at the inlet and the removal of particles at
the outlet. A detailed description of the permeable boundary method is given by
Braun et al. [6].
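The bookkeeping behind inlet and outlet buffer zones can be illustrated in one dimension. In the sketch below, buffer particles move with the imposed inlet velocity; when one crosses the inlet it is handed over to the fluid and a new buffer particle is placed one buffer length upstream, while fluid particles leaving the domain are deleted. This is a simplified, hypothetical version: the marker-based control of Braun et al. [6], the imposition of p and T, and the SPH update of the fluid particles (replaced here by uniform advection) are not modeled. Particles are created at the rate u_in/dx, and in a steady state the number of fluid particles stays constant.

```python
def run_inflow(u_in, dx, buffer_len, domain_len, dt, n_steps):
    """1D sketch of an inlet buffer zone and an outlet.

    Buffer particles (x < 0) move with the prescribed inlet velocity. A buffer particle that
    crosses the inlet x = 0 becomes a fluid particle and is replaced by a new buffer particle
    one buffer length upstream. Fluid particles beyond x = domain_len are removed.
    Returns (particles created, particles removed, buffer size, fluid particles left).
    """
    buffer = [-(k + 0.5) * dx for k in range(round(buffer_len / dx))]
    fluid, created, removed = [], 0, 0
    for _ in range(n_steps):
        buffer = [x + u_in * dt for x in buffer]   # boundary condition: imposed velocity
        fluid = [x + u_in * dt for x in fluid]     # stand-in for the actual SPH update
        entering = [x for x in buffer if x >= 0.0]
        buffer = [x for x in buffer if x < 0.0] + [x - buffer_len for x in entering]
        fluid += entering
        created += len(entering)
        kept = [x for x in fluid if x <= domain_len]
        removed += len(fluid) - len(kept)
        fluid = kept
    return created, removed, len(buffer), len(fluid)


if __name__ == "__main__":
    print(run_inflow(u_in=1.0, dx=0.1, buffer_len=0.4, domain_len=1.0, dt=0.01, n_steps=300))
```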
Fig. 2 Schematic of the force balance at the triple line for a general three-phase interaction
state leading to characteristic static contact angles inside the different phases. These
angles are indexed by the interfacial forces that span them. The force balance at the
triple line results in a set of three equations, which relate the interfacial tension
coefficients to the static contact angles. The geometric interpretation of this set of
equations is known as the Neumann triangle. For dynamic contact angle simulations,
the static contact angles are set as the initial condition.
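The Neumann triangle can be evaluated directly. If t_ab denotes the unit tangent of the a-b interface pointing away from the triple line, the balance σ12 t12 + σ13 t13 + σ23 t23 = 0 gives, for the angle θ1 enclosed by phase 1, σ23² = σ12² + σ13² + 2 σ12 σ13 cos θ1, and analogously for the other phases. The sketch below is a textbook construction, not the authors' implementation; it raises an error when the tensions violate the triangle inequality, in which case no static triple line exists. The tension values used in the checks are hypothetical.

```python
import math


def neumann_angles(s12: float, s13: float, s23: float):
    """Static angles [deg] enclosed by phases 1, 2 and 3 at the triple line.

    Force balance s12*t12 + s13*t13 + s23*t23 = 0, with t_ab the unit tangent of the a-b
    interface pointing away from the triple line (Neumann triangle)."""
    def enclosed_angle(opposite: float, a: float, b: float) -> float:
        # opposite^2 = a^2 + b^2 + 2ab cos(theta), theta = angle between tensions a and b
        c = (opposite**2 - a**2 - b**2) / (2.0 * a * b)
        if abs(c) > 1.0:
            raise ValueError("tensions violate the triangle inequality: no static triple line")
        return math.degrees(math.acos(c))

    theta1 = enclosed_angle(s23, s12, s13)   # phase 1 lies between interfaces 1-2 and 1-3
    theta2 = enclosed_angle(s13, s12, s23)   # phase 2 lies between interfaces 1-2 and 2-3
    theta3 = enclosed_angle(s12, s13, s23)   # phase 3 lies between interfaces 1-3 and 2-3
    return theta1, theta2, theta3


if __name__ == "__main__":
    print(neumann_angles(1.0, 1.0, 1.0))   # equal tensions: 120 degrees each
```

The three angles always add up to 360 degrees, since each equals 180 degrees minus the corresponding interior angle of the force triangle.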
In our approach, the surface tension is represented by the CSF model, which requires
an additional acceleration in the momentum equation that primarily depends on the
interface normals n and their divergence (curvature), cf. Eq. (10). In the vicinity of
the triple line, the normal vectors are adjusted to introduce the desired static contact
angles. Up to now, the interaction of three liquids and/or gases has not been modeled
on the basis of the CSF model in the SPH framework. Hu and Adams [15] showed
the applicability of a different approach, the Continuum Surface Stress (CSS) model,
to three-phase interaction problems.
A schematic representation of the normal vector correction approach for a
general three-phase interaction is shown in Fig. 3.
The correction of the normal vectors is only applied to particles that are close
to the triple line and have particles of the other phases within their radius of
influence, as is the case for the black particle indicated in Fig. 3. For each of
those particles, two interface normal vectors n1 and n2 are calculated using (9).
These span an angle α, which does not necessarily represent the contact angle.
In order to impose the correct static contact angle, α has to be corrected as depicted
in Fig. 3. Specifically, one of the two normal vectors, in this case n2, is rotated by
an angle β to the corrected normal vector n2,corr. Which normal vector is rotated is
decided by the strength of the kernel support; a higher kernel support is assumed
to be more trustworthy. Following the rotation of the normal vector, the general
approximations are used to calculate the curvature and then the acceleration.
The model presented yields excellent results (relative errors < 5 %) in 2D and 3D
for the formation of static contact angles in a water-air-alkane system. Details of the
model and its validation will be presented in a future publication.
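The rotation step described above can be expressed with Rodrigues' rotation formula. The sketch below keeps n1 fixed and rotates n2 about the axis n1 × n2, i.e., within the plane spanned by both normals, by β = α_target − α, so that the two vectors enclose the prescribed angle afterwards. How α_target follows from the static contact angle depends on the orientation conventions of the normals and is left as an input, as is the choice of which normal to rotate (decided by kernel support in the method). The helper name is hypothetical.

```python
import numpy as np


def correct_normal(n1, n2, target_angle):
    """Rotate n2 within the plane spanned by n1 and n2 so that the angle between n1 and the
    returned unit vector equals target_angle [rad]; n1 stays fixed (hypothetical helper)."""
    n1 = np.asarray(n1, dtype=float) / np.linalg.norm(n1)
    n2 = np.asarray(n2, dtype=float) / np.linalg.norm(n2)
    alpha = np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0))   # current enclosed angle
    axis = np.cross(n1, n2)
    axis_norm = np.linalg.norm(axis)
    if axis_norm < 1e-12:
        raise ValueError("n1 and n2 are (anti)parallel: rotation plane undefined")
    k = axis / axis_norm
    beta = target_angle - alpha                              # required correction angle
    # Rodrigues' formula; k is perpendicular to n2, so the k (k . n2) term vanishes.
    return n2 * np.cos(beta) + np.cross(k, n2) * np.sin(beta)


if __name__ == "__main__":
    n2_corr = correct_normal([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], np.radians(100.0))
    print(n2_corr, np.degrees(np.arccos(n2_corr[0])))       # enclosed angle: 100 degrees
```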
Fig. 3 Illustration of the normal vector correction for the three liquid interaction case
4 Numerical Setup
The simulations conducted for this study focus on the deformation behavior of
single droplets in an air flow. A confined channel domain with permeable boundary
conditions (inlet, outlet) is used. A sketch of the investigated geometry is depicted
in Fig. 4.
The channel has a square cross-section with a height of h_c ≈ 8.3 d0 = 0.5 mm
and a length of l_c ≈ 33.3 d0 = 2 mm, where d0 denotes the initial droplet diameter.
The channel is confined in the y- and z-directions by moving walls, which have the
same velocity as prescribed at the inlet. A fixed velocity u_in in the x-direction is
prescribed at the inlet and a fixed reference pressure p_out at the outlet. The velocities
imposed in this study are u_in = 22.5 m/s and u_in = 24.34 m/s. A summary of the
fluid properties can be found in Table 1. The properties of air resemble combustion
chamber conditions at T = 700 K and p = 2 MPa.
The single droplet with a diameter of d0 ≈ 60 µm is initialized after the air flow
has reached a quasi-steady state. The droplet is created by a “stamp” once the air
flow has settled: within the stamp, the fluid properties are changed from air to the
desired liquid. The air is then put to rest for a short time, during which the spurious
oscillations caused by the initialization of the droplet relax and a steady spherical
droplet forms. Thereafter, the air and wall particles are set to the imposed inlet
velocity, so that the droplet is exposed to a sudden aerodynamic load. This method
provides a fast and simple way to create different types of droplets. In this paper,
fuel droplets with water volume fractions of φ = 0, φ = 0.23 and φ = 1 are
investigated. This results in single-fluid droplets for φ = 0 (fuel) and φ = 1 (water),
yielding Weber numbers of We_fuel ≈ 10 and We_fuel ≈ 12 for the pure fuel and
We_water ≈ 4.1 and We_water ≈ 4.8 for water. In the case of φ = 0.23, a single
water droplet is added to the interior of the fuel droplet. The influence of the
placement of the water is investigated for different scenarios: centered, off center
downstream (in flow direction, centered in the yz-plane), off center upstream (against
the flow direction, centered in the yz-plane), and off center perpendicular to the flow
in both y-directions (centered in the xz-plane). Overall, this results in 14 simulations,
each with approximately 34 million particles, using an initial particle distance of
dx = 2.5 µm. This spatial resolution is the absolute minimum required to correctly
capture the physics leading to droplet deformation or breakup. A finer spatial
resolution would give even more reliable results, but the computational effort would
increase significantly. To verify the SPH code against common correlations,
additional single-fluid droplet simulations were conducted, which are not detailed
here.
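The resolution figures above can be cross-checked with simple arithmetic. The sketch below assumes a cubic particle lattice with spacing dx filling the channel and, for a water volume fraction of 0.23, a single enclosed water droplet holding 23 % of the droplet volume. It gives 24 particles per initial diameter, about 3.2 × 10⁷ particles for the channel volume alone, and an inner water droplet diameter of about 36.8 µm. The remaining difference to the approximately 34 million particles reported presumably stems from wall and buffer-zone particles, which is an assumption.

```python
def resolution_estimates(d0, dx, hc, lc, water_fraction):
    """Return (particles per initial diameter, particles filling the channel,
    diameter of an enclosed water droplet holding `water_fraction` of the droplet volume)."""
    per_diameter = d0 / dx
    channel_particles = hc * hc * lc / dx**3          # square cross-section, cubic lattice
    d_water = d0 * water_fraction ** (1.0 / 3.0)      # volume share -> diameter ratio
    return per_diameter, channel_particles, d_water


if __name__ == "__main__":
    per_d, n_channel, d_w = resolution_estimates(60e-6, 2.5e-6, 0.5e-3, 2e-3, 0.23)
    print(f"{per_d:.0f} particles per d0, {n_channel:.2e} channel particles, d_water = {d_w * 1e6:.1f} um")
```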
5 Computational Performance
The simulations were conducted using the SPH code developed at the Institut für
Thermische Strömungsmaschinen (ITS) [13]. The parallel performance of this code
was evaluated by strong scalability tests performed on the ForHLR I cluster at the
Steinbuch Centre for Computing in Karlsruhe [9]. The cluster is equipped with 512
thin nodes, each with two deca-core Intel® Xeon® E5-2670 v2 processors. The
nodes are connected by an InfiniBand 4x FDR interconnect. For the scalability test,
the SPH code was compared to the grid-based VoF solver of OpenFOAM® 2.3.0
(interFoam) and to another commercial CFD package. The investigated domain was
2D, and turbulence modeling was neglected. Details of this study can be found in
[5]. All simulations were run for 1 h of wall time, which easily enables a performance
test over three orders of magnitude; for this purpose, the simulation domain was
divided into 1 up to 1000 subdomains. The results for the speedup and parallel
efficiency are depicted in Fig. 5. They are normalized by the performance of one
node (20 cores).
The results for OpenFOAM (squares) and the commercial software (diamonds)
show good scaling for up to 2 nodes (40 cores). The speedup of the SPH code is
almost ideal for up to 200 cores and has not yet reached saturation at 1000 cores
(cf. Fig. 5a). Consequently, the parallel efficiency of the SPH code stays above 0.9
up to 200 cores and is still above 0.6 at 1000 cores (cf. Fig. 5b). The efficiency
curves of OpenFOAM and the commercial software indicate that OpenFOAM has an
excellent serial performance, while this is questionable for the commercial software.
Beyond 100 cores, the efficiency of both codes is severely reduced and the speedup
saturates. This results in a stagnation (commercial software) or even an increase
(OpenFOAM) of the time needed for the simulations when more computational
resources are used.
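The normalization used in Fig. 5 can be written down explicitly. In the sketch below, performance is measured as the amount of work completed within the fixed 1 h wall-time window (for example, the number of time steps; this metric is an assumption), speedup is taken relative to one node of 20 cores, and parallel efficiency is the speedup divided by the number of nodes. The throughput values in the example are hypothetical and only chosen to mimic efficiencies of about 0.9 at 200 cores and 0.6 at 1000 cores.

```python
def speedup_and_efficiency(work_done, base_cores=20):
    """work_done: {cores: work completed within the fixed wall-time window}.
    Returns {cores: (speedup, parallel efficiency)} relative to one node of base_cores."""
    base = work_done[base_cores]
    result = {}
    for cores, work in sorted(work_done.items()):
        speedup = work / base                    # ideal value: cores / base_cores
        result[cores] = (speedup, speedup / (cores / base_cores))
    return result


if __name__ == "__main__":
    hypothetical = {20: 100.0, 40: 200.0, 200: 920.0, 1000: 3000.0}   # illustrative only
    for cores, (s, e) in speedup_and_efficiency(hypothetical).items():
        print(f"{cores:5d} cores: speedup {s:5.1f} (ideal {cores / 20:4.0f}), efficiency {e:.2f}")
```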
Fig. 5 Comparison of parallel performance for SPH, OpenFOAM and a commercial software.
The data is normalized by the performance for one node with 20 cores. (a) Comparison of speedup
per node. (b) Comparison of parallel efficiency per node
As is evident, the SPH code in use shows a strong parallel performance even at
high core counts. Therefore, each simulation was conducted on 520 cores, i.e., 26
nodes. Depending on the simulated time needed to advect the droplet through the
whole domain, the average computational cost of each prediction is
t_Comp ≈ 17,000 CPUh for the cases with We_fuel = 10 and t_Comp ≈ 14,500 CPUh
for the cases with We_fuel = 12. Additionally, the computation of the
quasi-stationary air solution as well as the decomposition and reconstruction of the
simulation domain consumed computational resources. However, these are small
compared to the resources required for the actual
The aerodynamic loads chosen for this investigation are characterized by Weber
numbers below 12 for the pure liquids. In this We range, droplets are expected to
show only oscillatory deformation; breakup can only occur due to natural
oscillations of the droplet [11]. The temporal evolution of the deformation
predicted by SPH for a pure fuel droplet at We_fuel = 10 is depicted in Fig. 6. The
air and wall particles are omitted for the sake of clarity.
The aerodynamic load, indicated by black arrows, impinges on the initially
spherical drop in a shock-like manner. This deforms the droplet into a flat disc.
Thereafter, surface tension causes the droplet to contract. The high dynamic
viscosity of the fluid prevents the droplet from elongating in the flow direction,
yielding a spherical shape at the turning point of the deformation. Then the drop
starts to flatten again due to the pressure forces of the air flow around the droplet
and the surface tension force. Overall, this leads to an oscillatory deformation that
is strongly damped by the viscosity of the fuel. A similar oscillatory deformation
was found for the other pure liquids. As expected from their higher surface tension
and lower dynamic viscosity, the water droplets oscillate at a higher frequency and
are damped less.
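The expected trend can be illustrated with the classical linear theory of a free drop: Rayleigh's natural frequency, $\omega^2 = n(n-1)(n+2)\,\sigma/(\rho_l R^3)$, and Lamb's viscous decay time, $\tau = \rho_l R^2/((n-1)(2n+1)\mu_l)$, here for the fundamental mode $n = 2$. This theory neglects the surrounding gas and assumes small amplitudes, so it only indicates tendencies for the loaded droplets. The water properties are approximate room-temperature values; the fuel properties are hypothetical placeholders, not those of the study.

```python
import math

def rayleigh_frequency_hz(sigma, rho_l, radius, n=2):
    """Natural oscillation frequency of mode n of an inviscid free drop (Rayleigh)."""
    omega_sq = n * (n - 1) * (n + 2) * sigma / (rho_l * radius ** 3)
    return math.sqrt(omega_sq) / (2.0 * math.pi)

def lamb_decay_time(rho_l, mu_l, radius, n=2):
    """e-folding time of the mode-n amplitude of a weakly viscous drop (Lamb)."""
    return rho_l * radius ** 2 / ((n - 1) * (2 * n + 1) * mu_l)

R = 30e-6  # radius of a droplet with d0 = 60 um
water = dict(sigma=0.072, rho_l=998.0, mu_l=1.0e-3)  # approximate room-temperature values
fuel = dict(sigma=0.025, rho_l=800.0, mu_l=5.0e-3)   # hypothetical placeholder values

for name, p in (("water", water), ("fuel", fuel)):
    f = rayleigh_frequency_hz(p["sigma"], p["rho_l"], R)
    tau = lamb_decay_time(p["rho_l"], p["mu_l"], R)
    print(f"{name}: f = {f / 1e3:.1f} kHz, tau = {tau * 1e6:.0f} us")
```

A higher surface tension raises the frequency, and a lower viscosity lengthens the decay time (i.e. weaker damping), which matches the qualitative behavior of the water droplets.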
Overall, the qualitatively observed deformation dynamics of the single fluid
droplets reproduce the behavior known from experimental investigations very well.
To verify that SPH also predicts the drop deformation correctly in quantitative terms,
the predictions are compared to empirical correlations in the following.
Hsiang and Faeth [14] experimentally determined the maximum extent of the droplet
perpendicular to the flow direction, $d_\mathrm{cross,max}$, as well as the minimum extent
in the flow direction, $d_\mathrm{str,min}$. They found that the droplet extent perpendicular
to the flow increases almost linearly with time until the maximum is reached. On the
basis of a phenomenological analysis considering the surface tension and pressure
forces, together with their experimental results, they proposed the following correlation:
$$\frac{d_\mathrm{cross,max}}{d_0} = \frac{d_0}{d_\mathrm{str,min}} = 1 + 0.19\,\mathrm{We}^{0.5} \qquad (11)$$
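Correlation (11) is straightforward to evaluate. The sketch below returns both predicted extents for a given We and checks the validity range claimed for the correlation (We < 100, Oh < 0.1). It assumes the usual definition of the Ohnesorge number, $\mathrm{Oh} = \mu_l/\sqrt{\rho_l \sigma d_0}$; the water properties in the usage lines are approximate values, and the function names are illustrative.

```python
import math

def ohnesorge(mu_l, rho_l, sigma, d0):
    """Oh = mu_l / sqrt(rho_l * sigma * d0): viscous vs. surface tension and inertial forces."""
    return mu_l / math.sqrt(rho_l * sigma * d0)

def hsiang_faeth_extents(we, d0):
    """Correlation (11): d_cross,max / d0 = d0 / d_str,min = 1 + 0.19 * We**0.5."""
    ratio = 1.0 + 0.19 * math.sqrt(we)
    return d0 * ratio, d0 / ratio  # (d_cross,max, d_str,min)

def within_validity(we, oh):
    """Validity range claimed for correlation (11)."""
    return we < 100.0 and oh < 0.1

if __name__ == "__main__":
    d0 = 60e-6
    oh_water = ohnesorge(1.0e-3, 998.0, 0.072, d0)  # approximate water properties
    d_cross, d_str = hsiang_faeth_extents(10.0, d0)
    print(f"Oh = {oh_water:.4f}, valid = {within_validity(10.0, oh_water)}")
    print(f"d_cross,max = {d_cross * 1e6:.1f} um, d_str,min = {d_str * 1e6:.1f} um")
```

With a strict inequality, a case at exactly Oh = 0.1 falls outside the range, i.e. on the limit of the correlation.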
Fig. 7 Comparison of the single fluid simulation results to empirical findings of Hsiang and Faeth [14].
(a) Maximum droplet extent perpendicular to the flow direction $d_{y,\mathrm{cross,max}}$ over We. (b) Minimum
droplet extent in flow direction $d_\mathrm{str,min}$ over We
The correlation is claimed to be valid for We < 100 and Oh < 0.1, where the
Ohnesorge number Oh relates the viscous forces to the surface tension and inertial
forces. According to the correlation, the maximum cross-stream and the minimum
stream-wise diameters depend only on We. Since the four single fluid cases
represent only four different values of We and two of Oh, additional single fluid
predictions were conducted to cover a broader range of We and Oh. The simulations
cover 2 < We < 12 and 0.005 ≤ Oh ≤ 0.1.
The comparison of the numerical results to correlation (11) is depicted in Fig. 7.
The numerical results were evaluated in the same way as is common in experiments:
fixing the view to one observation plane (here the x-y plane), the drop dimensions in
the x- and y-directions are measured. The maximum cross-stream extent in the
y-direction, $d_{y,\mathrm{cross,max}}$, is shown in Fig. 7a, and $d_\mathrm{str,min}$ in Fig. 7b.
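The evaluation described above amounts to projecting the liquid particles onto the x-y observation plane and taking their extents, then tracking the extremes over time. A minimal sketch, assuming particle positions are available as (x, y, z) tuples with x in the flow direction; the optional `spacing` term is a hypothetical correction for particle centers lying roughly one particle spacing inside the actual surface.

```python
def projected_extents(liquid_positions, spacing=0.0):
    """Droplet extents seen in the x-y observation plane.

    Returns (d_str, d_y_cross): extent in the flow (x) direction and the
    cross-stream (y) extent. The z-coordinate is ignored, like in a 2D camera view.
    """
    xs = [p[0] for p in liquid_positions]
    ys = [p[1] for p in liquid_positions]
    return max(xs) - min(xs) + spacing, max(ys) - min(ys) + spacing

def extreme_extents(snapshots, spacing=0.0):
    """(d_str,min, d_y,cross,max) over a time series of particle snapshots."""
    extents = [projected_extents(s, spacing) for s in snapshots]
    return min(e[0] for e in extents), max(e[1] for e in extents)

if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 5.0), (0.5, 2.0, -1.0)]
    print(projected_extents(pts))       # (1.0, 2.0)
    print(projected_extents(pts, 0.1))  # (1.1, 2.1)
```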
The numerical results agree very well with the correlation. Minor deviations are
observed for two points. One of them represents an extreme case investigated: the
droplet simulated at We = 2 has an Ohnesorge number of Oh = 0.1, which is the limit
of the correlation. This could explain the observed deviation. The second case, with
We ≈ 5 and Oh ≈ 0.016, lies well within the range of validity of the correlation.
Therefore, this deviation can only be explained by numerical inaccuracies resulting
from different numerical setups.
Altogether, SPH is able to capture the physical behavior of drop deformation at
low aerodynamic loads quantitatively. The initial temporal dynamics of the single
fluid drop deformation are discussed in the following.
The temporal evolution of the drop deformation, and thus of the drag coefficient,
plays a major role in developing simplified models for use in an Euler-Lagrange
context. Hsiang and Faeth [14] observed a linear increase in cross-stream
diameter with time until the maximum deformation is reached. They state that, for
a wide range of We, the maximum deformation is always reached at approximately
$t/t^* \approx 1.6$, where $t^* = d_0 (\rho_\mathrm{liq}/\rho_\mathrm{gas})^{0.5}/u_0$ is the characteristic breakup time
proposed by Ranger and Nicholls [26]. The temporal evolution of the deformation
perpendicular to the flow in the y-direction, $d_{y,\mathrm{cross}}$, for all single fluid cases is
depicted in Fig. 8.
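For reference, the characteristic breakup time of Ranger and Nicholls and the resulting nondimensional time $t/t^*$ can be evaluated as below. The property values in the usage lines are placeholders (a density ratio of 1000), not the conditions of this study.

```python
import math

def breakup_time(d0, rho_liq, rho_gas, u0):
    """Ranger-Nicholls characteristic breakup time t* = d0 * sqrt(rho_liq / rho_gas) / u0."""
    return d0 * math.sqrt(rho_liq / rho_gas) / u0

def nondimensional_time(t, d0, rho_liq, rho_gas, u0):
    """t / t* for a physical time t."""
    return t / breakup_time(d0, rho_liq, rho_gas, u0)

if __name__ == "__main__":
    # Placeholder values: 60 um droplet, density ratio 1000, u0 = 22.5 m/s.
    t_star = breakup_time(60e-6, 1000.0, 1.0, 22.5)
    print(f"t* = {t_star * 1e6:.1f} us; maximum at ~1.6 t* = {1.6 * t_star * 1e6:.1f} us")
```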
The numerical results for the cross-stream deformation over the dimensionless
time $t/t^*$ are shown in Fig. 8a. Evidently, the deformation dynamics predicted by
SPH are not linear and cannot be correlated using the commonly used characteristic
time $t^*$. Furthermore, the predicted deformation appears to be faster overall than in
the experiments, which exhibit $t/t^* \approx 1.6$ for the maximum deformation [14]; in SPH,
the longest dimensionless time needed to reach the maximum deformation is
$t/t^* \approx 1.3$. The deviations may be due to the experimental setup used to acquire the
data. Shock tube experiments are commonly used for the aerodynamic loading of
droplets. Unlike the numerical analysis, this kind of experiment cannot guarantee
well-defined boundary conditions. Furthermore, the droplets are introduced into the
shock tube as a droplet chain generated by a vibrating capillary tube. This might
cause interference from preceding droplets, or the vibration might influence the
initial drop shape.
A closer look at the dynamics in Fig. 8a indicates that the initial deformation
dynamics depend on We: the smaller We, the faster the maximum cross-stream
deformation is reached. Correlating the numerical results with $t^*$ and We using a
4th order polynomial leads to the following enhanced characteristic time:
The predicted deformation dynamics over the new dimensionless time $t/T$ are
depicted in Fig. 8b. Now the deformation dynamics of all cases collapse onto one
curve. The only case showing a deviation is the same as before (We ≈ 5,
Oh ≈ 0.016). Here, too, the probable cause is numerical inaccuracy.
Fig. 8 Temporal evolution of the droplet deformation perpendicular to the flow direction. (a) Time
evolution of the initial droplet deformation perpendicular to the flow in y-direction. (b) Correlated
time evolution of the initial droplet deformation perpendicular to the flow in y-direction
Fig. 9 Temporal evolution of the droplet deformation in flow direction. (a) Time evolution of
the initial droplet deformation in flow direction. (b) Correlated time evolution of the droplet
deformation dynamics in flow direction
The new characteristic time was derived from the data for the cross-stream
deformation of the droplets. Fig. 9 shows the temporal evolution of the droplet
deformation in the flow direction over dimensionless time: Fig. 9a uses $t/t^*$,
Fig. 9b uses $t/T$.
Here, too, the new characteristic time describes the deformation dynamics better,
collapsing the temporal evolution onto one curve. Even the formation of a small
upstream bag, which is observed in most cases, is reproduced quite well. The bag
formation is indicated by the decrease of the deformation at about $t/T \approx 0.3$ in
Fig. 9b. The bag forms because the outer part of the droplet is accelerated faster than
its core. Despite this small bag, the droplet flattens: due to the ongoing deformation
perpendicular to the flow direction, the curvature on the upstream side of the droplet
decreases. Such bag formation has not been observed experimentally, although the
phenomenon also occurred in other numerical investigations [17, 30].
In summary, the dynamics of the initial deformation of single fluid droplets depend
on We and are nonlinear in time. By introducing the new non-dimensional time $t/T$,
the different cases considered can be correlated quite well.
The influence of adding water to the droplet is investigated for the two
aerodynamic loads and five different placements of a single water droplet inside the
fuel droplet. As an example, the SPH prediction of the temporal evolution of a fuel
droplet with water added at its center at $u_0 = 22.5$ m/s ($\mathrm{We}_\mathrm{fuel} = 10$) is depicted in
Fig. 10. The addition of water changes the deformation behavior. Due to its higher
density as well as viscosity, the dispersed water deforms more slowly. This almost
leads to a separation of the two phases (cf. second instance in Fig. 10), which is
counteracted by the interfacial forces acting between the different fluid pairings.
Fig. 10 Temporal evolution of a fuel droplet with centered water at $u_0 = 22.5$ m/s
Fig. 11 Droplet deformation over time for all cases investigated at $u_0 = 22.5$ m/s. (a) Temporal
evolution of the droplet deformation in flow direction $d_\mathrm{str}$. (b) Temporal evolution of the droplet
deformation perpendicular to the flow in y-direction $d_{y,\mathrm{cross}}$
Later in the simulation, fuel and water merge again and the droplet shows
oscillatory deformations, as observed for the pure liquid cases. The deformation
behavior of all other two fluid droplet simulations is basically similar, with minor
differences due to the water placement. The droplets with water placed off center
perpendicular to the flow direction additionally show a rotation of the whole droplet.
Since the spatial resolution is not sufficient to properly resolve the three-fluid
contact line, the behavior of these droplets might not be physical. Therefore, these
results are left out in the following.
Fig. 11 depicts the deformation of the droplets in the flow direction, $d_\mathrm{str}$, and
perpendicular to the flow, $d_{y,\mathrm{cross}}$, over time for an air velocity of $u_0 = 22.5$ m/s:
Fig. 11a shows the deformation in the flow direction, Fig. 11b the deformation
perpendicular to the flow in the y-direction.
The pure liquid cases are indicated by black crosses (fuel) and blue squares
(water). Evidently, the water oscillates at a higher frequency and its oscillation is
damped less due to the lower Oh. For the two fluid droplets, the damping of the
oscillation is similar to that of the pure fuel droplets. The frequency of the
deformation seems to lie between the frequencies of water and fuel for the first two
oscillations. Afterwards, the oscillations almost vanish or show a behavior that cannot
be attributed to either fuel or water. Possibly, the
behavior is triggered by single water particles located at the surface of the fuel.
These single particles do not represent a physical water droplet; rather, they result
from the insufficient spatial resolution of the three-fluid contact line. Consequently,
the CSF (continuum surface force) model imposes a non-physical acceleration on
them (cf. Fig. 10).
The deviations in the initial droplet extent perpendicular to the flow and in the
flow direction in Fig. 11 are due to the droplet initialization described previously.
During the formation of the steady-state droplet, some particles of the water droplet
placed eccentrically in the fuel droplet have air particles within their radius of
influence. As a result, the surface tension model forms an ellipsoidal rather than a
spherical droplet.
The dynamics of the initial deformation show an ambivalent behavior. In the flow
direction, the deformation dynamics as well as the minimum extent of the two fluid
droplets are similar to those of the water droplet. Perpendicular to the flow, the
droplet deformation and maximum droplet extent are similar to those of the fuel
droplet. At first glance, this indicates similar drag coefficients for the two fluid and
fuel droplets in the initial deformation stage. Therefore, two fluid droplets might be
described by currently used simplified models.
Whether the same observations hold for the whole range of low We ≤ 12 is examined
using the results for the higher aerodynamic load investigated in this study. Fig. 12
shows the droplet deformation in the flow direction, $d_\mathrm{str}$, and perpendicular to the
flow, $d_{y,\mathrm{cross}}$, over time for an air velocity of $u_0 = 24.34$ m/s. Again, the deformation
dynamics in the flow direction are shown in Fig. 12a and the deformation in the
cross-stream direction in Fig. 12b.
The pure liquids (water: blue squares; fuel: black crosses) behave similarly to the
lower aerodynamic load. Again, the water droplet exhibits a distinctly higher
oscillation frequency and lower damping of the oscillation than the fuel droplet.
Only the initial amplitude of the deformation is increased, as evident from
correlation (12).
Regarding the deformation of the two fluid droplets, again leaving out the cases
with water placed off center perpendicular to the flow, the initial deviations of the
Fig. 12 Droplet deformation over time for all cases investigated at $u_0 = 24.34$ m/s. (a) Temporal
evolution of the droplet deformation in flow direction $d_\mathrm{str}$. (b) Temporal evolution of the droplet
deformation perpendicular to the flow in y-direction $d_{y,\mathrm{cross}}$
droplet extents as well as the observed dynamics are similar to the cases with the
lower aerodynamic load. Here, too, the differences in the initial extents result from
the initialization of the droplet at the start of the simulation. The frequency of the
first two oscillations again lies between those of pure fuel and pure water.
Furthermore, the amplitude is damped as strongly as for the pure fuel, resembling the
behavior observed before. The amplitude of the deformation also exhibits the same
ambivalent behavior as in the previous cases: the minimum extent in the flow
direction is similar to that of the pure water case, whereas the maximum cross-stream
extent is similar to that of the pure fuel droplet.
In all cases investigated ($u_0 = 22.5$ m/s and $u_0 = 24.34$ m/s), the droplets only
deform and no droplet breakup is observed. Since SPH predicts droplet breakup at
We = 13, it is assumed that adding a 23 % volume fraction of water does not change
the characteristics of the droplet dynamics described by the pure fuel Weber number
$\mathrm{We}_\mathrm{fuel}$, at least in the deformation regime. Furthermore, the deformation dynamics
of the two fluid droplets suggest that they experience a drag coefficient similar to
that of pure fuel droplets in the early stages of the deformation. This would allow
the common correlations to be used, with minor adaptations, for two fluid droplets in
Euler-Lagrange investigations as well.
7 Conclusion
In this paper, the dynamics of one and two fluid droplets at low aerodynamic loads
are investigated numerically using the Smoothed Particle Hydrodynamics (SPH)
method. The presented predictions focus on the deformation dynamics of pure liquid
and water-in-fuel droplets with a diameter of $d_0 \approx 60$ µm exposed to two different
air flow velocities: $u_0 = 22.5$ m/s and $u_0 = 24.34$ m/s. As the SPH method is
relatively new to CFD applications, the code in use was first verified by comparing
numerical results for pure liquid droplets to well-known empirical findings. With few
exceptions, the predicted minimum initial deformation in the flow direction $d_\mathrm{str}$ as
well as the maximum cross-stream deformation $d_\mathrm{cross}$ match the correlation of
Hsiang and Faeth [14] very well. Deviations occur mainly at extreme conditions. The
results demonstrate the capability of SPH to capture the droplet deformation
dynamics. Second, the dynamic deformation of the single fluid droplets was analyzed
and found to depend on We. A new correlation for the droplet deformation was
proposed, which uses a modified definition of the characteristic time T compared to
the correlations of Hsiang and Faeth [14].
For the prediction of two fluid droplets, a single water droplet with a volume
fraction of 23 % was added to a fuel droplet, and its placement was varied. For both
aerodynamic loads, the behavior of the two fluid droplets can be classified by the
pure fuel Weber number $\mathrm{We}_\mathrm{fuel}$. The dynamic behavior of the two fluid droplets
qualitatively features a frequency between those of the fuel and water droplets for
the first oscillations, while the damping of the oscillations is similar to that of the
fuel droplet. Furthermore, an ambivalent behavior is observed for the minimum
extent in the flow direction and the maximum cross-stream deformation: in the
former case, the two fluid droplets behave like the water droplet, while in the latter
case their behavior is similar to that of the fuel droplet.
In general, the SPH code is capable of predicting multiphase flows such as droplet
deformation physically correctly. Therefore, SPH will be used as a tool to further
investigate the deformation dynamics of mono- and two fluid droplets in order to
derive simpler models that can be used in typical CFD predictions of sprays.
Acknowledgements The financial support of the German Federal Ministry of Economics and
Technology and Siemens AG within the cooperative research project ‘Entwicklung von Verbren-
nungstechnologien im CEC für klimaschonende Energieerzeugung (03ET7011E)’ is gratefully
acknowledged. This work was performed on the computational resource ForHLR Phase I, funded by the
Ministry of Science, Research and the Arts Baden-Württemberg and DFG (“Deutsche Forschungs-
gemeinschaft”).
1. Adami, S., Hu, X.Y., Adams, N.A.: A new surface-tension formulation for multi-phase SPH
using a reproducing divergence approximation. J. Comput. Phys. 229, 5011–5021 (2010)
2. Bartz, F.-O., Schmehl, R., Koch, R., Bauer, H.-J.: An extension of dynamic droplet deformation
model to secondary atomization. In: 23rd Annual Conference on Liquid Atomization and Spray
Systems, Brno (2010)
3. Batchelor, G.K.: An Introduction to Fluid Dynamics. Cambridge University Press, Cambridge
4. Brackbill, J.U., Kothe, D.B., Zemach, C.: A continuum method for modeling surface tension.
J. Comput. Phys. 100, 335–354 (1992)
5. Braun, S., Krug, M., Wieth, L., Höfler, C., Koch, R., Bauer, H.-J.: Simulation of primary
atomization: assessment of the smoothed particle hydrodynamics (SPH) method. In: 13th
Triennial International Conference on Liquid Atomization and Spray Systems, Tainan (2015)
6. Braun, S., Wieth, L., Koch, R., Bauer, H.-J.: A framework for permeable boundary conditions
in SPH: inlet, outlet, periodicity. In: 10th International SPHERIC Workshop, Parma (2015)
7. Colagrossi, A., Landrini, M.: Numerical simulation of interfacial flows by smoothed particle
hydrodynamics. J. Comput. Phys. 191, 448–475 (2003)
8. Dryer, F.L.: Water addition to practical combustion systems – concepts and applications. Symp.
Int. Combust. 16(1), 279–295 (1977)
9. Forschungshochleistungsrechner ForHLR Phase I. http://www.bwhpc-c5.de/wiki/index.php/ForHLR_Phase_I_Hardware_and_Architecture. Cited 04 Apr 2016
10. Gingold, R.A., Monaghan, J.J.: Smoothed particle hydrodynamics: theory and application to
non-spherical stars. Mon. Not. R. Astron. Soc. 181, 375–389 (1977)
11. Guildenbecher, D.R., López-Rivera, C., Sojka, P.E.: Secondary atomization. Exp. Fluids 46,
371–402 (2009)
12. Hinze, J.O.: Fundamentals of the hydrodynamic mechanism of splitting in dispersion processes.
AIChE J. 1, 289–295 (1955)
13. Höfler, C., Braun, S., Koch, R., Bauer, H.-J.: Modeling spray formation in gas turbines – a new
meshless approach. J. Eng. Gas. Turb. Power 135, 011503-1–011503-8 (2013)
14. Hsiang, L.-P., Faeth, G.M.: Near-limit drop deformation and secondary breakup. Int. J. Multiph.
Flow 18(5), 635–652 (1992)
15. Hu, X.Y., Adams, N.A.: Angular-momentum conservative smoothed particle dynamics for
incompressible viscous flows. Phys. Fluids 18, 101702 (2006)
16. Hu, X.Y., Adams, N.A.: An incompressible multi-phase SPH method. J. Comput. Phys. 227,
264–278 (2007)
17. Khare, P., Ma, D., Chen, X., Yang, D.: Breakup of liquid droplets. In: 12th Triennial
International Conference on Liquid Atomization and Spray Systems, Heidelberg (2012)
18. Lechner, C., Seume, J.: Stationäre Gasturbinen. Springer, Heidelberg (2010)
19. Liu, M.B., Liu, G.R.: Smoothed particle hydrodynamics (SPH): an overview and recent
developments. Arch. Comput. Methods Eng. 17, 25–76 (2010)
20. Lucy, L.B.: A numerical approach to the testing of the fission hypothesis. Astron. J. 82, 1013–1024 (1977)
21. Monaghan, J.J.: Smoothed particle hydrodynamics. Annu. Rev. Astron. Astrophys. 30, 543–574 (1992)
22. Monaghan, J.J.: Simulating free surface flows with SPH. J. Comput. Phys. 110, 399–406 (1994)
23. Morris, J.P., Fox, P.J., Zhu, Y.: Modeling low Reynolds number incompressible flows using
SPH. J. Comput. Phys. 136, 214–226 (1997)
24. O’Rourke, P.J., Amsden, A.A.: The TAB method for numerical calculation of spray droplet
breakup. In: International Fuels and Lubricants Meeting and Exposition, Toronto (1987)
25. Quan, S., Schmidt, D.P.: Direct numerical study of a liquid droplet impulsively accelerated by
gaseous flow. Phys. Fluids 18, 102103 (2006)
26. Ranger, A.A., Nicholls, J.A.: Aerodynamic shattering of liquid drops. AIAA J. 7(2), 285–289
27. Schmehl, R.: Advanced modeling of droplet deformation and breakup for CFD analysis of
mixture preparation. In: 18th Annual Conference on Liquid Atomization and Spray Systems,
Zaragoza (2002)
28. Schmehl, R., Maier, G., Wittig, S.: CFD analysis of fuel atomization, secondary droplet
breakup and spray dispersion in the premix duct of a LPP combustor. In: 8th International
Conference on Liquid Atomization and Spray Systems, Pasadena (2000)
29. Wieth, L., Braun, S., Koch, R., Bauer, H.-J.: Modeling of liquid-wall interaction using the
meshless Smoothed Particle Hydrodynamics (SPH) method. In: 26th European Conference on
Liquid Atomization and Spray Systems, Bremen (2014)
30. Zaleski, S., Li, J., Succi, S.: Two-dimensional Navier-Stokes simulation of deformation and
breakup of liquid patches. Phys. Rev. Lett. 75(2), 244–247 (1995)
Smoothed Particle Hydrodynamics for
Numerical Predictions of Primary Atomization
1 Introduction
Jet engines for civil aircraft have to provide the highest possible efficiency over
a broad range of operating conditions and, at the same time, a low emission
footprint. These requirements are thermodynamically contradictory. In particular, the
reduction of thermally induced nitrogen oxides (NOx) needs special attention. One major key to
2 Numerical Method
The SPH method was developed in the late 1970s in the context of astrophysics
[12, 15]. The computational domain is discretized by so-called particles, each of
which represents a certain volume of the fluid. These Lagrangian particles move
within the computational domain with the local fluid velocity. The flow physics are
governed by the Navier-Stokes equations. The main idea behind the SPH formalism
is to evaluate a physical property of a particle, or its derivative, by interpolating over
neighbor particles within a certain cut-off radius. Equation (1) represents the basic
interpolation formalism for a particle with index i, where W denotes the kernel
function with smoothing length h.
$$\langle \Phi \rangle_i = \sum_j V_j\, \Phi_j\, W(\mathbf{x}_i - \mathbf{x}_j, h) \qquad (1)$$
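The interpolation in Eq. (1) and a density evaluation can be sketched in a few lines. The sketch assumes the 2D cubic spline (M4) kernel with support radius 2h, a common choice that is not necessarily the kernel used in super_sph, and it assumes the number-density form $\rho_i = m_i \sum_j W(\mathbf{x}_i - \mathbf{x}_j, h)$ often used in multi-phase SPH. Neighbors are searched by brute force for clarity; production codes use cell lists.

```python
import math

def cubic_spline_2d(r, h):
    """2D cubic spline (M4) kernel, normalized so that its integral over the plane is 1."""
    q = r / h
    sigma = 10.0 / (7.0 * math.pi * h * h)
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0

def sph_interpolate(x, positions, volumes, values, h):
    """Eq. (1): <phi>(x) = sum_j V_j * phi_j * W(x - x_j, h)."""
    total = 0.0
    for (xj, yj), vj, phij in zip(positions, volumes, values):
        total += vj * phij * cubic_spline_2d(math.hypot(x[0] - xj, x[1] - yj), h)
    return total

def density(i, positions, masses, h):
    """Assumed number-density form: rho_i = m_i * sum_j W(x_i - x_j, h)."""
    xi, yi = positions[i]
    kernel_sum = sum(cubic_spline_2d(math.hypot(xi - xj, yi - yj), h)
                     for xj, yj in positions)
    return masses[i] * kernel_sum

if __name__ == "__main__":
    # Fine regular lattice, so the discrete sum closely approximates the kernel integral.
    dx, h, rho0 = 0.1, 1.0, 1000.0
    pts = [(i * dx, j * dx) for i in range(-20, 21) for j in range(-20, 21)]
    vols = [dx * dx] * len(pts)
    print(sph_interpolate((0.0, 0.0), pts, vols, [1.0] * len(pts), h))  # close to 1
    centre = pts.index((0.0, 0.0))
    print(density(centre, pts, [rho0 * dx * dx] * len(pts), h))       # close to rho0
```

Interpolating a constant field returns approximately that constant only where the kernel support is fully populated; near free surfaces and walls the sum is truncated, which is one reason SPH codes need special boundary treatment.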
With Eq. (2), the particle's density $\rho_i$ can be calculated directly from its mass $m_i$
and the available volume, which is defined by the positions of the surrounding
particles j. The conservation of momentum results in the following contributions to
the total particle acceleration. Equation (3) accounts for pressure gradients,
$$\mathbf{a}_{i,\nabla p} = -\sum_j \frac{m_j\,(p_i + p_j)}{\rho_i\,\rho_j}\,\nabla W(\mathbf{x}_{ij}, h) \qquad (3)$$
where $p_i$ and $p_j$ are the pressures of the particle i under consideration and of its
neighbor particles j, respectively, and $\nabla W(\mathbf{x}_{ij}, h)$ denotes the spatial derivative
of the kernel. Shear forces result in accelerations given by Eq. (4) [18], where d
denotes the dimensionality.
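$$\mathbf{a}_{i,v} = \sum_j 2 m_j (d+2) \left(\frac{\nu_i + \nu_j}{\rho_i + \rho_j}\right) \frac{\mathbf{v}_{ij}\cdot\mathbf{r}_{ij}}{r_{ij}^2 + \eta^2}\,\nabla W(\mathbf{x}_{ij}, h) \qquad (4)$$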
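The pairwise structure of the pressure and shear terms can be illustrated for a single particle pair in 2D. This sketch rests on stated assumptions: the 2D cubic spline kernel gradient, a pressure term $-m_j (p_i + p_j)/(\rho_i \rho_j)\,\nabla W$, and a viscous term $2 m_j (d+2)\,(\nu_i+\nu_j)/(\rho_i+\rho_j)\,(\mathbf{v}_{ij}\cdot\mathbf{r}_{ij})/(r_{ij}^2+\eta^2)\,\nabla W$ with kinematic viscosities $\nu$ and a hypothetical regularization $\eta^2 = 0.01 h^2$. Both terms are antisymmetric in i and j, so $m_i \mathbf{a}_i + m_j \mathbf{a}_j = 0$ for every pair, i.e. momentum is conserved.

```python
import math

def grad_kernel_2d(rx, ry, h):
    """Gradient w.r.t. x_i of the 2D cubic spline kernel, with r_ij = x_i - x_j = (rx, ry)."""
    r = math.hypot(rx, ry)
    q = r / h
    if r == 0.0 or q >= 2.0:
        return 0.0, 0.0
    sigma = 10.0 / (7.0 * math.pi * h * h)
    dw_dq = -3.0 * q + 2.25 * q * q if q < 1.0 else -0.75 * (2.0 - q) ** 2
    dw_dr = sigma * dw_dq / h
    return dw_dr * rx / r, dw_dr * ry / r

def pair_acceleration(xi, xj, vi, vj, m_j, rho_i, rho_j, p_i, p_j, nu_i, nu_j, h, d=2):
    """Acceleration of particle i due to neighbor j: pressure plus viscous contribution."""
    rx, ry = xi[0] - xj[0], xi[1] - xj[1]
    vx, vy = vi[0] - vj[0], vi[1] - vj[1]
    gx, gy = grad_kernel_2d(rx, ry, h)
    c_press = -m_j * (p_i + p_j) / (rho_i * rho_j)
    eta_sq = 0.01 * h * h  # hypothetical regularization constant
    c_visc = (2.0 * m_j * (d + 2) * (nu_i + nu_j) / (rho_i + rho_j)
              * (vx * rx + vy * ry) / (rx * rx + ry * ry + eta_sq))
    c = c_press + c_visc
    return c * gx, c * gy

if __name__ == "__main__":
    m_i, m_j = 1.0, 2.0
    xi, xj = (0.0, 0.0), (0.5, 0.0)
    vi, vj = (0.1, 0.0), (-0.2, 0.3)
    a_i = pair_acceleration(xi, xj, vi, vj, m_j, 1000.0, 800.0, 100.0, 200.0, 1e-6, 2e-6, 1.0)
    a_j = pair_acceleration(xj, xi, vj, vi, m_i, 800.0, 1000.0, 200.0, 100.0, 2e-6, 1e-6, 1.0)
    # Net momentum change of the pair (should vanish up to rounding)
    print(m_i * a_i[0] + m_j * a_j[0], m_i * a_i[1] + m_j * a_j[1])
```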
Note that instead of using the second derivative or a nested sum, an alternative
expression is used, where a standard SPH first derivative is combined with a finite
324 S. Braun et al.
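$$\mathbf{f}_{i,SF} = \frac{\mathbf{F}_{i,SF}}{\rho_i} = -\frac{\sigma}{\rho_i}\,(\nabla\cdot\hat{\mathbf{n}}_i)\,\mathbf{n}_i \qquad (5)$$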
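In Eq. (5), $\mathbf{n}_i$ denotes the surface normal of the interface and $\nabla\cdot\hat{\mathbf{n}}_i$ the
divergence of the normalized normal $\hat{\mathbf{n}}_i$, i.e. the interface curvature. The surface
normal is obtained using a color function $\chi_j^i$, by which the different fluids are identified.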
$$\mathbf{n}_i = \frac{1}{V_i}\sum_j \left(V_i^2 + V_j^2\right)\frac{\rho_i}{\rho_i + \rho_j}\,\chi_j^i\,\nabla W(\mathbf{x}_{ij}, h) \qquad (6)$$
Since the evaluation of Eqs. (5) and (6) can be limited to particles in the vicinity
of the interface, the overall computational cost of surface tension modeling is kept
to a minimum.
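The normal computation can be sketched as follows, assuming the form $\mathbf{n}_i = V_i^{-1}\sum_j (V_i^2 + V_j^2)\,\rho_i/(\rho_i+\rho_j)\,\chi_j^i\,\nabla W(\mathbf{x}_{ij}, h)$, the 2D cubic spline kernel, and a binary color function $\chi_j^i$ that is 0 if particle j belongs to the same fluid as particle i and 1 otherwise. Under this assumption only neighbors of the other fluid contribute, so particles farther than the kernel support from the interface yield a zero normal and can be skipped. Names and test geometry are illustrative.

```python
import math

def grad_kernel_2d(rx, ry, h):
    """Gradient w.r.t. x_i of the 2D cubic spline kernel, with r_ij = x_i - x_j = (rx, ry)."""
    r = math.hypot(rx, ry)
    q = r / h
    if r == 0.0 or q >= 2.0:
        return 0.0, 0.0
    sigma = 10.0 / (7.0 * math.pi * h * h)
    dw_dq = -3.0 * q + 2.25 * q * q if q < 1.0 else -0.75 * (2.0 - q) ** 2
    dw_dr = sigma * dw_dq / h
    return dw_dr * rx / r, dw_dr * ry / r

def interface_normal(i, positions, volumes, densities, phases, h):
    """Unnormalized interface normal n_i; chi_j^i = 1 only for neighbors of another fluid."""
    xi, yi = positions[i]
    nx = ny = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j == i or phases[j] == phases[i]:
            continue  # chi_j^i = 0: same fluid (or the particle itself)
        gx, gy = grad_kernel_2d(xi - xj, yi - yj, h)
        weight = ((volumes[i] ** 2 + volumes[j] ** 2)
                  * densities[i] / (densities[i] + densities[j]))
        nx += weight * gx
        ny += weight * gy
    return nx / volumes[i], ny / volumes[i]

if __name__ == "__main__":
    # Flat interface at x = 0.5 on a unit lattice: fluid 0 for x <= 0, fluid 1 for x > 0.
    pts = [(float(a), float(b)) for a in range(-3, 4) for b in range(-3, 4)]
    phases = [0 if x <= 0.0 else 1 for x, _ in pts]
    vols = [1.0] * len(pts)
    rhos = [1000.0 if ph == 0 else 1.2 for ph in phases]
    print(interface_normal(pts.index((0.0, 0.0)), pts, vols, rhos, phases, 1.2))
```

For the particle next to the flat interface, the x-component points into the other fluid and the y-component cancels by symmetry.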
In the context of this work, a weakly compressible SPH formulation is considered.
Thus, solving a pressure Poisson equation is not necessary, but a suitable equation
of state must be provided to close the Navier-Stokes equations. The pressure is
calculated using a modified Tait equation, which directly links the density to the
pressure.
$$p = \frac{c^2 \rho_0}{\gamma}\left[\left(\frac{\rho}{\rho_0}\right)^{\gamma} - 1\right] \qquad (7)$$
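A weakly compressible closure of this kind can be sketched with the standard Tait form $p = (c^2 \rho_0/\gamma)\,[(\rho/\rho_0)^\gamma - 1]$. The exponent $\gamma = 7$, an artificial sound speed of ten times the maximum flow velocity, and the function names are assumptions for illustration; the modified variant used in the paper may differ, for instance by a background pressure. The rule of thumb $c \geq 10\,u_\mathrm{max}$ keeps the Mach number at about 0.1, so density fluctuations stay of the order of $\mathrm{Ma}^2 \approx 1\,\%$.

```python
def tait_pressure(rho, rho0, c, gamma=7.0):
    """Standard Tait equation of state (assumed form); zero pressure at rho0."""
    return c * c * rho0 / gamma * ((rho / rho0) ** gamma - 1.0)

def artificial_sound_speed(u_max, mach=0.1):
    """Numerical sound speed chosen such that the flow Mach number equals `mach`."""
    return u_max / mach

if __name__ == "__main__":
    c = artificial_sound_speed(25.0)  # hypothetical u_max = 25 m/s, so c is about 250 m/s
    rho0 = 1000.0
    for rho in (rho0, 1.01 * rho0):
        print(rho, tait_pressure(rho, rho0, c))
```

For small compressions the Tait law reduces to the linear relation $p \approx c^2 (\rho - \rho_0)$, so the artificial sound speed directly controls how stiff the fluid is.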
3 Reference Setup
Fig. 1 Experimental setup (left) and two-dimensional numerical abstraction of the region in
vicinity of the trailing edge (right)
The SPH simulations were conducted using an in-house code whose development
started at the Institut für Thermische Strömungsmaschinen 4 years ago. It is written
in C++ and parallelized via domain decomposition and MPI. Due to a lack of
creativity, it is named super_sph. Before running large 3D simulations, the
performance of the code was investigated. Within the reporting period, the serial
performance was optimized by improving the cache efficiency. Most importantly,
switching from an array-of-structures to a structure-of-arrays data layout, together
with a spatial sorting of the particles, improved the serial performance by a factor
of 4. To analyze the serial performance, different instrumentation and sampling tools
such as gprof [13] and MAQAO [2] have been used.
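The two data-layout optimizations can be illustrated schematically. In a structure-of-arrays layout, each particle field is stored in its own contiguous array; sorting particles by the index of the background cell they occupy places spatial neighbors close together in memory, so a neighbor loop touches fewer cache lines. The sketch below shows only the data organization (Python lists cannot demonstrate the cache effect itself). The row-major cell key is one simple choice; space-filling curves such as Morton order are common alternatives. None of the names reflect the actual C++ implementation of super_sph.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ParticlesSoA:
    """Structure of arrays: one contiguous list per particle field."""
    x: List[float]
    y: List[float]
    mass: List[float]

def cell_key(x: float, y: float, cell_size: float, n_cells_x: int) -> int:
    """Row-major index of the background cell containing (x, y); assumes x, y >= 0."""
    return int(y // cell_size) * n_cells_x + int(x // cell_size)

def sort_spatially(p: ParticlesSoA, cell_size: float, n_cells_x: int) -> ParticlesSoA:
    """Reorder all fields consistently so that particles of the same cell are adjacent."""
    order = sorted(range(len(p.x)),
                   key=lambda k: cell_key(p.x[k], p.y[k], cell_size, n_cells_x))
    return ParticlesSoA(x=[p.x[k] for k in order],
                        y=[p.y[k] for k in order],
                        mass=[p.mass[k] for k in order])

if __name__ == "__main__":
    parts = ParticlesSoA(x=[0.9, 0.1, 0.6, 0.2], y=[0.8, 0.1, 0.1, 0.7], mass=[4.0, 1.0, 2.0, 3.0])
    s = sort_spatially(parts, cell_size=0.5, n_cells_x=2)
    print(s.mass)  # [1.0, 2.0, 3.0, 4.0]
```

The essential point is that every field is permuted with the same order; in a compiled code, the sort is typically repeated every few time steps as particles move.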
The parallel performance of the code has been investigated by strong scalability
tests. To put the SPH results of the multi-phase flow simulations into context,
comparative runs have been performed using the VoF solvers of a commercial code
and of OpenFOAM® 2.3.0, both of which are grid based. A structured mesh has been
used, with the number of cells for both codes identical to the number of particles in
the SPH simulations. The boundary conditions were identical to the SPH simulation,
and turbulence modeling was disabled. Using the two-dimensional computational
domain described in Fig. 1 and Table 1 together with a high number of cores
maximizes the communication-to-computation ratio, which served to provoke and
identify communication bottlenecks. The tests have been run on the thin nodes of the
ForHLR I cluster, which are equipped with two deca-core Intel® Xeon® E5-2670 v2
processors. For jobs with fewer than 20 cores, a single node has been used
exclusively. Two types of scalability tests have been performed, using different
termination criteria. For the comparison of the grid based tools and SPH, the
simulations were terminated after 1 h wall clock time. The speedup and the parallel
efficiency have been calculated from the number of time steps that could be
performed during this period. For the second scalability test, the
SPH for Numerical Predictions of Primary Atomization 327
number of time steps was fixed, and the speedup is calculated from the resulting
runtimes.
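Both evaluation variants reduce to simple ratios. A minimal sketch with illustrative function names: for a fixed wall clock budget, the speedup is the ratio of completed time steps; for a fixed number of steps, it is the ratio of runtimes; the parallel efficiency relates the speedup to the increase in cores over the reference run (one node or one core).

```python
def speedup_fixed_walltime(steps: float, steps_ref: float) -> float:
    """Speedup from the number of time steps completed within the same wall clock time."""
    return steps / steps_ref

def speedup_fixed_steps(runtime_ref: float, runtime: float) -> float:
    """Speedup from the runtimes needed for the same number of time steps."""
    return runtime_ref / runtime

def parallel_efficiency(speedup: float, cores: int, cores_ref: int) -> float:
    """Speedup divided by the ratio of used cores to reference cores."""
    return speedup * cores_ref / cores

if __name__ == "__main__":
    # 20,000 steps: about 12 h on 1 core vs. 97 s on 1000 cores (values quoted for Fig. 4).
    s = speedup_fixed_steps(12 * 3600.0, 97.0)
    print(f"speedup = {s:.0f}, efficiency = {parallel_efficiency(s, 1000, 1):.2f}")
```

Since 12 h is a rounded figure, the resulting per-core efficiency of roughly 0.45 at 1000 cores is only approximate.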
It should be emphasized that the following results might not be representative of
the maximum achievable performance of the commercial code and OpenFOAM.
Both codes have been used to the best of the authors' knowledge. The solver settings
correspond to the production run settings typically used at our institute.
Furthermore, small domains of only $1.5 \times 10^6$ cells are usually not run on more
than 2 nodes.
Figs. 2 and 3 depict the speedup and the parallel efficiency, respectively.
The reference performance is given by the number of time steps achieved with
20 cores (1 node) within 1 h wall clock time. OpenFOAM and the commercial
Fig. 2 Speedup behavior of three different codes. The termination criterion of the simulations is
1 h wall clock time. The graphs are normalized by the performance of a single node with 20 cores
Fig. 3 Parallel efficiency behavior of three different codes. The termination criterion of the
simulations is 1 h wall clock time. The graphs are normalized by the performance of a single
node with 20 cores
code show reasonable scaling up to 2 nodes, with OpenFOAM profiting especially
from the fast intra-socket communication. The achieved serial performance of the
commercial code might be questionable. The parallel efficiency of SPH remains
above 0.9 up to 200 cores. At 1000 cores, the parallel efficiency is still above 0.6 and
the speedup has not yet reached saturation. Note that even at a sub-domain size of
less than 1500 particles, the simulation can be accelerated further. The speedup of
the investigated grid based methods saturated at approximately 100 cores; beyond
that, additional computational resources would not reduce the computing time of the
simulations any further.
Typically, scalability tests use a fixed number of time steps rather than a fixed
wall clock time as termination criterion. This ensures that temporal variations of the
computational effort do not affect the scalability results. An example of a temporarily
increased computational effort is a breakup event within the atomization process,
which temporarily enlarges the gas-liquid interface and therefore increases the
computational costs.
Fig. 4 displays the parallel efficiency of SPH for both termination criteria. The
red line indicates the efficiency obtained after 1 h wall clock time; the green and blue
lines are obtained after 20,000 time steps. The error bars indicate a temporal variation
of ±10 s, which seems a reasonable deviation when reading hundreds of sub-domain
initialization files from a non-exclusive file system. The wall clock time for 20,000
time steps varies between 12 h at 1 core and 97 s at 1000 cores. The reference
performance for the green line is given by 1 core, for the blue and red lines by 1 node,
i.e. 20 cores. Due to the high communication-to-computation ratio, the effect of the
fast intra-socket communication is clearly perceptible. However, the InfiniBand® 4X
FDR interconnect ensures a nearly constant efficiency up to 200 cores. In summary, the SPH
Fig. 4 Parallel efficiency of our SPH code. Comparison of a test run with fixed wall clock time
(red curve) and 20,000 time steps (blue and green curves) as termination criterion. Reference
performance is given by one core (green) or one node (blue and red)
Fig. 5 Simulated physical time within 1 h wall clock time (left) and within one CPU-hour (right)
for three different codes
code shows decent strong scalability over 3 orders of magnitude, with sub-domain
sizes ranging from $1.5 \times 10^6$ down to 1500 particles.
For production simulations, speedup and efficiency are only of minor interest. What matters is the physical time that can be simulated within a certain time span or for a certain number of CPU-hours. Therefore, Fig. 5 depicts the physical time that can be computed within 1 h of wall clock time or by spending one CPU-hour. The simulations with the commercial code used a fixed time step size of 2.5 × 10⁻⁸ s, whereas SPH and OpenFOAM used adaptive time stepping with mean time increments of 1.6 × 10⁻⁸ s (SPH) and 4.3 × 10⁻⁸ s (OpenFOAM), respectively. The maximum physical time that can be computed within 1 h is limited to 0.2 ms with OpenFOAM and 0.65 ms with the commercial code. With SPH, 12.3 ms have been computed using 1000 cores, and saturation is not yet reached. The physical time per CPU-hour, which can be interpreted as the inverse of the cost per simulated physical time, reaches up to 21 µs/CPUh for SPH. Even at 1000 cores, 12.3 µs can be achieved per CPU-hour. The optimum values for OpenFOAM and the commercial code are 8.5 µs and 6.7 µs per CPU-hour, respectively.
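The two quantities in Fig. 5 are linked by a simple conversion: one hour of wall clock time on p cores consumes p CPU-hours. The minimal sketch below reproduces the 12.3 µs per CPU-hour quoted for SPH at 1000 cores and estimates how many time steps lie behind 12.3 ms at the mean SPH increment. The helper name is hypothetical, and the conversion assumes all allocated cores are billed for the full hour.

```python
def phys_time_per_cpu_hour(phys_time_per_wall_hour: float, cores: int) -> float:
    """Physical time simulated per CPU-hour.

    Assumption: one hour of wall clock time on `cores` cores is billed
    as `cores` CPU-hours.
    """
    return phys_time_per_wall_hour / cores


# SPH on 1000 cores: 12.3 ms of physical time within 1 h wall clock time
per_cpuh = phys_time_per_cpu_hour(12.3e-3, 1000)
print(f"{per_cpuh * 1e6:.1f} us per CPU-hour")  # 12.3 us per CPU-hour

# Number of time steps behind 12.3 ms at the mean SPH increment of 1.6e-8 s
n_steps = 12.3e-3 / 1.6e-8
print(f"about {n_steps:,.0f} time steps")
```

Because the cost per CPU-hour drops only moderately from the optimum of 21 µs/CPUh to 12.3 µs/CPUh at 1000 cores, spending more cores mainly buys shorter turnaround time.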
Regarding physical results such as velocity fields, breakup frequencies, ligament lengths, droplet sizes and droplet numbers, the commercial code and SPH yield very similar results. OpenFOAM, however, predicts very stable liquid ligaments that resist disintegration. This results in fewer droplets with larger mean diameters compared to the other two methods [6].
Three-dimensional test cases have not been compared with the grid-based tools. However, three different 3D test simulations have been performed with SPH. The test cases consisted of (a) 75 million, (b) 150 million and (c) 1.2 billion particles, respectively. The two smaller simulations were run on 400 cores, the large one on 2560 cores. The obtained particle iteration frequencies are 79,896 (a), 77,542 (b) and 82,310 (c) particle-steps per CPU-second, which corresponds to 12.5, 12.9 and 12.1 CPU-µs per step and per particle.
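The cost per particle and step is the reciprocal of the measured throughput. The sketch below checks the three quoted values; the helper name is illustrative only.

```python
def cpu_us_per_particle_step(particle_steps_per_cpu_second: float) -> float:
    """Convert a throughput (particle-steps per CPU-second) into
    CPU-microseconds spent per particle and per time step."""
    return 1e6 / particle_steps_per_cpu_second


# Measured throughputs of the three 3D SPH test cases
throughput = {"a": 79_896, "b": 77_542, "c": 82_310}
cost = {case: cpu_us_per_particle_step(rate) for case, rate in throughput.items()}
for case, us in cost.items():
    print(f"({case}) {us:.1f} CPU-us per particle-step")
```

The nearly identical costs for 75 million and 1.2 billion particles indicate that the per-particle work stays roughly constant as the problem size grows.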
The actual target quantities of the simulation are droplet sizes and trajectories,
breakup frequencies and other characteristic length- and time-scales. As the liquid
volume only represents roughly 1 % of the entire computational domain, the data
Fig. 7 Detail of a Rayleigh disintegration process and the formation of satellite droplets.
Rendering using particles (left) or tessellated surfaces
Fig. 8 Bag breakup event. The liquid phase and the wall of the atomizer lip are depicted by
tessellated surfaces. Confining upper and lower walls are not depicted. The gaseous phase is
visualized using a slice. The coloring of the left figure denotes the velocity magnitude. The coloring
of the right figure represents the particle IDs
a slice of the gaseous phase. The confining upper and lower walls are not depicted. The coloring of the left figure indicates the velocity magnitude, and the slice in the right figure is colored by particle ID. In the example given, the depicted ID denotes the order of particle creation for every sub-domain located at the inflow region. For the core flow, this means that more than 30 million particles have been released into the computational domain in every sub-domain. Using time-dependent IDs or float values allows a descriptive depiction of vortices, recirculation zones, dead wakes or residence times.
In Fig. 9, a time series of a bag breakup event is depicted. The time interval between two consecutive images is 74.5 µs. Identical atomization characteristics are observed in the experimental investigations and have been identified as the main breakup mechanism for prefilmer-based atomizers [9–11]. Numerical predictions of the generation and blowing up of the bag-shaped structures are very sensitive to the spatial discretization. If the inter-particle distance or the mesh size is too coarse, the bag will not be formed or it will burst too early. When compared to experimental high-speed videos, the bag-shaped structures in Fig. 9 seem to be too small. This indicates that the inter-particle spacing of 5 µm is still not sufficient to properly represent the liquid skin of the bubbles. However, it is questionable whether the very small droplets resulting from the breakdown of this liquid skin can be captured experimentally at all. The quantitative comparison of the droplet spectra from experiments and simulations will therefore be limited to droplets larger than 14 µm in diameter, which is the spatial resolution of the high-speed camera used for image recording.
Fig. 9 Sequence of a bag breakup event. Time increment between two consecutive images is 74.5 µs
In Fig. 10, three consecutive top-view snapshots of the test case are depicted. The left half of each snapshot was obtained experimentally with a high-speed camera. The right half shows the simulation results, where the data have been duplicated in the span-wise direction. In general, the predicted frequencies, length scales and breakup mechanisms match very well. However, the images clearly show that the very thin skin of the liquid bags cannot be resolved with an inter-particle distance of 5 µm. In future numerical setups, we will therefore use an inter-particle distance of 2.5 µm, which will allow these structures to be investigated in more detail.
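Halving the inter-particle distance from 5 µm to 2.5 µm has a large cost impact. The following back-of-the-envelope sketch estimates it under stated assumptions, which are ours and not taken from the study: a fixed domain volume, so the particle count grows with the cube of the refinement ratio; a constant cost per particle-step, as suggested by the measured 12.1 to 12.9 CPU-µs; and a time step proportional to spacing**dt_exponent (1.0 for an acoustic CFL limit, 1.5 for a capillary limit). The function names are hypothetical.

```python
def refinement_cost_factor(ratio: float, dim: int = 3, dt_exponent: float = 1.0) -> float:
    """Growth of total CPU time when the inter-particle spacing is divided by `ratio`.

    Assumptions: particle count grows as ratio**dim (fixed domain volume),
    cost per particle-step stays constant, and the time step scales as
    spacing**dt_exponent, so the number of steps grows as ratio**dt_exponent.
    """
    return ratio ** dim * ratio ** dt_exponent


def cpu_hours(n_particles: float, n_steps: float, cpu_us_per_particle_step: float) -> float:
    """CPU-hours for a run, given a measured cost per particle and time step."""
    return n_particles * n_steps * cpu_us_per_particle_step * 1e-6 / 3600.0


# 5 um -> 2.5 um spacing in 3D
print(refinement_cost_factor(2.0))                              # 16.0
print(round(refinement_cost_factor(2.0, dt_exponent=1.5), 1))   # 22.6
# Example: 1.2e9 particles for 1000 steps at 12.1 CPU-us per particle-step
print(round(cpu_hours(1.2e9, 1000, 12.1)))                      # 4033
```

Under these assumptions, the refined setup costs roughly 16 to 23 times as much per simulated physical time.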
Due to steadily growing HPC resources, the numerical prediction of air-assisted atomizers has come within reach. The results and insights gained from the simulations presented in this report will help to reduce the environmental impact of civil aviation.
The SPH method has proven to be an adequate tool for predicting multi-phase flow phenomena. Despite being a relatively young method, it can compete with established methods. In particular, its excellent use of the hardware resources leads to simulations that are both fast and efficient. Furthermore, the method seems to be well suited for the upcoming heterogeneous computer systems. Although the current HPC facilities in Baden-Württemberg mainly comprise multi-core systems, many-core and GPU-accelerated clusters will be the predominant facilities in the near future. Particle-based methods seem to be tailored for such computer architectures.
Regarding the scientific investigation of air-assisted atomizers, further simulations with a higher spatial resolution are planned. These simulations will clarify whether the proper representation of the experimentally observed liquid bubbles and their skin substantially affects the overall spray properties.
Acknowledgements This work was performed on the computational resource ForHLR Phase I funded by the Ministry of Science, Research and the Arts Baden-Württemberg and DFG ("Deutsche Forschungsgemeinschaft"). We gratefully acknowledge the excellent technical support provided by the Steinbuch Centre for Computing (SCC) at the Karlsruhe Institute of Technology.
1. Adami, S., Hu, X.Y., Adams, N.A.: A new surface-tension formulation for multi-phase SPH
using a reproducing divergence approximation. J. Comput. Phys. 229(13), 5011–5021 (2010)
2. Bendifallah, Z., Jalby, W., Noudohouenou, J., Oseret, E., Palomares, V., Rubial, A.C.: PAMDA:
performance assessment using MAQAO toolset and differential analysis. In: Tools for High
Performance Computing 2013, pp. 107–127. Springer, Berlin/New York (2014)
3. Brackbill, J.U., Kothe, D.B.: Dynamical Modeling of Surface Tension. NASA Conference
Publication, pp. 693–700 (1996)
4. Braun, S., Höfler, C., Koch, R., Bauer, H.-J.: Modeling fuel injection in gas turbines using
the meshless smoothed particle hydrodynamics method. In: ASME Turbo Expo 2013: Turbine
Technical Conference and Exposition, pp. V01AT04A001-V01AT04A001. American Society
of Mechanical Engineers, New York (2015)
5. Braun, S., Wieth, L., Koch, R., Bauer, H.-J.: Influence of trailing edge height on primary
atomization: numerical studies applying the smoothed particle hydrodynamics (SPH) method.
In: 13th International Conference on Liquid Atomization and Spray Systems, Taiwan (2015)
6. Braun, S., Krug, M., Wieth, L., Höfler, C., Koch, R., Bauer, H.-J.: Simulation of primary
atomization: assessment of the smoothed particle hydrodynamics (SPH) method. In: 13th
International Conference on Liquid Atomization and Spray Systems, Taiwan (2015)
7. Braun, S., Wieth, L., Koch, R., Bauer, H.-J.: A framework for permeable boundary conditions
in SPH: inlet, outlet, periodicity. In: 10th International SPHERIC Workshop, Parma (2015)
8. Edelsbrunner, H., Kirkpatrick, D.G., Seidel, R.: On the shape of a set of points in the plane.
IEEE Trans. Inf. Theory 29(4), 551–559 (1983)
9. Gepperth, S., Guildenbecher, D., Koch, R., Bauer, H.J.: Pre-filming primary atomization:
experiments and modeling. In: 23rd European Conference on Liquid Atomization and Spray
Systems (ILASS-Europe 2010), Brno, Sept 2010, pp. 6–8
10. Gepperth, S., Müller, A., Koch, R., Bauer, H.-J.: Ligament and droplet characteristics in
prefilming airblast atomization. In: International Conference on Liquid Atomization and Spray
Systems (ICLASS), Heidelberg, Sept 2012, pp. 2–6
11. Gepperth, S., Koch, R., Bauer, H.-J.: Analysis and comparison of primary droplet character-
istics in the near field of a prefilming airblast atomizer. In: ASME Turbo Expo 2013: Turbine
Technical Conference and Exposition, pp. V01AT04A002-V01AT04A002. American Society
of Mechanical Engineers, New York (2013)
12. Gingold, R.A., Monaghan, J.J.: Smoothed particle hydrodynamics: theory and application to
non-spherical stars. Mon. Not. R. Astron. Soc. 181(3), 375–389 (1977)
13. Graham, S.L., Kessler, P.B., McKusick, M.K.: Gprof: a call graph execution profiler. ACM
Sigplan Not. 17(6), 120–126. ACM (1982)
14. Hu, X.Y., Adams, N.A.: An incompressible multi-phase SPH method. J. Comput. Phys. 227(1),
264–278 (2007)
15. Lucy, L.B.: A numerical approach to the testing of the fission hypothesis. Astron. J. 82, 1013–
1024 (1977)
16. Morris, J.P.: Simulating surface tension with smoothed particle hydrodynamics. Int. J. Numer.
Methods Fluids 33, 333–353 (2000)
17. Rosenfeld, A., Pfaltz, J.L.: Sequential operations in digital picture processing. J. ACM (JACM)
13(4), 471–494 (1966)
18. Szewc, K., Pozorski, J., Minier, J.P.: Analysis of the incompressibility constraint in the
smoothed particle hydrodynamics method. Int. J. Numer. Methods Eng. 92(4), 343–369 (2012)
Towards Solving Fluid Flow Domain
Identification Problems with Adjoint Lattice
Boltzmann Methods
Abstract A novel strategy for solving fluid flow domain identification problems for incompressible Newtonian fluids is proposed and investigated in this paper. The resulting numerical approach is of great importance for academic studies as well as for medical and industrial applications. For example, it can be used in combination with Phase Contrast MRI measurements to characterise flow dynamics as well as flow domains with high accuracy. The problem is formulated as an optimisation problem that minimises the distance between a given and a simulated flow field, where the latter is the solution of a parameterised porous media BGK-Boltzmann model. The parameter represents the porosity distributed in the domain, and its distribution is obtained as the final result of the optimisation problem. The proposed gradient-based solution strategy makes use of an adjoint lattice Boltzmann method (ALBM). Due to their structural similarity to lattice Boltzmann methods (LBM), ALBMs also show excellent parallelisation behaviour. In this preliminary work, first validation results are presented, as well as performance results and improvements for both the single-core and the parallel implementation. In particular, in a simple domain identification test case, a cube inside a wind tunnel is identified by position and shape in only a few optimisation steps, even when only partial flow data are available.
1 Introduction
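Here, for any model parameter $h \in \mathbb{R}_{>0}$, $f = f(t, r, c)$ is the $f$-particle distribution function in a transient phase space of dimension $2d$ with time $t \in I = [t_0, t_1) \subseteq \mathbb{R}_{\geq 0}$, position $r \in \Omega \subseteq \mathbb{R}^d$ and velocity $c \in \mathbb{R}^d$. The total derivative of $f$ is denoted by $\dot f = \left(\frac{\partial}{\partial t} + c \cdot \nabla_r + \frac{F}{m} \cdot \nabla_c\right) f$. The particle density $\rho_f$ and macroscopic velocity $u_f$ of the Newtonian fluid are obtained as moments of $f$ as follows:
$$\rho_f := \int_{\mathbb{R}^d} f(v)\,dv \quad\text{and}\quad u_f := \frac{1}{\rho_f}\int_{\mathbb{R}^d} v\, f(v)\,dv .$$
$$M^{eq}_{f,d_h} = \frac{\rho_f\, h^d}{(2\pi/3)^{d/2}} \exp\left(-\frac{3}{2}\left| c\,h - d_h u_f\, h \right|^2\right) \quad \text{in } I \times \Omega \times \mathbb{R}^d$$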
The coupling of the model parameter $h$ to the discretisation parameter leads to LBM. The continuous space $I \times \Omega \times \mathbb{R}^d$ is replaced by a discrete space $I_h \times \Omega_h \times Q$, where $h$ is identified with the model parameter and is now called the discretisation parameter. The position space $\Omega_h$ is chosen as a uniform grid with spacing $\delta r_1 = \delta r_2 = \ldots = \delta r_d = h$, and the discrete time interval is set to $I_h := \{ t \in I : t = t_0 + k h^2,\ k \in \mathbb{N} \}$. The velocity space $Q$ consists of $q \in \mathbb{N}$ directions $c_i$ ($i = 0, 1, \ldots, q-1$), which link dedicated neighbouring positions in such a way that for $r \in \operatorname{int} \Omega_h$ it holds $r + c_i h^2 \in \Omega_h$, i.e. $|c_i| \sim h^{-1}$. The resulting discrete phase space is called the lattice and denoted by DdQq. To reflect the discretisation of the velocity space, the continuous distribution function $f$ is replaced by a set $f_h$ of $q$ distribution functions $f_i$ ($i = 0, 1, \ldots, q-1$), each representing an average value of $f$ in the vicinity of the velocity $c_i$.
The iterative process in an LB algorithm can be written in two steps, the collision step (3) and the streaming step (4):
$$\tilde f_i(t, r) = f_i(t, r) - \frac{1}{3\nu + 1/2}\left(f_i(t, r) - M^{eq}_{f_i,d_h}(t, r)\right), \tag{3}$$
$$f_i(t + h^2, r + c_i h^2) = \tilde f_i(t, r) \tag{4}$$
Towards Solving Fluid Flow Domain Identification Problems 341
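for $i = 0, 1, \ldots, q-1$, where
$$M^{eq}_{f_i,d_h}(t, r) := \rho_{f_h}\,\frac{w_i}{w}\left(1 + 3h^2\, c_i \cdot d_h u_{f_h} - \frac{3}{2} h^2 \left(d_h u_{f_h}\right)^2 + \frac{9}{2} h^4 \left(c_i \cdot d_h u_{f_h}\right)^2\right),$$
$$\rho_{f_h} := \sum_{i=0}^{q-1} f_i \quad\text{and}\quad u_{f_h} := \frac{1}{\rho_{f_h}} \sum_{i=0}^{q-1} c_i f_i .$$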
The variable $u_{f_h}$ corresponds to the macroscopic fluid velocity and $\rho_{f_h}$ to the mass density. The kinematic fluid viscosity $\nu$ is assumed to be given, and the terms $w_i/w$ and $c_i h$ ($i = 0, 1, \ldots, q-1$) are model-dependent constants. Exhaustive derivations of various LB equations can be found, e.g., in [1, 5, 18].
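Steps (3) and (4) with the porosity-scaled equilibrium can be sketched on a D2Q9 lattice. This is a minimal NumPy illustration, not OpenLB code. It assumes lattice units, where $\xi_i = h c_i$ are the integer D2Q9 vectors and the stored velocity is $h u_{f_h}$. It also assumes the standard D2Q9 weights for $w_i/w$, periodic boundaries, and $\nu$ given in lattice units. The function names are hypothetical.

```python
import numpy as np

# D2Q9 lattice vectors xi_i = h * c_i and weights w_i / w (assumed standard values)
XI = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
               [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, u, d):
    """Discrete equilibrium with the velocity scaled by the porosity d in [0, 1].

    rho: (nx, ny), u: (2, nx, ny) lattice velocity h*u_f, d: (nx, ny).
    """
    du = d * u                                  # d_h * u in lattice units
    xi_du = np.einsum("id,dxy->ixy", XI, du)    # xi_i . (d_h u) for every direction
    usq = np.sum(du * du, axis=0)               # |d_h u|^2
    return W[:, None, None] * rho * (1 + 3 * xi_du + 4.5 * xi_du**2 - 1.5 * usq)


def collide_and_stream(f, d, nu):
    """One LB step: BGK collision (3), then periodic streaming (4)."""
    rho = f.sum(axis=0)
    u = np.einsum("id,ixy->dxy", XI, f) / rho
    tau = 3 * nu + 0.5                          # relaxation time as in (3)
    f_post = f - (f - equilibrium(rho, u, d)) / tau
    # f_i(t + h^2, r + c_i h^2) = f~_i(t, r): shift each population by xi_i cells
    return np.stack([np.roll(f_post[i], shift=tuple(XI[i]), axis=(0, 1))
                     for i in range(len(XI))])


nx, ny = 32, 16
rho0 = np.ones((nx, ny))
u0 = np.zeros((2, nx, ny))
u0[0] = 0.05
d = np.ones((nx, ny))
d[12:16, 6:10] = 0.0          # zero porosity: treated as solid
f = equilibrium(rho0, u0, d)
for _ in range(100):
    f = collide_and_stream(f, d, nu=0.05)
print(f.sum())                # total mass is conserved by collision and streaming
```

Cells with $d_h = 0$ relax towards a rest equilibrium and therefore act as solid obstacles. This is what allows the porosity field to encode the unknown domain.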
In [10], Krause shows for the D2Q9 and D3Q27 models that the truncation error between an element of the diffusive limit family of BGK-Boltzmann equations and its corresponding discrete LB scheme is of second order. In contrast to Krause's approach, most previously published derivations of LBM rely on macroscopically motivated assumptions. This is important to note, since the derivation of the ALBM presented later in this article follows the approach in [10].
In this section, a strategy for solving fluid flow domain identification problems is presented using a method similar to LBM. To this end, a general fluid flow optimisation problem is formulated, which is then discretised step by step following a first-optimise-then-discretise approach. A continuous solution strategy for the optimisation problem is given by formulating a primal and a dual problem. The specific domain identification problem equations are then formulated and discretised with an adjoint lattice Boltzmann method (ALBM). Finally, implementation details regarding the ALBM and its parallelisation are provided.
In the following, a strategy to solve optimal flow control and flow optimisation problems of incompressible Newtonian fluids numerically is presented. The class considered consists of constrained optimisation problems that can be formulated abstractly as
$$\text{find control } \alpha \text{ and state } f \text{ which minimise } J(f, \alpha) \text{ and fulfil } G(f, \alpha) = 0 .$$
The particle distribution function $f$ is called the state, the vector $\alpha$ the control, the functional $J$ the objective or cost functional, and $G(f, \alpha) = 0$ the constraint or side condition. Here, the side condition couples the control $\alpha$ with the state $f$ in terms of a BGK-Boltzmann equation, which is chosen as an element of the corresponding diffusive limit family of BGK-Boltzmann equations. This is in contrast to the classical macroscopic approach, where the constraint is typically governed by a Navier-Stokes equation.
Problems of this class can be solved numerically in two steps by a procedure often referred to as the first-optimise-then-discretise strategy [4]. For the first step, it is proposed to solve the optimisation problem iteratively by applying a line search algorithm as presented in Algorithm 1. In particular, a gradient-based method like steepest descent or BFGS in combination with, e.g., the Armijo or the Wolfe-Powell rule can be chosen (e.g. [3]). Methods of this type have in common that only the goal functional $J$ and its total derivative $d_\alpha J$ need to be evaluated to determine the descent direction $d_k$ and the step length $\delta_k$ in every optimisation step $k = 1, 2, \ldots$.
The evaluation of the goal functional $J$ requires solving the side condition $G(f, \alpha_k) = 0$ to obtain $f(\alpha_k)$, which corresponds to solving a fluid flow problem in every optimisation step $k = 1, 2, \ldots$. After discretisation, this can be done numerically as illustrated in Sect. 2.
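Algorithm 1 itself is not reproduced in this excerpt. As an illustration of the kind of line search described above, the following sketch implements generic steepest descent with Armijo backtracking. It is not the authors' algorithm. Here `J` and `grad_J` are plain callables standing in for the real evaluations: in the domain identification setting, each `J` call would require a primal LB solve and each gradient an additional adjoint (ALBM) solve. All names and default parameters are hypothetical.

```python
import numpy as np


def armijo_steepest_descent(J, grad_J, alpha0, max_iter=200, delta0=1.0,
                            beta=0.5, sigma=1e-4, tol=1e-8):
    """Steepest descent with Armijo backtracking (generic sketch).

    Returns the final control and the number of the last iteration.
    """
    alpha = np.asarray(alpha0, dtype=float)
    k = 0
    for k in range(1, max_iter + 1):
        g = grad_J(alpha)
        if np.linalg.norm(g) < tol:
            break
        d = -g                        # descent direction d_k
        delta = delta0                # trial step length delta_k
        J_old = J(alpha)
        # Shrink the step until the Armijo sufficient-decrease condition holds
        while J(alpha + delta * d) > J_old + sigma * delta * float(g @ d):
            delta *= beta
        alpha = alpha + delta * d
    return alpha, k


# Toy quadratic stand-in for the goal functional
c = np.array([1.0, 2.0])
target = np.array([1.0, -2.0])
J = lambda a: 0.5 * np.sum(c * (a - target) ** 2)
grad_J = lambda a: c * (a - target)
alpha_opt, iters = armijo_steepest_descent(J, grad_J, np.zeros(2))
print(alpha_opt, iters)
```

Because each trial step in the backtracking loop costs one more `J` evaluation, and hence one more flow solve, the number of line-search trials largely determines the cost of an optimisation step.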
$$\frac{\partial}{\partial f(\alpha)} J(f(\alpha), \alpha) = \varphi\, \frac{\partial}{\partial f(\alpha)} G(f(\alpha), \alpha) . \tag{7}$$
Using the porous media model requires the control parameter $\alpha(r) \in \mathbb{R}$ to be projected onto the porosity parameter $d_h(r) \in [0, 1]$ for all points $r \in \Omega$ through an operator $B$ with $B\alpha = d_h$. Finding an appropriate operator $B$ with sensitive optimisation behaviour is the subject of current research within this project.
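Since the choice of $B$ is still open, the following is purely a hypothetical example of what such a projection could look like: a logistic sigmoid that maps any real control value into $(0, 1)$. Its derivative is needed to chain a gradient with respect to $d_h$ into a gradient with respect to $\alpha$. The parameter `steepness` is an assumption of this sketch; it controls how sharply the porosity switches between fluid-like and solid-like values.

```python
import numpy as np


def project_sigmoid(alpha, steepness=1.0):
    """Hypothetical projection B: R -> (0, 1), mapping control alpha to porosity d_h."""
    return 1.0 / (1.0 + np.exp(-steepness * np.asarray(alpha, dtype=float)))


def project_sigmoid_derivative(alpha, steepness=1.0):
    """dB/dalpha, used to chain a gradient w.r.t. d_h into a gradient w.r.t. alpha."""
    s = project_sigmoid(alpha, steepness)
    return steepness * s * (1.0 - s)


alpha = np.array([-5.0, 0.0, 5.0])
print(project_sigmoid(alpha))   # small values are solid-like, values near 1 fluid-like
```

A steep sigmoid approaches a binary solid/fluid map but yields vanishing derivatives away from the switching point. This trade-off is one reason why choosing $B$ is delicate.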
For domain identification problems, the goal functional $J$ is defined by
$$J(f, \alpha) = \frac{1}{2}\int_\Omega \left(u_f - u^*\right)^2 dr \tag{8}$$
$$\frac{\partial}{\partial f} J(f, \alpha) = \frac{(u^* - u_f) \cdot (v - u^*)}{\rho_f} ,$$
where $u^*$ is the measured flow field (e.g. from an MRI scan). Note that subdomains of $\Omega$ may also be used.
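On a uniform grid, the goal functional (8) becomes a sum over cells. Restricting the sum with a boolean mask covers the case where measured data exist only on a subdomain. The sketch below assumes velocity arrays of shape (d, *grid) and a cell volume of $h^d$; the function name is hypothetical.

```python
import numpy as np


def goal_functional(u_sim, u_meas, mask, cell_volume=1.0):
    """Discrete J = 1/2 * sum over masked cells of |u_f - u*|^2 * cell volume.

    u_sim, u_meas: arrays of shape (d, *grid); mask: boolean array of shape grid
    selecting the (sub)domain on which measured data are available.
    """
    diff_sq = np.sum((u_sim - u_meas) ** 2, axis=0)   # |u_f - u*|^2 per cell
    return 0.5 * cell_volume * diff_sq[mask].sum()


u_sim = np.ones((3, 4, 4, 4))
u_meas = np.zeros((3, 4, 4, 4))
full = np.ones((4, 4, 4), dtype=bool)
quarter = np.zeros((4, 4, 4), dtype=bool)
quarter[:1] = True                                    # only a quarter of the cells
print(goal_functional(u_sim, u_meas, full), goal_functional(u_sim, u_meas, quarter))
```

With partial data, cells outside the mask contribute nothing to $J$, so the optimiser receives no direct information from them.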
344 M.J. Krause et al.
@ @
. C v rr /' D dQ.'/ J (11)
@t @f
1 eq
dQ.'/ D .' dMf;B˛ / (12)
Z 2
eq 3h uf v B˛uf vO B˛ C 1 eq
dMf;B˛ D '.v/
O Mf;B˛ .v/
O d vO :
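The adjoint lattice Boltzmann equation (ALB equation) in discrete time and phase space reads
$$\varphi_j(t) - \varphi_j(t - h^2) = \frac{1}{3\nu + 1/2}\left(\varphi_j(t) - dM_{f_h,B\alpha}(t)\right) - \frac{6}{6\nu + 1}\, h^2\, dJ_{f_h,B\alpha}(t) \quad \text{for } t \in I_h,\ j = 0, 1, \ldots, q-1 ,$$
where $\varphi_j(t, r) := \varphi(t, r, c_j)$ and $c_j \in Q$.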
The transient phase space $I \times \Omega \times \mathbb{R}^d$ is discretised by $I_h \times \Omega_h \times Q$ exactly as described in Sect. 2. Here, $h \in \mathbb{R}_{>0}$ denotes the discretisation parameter, which is coupled to a particular adjoint BGK-Boltzmann equation (14). As for the LBM (cf. [1, 18, 20]), the particular choice of $I_h \times \Omega_h \times Q$ sets up an ALBM model, which is denoted by DdQq, with $d$ representing the dimension and $q$ the cardinality of $Q$. Commonly applied models are D2Q9, D3Q19 and D3Q27.
The velocity-discrete adjoint Maxwellian distribution $dM_{f_h,B\alpha}$, which belongs to the adjoint Maxwellian distribution $dM_{f,B\alpha}$, is defined in $I \times \Omega \times Q$. By setting $\varphi_j(t, r) := \varphi(t, r, c_j)$ for all $t \in I$, $r \in \Omega$ and $c_j \in Q$, it reads
3 huf h cQ j huf h cQ i C 1 eq
dMf h;B˛ .cj / WD 'i .ci / Mf h;B˛ (15)
f h
for all $c_j \in Q$ in $I \times \Omega$.
$$dJ_{f_h,B\alpha}(c_j) = \frac{(u^* - u_{f_h}) \cdot (c_j - u^*)}{\rho_{f_h}} .$$
The structure of an ALB equation like (14) is very similar to that of a standard LB equation. The main differences are its time-reversed character and the additional term $\frac{6}{6\nu + 1} h^2\, dJ_{f_h}$. However, its locality properties remain essentially the same. This encourages implementing the ALBM with an algorithm similar to the one for LBM presented in Sect. 2.
An iterative algorithm can be derived from (14). It is executed step by step for decreasing $t \in I_h$. In each time step, two operations are to be performed for all $r \in \Omega_h$ and every $j = 0, 1, \ldots, q-1$, namely the adjoint collision step
$$\tilde\varphi_j(t, r) = \varphi_j(t, r) - \frac{1}{3\nu + 1/2}\left(\varphi_j(t, r) - dM_{f_h}(t, r)\right) + \frac{6}{6\nu + 1}\, h^2\, dJ_{f_h}(t) \tag{16}$$
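The adjoint collision step (16) and a matching backward streaming step can be sketched in NumPy on a D2Q9 lattice. The excerpt breaks off after (16), so the streaming rule used here, $\varphi_j(t - h^2, r - c_j h^2) = \tilde\varphi_j(t, r)$, is an assumption: it is the time reversal of (4) and is consistent with the ALB equation (14). The sketch also assumes periodic boundaries. The arrays `dM` and `dJ` are taken as precomputed inputs, which in practice would be evaluated from the stored primal solution. All function names are hypothetical.

```python
import numpy as np

# D2Q9 lattice vectors xi_j = h * c_j (lattice units)
XI = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
               [1, 1], [-1, 1], [-1, -1], [1, -1]])


def adjoint_collision(phi, dM, dJ, nu, h):
    """Adjoint collision step (16), applied to all cells and directions at once."""
    return (phi - (phi - dM) / (3 * nu + 0.5)
            + 6 / (6 * nu + 1) * h**2 * dJ)


def adjoint_stream(phi_post):
    """Assumed backward streaming: phi_j(t - h^2, r - c_j h^2) = phi~_j(t, r)."""
    return np.stack([np.roll(phi_post[j], shift=tuple(-XI[j]), axis=(0, 1))
                     for j in range(len(XI))])


def adjoint_step(phi, dM, dJ, nu, h):
    """One backward-in-time ALBM step: adjoint collision, then reverse streaming."""
    return adjoint_stream(adjoint_collision(phi, dM, dJ, nu, h))


rng = np.random.default_rng(0)
phi = rng.random((9, 8, 6))
phi_prev = adjoint_step(phi, dM=phi, dJ=np.zeros_like(phi), nu=0.1, h=0.01)
```

As in the forward LBM, both operations are local or nearest-neighbour only. This locality is why the ALBM is expected to parallelise as well as the LBM.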
4 Numerical Experiments
First, the results of the reconstruction of a partially given flow field are presented in order to validate the proposed method. Afterwards, the single-core performance improvements of the open-source LBM implementation OpenLB are presented, comparing the most recent version 1.0 with version 0.9. Finally, a comprehensive scaling study on the HPC cluster FH1 is shown.
The validity of the method is demonstrated in a simple test scenario. For this purpose, artificial "experimental" flow data $u^*$ are generated by computing the flow field around a solid cube in a virtual wind tunnel with fixed velocity values at all boundaries. The tunnel is constructed to be 125 times as big as the cube. The cube is then to be identified by the adjoint lattice Boltzmann algorithm (Algorithm 1) in terms of position and shape within the design domain, a region around the cube 9 times as big as the cube. Thereafter, the algorithm is provided with only partial data of the simulated flow field. This means that the goal functional (8) integrates only over a subdomain of $\Omega$ instead of all of $\Omega$ (see the blue regions on the left hand side of Fig. 1). The object is still expected to be identified.
Figure 1 shows the reconstruction of the cube with different amounts of "experimental" data $u^*$ provided to the domain identification algorithm. Even when only a quarter of the overall data is provided, at a distance from the cube, the object can be reconstructed within 20 optimisation steps. These results show the high potential of a porous-media-based adjoint lattice Boltzmann method for improving noisy MRI data.
Fig. 1 Identification of a cube through the porous media adjoint lattice Boltzmann method. On the left hand side, the (sub-)domain $\Omega$ of the goal functional (8) is marked blue, with the cube (red) surrounded by the design domain (dotted line). The goal functional (8) computes the error between the measured input flow data and the simulated optimisation flow data inside the blue area. The control parameter $\alpha^k$ determines the lattice porosity $d_h^k$ in the design domain at the $k$-th optimisation step, where low porosity values are to be interpreted as solid. $\alpha^k$ is determined by Algorithm 1 for every optimisation step $k$. The fluid flow enters the virtual wind tunnel from the left at a fixed velocity $u$ for both the artificial flow data generation and the optimisation. The right hand side shows the resulting reconstruction of the cube after various optimisation steps $k$. Even when only a quarter of the overall data is provided, at a distance from the cube, the object can be reconstructed within 20 optimisation steps
Fig. 3 Million lattice updates (MLUP) per process and second as a function of the allocated cores. The graph shows that the recent version 1.0 of OpenLB performs about 30 % faster than version 0.9. The problem size is fixed to 125,000 lattice nodes (see the cylinder3d example of OpenLB). Computations are performed on an Intel i7-4790, compiled with gcc 5.3
Scaling studies on the HPC cluster FH1 are promising with the open-source software OpenLB, which is developed by the working group Computational Process Engineering (CPE) at the Karlsruhe Institute of Technology (KIT). As shown in Sect. 3.4, ALBM-based optimisation requires considerable computation time, since a 3D fluid flow problem has to be solved numerically in every optimisation step. Consequently, HPC infrastructure is an essential component for applying the proposed method to relevant problems.
The key performance index is MLUP/ps (mega lattice updates per process and second, which is proportional to "mega FLOP per second and per core"), denoting the number of fluid cells, in millions, computed in one second by a single core. Two conclusions can be drawn from the following discussion:
(a) Running the algorithm on an increasing number of cores with a fixed overall problem size increases the amount of necessary communication and results in lower MLUP/ps (strong scaling).
(b) Running the algorithm on an increasing number of cores with a fixed problem size per core leads to constant MLUP/ps (weak scaling).
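Because MLUP/ps is a per-process rate, the aggregate throughput and the wall clock time per time step follow directly from it. The sketch below reproduces the 60 million cells per second quoted below for $101^3$ cells on one 20-core node. The helper names are illustrative, and the time per step assumes all processes sustain the quoted rate.

```python
def cells_per_second(mlup_per_process_s: float, processes: int) -> float:
    """Total cell updates per second: MLUP/ps times 1e6 times the number of processes."""
    return mlup_per_process_s * 1e6 * processes


def seconds_per_time_step(n_cells: int, mlup_per_process_s: float, processes: int) -> float:
    """Wall clock time for one LB time step over all n_cells cells."""
    return n_cells / cells_per_second(mlup_per_process_s, processes)


# 101^3 cells on one FH1 node (20 cores) at about 3 MLUP/ps
print(f"{cells_per_second(3.0, 20):.1e} cells/s")  # 6.0e+07 cells/s
print(f"{seconds_per_time_step(101**3, 3.0, 20):.4f} s per time step")
```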
[Fig. 4 plot: MLUP/ps versus number of cores (1, 20, 160, 1280) for N = 101^3, 201^3, 401^3 and 801^3 on FH1]
Fig. 4 Simulation of a lid-driven cavity on the HPC cluster FH1 using the open-source software OpenLB. The graph shows the performance index MLUP/ps (computed cells per second and per process as a function of the allocated cores) for varying fluid cell numbers N. For a fixed overall problem size N (strong scaling), a decrease of MLUP/ps is observed (horizontal lines). However, for weak scaling (constant problem size per core), a constant MLUP/ps is seen (vertical lines)
For a problem size of $101^3$ computed on a single compute node (two deca-core CPUs), an MLUP/ps of 3 is obtained (see Fig. 4). This means that the algorithm simulates 60 million fluid cells per second on a single compute node. In comparison, for a problem of size $401^3$, which is a typical size for LBM applications, 4.8 MLUP/ps have been observed using one compute node (20 cores) and 4.5 MLUP/ps using four compute nodes (80 cores). Because larger problems have a more favourable relation between computation and communication, the following holds: the bigger the problem, the higher the performance in terms of MLUP/ps. Overall, OpenLB provides a very good incremental speed-up of about 46, corresponding to an efficiency of 0.57. The biggest problem currently being considered is of size $801^3$ and shows an excellent efficiency of 0.88 from 40 to 1600 cores.
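Speed-up and efficiency are related by the ideal speed-up $p / p_{ref}$. Since MLUP/ps is a per-process rate, the ratio of two MLUP/ps values is itself the parallel efficiency for strong as well as weak scaling. The sketch below applies these relations to numbers quoted in this section and in Table 1; the helper names are illustrative.

```python
def efficiency_from_speedup(speedup: float, p: int, p_ref: int) -> float:
    """Parallel efficiency: measured speed-up divided by the ideal speed-up p / p_ref."""
    return speedup / (p / p_ref)


def speedup_from_efficiency(efficiency: float, p: int, p_ref: int) -> float:
    """Inverse relation: speed-up implied by an efficiency over the range p_ref -> p."""
    return efficiency * (p / p_ref)


def efficiency_from_mlups(mlups_p: float, mlups_ref: float) -> float:
    """MLUP/ps is a per-process rate, so the ratio of two values is the efficiency."""
    return mlups_p / mlups_ref


print(f"{speedup_from_efficiency(0.88, 1600, 40):.1f}")  # 35.2  (N = 801^3)
print(f"{efficiency_from_mlups(4.5, 4.8):.2f}")          # 0.94  (N = 401^3, 20 -> 80 cores)
print(f"{efficiency_from_mlups(2.98, 3.01):.2f}")        # 0.99  (weak scaling, Table 1)
```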
Table 1 Nearly constant performance index MLUP/ps for varying core numbers with fixed problem size per core. This indicates that the OpenMPI implementation of OpenLB provides outstanding scaling properties and benefits particularly from HPC infrastructure

Number of cores    Fluid cells per core    MLUP/ps
20                 5 × 10^4                3.01
160                5 × 10^4                3.0
1280               5 × 10^4                2.98
1                  1 × 10^6                5.2
8                  1 × 10^6                5.0
64                 1 × 10^6                4.9
512                1 × 10^6                4.9
scaling (see Table 1). Therefore, OpenLB is very well suited for massively parallel
5 Conclusion
1. Chopard, B., Droz, M.: Cellular Automata Modeling of Physical Systems. Cambridge Univer-
sity Press, Cambridge/New York (1998)
2. Fietz, J., et al.: Optimized hybrid parallel lattice Boltzmann fluid flow simulations on complex
geometries. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012
Parallel Processing. Lecture Notes in Computer Science, vol. 7484, pp. 818–829. Springer,
Berlin/Heidelberg (2012). ISBN:9783642328190, doi:10.1007/9783642328206_81, http://dx.
3. Geiger, C., Kanzow, C.: Numerische Verfahren zur Lösung unrestringierter Optimierungsaufgaben. Springer-Lehrbuch. Springer, Berlin (1999). ISBN:3540662200, http://swbplus.bszbw.
4. Gunzburger, M.D.: Perspectives in flow control and optimization. Advances in design and
control. Society for Industrial and Applied Mathematics, Philadelphia (2002). http://www.ulb.
5. Hänel, D.: Molekulare Gasdynamik. Springer (2004)
6. Henn, T., et al.: Parallel dilute particulate flow simulations in the human nasal cavity. Comput.
Fluids 124, 197–207 (2016). ISSN:0045-7930, doi:http://dxdoiorg/10.1016/jcompfluid2015.
08002, http://www.sciencedirect.com/science/article/pii/S0045793015002728
7. Heuveline, V., Strauss, F.: Shape optimization towards stability in constrained hydrodynamic
systems. J. Comput. Phys. 228, 938–951 (2009)
8. Heuveline, V., Krause, M.J., Latt, J.: Towards a hybrid parallelization of lattice Boltzmann
methods. Comput. Math. Appl. 58, 1071–1080 (2009). doi:10.1016/j.camwa2009.04001,
9. Kirk, A., et al.: Lattice Boltzmann topology optimization for transient flow. In:
MAESC 2011 Conference May 3, 2011. Christian Brothers University Memphis,
Tennessee (2011). http://wwwmaescorg/maesc11/Papers/Kirk_Kreissl_Pingen_Maute_
10. Krause, M.J.: Fluid flow simulation and optimisation with lattice Boltzmann methods on
high performance computers: application to the human respiratory system. Eng. http://digbib.
ubka.uni-karlsruhe.de/volltexte/1000019768. PhD thesis, Karlsruhe Institute of Technology
(KIT), Universität Karlsruhe (TH), Karlsruhe, July 2010. http://digbib.ubka.uni-karlsruhe.de/
11. Massaioli, F., Amati, G.: Achieving high performance in a LBM code using OpenMP. In:
EWOMP 2002, Rome (2002)
12. Ni, J., et al.: Parallelism of lattice Boltzmann method (LBM) for lid-driven cavity flows.
In: High Performance Computing and Applications (HPCA2004), Shanghai, 8–10 Aug
2004. Accepted and being published in lecture note in computer science (LNCS). Springer,
Heidelberg (2004)
13. Pingen, G., Evgrafov, A., Maute, K.: Topology optimization of flow domains using the lattice
Boltzmann method. Struct. Multidiscip. Optim. 34(6), 507–524 (2007)
14. Pingen, G., Evgrafov, A., Maute, K.: A parallel Schur complement solver for the solution of
the adjoint steady-state lattice Boltzmann equations: application to design optimisation. Int. J.
Comput. Fluid Dyn. 22(7), 457–464 (2008)
15. Pingen, G., Evgrafov, A., Maute, K.: Adjoint parameter sensitivity analysis
for the hydrodynamic lattice Boltzmann method with applications to design
optimization. Comput. Fluids 38(4), 910–923 (2009). ISSN:0045-7930,
doi:10.1016/jcompfluid200810.002, http://www.sciencedirect.com/science/article/B6V264-
16. Pohl, T., et al.: Performance evaluation of parallel large-scale lattice Boltzmann applications
on three supercomputing architectures. In: Proceedings of the ACM/IEEE SC2004 Conference
on Supercomputing, Washington, DC, p. 21 (2004)
17. Saint-Raymond, L.: From the BGK model to the Navier-Stokes equations. Annales Scien-
tifiques de l’École Normale Supérieure 36(2), 271–317 (2003). ISSN:0012-9593, doi:10.1016
/S0012-9593(03) 00010-7, http://www.sciencedirect.com/science/article/B6VKH48HS9DK5/
18. Sukop, M.C., Thorne, D.T.: Lattice Boltzmann Modeling. Springer, Berlin/New York (2006)
19. Tekitek, M.M., et al.: Adjoint lattice Boltzmann equation for parameter identification. Comput.
Fluids 35, 805–813 (2006)
20. Wolf-Gladrow, D.A.: Lattice-Gas, Cellular Automata and Lattice Boltzmann Models,
An Introduction. Lecture Notes in Mathematics. Springer, Heidelberg/Berlin (2000).
21. Zeiser, T., Götz, J., Stürmer, M.: On performance and accuracy of lattice Boltzmann approaches
for single phase flow in porous media: a toy became an accepted tool – how to maintain
its features despite more and more complex (physical) models and changing trends in high
performance computing!? In: Krause, E., Shokin, Y.I., Shokina, N. (eds.) Computational
Science and High Performance Computing III, Proceedings of 3rd Russian-German Workshop
on High Performance Computing, Novosibirsk, 23–27 July 2007. Notes on Numerical Fluid
Mechanics and Multidisciplinary Design, vol. 101. Springer (2008)
Investigation on Air Entrapment in Paint Drops
Under Impact onto Dry Solid Surfaces
Abstract The present annual report summarises the purpose of the project and the ongoing investigations performed at the Institut für Industrielle Fertigung und Fabrikbetrieb, Universität Stuttgart (IFF), on the numerical study of paint drops impacting onto dry solid surfaces. Both Newtonian and yield-stress viscous droplets were considered. Detailed numerical observations of the drop impact dynamics, with a focus on air entrapment, were obtained. It has been found that in the early stage of droplet spreading there is no contact line movement, but only direct contact of the droplet outline with the substrate. This results in the formation of an air disc under the impact point. The air disc reaches its maximum size when drop spreading becomes driven by the movement of the fully wetted contact line. Numerical results showed much more bubble entrapment at the interface between liquid and solid for Newtonian droplets. For shear-thinning non-Newtonian fluids, the air disc and the air bubbles created during drop spreading are reduced tremendously because of the comparatively low liquid viscosity. The effects of the drop properties, impact velocity and static contact angle on the maximum air disc and on the air bubble release from the droplet film were analysed.
1 Introduction
Droplet impingement and spreading on a solid surface are phenomena that occur frequently in many industrial applications, such as coating processes using liquid sprays. The paint film quality of such coating processes is affected by the entrapment of air bubbles in the liquid film, which are released later in the drying process and result in pinholes in the dry paint film. One presumed source of these air bubbles is air entrapment during the impact of the liquid drops, which has been observed experimentally by many researchers using different liquid materials and impact velocities.
Q. Ye (✉) • O. Tiedje
Institut für Industrielle Fertigung und Fabrikbetrieb, Universität Stuttgart, Nobelstr. 12, D-70569
Stuttgart, Germany
e-mail: qiaoyan.ye@ipa.fraunhofer.de
Experimental observations of the impact of liquid drops onto dry solid surfaces at
room temperature, including the analysis of air entrapment, have been reported
extensively [1, 4, 10, 12–14]. Using flash photography and high-speed cameras with
different lighting arrangements, such as back or oblique lighting with and without a
light diffuser, the authors observed bubble formation at the stagnation point and
attributed it to a dimple created in the drop surface at the impact point [1, 12]. In
investigations using viscous drops, Thoroddsen et al. [14] found much more bubble
entrapment during drop spreading, resulting from localised contacts of the levitated
lamella with the solid substrate, especially for intermediate Reynolds numbers
(Re ≈ 250–350). Similar research has been carried out by Palacios et al. [10]. Besides
a myriad of air bubbles at the liquid-solid interface, they also observed two rings of
micro-bubbles under glycerol/water drops impacting onto a dry glass surface at
Reynolds and Weber numbers around the splashing/deposition threshold, and analysed
the behaviour of these rings as a function of the drop impact velocity and of the
relevant dimensionless numbers. However, the quality of the time-resolved imaging
depends strongly on the facilities used, namely the flash photography, the high-speed
cameras and the lighting arrangement. In general, large drops, e.g. d > 500 µm, have
to be used in experiments. For small drops (50–300 µm), especially of opaque liquids
as in spray painting, it is very difficult to obtain high-quality time-resolved images
of air bubble entrapment during drop impingement.
Few numerical studies focus on air entrapment under drop impact. Mehdi-Nejad,
Mostaghimi and Chandra [8] simulated the impact of water, n-heptane and molten
nickel droplets on a solid surface using two-dimensional computational domains.
They included the effect of the gas around the droplets and predicted the formation
of air bubbles at the solid-liquid interface. The impact dynamics of non-Newtonian
drops, namely yield-stress fluid droplets, have been studied experimentally [5, 9]
and numerically [6]. The latter study evaluated the influence of the rheological
parameters on droplet spreading and recoiling.
Although the air entrapment phenomenon under drop impact onto a solid surface
is well known experimentally, knowledge about the detailed processes and
mechanisms underlying air bubble entrapment and release from the liquid film is
still limited, especially for highly viscous and non-Newtonian liquids. In this project
we have carried out numerical studies of Newtonian viscous drops (0.04–1 Pa s) and
yield-stress drops impacting onto dry, smooth solid surfaces. The impact velocity and
droplet diameter were selected with spray painting applications in mind (Fig. 1).
Air entrapment and bubble release from the liquid films were compared between
Newtonian and non-Newtonian liquids.
Fig. 1 Droplet impact velocity vs. droplet diameter for the different atomizers used in the coating industry
2 Numerical Method
For the impact of non-Newtonian drops, two paint liquids were used in the
simulations. Their rheological properties were obtained experimentally with a
rotational viscometer.
Calculations were carried out on the Cray XC40 (Hazel Hen) at the High Performance
Computing Center Stuttgart. Figure 2 shows the evaluation of the code performance,
made with a grid of 80 million cells and a time step of 10⁻⁶ s. The wall-clock time
decreased substantially with 1200 and 2400 cores; beyond 2400 cores, however, the
performance deteriorated. Most simulations were therefore run on 1200 cores, which
already sped up the parameter study considerably. The CPU times for calculating one
second of the droplet impact process are summarized in Table 1. In most cases the
calculation reaches the equilibrium state after 0.1 s of the process, which results in
20 CPU-hours per case using 1200 cores.
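The scaling behaviour described above is usually quantified by the speedup S_p = T_ref/T_p and the parallel efficiency E_p = S_p·p_ref/p, both relative to the smallest core count p_ref. The helper below is a generic sketch of these two metrics; the timings in its demo are placeholders chosen for illustration, not the measurements behind Fig. 2 or Table 1.

```python
def strong_scaling(timings):
    """Speedup and parallel efficiency relative to the smallest core count.

    timings: dict mapping core count -> wall-clock time (any consistent unit).
    Returns a dict mapping core count -> (speedup, efficiency).
    """
    p_ref = min(timings)            # smallest core count is the reference
    t_ref = timings[p_ref]
    result = {}
    for p in sorted(timings):
        speedup = t_ref / timings[p]
        efficiency = speedup * p_ref / p   # 1.0 means ideal strong scaling
        result[p] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Placeholder timings in hours, for illustration only (not measured data).
    demo = {600: 40.0, 1200: 20.0, 2400: 12.0, 4800: 11.0}
    for cores, (s, e) in strong_scaling(demo).items():
        print(f"{cores:5d} cores: speedup {s:4.2f}, efficiency {e:4.2f}")
```

An efficiency that falls well below 1 beyond some core count, as reported above for more than 2400 cores, typically indicates that communication overhead begins to outweigh the shrinking per-core workload of a fixed-size grid.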
Fig. 2 Performance of parallel processors for a test case of droplet impact calculation
3 Simulation Results
Slow deposition of drops onto a nearly completely wetting solid substrate was
investigated experimentally and numerically by Ding et al. [2]. They observed the
typical evolution of the droplet shape during pinch-off and the ejection of droplets
from the mother drop during rapid droplet spreading, as shown in Fig. 3. Pinch-off
criteria were analysed; under certain conditions, six stages of pinch-off with droplet
ejection could be observed.
In the present investigation, the spreading of a water droplet with zero impact
velocity was simulated, using a droplet diameter of 300 µm and a static contact
angle of 30°. Figure 4 shows the calculated evolution of the drop shape. Compared
with the experimental results [2], a qualitatively identical pinch-off process can be
observed. Since the parameters used in the simulation are not identical to those of
the experiment by Ding et al., daughter droplets were produced in only two stages of
droplet ejection. Figure 4 also shows air entrapment caused by the coalescence of
the daughter and mother drops.
Fig. 3 First-stage pinch-off for a water drop of 486 µm in diameter, u = 0 m/s, $\mathrm{Oh} = \mu/\sqrt{\rho\sigma D} = 1.68\times10^{-4}$, $\theta_s = 12^\circ$ [2]
Fig. 4 Simulated first pinch-off for a water drop of 300 µm in diameter, u = 0 m/s, $\mathrm{Oh} = 6.78\times10^{-3}$,
$\theta_s = 30^\circ$ (contours of volume fraction; red: air, blue: water)
Table 2 summarizes the parameters used for the Newtonian liquids in this study. The
corresponding dimensionless numbers are the Reynolds number $\mathrm{Re} = \rho D u/\mu$,
the Weber number $\mathrm{We} = \rho u^2 D/\sigma$ and the Ohnesorge number
$\mathrm{Oh} = \mu/\sqrt{\rho\sigma D} = \sqrt{\mathrm{We}}/\mathrm{Re}$, where D, ρ, μ and σ
are the diameter, density, viscosity and surface tension of the drop, respectively,
and u is the drop impact velocity. For the viscous liquids, Re < 75, We < 1200 and
Oh = 0.271–11.55. A high and a relatively low viscosity, namely 1 and 0.04 Pa s, and
impact velocities of 1 and 10 m/s were applied. For comparison, impacts of water
drops with Re = 300 were also simulated. The impact regime presented in this section
is mainly droplet spreading on the wall without breakup, especially for the viscous
liquids.
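As a quick consistency check of these definitions, the sketch below evaluates Re, We and Oh for a 300 µm drop at 1 m/s. Table 2 is not reproduced here, so the fluid properties are assumptions: ρ = 1000 kg/m³, μ = 0.04 Pa s and σ = 0.025 N/m for the viscous case (chosen because they reproduce the Re = 7.5 and We = 12 quoted for case l1b), and textbook room-temperature water values for the water drop.

```python
import math

def dimensionless_numbers(D, u, rho, mu, sigma):
    """Return (Re, We, Oh) for a drop of diameter D [m] hitting a wall at speed u [m/s]."""
    re = rho * D * u / mu                   # inertia vs. viscous forces
    we = rho * u**2 * D / sigma             # inertia vs. surface tension
    oh = mu / math.sqrt(rho * sigma * D)    # viscous vs. inertia-capillary scale
    return re, we, oh


if __name__ == "__main__":
    # Assumed properties (Table 2 of the report is not reproduced here).
    cases = {
        "viscous (l1b-like)": dict(D=300e-6, u=1.0, rho=1000.0, mu=0.04, sigma=0.025),
        "water": dict(D=300e-6, u=1.0, rho=998.0, mu=1.0e-3, sigma=0.0728),
    }
    for name, props in cases.items():
        re, we, oh = dimensionless_numbers(**props)
        assert math.isclose(oh, math.sqrt(we) / re)   # identity Oh = sqrt(We)/Re
        print(f"{name}: Re = {re:.1f}, We = {we:.1f}, Oh = {oh:.3g}")
```

With these assumed water properties, Oh comes out close to the 6.78e-3 quoted in the caption of Fig. 4.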
Fig. 5 Detailed view of contours of air volume fraction scaled from 0.01 to 0.8 for the impact of
a water drop (case lh2o from Table 2: D = 300 µm, impact velocity = 1 m/s, corresponding to
Re = 300, We = 4). (a) 1 µs before impact, (b) droplet just contacting the wall, (c) maximal
air disc on the wall, (d) air bubble under the bottom centre of the drop
disc under water drops at the initial contact. In experiments, however, the droplet
shape can be unstable because of the surrounding conditions during free fall, and the
large droplets usually used in experimental research deform easily. Since the free-fall
distance in the present simulation is very small, only one tenth of the drop diameter,
the droplet remains spherical until impact.
An initial air disc (Fig. 5b) with a radius of 11 µm was obtained. The air disc
grows continuously during drop spreading until a fully wetted contact line is
established; the subsequent spreading is driven by contact line movement. This
wetting process can be seen more clearly in the viscous drop case below.
The maximal radius of the captured air disc, shown in Fig. 5c, is about 32 µm,
with a thickness < 1 µm. This air disc contracts into a bubble with an equivalent
diameter of about 13 µm under the bottom centre of the drop. The contraction takes
about 20 µs.
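The figures above allow a rough estimate of the mean air-disc thickness. The sketch assumes that the trapped air keeps its volume (no compression, dissolution or escape) while a flat disc of radius R_d ≈ 32 µm contracts into a sphere of diameter d_b ≈ 13 µm, so that π R_d² h = π d_b³/6 and h = d_b³/(6 R_d²). This is an illustrative estimate, not a result reported in the study.

```python
def mean_disc_thickness(disc_radius, bubble_diameter):
    """Mean thickness of a flat air disc holding the same volume as a sphere.

    Volume balance with air treated as incompressible (assumption):
        pi * R_d**2 * h = pi * d_b**3 / 6   ->   h = d_b**3 / (6 * R_d**2)
    Any consistent length unit may be used.
    """
    return bubble_diameter**3 / (6.0 * disc_radius**2)


if __name__ == "__main__":
    h = mean_disc_thickness(disc_radius=32.0, bubble_diameter=13.0)  # micrometres
    print(f"mean disc thickness = {h:.2f} um")
```

The estimate of roughly 0.36 µm is consistent with the disc thickness of less than 1 µm stated above.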
The simulation results for the viscous fluid of case l1b in Table 2 are shown in
Fig. 6, focusing on the phase contours close to the solid wall. Compared with the
water drop, the drop surface around the impact point is slightly less flattened and
less smoothly curved at t = 0. The initial air disc radius is about 9 µm and grows
continuously, as shown in Fig. 6 at t = 0.022 ms. The air contracts into bubbles
during spreading, resulting in a partly wetted region. The maximal air disc on the
wall (strictly speaking, the region with bubbles and partly wetted area) was observed
at t = 0.23 ms, with a radial size of 253 µm (Fig. 6). These results show that a thin
air layer, or air disc, always forms when a droplet impacts a solid surface. This thin
air layer results from the direct contact between the droplet outline and the
substrate, even for a nearly completely wetting substrate. The maximum air disc is
reached once the wetted contact line starts to move. The size of the air disc and the
release of air bubbles depend on material properties and application parameters, as
discussed in the following sections.
Fig. 6 Viscous droplet: detailed view of contours of air volume fraction scaled from 0.01 to 0.8
(case l1b: D = 300 µm, impact velocity = 1 m/s, corresponding to Re = 7.5, We = 12)
As shown in Figs. 5 and 6, the air layer contracts into bubbles. Figure 7 shows the
evolution of the water drop impact, focusing on the droplet shape and on air bubble
formation and release. The bubble formed from the air disc cannot rise immediately
and remains under the centre of the drop because of the symmetric downward flow
inside the droplet. As the apex height of the drop decreases, the bubble leaves the
liquid film (Fig. 7c). In this case, the inertial force is smaller than the large surface
tension force, and the high static contact angle (SCA), i.e. poor wettability, causes a
strong contraction of the liquid film, which in turn results in droplet breakup
(Fig. 7d). New small bubbles form during the coalescence of drops on the solid surface
(Fig. 7e). During the advance and recoil of the droplet, the bubbles rise. By examining
the 3D region of the liquid phase, an air-bubble-free state was observed after
approx. 2 ms.
In contrast to the water droplet, the release of bubbles from the viscous droplet
is quite difficult, as can be seen in Figs. 8 and 9. Many more air bubbles are
entrapped at the liquid-solid interface for the viscous drop, in accordance with
experimental observations [10, 14]. During the advancing and recoiling processes,
micro-bubbles coalesce, and some of them are able to leave the liquid film once the
film height decreases sufficiently. During the first advance, the apex height of the
drop in the centre reaches its minimum, so that the large bubbles located in the centre
escape first (Fig. 8 at t = 0.972 ms). The remaining bubbles move radially outward
with the oscillation of the spreading drop and rise as soon as the drift forces are
strong enough to overcome the adhesion force. Figure 9 shows the air bubble
formation on the solid surface in detail. The air volume fraction was scaled
automatically: 0.62–1 for the frame at t = 0.022 ms and 0–1 for the remaining frames.
Some air bubble patterns, e.g. central bubble rings and cartwheel patterns in Fig. 9 at
t = 0.022 ms, can be observed; they are similar to the experimental observations
reported by Palacios et al. [10]. At t = 0.743 s, a quasi-equilibrium phase, fairly large
air bubbles still remain on the substrate, and their release becomes slower and more
difficult.
Fig. 7 Contours of volume fraction (1: air, 0: water), impact of a water drop (D = 300 µm,
impact velocity = 1 m/s, corresponding to Re = 300, We = 4); t is the real impact time
Fig. 8 Contours of air volume fraction (cross-section view; red: air, blue: liquid), impact of a
viscous drop (D = 300 µm, impact velocity = 1 m/s, corresponding to Re = 7.5, We = 12); t is
the real impact time
Fig. 9 Contour lines of air volume fraction (bottom view; 1: air, 0: liquid), impact of a viscous
drop (D = 300 µm, impact velocity = 1 m/s, corresponding to Re = 7.5, We = 12). The scale line
is 100 µm
Based on the present simulation results, the initial radius of the air disc at the
impact point for the 300 µm droplet is 10 ± 2 µm in all test cases, owing to the
nearly spherical shape of the original drop in the simulation. However, the maximal
air disc caught under the droplet depends on the droplet properties, the impact
velocity and the wettability, i.e. the substrate properties. This maximal air disc finally
results in a myriad of micro-bubbles at the interface between the liquid and the
substrate. The size of these air discs, or air regions, of viscous drops is plotted
against the combined Ohnesorge-Reynolds group $\mathrm{Oh}\,\mathrm{Re}^{0.8}$ in Fig. 10. In
general, the size of the air region is inversely proportional to the surface tension of
the fluid and increases with impact velocity and liquid viscosity.
Fig. 10 The maximal radius of the air disc of Newtonian viscous drops vs. $\mathrm{Oh}\,\mathrm{Re}^{0.8}$
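Expanding the correlating group with the definitions of Re and Oh given earlier shows how each property enters it (a short derivation added for clarity; the exponent 0.8 is taken from Fig. 10):

```latex
\mathrm{Oh}\,\mathrm{Re}^{0.8}
  = \frac{\mu}{\sqrt{\rho\sigma D}}\left(\frac{\rho D u}{\mu}\right)^{0.8}
  = \mu^{0.2}\,\rho^{0.3}\,D^{0.3}\,u^{0.8}\,\sigma^{-0.5}
```

The group decreases with surface tension and grows with impact velocity and, more weakly, with viscosity, which is consistent with the qualitative trends stated above.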
The effect of the static contact angle (SCA) was also investigated in the present study.
With decreasing SCA, i.e. improving wettability, the maximal air disc decreases from
253 µm in case l1b to 238 µm in case l3b. Figures 11 and 12 show the entrapped
air bubbles under the drop at t = 0.18 s in detail. On the substrate there are only
two small visible bubbles for the case with SCA = 30°, whereas many more large
bubbles can be observed for the case with SCA = 60°. A small SCA helps the
bubbles overcome the adhesion force and leave the solid surface much more easily.
The reduced height of the droplet film (small SCA) also makes bubbles drift up.
Fig. 11 Contours of air volume fractions for the viscous drop with SCA = 30° at t = 0.18 s (case
Fig. 12 Contours of air volume fractions for the viscous drop with SCA = 60° at t = 0.18 s (case
Fig. 13 Spreading factors (a) d/D and (b) h/D versus time t [ms] (logarithmic axis) for different liquids
(cases h2o, l1a, l1b, l1c; impact velocity = 1 m/s, D = 300 µm, SCA = 60°)
The droplet impact dynamics, namely the evolution of the droplet shape during
advancing and recoiling, are evaluated (Fig. 13) using the spreading factors d/D and
h/D, where d is the spreading diameter and h the apex height. There are no large
differences between the spreading factors at the beginning (t < 0.01 ms), since the
inertial force dominates droplet spreading at this time. The water drop shows
significant oscillations of the spreading factors, which can promote the release of air
bubbles from the water film. In contrast, the viscous liquids show only a single
advancing and recoiling cycle. For t > 0.1 ms, d/D and h/D differ significantly
depending on the viscosity and the surface tension.
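Spreading factors such as those in Fig. 13 can be extracted from an axisymmetric volume-fraction field. The helper below is a hypothetical post-processing sketch, not the authors' tool: it assumes a NumPy array `alpha[j, i]` of liquid volume fraction on a uniform (r, z) grid with cell sizes `dr`, `dz`, the wall at `j = 0` and the symmetry axis at `i = 0`, and it takes the 0.5 iso-level as the interface.

```python
import numpy as np

def spread_factors(alpha, dr, dz, D, level=0.5):
    """Return (d/D, h/D) from an axisymmetric liquid volume-fraction field.

    alpha[j, i]: liquid fraction; j is the wall-normal index (j = 0 touches the
    wall) and i the radial index (i = 0 touches the symmetry axis).
    d: wetted diameter on the wall, h: apex height on the axis (cell resolution).
    """
    wet = np.nonzero(alpha[0, :] >= level)[0]        # wetted cells in the wall row
    d = 2.0 * (wet[-1] + 1) * dr if wet.size else 0.0
    liquid = np.nonzero(alpha[:, 0] >= level)[0]     # liquid cells on the axis
    h = (liquid[-1] + 1) * dz if liquid.size else 0.0
    return d / D, h / D


if __name__ == "__main__":
    # Synthetic field: a liquid "pancake" 8 cells wide and 3 cells high.
    field = np.zeros((10, 20))
    field[0:3, 0:8] = 1.0
    print(spread_factors(field, dr=1e-5, dz=1e-5, D=300e-6))
```

Sub-cell accuracy would require interpolating the 0.5 crossing between neighbouring cells, which this sketch omits.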
The Herschel-Bulkley model, described as follows, was used in the present
numerical study for the paint liquids:
$\dot{\gamma} = 0$ for $\tau < \tau_0$   (3)
$\tau = \tau_0 + k\,\dot{\gamma}^{\,n}$ for $\tau > \tau_0$   (4)
In the above equations, $\dot{\gamma}$ is the shear rate (s⁻¹) and τ the shear stress (Pa);
τ0, k and n are rheological parameters representing the yield-stress magnitude (Pa),
the consistency factor (Pa sⁿ) and the power-law index, respectively. The function of
the limit value, i.e. the second bracket in Eq. (5), is necessary because the droplet
impact dynamics is calculated until a quasi-equilibrium state is reached. is used for
building the function of the limit value. An increase in τ0 induces additional
plastic-like dissipation, and an increase in k represents an increase in the apparent
viscosity. The power-law index n is related to the shear-thinning behaviour (the fluid
viscosity becomes lower as n decreases).
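Written as an apparent viscosity, Eq. (4) gives μ_app = τ0/γ̇ + k γ̇^(n−1) for yielded material, which diverges as γ̇ → 0, so a solver has to limit it somehow. Because the report's Eq. (5) is not reproduced here, the sketch below simply caps the viscosity at a hypothetical value `mu_max`; this is an assumption and not necessarily the limit function used in the study.

```python
import numpy as np

def herschel_bulkley_viscosity(shear_rate, tau0, k, n, mu_max):
    """Apparent viscosity (Pa s) of a Herschel-Bulkley fluid with a viscosity cap.

    Yielded region (Eq. 4): tau = tau0 + k * shear_rate**n, hence
        mu_app = tau0 / shear_rate + k * shear_rate**(n - 1).
    mu_max is a hypothetical cap standing in for the report's limit function;
    it keeps mu_app finite where shear_rate -> 0 (unyielded material).
    """
    g = np.atleast_1d(np.asarray(shear_rate, dtype=float))
    mu = np.full_like(g, mu_max)          # unyielded / zero-shear cells get the cap
    pos = g > 0.0
    mu[pos] = np.minimum(tau0 / g[pos] + k * g[pos] ** (n - 1.0), mu_max)
    return mu


if __name__ == "__main__":
    # Placeholder parameters (Table 3 is not reproduced here).
    rates = [0.0, 1e-2, 1.0, 1e3, 1e6]
    print(herschel_bulkley_viscosity(rates, tau0=10.0, k=2.0, n=0.5, mu_max=100.0))
```

For n < 1 both terms shrink as the shear rate grows, which is the mechanism behind the strong viscosity reduction near the impact point discussed in the results.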
Two paint liquids were used; their rheological parameters are listed in Table 3.
Clearly, paint_f has a higher apparent viscosity than paint_t, while the slight
difference in n indicates similar shear-thinning behaviour of both paints. Unless
otherwise specified, the droplet density, surface tension and static contact angle are
always 1000 kg/m³, 0.025 N/m and 60°, respectively. The parameter study was carried
out mainly with different drop diameters and impact velocities. A non-Newtonian
Reynolds number $\mathrm{Re}_n = \rho D^{n} U^{2-n}/k$ and the Weber number We defined in
Sect. 3.2 are used in the discussion, where D, ρ and U are the drop diameter, density
and impact velocity, and n and k are the rheological parameters.
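The non-Newtonian Reynolds number reduces to the Newtonian Re = ρDU/μ when n = 1 and k = μ, which gives a convenient sanity check. The sketch below evaluates Re_n; since Table 3 is not reproduced here, the shear-thinning k and n in the demo are placeholders.

```python
def reynolds_power_law(D, U, rho, k, n):
    """Non-Newtonian Reynolds number Re_n = rho * D**n * U**(2 - n) / k.

    Units: D [m], U [m/s], rho [kg/m^3], k [Pa s^n], n [-].
    For n = 1 and k = mu this is the Newtonian Re = rho * D * U / mu.
    """
    return rho * D**n * U ** (2.0 - n) / k


if __name__ == "__main__":
    # Newtonian limit (n = 1, k = 0.04 Pa s): about 7.5, as for case l1b.
    print(reynolds_power_law(D=300e-6, U=1.0, rho=1000.0, k=0.04, n=1.0))
    # Shear-thinning example with placeholder k and n.
    print(reynolds_power_law(D=300e-6, U=1.0, rho=1000.0, k=0.5, n=0.5))
```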
Figures 14 and 15 show the velocity field of a paint droplet impacting onto a solid
surface and the corresponding shear rate around the impact point. The maximum
velocity of the air expelled from the impact region is 3.97 m/s, and the maximum shear
rate of the gas-liquid mixture reaches 2.4 × 10⁶ s⁻¹, where the corresponding viscosity
is about 4 mPa s. The liquid viscosity around the impact region is therefore reduced
tremendously. The evolution of the droplet shape and viscosity, the formation of the
air disc and the bubble release from the liquid film are shown in Fig. 16. In the early
stage of droplet impact, the viscosity around the impact region is very low because of
the high shear rate, while higher viscosity is found at the gas-liquid interface at the
top of the droplet. The maximum air disc, about 72 µm in diameter, subsequently
contracts into a bubble that is released completely from the liquid film by the
quasi-equilibrium state (t = 40 ms). In the previous case of the Newtonian viscous
droplet (Fig. 8), the maximum air disc had a diameter of 506 µm, and at the
quasi-equilibrium state (t = 743 ms) some large bubbles still remained on the solid
surface. In addition, for the shear-thinning liquids, the maximum air disc formed in the
early stage of droplet impact is almost independent of dimensionless numbers such as
$\mathrm{Re}_n$ and We. The dimensionless air disc, defined as the ratio of the air disc radius
to the droplet radius, $R_{\max,\mathrm{AD}}/R$, is about 0.2 ± 0.1.
Fig. 14 Contours of velocity magnitude (m/s) as the drop just contacts the wall (paint_f, D = 300 µm,
U = 1 m/s)
Fig. 15 Contours of shear rate (1/s) as the drop just contacts the wall (paint_f, D = 300 µm,
U = 1 m/s)
Fig. 16 Paint drop impact (paint_t: D = 300 µm, impact velocity = 1 m/s). Left: contours of
volume fraction (red: air, blue: liquid); right: contours of molecular viscosity (mixture; blue: air,
μ = 0.018 mPa s; red: maximum liquid viscosity at that time)
At high impact velocities, droplet splashing occurs. Figure 17 shows the phase
distribution on the target wall. The air disc breaks up into many small bubbles that can
still easily escape from the liquid, since the apex height of the drop in the centre
becomes very low in this case. Because of splashing, the lamellae created come into
contact with the solid surface and again entrap many small bubbles near the outer
edge of the liquid film, as shown in Fig. 17. However, these small bubbles can escape
from the liquid during the recoiling process, and at the quasi-equilibrium state the wall
is bubble-free.
At very low impact velocities, the air disc contracts into a bubble that still adheres
firmly to the wall at the equilibrium state, as shown in Fig. 18a. When the static
contact angle is decreased, e.g. to SCA = 30°, the initial air disc is similar to that with
SCA = 60°, since the early stage of droplet impact, in particular the formation of the
air disc, depends mainly on the droplet inertia and viscosity. During the recoiling
process with SCA = 30°, the bubble, clearly visible in Fig. 18b, already detaches from
the wall at t = 4.6 ms. The bubble therefore escapes from the liquid more easily in the
case of Fig. 18b than in that of Fig. 18a.
Fig. 17 Contours of volume fraction on the wall (red: air, blue: paint_f; D = 300 µm, impact
velocity = 80 m/s, We = 7.68 × 10⁴)
Fig. 18 Contours of volume fraction in a cross-section (red: air, blue: paint_f; D = 300 µm,
impact velocity = 0.5 m/s, We = 3). Left: SCA = 60°, right: SCA = 30°
Figure 19 shows the effect of the rheological properties on the spread factor d/D
during the impact of a drop with a diameter of 300 µm and an impact velocity of
1 m/s. The spread factor does not differ for t < 0.3 ms, since the inertial force
dominates the spreading process in the interval t = 0–0.3 ms. After t = 0.3 ms,
viscous and surface tension forces become more important, and the liquid film
contracts again under the action of surface tension. The lower apparent viscosity of
paint_t makes this contraction easier, which results in earlier recoil. The entrapped
dimensionless air disc d/D is the same for both liquids, about 0.2, and the
corresponding time lies within the kinematic phase of the drop impact (t < 0.3 ms).
Fig. 19 Effect of liquid rheological properties on the spread factor (D = 300 µm, impact velocity
U = 1 m/s, SCA = 60°)
Fig. 20 Impact scenarios and bubble-free conditions for yield-stress droplets in relation to
dimensionless numbers
A summary of the drop impact dynamics for the different impact scenarios in relation
to the dimensionless numbers $\mathrm{Re}_n$ and We is shown in Fig. 20, which also indicates the
bubble-free condition. The entrapped air bubbles were found to escape from the wall
by the quasi-equilibrium state if the Weber number satisfies We > 10.
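Using the fixed paint properties stated above (ρ = 1000 kg/m³, σ = 0.025 N/m) and D = 300 µm, the Weber numbers quoted for Figs. 17 and 18 can be reproduced and compared with the We > 10 threshold. The threshold is the empirical finding of these simulations for the parameter range studied, not a general law, and the helper names below are illustrative.

```python
def weber(D, U, rho=1000.0, sigma=0.025):
    """Weber number We = rho * U**2 * D / sigma (defaults: paint values used in the study)."""
    return rho * U**2 * D / sigma


def expected_bubble_free(D, U, threshold=10.0, **props):
    """Empirical criterion from the simulations: the wall ends up bubble-free if We > threshold."""
    return weber(D, U, **props) > threshold


if __name__ == "__main__":
    for U in (0.5, 1.0, 80.0):
        we = weber(300e-6, U)
        print(f"U = {U:5.1f} m/s: We = {we:9.1f}, bubble-free expected: {expected_bubble_free(300e-6, U)}")
```

The 0.5 m/s case gives We = 3, matching Fig. 18 where a bubble remains attached, and the 80 m/s case gives We = 7.68 × 10⁴, matching Fig. 17 where the wall ends up bubble-free.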
4 Conclusions
For the first time, numerical simulations providing time-resolved images of air
entrapment and bubble movement under drops impacting onto dry solid surfaces were
carried out. Both Newtonian and yield-stress viscous droplets were considered.
Based on the simulation results, the mechanism of air entrapment during drop
impact onto solid surfaces can be outlined as follows. Air is always entrapped at the
droplet-solid interface, and this does not depend strongly on the surface wettability.
Thin air layers result from the direct contact between the droplet outline and the
substrate, and the maximum air disc is reached once the wetted contact line starts to
move. The size of the air disc, its contraction into bubbles and the release of the air
bubbles, however, depend on material and target properties and on application
parameters. The size of the entrapped air disc is inversely proportional to the surface
tension of the fluid and increases strongly with liquid viscosity and impact velocity.
The air disc contracts and breaks into bubbles during the advancing phase, which can
escape from the liquid film at the equilibrium state under certain conditions.
Decreasing the static contact angle of the liquid enhances bubble release from the
target wall as well as from the liquid film. For Newtonian viscous droplets, the
maximum air disc was correlated with the dimensionless numbers. At the equilibrium
state, fairly large bubbles still remained on the wall, especially for highly viscous
drops.
For yield-stress liquids, the wetting of the solid wall was greatly improved because of
the high shear rate, and hence very low viscosity, in the early stage of droplet impact.
The dimensionless maximum air disc was quite small and almost constant at
0.2 ± 0.1. The impact scenarios and the effects of rheological properties on the
time-dependent spread factor were analysed, and bubble-free conditions were
discussed. The target wall was found to be bubble-free if the Weber number is larger
than 10. Based on these simulation results, one may conjecture, for instance, that
pneumatic atomizers and airless guns, which create impacting droplets with large
Weber numbers, lead to less air entrapment at the solid-liquid interface than
high-speed rotary bells. Bubbles that are created by droplet impact onto solid surfaces
and still adhere to the surface at the quasi-static state lead to pinhole formation after
the baking process. Air entrapment also occurs when drops impact onto wet solid
surfaces; this will be investigated in future work.
Acknowledgements The authors would like to thank the steering committee for the supercomputing
facilities at the Höchstleistungsrechenzentrum (HLRS) Stuttgart, Germany.
1. Chandra, S., Avedisian, C.T.: On the collision of a droplet with a solid surface. Proc. R. Soc.
Lond. Ser. A 432, 13 (1991)
2. Ding, H., Li, E.Q., Zhang, F.H., Sui, Y., Spelt, P.D.M., Thoroddsen, S.T.: Propagation of
capillary waves and ejection of small droplets in rapid droplet spreading. J. Fluid Mech. 697,
92–114 (2012)
3. Ansys-Fluent 17.0 User Manual
4. Fujimoto, H., Shiraishi, H., Hatta, N.: Evolution of liquid/solid contact area of drop impinging
on a solid surface. Int. J. Heat Mass Transf. 43, 1673–1677 (2000)
5. German, G., Bertola, V.: Impact of shear-thinning and yield-stress drops on solid substrates.
J. Phys.: Condens. Matter 21, 375111 (2009)
6. Kim, E., Baek, J.: Numerical study of the parameters governing the impact dynamics of yield-
stress fluid droplets on a solid surface. J. Non-Newton. Fluid Mech. 173–174, 62–71 (2012)
7. Lunkad, S.F., Buwa, V.V., Nigam, K.D.P.: Numerical simulations of drop impact and spreading
on horizontal and inclined surfaces. Chem. Eng. Sci. 62, 7214–7224 (2007)
8. Mehdi-Nejad, V., Mostaghimi, J., Chandra, S.: Air bubble entrapment under an impacting
droplet. Phys. Fluids 15(1), 173–183 (2003)
9. Nigen, S.: Experimental investigation of the impact of an (apparent) yield-stress material.
Atomization Sprays 15, 103–117 (2005)
10. Palacios, J., Hernandez, J., Gómez, P., Zanzi, C., Lopez, J.: Experimental study on the
splash/deposition limit in drop impact onto solid surfaces. Exp. Fluids 52, 1449–1463 (2012)
11. Šikalo, Š., Tropea, C., Ganic, E.N.: Dynamic wetting angle of a spreading droplet. Exp. Therm.
Fluid Sci. 29, 795–802 (2005)
12. Thoroddsen, S.T., Sakakibara, J.: Evolution of the fingering pattern of an impacting drop. Phys.
Fluids 10(6), 1359–1374 (1998)
13. Thoroddsen, S.T., Etoh, T.G., Takehara, K., Ootsuka, N., Hatsuki, Y.: The air bubble entrapped
under a drop impacting on a solid surface. J. Fluid Mech. 545, 203–212 (2005)
14. Thoroddsen, S.T., Takehara, K., Etoh, T.G.: Bubble entrapment through topological change.
Phys. Fluids 22(051701), 1–4 (2010)
15. Ye, Q., Tiedje, O.: Numerical study on air entrapment in droplets under impact onto a solid
surface. In: Proceeding of ILASS – Europe 2013, 25th European Conference on Liquid
Atomization and Spray Systems, Chania, 1–4 Sept 2013
16. Ye, Q., Burk, S., Domnick, J.: Analysis of droplet impingement of different atomizers used
in spray coating processes. In: 13th Triennial International Conference on Liquid Atomization
and Spray Systems, Tainan, 23–27 Aug 2015
Numerical Study of the Impact of Praestol®
Droplets on Solid Walls
1 Introduction
calculations of droplet impact with shear-thinning liquids are presented in [13]. In
this work, the focus lies on the difference between the two Praestol® solutions,
which have different shear-thinning behaviour. We intend to present the validation
of the numerical code against experiments at the European Conference on Liquid
Atomization and Spray Systems (ILASS) 2016 in Brighton.
2 Numerical Method
In the VOF method [7] a scalar field variable f is introduced which represents
the volume fraction of a fluid in each computational cell. This variable is unity
inside the liquid and zero in the gaseous phase. It can therefore directly be used to
identify the location of the interface in the computational domain and also allows
geometrical properties such as normal vectors or curvature to be computed. The
variable f is therefore defined as
$$ f(\mathbf{x},t) = \begin{cases} 0 & \text{outside the liquid phase,}\\ \in (0,1) & \text{at the interface,}\\ 1 & \text{inside the liquid phase.} \end{cases} $$
Using this definition, the physical properties of the liquid and gaseous phases can be
expressed continuously across the interface; e.g. the density reads
$$ \rho(\mathbf{x},t) = \rho_l\, f(\mathbf{x},t) + \rho_g \left(1 - f(\mathbf{x},t)\right), $$
where the subscript l denotes the liquid phase and g the gaseous phase.
The VOF variable f is transported by an additional transport equation:
$$ \frac{\partial f}{\partial t} + \nabla\cdot(\mathbf{u} f) = 0. \qquad (5) $$
Note that the right-hand side is zero only if no phase change takes place.
Combining the VOF method with a finite volume discretization guarantees a
volume-conserving numerical method. Additionally, the advection Eq. (5) is solved
using fluxes calculated with the method of piecewise linear interface reconstruction
(PLIC). This is necessary to suppress numerical diffusion of the interface: without
accurate information about the spatial distribution of the liquid phase, it is impossible
to determine precisely how much liquid and how much gas is transported across a
cell boundary in each time step.
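Why a geometric flux matters can be seen in one dimension, where PLIC reduces to placing the interface at a distance f·Δx from the cell face on the liquid side. The sketch below is a minimal 1-D analogue written for illustration, not FS3D code. It assumes u > 0, u·Δt ≤ Δx, a single interface with liquid to its left, and pure liquid entering through the left boundary; each cell passes on exactly the liquid contained in the slab of width u·Δt next to its right face.

```python
import numpy as np

def vof_step_1d(f, u, dx, dt):
    """One explicit geometric VOF advection step in 1-D (illustrative sketch).

    Assumptions: u > 0, u*dt <= dx, liquid lies to the left of a single
    interface, and pure liquid flows in through the left boundary.
    """
    L = u * dt
    if u <= 0.0 or L > dx:
        raise ValueError("sketch requires 0 < u*dt <= dx")
    # 1-D "PLIC": liquid occupies [0, f*dx] in each cell, so the liquid leaving
    # through the right face is the overlap of [0, f*dx] with [dx - L, dx].
    flux = np.clip(f * dx - (dx - L), 0.0, L)
    f_new = f - flux / dx            # liquid leaving each cell
    f_new[1:] += flux[:-1] / dx      # ... enters the right neighbour
    f_new[0] += L / dx               # inflow of pure liquid at the left boundary
    return f_new


if __name__ == "__main__":
    f = np.array([1.0, 1.0, 1.0, 0.4, 0.0, 0.0, 0.0, 0.0])
    for _ in range(4):
        f = vof_step_1d(f, u=1.0, dx=1.0, dt=0.5)
    print(np.round(f, 6))   # interface advanced by 4 * 0.5 = 2 cell widths
```

A purely algebraic upwind flux f_i·u·Δt would instead move liquid out of the partially filled cell in the very first step and smear the interface over several cells; the geometric flux keeps it sharp.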
As mentioned above, the normal vector at the interface, $\mathbf{n}_\Gamma = \nabla f/|\nabla f|$,
can be easily calculated. Currently, the 26 surrounding cells are used to evaluate the
gradient operator. The PLIC surfaces are then constructed using this normal vector
to define a plane in each cell which sharply separates the liquid and the gaseous
phase. The position of this plane in a local coordinate system in each cell can be
analytically determined. A 2D example of a PLIC reconstructed interface is shown
in Fig. 1b.
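A 2-D analogue of these two reconstruction steps is sketched below under stated assumptions. The normal uses a Youngs-type 3×3 finite-difference stencil, the 2-D counterpart of the 26-neighbour evaluation mentioned above (the exact FS3D weights are not given here), oriented to point out of the liquid as m = −∇f/|∇f|. The plane position α is found by bisection on the clipped-polygon area, a numerical substitute for the analytical formula mentioned in the text; cells are unit squares.

```python
import math

def youngs_normal(f3):
    """Outward unit normal m = -grad f / |grad f| from a 3x3 block f3[j][i] (j: y, i: x)."""
    fx = ((f3[0][2] + 2 * f3[1][2] + f3[2][2]) - (f3[0][0] + 2 * f3[1][0] + f3[2][0])) / 8.0
    fy = ((f3[2][0] + 2 * f3[2][1] + f3[2][2]) - (f3[0][0] + 2 * f3[0][1] + f3[0][2])) / 8.0
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        raise ValueError("no interface in this block (grad f = 0)")
    return (-fx / norm, -fy / norm)


def liquid_area(m, alpha):
    """Area of the unit square where m . x <= alpha (Sutherland-Hodgman clip + shoelace)."""
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    poly = []
    for a, b in zip(square, square[1:] + square[:1]):
        da = m[0] * a[0] + m[1] * a[1] - alpha   # <= 0 means "inside the liquid"
        db = m[0] * b[0] + m[1] * b[1] - alpha
        if da <= 0.0:
            poly.append(a)
        if (da <= 0.0) != (db <= 0.0):           # edge crosses the PLIC line
            t = da / (da - db)
            poly.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    if len(poly) < 3:
        return 0.0
    s = sum(p[0] * q[1] - q[0] * p[1] for p, q in zip(poly, poly[1:] + poly[:1]))
    return 0.5 * abs(s)


def plic_alpha(m, f, tol=1e-12):
    """Bisection for alpha such that the liquid area in the unit cell equals f."""
    corners = [m[0] * x + m[1] * y for x in (0.0, 1.0) for y in (0.0, 1.0)]
    lo, hi = min(corners), max(corners)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if liquid_area(m, mid) < f:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, a cell with f = 0.3 and m = (1, 0) gives α ≈ 0.3, i.e. a vertical interface at x = 0.3 with the liquid on its left.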
[Figure: panels (a), (b) and (c) on a 2D cell grid (axis y); (b) shows the interface normal n_Γ, (c) the flux u δt]
Fig. 1 (a) f-field without interface information; (b) interface reconstruction with the PLIC
method; (c) calculation of the f-flux u δt, with δt being the time step, from a PLIC-reconstructed
interface
378 M. Reitzle et al.
For shear-thinning or shear-thickening liquids, the viscosity is no longer constant
but a function of the shear rate. In FS3D the Carreau-Yasuda model (see e.g. [15]
and [2]) is used:
$$ \frac{\mu(\dot{\gamma}) - \mu_\infty}{\mu_0 - \mu_\infty} = \left[1 + (\lambda\dot{\gamma})^{a}\right]^{\frac{n-1}{a}}. \qquad (6) $$
Here, the subscripts 0 and ∞ denote the viscosities at zero and very large shear
rates, respectively; λ, a and n are parameters depending on the rheometric
characteristics of the liquid.
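Equation (6) can be evaluated directly. In the sketch below, the limits μ0 = 1.5208 Pa s and μ∞ = 0.052 Pa s are those quoted for 0.05 % Praestol® 2540 in the caption of Fig. 2, while λ, a and n are placeholders because Table 1 is not reproduced here.

```python
import numpy as np

def carreau_yasuda(shear_rate, mu0, mu_inf, lam, a, n):
    """Viscosity from (mu - mu_inf)/(mu0 - mu_inf) = [1 + (lam*shear_rate)**a]**((n - 1)/a)."""
    g = np.asarray(shear_rate, dtype=float)
    return mu_inf + (mu0 - mu_inf) * (1.0 + (lam * g) ** a) ** ((n - 1.0) / a)


if __name__ == "__main__":
    # mu0 and mu_inf from Fig. 2 (Praestol 2540); lam, a, n are placeholder values.
    rates = np.logspace(-3, 5, 9)
    mu = carreau_yasuda(rates, mu0=1.5208, mu_inf=0.052, lam=10.0, a=2.0, n=0.3)
    for g, m in zip(rates, mu):
        print(f"{g:9.3g} 1/s -> {m:.4g} Pa s")
```

For n < 1 the curve falls monotonically from the zero-shear plateau μ0 to the high-shear plateau μ∞, which is the shear-thinning shape plotted in Fig. 2.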
The liquid properties are listed in Table 1, where σ denotes the surface tension, and
the corresponding viscosity curves as functions of the shear rate are shown in Fig. 2.
Note that the lower viscosity limit of Praestol® 2500 is close to the viscosity of pure
water.
[Plot: viscosity μ [Pa s] vs. shear rate (logarithmic axes, shear rate 10⁻³ to 10⁵) for Praestol® 2500 0.8 % and Praestol® 2540 0.05 %, with upper and lower limits]
Fig. 2 Viscosity as a function of the shear rate according to the Carreau-Yasuda model for
0.8 % Praestol® 2500 (turquoise) and 0.05 % Praestol® 2540 (red) in water. The dash-dotted lines
represent the lower and upper limits for the former. The upper and lower limits for the latter liquid
are 1.5208 Pa s and 0.052 Pa s, respectively
Numerical Study of the Impact of Praestol® Droplets on Solid Walls 379
[Fig. 3 sketch: computational domain (dimensions 2 cm and 1 cm), droplet and x-axis]
Fig. 3 Schematic view of the computational domain and the initial position of the impacting
droplet
3 Numerical Setup
4 Results
In the numerical simulations the initial droplet diameter was in the range
$D_0 \approx 3$–$3.5$ mm. The initial droplet velocity in the z-direction was varied from $v_0 \approx 0.4$ to
3.6 m/s. This resulted in a variation of the impact Weber number $We = \rho_l D_0 v_0^2 / \sigma$
from about 6 to 555, with $\rho_l$ the liquid density and $\sigma$ the surface tension. Depending on the
impact Weber number and on the Praestol® solution, different spreading behaviours
of the droplet liquid on the solid surface were observed. The impact process can
be separated into different phases according to [13]. Phase A is the part of the impact
process where the droplet approaches the wall. In phase B the droplet touches the
surface and a disk begins to form. This phase ends at the non-dimensional time
$t^* = t v_0 / D_0 = 2$. The disk then spreads over the surface in phase C until the
maximum disk diameter $d = d_{max}$ is reached. Afterwards, the recoiling process
takes place in phase D. The recoiling phase D is mainly determined by the
contact angle between the droplet liquid and the solid surface. In the numerical
simulations of this study the contact angle was fixed to $\alpha = 90^\circ$, which is only a
very rough estimate. In future calculations this restriction will be removed and
a more physical contact angle model will be implemented. On the experimental
side, the wall has to be cleaned carefully before each droplet impact, which will be
done in future experiments. Therefore, the results shown below concentrate
mainly on the well-validated phases. The final phase E describes the behaviour of
the droplet after the disk has collapsed.
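The two dimensionless groups used above, the impact Weber number and the non-dimensional time, can be computed as in the sketch below. It is a minimal illustration with hypothetical function names; the liquid density ρ_l = 1000 kg/m³ and surface tension σ = 0.072 N/m in the example are assumed water-like values, not the Praestol® properties of Table 1.

```python
def weber_number(rho_l, d0, v0, sigma):
    """Impact Weber number We = rho_l * D0 * v0**2 / sigma (SI units)."""
    return rho_l * d0 * v0 ** 2 / sigma


def nondimensional_time(t, v0, d0):
    """Non-dimensional time t* = t * v0 / D0."""
    return t * v0 / d0


def dimensional_time(t_star, v0, d0):
    """Physical time [s] corresponding to a given t*."""
    return t_star * d0 / v0


# Assumed water-like properties (hypothetical, for illustration)
rho_l, sigma = 1000.0, 0.072
d0 = 3.0e-3  # initial droplet diameter [m]
for v0 in (0.4, 3.6):
    we = weber_number(rho_l, d0, v0, sigma)
    t_end_b = dimensional_time(2.0, v0, d0)  # end of phase B (t* = 2)
    print(f"v0 = {v0} m/s: We = {we:.1f}, phase B ends after {1e3 * t_end_b:.2f} ms")
```

Because t* scales with v0/D0, the slowest impacts take roughly an order of magnitude longer in physical time to complete phase B than the fastest ones.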
Figure 4 elucidates the development of the viscosity inside the droplet during the
first phases of the impact process. It further shows the essential differences between
the two Praestol® solutions. It is apparent from Fig. 2 that the viscosity of
the Praestol® 2540 solution decreases at much lower shear rates than the Praestol®
2500 solution. Therefore, the viscosity of the Praestol® 2540 solution reaches its
lower limit shortly after the droplet impact as can be seen from Fig. 4. However,
[Fig. 4 panels: cross-sections in the x–z plane (x [cm], z [cm]) at t* ≈ 0.2, 1.85 and 2.65; colour legend for the viscosity from 0.00520 Pa s to 0.01897 Pa s (intermediate levels 0.00864, 0.01204 and 0.01553 Pa s)]
Fig. 4 Intersections through the centre of the droplets at different times $t^*$. The orange solid line
marks the border of the droplet liquid for impact Weber number $We \approx 67$. On the left-hand side,
results for the 0.8 % solution of Praestol® 2540 and on the right-hand side results for the 0.005 %
solution of Praestol® 2500 are shown. The colours indicate the viscosity of the droplet liquids. In
regions of the darkest blue the viscosity is larger than 0.01897 Pa s, which is 2.5 % of the upper
limit of the Praestol® 2500 solution. In regions of the lightest yellow the viscosity is lower than
0.0052 Pa s, which is the lower limit of the Praestol® 2540 solution
at $t^* \approx 2.65$, when the recoiling process has started, the shear rate decreases and
the viscosity begins to increase. The viscosity of the Praestol® 2500 solution is at a
much higher level, in many regions above 0.01897 Pa s. However, in close proximity
to the wall low viscosity values are obtained.
Figure 5 gives an impression of the velocity distribution in a cut through the
centre of the droplet for the 0.005 % solution of Praestol® 2500 for an impact Weber
[Fig. 5 panels: cross-section in the x–z plane (x [cm], z [cm]) and zoom into the boundary layer near the rim]
Fig. 5 Intersections through the centre of the droplets at $t^* = 1.85$. The green solid line marks
the border of the droplet liquid. Shown is the result for an impact Weber number $We \approx 67$ for
the 0.005 % solution of Praestol® 2500. The colours indicate the viscosity of the droplet liquids;
the legend can be found in Fig. 4. The velocity field is indicated by red arrows, where the longest
arrows correspond to approx. 1 m/s. The lower picture shows a zoom into the boundary layer
close to the rim
number $We \approx 67$. Inside the droplet, a hydrodynamic boundary layer forms close to the
wall, where the viscosity decreases due to the larger velocity gradients.
The development of the disk diameter d with respect to time t is shown in Fig. 6
for two different Weber numbers and both Praestol® solutions.
For higher Weber numbers the disk grows faster and higher maximum disk
diameters dmax are obtained. Due to the lower viscosity of the Praestol® 2540
solution during the impact process (compare Fig. 4) higher maximum disk diameters
are obtained in comparison to the Praestol® 2500 solution. For the lower Weber
numbers, these maxima are reached at a later time tmax . The same results in non-
dimensional form are shown in Fig. 7.
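Extracting $d_{max}/D_0$ and $t^*_{max}$ from a sampled disk-diameter history, the quantities plotted in Figs. 6 to 10, can be sketched as below. The function name and the synthetic sample data are hypothetical; in practice the diameter would first have to be measured from the reconstructed interface at each output time.

```python
def spreading_maximum(times, diameters, d0, v0):
    """Return (d_max / D0, t*_max) from samples of the disk diameter d(t).

    times [s] and diameters [m] must have the same length; if several samples
    share the maximum diameter, the first one is used.
    """
    if not times or len(times) != len(diameters):
        raise ValueError("need two non-empty sequences of equal length")
    i_max = max(range(len(diameters)), key=diameters.__getitem__)
    return diameters[i_max] / d0, times[i_max] * v0 / d0


# Synthetic example: D0 = 3 mm, v0 = 1 m/s
t = [0.0, 1.0e-3, 2.0e-3, 3.0e-3, 4.0e-3]
d = [0.0, 4.0e-3, 6.0e-3, 6.6e-3, 5.0e-3]
print(spreading_maximum(t, d, d0=3.0e-3, v0=1.0))  # approx. (2.2, 1.0)
```

With discrete output the accuracy of $t^*_{max}$ is limited by the sampling interval; a local fit around the maximum would refine it.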
[Fig. 6 plot: disk diameter d [mm] versus time t [ms], 0 to 10 ms]
Fig. 6 Disk diameter d as a function of time t for Weber number $We \approx 430$ (solid lines) and
$We \approx 67$ (dashed lines). The green square symbols indicate the maximum $d_{max}$
of the disk diameter for the Praestol® 2540 solution and the red circle symbols indicate the
same for the Praestol® 2500 solution
[Fig. 7 plot: non-dimensional time axis from 0 to 8]
Fig. 7 Non-dimensional disk diameter $d/D_0$ as a function of non-dimensional time $t^*$ for Weber
number $We \approx 430$ (solid lines) and $We \approx 67$ (dashed lines). The green square symbols
indicate the maximum $d_{max}$ of the disk diameter for the Praestol® 2540 solution and the
red circle symbols indicate the same for the Praestol® 2500 solution
[Fig. 8 plot: $d_{max}/D_0$ versus We (0 to 600) for both Praestol® solutions]
Fig. 8 Non-dimensional maximum disk diameter $d_{max}/D_0$ as a function of Weber number We
[Fig. 9 plot: $t^*_{max}$ versus We (0 to 600); legend includes Praestol® 2500]
Fig. 9 Non-dimensional time $t^*_{max}$ at which the maximum disk diameter is reached, as a
function of Weber number We
[Fig. 10 plot: $d_{max}/D_0$ versus $t^*_{max}$ (0 to 4)]
Fig. 10 Non-dimensional maximum disk diameter $d_{max}/D_0$ as a function of the non-dimensional time
$t^*_{max}$ at which this maximum is reached
5 Computational Performance
For the performance analysis, a generic test setup is created consisting of two spheres
of radius $r_0 = 0.5$ cm which are placed into a domain of size 3 × 3 × 3 cm³. They
are initially located at the centre in the y- and z-directions but slightly shifted in the
x-direction, so that there is a small overlap of $r_0/8$. Both spheres are given an
initial velocity of 1.2 m/s towards the centre of the domain. The resulting impact
and lamella formation form the basis of the performance analysis. In accordance with
the results shown in the previous section, the non-Newtonian 0.8 % Praestol® 2500
solution is used for both droplets. A schematic setup is shown in Fig. 11.
The baseline case is computed on a Cartesian grid of $512^3$ grid cells. Even
though hybrid parallelisation is possible (spatial domain decomposition with MPI data
exchange and OpenMP at loop level), we used only spatial domain decomposition
here, since previous calculations showed a decrease in parallel efficiency when OpenMP
is used additionally [4].
Strong scaling was investigated by increasing the number of cores while keeping
the problem size and resolution constant. The speed-up S was evaluated, defined as
$$S = \frac{N_i}{N_{32}}. \qquad (7)$$
Here, $N_i$ is the number of completed computation cycles (i.e. time steps) in 2 h
on $i$ cores. The relatively long runtime makes the initialization time at the
beginning of the calculation negligible. The reference case was calculated on 32 cores, since
memory limitations did not permit running on fewer cores. The parallel efficiency E was
additionally evaluated based on the definition
$$E = \frac{S}{i/32} \cdot 100\,\%. \qquad (8)$$
Fig. 11 Schematic representation of the setup for the performance analysis. The
domain size is 3 × 3 × 3 cm³
Table 2 Speed-up and parallel efficiency of FS3D. Shown are the results for different numbers of
cores relative to the baseline case
Rel. number of cores     1       2       4       8       16      32      64
Speed-up S               1.0     1.95    3.59    6.28    11.26   13.91   24.91
Parallel efficiency E    100 %   97.2 %  89.6 %  78.4 %  70.4 %  43.5 %  38.9 %
Note that, instead of the classical definitions of speed-up and parallel
efficiency in terms of computational time, the number of completed cycles was
used; for a fixed runtime of 2 h this is equivalent.
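Equations (7) and (8) reduce to a few lines of arithmetic. The sketch below, with hypothetical function names, recomputes the parallel efficiency from the speed-ups listed in Table 2, taking the 32-core run as reference.

```python
REF_CORES = 32  # reference run (memory limits prevented fewer cores)


def speedup(cycles_i, cycles_ref):
    """Eq. (7): ratio of cycles completed in the same wall-clock time (2 h)."""
    return cycles_i / cycles_ref


def parallel_efficiency(s, cores_i, cores_ref=REF_CORES):
    """Eq. (8): speed-up divided by the relative number of cores, in percent."""
    return 100.0 * s / (cores_i / cores_ref)


# Speed-ups from Table 2, keyed by the number of cores relative to the reference
table2 = {1: 1.0, 2: 1.95, 4: 3.59, 8: 6.28, 16: 11.26, 32: 13.91, 64: 24.91}
for rel, s in table2.items():
    cores = rel * REF_CORES
    print(f"{cores:5d} cores: S = {s:5.2f}, E = {parallel_efficiency(s, cores):5.1f} %")
```

The recomputed efficiencies agree with the values in Table 2 to within about 0.3 percentage points.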
From Table 2 and Fig. 12 it is apparent that the speed-up is not ideal. In fact,
the speed-up deviates more and more from the ideal linear speed-up as the
number of cores increases. This is mostly due to the multigrid solver used in FS3D
to solve the pressure Poisson equation, in which about 80 % of the total computational
time is spent. Within this solver a high communication load arises from multiple
point-to-point and global MPI data exchanges. Furthermore, as the grid coarsens
in the multigrid cycles, the ratio of computation to communication
worsens. Additionally, a solver for multiphase flows inevitably comprises different numerical
schemes that are applied consecutively. These schemes have different serial
fractions, which reduce the parallel efficiency. Currently, a new multigrid solver library
is being implemented in cooperation with G. Wittum at the University of Frankfurt, in the
hope of greatly improving both the serial performance and the parallel efficiency.
For the weak scaling shown on the left of Fig. 12 and in Table 3, the number
of cells per core was kept constant while the size of the problem was varied. The
baseline case consisted of 256 × 256 × 512 grid cells calculated on 128 cores.
Each case was run for 2 h, analogous to the strong scaling, and the total number
[Fig. 12 plots: left, cycles relative to the baseline case; right, speed-up S; both versus the number of cores relative to the baseline case on logarithmic axes]
Fig. 12 Weak (left) and strong scaling (right) behaviour for FS3D relative to the baseline case of
non-Newtonian droplet impact. On the left the number of cycles relative to the baseline case on
512 × 512 × 512 cells is shown. The right side shows the speed-up as defined in Eq. (7) as a
function of the number of cores relative to the baseline case
Table 3 Weak scaling performance in terms of total number of cycles in 2 h runtime. Shown are
the results relative to the case with 256 × 256 × 512 grid cells, herein called the baseline case
Rel. number of cores   Cells in x   Cells in y   Cells in z   E_weak (%)
1 256 256 512 100
2 256 512 512 83.30
4 512 512 512 92.25
8 512 512 1024 72.16
16 512 1024 1024 37.71
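of completed time steps was compared. The parallel efficiency for weak scaling is
defined as
$$E_{weak} = \frac{N_i}{N_{128}} \cdot 100\,\%, \qquad (9)$$
where $N_i$ is the number of cycles completed in 2 h on $i$ cores and the baseline case runs on 128 cores.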
Note that $E_{weak}$ drops as the number of cores rises, with the exception of
the case with 512 cores. This is due to the rising number of coarsening levels
and, consequently, the varying number of cycles in the multigrid solver. We hope
to overcome this problem in the near future with the new solver for the Poisson
equation.
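The weak-scaling bookkeeping of Table 3 can be checked in the same way. The sketch below, with hypothetical names, confirms that every grid in Table 3 gives the same number of cells per core when the baseline runs on 128 cores, and evaluates the weak-scaling efficiency of Eq. (9); the cycle counts in the example are invented placeholders, since absolute cycle numbers are not reported.

```python
BASE_CORES = 128

# Grids from Table 3: (cores relative to baseline, cells in x, y, z)
TABLE3 = [
    (1, 256, 256, 512),
    (2, 256, 512, 512),
    (4, 512, 512, 512),
    (8, 512, 512, 1024),
    (16, 512, 1024, 1024),
]


def cells_per_core(rel, nx, ny, nz, base_cores=BASE_CORES):
    """Number of grid cells assigned to each core."""
    return nx * ny * nz // (rel * base_cores)


def weak_efficiency(cycles_i, cycles_base):
    """Eq. (9): completed cycles relative to the baseline run, in percent."""
    return 100.0 * cycles_i / cycles_base


for rel, nx, ny, nz in TABLE3:
    print(rel * BASE_CORES, "cores:", cells_per_core(rel, nx, ny, nz), "cells per core")

# Placeholder cycle counts (not measured values)
print(weak_efficiency(cycles_i=900, cycles_base=1000))  # 90.0
```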
6 Conclusions
not identical behaviour. In future studies, the behaviour of Newtonian fluids shall
be compared with the results shown in this report. Furthermore, a performance
analysis was done for two non-Newtonian droplets colliding without the influence
of gravity. The strong scaling analysis shows good parallel efficiency up to 512
cores for the given case. However, FS3D still shows speed-up for more cores even
though the parallel efficiency drops. In the near-future, a new multigrid solver for the
pressure equation will be implemented with the hopes of greatly increasing parallel-
Acknowledgements The authors kindly acknowledge the High Performance Computing Center
Stuttgart (HLRS) for support and supply of computational time on the Cray XC40 platform under
the Grant No. FS3D/11142 and the financial support by the Deutsche Forschungsgemeinschaft
(DFG) for the Collaborative Research Center SFB-TRR75.
1. Eisenschmidt, K., Ertl, M., Gomaa, H., Kieffer-Roth, C., Meister, C., Rauschenberger, P.,
Reitzle, M., Schlottke, K., Weigand, B.: Direct numerical simulations for multiphase flows:
an overview of the multiphase code FS3D. J. Appl. Math. Comput. 272(2), 508–517 (2016).
2. Ertl, M., Roth, N., Brenn, G., Gomaa, H., Weigand, B.: Simulations and experiments on shape
oscillations of Newtonian and non-Newtonian liquid droplets. In: ILASS 2013, Chania, p. 7
3. Francois, M.M., Cummins, S.J., Dendy, E.D., Kothe, D.B., Sicilian, J.M., Williams, M.W.: A
balanced-force algorithm for continuous and sharp interfacial surface tension models within a
volume tracking framework. J. Comput. Phys. 213(1), 141–173 (2006)
4. Galbiati, C.M.E., Tonini, S., Cossali, G.E., Weigand, B.: DNS investigation of the primary
breakup in a Conical Swirled Jet. In: High Performance Computing in Science and Engineering
’15 Transactions of the High Performance Computing Center, Stuttgart (HLRS), pp. 333–347
5. Gomaa, H., Stotz, I., Sievers, M., Lamanna, G., Weigand, B.: Preliminary Investigation on
diesel droplet impact on oil wallfilms in diesel engines. In: ILASS – Europe 2011, 24th
European Conference on Liquid Atomization and Spray Systems, Estoril, Sept 2011
6. Hase, M., Weigand, B.: A numerical model for 3D transient evaporation processes based on the
volume-of-fluid method. In: ICHMT International Symposium on Advances in Computational
Heat Transfer, Istanbul, pp. 1–23 (2004)
7. Hirt, C.W., Nichols, B.D.: Volume of fluid (VOF) method for the dynamics of free boundaries.
J. Comput. Phys. 39(1), 201–225 (1981). doi:10.1016/0021-9991(81)90145-5
8. Rauschenberger, P., Weigand, B.: A volume-of-fluid method with interface reconstruc-
tion for ice growth in supercooled water. J. Comput. Phys. 282, 98–112 (2015).
9. Rauschenberger, P., Weigand, B.: Direct numerical simulation of rigid bodies in mul-
tiphase flow within an Eulerian framework. J. Comput. Phys. 291, 238–253 (2015).
10. Rieber, M., Graf, F., Hase, M., Roth, N., Weigand, B.: Numerical simulation of moving
spherical and strongly deformed droplets. In: Proceedings ILASS-Europe, Darmstadt, pp. 1–6
11. Roth, N., Schlottke, J., Urban, J., Weigand, B.: Simulations of droplet impact on cold wall
without wetting. In: ILASS, Como Lake, pp. 1–7 (2008)
12. Roth, N., Gomaa, H., Weigand, B.: Droplet collisions at high Weber numbers: experiments and
numerical simulations. In: Proceedings DIPSI Workshop 2010 on Droplet Impact Phenomena
& Spray Investigation. Bergamo (2010)
13. Roth, N., Meister, C., Gomaa, H., Ertl, M., Weigand, B.: Numerical simulation of shear
thinning liquids impacting on dry solid walls. In: Proceedings 26th European Conference on
Liquid Atomization and Spray Systems. ILASS, Bremen (2014)
14. Schlottke, J., Rauschenberger, P., Weigand, B., Ma, C., Bothe, D.: Volume of fluid direct
numerical simulation of heat and mass transfer using sharp temperature and concentration
fields. In: ILASS – Europe 2011, 24th European Conference on Liquid Atomization and Spray
Systems, Estoril (2011). http://www.ilass.uci.edu/
15. Tanner, R.I.: Engineering Rheology, 2nd edn. Oxford Engineering Science Series. Oxford
University Press, New York (2002)
16. Weking, H., Schlottke, J., Boger, M., Munz, C.D., Weigand, B.: DNS of rising bubbles using
VOF and balanced force surface tension. In: High Performance Computing on Vector Systems
(2010). Springer, Berlin/Heidelberg/New York
Turbulent Skin-Friction Drag Reduction at High
Reynolds Numbers
Davide Gatti
1 Introduction
The present manuscript briefly summarizes the research performed utilizing the
computational resources of the ForHLR I computer cluster within the project “reef-
fect”. A more detailed description of the present research has been already published
in Gatti and Quadrio [11] and in Stroh, Gatti, Hasegawa and Frohnapfel [22].
In the last few decades, fundamental research efforts in turbulent skin-friction
drag reduction have met with considerable success, and several viable strategies to reduce
drag have been introduced. Because the space and time scales of wall turbulence
shrink with increasing Re in laboratory implementations, and because computational
costs grow rapidly with increasing Re in numerical experiments, such studies are typically limited
to low-Reynolds-number flows. Therefore, the question naturally arises how to
extrapolate the observed performance to the higher values of the Reynolds number
Re typical of most industrial applications.
In simple and well-controlled laboratory flows such as channel flow, the friction
drag reduction is typically characterized in terms of the drag reduction rate R,
defined as the relative change of the skin-friction coefficient $C_f$ between the controlled
and the reference flow:
$$R = 1 - \frac{C_f}{C_{f,0}}. \qquad (1)$$
D. Gatti
Institute of Fluid Mechanics (ISTM), Karlsruhe Institute of Technology (KIT), Karlsruhe,
Germany
e-mail: davide.gatti@kit.edu
Fig. 1 Literature data for maximum drag reduction rate $R_{max}$ versus $Re_\tau$ for spanwise-forcing
techniques. Black (white) symbols indicate results from DNS (experimental) studies. We explicitly
note that the forcing amplitude is not always identical among different datasets. ○: oscillating
wall [4–6, 12, 15, 17, 18, 20, 21, 23–25]; △: streamwise-traveling waves [1, 19]; spanwise-
traveling waves [7, 8]; Lorentz force [2, 16]; +: reactive opposition control [3]. The solid line
is $R_{max} \propto Re_\tau^{-0.2}$ (Figure taken from Gatti and Quadrio [10]. Reprinted with permission of AIP
In this definition, the subscript “0” indicates a quantity measured in the reference
flow, and the skin-friction coefficient is defined as
$$C_f = \frac{2 \tau_w}{\rho U_b^2}, \qquad (2)$$
where $\tau_w$ is the wall-shear stress, $\rho$ is the fluid density and $U_b$ the bulk velocity.
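As a small worked example of Eq. (2), and of the drag reduction rate taken here in its usual form R = 1 − C_f/C_{f,0} (an assumption of this sketch), the two quantities can be computed as follows. In viscous units (τ_w = ρu_τ²), Eq. (2) becomes C_f = 2/(U_b⁺)², which the second helper uses. Function names and numbers are illustrative, not DNS results.

```python
def skin_friction_coefficient(tau_w, rho, u_b):
    """Eq. (2): C_f = 2 tau_w / (rho U_b^2)."""
    return 2.0 * tau_w / (rho * u_b ** 2)


def cf_from_bulk_velocity_plus(u_b_plus):
    """Same definition in wall units, where tau_w = rho u_tau^2 and U_b+ = U_b / u_tau."""
    return 2.0 / u_b_plus ** 2


def drag_reduction_rate(cf, cf0):
    """R = 1 - C_f / C_f0: positive for drag reduction, negative for drag increase."""
    return 1.0 - cf / cf0


# Illustrative numbers only
cf0 = cf_from_bulk_velocity_plus(20.0)                               # uncontrolled flow: 0.005
cf = skin_friction_coefficient(tau_w=1.25e-3, rho=1.0, u_b=1.0)      # controlled flow: 0.0025
print(cf0, cf, drag_reduction_rate(cf, cf0))                         # R = 0.5 (up to rounding)
```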
The low-Re laboratory and numerical evidence available shows (Fig. 1) that the
maximum drag reduction Rmax obtained with various control techniques based
on near-wall forcing decreases for increasing Reynolds numbers, which poses the
question whether sizeable drag reduction is still achievable at high values of Re and
hence worth pursuing.
A lively debate is taking place in the scientific community regarding the high-Re
behaviour of the latest generation of control techniques for turbulent skin-friction drag
reduction, in particular the active open-loop ones (i.e. those requiring additional
power to be transferred to the flow). Such techniques operate by enforcing suitable
temporal and spatial distributions of velocity perturbations at the wall. Assessing
their potential for achieving sizeable benefits at high Re is of paramount importance
to motivate further research in this field.
The goal of the present research is to address the effect of increasing the Reynolds
number on the achievable turbulent skin-friction drag reduction. We take a
particularly promising control strategy as a model for the present investigation: the
streamwise-travelling waves of spanwise wall velocity [19], which consist of the
following spanwise wall velocity distribution:
$$W_w(x, t) = A \sin(\kappa x - \omega t). \qquad (3)$$
In the above expression for the wall forcing, $W_w$ is the spanwise velocity enforced
at the wall, $A$ is the amplitude of the forcing, $\kappa$ is the streamwise wavenumber and $\omega$
is the angular frequency; $x$ and $t$ are the streamwise coordinate and time, respectively.
The forcing, sketched in Fig. 2, consists of a wall distribution of streamwise-modulated
waves of the spanwise (z) velocity component with wavelength $\lambda = 2\pi/\kappa$
and period $T = 2\pi/\omega$, which travel at speed $c = \omega/\kappa$ forward ($c > 0$)
or backward ($c < 0$) with respect to the direction x of the mean flow. The three
independent parameters (for example $A$, $\kappa$, $\omega$) of the control law (3), combined
with the Reynolds number Re, define a four-dimensional parameter space whose
complete investigation represents a computational challenge. A large number of
direct numerical simulations (DNS) of the turbulent flow in a doubly periodic
channel modified by streamwise-travelling waves of spanwise wall velocity are
performed under either a constant flow rate (CFR) or a constant pressure gradient (CPG).
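The travelling-wave forcing and its derived quantities can be sketched as follows, assuming the sinusoidal form W_w = A sin(κx − ωt) for the control law (3), which is the form consistent with the phase speed c = ω/κ stated above. Function names are hypothetical; inputs may be in outer or wall units as long as they are consistent. The example uses κ⁺ = 0.01 and ω⁺ = 0.0239, one of the parameter combinations considered in the large-box simulations.

```python
import math


def wall_velocity(x, t, amplitude, kappa, omega):
    """Spanwise wall velocity W_w(x, t) = A sin(kappa x - omega t) (assumed form of Eq. (3))."""
    return amplitude * math.sin(kappa * x - omega * t)


def wave_properties(kappa, omega):
    """Return (wavelength, period, phase speed) of the forcing.

    kappa = 0 is the oscillating wall (infinite wavelength, undefined phase speed);
    omega = 0 is a steady, standing pattern (infinite period, zero phase speed).
    """
    wavelength = math.inf if kappa == 0 else 2.0 * math.pi / kappa
    period = math.inf if omega == 0 else 2.0 * math.pi / omega
    phase_speed = math.nan if kappa == 0 else omega / kappa
    return wavelength, period, phase_speed


lam, T, c = wave_properties(kappa=0.01, omega=0.0239)
print(f"lambda+ = {lam:.1f}, T+ = {T:.1f}, c+ = {c:.2f}")
```

The wave pattern is frozen in a frame moving at c: shifting x by cΔt and t by Δt leaves W_w unchanged, which is how the sign of c distinguishes forward from backward travelling waves.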
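[Fig. 2 sketch: channel of height $L_y = 2h$ with the wall forcing of wavelength $\lambda = 2\pi/\kappa$ travelling at phase speed $c = \omega/\kappa$ relative to the mean flow]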
Table 1 Details of the small-box (upper half) and large-box (lower half) simulations. Every
case set is detailed in terms of simulation type (CFR or CPG), number of cases $N_{cases}$, values
of bulk Reynolds number $Re_b$ and friction Reynolds number $Re_\tau$, length and width of the
computational domain in outer ($L_x/h$, $L_z/h$) and inner ($L_x^+$, $L_z^+$) units, number of Fourier modes in the homogeneous
directions (additional modes are used for dealiasing, according to the 3/2 rule) and collocation
points in the wall-normal direction

Type  N_cases  Re_b     Re_τ     L_x/h   L_z/h   L_x^+    L_z^+   N_x    N_y   N_z
CFR   1530     6627     203.0    1.59π   0.80π   1015     507     96     100   96
CFR   480      6627     203.4    2.05π   1.02π   1308     654     128    100   128
CFR   1530     39,333   905.6    0.32π   0.16π   906      453     96     500   96
CFR   480      39,333   948.3    0.43π   0.22π   1290     645     128    500   128
CFR   5        6360     199.9    4π      2π      2512     1256    256    128   256
CPG   5        6358     200.0    4π      2π      2513     1257    256    128   256
CFR   5        39,980   1000.0   4π      2π      12,566   6283    1024   500   1024
CPG   5        39,900   998.6    4π      2π      12,549   6274    1024   500   1024
The aim is to obtain and compare two comprehensive sets of cases at $Re_\tau = 200$
and $Re_\tau = 1000$, where $Re_\tau = u_\tau h/\nu$ is the Reynolds number based on the channel
half-height $h$, the friction velocity $u_\tau$ of the uncontrolled flow and the kinematic
viscosity $\nu$ of the fluid. The initial condition is that of an uncontrolled turbulent
flow. The spatial resolution in wall units is always better than $\Delta x^+ = 12.3$ and
$\Delta z^+ = 6.1$ (or $\Delta x^+ = 8.2$ and $\Delta z^+ = 4.1$ if the additional modes used to
completely remove the aliasing error are considered). $\Delta y^+$ smoothly varies from
$\Delta y^+ \approx 1$ near the wall to $\Delta y^+ \approx 7$ at the centerline. Time integration is carried out
with a partially implicit approach, with a Crank–Nicolson scheme for the viscous
terms and a third-order Runge–Kutta scheme for the convective terms. The CFL
number is set to unity; the resulting average time step is always below
$\Delta t^+ = 0.17$ for the low-Re cases and below $\Delta t^+ = 0.1$ for the high-Re cases. The
integration time is at least 24,000 viscous time units, and in certain cases it increases
up to 80,000 viscous units. For each value of Re, the computational study considers
two distinct sets of simulations, described below, details of which are reported in
Table 1.
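The resolution figures quoted above follow from Table 1. As a check, the sketch below takes the large-box CFR case at $Re_\tau = 1000$ and assumes $L_x/h = 4\pi$ and $L_z/h = 2\pi$, the values implied by $L_x^+ = 12{,}566$ and $L_z^+ = 6283$ in Table 1 (4π × 1000 ≈ 12,566). The grid spacing in wall units is the domain length divided by the number of Fourier modes, and the 3/2 dealiasing rule reduces it by a further factor of 1.5.

```python
import math

re_tau = 1000.0                                   # friction Reynolds number of the case
lx_over_h, lz_over_h = 4 * math.pi, 2 * math.pi   # assumed from the L+ columns of Table 1
nx, ny, nz = 1024, 500, 1024                      # Fourier modes (x, z), collocation points (y)

lx_plus = lx_over_h * re_tau                      # domain length in wall units
lz_plus = lz_over_h * re_tau
dx_plus, dz_plus = lx_plus / nx, lz_plus / nz     # spacing of the Fourier modes
dx_plus_dealiased, dz_plus_dealiased = dx_plus / 1.5, dz_plus / 1.5

print(f"L_x+ = {lx_plus:.0f}, L_z+ = {lz_plus:.0f}")
print(f"dx+ = {dx_plus:.1f}, dz+ = {dz_plus:.1f} "
      f"(dealiased: {dx_plus_dealiased:.1f}, {dz_plus_dealiased:.1f})")
print(f"grid points: {nx * ny * nz:,}")
```

This reproduces the values Δx⁺ = 12.3, Δz⁺ = 6.1, 8.2 and 4.1 quoted above, and the product of the three grid dimensions gives the 524.3 million grid points per large-box simulation.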
The first set (upper half of Table 1) is a parameter study designed to produce a
massive database of drag reduction data (4020 cases overall); the parameter space
includes the forcing wavenumber $\kappa$, the forcing angular frequency $\omega$ and, for the
first time, the forcing amplitude $A$ too. For this set of calculations, carried out
under the CFR condition, a relatively small computational domain is employed: the
resulting savings in computing time are key to making this huge parameter study
feasible.
The second set of simulations (lower half of Table 1) employs a larger domain
size. For both Re we consider the reference uncontrolled case and four other cases
at the amplitude $A^+ = 7$: one case with the oscillating wall at the nearly optimal
period $T^+ = 75$, one case with the oscillating wall at the larger period $T^+ = 250$, one
case with travelling waves yielding large drag reduction ($\omega^+ = 0.0239$ and $\kappa^+ = 0.01$)
and one case with travelling waves yielding drag increase ($\omega^+ = 0.12$ and $\kappa^+ = 0.01$).
Each case is run under both CFR and CPG (for the latter, the forcing parameters
listed above are to be understood in actual wall units), for a total of 20 simulations
featuring the larger computational domain.
3 Computational Details
The computation of the small-box large database, which involved a total of 4020
simulations, was performed partially on a Blue Gene/Q system at the CINECA
computing centre in Bologna and partially on the ForHLR supercomputer of the
Steinbuch Centre for Computing (SCC) in Karlsruhe. The simulations were
run with the solver for the incompressible Navier–Stokes equations developed by
[13]. The large number of relatively inexpensive simulations, owing to the small domain
size, allows them to be run concurrently as 4020 serial simulations. Automated
procedures in bash and Python have been developed to control the whole workflow
and to collect and post-process the results easily. The simulations ran for 720 wall-clock
hours on 4020 cores, totalling 2.9 million CPU hours.
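The authors' bash and Python workflow scripts are not reproduced here. As a rough illustration of the idea only, the sketch below enumerates a hypothetical (A⁺, κ⁺, ω⁺) grid and runs independent cases concurrently with Python's standard `concurrent.futures` module; `run_case` is a hypothetical stand-in for launching one serial solver process and post-processing its output. On a cluster, the individual runs would normally be dispatched through the batch system instead.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def parameter_grid(amplitudes, wavenumbers, frequencies):
    """All (A+, kappa+, omega+) combinations, one dict per independent case."""
    return [
        {"A": a, "kappa": k, "omega": w}
        for a, k, w in product(amplitudes, wavenumbers, frequencies)
    ]


def run_campaign(cases, run_case, max_workers):
    """Run independent cases concurrently; results are returned in input order.

    run_case(case) is hypothetical: in practice it might call subprocess.run()
    on a serial solver and return the post-processed drag reduction.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_case, cases))


def fake_run_case(case):
    """Placeholder 'solver' returning a dummy result, for demonstration only."""
    return {**case, "R": 0.0}


cases = parameter_grid([2.0, 7.0], [0.0, 0.01], [0.02, 0.05, 0.1])
results = run_campaign(cases, fake_run_case, max_workers=4)
print(len(results), results[0])
```

Threads suffice in this pattern because each worker would only wait on an external process; the serial simulations themselves carry the computational load.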
The large-box small database requires more resources and ad-hoc parallelization
to be generated. The smaller number (20) of very large simulations, each one
consisting of 524.3 million grid points, of course precludes adopting the same strategy
used for the small-box simulations, i.e. simultaneously running a large number of
serial computations. A hybrid shared-memory and distributed-memory parallelization,
which relies on MPI and OpenMP, has been employed in order to perform
the computation efficiently. The performance of the distributed matrix transpose
required in the pseudo-spectral convolutions has been improved by adopting a copy-free
algorithm which relies on MPI derived datatypes. Data are sent and received in
the appropriate order, which automatically results in a transposition, without the
need for manual packing and unpacking of send and receive buffers. These
computations have been performed entirely on the ForHLR supercomputer of the
SCC in Karlsruhe. Typically, each simulation was run on 140 CPUs, organized as
35 processes with 4 threads each, for about 3.5 months, totalling about 7 million CPU hours. The
total volume of generated data is 4 TB.
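The copy-free transpose can be illustrated without MPI by listing the buffer offsets that the derived datatypes describe. The pure-Python emulation below is a hedged sketch, not the authors' code. Each rank is assumed to own a row-major slab of $n_x/P$ rows by $n_z$ columns; the block sent to rank d is a strided "vector" layout ($n_x/P$ blocks of $n_z/P$ contiguous elements whose starts are $n_z$ apart, in the sense of `MPI_Type_vector(count, blocklength, stride, ...)`), and the receiver places element (i, j) of the block coming from rank s at row j, column $s \cdot n_x/P + i$ of its transposed slab. In a real code these layouts would be encoded as MPI derived datatypes (the receive layout needs a nested datatype) and passed to an all-to-all exchange, so that the library reads and writes the buffers in place; the explicit loops here only verify that the layouts yield a transposition.

```python
def vector_offsets(count, blocklength, stride, start=0):
    """Offsets covered by a strided vector layout: `count` blocks of
    `blocklength` contiguous elements whose starts are `stride` apart."""
    return [start + i * stride + j for i in range(count) for j in range(blocklength)]


def send_offsets(nx_local, nz, nz_local, dest):
    # Local slab is row-major (nx_local x nz); columns dest*nz_local ... go to `dest`.
    return vector_offsets(nx_local, nz_local, nz, start=dest * nz_local)


def recv_offsets(nx_local, nx_global, nz_local, src):
    # Transposed slab is row-major (nz_local x nx_global); incoming element (i, j)
    # lands at row j, column src*nx_local + i.
    return [j * nx_global + src * nx_local + i
            for i in range(nx_local) for j in range(nz_local)]


def transpose_via_layouts(slabs, nx_local, nz_local):
    """Emulate an all-to-all exchange that uses only the layouts above."""
    nranks = len(slabs)
    nx_global, nz = nx_local * nranks, nz_local * nranks
    out = [[None] * (nz_local * nx_global) for _ in range(nranks)]
    for src in range(nranks):
        for dest in range(nranks):
            values = [slabs[src][o] for o in send_offsets(nx_local, nz, nz_local, dest)]
            for value, o in zip(values, recv_offsets(nx_local, nx_global, nz_local, src)):
                out[dest][o] = value
    return out


# Example: 2 ranks, global array a[x][z] = 100*x + z with 4 x 6 entries
nranks, nx_local, nz_local = 2, 2, 3
nz = nz_local * nranks
slabs = [[100 * (r * nx_local + xl) + z for xl in range(nx_local) for z in range(nz)]
         for r in range(nranks)]
print(transpose_via_layouts(slabs, nx_local, nz_local)[0])
# [0, 100, 200, 300, 1, 101, 201, 301, 2, 102, 202, 302]
```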
4 Results
Figure 3 represents the whole DNS dataset as isosurfaces of the drag reduction
rate in the control parameter space. The cloud of black dots represents the 2010
data points used for interpolation at each Re. This overview already confirms that
the drag reduction rate decreases throughout the whole dataset when $Re_\tau$ is
increased from 200 to 1000. For instance, the connected region
Fig. 3 Isosurfaces of drag reduction R in the three-dimensional parameter space $(\omega^+, \kappa^+, A^+)$
for $Re_\tau = 200$ (a) and $Re_\tau = 1000$ (b). Isosurfaces from dark to light range from R = 0.2 to
0.5 in steps of 0.1. The cloud of dots represents the 2010 data points where, at each $Re_\tau$, a DNS
has been carried out (Figure taken from Gatti and Quadrio [11]. Reprinted with permission from
Cambridge University Press)
where R > 0.5 at $Re_\tau = 200$ is not visible at $Re_\tau = 1000$. Interestingly, the
region of drag increase is most affected by the change in Reynolds number. The
isosurface at R = 0.2 disappears at $Re_\tau = 1000$ and the one at R = 0.1
shrinks significantly. The rate at which the drag reduction decays with Re is
traditionally described after [6], i.e. by assuming a power-law decay of R with
$Re_\tau$ [...] the Reynolds number $Re_\tau$. As a result, the power law cannot be used to
predict the high-Re behaviour of drag reduction, and other approaches are required.
Thanks to the results of the large-box small database and the computations run
on ForHLR, a different approach to describe the effect of Re on R has been proposed.
In analogy with surface roughness, the effect of drag-reducing control can
be quantified via the so-called roughness function, i.e. as a positive
(upward) shift $\Delta B^+$ of the velocity profile in the logarithmic layer. This known
result is systematically checked in Fig. 4a–d. The mean velocity profiles are obtained
at both Re from the large-box simulations.
Following a procedure already used for riblets, e.g. by [14] and [9], it
is possible to derive a relationship which links the vertical shift $\Delta B^+$ of the
mean velocity profile in its logarithmic region, the drag reduction rate R and the
Reynolds number (via the skin-friction coefficient of the uncontrolled flow, which
is a unique function thereof). Further details of the derivation can be found in
Gatti and Quadrio [11].
If the uncontrolled and controlled flows are compared under the CFR constraint,
this relationship reads:
$$\Delta B^+ = \sqrt{\frac{2}{C_{f,0}}}\left[(1-R)^{-1/2} - 1\right] - \frac{1}{2k}\ln(1-R), \qquad (4)$$
where k is the von Kármán constant.
If, on the other hand, the pressure gradient is kept constant across the comparison
(CPG), then by definition $Re_\tau = Re_{\tau,0}$, and the above equation further simplifies to:
$$\Delta B^+ = \sqrt{\frac{2}{C_{f,0}}}\left[(1-R)^{-1/2} - 1\right]. \qquad (5)$$
The data presented in Gatti and Quadrio [11] show that $\Delta B^+$ is a function of the
control parameters only and does not depend upon Re, provided the Reynolds number
is large enough for the Prandtl–von Kármán friction relation to hold reasonably well.
Therefore, relationships (4) and (5) can be used, once $\Delta B^+$ is known, to predict
the behaviour of R at large Re.
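Equations (4) and (5) are easy to evaluate and to invert numerically. The sketch below, with hypothetical names, computes ΔB⁺ from R and recovers R from ΔB⁺. For CPG the inversion has the closed form R = 1 − (1 + ΔB⁺√(C_{f,0}/2))⁻², while for CFR it uses bisection, which works because the right-hand side of Eq. (4) grows monotonically with R. The von Kármán constant k = 0.41 and C_{f,0} = 0.005 are assumed illustrative values; to extrapolate, C_{f,0} at the target Reynolds number has to come from a friction law for the uncontrolled channel, which is not reproduced here.

```python
import math

KAPPA = 0.41  # von Karman constant k (assumed value)


def delta_b_plus(r, cf0, constraint="CFR", k=KAPPA):
    """Log-law shift from Eq. (4) (CFR) or Eq. (5) (CPG); requires r < 1."""
    if r >= 1.0:
        raise ValueError("R must be smaller than 1")
    shift = math.sqrt(2.0 / cf0) * ((1.0 - r) ** -0.5 - 1.0)
    if constraint == "CPG":
        return shift
    if constraint == "CFR":
        return shift - math.log(1.0 - r) / (2.0 * k)
    raise ValueError("constraint must be 'CFR' or 'CPG'")


def drag_reduction_from_shift(db, cf0, constraint="CFR", k=KAPPA, tol=1e-12):
    """Invert Eq. (4) or (5) for R, restricted to drag reduction (db >= 0)."""
    if db < 0.0:
        raise ValueError("only db >= 0 (drag reduction) is handled here")
    if constraint == "CPG":
        return 1.0 - (1.0 + db * math.sqrt(cf0 / 2.0)) ** -2
    lo, hi = 0.0, 1.0 - 1e-15  # delta_b_plus increases monotonically in R on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta_b_plus(mid, cf0, constraint, k) < db:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


cf0 = 0.005  # illustrative uncontrolled skin-friction coefficient
for mode in ("CPG", "CFR"):
    db = delta_b_plus(0.5, cf0, mode)
    print(mode, round(db, 3), round(drag_reduction_from_shift(db, cf0, mode), 6))
```

Holding ΔB⁺ fixed and lowering C_{f,0} (i.e. raising Re) yields a smaller R, which is the mechanism behind the Re-dependence of drag reduction described by Eqs. (4) and (5).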
[Fig. 4, panels (a) to (d): mean velocity profiles versus y* on a logarithmic axis (10^0 to 10^3), with insets enlarging the logarithmic layer for 60 ≤ y* ≤ 70]
Fig. 4 Mean velocity profiles obtained from the large-domain simulations reported in the lower
half of Table 1. Top: $Re_\tau = 200$; bottom: $Re_\tau = 1000$. Left: CFR cases; right: CPG cases. The
solid line is the reference case and the other lines correspond to control yielding both drag reduction
and drag increase (see text). The insets enlarge a portion of the logarithmic layer to show the (very
small) statistical uncertainty at 95 % confidence, denoted by the shaded area (Figure taken from
Gatti and Quadrio [11]. Reprinted with permission from Cambridge University Press)
5 Conclusion
In this study a large drag reduction DNS database has been produced for a
turbulent plane channel flow subject to spanwise forcing. Four thousand and
twenty simulations have been used to describe how increasing the value of the
Reynolds number from $Re_\tau = 200$ to $Re_\tau = 1000$ affects drag reduction, and to
propose a rationale for the observed performance deterioration. To the authors'
knowledge, this is the first study on spanwise forcing that includes a wide range of
forcing amplitudes, as well as constant pressure gradient (CPG) data at different
values of Re.
The existing information regarding spanwise forcing has been significantly
extended. The classic argument linking the skin-friction drag changes of a rough
wall to the vertical shift $\Delta B^+$ of the logarithmic portion of the mean velocity profile
has been shown to apply to the case of spanwise forcing. A non-linear expression
has been derived that can be specialized to the CFR or CPG cases.
Under the assumption that $\Delta B^+$ measured in the present work at $Re_\tau = 1000$ is
already Re-independent, Eq. (5) can be used to extrapolate drag reduction to higher
Re. It can be shown that a drag reduction of R = 0.5 at $Re_\tau = 1000$ translates into
R = 0.34 at $Re_\tau = 10^5$. The decrease is still significant but not as dramatic as the
low-Re evidence suggests.
1. Auteri, F., Baron, A., Belan, M., Campanardi, G., Quadrio, M.: Experimental assessment of
drag reduction by traveling waves in a turbulent pipe flow. Phys. Fluids 22(11), 115103/14
2. Berger, T.W., Kim, J., Lee, C., Lim, J.: Turbulent boundary layer control utilizing the Lorentz
force. Phys. Fluids 12(3), 631–649 (2000)
3. Chang, Y., Collis, S.S., Ramakrishnan, S.: Viscous effect in control near-wall turbulence. Phys.
Fluids 14, 4069–4080 (2002)
4. Choi, K.S., Graham, M.: Drag reduction of turbulent pipe flows by circular-wall oscillation.
Phys. Fluids 10(1), 7–9 (1998)
5. Choi, K.S., DeBisschop, J., Clayton, B.: Turbulent boundary-layer control by means of
spanwise-wall oscillation. AIAA J. 36(7), 1157–1162 (1998)
6. Choi, J.I., Xu, C.X., Sung, H.J.: Drag reduction by spanwise wall oscillation in wall-bounded
turbulent flows. AIAA J. 40(5), 842–850 (2002)
7. Du, Y., Karniadakis, G.E.: Suppressing wall turbulence by means of a transverse traveling
wave. Science 288, 1230–1234 (2000)
8. Du, Y., Symeonidis, V., Karniadakis, G.E.: Drag reduction in wall-bounded turbulence via a
transverse travelling wave. J. Fluid Mech. 457, 1–34 (2002)
9. García-Mayoral, R., Jiménez, J.: Drag reduction by riblets. Phil. Trans. R. Soc. A 369(1940),
1412–1427 (2011)
10. Gatti, D., Quadrio, M.: Performance losses of drag-reducing spanwise forcing at moderate
values of the Reynolds number. Phys. Fluids 25, 125109(17) (2013)
11. Gatti, D., Quadrio, M.: Reynolds-number dependence of turbulent skin-friction drag reduction
induced by spanwise forcing. J. Fluid Mech. 802, 553–582 (2016)
12. Jung, W., Mangiavacchi, N., Akhavan, R.: Suppression of turbulence in wall-bounded flows by
high-frequency spanwise oscillations. Phys. Fluids A 4(8), 1605–1607 (1992)
13. Luchini, P., Quadrio, M.: A low-cost parallel implementation of direct numerical simulation of
wall turbulence. J. Comput. Phys. 211(2), 551–571 (2006)
14. Luchini, P., Manzo, F., Pozzi, A.: Resistance of a grooved surface to parallel flow and cross-
flow. J. Fluid Mech. 228, 87–109 (1991)
15. Nikitin, N.V.: On the mechanism of turbulence suppression by spanwise surface oscillations.
Fluid Dyn. 35(2), 185–190 (2000)
16. Pang, J., Choi, K.S.: Turbulent drag reduction by Lorentz force oscillation. Phys. Fluids 16(5),
L35–L38 (2004)
17. Quadrio, M., Ricco, P.: Critical assessment of turbulent drag reduction through spanwise wall
oscillation. J. Fluid Mech. 521, 251–271 (2004)
18. Quadrio, M., Sibilla, S.: Numerical simulation of turbulent flow in a pipe oscillating around its
axis. J. Fluid Mech. 424, 217–241 (2000)
398 D. Gatti
19. Quadrio, M., Ricco, P., Viotti, C.: Streamwise-traveling waves of spanwise wall velocity for
turbulent drag reduction. J. Fluid Mech. 627, 161–178 (2009)
20. Ricco, P., Quadrio, M.: Wall-oscillation conditions for drag reduction in turbulent channel flow.
Int. J. Heat Fluid Flow 29, 601–612 (2008)
21. Ricco, P., Wu, S.: On the effects of lateral wall oscillations on a turbulent boundary layer. Exp.
Therm. Fluid Sci. 29(1), 41–52 (2004)
22. Stroh, A., Gatti, D., Hasegawa, Y., Frohnapfel, B.: Influence of drag-reducing near-wall
turbulence control on spectral properties of Reynolds shear stress. In: Proceedings of the 11th
ETMM, Palermo (2016)
23. Tamano, S., Itoh, M.: Drag reduction in turbulent boundary layers by spanwise traveling waves
with wall deformation. J. Turbul. 13, N9 (2012)
24. Touber, E., Leschziner, M.: Near-wall streak modification by spanwise oscillatory wall motion
and drag-reduction mechanisms. J. Fluid Mech. 693, 150–200 (2012)
25. Trujillo, S., Bogard, D., Ball, K.: Turbulent boundary layer drag reduction using an oscillating
wall. AIAA Paper 97–1870 (1997)
Control of Spatially Developing Turbulent
Boundary Layers for Skin Friction Drag Reduction
Alexander Stroh
1 Introduction
A broad variety of control methods aimed at reducing skin friction drag in
turbulent boundary layers has been introduced over the past few decades [1–4]. Since
most of these methods have been proposed for periodic, fully developed turbulent
channel flow (TCF) with control applied to the entire wall area, knowledge about
localized control is still limited. However, localized control is more realistic from
an engineering point of view. In this case, the flow alteration outside the control
region also has to be taken into account when estimating the overall control
performance. In the present work, two locally applied drag-reducing control methods
with entirely different control mechanisms are investigated in spatially developing
turbulent boundary layers (TBL) in order to analyse the flow behaviour downstream
of the control region. In addition, the global performance of these flow control
techniques is evaluated.
A. Stroh
Institute of Fluid Mechanics (ISTM), Karlsruhe Institute of Technology (KIT), Karlsruhe,
e-mail: alexander.stroh@kit.edu
3 Numerical Procedure
The investigation is performed using DNS of a turbulent boundary layer with zero
pressure gradient (ZPG). The coordinate system of the numerical domain and its
geometry are illustrated in Fig. 1, where x, y and z correspond to the streamwise,
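wall-normal and spanwise directions, respectively. The flow is governed by the
incompressible Navier-Stokes equations for a constant-property Newtonian fluid together
with the continuity equation:

$$\rho\frac{Du_i}{Dt} = -\frac{\partial p}{\partial x_i} + \mu\frac{\partial^2 u_i}{\partial x_j \partial x_j} \quad\text{and}\quad \frac{\partial u_i}{\partial x_i} = 0, \qquad (1)$$

where p is the static pressure, ρ the density, μ the dynamic viscosity and ν = μ/ρ the
kinematic viscosity. The Reynolds numbers for a boundary layer flow are defined as

$$Re_{\delta,0} = \frac{U_\infty \delta_0}{\nu}, \qquad Re_\theta = \frac{U_\infty \theta}{\nu} \qquad\text{and}\qquad Re_\tau = \frac{u_\tau \delta_{99}}{\nu}, \qquad (2)$$

[Fig. 1: computational domain (Lx × Ly × Lz) with the turbulent region and the control region of streamwise length xc; coordinates x, y, z]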
Table 1 Properties of considered simulation configurations for TBL. The viscous length scale is based
on the average u_τ in the turbulent region of the TBL simulation. * marks the main configuration setup

#    Grid size             Domain size          Resolution                      Height   Grid nodes
     Nx     Ny    Nz       Lx     Ly    Lz      Δx+    Δy+ (min–max)   Δz+      max      N × 10^6
1    512    129   128      600    30    34      23.8   0.1–8.2         5.9      2.25     8.6
2    1024   257   128      1200   60    34      23.8   0.1–8.2         5.9      2.88     33.6
3*   3072   301   256      3000   100   120     17.8   0.1–13.3        8.9      2.32     236.7
402 A. Stroh
respect to the uncontrolled case, the reduction rate of skin friction drag is given by
$$R = 1 - \frac{c_f}{c_{f,0}} \quad\text{with}\quad c_f = \frac{\tau_w}{0.5\,\rho U_b^2}, \qquad (3)$$

where c_f denotes the skin friction coefficient, U_b is the bulk mean velocity and the
subscript "0" denotes the uncontrolled value. If the flow rate in a channel flow is
kept constant (CFR), the modification of the skin friction coefficient is reflected in
a change of τ_w or, equivalently, of u_τ = √(τ_w/ρ):

$$R = 1 - \frac{\tau_w}{\tau_{w,0}} = 1 - \frac{u_\tau^2}{u_{\tau,0}^2}. \qquad (4)$$
Similarly, the control performance indices are introduced in TBL using U1 instead
of Ub , so the local driving power is given as
RD1 : (6)
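A small Python sketch of Eqs. (3) and (4) with hypothetical numbers: at fixed bulk velocity, computing R from c_f and from u_τ gives the same value, because τ_w = ρu_τ². In TBL the same functions apply with U∞ as reference velocity.

```python
def skin_friction(tau_w, rho, u_ref):
    """c_f = tau_w / (0.5 rho U_ref^2); U_ref is U_b in TCF or U_inf in TBL."""
    return tau_w / (0.5 * rho * u_ref ** 2)


def drag_reduction_cf(cf, cf0):
    """Eq. (3): R = 1 - c_f / c_f,0."""
    return 1.0 - cf / cf0


def drag_reduction_u_tau(u_tau, u_tau0):
    """Eq. (4): at constant flow rate, R = 1 - (u_tau / u_tau,0)**2."""
    return 1.0 - (u_tau / u_tau0) ** 2


# Hypothetical numbers: u_tau drops by 10 % under control at fixed U_b.
rho, u_b = 1.0, 1.0
u_tau0, u_tau = 0.05, 0.045
r1 = drag_reduction_cf(skin_friction(rho * u_tau ** 2, rho, u_b),
                       skin_friction(rho * u_tau0 ** 2, rho, u_b))
r2 = drag_reduction_u_tau(u_tau, u_tau0)
print(f"R from c_f: {r1:.3f}, R from u_tau: {r2:.3f}")  # both about 0.19
```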
Control is applied locally in the streamwise direction, while the spanwise extent
of the control area covers the total domain width (Fig. 1). All control types are
placed at the same position, x0 , with the same control area extension, xc . The
location is defined by the control input profile:
$$f(x) = \begin{cases} 1, & x_0 \le x \le x_0 + x_c,\\ 0, & \text{otherwise.} \end{cases} \qquad (7)$$
The control amplitude is smoothly increased and decreased at the edges of the
control area using a hyperbolic tangent function. Three control techniques are
considered for the present investigation: opposition control, body force damping
and uniform blowing.
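A possible implementation of the control input profile: the sharp form of Eq. (7) and a smoothed variant. The tanh parametrization below (ramp centres half a ramp width inside the control area, steepness set by ramp/6) is a hypothetical choice, since the text only states that a hyperbolic tangent is used. The example takes control area 3 (x0 = 186, xc = 200) and a ramp of 10δ0, values given later in this chapter.

```python
import math


def control_profile_sharp(x, x0, xc):
    """Eq. (7): 1 inside the control area x0 <= x <= x0 + xc, 0 otherwise."""
    return 1.0 if x0 <= x <= x0 + xc else 0.0


def control_profile_smooth(x, x0, xc, ramp):
    """Smoothed f(x) built from two tanh steps whose ramps of width `ramp`
    lie inside the control area (ramp centres and steepness are assumptions)."""
    a = x0 + 0.5 * ramp          # centre of the rising edge
    b = x0 + xc - 0.5 * ramp     # centre of the falling edge
    s = ramp / 6.0               # tanh(+-3) ~ +-0.995 at the ramp ends
    return 0.5 * (math.tanh((x - a) / s) - math.tanh((x - b) / s))


# Control area 3: x0 = 186, xc = 200, ramps of 10 (lengths in units of delta_0).
for x in (150.0, 186.0, 191.0, 286.0, 381.0, 400.0):
    print(x, control_profile_sharp(x, 186.0, 200.0),
          round(control_profile_smooth(x, 186.0, 200.0, 10.0), 3))
```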
Opposition control [1] is one of the most prominent classical reactive control
schemes. Control activation is performed by local suction and blowing in the wall-
normal direction at the wall surface, so as to suppress the sweep and ejection events
in the near-wall region and reduce the skin friction drag. In TCF the control is
commonly applied to the entire area of the wall, imposing wall-normal or spanwise
velocity opposite to the velocity captured at a prescribed sensing plane ys . The wall-
normal control input at the wall is given by
$$b_y(x, y, z, t) = -\frac{f(x)}{\Phi}\, v(x, y, z, t), \qquad (9)$$

with the forcing time constant Φ. The scheme is very efficient in terms of drag
reduction (up to R = 75 % for y_c^+ = 60) [12–14]. The technique reproduces the
effects of various near-wall reactive control schemes, such as opposition control or
suboptimal control [2], and offers more flexibility in tuning as well as an easier
implementation than velocity-based control schemes.
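The two reactive schemes can be sketched as pointwise rules in Python. Opposition control is written in its classic form (wall velocity opposite to v at the sensing plane y_s), and body force damping follows Eq. (9) with a sharp cut-off at y_c+. The function names, the placement of f(x) in the opposition rule and the sharp cut-off are assumptions for illustration. The last function shows why Φ is called a time constant: with only the damping force acting, v decays as exp(−t/Φ).

```python
import math


def opposition_wall_velocity(v_sensing, f_x=1.0):
    """Opposition control: wall-normal velocity imposed at the wall, opposite
    to v sampled at the sensing plane y_s, switched on by f(x) (assumed)."""
    return -f_x * v_sensing


def damping_force(v_profile, y_plus, phi, y_c_plus, f_x=1.0):
    """Eq. (9): b_y = -(f(x)/Phi) v, applied only for y+ <= y_c+ (sharp cut-off assumed)."""
    return [-f_x / phi * v if yp <= y_c_plus else 0.0
            for v, yp in zip(v_profile, y_plus)]


def damp_in_time(v0, phi, t_end, dt):
    """Integrate dv/dt = b_y alone (explicit Euler, f(x) = 1): v decays as exp(-t/Phi)."""
    v = v0
    for _ in range(int(round(t_end / dt))):
        v += dt * (-v / phi)
    return v


# After one time constant (Phi = 5/3) the damped velocity is close to v0 / e.
print(damp_in_time(1.0, 5.0 / 3.0, 5.0 / 3.0, 1e-4), math.exp(-1.0))
```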
The most prominent example of drag reducing flow control is the uniform
blowing at the wall of a flat plate boundary layer [4, 15, 16]. This control scheme can
also be considered the most realistic one, since it does not utilize any information
about the instantaneous flow field and can thus be classified as a predetermined
active control technique. In practice, the control could be implemented by
transpiration through a porous wall or by direct suction or blowing through a slot
in the wall surface. The wall-normal velocity profile at the wall is given by
4 Computational Details
The solver is implemented in Fortran and supports OpenMP, MPI or hybrid (MPI
with OpenMP) parallelization. For the MPI parallelization, one- and two-dimensional
domain decompositions are available.
Smaller simulation configurations with 8.6 and 33.6 Mio. grid nodes (see
Table 1) have been used for tests, development and a preliminary investigation of the
parameter set, utilizing 16–32 and 64–128 CPU cores per job, respectively. The main
simulations with 236.7 Mio. grid nodes have been carried out with 256 CPU cores
per job, which has been found to be an optimal trade-off between queuing time and
simulation run time. Both one- and two-dimensional domain decompositions with
MPI parallelization have been utilized in the study. Due to the higher CFL numbers
close to the wall when control is applied at the wall, simulations with opposition
control and uniform blowing require smaller time steps and hence have to be run
longer to achieve the same statistical integration time as simulations with body
force damping. Since at least three control configurations had to be tested for each
control technique, several (up to ten) 256-core cases were run simultaneously. Table 2
summarizes the computational details of the performed simulations.
Figures 2 and 3 demonstrate the strong scaling for the main simulation configuration
with 236.7 Mio. grid nodes. Due to the dimension of the computational
domain and the specifics of the utilized parallelization the amount of used CPUs
Table 2 Computational details of the performed simulations. * marks the main configuration setup

#    Grid nodes   CPU cores     Process memory   Initial field   Mean time-to-solution
     N × 10^6     procs         pmem [MB]        size            [days per case]
1    8.6          16, 20, 32    768              194 MB          14
2    33.6         64, 128       1024             773 MB          40
3*   236.7        256           1536             5.3 GB          60
[Figs. 2 and 3: speedup and efficiency [%] versus number of CPUs (up to 256) for 1d and 2d domain decomposition]
Fig. 2 Speedup of the utilized numerical code for the main simulation configuration on ForHLR I
Fig. 3 Efficiency of the utilized numerical code for the main simulation configuration
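Speedup and efficiency curves such as those in Figs. 2 and 3 are usually obtained from measured wall-clock times at fixed problem size. The Python sketch below takes the smallest run as reference; the timings are hypothetical placeholders, not the measured ForHLR I data.

```python
def strong_scaling(timings):
    """Speedup and parallel efficiency from wall-clock times at fixed problem size.

    timings: {number_of_cores: wall-clock time}. The smallest run is the reference,
    and speedup is scaled so that S(p_ref) = p_ref (ideal scaling gives S = p).
    """
    p_ref = min(timings)
    t_ref = timings[p_ref]
    table = {}
    for p in sorted(timings):
        speedup = p_ref * t_ref / timings[p]
        table[p] = (speedup, speedup / p)
    return table


# Hypothetical timings in seconds per time step (not measured data).
times = {16: 10.0, 32: 5.1, 64: 2.7, 128: 1.5, 256: 0.9}
for p, (s, e) in strong_scaling(times).items():
    print(f"{p:4d} cores: speedup {s:6.1f}, efficiency {100 * e:5.1f} %")
```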
Control of Turbulent Boundary Layers for Skin Friction Drag Reduction 405
5 Results
The scientific results of the study have been published in [17] and [18]. Therefore,
the current section provides only a condensed summary of the simulation results
and contains text segments and figures from these publications. For further details,
please refer to the journal publications.
Although turbulent boundary layers and turbulent channel flows reveal many
similarities in the corresponding flow statistics of near-wall turbulence, some
principal differences for these two flows are known to exist even in the uncontrolled
state [19]. The present project aims at understanding how opposition flow control
designed to reduce skin friction drag acts in both flows and whether fundamental
differences of the control mechanism can be identified.
In order to perform a direct comparison between TCF and TBL at a number of
different friction Reynolds numbers, five DNS of TCF (each driven by a prescribed
flow rate) are carried out. In TBL, control is applied only over part of the streamwise
extent, while the spanwise extension of the control area covers the total domain
width. All control areas begin at x0 = 186, corresponding to Re_τ = 188, as shown in
Fig. 1. Three different control areas with a streamwise extension of xc = 100, 150
and 200 are introduced in TBL. The Reynolds numbers of the TCF are chosen in
such a way that the friction Reynolds numbers of the uncontrolled TCF are
within the range found for the uncontrolled TBL. Statistical averaging for TCF
and TBL simulations is performed during 100–150 eddy turnover times after the
controlled flow reaches an equilibrium state.
Figure 4 shows the distribution of the local drag reduction rate for the three
control area lengths along the streamwise coordinate within the turbulent region
of the TBL in comparison to TCF results. It can be seen that very similar results in
terms of R are obtained for TBL and TCF.
[Fig. 4: local drag reduction rate R [%] along the streamwise coordinate for control areas 1, 2 and 3, compared with interpolated TCF]
Fig. 4 Comparison of skin friction drag reduction distribution in TBL with interpolated controlled
TCF results at Re_τ = 150, 180, 227, 270, 300. Error bars represent a 3σ-confidence interval for
TCF data [20] (The figure is taken from [17]. Reprinted with permission from AIP Publishing LLC)

Further insight into how this drag reduction is generated is provided by a
decomposition of the skin friction coefficient into its contributing parts, as
originally suggested by Fukagata et al. [21]. Their original formulation is modified
in such a way that the centerline velocity, U_cl = ū(δ), which corresponds to the
free-stream velocity in TBL, is used as the normalisation factor in TCF instead of the
bulk velocity. Accordingly, the skin friction coefficient in TCF is defined by
c_f = τ_w/(0.5ρU_cl²). Consequently, the following form of the FIK-identity in TCF
for the newly defined c_f can be derived [17]:
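$$c_f = \underbrace{-\frac{2}{3}\frac{\partial \bar p}{\partial x}}_{c_f^{P}:\ \text{pressure development}} + \underbrace{\frac{4(1-\delta_d)}{Re_c}}_{c_f^{L}:\ \text{laminar}} + \underbrace{4\int_0^1 (1-y)\left(-\overline{u'v'}\right)dy}_{c_f^{T}:\ \text{Reynolds shear stress}}, \qquad (11)$$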
where y is normalised with the channel half-height δ and Re_c = U_cl δ/ν. Since
−∂p̄/∂x equals the wall shear stress in this normalisation, c_f^P = c_f/3, so that c_f
in the TCF is determined by the laminar (c_f^L) and turbulent (c_f^T) contributions.
In contrast to TCF, the FIK-identity for TBL is given by [17]:
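$$c_f(x) = \underbrace{\frac{4(1-\delta_d)}{Re_{\delta_{99}}}}_{c_f^{\delta}:\ \text{boundary layer}} + \underbrace{4\int_0^1 (1-y)\left(-\overline{u'v'}\right)dy}_{c_f^{T}:\ \text{Reynolds shear stress}} + \underbrace{4\int_0^1 (1-y)\left(-\bar u\,\bar v\right)dy}_{c_f^{C}:\ \text{mean convection}} \underbrace{-\,2\int_0^1 (1-y)^2\left(\frac{\partial \bar u\bar u}{\partial x} + \frac{\partial \overline{u'u'}}{\partial x} - \frac{1}{Re_{\delta_{99}}}\frac{\partial^2 \bar u}{\partial x^2} + \frac{\partial \bar p}{\partial x}\right)dy}_{c_f^{D}:\ \text{spatial development}}, \qquad (12)$$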
[Fig. 5: bar chart of the contributions c_f^P, c_f^L, c_f^δ, c_f^T, c_f^C and c_f^D for uncontrolled and controlled TCF and TBL]
Fig. 5 Comparison of dynamical contributions to c_f in uncontrolled and controlled TCF and TBL
at Re_τ = 227 and Re_θ = 664. The figure is part of a figure in [17] (Reprinted with permission
from AIP Publishing LLC)
where δ_d represents the displacement thickness. In this equation all variables are
non-dimensionalised by U∞ and δ99. The turbulent contribution, c_f^T, is present in
both TCF and TBL, while the boundary layer contribution, c_f^δ, in TBL can be
compared with the laminar contribution, c_f^L, in TCF. For TBL, two additional terms,
namely c_f^C and c_f^D, are present.
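A quick consistency check of the TCF identity, written as a Python sketch under the normalisation stated above (y by δ, velocities by U_cl, c_f = τ_w/(0.5ρU_cl²)) and the form c_f = −(2/3)∂p̄/∂x + 4(1 − δ_d)/Re_c + 4∫(1 − y)(−u'v')dy: for laminar Poiseuille flow with U_cl = 1, the three contributions must add up to 4/Re_c, and the pressure term must equal c_f/3.

```python
def trapezoid(values, y):
    """Composite trapezoidal rule on the (possibly non-uniform) grid y."""
    return sum(0.5 * (values[i] + values[i + 1]) * (y[i + 1] - y[i])
               for i in range(len(y) - 1))


def fik_tcf(u, uv, dpdx, re_c, y):
    """Contributions (c_f^P, c_f^L, c_f^T) of the TCF FIK identity.

    y in [0, 1] is scaled by delta, u by U_cl, so delta_d = int_0^1 (1 - u) dy.
    uv holds the Reynolds shear stress <u'v'> (negative in the lower half).
    """
    delta_d = trapezoid([1.0 - ui for ui in u], y)
    c_p = -2.0 / 3.0 * dpdx
    c_l = 4.0 * (1.0 - delta_d) / re_c
    c_t = 4.0 * trapezoid([(1.0 - yi) * (-uvi) for yi, uvi in zip(y, uv)], y)
    return c_p, c_l, c_t


# Laminar Poiseuille flow with U_cl = 1: u = 2y - y^2, tau_w = 2/Re_c,
# dp/dx = -2/Re_c, hence c_f = 2 tau_w = 4/Re_c.
re_c, n = 100.0, 2001
y = [i / (n - 1) for i in range(n)]
u = [2.0 * yi - yi ** 2 for yi in y]
uv = [0.0] * n
c_p, c_l, c_t = fik_tcf(u, uv, -2.0 / re_c, re_c, y)
print(c_p + c_l + c_t, 4.0 / re_c)
```

For a turbulent case, the same function would take DNS profiles of ū and u'v'; only the laminar case is used here because its exact c_f is known.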
A comparison of opposition control in TCF and TBL is shown in Fig. 5, where
the skin friction decomposition for the uncontrolled and controlled flow states is
given at a fixed Reynolds number. For TCF, the reduction of c_f^T is the main control
effect. In contrast, for TBL the suppression of the turbulent contribution c_f^T is
weaker, while changes in the boundary-layer-specific terms, namely c_f^C and c_f^D, also
contribute to the change in skin friction drag. This difference between TCF and TBL
becomes more pronounced at higher Reynolds number.
Based on the obtained results, the present drag reduction scenario is not expected to
change significantly with a further increase of the Reynolds number. However, the
fact that drag reduction in TBL is achieved through the interaction of different
dynamic contributions might eventually lead to different drag reduction rates for
TCF and TBL.
Two skin friction drag reducing control schemes with essentially different control
mechanisms are investigated in turbulent boundary layers (TBL). While the first
control type, uniform blowing, affects the convective contribution to the skin friction
coefficient by introducing additional mass flux, the second type, body force
damping, aims at a direct reduction of c_f^T. Since any control will end at some point
on a surface, we investigate how the boundary layers develop after they have passed
the controlled sections and how this development influences the global control
performance [18].
The control placement corresponds to control area 3 of the previous study (x0 =
186, xc = 200). Equation (10) defines the control input for uniform blowing, with
the blowing intensity Vw set to 0.5 % of U∞. The reactive scheme of body force
damping is based on the definition in Eq. (9), with the forcing time constant Φ
fixed to 5/3 in order to yield a drag reduction similar to that of the uniform blowing
case. The body force is applied up to y+ = 40. For both control schemes, the control
amplitude is increased and decreased smoothly over a spatial extent of 10δ0 at the
edges of the control area using a hyperbolic tangent function.
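Assuming that the uniform blowing input takes the form v_wall = V_w f(x) (Eq. (10) itself is not reproduced in this excerpt), the additional mass flux injected by the control can be estimated with the Python sketch below. Lengths are taken in units of δ0 and velocities in units of U∞, and the tanh ramp parametrization is a hypothetical choice. For tanh ramps centred half a ramp width inside the control area, the integral of f(x) equals xc − 10δ0 = 190, so the injected volume flux per unit span is about 0.005 × 190 = 0.95 U∞δ0.

```python
import math


def control_profile(x, x0, xc, ramp):
    """Smoothed on/off profile f(x) with tanh ramps inside the control area
    (ramp centres and steepness are illustrative assumptions)."""
    a, b, s = x0 + 0.5 * ramp, x0 + xc - 0.5 * ramp, ramp / 6.0
    return 0.5 * (math.tanh((x - a) / s) - math.tanh((x - b) / s))


def blowing_velocity(x, v_w=0.005, x0=186.0, xc=200.0, ramp=10.0):
    """Wall-normal wall velocity, assuming v_wall = V_w f(x) (units U_inf, delta_0)."""
    return v_w * control_profile(x, x0, xc, ramp)


# Injected volume flux per unit span: integral of v_wall over x (trapezoidal rule).
dx = 0.01
xs = [i * dx for i in range(60001)]          # 0 <= x <= 600
q = sum(0.5 * (blowing_velocity(xs[i]) + blowing_velocity(xs[i + 1])) * dx
        for i in range(len(xs) - 1))
print(f"injected flux per unit span: {q:.3f} U_inf delta_0")
```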
Figure 6 shows the influence of the applied control on the turbulent structures
of the flow. Due to cancellation of the wall-normal fluctuations in the near-wall
region, a strongly pronounced attenuation of turbulent activity can be observed for
body force damping. The effect is also visible over a certain area downstream of the
control region, where a retransition of the flow occurs. In contrast, the application
of uniform blowing rather leads to visible thickening of the TBL due to additional
[Fig. 6: isosurface views (axes x, y, z) for the uncontrolled case, body force damping and uniform blowing; colour scale of the wall-normal coordinate, 0–15]
Fig. 6 Flow structure in uncontrolled and controlled cases represented by the isosurfaces of λ2-
criterion (λ2 = −0.005) coloured by the wall-normal coordinate. Red shaded area at the wall
marks the location of the applied control (Figure taken from [18]. Reprinted with permission from
Cambridge University Press)
[Fig. 7: integral drag reduction rate R̃ [%] versus streamwise position for body force damping and uniform blowing]
Fig. 7 Streamwise development of integral drag reduction rate. Shaded area marks the location
of control region (Figure taken from [18]. Reprinted with permission from Cambridge University
Press)
The two control types are adjusted to yield very similar R in the control region.
However, as seen in Fig. 7, they show significant differences downstream of the
control section. It can be shown that the resulting R far downstream of the control
section can be predicted once a single quantity characterizing the control is evaluated.
This essential quantity is a virtual shift Δx_v of the streamwise origin introduced by
the control. One can imagine that the controlled flow eventually returns to a canonical
state once the control is no longer present. This state is the same as the one found for
an uncontrolled flow at a different location along the plate. Uniform blowing introduces
a positive shift, while body force damping leads to a negative shift. Due to this
difference, the global performance strongly depends on the length of the uncontrolled
section downstream of the control. Once Δx_v is identified from the simulation results,
it can be used to predict R on any longer plate. For details on the estimation methodology
please refer to the journal publication [18].
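The role of the virtual shift for the global performance can be illustrated with a deliberately simplified Python model, which is not the estimation methodology of [18]. It assumes an uncontrolled skin friction c_f0 ∝ x^(−1/5) with x measured from the virtual leading edge, a constant local drag reduction inside the control area, and an immediate return of the downstream flow to the uncontrolled state found at x + Δx_v. All parameter values are hypothetical.

```python
def cf0(x):
    """Hypothetical uncontrolled skin friction, c_f0 ~ x**-0.2 with x measured
    from the virtual leading edge (the prefactor cancels in the ratio below)."""
    return x ** -0.2


def cf_controlled(x, x0, xc, r_local, dx_v):
    """Simplified model: constant local R inside the control area; downstream,
    the flow is assumed to recover instantly to the canonical state at x + dx_v."""
    if x < x0:
        return cf0(x)
    if x <= x0 + xc:
        return (1.0 - r_local) * cf0(x)
    return cf0(x + dx_v)


def integrate(fun, a, b, n=20000):
    """Composite trapezoidal rule with n intervals."""
    h = (b - a) / n
    return h * (0.5 * fun(a) + sum(fun(a + i * h) for i in range(1, n)) + 0.5 * fun(b))


def global_drag_reduction(x_start, x_end, x0, xc, r_local, dx_v):
    """R~ = 1 - int c_f dx / int c_f0 dx over the plate section [x_start, x_end]."""
    num = integrate(lambda x: cf_controlled(x, x0, xc, r_local, dx_v), x_start, x_end)
    return 1.0 - num / integrate(cf0, x_start, x_end)


for length in (400.0, 1000.0, 3000.0):
    r_pos = global_drag_reduction(100.0, length, 186.0, 200.0, 0.3, +100.0)
    r_neg = global_drag_reduction(100.0, length, 186.0, 200.0, 0.3, -100.0)
    print(f"plate end {length:6.0f}: R~ = {r_pos:.3f} (positive shift), "
          f"{r_neg:.3f} (negative shift)")
```

Because c_f0 decreases with x, a positive shift lowers c_f downstream and keeps adding to the integral benefit, whereas a negative shift raises it, so the benefit gained in the control area is gradually consumed as the plate gets longer.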
1. Choi, H., Moin, P., Kim, J.: Active turbulence control for drag reduction in wall-bounded flows.
J. Fluid Mech. 262, 75–110 (1994)
2. Lee, C., Kim, J., Choi, H.: Suboptimal control of turbulent channel flow for drag reduction. J.
Fluid Mech. 358, 245–258 (1998)
3. Kasagi, N., Suzuki, Y., Fukagata, K.: Microelectromechanical systems-based feedback control
of turbulence for skin friction reduction. Annu. Rev. Fluid Mech. 41, 231–251 (2009)
4. Kametani, Y., Fukagata, K.: Direct numerical simulation of spatially developing turbulent
boundary layers with uniform blowing or suction. J. Fluid Mech. 681, 154–172 (2011)
5. Lundbladh, A., Berlin, S., Skote, M., Hildings, C., Choi, J., Kim, J., Henningson, D.: An
efficient spectral method for simulation of incompressible flow over a flat plate. Technical
report (1999)
6. Skote, M.: Studies of turbulent boundary layer flow through direct numerical simulation. PhD
thesis, Royal Institute of Technology, Stockholm (2001)
7. Nordström, J., Nordin, N., Henningson, D.: The fringe region technique and the Fourier method
used in the direct numerical simulation of spatially evolving viscous flows. SIAM J. Sci.
Comput. 20, 1365–1393 (1999)
8. Chang, Y., Collis, S., Ramakrishnan, S.: Viscous effects in control of near-wall turbulence.
Phys. Fluids 14(11), 4069–4080 (2002)
9. Iwamoto, K., Suzuki, Y., Kasagi, N.: Reynolds number effect on wall turbulence: toward
effective feedback control. Int. J. Heat Fluid Flow 23(5), 678–689 (2002)
10. Pamiès, M., Garnier, E., Merlen, A., Sagaut, P.: Response of a spatially developing turbulent
boundary layer to active control strategies in the framework of opposition control. Phys. Fluids
19(10), 108102 (2007)
11. Satake, S., Kasagi, N.: Turbulence control with wall-adjacent thin layer damping spanwise
velocity fluctuations. Int. J. Heat Fluid Flow 17(3), 343–352 (1996)
12. Lee, C., Kim, J.: Control of the viscous sublayer for drag reduction. Phys. Fluids 14(7), 2523–
2529 (2002)
13. Iwamoto, K., Fukagata, K., Kasagi, N., Suzuki, Y.: Friction drag reduction achievable by near-
wall turbulence manipulation at high Reynolds numbers. Phys. Fluids 17(1), 011702 (2005)
14. Frohnapfel, B., Hasegawa, Y., Kasagi, N.: Friction drag reduction through damping of the near-
wall spanwise velocity fluctuation. Int. J. Heat Fluid Flow 31(3), 434–441 (2010)
15. Park, J., Choi, H.: Effects of uniform blowing or suction from a spanwise slot on a turbulent
boundary layer flow. Phys. Fluids 11(10), 3095–3105 (1999)
16. Kim, K., Sung, H.J., Chung, M.K.: Assessment of local blowing and suction in a turbulent
boundary layer. AIAA J. 40(1), 175–177 (2002)
17. Stroh, A., Frohnapfel, B., Schlatter, P., Hasegawa, Y.: A comparison of opposition control in
turbulent boundary layer and turbulent channel flow. Phys. Fluids 27(7), 075101 (2015)
18. Stroh, A., Hasegawa, Y., Schlatter, P., Frohnapfel, B.: Global effect of local skin friction drag
reduction in spatially developing turbulent boundary layer. J. Fluid Mech. 805, 303–321 (2016)
19. Jiménez, J., Hoyas, S., Simens, M.P., Mizuno, Y.: Turbulent boundary layers and channels at
moderate Reynolds numbers. J. Fluid Mech. 657, 335–360 (2010)
20. Oliver, T.A., Malaya, N., Ulerich, R., Moser, R.D.: Estimating uncertainties in statistics
computed from direct numerical simulation. Phys. Fluids 26(3), 035101 (2014)
21. Fukagata, K., Iwamoto, K., Kasagi, N.: Contribution of Reynolds stress distribution to the skin
friction in wall-bounded flows. Phys. Fluids 14, L73–L76 (2002)
Scalability of OpenFOAM with Large Eddy
Simulations and DNS on High-Performance
1 Introduction
on massively parallel systems. Therefore, several studies have been performed in recent
years. The CSC IT Center for Science in Finland ran a benchmark of the cavity test
case with up to 22 million cells and reached nearly superlinear scalability up to
1024 CPUs [2]. Duran et al. investigated the scalability for bio-medical flows. Using
icoFoam as a laminar, incompressible flow solver (DNS), they even achieved
superlinear behaviour up to 2048 cores [3]. Pringle [5] investigated the scalability
of the cavity benchmark from 4 to 4096 cores, increasing the mesh size from 100³ to
200³ cells. Superlinear behaviour was achieved up to 1024 cores.
In this study, the parallel performance of OpenFOAM has been investigated on
the HPC system Cray XC40 Hazel Hen (Stuttgart). Hazel Hen is a massively parallel
computer with 7712 nodes, each with two 12-core Intel Xeon E5-2680 v3 CPUs
and 128 GB of memory. The interconnect is a Cray Aries network with Dragonfly
topology. Filling all 24 cores of each node is beneficial with respect to both
communication and fragmentation of the job queue. Unless explicitly stated otherwise,
all cases are run with OpenFOAM version 2.3.0 compiled with the GNU or Intel compiler.
The basic equations for LES were first formulated by Smagorinsky [6] in the
early 1960s. Since computational resources were severely limited at that time, an
alternative to resolving all scales of motion had to be conceived. Based on
Kolmogorov's theory [4] that the smallest scales of motion are universal, and on the
assumption that these small scales mainly serve to drain energy from the larger
scales through a cascade process, it was felt that the small scales could be
successfully approximated by a model. The large scales of motion, which contain most
of the energy, do most of the transport and are most strongly affected by the boundary
conditions; they should therefore be calculated directly, while the small scales are
represented by a model. This is the basis of LES.
In order to separate the large scales of motion from the small ones, some kind
of averaging must be done. In LES, this is achieved locally by a weighted average of
the flow properties over a fluid volume. The filtering process is performed with a
filter width Δ, which represents a characteristic length scale: scales larger than Δ
are retained in the filtered flow field, while scales smaller than Δ must be modeled
by a Sub-Grid Scale (SGS) model. Formally, any flow variable f in LES is decomposed
into large and small scales via

$$f = \bar f + f', \qquad (1)$$
Scalability of OpenFOAM with Large Eddy Simulations and DNS on HPC Systems 415
where the prime denotes the small scales and the overbar the large ones. In order to
extract the large-scale components, a filter operation is applied:

$$\bar f(x) = \int G(x, x'; \Delta)\, f(x')\, dx', \qquad (2)$$

where Δ is the filter width, proportional to the wavelength of the smallest scales
retained by the filtering operation G(x, x'; Δ). The most common filters applied in
LES are the Gaussian filter and the top-hat filter. The latter is the common choice
for finite volume methods, because the average is taken over a grid volume in which
the flow variables are piecewise constant functions of x. This implies that the
filter width is equal to the grid spacing.
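A one-dimensional Python illustration of Eqs. (1) and (2) with a discrete top-hat filter on a periodic grid. The filter width of three cells and the test signal are hypothetical. The example shows that a scale spanning exactly the filter width is removed completely, while a large scale is only slightly damped, by the factor (1 + 2 cos kh)/3.

```python
import math


def top_hat(f, width):
    """Discrete top-hat filter on a periodic 1D grid: centred average over an
    odd number `width` of neighbouring samples (filter width Delta = width * h)."""
    n, half = len(f), width // 2
    return [sum(f[(i + k) % n] for k in range(-half, half + 1)) / width
            for i in range(n)]


n = 63
h = 2.0 * math.pi / n
x = [i * h for i in range(n)]
# Large scale (one wave per domain) plus a small scale spanning exactly 3 cells.
f = [math.cos(xi) + 0.5 * math.cos(21 * xi) for xi in x]
f_bar = top_hat(f, 3)                               # resolved part
f_prime = [fi - fb for fi, fb in zip(f, f_bar)]     # f' = f - f_bar
gain = (1.0 + 2.0 * math.cos(h)) / 3.0              # top-hat transfer at k = 1
print(f"large-scale gain: {gain:.4f}")
```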
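Next, this filtering process is applied to the Navier-Stokes equations. For
incompressible flow they are:

$$\nabla\cdot\bar{\mathbf u} = 0, \qquad (3)$$

$$\frac{\partial \bar{\mathbf u}}{\partial t} + \nabla\cdot(\bar{\mathbf u}\bar{\mathbf u}) = -\frac{1}{\rho}\nabla\bar p + \nabla\cdot\left[\nu\left(\nabla\bar{\mathbf u} + \nabla\bar{\mathbf u}^T\right)\right] - \nabla\cdot\tau, \qquad (4)$$

$$\tau = \overline{\mathbf u\mathbf u} - \bar{\mathbf u}\,\bar{\mathbf u}. \qquad (5)$$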
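In LES, τ is known as the sub-grid scale stress. In the limit of small mesh spacing,
where |τ| → 0 as Δ → 0, the DNS solution is recovered. This is similar to Reynolds
stress modelling in RANS; however, the SGS stresses represent a much smaller part of
the turbulent energy spectrum than the modelled stresses in RANS. As a consequence,
more energy builds up in the resolved scales, which can produce instabilities.
Decomposing the SGS stress results in three separate terms:

$$\tau = \underbrace{\overline{\bar{\mathbf u}\bar{\mathbf u}} - \bar{\mathbf u}\bar{\mathbf u}}_{\text{Leonard}} + \underbrace{\overline{\bar{\mathbf u}\mathbf u'} + \overline{\mathbf u'\bar{\mathbf u}}}_{\text{cross}} + \underbrace{\overline{\mathbf u'\mathbf u'}}_{\text{SGS Reynolds}}, \qquad (6)$$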
where the first term represents the interaction of resolved eddies (Leonard term), the
second term the energy transfer between the resolved and unresolved scales (cross
term) and the last term the effect of small eddy interaction (SGS Reynolds stress).
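With the same kind of top-hat filter, the decomposition into Leonard, cross and SGS Reynolds terms can be verified numerically in a short Python sketch. The split is exact by linearity of the filter, while the Leonard term is non-zero because filtering the product ū ū does not reproduce ū ū. The one-dimensional test field is hypothetical.

```python
import math


def top_hat(f, width=3):
    """Periodic discrete top-hat filter (centred average over `width` samples)."""
    n, half = len(f), width // 2
    return [sum(f[(i + k) % n] for k in range(-half, half + 1)) / width
            for i in range(n)]


def mul(a, b):
    return [ai * bi for ai, bi in zip(a, b)]


def add(*fields):
    return [sum(vals) for vals in zip(*fields)]


def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]


n = 64
u = [math.sin(2 * math.pi * i / n) + 0.3 * math.sin(2 * math.pi * 7 * i / n)
     + 0.1 * math.cos(2 * math.pi * 19 * i / n) for i in range(n)]
u_bar = top_hat(u)
u_p = sub(u, u_bar)                                   # small scales u'

tau = sub(top_hat(mul(u, u)), mul(u_bar, u_bar))      # SGS stress
leonard = sub(top_hat(mul(u_bar, u_bar)), mul(u_bar, u_bar))
cross = add(top_hat(mul(u_bar, u_p)), top_hat(mul(u_p, u_bar)))
sgs_reynolds = top_hat(mul(u_p, u_p))

residual = max(abs(t - s) for t, s in zip(tau, add(leonard, cross, sgs_reynolds)))
print(f"max |tau - (L + C + R)| = {residual:.1e}")
```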
The main role of the SGS model is to extract energy from the resolved scales and
model the drain associated with the energy cascade. This can be done with an
eddy-viscosity model (similar to the RANS turbulence modeling approach). The normal
stresses are taken as isotropic and can be expressed in terms of the SGS kinetic energy K:

$$\tau - \frac{1}{3}\mathrm{tr}(\tau)\,\mathbf I = \tau - \frac{2}{3}K\,\mathbf I = -\nu_{SGS}\left(\nabla\bar{\mathbf u} + \nabla\bar{\mathbf u}^T\right) = -2\nu_{SGS}\,\bar{\mathbf S}, \qquad (7)$$
416 G. Axtmann and U. Rist
where C_S is the Smagorinsky constant, typically chosen between 0.1 and 0.2. For
further information we refer to Smagorinsky [8].
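For illustration, the standard Smagorinsky closure ν_SGS = (C_S Δ)²|S̄| with |S̄| = √(2 S̄_ij S̄_ij) is sketched below in Python, using the cube root of the cell volume as Δ as in the benchmark set-up. This form is assumed here, since the defining equations are not reproduced in this excerpt, and the benchmark itself uses the k-equation model. C_S = 0.17 is a hypothetical choice within the quoted range of 0.1 to 0.2.

```python
import math


def strain_rate(grad_u):
    """S_ij = 0.5 (du_i/dx_j + du_j/dx_i); grad_u[i][j] = du_i/dx_j."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)] for i in range(3)]


def smagorinsky_nu_sgs(grad_u, cell_volume, c_s=0.17):
    """nu_SGS = (C_S Delta)^2 |S|, |S| = sqrt(2 S_ij S_ij), Delta = cell_volume**(1/3)."""
    delta = cell_volume ** (1.0 / 3.0)
    s = strain_rate(grad_u)
    s_mag = math.sqrt(2.0 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))
    return (c_s * delta) ** 2 * s_mag


# Simple shear du/dy = 10 1/s in a cell of 1 mm^3 (Delta = 1 mm): |S| = 10 1/s.
grad_u = [[0.0, 10.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(f"nu_SGS = {smagorinsky_nu_sgs(grad_u, 1e-9):.3e} m^2/s")
```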
The backward facing step test case is supplied with OpenFOAM (pitzDaily) and is
an example of an LES. The solver used is pisoFoam, which solves the pressure equation
with the PISO (Pressure-Implicit with Splitting of Operators) algorithm. For turbulence
modeling, the one-equation (k-equation) eddy-viscosity model is used, with the cube
root of the cell volume as LES filter width Δ. A schematic view of the domain is shown
in Fig. 1. Top and bottom walls are set to no-slip conditions, while periodic boundary
conditions are used on the sides. The inlet boundary condition contains artificial
noise of 2 % of the velocity. For the outlet, a pressure-driven condition is used. The
Reynolds number Re_h, based on the step height h, is 13333. The pressure equation is
solved with the preconditioned conjugate gradient (PCG) solver, all other fields with
the preconditioned bi-conjugate gradient (PBiCG) solver. In all cases the CFL
number is less than 1.0. A brief summary of geometrical dimensions and parameters
is given in Table 1.
For benchmarking, five meshes with different resolutions were examined. Starting
from one million hexahedral cells, the size was doubled up to 16 million cells.
Within the meshes, the x-, y- and z-discretization is refined near the step region.
Runs on 1, 2, 4, 9, 18, 36, 72 and 144 nodes were performed, as summarized in
Table 2. Since one node consists of 24 cores, the number of MPI tasks ranges from
24 to 3456. With increasing mesh resolution, the number of cells and the aspect ratios
were adjusted to conform to the smallest mesh. An example of the discretization of
the 16 million cell mesh is given in Fig. 2. Strong and weak scaling studies were
performed with these meshes.
In addition, MPI routines were traced using Cray's performance measurement
and analysis tool CrayPat, a suite of optional utilities for tracing and analyzing
performance data [1]. To enable it, the code of interest has to be compiled with
additional flags; on Hazel Hen, versions of OpenFOAM v2.3.0 and v2.4.0 precompiled
for profiling are already available via the CAE modules. CrayPat identifies
bottlenecks, collects statistics and helps to optimize parallel efficiency. Since the
focus of this report is on the scalability of OpenFOAM, only a brief visualization of
the flow results is given in Fig. 3: (a) the SGS viscosity ν_SGS and (b) a Line Integral
Convolution (LIC) visualization of the velocity field U. It shows that OpenFOAM is
capable of resolving such complex flows and flow features.
Fig. 3 (a) Turbulent kinematic viscosity ν_SGS and (b) LIC visualization of velocity U at the center plane
at t = 0.11 s
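First, strong scaling is investigated: the total problem size is fixed and the solution
time is measured as a function of the number of processors. For comparison of the data,
the speedup and the parallel efficiency are defined as

$$S_p = \frac{1}{f + \frac{1-f}{p}}, \qquad (11)$$

$$E = \frac{S_p}{p}, \qquad (12)$$

where p is the number of processes and f the serial fraction of the code.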
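Amdahl's law, S_p = 1/(f + (1 − f)/p) with serial fraction f and E = S_p/p, is a common reference model for strong scaling. A short Python sketch with a hypothetical serial fraction: under this model E ≤ 1 for every p, so measured efficiencies above 1, such as those attributed to cache effects below, lie outside its scope.

```python
def amdahl_speedup(p, serial_fraction):
    """Amdahl's law: S_p = 1 / (f + (1 - f) / p) for serial fraction f."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def parallel_efficiency(speedup, p):
    """Parallel efficiency E = S_p / p."""
    return speedup / p


# Hypothetical serial fraction of 0.1 %: S_p can never exceed 1/f = 1000.
for p in (24, 432, 3456):
    s = amdahl_speedup(p, 1e-3)
    print(f"{p:5d} MPI tasks: S_p = {s:7.1f}, E = {parallel_efficiency(s, p):.2f}")
```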
[Fig. 4: (a) speedup S_p and (b) parallel efficiency E versus number of MPI processes (10^1–10^4) for the different mesh sizes]
Fig. 4 (a) Strong scaling speedup S_p and (b) parallel efficiency E for the backward facing step
benchmark: solid lines GNU compiler, dashed lines Intel compiler
[Fig. 5: weak scaling speedup versus mesh size [M] for different numbers of MPI processes]
Fig. 5 Weak scaling speedup S_p for the backward facing step benchmark: solid lines GNU compiler,
dashed lines Intel compiler
performance is achieved at 432 MPI tasks. Only between 24 and 432 MPI tasks is a
small performance gain over the GNU-compiled code observed. Cache effects are
present for both the GNU and the Intel compiler, as shown in Fig. 4b: between 24 and
432 MPI tasks, the parallel efficiency rises to 1.5 for GNU and even 1.6 for Intel.
With increasing numbers of MPI tasks, these effects become insignificant. Next, we
present the results of the weak scaling study. The advantage of weak scaling is that
it reveals problems that are not related to load imbalance caused by small subdomains.
It shows how the solution time varies with the number of MPI tasks for a fixed problem
size per processor. The comparison between the GNU and Intel compilers is shown in
Fig. 5. From 24 to 216 MPI tasks, scalability decreases with increasing mesh size,
while from 432 to 3456 MPI tasks both the GNU and the Intel builds scale reasonably
well. Comparing GNU and Intel, a small benefit of the Intel compiler is observed below
432 MPI tasks. For larger numbers of MPI tasks, the GNU build scales better.
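Weak-scaling results are commonly summarized by the efficiency E_w = T(p_ref)/T(p), with the work per process kept constant. The timings in this Python sketch are hypothetical.

```python
def weak_scaling_efficiency(timings):
    """E_w(p) = T(p_ref) / T(p) for a fixed amount of work per process.

    timings: {number_of_mpi_tasks: wall-clock time}; the smallest run is the reference.
    """
    p_ref = min(timings)
    return {p: timings[p_ref] / timings[p] for p in sorted(timings)}


# Hypothetical times per step, mesh size grown in proportion to the number of tasks.
runs = {24: 2.0, 48: 2.1, 96: 2.3, 192: 2.6, 384: 2.8}
for p, e in weak_scaling_efficiency(runs).items():
    print(f"{p:4d} MPI tasks: E_w = {e:.2f}")
```

Ideal weak scaling keeps the time per step constant, so E_w stays at 1; any growth of the time with p, for instance from communication, lowers E_w.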
Table 3 Time spent in MPI ALL, ETC, USER and IO routines relative to the total time for the
backward facing step benchmark, GNU compiler. M = mesh size in million cells, N = number of
nodes (2 × 12 = 24 CPU cores per node)

Case (mesh/nodes)   MPI ALL [%]   ETC [%]   USER [%]   IO [%]
1M 2N               16.3          2.5       80.4       0.8
1M 36N              60.7          28.5      7.6        3.2
1M 72N              91.8          1.3       4.0        2.9
1M 144N             94.5          1.2       1.9        2.4
8M 2N               6.9           0.0       92.5       0.6
8M 36N              47.8          1.8       48.8       1.6
8M 72N              77.9          1.0       19.6       1.5
8M 144N             90.7          0.0       7.1        2.2
16M 2N              6.3           93.2      0.0        0.5
16M 36N             19.0          19.6      61.2       0.2
16M 72N             61.4          0.0       37.4       1.2
16M 144N            70.7          0.0       27.3       2
Table 3 gives the share of time spent in MPI ALL, ETC, USER and IO (in %) for the GNU compiler. The MPI share (MPI ALL) is largest for the 144-node runs, where it lies between 70.7 % and 94.5 %. Averaged over all test cases, IO accounts for only 1.6 % of the total time. Additionally, imbalance sampling rates of several MPI routines were measured. Figure 6 shows the imbalance sampling rates for three mesh sizes on 2, 36, 72 and 144 nodes for the Intel and GNU compilers. For the 1M mesh, most overhead is produced by MPI_Isend and MPI_Recv, with 53 % and 49 % for both compilers. Furthermore, with increasing node count, the share of MPI_Waitall rises rapidly and produces an overhead of up to 45 % for the Intel and 48 % for the GNU compiler.
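The summary figures quoted above can be checked directly against Table 3. The sketch below simply copies the table values (no other data) and recomputes the mean IO share and the MPI ALL range for the 144-node runs.

```python
# Table 3 (GNU compiler): case -> (MPI ALL, ETC, USER, IO) in % of total time
TABLE3 = {
    "1M 2N": (16.3, 2.5, 80.4, 0.8),
    "1M 36N": (60.7, 28.5, 7.6, 3.2),
    "1M 72N": (91.8, 1.3, 4.0, 2.9),
    "1M 144N": (94.5, 1.2, 1.9, 2.4),
    "8M 2N": (6.9, 0.0, 92.5, 0.6),
    "8M 36N": (47.8, 1.8, 48.8, 1.6),
    "8M 72N": (77.9, 1.0, 19.6, 1.5),
    "8M 144N": (90.7, 0.0, 7.1, 2.2),
    "16M 2N": (6.3, 93.2, 0.0, 0.5),
    "16M 36N": (19.0, 19.6, 61.2, 0.2),
    "16M 72N": (61.4, 0.0, 37.4, 1.2),
    "16M 144N": (70.7, 0.0, 27.3, 2.0),
}

# Each row should account for (almost exactly) 100 % of the run time.
row_sums = {case: sum(shares) for case, shares in TABLE3.items()}

# Mean IO share over all twelve test cases.
mean_io = sum(shares[3] for shares in TABLE3.values()) / len(TABLE3)

# MPI ALL share of the 144-node runs.
mpi_144 = [shares[0] for case, shares in TABLE3.items() if case.endswith(" 144N")]

print(f"mean IO share: {mean_io:.1f} %")
print(f"MPI ALL on 144 nodes: {min(mpi_144)} % to {max(mpi_144)} %")
```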
When the mesh size is increased to 16 million cells, the imbalance of all MPI routines decreases to at most 42 %. This is caused by better intercommunication between the different subdomains. The most significant overheads are again observed for MPI_Isend, MPI_Recv and MPI_Waitall. Comparing the two compilers for the 16-million-cell mesh, higher imbalance rates are observed for the Intel compiler; its higher performance does not carry over to larger node counts. For example, for the 16M mesh on 144 nodes, the imbalance of the MPI_Waitall routine is 30 % with the GNU compiler but 42 % with the Intel compiler. This implies a considerable improvement in the scalability of the open-source GNU compiler in recent years.
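To make the imbalance percentages more concrete: as we understand the CrayPat documentation, the imbalance of a routine compares the slowest rank with the average rank, scaled so that one busy rank and all others idle gives 100 %. The sketch below assumes that definition, Imb% = 100 · (max − avg)/max · N/(N − 1); the per-rank times are hypothetical and not taken from the measurements.

```python
def imbalance_percent(times):
    """Load imbalance of per-rank times in percent (assumed CrayPat-style definition).

    0 % means perfectly balanced; 100 % means a single rank does all the work.
    """
    n = len(times)
    if n < 2:
        return 0.0
    t_max = max(times)
    if t_max == 0:
        return 0.0
    t_avg = sum(times) / n
    return 100.0 * (t_max - t_avg) / t_max * n / (n - 1)


# Hypothetical MPI_Waitall times (s) on four ranks: one rank waits twice as long.
print(imbalance_percent([1.0, 1.0, 1.0, 2.0]))
```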
Scalability of OpenFOAM with Large Eddy Simulations and DNS on HPC Systems 421
Fig. 6 Imbalance of MPI routines measured by CrayPat for the Intel and GNU compilers for the 1M, 8M and 16M meshes on 2, 36, 72 and 144 nodes
In addition to the study above, a second benchmark is investigated using a solver that solves the Navier-Stokes equations directly. For this purpose, the classical cavity tutorial supplied with OpenFOAM is extended from two to three dimensions and used as a benchmark. The front and back patches are converted to walls, so that the domain is a cube with five stationary walls and one moving wall. For further information the reader is referred to the OpenFOAM documentation [8]. The Reynolds number has been increased from 10 to 1000. Since the flow is laminar and incompressible, icoFoam is used as the solver. Here, the performance study was done only with the GNU compiler. Table 4 lists some important parameters of the simulation together with the investigated cases (the suffix M denotes the mesh size in million cells and the suffix N the number of nodes; the number of MPI processes is 24 times N).
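For orientation, the cavity Reynolds number is Re = U d/ν. In the standard OpenFOAM cavity tutorial the lid speed is U = 1 m/s, the cavity edge length is d = 0.1 m and ν = 0.01 m²/s, which gives Re = 10. How Re was raised to 1000 is not stated here; the sketch below assumes, hypothetically, that U and d were kept and only ν was lowered.

```python
def reynolds(u: float, d: float, nu: float) -> float:
    """Reynolds number Re = U d / nu."""
    return u * d / nu


def viscosity_for(re: float, u: float, d: float) -> float:
    """Kinematic viscosity that yields the target Reynolds number."""
    return u * d / re


U_LID = 1.0   # m/s, lid velocity of the OpenFOAM cavity tutorial
D_CAV = 0.1   # m, cavity edge length of the tutorial

print(reynolds(U_LID, D_CAV, 0.01))          # tutorial setting
print(viscosity_for(1000.0, U_LID, D_CAV))   # assumed nu for Re = 1000
```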
From Fig. 7a we can see that the icoFoam solver compiled with GNU scales well up to 3456 MPI tasks. Superlinear scaling is observed up to 1728 tasks. Since the solver does not include any turbulence modelling and solves the Navier-Stokes equations directly, it produces less overhead than the LES benchmark.
Fig. 7 (a) Strong scaling speedup Sp, (b) parallel efficiency E and (c) weak scaling speedup for the lid-driven cavity benchmark: GNU compiler
At the performance peak for the 27-million-cell mesh with 1728 MPI tasks, each core holds around 15,000 cells. With more than 1728 tasks, the performance decreases. Again, this is because the test case becomes too small: the number of cells per core drops below 10,000 and the communication overhead increases. Cache effects are present, as shown in Fig. 7b; between 24 and 100 MPI tasks they raise the efficiency to as much as 1.5. With increasing numbers of MPI tasks, these effects decrease. The weak scaling is shown in Fig. 7c. Almost linear scaling is observed for the one-million-cell mesh at 96 MPI tasks. With increasing mesh size, the scaling depends strongly on the number of cells per core and the number of MPI tasks. For the larger meshes, the best results are obtained with 864, 1728 and even 3456 MPI tasks.
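The cells-per-core figures quoted above follow from simple arithmetic with 24 cores per node (2 × 12, as stated for Table 3). The helper below is an illustrative sketch, not part of the benchmark setup.

```python
CORES_PER_NODE = 24  # 2 x 12 cores per node


def cells_per_core(mesh_cells: float, nodes: int) -> float:
    """Average number of cells per MPI task for one task per core."""
    return mesh_cells / (nodes * CORES_PER_NODE)


for nodes in (36, 72, 144):
    tasks = nodes * CORES_PER_NODE
    print(f"{tasks:5d} tasks: {cells_per_core(27e6, nodes):9.1f} cells per core")
```

For the 27M mesh this gives 15,625 cells per core at 1728 tasks and about 7,800 at 3456 tasks, i.e. below the 10,000-cell threshold mentioned above.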
4 Conclusion
1. Cray Research Inc.: Optimizing applications on the Cray X1 system. URL http://docs.cray.com/books/S-2315-52/html-S-2315-52/index.html (2002)
2. CSC IT Center: OpenFOAM -CSC (2010). https://research.csc.fi/-/openfoam
3. Duran, A., Celebi, M.S., Piskin, S., Tuncel, M.: Scalability of OpenFOAM for bio-medical flow
simulations. J. Supercomput. 71(3), 938–951 (2015). doi:10.1007/s11227-014-1344-1, http://
Abstract This report covers two aspects of impinging jets: heat transfer enhancement and sound source mechanisms. Recent experimental investigations indicate a possible increase of up to 40 % in heat transfer efficiency due to pulsation of the inlet. However, the underlying physical effects are still unclear. Performing direct numerical simulations, we were able to compute the eigenfrequencies of the impinging jet. Our hypothesis is that pulsating at that frequency leads to a maximal increase of the ring vortices and consequently of the heat transfer at the impinging plate. First results of a pulsed impinging jet are shown. In addition, impinging compressible jets may cause deafness and material fatigue due to extremely loud tonal noise. It is generally accepted that a feedback mechanism is responsible for impinging tones. However, it is still debated which mechanism creates these strong pressure waves. Using direct numerical simulations, we were able to identify the source mechanism for under-expanded impinging jets with a nozzle pressure ratio of 2.15 and a plate distance of 5 diameters. We found two different types of interactions between vortices and shocks to be responsible for the generation of the impinging tones.
1 Introduction
Within this report, our in-house DNS code is used to investigate two topics concerning impinging jets. The code is therefore described first (Sect. 2). Afterwards, one section examines a possible increase of the heat transfer efficiency due to pulsation (Sect. 3), and another addresses the sound source mechanism of impinging tones (Sect. 4). Each section has its own introduction and
Fig. 2 Strong and weak scaling of the code on CRAY XC40 (Hazelhen). (a) Strong scaling; simulations run with 1024³ grid points. (b) Weak scaling; simulations run with 64³ grid points per core
Table 1 Scaling of the code on CRAY XC40 (Hazelhen). Upper part: strong scaling, simulations run with n = 1024³. Lower part: weak scaling, simulations run with n/i = 64³. n and i denote the total number of grid points and the number of cores used, respectively
i        (n/i)^(1/3)   Wall time per time step [s]   Speedup   Ideal speedup   Efficiency
512      128           166                           1.00      1               1.00
1024     102           85.6                          1.94      2               0.97
2048     81            39.2                          4.23      4               1.06
4096     64            17.9                          9.28      8               1.16
8192     51            9.1                           18.3      16              1.14
16,384   40            5.1                           32.6      32              1.02
32,768   32            3.7                           44.9      64              0.70
i        n             Wall time per time step [s]   Speedup   Ideal speedup   Efficiency
32       8.4×10⁶       16.7                          1.00      1               1.00
64       1.7×10⁷       16.6                          2.02      2               1.01
128      3.4×10⁷       17.0                          3.93      4               0.98
256      6.7×10⁷       17.1                          7.84      8               0.98
512      1.3×10⁸       17.1                          15.7      16              0.98
1024     2.7×10⁸       17.0                          31.4      32              0.98
2048     5.4×10⁸       17.2                          62.1      64              0.97
4096     1.1×10⁹       17.9                          120       128             0.94
8192     2.1×10⁹       18.5                          231       256             0.90
16,384   4.3×10⁹       20.1                          425       512             0.83
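The derived columns of Table 1 can be recomputed from the wall times alone. The sketch below takes the smallest core count as the reference; for weak scaling the work grows with i, so the efficiency reduces to t_ref/t. Because the tabulated wall times are rounded, the last digit can differ slightly (e.g. 2.01 instead of 2.02 for 64 cores).

```python
# Strong scaling (n = 1024^3): cores -> wall time per time step [s]
STRONG = {512: 166, 1024: 85.6, 2048: 39.2, 4096: 17.9,
          8192: 9.1, 16384: 5.1, 32768: 3.7}
# Weak scaling (n/i = 64^3): cores -> wall time per time step [s]
WEAK = {32: 16.7, 64: 16.6, 128: 17.0, 256: 17.1, 512: 17.1,
        1024: 17.0, 2048: 17.2, 4096: 17.9, 8192: 18.5, 16384: 20.1}


def strong_metrics(times):
    """cores -> (speedup, ideal speedup, efficiency) for a fixed problem size."""
    base = min(times)
    t0 = times[base]
    out = {}
    for i, t in sorted(times.items()):
        speedup = t0 / t
        ideal = i / base
        out[i] = (speedup, ideal, speedup / ideal)
    return out


def weak_metrics(times):
    """cores -> (speedup, ideal speedup, efficiency) for a fixed size per core."""
    base = min(times)
    t0 = times[base]
    out = {}
    for i, t in sorted(times.items()):
        ideal = i / base
        speedup = ideal * t0 / t   # i/base times more work done in time t
        out[i] = (speedup, ideal, t0 / t)
    return out


for label, metrics in (("strong", strong_metrics(STRONG)), ("weak", weak_metrics(WEAK))):
    for i, (sp, ideal, eff) in metrics.items():
        print(f"{label:6s} {i:6d}  Sp={sp:7.2f}  ideal={ideal:5.0f}  E={eff:.2f}")
```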
from the original decomposition to the decomposition used for the calculation of derivatives in the x-direction. The required inter-process communication is handled via MPI libraries.
428 R. Wilke and J. Sesterhenn
The code runs successfully on the CRAY XC40 (Hazelhen). Figure 2 shows nearly perfect linear scaling up to 16,384 cores on that machine. Detailed run times can be found in Table 1. The scaling tests were performed for an impinging-jet case. With auto-vectorisation, the efficiency on 16,384 cores is 102 % for strong and 83 % for weak scaling. Grids with 512³ (1024³) points are typically distributed over 16³ = 4096 (32 × 16 × 16 = 8192 or 32 × 32 × 16 = 16,384) cores. The preferred wall time interval is 24 h.
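The processor grids quoted above translate into local block sizes per core. The sketch below assumes a uniform block decomposition and assigns the processor counts to the axes in the order written; it does not reproduce the code's actual transposition strategy for the x-derivatives, and `local_block` is a hypothetical helper.

```python
def local_block(global_shape, proc_grid):
    """Grid points per core for a uniform block decomposition (illustrative)."""
    if len(global_shape) != len(proc_grid):
        raise ValueError("dimension mismatch")
    block = []
    for n, p in zip(global_shape, proc_grid):
        if n % p:
            raise ValueError(f"{n} points cannot be split evenly over {p} processes")
        block.append(n // p)
    return tuple(block)


CASES = [((512, 512, 512), (16, 16, 16)),
         ((1024, 1024, 1024), (32, 16, 16)),
         ((1024, 1024, 1024), (32, 32, 16))]

for grid, procs in CASES:
    cores = procs[0] * procs[1] * procs[2]
    print(f"{grid} on {procs} = {cores} cores -> {local_block(grid, procs)} points per core")
```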
The computational domain has a size of 12 × 5 × 12 diameters. The cuboid is bounded by four non-reflecting boundaries, one isothermal wall, which is the impinging plate, and one boundary consisting of an isothermal wall and the inlet. The walls are fully acoustically reflective. The nozzle inflow is defined using a hyperbolic tangent profile with a disturbed, thin, laminar annular shear layer, as described in [16].
A sponge region is applied in the outlet area r/D > 5 that smoothly forces the pressure, velocity and entropy towards reference values. This damps vortices before they leave the computational domain. The reference values at the outlet were obtained from a preliminary large eddy simulation of a larger domain.
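One common way to realise such a sponge is a relaxation term −σ(r)(q − q_ref) on the right-hand side, with σ rising smoothly from zero at r/D = 5. The sketch below assumes a cubic smoothstep ramp, a sponge width of one diameter and σ_max = 1; these choices are illustrative assumptions, since the actual profile used in the code is not given here.

```python
import numpy as np


def sponge_strength(r_over_d, r_start=5.0, width=1.0, sigma_max=1.0):
    """Smooth ramp: 0 for r/D <= r_start, sigma_max for r/D >= r_start + width."""
    x = np.clip((np.asarray(r_over_d, dtype=float) - r_start) / width, 0.0, 1.0)
    return sigma_max * x**2 * (3.0 - 2.0 * x)   # cubic smoothstep, C1-continuous


def apply_sponge(q, q_ref, r_over_d, dt, **kwargs):
    """One explicit Euler step of dq/dt = -sigma(r) (q - q_ref)."""
    sigma = sponge_strength(r_over_d, **kwargs)
    return q - dt * sigma * (q - q_ref)


# A quantity q = 2 relaxed towards q_ref = 1 inside (r/D = 6) and outside (r/D = 4) the sponge.
print(apply_sponge(np.array([2.0, 2.0]), 1.0, np.array([4.0, 6.0]), dt=0.1))
```

The smooth ramp avoids an abrupt change in σ, which would itself reflect waves back into the domain.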
The grid is refined in the wall-adjacent regions to ensure that the dimensionless wall distance y⁺ of the grid point closest to the wall does not exceed one at both plates. In the wall-parallel directions, a slight symmetric grid stretching is applied, which refines the shear layer of the jet. The refinements use hyperbolic tangent and hyperbolic sine functions, respectively, so that the mesh spacing changes by less than 1 % in all directions and cases. The physical and geometrical parameters of the simulations are given in Tables 2 and 3.
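As an illustration of the wall-normal refinement, the sketch below clusters points towards a wall with a one-sided tanh mapping and checks the relative change of spacing between neighbouring cells. The mapping form, the 512 cells, the stretching parameter β = 2 and the length of 5 D are assumptions for illustration, not the parameters of the actual grids.

```python
import numpy as np


def tanh_wall_grid(n_cells, length, beta):
    """Points on [0, length] clustered towards y = 0 by one-sided tanh stretching."""
    eta = np.linspace(0.0, 1.0, n_cells + 1)
    return length * (1.0 + np.tanh(beta * (eta - 1.0)) / np.tanh(beta))


def max_spacing_change(y):
    """Largest relative change of spacing between neighbouring cells."""
    dy = np.diff(y)
    return np.max(np.abs(dy[1:] / dy[:-1] - 1.0))


y = tanh_wall_grid(n_cells=512, length=5.0, beta=2.0)   # hypothetical parameters
print(f"first spacing: {y[1] - y[0]:.3e}, max spacing change: {max_spacing_change(y):.4f}")
```

For these parameters the largest spacing change occurs next to the wall and stays below 1 %; a stronger stretching or fewer cells would increase it.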
3 Heat Transfer
3.1 Introduction
Fig. 3 Increase of heat transfer effectivity Φ_Re of a pulsating impinging jet relative to a stationary one, depending on the Strouhal number Sr and amplitude AMP (modified from [11])
3.2 Results
Fig. 4 Life cycle of the secondary vortex ring and connection to the local heat transfer.
Simulation #2
The phenomenon leads to high fluctuations of the temperature and the axial velocity and consequently to high values of the turbulent heat flux at radii of r/D = 1…1.6, as shown in Fig. 5a. As a consequence, the decrease of the Nusselt number with increasing radius is strongly weakened in this area (Fig. 5b). This means that the ring vortices contribute positively to the heat transfer. We therefore aim to amplify the ring vortices as much as possible by pulsating the inlet, and we expect a strong positive influence on the heat transfer at the impinging plate.
Only DNS is able to correctly predict the effect of the vortex pairs on the Nusselt number. Dairay et al. [5] compared large eddy simulations using different subgrid-scale models with DNS. They observed that none of the tested models was able to clearly predict the secondary peak in the Nusselt number distribution, as measured and computed with DNS. This investigation is the only one in the literature in which large eddy and direct numerical simulations are directly compared for an impinging jet. Since even well-resolved LES fail to predict the heat transfer, RANS cannot be recommended for the given case as long as the common models are not adapted. Our DNS provide a database for the improvement of such models. More quantities for validation are given in [20].
Fig. 5 Time-averaged local heat transfer of DNS #3, Re = 3300, Ma = 0.408. (a) Turbulent heat flux at y/D = 0.05. (b) Nusselt number
Using direct numerical simulations, we were able to identify the frequency of the vortical system. To this end, we performed an FFT of the Nusselt number at the impinging plate and a dynamic mode decomposition (DMD) of the entire flow field. Both methods revealed a Strouhal number of 0.46 and its first harmonic (0.92) as the important frequencies. These numbers are based on simulations #1 and #2. The Reynolds number has no significant influence on the phenomena or on the eigenfrequency of the subsonic impinging jet in the range 3300 ≤ Re ≤ 8000. This allows us to proceed with the lower Reynolds number for further investigations.
Comparing simulations #2 and #3, we see an influence of the Mach number. For Ma = 0.78, the DMD clearly reveals only one dominant frequency, Sr = 0.46, and its first harmonic. At lower speed (Ma = 0.41), this frequency remains, but an additional dominant frequency appears: Sr = 0.59. According to the DMD coefficients, the mode with Sr = 0.59 is even more relevant and was therefore used for the pulsating impinging jet described in Sect. 3.2.2.
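The FFT step can be sketched generically: remove the mean of the Nusselt number time series, take a one-sided spectrum and pick the strongest non-zero bin. If time is scaled with D/U, the frequency axis is directly the Strouhal number. The synthetic signal below, with components at Sr = 0.46 and 0.92, is illustrative only and is not simulation data.

```python
import numpy as np


def dominant_strouhal(signal, dt):
    """Strouhal number of the strongest non-zero spectral peak.

    dt is the sampling interval in convective units D/U, so the
    frequency axis returned by rfftfreq is Sr = f D / U.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                     # remove the mean Nusselt number
    amp = np.abs(np.fft.rfft(x))
    sr = np.fft.rfftfreq(x.size, d=dt)
    k = 1 + np.argmax(amp[1:])           # skip the zero-frequency bin
    return sr[k]


t = np.arange(2000) * 0.05               # 100 convective time units
nusselt = 20.0 + np.sin(2 * np.pi * 0.46 * t) + 0.5 * np.sin(2 * np.pi * 0.92 * t)
print(dominant_strouhal(nusselt, dt=0.05))
```

The record length sets the frequency resolution (here 1/100 = 0.01 in Sr), so distinguishing modes such as 0.46 and 0.59 requires sufficiently long time series.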
Fig. 6 Vortical structure represented by Q [s⁻²] ranging from −10⁶ (blue) to 10⁶ (red) on a cut through the jet axis. Both simulations have the same mass flow. (a) #3 stationary inlet. (b) #4 pulsed inlet
to the non-pulsating case, twice the resolution in each spatial direction is needed to resolve the Kolmogorov length scale. The simulations to be compared are #3 (non-pulsed) and #4 (pulsed). Both have the same mass flow and dynamic viscosity. To avoid supersonic flow, the maximum nozzle pressure ratio (NPR) of the pulsed jet was chosen equal to that of simulations #1 and #2: NPR = p_0/p_∞ = 1.5. As a result of these conditions, the NPR of the low-Mach-number case is 1.1217. This avoids the need for a separate reference simulation for the pulsed case. The values given in Table 2 refer to the time span in which the valve is open. For the non-pulsed jets, these values are constant apart from fluctuations due to acoustic waves reaching the nozzle, and therefore also equal the time averages. In contrast, the mean mass flow of the pulsed case is half of the value given in the table, since the valve is open and closed for equal time spans.
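The quoted numbers are consistent with simple isentropic relations. Assuming isentropic, subsonic nozzle flow of a perfect gas with γ = 1.4, NPR = 1.5 gives Ma ≈ 0.78 and NPR = 1.1217 gives Ma ≈ 0.41. If, in addition, the ambient pressure and the total temperature are held fixed (our assumption), the mass flux at NPR = 1.1217 is half of that at NPR = 1.5, matching a valve that is open half of the time.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats (air)


def mach_from_npr(npr, gamma=GAMMA):
    """Isentropic (subsonic) jet Mach number for NPR = p0 / p_inf."""
    return math.sqrt(2.0 / (gamma - 1.0) * (npr ** ((gamma - 1.0) / gamma) - 1.0))


def mass_flux(npr, gamma=GAMMA):
    """Mass flux per unit area up to a constant, for fixed p_inf and T0.

    rho*u ~ p0 * M * (1 + (gamma-1)/2 * M^2)^(-(gamma+1)/(2(gamma-1))) / sqrt(T0),
    with p0 = NPR * p_inf.
    """
    m = mach_from_npr(npr, gamma)
    exponent = -(gamma + 1.0) / (2.0 * (gamma - 1.0))
    return npr * m * (1.0 + 0.5 * (gamma - 1.0) * m * m) ** exponent


ratio = mass_flux(1.1217) / mass_flux(1.5)
print(f"Ma(1.5) = {mach_from_npr(1.5):.3f}, Ma(1.1217) = {mach_from_npr(1.1217):.3f}, "
      f"mass flux ratio = {ratio:.4f}")
```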
Figure 6 compares the vortical structures of the non-pulsed (a) and the pulsed (b) case. A strong increase of Q in the pulsed case indicates that the eigenfrequency is a reasonable choice for the pulsation frequency. Statistical values are not yet available at this stage of the work; however, we expect an increase of the integral Nusselt number for the pulsed case.
3.3 Conclusion
Vortex rings are responsible for additional heat transfer at the wall owing to a positive contribution of the turbulent heat flux. These vortex rings occur periodically. Their frequency does not depend on the Reynolds number in the range 3300 ≤ Re ≤ 8000. The Mach number, on the other hand, does play a role. In the high subsonic regime one mode is dominant (Sr = 0.46), whereas at lower Mach number a second one additionally occurs (Sr = 0.59) and becomes more important for heat transfer than the lower-frequency mode. The frequency of this mode (Sr = 0.59) was applied to a pulsed inlet. As a result of the pulsation, the ring vortices were strongly amplified. Quantitative results (e.g. the Nusselt number profile) are still pending, since the simulation is currently running.
4 Impinging Tone
4.1 Introduction
4.2 Results
4.2.1 Shock-Vortex-Interaction
This kind of sound-emitting interaction requires two components: one shock and one vortex or an aggregation of vortices. The computational results show that multiple shocks can occur near the stagnation point. Usually two or three shocks are present simultaneously. The shock system is highly unsteady within a periodic cycle.
Numerical Simulation of Subsonic and Supersonic Impinging Jets II 435
4.2.2 Shock-Vortex-Shock-Interaction
The second kind of interaction that produces strong acoustic waves involves two shocks, a vortex ring and a sonic line. Figure 8 shows snapshots of the simulation with Re = 8000. All snapshots show a section of a slice through the jet axis. The first column shows normalised values of Q and of the divergence of the velocity field, div(u). This mechanism requires the supersonic zone close to the stagnation point to appear and disappear periodically. We start at a point in time at which the supersonic zone close to the stagnation point has been destroyed and a new one is being transported downstream by the jet. This zone is bounded by the sonic line (M = 1). As long as there are no obstacles in the way, the sonic line travels together with the vortex rings, slightly ahead of them. Travelling further downstream, the supersonic zone encounters zones of high pressure, which are fragments of the high pressure at the stagnation point. As mentioned, there are typically several such zones; in our example, there are three. Each time the sonic line meets a zone of high pressure, it stops its downstream movement for a while, until the jet, by continuously delivering new fluid, pushes the sonic line over the shock. The vortex rings travel in the shear layer, outside the high-pressure zones, which form only in the core of the jet. They are therefore not affected by these high-pressure zones. As a consequence, the vortex rings approach the sonic line and interact with it: through their rotational velocity components, they influence the shape of the sonic line.
In the first row of Fig. 8, the sonic line is confined in the radial direction by the shear layer of the jet. In the streamwise direction it consists of three parts: on the left side, the sonic line coincides with the upper shock, whereas on the right side, it coincides with the lower shock. The crossover coincides with the inner border of the left side of the vortex ring. The sound wave is produced when this arrangement collapses: the vortex is no longer able to separate the subsonic and supersonic areas. This can be seen in the following two time steps (second and third row of Fig. 8). The sonic line loses its connection to the vortex ring and the upper shock and jumps to the lower shock, so that the upper shock becomes embedded in the supersonic zone. Thereby a subsonic area is initially embedded and then collapses. A strong spherical pressure wave expands from that point. It travels through the whole jet and reaches the nozzle. The phenomenon therefore triggers new instabilities of the shear layer and is part of a feedback mechanism.
To obtain the sound spectra, the pressure was recorded in the near field on three cylinders around the jet axis at distances of two, three and four diameters. For the presented results, the position r/D = 4, y/D = 5 was chosen. This position on the upper wall has the advantage that the velocity there is zero, so no flow disturbs the acoustic measurements. The choice of the radius does not influence the investigated tones (frequencies), since the different distances only shift the sound pressure level up or down. For each of the 256 circumferential positions, the spectrum was computed using a fast Fourier transform (FFT), and the spectra were then averaged. Figure 9 shows the sound pressure level as a function of the Strouhal number. The impinging tone can clearly be observed at Sr = 0.32. Proof that the two sound source mechanisms found correspond to this frequency is given in [20].
Fig. 9 Sound pressure level (SPL) of the supersonic impinging jet with Re = 8000. Reference pressure: p_ref = 2 × 10⁻⁵ Pa
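The averaging procedure can be sketched as follows: compute a one-sided mean-square spectrum for each circumferential probe, average over the probes, and convert to dB relative to p_ref = 2 × 10⁻⁵ Pa. The normalisation (mean square per frequency bin, no windowing) is our assumption; the synthetic signal, a 1 Pa rms tone at Sr = 0.32 with random phase at each of 256 probes, is illustrative only.

```python
import numpy as np

P_REF = 2e-5  # Pa, reference pressure from the Fig. 9 caption


def averaged_spl(signals, dt):
    """Circumferentially averaged SPL spectrum.

    signals: array (n_probes, n_samples) of pressure in Pa, one row per probe.
    dt:      sampling interval; if time is scaled with D/U, the returned
             frequency axis is the Strouhal number.
    """
    p = np.asarray(signals, dtype=float)
    n = p.shape[-1]
    p = p - p.mean(axis=-1, keepdims=True)        # pressure fluctuations
    coeff = np.fft.rfft(p, axis=-1) / n
    mean_square = np.abs(coeff) ** 2
    mean_square[..., 1:] *= 2.0                   # fold in negative frequencies
    if n % 2 == 0:
        mean_square[..., -1] /= 2.0               # Nyquist bin has no mirror
    averaged = mean_square.mean(axis=0)           # average over probe positions
    freq = np.fft.rfftfreq(n, d=dt)
    with np.errstate(divide="ignore"):
        spl = 10.0 * np.log10(averaged / P_REF**2)
    return freq, spl


rng = np.random.default_rng(0)
t = np.arange(1000) * 0.1
phases = rng.uniform(0.0, 2.0 * np.pi, size=256)
probes = np.sqrt(2.0) * np.sin(2.0 * np.pi * 0.32 * t + phases[:, None])
sr, spl = averaged_spl(probes, dt=0.1)
print(sr[np.argmax(spl)], spl.max())
```

Averaging the power spectra rather than the complex coefficients keeps the tone level independent of the circumferential phase, which is what makes the average over probe positions meaningful.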
4.3 Conclusion
Despite general agreement that impinging tones are produced by a feedback loop, inconsistent statements about the production of the sound waves can be found in the literature. In addition, there is no consensus on whether standoff shocks are present in the pre-silence zone, a range of NPR in which tones can be observed.
To clarify these open questions, we performed a direct numerical simulation with a nozzle pressure ratio of 2.15 and a nozzle-to-plate distance of five diameters at a Reynolds number of 8000. Analysing the data, we find that standoff shocks periodically appear, disappear and move between the impinging plate and the shock cell system. Multiple standoff shocks can exist simultaneously; usually two or three are present for the chosen set of parameters. Concerning the generation of impinging tones, we clearly observe the feedback loop and prove that the interaction between vortices and standoff shocks produces the sound waves via two different mechanisms. One of the two mechanisms can analogously be found in free jets, where it is responsible for screech. The difference, however, is that the standoff shock, rather than the shock diamonds, interacts with the vortices. The impinging tone is not related to screech. The mode of the impinging jet is axisymmetric.
Acknowledgements The simulations were performed on the national supercomputer Cray XC40
(Hornet, Hazelhen) at the High Performance Computing Center Stuttgart (HLRS) under the grant
numbers GCS-NOIJ/12993 and GCS-ARSI/44027.
The authors gratefully acknowledge support by the Deutsche Forschungsgemeinschaft (DFG)
as part of collaborative research center SFB 1029 “Substantial efficiency increase in gas turbines
through direct use of coupled unsteady combustion and flow dynamics”.
1. Bogey, C., de Cacqueray, N., Bailly, C.: A shock-capturing methodology based on adaptative spatial filtering for high-order non-linear computations. J. Comput. Phys. 228(5), 1447–1465 (2009). doi:10.1016/j.jcp.2008.10.042
2. Chung, Y.M., Luo, K.H.: Unsteady heat transfer analysis of an impinging jet. J. Heat Transf. 124(6), 1039–1048 (2002)
3. Cziesla, T., Biswas, G., Chattopadhyay, H., Mitra, N.: Large-eddy simulation of flow and heat transfer in an impinging slot jet. Int. J. Heat Fluid Flow 22(5), 500–508 (2001). doi:10.1016/S0142-727X(01)00105-9
4. Dairay, T., Fortuné, V., Lamballais, E., Brizzi, L.-E.: Direct numerical simulation of a turbulent jet impinging on a heated wall. J. Fluid Mech. 764(2), 362–394 (2015). doi:10.1017/jfm.2014.715
5. Dairay, T., Fortuné, V., Lamballais, E., Brizzi, L.: LES of a turbulent jet impinging on a heated wall using high-order numerical schemes. Int. J. Heat Fluid Flow 50, 177–187 (2014). doi:10.1016/j.ijheatfluidflow.2014.08.001
6. Eidson, T.M., Erlebacher, G.: Implementation of a fully balanced periodic tridiagonal solver on a parallel distributed memory architecture. Concurr.: Pract. Exp. 7(4), 273–302 (1995)
7. Hattori, H., Nagano, Y.: Direct numerical simulation of turbulent heat transfer in plane impinging jet. Int. J. Heat Fluid Flow 25(5), 749–758 (2004). doi:10.1016/j.ijheatfluidflow.2004.05.004. Selected papers from the 4th International Symposium on Turbulence Heat and Mass Transfer
8. Henderson, B., Powell, A.: Experiments concerning tones produced by an axisymmetric choked jet impinging on flat plates. J. Sound Vib. 168(2), 307–326 (1993). doi:10.1006/jsvi.1993.1375
9. Henderson, B.: The connection between sound production and jet structure of the supersonic impinging jet. J. Acoust. Soc. Am. 111(2), 735–747 (2002). doi:10.1121/1.1436069
10. Ho, C.-M., Nosseir, N.S.: Dynamics of an impinging jet. Part 1. The feedback phenomenon. J. Fluid Mech. 105(4), 119–142 (1981). doi:10.1017/S0022112081003133
11. Janetzke, T.: Experimentelle Untersuchungen zur Effizienzsteigerung von Prallkühlkonfigurationen durch dynamische Ringwirbel hoher Amplitude, TU Berlin, Diss. (2010)
12. Peña Fernández, J.J., Sesterhenn, J.: Interaction between the shear layer, shock-wave and vortex ring in a starting free jet injecting into a plenum. In: European Turbulence Conference, Delft (2015)
13. Rockwell, D., Naudascher, E.: Self-sustained oscillations of impinging free shear layers. Annu. Rev. Fluid Mech. 11(1), 67–94 (1979)
14. Sesterhenn, J.L.: A characteristic-type formulation of the Navier–Stokes equations for high order upwind schemes. Comput. Fluids 30(1), 37–67 (2001)
15. Weigand, B., Spring, S.: Multiple jet impingement – a review. Heat Transf. Res. 42(2), 101–142 (2011)
16. Wilke, R., Sesterhenn, J.: Direct numerical simulation of heat transfer of a round subsonic impinging jet. In: Active Flow and Combustion Control 2014, pp. 147–159. Springer, Cham
17. Wilke, R., Sesterhenn, J.: Numerical simulation of impinging jets. In: High Performance Computing in Science and Engineering ’14, pp. 275–287. Springer, Cham (2015)
18. Wilke, R., Sesterhenn, J.: Numerical simulation of subsonic and supersonic impinging jets. In: High Performance Computing in Science and Engineering ’15, pp. 349–369. Springer, Cham
19. Wilke, R., Sesterhenn, J.: On the origin of impinging tones at low supersonic flow (2016). arXiv preprint, arXiv:1604.05624
20. Wilke, R., Sesterhenn, J.: Statistics of fully turbulent impinging jets (2016). arXiv preprint
21. Zuckerman, N., Lior, N.: Impingement heat transfer: correlations and numerical modeling. J. Heat Transf. 127(5), 544–552 (2005)
Aeroacoustic Simulations of Ducted Axial Fan
and Helicopter Engine Nozzle Flows
Abstract The flow and the acoustic field of an axial fan and a helicopter engine jet are computed by a hybrid fluid dynamics – computational aeroacoustics method. For the prediction of the flow field, a high-fidelity, parallelized solver for compressible flow is used in the first step. In the second step, the acoustic field is determined by solving the acoustic perturbation equations. The axial fan is investigated at a Reynolds number of Re = 9.36 × 10⁵ for two tip-gap sizes, i.e., s/D_o = 0.001 and s/D_o = 0.01, at a fixed flow rate coefficient Φ = 0.195. A comparison of the numerical pressure spectrum and its directivity with measurements shows good agreement, which confirms the correct identification of the sound sources and the accurate prediction of the acoustic duct propagation. Furthermore, in agreement with the experimental data, the results show a higher broadband noise level for the larger tip-gap size. In the second application, jets from three different helicopter engine nozzles at a Reynolds number of Re = 7.5 × 10⁵ are investigated, showing that the jet acoustic near field depends strongly on the nozzle built-in components. The centerbody increases the OASPL compared to the clean nozzle, whereas the inclusion of struts reduces the OASPL compared to the centerbody nozzle, owing to the increased turbulent mixing caused by the struts, which reduces the length and time scales of the turbulent structures shed from the centerbody.
1 Introduction
The prediction and reduction of noise generated by turbulent flows has become one of the major tasks of today's aircraft development and is also one of the key goals of European aviation policy. Compared to the year 2000, the perceived noise level of flying aircraft is to be reduced by 65 % by the year 2050. To comply with new noise regulations, reliable, efficient and accurate aeroacoustic predictions are required, e.g., for the low-noise design of technical devices such as axial fans or helicopter engine nozzles.
The fan industry increasingly demands quieter and more efficient axial fans for a wide range of applications. A systematic quiet fan design, however, requires prediction methods for the acoustic field and sufficient detail of the flow field to understand the intricate flow mechanisms, e.g. in the tip-gap region of the fan blade. Since measurements of the flow field in the rotating fan environment are difficult to perform, time-accurate numerical simulations such as highly resolved large-eddy simulations (LES) have been shown to successfully predict the main flow phenomena [22–24], especially those in the tip-gap region, which can be a significant source of aerodynamic losses and noise emission.
Appreciable progress has been achieved over the last 20 years in reducing jet noise by various noise reduction techniques such as high bypass ratios and design variations of the nozzle casing. These techniques have primarily focused on increasing the turbulent mixing by altering the nozzle design. In modern engines, the bypass ratio has already reached its limiting value, and any further increase would degrade the engine performance. Flow control inside the nozzle by additional built-in components such as wedges, vanes, etc. is an alternative approach that is increasingly used to suppress the noise in the jet near field [14, 20].
The overall reliability of an acoustic prediction is largely limited by the quality of the flow field solution. To accurately capture the essential part of the turbulent spatial and temporal scales generated in the flow field, highly resolved LES are a must. Consequently, aeroacoustic analyses of high-Reynolds-number flows with complex geometries included in the computational domain require advanced computing resources.
In this paper, the acoustic fields of a ducted axial fan and a helicopter engine nozzle are predicted by a hybrid fluid-dynamics-acoustics method. In a first step, large-eddy simulations are performed to determine the acoustic sources. In a second step, the acoustic field in the near and far field is determined by solving the acoustic perturbation equations (APE) [6] on a mesh. The acoustic results of the axial fan are compared to experimental data [27].
This paper is organized as follows. First, the numerical methods are presented
in Sect. 2. Subsequently, the LES and aeroacoustic results of the axial fan and
nozzle-jet simulations are discussed in Sects. 3 and 4. Computational features and
scalability analysis are given in Sect. 5. Finally, some conclusions are outlined in
Sect. 6.
2 Numerical Method
An LES model based on a finite volume method is used to simulate the compressible
unsteady turbulent flow by solving the Navier-Stokes equations. For the LES
an implicit grid filter is assumed and the monotone integrated LES (MILES)
approach [2] is adopted, i.e., the dissipative part of the truncation error of the
numerical method is assumed to mimic the dissipation of the non-resolved subgrid-
scale stresses. This solution method has been validated and successfully used, e.g.,
Aeroacoustic Simulations of Ducted Axial Fan and Helicopter Engine Nozzle Flows 445
in [1, 16]. The governing equations are spatially discretized using the modified
advection upstream splitting method (AUSM) [19]. The cell-center gradients are
computed using a second-order accurate least-squares reconstruction scheme [10],
i.e., the overall spatial approximation is second-order accurate. For stability reasons,
small cut cells are treated using an interpolation and flux-redistribution method [25].
A second-order 5-stage Runge-Kutta method is used for the temporal integration.
A parallel grid generator is used to create a hierarchical Cartesian computational
mesh featuring local refinement [18]. The interested reader is referred to [19] for
the details of the numerical methods, i.e., the discretization and computation of the
viscous and inviscid fluxes. To determine the sound propagation and to identify the
dominant noise sources, the acoustic perturbation equations (APE) are applied. Since
a compressible flow problem is considered, the APE-4 system is used [6].
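The least-squares reconstruction of cell-center gradients [10] can be illustrated with a minimal 2-D sketch. It assumes an unweighted fit over the neighbouring cell centers and solves the 2 × 2 normal equations directly; the weighting, stencil and 3-D form used in the actual solver are not described here. Because the fit is exact for linear fields, it reproduces the gradient of a linear function, which is the property behind second-order accuracy. The function names and test data are hypothetical.

```python
def ls_gradient(xc, phic, neighbours):
    """Unweighted least-squares gradient at a 2-D cell center.

    Minimizes sum_j (phi_j - phic - g . (x_j - xc))^2 over g = (gx, gy).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), phi in neighbours:
        dx, dy, dphi = x - xc[0], y - xc[1], phi - phic
        # accumulate the normal equations (A^T A) g = A^T b
        a11 += dx * dx
        a12 += dx * dy
        a22 += dy * dy
        b1 += dx * dphi
        b2 += dy * dphi
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-14:
        raise ValueError("degenerate stencil (collinear neighbours)")
    # 2x2 solve by Cramer's rule
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def phi_lin(x, y):
    """Linear test field with gradient (2, -3)."""
    return 2.0 * x - 3.0 * y + 1.0


# Hypothetical irregular neighbour positions around a cell at the origin.
points = [(1.0, 0.1), (-0.9, 0.2), (0.1, 1.1), (0.05, -0.8)]
nbrs = [(p, phi_lin(*p)) for p in points]
gx, gy = ls_gradient((0.0, 0.0), phi_lin(0.0, 0.0), nbrs)
print(gx, gy)  # the linear gradient is recovered up to round-off
```

On a cut-cell Cartesian mesh the neighbour set varies from cell to cell, which is one reason a least-squares fit is convenient: it does not need a fixed stencil shape.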
To accurately resolve the acoustic wave propagation described by the acoustic
perturbation equations in the APE-4 formulation [15], a sixth-order finite-difference
scheme with the summation-by-parts property [13] is used for the spatial discretization
and an alternating 5–6 stage low-dispersion and low-dissipation Runge-Kutta
method for the temporal integration [11]. On the embedded boundaries between
the inhomogeneous and the homogeneous acoustic domain, an artificial damping
zone has been implemented to suppress spurious sound generated by the transition
between the acoustic and the flow domain [26]. A detailed description of the two-step method and
the discretization of the Navier-Stokes equations and the acoustic perturbation
equations is given in [7].
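The interior stencil of a sixth-order central finite-difference approximation of the first derivative is sketched below. Only the interior stencil is shown: the summation-by-parts boundary closures of [13] are not reproduced, so the sketch assumes a periodic grid for brevity. The observed convergence order on a smooth periodic function should come out close to six.

```python
import math

# Standard sixth-order central coefficients for offsets 1..3; the stencil is
# antisymmetric, so offset -k carries the negative of the coefficient for +k.
C6 = (3.0 / 4.0, -3.0 / 20.0, 1.0 / 60.0)


def ddx_periodic(f, h):
    """Sixth-order central first derivative on a periodic 1-D grid (interior stencil only)."""
    n = len(f)
    df = [0.0] * n
    for i in range(n):
        s = 0.0
        for k, c in enumerate(C6, start=1):
            s += c * (f[(i + k) % n] - f[(i - k) % n])
        df[i] = s / h
    return df


def max_error(n):
    """Max error of d/dx sin(x) on [0, 2*pi) with n points."""
    h = 2.0 * math.pi / n
    x = [i * h for i in range(n)]
    d = ddx_periodic([math.sin(xi) for xi in x], h)
    return max(abs(di - math.cos(xi)) for di, xi in zip(d, x))


e32, e64 = max_error(32), max_error(64)
print(e32, e64, math.log2(e32 / e64))  # observed order close to 6
```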
In this section, a rotating low Mach number axial fan is investigated. In the first
subsection, it is discussed how the gap size between the blade tip and the outer casing
wall affects the flow field at different operating conditions. All computations are
performed at a fixed Reynolds number based on the rotational velocity and the
diameter of the outer casing wall, $Re = \pi D_o^2 n/\nu = 9.36 \times 10^5$, and a fixed Mach
number $M = \pi D_o n/a_o = 0.136$. Afterwards, the acoustic field is analyzed at the flow
rate coefficient $\Phi = 4\dot{V}/(\pi^2 D_o^3 n) = 0.195$ for two tip-gap widths $s/D_o = 0.001$ and
$s/D_o = 0.01$.
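As a consistency check, the similarity parameters can be evaluated for the fan geometry described in this section (D_o = 300 mm, n = 3000 rpm), reading the definitions as Re = πD_o²n/ν, M = πD_o n/a_o and Φ = 4V̇/(π²D_o³n). The kinematic viscosity ν ≈ 1.51 × 10⁻⁵ m²/s and the speed of sound a_o ≈ 346 m/s are assumed air properties that are not stated in the text, and the volume flow rate V̇ is simply back-computed from Φ = 0.195.

```python
import math

D_o = 0.300            # outer casing diameter [m] (from the text)
n = 3000.0 / 60.0      # rotational speed [1/s], i.e. 3000 rpm (from the text)
nu = 1.51e-5           # kinematic viscosity of air [m^2/s] (assumed)
a_o = 346.0            # speed of sound [m/s] (assumed)

u_tip = math.pi * D_o * n            # circumferential speed at the casing diameter
Re = u_tip * D_o / nu                # = pi * D_o^2 * n / nu
M = u_tip / a_o
Phi = 0.195
V_dot = Phi * math.pi**2 * D_o**3 * n / 4.0   # volume flow implied by Phi [m^3/s]

print(f"u_tip = {u_tip:.1f} m/s, Re = {Re:.3e}, M = {M:.3f}, V_dot = {V_dot:.3f} m^3/s")
```

With these assumed air properties the sketch reproduces Re ≈ 9.36 × 10⁵ and M ≈ 0.136, and Φ = 0.195 corresponds to roughly 0.65 m³/s.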
The axial fan investigated in this section is shown in Fig. 1. The fan has five
twisted blades, of which only one has been resolved in both the LES and the CAA
computations to reduce the computational costs. The diameter of the outer casing
wall is Do = 300 mm and the inner diameter of the hub is Di = 135 mm. The rotational
446 A. Pogorelov et al.
Fig. 1 Instantaneous contours of the Q-criterion inside the ducted axial fan configuration colored
by the relative Mach number, showing the vortical structures generated by the tip leakage flow at
$\Phi = 0.195$ and $s/D_o = 0.005$
speed is n = 3000 rpm. As depicted in Fig. 1 for $\Phi = 0.195$ and $s/D_o = 0.005$,
the existence of a gap between the blade tip and the outer casing wall and the
pressure difference between the pressure and the suction side of the blades lead
to the development of a tip-gap vortex. Depending on the operating conditions, the
tip-gap vortex can be a major noise source in the axial fan, especially at low flow
rate coefficients $\Phi$, as demonstrated in [22] at $\Phi = 0.165$ and a tip-gap size of
$s/D_o = 0.01$. At low flow rate coefficients, the highly unsteady turbulent wake
generated by the tip-gap vortex is shifted further upstream and impinges upon the
leading edge of the neighboring blade. The intermittent interaction leads to a cyclic
transition on the suction side of the blade. Acoustic measurements have shown
broadband peaks in the specific sound power spectrum at frequencies corresponding
to these phenomena. Decreasing the tip-gap width from $s/D_o = 0.01$ to
$s/D_o = 0.005$ at $\Phi = 0.165$ stabilizes the tip-gap vortex and reduces the wandering
motion of the turbulent wake such that the interaction with the leading edge of the
neighboring blade and the cyclic transition triggered by this interaction vanish, as
discussed by Pogorelov et al. [23]. Instead, a permanent turbulent transition
triggered by a separation bubble at the leading edge was observed. The reduction
of the tip-gap width leads to a strong decrease of the noise level. However, for
the smaller tip-gap size, the turbulent wake still interacts with the pressure side of
the blade. To separate the noise generated by the interaction and the phenomena
triggered by this interaction from the self-generated noise of the tip-gap vortex
Fig. 2 Turbulent kinetic energy contours in several radial planes from 30° to 70°, for
$s/D_o = 0.01$ (left) and $s/D_o = 0.005$ (right)
it is important to analyze the acoustic field at higher flow rate coefficients and
small tip-gap widths, where no interaction with the neighboring blades is evident.
Pogorelov et al. [24] analyzed the flow field at $\Phi = 0.195$ for the tip-gap widths
$s/D_o = 0.005$ and $s/D_o = 0.01$. This study demonstrated the strong impact
of the tip-gap width on the size and shape of the tip-gap vortex. It was shown
that, due to the stronger curvature and the smaller diameter of the tip-gap vortex
for $s/D_o = 0.005$, the entire turbulent wake passes the neighboring blade without
any interaction, whereas for $s/D_o = 0.01$ several vortical structures of the turbulent
wake reach the trailing edge of the blade at the pressure side, as depicted in Fig. 2.
Therefore, for tip-gap sizes below $s/D_o = 0.005$, no interaction with the neighboring
blades is expected. In the following subsection, the acoustic field of the flow
at $\Phi = 0.195$ for $s/D_o = 0.001$ and $s/D_o = 0.01$ is analyzed. For the source
computation required for the acoustic analysis, LES have been conducted for both
operating conditions. The computational mesh resolving one out of five blades has
approx. 140 million grid points. Two full rotations were required to obtain a
fully developed flow field. Data from another two full rotations were used for the
statistical analysis. In total, 1440 samples were recorded, which required 8.6 TB of
disc space. The CPU time was approx. 200 h and the computations were conducted
on approx. 6000 CPUs.
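The storage figures can be put into perspective with a back-of-the-envelope estimate. The sketch assumes decimal terabytes and that every sample stores the full field of about 140 million points; neither the stored variables nor the file format is stated in the text.

```python
# Back-of-the-envelope storage estimate.
# Assumptions: 1 TB = 1e12 bytes; each sample covers all ~140e6 grid points.
total_bytes = 8.6e12
samples = 1440
points = 140e6

bytes_per_sample = total_bytes / samples        # ~6.0 GB per snapshot
bytes_per_point = bytes_per_sample / points     # ~43 bytes per grid point
doubles_per_point = bytes_per_point / 8.0       # ~5.3 double-precision values

print(f"{bytes_per_sample / 1e9:.2f} GB/sample, {bytes_per_point:.1f} B/point, "
      f"{doubles_per_point:.2f} doubles/point")
```

Roughly five double-precision values per point would be consistent with, for example, storing five flow variables per sample, but this is an interpretation and not stated in the text.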
Fig. 3 (a) Schematic view of the LES and (b) the acoustic configuration of an axial fan
Fig. 4 The multi-block structured mesh in the acoustic source region resolving one out of five
blades of the axial fan; (a) view of the overall mesh; (b) detailed topological view of the mesh
determined, the near- and far-field acoustics are computed by solving the APE-4 system.
Since the contribution of entropy and non-linear terms can be neglected in this study,
only the vortex sound sources are taken into account.
A schematic view of the present computational setup is shown in Fig. 3. In a first
step, the turbulent flow fields are determined by LES for the two configurations
for 24 full rotations. Subsequently, the source terms are computed in the source
region which contains approximately 122 million grid points with the same mesh
resolution as the corresponding LES mesh. Figure 4 shows the computational mesh
used for computing the source terms. The instantaneous distribution of the dominant
Fig. 5 Instantaneous iso-surfaces of the axial component of the fluctuating Lamb
vector showing the major sound sources around the blade; (a) configuration $s/D_o = 0.001$; (b)
configuration $s/D_o = 0.01$
Fig. 6 The multi-block structured mesh for the acoustic domain resolving one out of five blades
of the axial fan; (a) view of the overall mesh; (b) detailed view of the mesh at far-field
fan noise sources, i.e., the fluctuating Lamb vector $L' = (\omega \times u)'$, for the two
configurations is shown in Fig. 5.
It is clearly visible that the strongest sources occur in regions with the highest
turbulent kinetic energy, i.e., in the tip vortex, the blade wake, and the hub region.
Moreover, the noise sources generated by the bigger tip-gap size $s/D_o = 0.01$
exhibit higher amplitudes than those of the smaller tip-gap size $s/D_o = 0.001$.
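The vortex sound source L' = (ω × u)' is the fluctuation of the Lamb vector about its time mean. A minimal point-wise sketch, using made-up vorticity and velocity samples at a single location, computes the Lamb vector per sample and subtracts the sample mean; the actual source computation operates on the full LES fields and its averaging procedure is not described here.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def lamb_fluctuations(omegas, velocities):
    """Return L' = (omega x u) - mean(omega x u) for each time sample at one point."""
    lamb = [cross(w, u) for w, u in zip(omegas, velocities)]
    n = len(lamb)
    mean = tuple(sum(l[k] for l in lamb) / n for k in range(3))
    return [tuple(l[k] - mean[k] for k in range(3)) for l in lamb]


# Hypothetical vorticity and velocity samples at one grid point (nondimensional).
omegas = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.2), (0.0, 0.0, 0.8)]
velocities = [(1.0, 0.0, 0.0), (1.1, 0.1, 0.0), (0.9, -0.1, 0.0)]
for lp in lamb_fluctuations(omegas, velocities):
    print(lp)
```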
In a second step, the acoustic field is predicted based on the corresponding LES
results. The computational mesh used for the LES is extended in the axial and
radial directions up to $20 D_o$. The grid spacing around the microphone positions is
$\Delta x_{mic}/D_o \approx 5 \times 10^{-3}$, so that for 10 points per wavelength the maximum frequency
resolvable by the grid is about 10 kHz. The acoustic mesh, including some details
of the mesh resolution in the far field, is shown in Fig. 6. The time step of the
Fig. 7 Instantaneous contours of the fully developed acoustic field showing the acoustic duct
propagation into the far field; (a) configuration $s/D_o = 0.001$; (b) configuration $s/D_o = 0.01$
Fig. 8 Schematic of the virtual microphone positions for the two acoustic configurations; (a) side
view; (b) front view
Fig. 9 Sound spectra at the far-field locations on circle C1; (a) configuration $s/D_o = 0.001$; (b)
configuration $s/D_o = 0.01$; comparison of the () numerical results with the (—) experimental
results [27]
The evaluation of the sound pressure level on circles C1 and C2
shows convincing agreement, especially for the broadband noise level. However,
on circle C2 towards the center line of the axial fan, the computed sound
pressure level at the lower frequencies deviates from the experimental measurements.
This is because the single-blade acoustic simulation with periodic boundary
conditions lacks certain low-wavenumber ranges, which is clearly observable in the
corresponding spectral analysis. In addition, the higher noise level of the case with the
bigger tip-gap size compared to the smaller one is clearly reproduced by the numerical
simulation method.
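One way to see which low-wavenumber content a single-blade domain cannot represent: with periodic boundaries over one fifth of the circumference (72°), a circumferential Fourier mode exp(imθ) is admissible only if it repeats after 72°, i.e., only if the azimuthal order m is a multiple of the blade count B = 5. Interpreting the missing low-wavenumber ranges mentioned above as these excluded azimuthal modes is an assumption made for this sketch.

```python
import cmath
import math

B = 5                          # number of blades; one blade passage is simulated
sector = 2.0 * math.pi / B     # periodic sector width (72 degrees)


def is_sector_periodic(m, tol=1e-12):
    """True if the azimuthal mode exp(i*m*theta) repeats after one sector."""
    return abs(cmath.exp(1j * m * sector) - 1.0) < tol


admissible = [m for m in range(-12, 13) if is_sector_periodic(m)]
print(admissible)  # [-10, -5, 0, 5, 10]
```

All modes with 1 ≤ |m| ≤ 4, i.e., the lowest non-zero circumferential orders, cannot exist in such a domain.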
Fig. 10 Sound spectra at the far-field locations on circle C2; (a) configuration $s/D_o = 0.001$; (b)
configuration $s/D_o = 0.01$; comparison of the () numerical results with the (—) experimental
results [27]
Fig. 11 Rear section of the nozzle geometry: (a) clean nozzle hj1, (b) centerbody nozzle hj2, (c)
centerbody-plus-strut nozzle hj3
In this section, simulation results of round jets emanating from three variants
of a non-generic nozzle are presented. First, the flow fields of the three nozzle
configurations are computed at a Reynolds number of $Re = 7.5 \times 10^5$ and a Mach number of
$M = 0.341$; thereafter, the acoustic field is computed using acoustic source
terms determined from the LES data.
Table 1 Simulation features and mesh parameters of the flow and the acoustic field solutions

                           Clean nozzle (hj1)   Centerbody nozzle (hj2)   Centerbody-plus-strut nozzle (hj3)
Flow field
  Mach number M_j          0.341                0.341                     0.341
  Reynolds number Re_De    750,000              750,000                   750,000
  Mesh points              335 × 10^6           329 × 10^6                328 × 10^6
  Number of samples        2251                 2251                      2251
Acoustic field
  Mesh points              108.5 × 10^6         108.5 × 10^6              108.5 × 10^6
Fig. 12 Contours of the Q-criterion color coded by density for three geometries (a) hj1 , (b) hj2 ,
(c) hj3
The operating conditions of the last turbine stage, which were taken from the
measurements of a full-scale turbo-shaft engine [21], are set at the inlet boundary.
Isotropic synthetic turbulence with approx. 10 % turbulence intensity is injected at
the inlet plane [17]. At the outflow and lateral boundaries of the jet domain, the
static pressure is kept constant and the other variables are extrapolated from the interior
of the domain. To damp numerical reflections at the boundaries, sponge layers are
prescribed [8]. At the nozzle wall, a no-slip condition with zero pressure and
density gradients is applied. Hierarchically refined Cartesian meshes are used for
the flow field computations; a grid convergence study of the centerbody nozzle
hj2 configuration is presented in [4, 5]. The essential mesh and simulation parameters
of the analysis of the flow and the acoustic fields are summarized in Table 1.
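The sponge layers of [8] are only cited, not specified. A common form, assumed here, adds a relaxation term −σ(x)(q − q_target) to the right-hand side, with σ ramped polynomially from zero at the start of the layer to σ_max at the boundary. The exponent, σ_max, layer position and target state below are hypothetical values.

```python
def sponge_sigma(x, x_start, x_end, sigma_max=1.0, power=2):
    """Polynomial sponge strength: 0 up to x_start, rising to sigma_max at x_end (assumed form)."""
    if x <= x_start:
        return 0.0
    xi = min((x - x_start) / (x_end - x_start), 1.0)
    return sigma_max * xi**power


def sponge_source(q, q_target, sigma):
    """Relaxation term added to dq/dt: damps q towards the target (e.g. ambient) state."""
    return -sigma * (q - q_target)


# Hypothetical 1-D layer between x = 40 and x = 50 (in units of R_e).
for x in (30.0, 40.0, 45.0, 50.0):
    s = sponge_sigma(x, 40.0, 50.0, sigma_max=2.0)
    print(x, s, sponge_source(q=1.1, q_target=1.0, sigma=s))
```

The gradual ramp matters: an abrupt jump in σ would itself reflect waves, which is what the layer is meant to avoid.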
The overall turbulent structures in the jet are visualized in Fig. 12 by the contours
of the instantaneous Q-field [12] for the three configurations. Since the same
threshold value for the Q-contours is used, the different spreading of the free jets
can be deduced from this illustration. In particular, the Q-fields show the smaller
spreading of the jet exhausting from the clean nozzle hj1.
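The Q-criterion [12] used for Fig. 12 is the second invariant of the velocity gradient tensor, Q = ½(‖Ω‖² − ‖S‖²), with S and Ω the symmetric and antisymmetric parts of ∇u; Q > 0 marks regions where rotation dominates strain. A point-wise sketch for a given 3 × 3 gradient tensor follows; the threshold used for the figure is not stated and is therefore not reproduced.

```python
def q_criterion(grad):
    """Q = 0.5 * (||Omega||^2 - ||S||^2) for grad[i][j] = du_i/dx_j (3x3)."""
    s2 = 0.0
    o2 = 0.0
    for i in range(3):
        for j in range(3):
            s = 0.5 * (grad[i][j] + grad[j][i])   # strain-rate tensor S_ij
            o = 0.5 * (grad[i][j] - grad[j][i])   # rotation tensor Omega_ij
            s2 += s * s
            o2 += o * o
    return 0.5 * (o2 - s2)


# Solid-body rotation about z (u = -y, v = x): pure rotation, Q > 0.
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Simple shear (u = y): strain and rotation balance, Q = 0.
shear = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(q_criterion(rotation), q_criterion(shear))  # 1.0 0.0
```

The shear case illustrates why Q, rather than vorticity magnitude, is used to visualize vortices: a shear layer has vorticity but is not a vortex.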
The modified turbulence field influences the jet characteristics downstream of
the nozzle exit. This is illustrated by the contours of the mean axial velocity in the
free jet region in Fig. 13. The mean velocity on the centerline decreases much more
strongly for the hj2 and hj3 geometries than for the clean nozzle, which possesses a
standard jet plume shape. Furthermore, the asymmetric velocity distribution caused
by the struts is visible in the jet field just downstream of the exit. However, further
downstream hardly any asymmetric influence of the struts is observed.
The distribution of the mean axial velocity, normalized by the average nozzle exit
axial velocity $u_{ne} = \frac{1}{A}\int_A u \, dA$, on the centerline starting at the rear face of the
Fig. 13 Contours of the mean axial velocity in the free jet region for three geometries (a) hj1 , (b)
hj2 , (c) hj3
Fig. 14 Streamwise distribution of the axial velocity over $x/R_e$ on the centerline $r/R_e = 0$ for (—) hj1,
() hj2, (--) hj3
Fig. 15 Streamwise distribution over $x/R_e$ at $r/R_e = 0$ of (a) the rms axial velocity and (b) the rms radial
velocity for (—) hj1, () hj2, (--) hj3
turbulence intensity than the clean nozzle hj1 solution due to the enhanced turbulent
mixing caused by the centerbody and the struts. Further downstream of $x/R_e \approx 15$,
all profiles of the rms axial and radial velocities show a similar decaying trend.
The acoustic perturbation equations (APE) are applied to determine the sound
propagation and to identify the dominant noise sources of the hot jets. Since a
compressible flow problem is tackled, the APE-4 system is used [6].
For the computations, a time step $\Delta t = 0.011 R_e/a_\infty$ is chosen to obtain
stable numerical solutions. The acoustic analyses include sound waves whose
maximum wavenumber $k_{max} = 2\pi/\lambda_{min}$ is approximately $0.36\pi/R_e$. The source
fields are provided for all Runge-Kutta steps using a least-squares optimized
interpolation algorithm [9]. The time interval reconstructed by the 2251 LES
snapshots is $T_{total} = 148.5 R_e/u_e$.
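These numbers fix the spectral range of the acoustic analysis. The sketch below converts them to the Strouhal number St = fD_e/u_e used in Fig. 18, with D_e = 2R_e and the maximum wavenumber read as k_max ≈ 0.36π/R_e. The frequency resolution 1/T_total and the snapshot spacing follow directly from the text; the highest resolved Strouhal number additionally assumes that the jet Mach number M = 0.341 equals u_e/a_∞, which is an assumption (for a hot jet the reference speed of sound may differ).

```python
import math

T_total = 148.5          # in R_e / u_e, time covered by the LES snapshots
n_snap = 2251            # number of LES snapshots
k_max = 0.36 * math.pi   # in 1 / R_e, maximum resolved acoustic wavenumber
M = 0.341                # ASSUMED here to equal u_e / a_inf

# Time in units of R_e/u_e; St = f * D_e / u_e = 2 * f when D_e = 2 R_e.
dSt = 2.0 / T_total                     # spectral resolution from 1 / T_total
dt_snap = T_total / (n_snap - 1)        # snapshot spacing
St_nyquist = 2.0 / (2.0 * dt_snap)      # Nyquist frequency 1/(2 dt), times 2
# f_max = a_inf * k_max / (2 pi)  ->  St_max = 2 * k_max / (2 pi * M)
St_max = 2.0 * k_max / (2.0 * math.pi * M)

print(f"dSt = {dSt:.4f}, St_nyquist = {St_nyquist:.1f}, St_max = {St_max:.2f}")
```

Under this assumption St_max ≈ 1.06, which would cover the spectral peaks at fD_e/u_e = 0.45 to 0.6 discussed for Fig. 18, while the snapshot sampling itself is far from limiting.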
The acoustic simulation setup and mesh details are discussed at length in [3].
In Fig. 16, the acoustic field determined by the aforementioned numerical
schemes is illustrated. The contours of the acoustic pressure are shown in the range
$|p'| \le 5 \times 10^{-6} \rho_0 a_0^2$ near the jet nozzle region. The acoustic pressure of the
configuration hj1 has smaller amplitudes than that of the other two configurations, hj2
and hj3. At the nozzle exit (Fig. 13), the mean axial velocity of the clean nozzle
configuration hj1 decreases in the radial direction ($r = \sqrt{y^2 + z^2}$). The turbulent
fluctuations in the shear layer are less pronounced for the single jet hj1, as discussed
for Fig. 15. These are the main reasons for the low acoustic energy of the single jet hj1.
The overall acoustic level in Fig. 17 confirms the low acoustic emission of the
single jet hj1. The profiles of the three acoustic fields are obtained from microphones
aligned in the axial direction at the sideline location $8R_e$ away from the jet
centerline. The dominant wave radiation occurs in the upstream position due to
Fig. 16 Acoustic pressure contours in the range $|p'/(\rho_0 a_0^2)| \le 5 \times 10^{-6}$ on the $z = 0$ plane, (a)
hj1, (b) hj2, and (c) hj3
Fig. 17 Overall sound pressure level in dB over $x/R_e$ at the radial distance of $8R_e$ from the jet centerline,
(—) hj1, () hj2, (--) hj3
the unperturbed jet core in the nozzle exit area. The microphone at a downstream
location captures the acoustic waves at a relatively larger distance from the end of
the jet core. The centerbody nozzle configuration hj2 generates the strongest
acoustic field, with a 3 dB higher OASPL at the streamwise position $x = 10R_e$
compared to the single jet hj1. The additional turbulent mixing by the struts in
configuration hj3 reduces the acoustic generation relative to hj2 by approximately 2–4 dB over
the streamwise positions $R_e \le x \le 19R_e$. The acoustic directivity of the single
jet hj1 shows a silent zone in the upstream position $x \le 5R_e$. Compared with the
findings of the single jet (hj1), the axial profiles of the other jets (hj2 and hj3) show
an approximately 2–9 dB higher acoustic pressure.
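The overall sound pressure level compared in Fig. 17 is, in its usual definition, OASPL = 20 log10(p'_rms/p_ref) with p_ref = 20 μPa; how the authors scale their nondimensional pressure is not stated, so the reference value is an assumption. A minimal sketch from a pressure time series follows, with a hypothetical tonal signal. It also shows that a level difference of 3 dB, as between hj2 and hj1 at x = 10R_e, corresponds to roughly doubling the mean-square acoustic pressure.

```python
import math

P_REF = 20e-6  # reference pressure in air [Pa] (assumed standard value)


def oaspl(p):
    """Overall sound pressure level [dB] of a pressure time series p [Pa]."""
    mean = sum(p) / len(p)
    ms = sum((pi - mean) ** 2 for pi in p) / len(p)   # mean square of p'
    return 10.0 * math.log10(ms / P_REF**2)


# Hypothetical tone: 1 Pa amplitude, 16 samples per period, 10 full periods.
n = 160
p1 = [math.sin(2.0 * math.pi * k / 16) for k in range(n)]
p2 = [math.sqrt(2.0) * v for v in p1]   # mean square doubled
print(round(oaspl(p1), 2), round(oaspl(p2) - oaspl(p1), 2))  # 90.97 3.01
```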
In Fig. 18, the acoustic spectra of the single jet and the two centerbody jets are compared.
The sound pressure is determined at the coordinates ($x = 3R_e$, $r = 8R_e$) for the
sideline acoustics in Fig. 18a and ($x = 18R_e$, $r = 8R_e$) for the downstream acoustics
in Fig. 18b. The sideline acoustics in Fig. 18a display a large increase of the power
spectral density in the frequency range $0.3 \le fD_e/u_e \le 0.8$, where $f$ is the frequency
and $u_e$ the average nozzle exit velocity. The peaks are located at $fD_e/u_e = 0.45$ for
the single jet hj1 and at $fD_e/u_e = 0.5$ to $0.6$ for the jets with a centerbody, hj2,
Fig. 18 Power spectra of the acoustic pressure signals determined at the coordinates (a) $x/R_e = 3$,
$r/R_e = 8$ and (b) $x/R_e = 18$, $r/R_e = 8$: (—) hj1, () hj2, (--) hj3
hj3. The downstream acoustics in Fig. 18b show pronounced low-frequency
radiation at $fD_e/u_e \approx 0.1$. The acoustic peaks occur in the same frequency range
identified in the sideline acoustics. As indicated by the spectra of hj2 and hj3,
the increase of the acoustic power becomes more prominent when the turbulent
fluctuations increase. The sound generation of a hot jet has two features.
The first is the downstream acoustics due to the large-scale turbulence
in the shear layers and the second is the sideline acoustics enhanced by the
temperature gradient. Figure 18a illustrates the differences in the sideline acoustics.
The acoustic radiation almost perpendicular to the jet axis is clearly more intense for
the jets with a centerbody, hj2 and hj3, than for the single jet hj1. Besides, in the
frequency band $0.1 \le fD_e/u_e \le 0.5$, the acoustic level of the centerbody-plus-strut
configuration hj3 is reduced compared to that of the centerbody configuration hj2.
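Spectra such as those in Fig. 18 are obtained from the acoustic pressure signals recorded at the virtual microphones. The sketch below computes a plain periodogram with a direct DFT in nondimensional time t u_e/D_e, so that the frequency axis is directly the Strouhal number fD_e/u_e, and locates the peak of a synthetic signal. The signal, its length, and the absence of windowing and segment averaging are simplifying assumptions; the text does not state how the spectra were estimated.

```python
import cmath
import math


def periodogram(p, dt):
    """Periodogram |DFT|^2 * dt / N at the non-negative frequencies (direct O(N^2) DFT).

    Returns (frequencies, power); bin k corresponds to k / (N * dt).
    """
    n = len(p)
    mean = sum(p) / n
    freqs, power = [], []
    for k in range(n // 2 + 1):
        s = sum((p[j] - mean) * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        freqs.append(k / (n * dt))
        power.append(abs(s) ** 2 * dt / n)
    return freqs, power


# Hypothetical microphone signal, time in units of D_e/u_e:
# a tone at St = 0.5 plus a weaker one at St = 0.1.
dt, n = 0.05, 400                      # total time 20 D_e/u_e -> resolution dSt = 0.05
t = [j * dt for j in range(n)]
p = [math.sin(2 * math.pi * 0.5 * tj) + 0.3 * math.sin(2 * math.pi * 0.1 * tj) for tj in t]
St, P = periodogram(p, dt)
print(St[P.index(max(P))])  # peak near St = 0.5
```

For real turbulent signals, averaging over segments (Welch's method) would normally be used to reduce the variance of the estimate.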
The simulations of the acoustic field were carried out on the Cray XC40 at HLRS
Stuttgart, which consists of two-socket nodes with 12 cores per socket at 2.5 GHz. Each node is
equipped with 128 GB of RAM, i.e., each core has 5.33 GB of memory available
for the computation. Strong scaling experiments were conducted to demonstrate the
scalability of the APE-4 solver. Five core counts were used, i.e., 512, 1024, 2048,
4096, and 8192. The results are based on 100 integrated time steps using
a mono-block cubic grid with $256^3$ grid points and periodic boundary conditions.
The overall speedup as a function of the number of cores, shown in Fig. 19, demonstrates
the good scalability of the code.
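Strong-scaling results such as Fig. 19 are usually reported as the speedup S(p) = T(p_0)/T(p) relative to the smallest core count p_0 = 512 and the parallel efficiency E(p) = S(p) p_0/p. The sketch below evaluates these metrics together with the number of cells per core for the 256³ grid. The wall-clock times are placeholders chosen only to exercise the formulas; they are not the measured data of Fig. 19.

```python
def scaling_metrics(cores, times):
    """Speedup and parallel efficiency relative to the smallest core count."""
    p0, t0 = cores[0], times[0]
    return [(p, t0 / t, (t0 / t) * p0 / p) for p, t in zip(cores, times)]


cores = [512, 1024, 2048, 4096, 8192]
cells = 256**3  # mono-block cubic test grid
# Placeholder wall-clock times [s] for 100 time steps: NOT the measured data of Fig. 19.
times = [100.0, 51.0, 26.5, 14.0, 7.8]

for p, speedup, eff in scaling_metrics(cores, times):
    print(f"{p:5d} cores  {cells // p:6d} cells/core  speedup {speedup:5.2f}  efficiency {eff:.2f}")

# Memory per core on a node with 128 GB and 2 x 12 cores, as described in the text.
print(f"{128 / 24:.2f} GB per core")
```

At 8192 cores each core holds only 2048 cells of this test grid, so communication increasingly dominates; this is the regime in which a strong-scaling test is most demanding.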
Fig. 19 Strong scaling experiment; simulations were performed for 100 integrated time steps
using five different core counts, i.e., 512, 1024, 2048, 4096, and 8192
6 Conclusion
The flow and the acoustic field of a ducted axial fan and a subsonic jet including
the nozzle geometry were simulated by a hybrid CFD/CAA method. First, the flow
field was computed by an LES and subsequently, the acoustic field was determined
by solving the APE.
For the axial fan, two configurations with different tip-gap sizes, i.e., $s/D_o = 0.001$
and $s/D_o = 0.01$, at the flow rate coefficient $\Phi = 0.195$ were simulated and
the results were compared to reference data. The findings showed that the diameter
and strength of the tip vortex increase with the tip-gap size, while simultaneously
the efficiency of the fan decreases. Increasing the tip-gap size led to the strongest
sound sources occurring in the tip-gap region as well as in the wake of the fan blade.
In the second step, the acoustic field was determined by solving the APE-4 system in
a rotating frame of reference. The overall agreement of the pressure spectrum and
its directivity with measurements confirms the correct identification of the sound
sources and the accurate prediction of the acoustic duct propagation. The results show
that the larger the tip-gap size, the higher the broadband noise level.
Next, three turbulent jets emanating from a clean divergent annular reference
nozzle, a configuration with a centerbody, and a geometry with a centerbody plus 5
equidistantly distributed struts were considered. The results showed a strong
dependence of the jet acoustic near field on the presence of the nozzle built-in
components. For example, on the one hand, the presence of the centerbody increased
the OASPL by up to 6 dB compared to the clean nozzle; on the other hand, inclusion
of the 5 struts reduced the OASPL by up to 4 dB compared to the centerbody nozzle,
owing to the increased turbulent mixing caused by the struts, which reduces the length
and time scales of the turbulent structures shed from the centerbody.
Acknowledgements The research has received funding from the German Federal Ministry of
Economics and Technology via the “Arbeitsgemeinschaft industrieller Forschungsvereinigungen
Otto von Guericke e.V.” (AiF) and the “Forschungsvereinigung Luft- und Trocknungstechnik e.V.”
(FLT) under the grant no. 17747N (L238) as well as from the European Community’s Seventh
Framework Programme (FP7, 2007–2013), PEOPLE program under the grant agreement No.
FP7-290042 (COPAGT project). Computing resources were provided by the High Performance
Computing Center Stuttgart (HLRS) and by the Jülich Supercomputing Center (JSC).
21. Pardowitz, B., Tapken, U., Knobloch, K., Bake, F., Bouty, E., Davis, I., Bennett, G.: Core noise
– identification of broadband noise sources of a turbo-shaft engine. AIAA Paper 2014-3321 (2014)
22. Pogorelov, A., Meinke, M., Schröder, W.: Cut-cell method based large-eddy simulation of
tip-leakage flow. Phys. Fluids 27(7), 075106 (2015)
23. Pogorelov, A., Meinke, M., Schröder, W.: Effects of tip-gap width on the flow field in an axial
fan. Int. J. Heat Fluid Flow (2016). doi:10.1016/j.ijheatfluidflow.2016.06.009
24. Pogorelov, A., Meinke, M., Schröder, W., Kessler, R.: Cut-cell method based large-eddy
simulation of a tip-leakage vortex of an axial fan. AIAA Paper 2015-1979 (2015)
25. Schneiders, L., Hartmann, D., Meinke, M., Schröder, W.: An accurate moving boundary
formulation in cut-cell methods. J. Comput. Phys. 235, 786–809 (2013)
26. Schröder, W., Ewert, R.: LES-CAA coupling. In: Large-Eddy Simulations for Acoustics.
Cambridge University Press (2005)
27. Zhu, T., Carolus, T.H.: Experimental and numerical investigation of the tip clearance noise of
an axial fan. GT2013-94100 (2014)
Adding Hybrid Mesh Capability
to a CFD-Solver for Helicopter Flows
1 Introduction
taken into account to compute the helicopter’s orientation in space due to the acting
aerodynamic loads. Structural dynamics are considered by the deformation of the
body meshes to include aeroelasticity [3, 6]. An efficient computation is achieved
by a multi-block structure of the grid, which enables parallel computing with satisfactory
scaling beyond 24,000 cores [7]. These comprehensive features make the code
one of very few codes worldwide for high-fidelity helicopter simulations. With these
unique characteristics, the code is an ideal basis for the extension, although its
architecture is designed to process structured meshes only.
gence acceleration techniques for unstructured blocks are implemented. The local
time-stepping method allows every control volume to determine its ideal time step.
The implicit residual smoothing shifts the characteristic of the explicit Runge-Kutta
scheme towards an implicit method. Hence higher CFL numbers can be used.
Multigrid methods would further accelerate the convergence of the unstructured
block handling. This will be a topic for further development of the unstructured
The turbulence in unstructured blocks is modelled by the two-equation Wilcox
k-ω turbulence model [14]. The turbulence model for unstructured blocks
can be selected independently of the turbulence model of the structured blocks. The
convective and viscous fluxes are approximated by first-order methods. Time
discretization for the turbulence variables is achieved by a dual-time stepping scheme with
an implicit treatment of the pseudo time step. The implicit method is more robust
than the explicit method that is applied to the turbulence variables in the
structured blocks. However, contrary to the implicit operator for structured meshes,
which constitutes a block-diagonal matrix, unstructured meshes lead to
a sparse, non-symmetric block matrix with a quasi-random distribution of non-zero
elements. This requires much more effort for solving the linear system of equations
than in the structured code, where the system can easily be solved by the
efficient Thomas algorithm. For the unstructured blocks, the system is solved by
the iterative GMRES(m) (generalized minimal residual) algorithm proposed by
Saad and Schultz [10]. The efficiency of the algorithm is further increased by an
ILU(0) (incomplete lower-upper) preconditioner. Non-zero elements in the matrix
represent the grid connectivity; a single row therefore contains the corresponding
cell neighbours. To drastically reduce the memory requirements, only $n_{cell} \times 7$
elements are stored instead of a full $n_{cell} \times n_{cell}$ matrix. Since the cells are at most
hexahedral, all non-zero entries of a row can thus be mapped: the
entry of the cell itself on the main diagonal and a maximum of six neighbour
cells. However, the compressed storage scheme requires additional decompression
information in an additional array, which can be stored memory-efficiently as
1-byte integers. This approach reduces the required memory bandwidth and increases the
efficiency of the equation system solver. Furthermore, a restarted GMRES(m)
method is used to limit the number of Krylov basis vectors that have to be stored, and the ILU(0)
preconditioner creates no additional fill-in.
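The storage idea and the solver can be illustrated together. The sketch stores at most seven entries per row (the diagonal plus up to six face neighbours of a hexahedral cell) in fixed-size slots, with a 1-byte count of used slots per row, and solves the system with a restarted GMRES(m) using Arnoldi with modified Gram-Schmidt and Givens rotations. The exact layout of the solver's decompression array is not described in the text, so this layout (class `SevenSlotMatrix`) is a hypothetical one, and the ILU(0) preconditioner is omitted for brevity.

```python
import math

SLOTS = 7  # diagonal + up to six face neighbours of a hexahedral cell


class SevenSlotMatrix:
    """Fixed-slot sparse storage: n x 7 values and column indices, 1-byte fill count per row."""

    def __init__(self, n):
        self.n = n
        self.val = [0.0] * (n * SLOTS)
        self.col = [0] * (n * SLOTS)
        self.used = bytearray(n)  # occupied slots per row, fits in one byte

    def add(self, i, j, v):
        k = self.used[i]
        if k >= SLOTS:
            raise ValueError(f"row {i} has more than {SLOTS} entries")
        self.val[i * SLOTS + k] = v
        self.col[i * SLOTS + k] = j
        self.used[i] = k + 1

    def matvec(self, x):
        out = []
        for i in range(self.n):
            b = i * SLOTS
            out.append(sum(self.val[b + k] * x[self.col[b + k]] for k in range(self.used[i])))
        return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gmres(A, b, m=8, tol=1e-10, max_restarts=50):
    """Restarted GMRES(m) without preconditioner: at most m + 1 basis vectors are stored."""
    x = [0.0] * len(b)
    bnorm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, A.matvec(x))]
        beta = math.sqrt(dot(r, r))
        if beta / bnorm < tol:
            break
        V = [[ri / beta for ri in r]]
        H = [[0.0] * m for _ in range(m + 1)]
        cs, sn, g = [0.0] * m, [0.0] * m, [beta] + [0.0] * m
        kk = 0
        for k in range(m):
            w = A.matvec(V[k])
            for i in range(k + 1):  # Arnoldi step with modified Gram-Schmidt
                H[i][k] = dot(w, V[i])
                w = [wj - H[i][k] * vj for wj, vj in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            H[k + 1][k] = h_next
            for i in range(k):  # apply earlier Givens rotations to the new column
                t = cs[i] * H[i][k] + sn[i] * H[i + 1][k]
                H[i + 1][k] = -sn[i] * H[i][k] + cs[i] * H[i + 1][k]
                H[i][k] = t
            d = math.hypot(H[k][k], H[k + 1][k])  # new rotation zeroes H[k+1][k]
            cs[k], sn[k] = H[k][k] / d, H[k + 1][k] / d
            H[k][k], H[k + 1][k] = d, 0.0
            g[k + 1], g[k] = -sn[k] * g[k], cs[k] * g[k]
            kk = k + 1
            if abs(g[k + 1]) / bnorm < tol or h_next == 0.0:
                break
            V.append([wj / h_next for wj in w])
        y = [0.0] * kk  # back substitution on the upper-triangular system
        for i in range(kk - 1, -1, -1):
            y[i] = (g[i] - sum(H[i][j] * y[j] for j in range(i + 1, kk))) / H[i][i]
        for j in range(kk):
            x = [xi + y[j] * vi for xi, vi in zip(x, V[j])]
    return x


def idx(i, j, k, n=4):
    return (i * n + j) * n + k


# Hypothetical test: non-symmetric, diagonally dominant 7-point operator on a 4^3 grid.
N = 4
A = SevenSlotMatrix(N**3)
stencil = [((-1, 0, 0), -1.2), ((1, 0, 0), -0.8), ((0, -1, 0), -1.2),
           ((0, 1, 0), -0.8), ((0, 0, -1), -1.2), ((0, 0, 1), -0.8)]
for i in range(N):
    for j in range(N):
        for k in range(N):
            A.add(idx(i, j, k), idx(i, j, k), 6.5)
            for (di, dj, dk), c in stencil:
                if 0 <= i + di < N and 0 <= j + dj < N and 0 <= k + dk < N:
                    A.add(idx(i, j, k), idx(i + di, j + dj, k + dk), c)
x_true = [math.sin(0.1 * r) for r in range(N**3)]
x = gmres(A, A.matvec(x_true))
print(max(abs(p - q) for p, q in zip(x, x_true)))
```

The restart length m bounds the memory for the Krylov basis at m + 1 vectors, at the price of possibly more iterations than full GMRES.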
The hybrid mesh capability requires an interface between the unstructured
and structured meshed areas, which is realized using the already available Chimera
interpolation method. The method enables the interpolation between arbitrary grid
overlaps by transforming the meshes into point clouds. Therefore, the currently
implemented Chimera method is already capable of handling the overlap between a
structured and an unstructured meshed area.
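The text describes the Chimera coupling only as an interpolation between overlapping meshes treated as point clouds. As a stand-in, the sketch below uses inverse-distance weighting over the k nearest donor points. This is a generic point-cloud interpolation chosen for illustration, not FLOWer's actual Chimera interpolation; k, the weighting exponent and the donor data are hypothetical.

```python
import math


def idw_interpolate(target, donors, k=4, power=2.0):
    """Inverse-distance-weighted value at `target` from donor (point, value) pairs.

    Uses the k nearest donors and returns the donor value directly on an exact hit.
    """
    nearest = sorted(donors, key=lambda d: math.dist(target, d[0]))[:k]
    num = den = 0.0
    for point, value in nearest:
        dist = math.dist(target, point)
        if dist == 0.0:
            return value
        w = 1.0 / dist**power
        num += w * value
        den += w
    return num / den


# Hypothetical donor cloud: cell centers of a small structured block with phi = x + y.
donors = [((float(i), float(j), 0.0), float(i + j)) for i in range(3) for j in range(3)]
print(idw_interpolate((0.5, 0.5, 0.0), donors))  # ~1.0: mean of equidistant donors 0, 1, 1, 2
```

Plain inverse-distance weighting of this kind gives no conservation guarantee by itself.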
Adding Hybrid Mesh Capability to a CFD-Solver for Helicopter Flows 465
Besides the numerical methods required for processing unstructured grids,
the efficient parallelization of the work is a key feature of the code to be applicable
to current and upcoming research and development. Therefore, the implementation
is integrated into the existing parallelization process for structured grids. By splitting
the grid into sub-grids, so-called blocks, the grid can be distributed over several
computation units that execute the numerics of the sub-grids separately. With
so-called ghost layers, which consist of dummy cells at the block boundaries,
information is exchanged between the different sub-grids, enabling the exchange
of numerical fluxes between blocks. For structured grids, the block splitting
is performed in the grid generation tool. Since this functionality is not
available for unstructured grids in the grid generation programs used, a
preprocessing tool was created. The computational workload is defined by the number
of cells of the sub-grid/block. With the desired number of cells per block
and the grid in CGNS format as input, a block splitting using the widely used METIS
library is performed. The resulting mesh is prepared for the FLOWer simulation
with specific output, including the ghost-layer information used for data exchange
over the block boundaries. For hybrid meshes, a workload factor accounts for the
additional workload of the unstructured numerics compared to the structured methods
during the distribution of the grid blocks. This approach ensures an equal workload for each
process. The workload factor is determined by comparing a single-core single-block computation
applying structured numerics with a computation applying unstructured
numerics to a hexahedral mesh: the computation time of an iteration is measured,
and the workload factor of the unstructured computation is given by its ratio
to the computation time using the structured numerics. Compared to the standard
second-order JST scheme, the unstructured computation requires 2.5 times more
computational effort. This equals the additional effort required for the higher-order
WENO scheme, which is extensively used in current helicopter simulations.
With these factors considered during load balancing, the unstructured approach has no
influence on the parallelization logic.
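The effect of the workload factor on load balancing can be sketched as follows: each block's cost is its cell count, multiplied by 2.5 for unstructured blocks (the factor measured against the JST scheme). The greedy longest-processing-time assignment used here is a simple stand-in for illustration, not the distribution algorithm of the code, and the block names and sizes are hypothetical.

```python
import heapq

UNSTRUCTURED_FACTOR = 2.5   # measured cost ratio unstructured / structured (JST), from the text


def block_cost(cells, unstructured):
    return cells * (UNSTRUCTURED_FACTOR if unstructured else 1.0)


def distribute(blocks, n_procs):
    """Greedy LPT (stand-in): give the most expensive remaining block to the least loaded process."""
    heap = [(0.0, p, []) for p in range(n_procs)]
    heapq.heapify(heap)
    for name, cells, unstr in sorted(blocks, key=lambda b: -block_cost(b[1], b[2])):
        load, p, owned = heapq.heappop(heap)
        owned.append(name)
        heapq.heappush(heap, (load + block_cost(cells, unstr), p, owned))
    return sorted(heap, key=lambda h: h[1])


# Hypothetical blocks: (name, cells, is_unstructured)
blocks = [("airframe-1", 500_000, False), ("airframe-2", 500_000, False),
          ("inlet-unstr", 200_000, True), ("rotor-1", 300_000, False),
          ("rotor-2", 300_000, False), ("offbody", 700_000, False)]
for load, p, owned in distribute(blocks, 3):
    print(p, load, owned)
```

Here the 200,000-cell unstructured inlet block is weighted like a 500,000-cell structured block; ignoring the factor would leave its process underestimated by 300,000 cell-equivalents.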
4 Validation Case
In this section, a test case is presented showing the successful validation of the
implementation. A representative test case for the numerical challenges faced by a
CFD code is the computation of the viscous flow over a forward-facing step (FFS).
The front side of the step leads to a stagnation of the flow including a recirculation
area. The sharp edge challenges the numerics in their capability of representing
viscous effects, resulting in flow separation with a long recirculation vortex on the
step’s upper side. The reference for this validation is the flow field of the structured
computation using the standard second-order JST method. The simulations are
performed using 3-D meshes with equal mesh resolutions. However, the flow over a
466 U. Kowarsch et al.
FFS has a two-dimensional flow characteristic. The free-stream Mach number is set
to 0.2, leading to a subsonic flow with very slight compressibility effects, representing
a typical onflow towards a helicopter geometry. The same turbulence model (Wilcox
k-ω) is applied for the unsteady RANS simulations to concentrate on the differences
due to the flux computation approaches.
Figure 1 shows the comparison of the resulting flow fields using the different
computation methods with comparable meshes. For the unstructured computation,
the PLR scheme is used in combination with the Venkatakrishnan limiter. The flow
characteristics show very good agreement between the structured and unstructured
computations. The recirculation area in front of the step is comparable in shape
and magnitude. Most important for the flow field characteristic of the FFS is the
separation behind the step. In this area, there is very good agreement
in terms of the extent of the separation vortex in the wall-normal and downstream
directions. The position of the reattachment point of the flow is similar. An additional
important characteristic is the increase of momentum thickness after the step as a
result of the viscous effects over the step. Comparing the velocity profiles at the most
downstream position, good agreement in the wall-normal distance at which
the velocity drops significantly below the scaled free-stream velocity of 1.0 is
Fig. 1 Flow solution of the forward facing step validation case. (a) Unstructured computation
using PLR and VK-Limiter of 200. (b) Structured computation using JST-scheme
Fig. 2 Surface of the geometry components of the helicopter considered for the simulation
In the following section, a simulation of a helicopter flow using the hybrid mesh
approach is presented. The unstructured extension is intended to be applied in near-body
areas of geometries that are complex to mesh. The considered
simulation is a helicopter configuration including the main aerodynamic components:
the main rotor, airframe, and tail rotor (cf. Fig. 2). This configuration is
commonly used to get a first impression of the flow field as well as an estimate
of the loads acting on the helicopter.
Figure 3 shows the area where the unstructured mesh capability is applied. At the
engine inlet, several geometric features would force a structured grid to use a
disproportionate number of grid cells to reproduce the geometry. Therefore, an
unstructured patch (red) is embedded into the structured airframe mesh in this
region, enabling fast and efficient meshing of this area. As already mentioned,
the coupling between structured and unstructured meshes is performed via the
Chimera method available in FLOWer. Therefore, overlapping grid regions are
required in which the data exchange takes place. The structured mesh region marked
in orange is contained in both the unstructured and the structured mesh, leading
to a congruent mesh area that ensures accurate and conservative data exchange. The
extrusion normal to the surface is performed in the same manner as for the structured
grid. After the boundary layer has been discretized with prisms, the unstructured
mesh topology switches to tetrahedra for further extrusion. After several boundary
layer heights, the Chimera interface to the structured Cartesian off-body mesh is
applied. On this off-body mesh, a higher-order scheme may be applied to ensure
low dissipation of the convecting flow.
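The data exchange in the overlapping Chimera regions amounts to interpolating donor-mesh values onto receptor points of the other mesh. The text does not describe FLOWer's interpolation, so the following is only a minimal sketch under stated assumptions: the donor is a uniform Cartesian grid (like the off-body mesh), values are interpolated trilinearly, and hole cutting as well as donor search on curvilinear or unstructured donors are omitted. All names are hypothetical.

```python
import numpy as np

def trilinear_interpolate(field, origin, spacing, points):
    """Interpolate node values of a uniform Cartesian donor grid to receptor points.

    field:   array (nx, ny, nz) of values at the grid nodes
    origin:  coordinates of node (0, 0, 0)
    spacing: grid spacing (dx, dy, dz)
    points:  array (m, 3) of receptor coordinates inside the donor grid
    """
    field = np.asarray(field, dtype=float)
    # Receptor positions expressed in index space of the donor grid.
    s = ((np.atleast_2d(np.asarray(points, dtype=float))
          - np.asarray(origin, dtype=float))
         / np.asarray(spacing, dtype=float))
    upper = np.array(field.shape) - 2
    # Donor cell = its lower corner node; clipping keeps points on the last face valid.
    i = np.clip(np.floor(s).astype(int), 0, upper)
    w = s - i  # local coordinates in [0, 1] inside the donor cell
    result = np.zeros(len(s))
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                weight = ((w[:, 0] if a else 1.0 - w[:, 0])
                          * (w[:, 1] if b else 1.0 - w[:, 1])
                          * (w[:, 2] if c else 1.0 - w[:, 2]))
                result += weight * field[i[:, 0] + a, i[:, 1] + b, i[:, 2] + c]
    return result
```

Trilinear weights reproduce linear fields exactly, which is a convenient sanity check for any interpolation stencil used in an overlap region.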
468 U. Kowarsch et al.
Fig. 3 Application of unstructured mesh in complex geometry regions. Red marks unstructured,
green structured and orange interpolation areas. Slice made through the volume mesh
In this application, however, the second-order JST scheme is used in the
structured meshes. The simulation is performed on the Cray XC40 Hazelhen system
using 1200 cores. Both simulation strategies require comparable computational
effort: the higher computational workload of the unstructured mesh treatment is
compensated by the slightly lower number of grid cells compared with structured
meshing. In summary, however, the main benefit lies in the human workload during
mesh generation, which is significantly lower for unstructured grids.
5.2 Evaluation
For evaluation purposes, the flow field computed with the hybrid mesh approach is
compared with a simulation on a purely structured grid. The structured simulation
serves as a reference, since it has been extensively validated against flight
test and wind tunnel data.
Figure 4 gives an overview of the flow field in terms of a vortex visualization for
the two simulation strategies. Both simulations show very similar results, with only a
minor influence of the unstructured mesh region on the vortex field around the helicopter.
In the structured regions, which contain the characteristic blade tip vortices, there
are no substantial differences between the two cases. In the region of the engine inlet,
which is discretized with an unstructured mesh in the hybrid case, slight differences
in the vortex field around the helicopter occur, and minor differences are found in the
flow separation region behind the edge downstream of the inlet. A more detailed view
of the flow field around the inlet is given in Fig. 5, where a slice through the engine
inlet plane shows the pressure levels for the two simulation methods. In both cases, a
region of higher pressure is found in front of the inlet, caused by the passing
blade at the considered time step. The expected character of the inlet is reflected by
positive pressure in the stagnation zone and subsequent flow separation with negative
pressure after the edge to the engine cowling further downstream. Both cases show a
comparable pressure magnitude, leading to similar flow characteristics. The flow field
further downstream along the fuselage shows the same properties, implying that the
different mesh approaches at the engine inlet introduce no substantial disturbances.
Fig. 4 Comparison of the flow field in terms of vortex visualization using the λ2-criterion (red:
hybrid, green: structured)
Fig. 5 Comparison of a slice through the engine inlet plane coloured with the pressure
Besides the implementation of further numerical methods, the current state of the
code was investigated with regard to its optimization potential on HPC systems. In
the course of a “bring your own code” optimization workshop organized by HLRS and
Cray in Stuttgart, the code was profiled in detail to identify bottlenecks and weak
points. The main focus was on highly parallel computations beyond 1000 nodes on the
Cray XC40 Hazelhen system at HLRS. By exploiting optimization potential in the MPI
communication, the overall performance was improved by 5 % through the use of
sub-world communicators for the parallelization, as presented in last year's HELISIM
annual report [7]. The strong- and weak-scaling characteristics of the code are the
same as presented in [7]. Further, manual loop decomposition and restructuring
improved cache reuse in runtime-relevant routines, leading to an additional performance
increase of 20 %. This speed-up, which is independent of the code parallelization,
enables more efficient use of the resources available on each core.
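The text does not state how the sub-world communicators were formed. A common pattern is to split MPI_COMM_WORLD with MPI_Comm_split (in mpi4py, `comm.Split(color, key)`), for example one communicator per node or per group of mesh blocks. The serial sketch below only emulates the rank-assignment rule of MPI_Comm_split (same color means same communicator; new ranks ordered by key, ties broken by the old rank; an undefined color gives no communicator), so that a grouping can be checked without an MPI installation. The grouping by four ranks is a hypothetical example, not FLOWer's layout.

```python
from collections import defaultdict

def comm_split(colors, keys=None):
    """Serial emulation of MPI_Comm_split rank assignment.

    colors[r] is the color chosen by old rank r (None plays the role of
    MPI_UNDEFINED); keys[r] is its ordering key. Returns
    {old_rank: (color, new_rank)} for every rank that joins a sub-communicator.
    """
    if keys is None:
        keys = [0] * len(colors)
    groups = defaultdict(list)
    for rank, (color, key) in enumerate(zip(colors, keys)):
        if color is None:
            continue  # like MPI_UNDEFINED: this rank gets no new communicator
        groups[color].append((key, rank))
    mapping = {}
    for color, members in groups.items():
        # Order by key; equal keys are ordered by rank in the parent communicator.
        for new_rank, (_, old_rank) in enumerate(sorted(members)):
            mapping[old_rank] = (color, new_rank)
    return mapping

# Hypothetical example: 8 ranks, one sub-communicator per group of 4 consecutive ranks.
groups = comm_split([rank // 4 for rank in range(8)])
```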
Overall, the workshop provided a valuable transfer of knowledge from HLRS and Cray
staff to the users of the Cray XC40 Hazelhen system, giving them deeper insight into
how to use the system’s capabilities most efficiently.
7 Conclusions
The paper presents the implementation of a hybrid mesh treatment in FLOWer, a CFD
code formerly restricted to block-structured meshes. Various numerically optimized
algorithms are applied for the computationally efficient handling of cell topologies
up to hexahedra. Since the code is used on highly parallel systems, the extensions
are embedded into its communication structure to enable massively parallel
computations. The computational effort of the second-order unstructured computation
is found to be equal to that of a fifth-order structured computation, which can be
applied simultaneously in the structured mesh regions. Validation of the code on a
forward-facing step case shows very good results for the hybrid mesh approach. A full
helicopter simulation with an unstructured engine inlet mesh shows that the hybrid
mesh approach represents the physical behaviour with good accuracy, enabling the
discretization of complex areas with unstructured meshes.
Acknowledgements The investigations are based on the long-standing cooperation with the High
Performance Computing Center Stuttgart (HLRS), which provided us with support and service to
perform the computations on its high performance computing system Cray XC40 Hazelhen. We
gratefully acknowledge the German Aerospace Center (DLR) for making its CFD code FLOWer
available to us for further development and research purposes.
1. Barth, T.J.: Aspects on unstructured grids and finite volume solvers for the Euler and Navier-
Stokes equations, AGARD report 787, pp. 6.1–6.61. VKI special course on unstructured grid
methods for advection dominated flows (1992)
2. Barth, T.J., Jespersen, D.C.: The design and application of upwind schemes on unstructured
meshes. AIAA paper 89-0366 (1989)
3. Busch, R.E., Wurst, M.S., Keßler, M., Krämer, E.: Computational aeroacoustics with higher
order methods. In: Nagel, W.E., Kröner, D.H., Resch, M. (eds.) High Performance Computing
in Science and Engineering ’12, pp. 239–253. Springer, Berlin/New York (2012)
4. Jameson, A.: Time dependent calculations using multigrid, with applications to unsteady flows
past airfoils and wings. In: Proceedings of the 10th AIAA Computational Fluid Dynamics
Conference, Honolulu (1991)
5. Jameson, A., Schmidt, W., Turkel, E.: Numerical solution of the Euler equations by finite
volume methods using Runge-Kutta time-stepping schemes. In: 14th AIAA Fluid and Plasma
Dynamic Conference, Palo Alto (1981)
6. Kranzinger, P.P., Keßler, M., Krämer, E.: Advanced CFD-CSD coupling – generalized, high
performant, radial basis function based volume mesh deformation algorithm for structured,
unstructured and overlapping meshes. In: Proceedings of the 40th European Rotorcraft
Conference, Southampton (2014)
7. Kranzinger, P.P., Kowarsch, U., Schuff, M., Keßler, M., Krämer, E.: Advances in parallelization
and high-fidelity simulation of helicopter phenomena. In: Nagel, W.E., Kröner, D.H., Resch,
M. (eds.) High Performance Computing in Science and Engineering ’15, Stuttgart (2015)
8. Kroll, N., Eisfeld, B., Bleeke, H.M.: The Navier-Stokes Code FLOWer. Notes on Numerical
Fluid Mechanics, vol. 71, pp. 58–68. Vieweg, Braunschweig/Wiesbaden (1999)
9. Liu, X.-D., Osher, S., Chan, T.: Weighted essentially non-oscillatory schemes. J. Comput. Phys.
115, 200–212 (1994)
10. Saad, Y., Schulz, M.H.: GMRES: a generalized minimal residual algorithm for solving
nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7, 856–869 (1986)
11. Toro, E.F.: Riemann Solvers and Numerical Methods for Fluid Dynamics. Springer, Berlin
12. Venkatakrishnan, V.: On the accuracy of limiters and convergence to steady state solutions.
AIAA paper 93-0880 (1993)
13. Venkatakrishnan, V.: Convergence to steady-state solutions of the Euler equations on unstruc-
tured grids with limiter. J. Comput. Phys. 118, 120–130 (1995)
14. Wilcox, D.C.: Re-assessment of the scale-determining equation for advanced turbulence
models. AIAA J. 26, 1299–1310 (1988)
Direct Numerical Simulation of Heated Pipe
Flow with Strong Property Variation
1 Introduction
Previous experience [3, 9, 18, 19] shows that steep property variations and the
related complex flow phenomena are beyond the capability of Reynolds-averaged
(RANS) modeling. Even if a particular turbulence model has given satisfactory
results in a few cases, it may not perform as well in other cases. On the other
hand, only a few experimental studies have delivered detailed data on hydraulic
resistance, mean and turbulent velocity, and temperature fields. According to Yoo
[19], the technical difficulties and high cost of developing such techniques have
practically limited the progress of experimental work. Jackson [10] suggested
using high-fidelity DNS or LES to investigate heat transfer to supercritical fluids
and to provide a reliable database for model validation and improvement, an
approach whose feasibility has been demonstrated by He et al. [9]. To the authors'
knowledge, no DNS of supercritical fluid flow in a horizontal pipe, which would offer
insight into the detailed flow mechanisms without any turbulence modeling, has been
published. The current study aims at elucidating the flow pattern of a heated
supercritical fluid in a horizontal pipe. Various simulation conditions are reported.
The pipe diameter is set to D = 1 and 2 mm, which lies in the range of printed circuit
heat exchanger (PCHE) channels. The main focus is the influence of buoyancy on the
heat transfer and flow turbulence of the supercritical fluid.
2 Computational Details
In the present DNS study, supercritical CO2 in the pipe is strongly heated by a
constant and uniform wall heat flux qw, which leads to significant property
variations. The Navier-Stokes equations are therefore formulated in low-Mach-number
form (Eqs. 1, 2, and 3), which includes the compressibility effect due to temperature
changes at constant pressure P0. Li et al. [11] used the fully compressible
Navier-Stokes equations in a DNS of supercritical CO2 and confirmed the validity of
the low-Mach-number assumption in low-Mach cases. This form of the governing
equations has also been applied by other authors [2, 13] in this area:
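$$\frac{\partial \rho}{\partial t} + \frac{\partial (\rho U_j)}{\partial x_j} = 0 \qquad (1)$$

$$\frac{\partial (\rho U_i)}{\partial t} + \frac{\partial (\rho U_i U_j)}{\partial x_j} = -\frac{\partial P}{\partial x_i} + \frac{\partial}{\partial x_j}\left[\mu\left(\frac{\partial U_i}{\partial x_j} + \frac{\partial U_j}{\partial x_i}\right)\right] \pm \rho g \delta_{i1} \qquad (2)$$

$$\frac{\partial (\rho h)}{\partial t} + \frac{\partial (\rho U_j h)}{\partial x_j} = \frac{\partial}{\partial x_j}\left(\lambda \frac{\partial T}{\partial x_j}\right) \qquad (3)$$

$$h = h(P_0, T),\quad T = T(P_0, h),\quad \rho = \rho(P_0, h),\quad \mu = \mu(P_0, h),\quad C_p = C_p(P_0, h),\quad \lambda = \lambda(P_0, h).$$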
Heat Transfer of Supercritical CO2 Using DNS 475
The governing equations (1)–(3) are discretized with the open-source
finite-volume code OpenFOAM V2.4 [16]. The Pressure-Implicit with Splitting
of Operators (PISO) algorithm is applied for the pressure-velocity coupling. The
temporal term is discretized with a second-order implicit differencing scheme.
The spatial discretization uses a central differencing scheme, while the
third-order upwind scheme QUICK is adopted for the convective term in the energy equation.
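For reference, on a uniform one-dimensional grid the QUICK face value and a second-order implicit time derivative take the forms sketched below. The time scheme is assumed here to be a BDF2-type scheme such as OpenFOAM's `backward`; the text does not name it. This is a minimal sketch assuming uniform spacing and a constant time step; OpenFOAM's own implementations (non-uniform meshes, limiting) may differ, and the function names are hypothetical.

```python
def quick_face_value(phi_uu, phi_u, phi_d):
    """QUICK face value on a uniform 1D grid for flow from phi_u towards phi_d.

    phi_uu: far-upstream node, phi_u: upstream node, phi_d: downstream node.
    Quadratic upwind-biased interpolation: 6/8 phi_U + 3/8 phi_D - 1/8 phi_UU.
    """
    return 0.75 * phi_u + 0.375 * phi_d - 0.125 * phi_uu

def backward_ddt(phi_new, phi_old, phi_older, dt):
    """Second-order backward (BDF2) time derivative for a constant time step."""
    return (3.0 * phi_new - 4.0 * phi_old + phi_older) / (2.0 * dt)
```

Both formulas are exact for quadratic data, which is what gives them third-order (QUICK, interpolation error) and second-order (BDF2, derivative error) accuracy respectively.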
Figure 1 shows the pipe geometry and the boundary conditions. At the inlet,
an inflow generator of length L1 = 5D with an isothermal wall is used
to generate approximately fully developed inflow turbulence. A recycling/rescaling
procedure [12], which does not require a priori knowledge of the turbulent flow
profiles, is applied in this domain. To accelerate the development of turbulence,
the velocity field is initialized with the perturbation method introduced by Schoppa
and Hussain [15]. In the second section of the pipe (L2 = 30D), a constant wall heat
flux qw is applied. At the outlet, the convective boundary condition

$$\frac{\partial \phi}{\partial t} + U_c \frac{\partial \phi}{\partial z} = 0$$

is applied to the velocity field, where φ can be any dependent variable, e.g.
the velocity U.
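A discrete form of the convective outlet condition shows how it lets flow structures leave the domain. The sketch assumes implicit Euler in time and a first-order upwind difference between the outlet face and the adjacent interior cell; the text specifies neither the discretization nor how U_c is chosen (the bulk velocity is a common choice), so both are assumptions, and the function name is hypothetical.

```python
def convective_outlet_update(phi_b_old, phi_interior, u_c, dt, dz):
    """One implicit-Euler step of d(phi)/dt + U_c d(phi)/dz = 0 at the outlet face.

    (phi_b_new - phi_b_old)/dt + U_c (phi_b_new - phi_interior)/dz = 0,
    where phi_interior is the value a distance dz upstream of the outlet face.
    """
    c = u_c * dt / dz  # local Courant number of the outflow
    return (phi_b_old + c * phi_interior) / (1.0 + c)

# Repeated steps relax the outlet value towards the value arriving from upstream.
phi_b = 0.0
for _ in range(200):
    phi_b = convective_outlet_update(phi_b, 1.0, 1.0, 0.1, 0.1)
```

Because the update is implicit, it remains bounded for any Courant number c.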
The cylindrical pipe is discretized with a structured hexahedral mesh of 80 million
cells in total. The mesh resolution is identical in all simulation cases. When
converted from Cartesian to cylindrical coordinates, the resolution is equivalent to
approximately 168 × 172 × 400 (radial r, circumferential θ, axial z) for the inflow
domain and 168 × 172 × 2400 for the heated domain. The mesh is uniformly spaced in
the axial direction and refined near the wall in the radial direction with a
stretching ratio of 10, which corresponds to a dimensionless resolution of
0.11 (wall) < Δy+ < 1.1 (center), (RΔθ)+ ≈ 6.5 and Δz+ = 4.6 in wall units,
i.e., y+ = y u_τ,0/ν0, where u_τ = √(τw/ρ), based on the inlet Reynolds number
Re0 = U0 D/ν0 = 5400. Compared with the DNS study of Bae et al. [2] under the
same simulation conditions, except for the vertical orientation of the pipe, the
current DNS has a considerably finer resolution in all three spatial directions and
in time, while both studies are second-order accurate. In total, the heated domain
contains about ten times as many cells as that of Bae et al. [2]. At the outlet of the
pipe, the rise in Reynolds number must be taken into account in the mesh resolution;
even there, the dimensionless resolution is still finer than that of the reference work
at the inlet, especially in the radial and streamwise directions. Therefore, the current
mesh is expected to be fine enough for these simulation conditions. In post-processing,
the data must be transformed from Cartesian to cylindrical coordinates. The flow
statistics are obtained by time averaging.
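The wall-unit resolutions above follow from the cell counts once a friction Reynolds number is assumed. The sketch below assumes Re_τ = u_τR/ν ≈ 180, a typical value for pipe flow near Re = 5300 but not stated in the text, and a geometric progression for the radial stretching with largest/smallest cell = 10. Both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np

def geometric_spacing(n_cells, length, stretch):
    """Cell sizes growing geometrically so that largest/smallest == stretch."""
    growth = stretch ** (1.0 / (n_cells - 1))  # ratio between neighbouring cells
    sizes = growth ** np.arange(n_cells)
    return sizes * length / sizes.sum()

def wall_units(re_tau, n_theta, n_z, l_over_d):
    """Circumferential and axial spacing in wall units; re_tau = u_tau * R / nu."""
    d_plus = 2.0 * re_tau                      # pipe diameter in wall units
    r_dtheta_plus = np.pi * d_plus / n_theta   # wall arc length 2*pi*R+ per cell
    dz_plus = l_over_d * d_plus / n_z          # uniform axial spacing
    return r_dtheta_plus, dz_plus

sizes = geometric_spacing(50, 1.0, 10.0)
r_dtheta_plus, dz_plus = wall_units(180.0, 172, 2400, 30.0)
```

With these assumptions, the heated-domain counts (172 circumferential and 2400 axial cells over 30D) give Δz+ = 4.5 and (RΔθ)+ ≈ 6.6, close to the values quoted above.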
This numerical procedure has been applied to the DNS of a heated vertical pipe flow
of air at Re0 = 4200 and 6000 [5], where the DNS was validated against experimental
results. The variable properties of air are comparable with those of supercritical
CO2. Various flow statistics, including heat transfer results and flow profiles, match
the experimental data well. In addition, vertical pipe flow cases with supercritical
CO2 have been investigated in our previous study [4] and validated against existing
DNS work [2, 13]; significant flow relaminarization and transition were observed
in that study. Furthermore, the obtained turbulence data are being used for advanced
turbulence modeling by Pandey and Laurien [14].
The simulation conditions are listed in Table 1. For the same inlet Reynolds number
Re0 = 5400, the pipe diameter D and the wall heat flux qw are set to different values.
The pipe diameter is considered to be an important parameter for the buoyancy effect.
The fixed wall heat flux qw results in a streamwise distribution of the wall
temperature Tw. The dimensionless heat flux q+ is defined as
q+ = qw/(ρ0 U0 Cp,0 T0). In the forced convection case SC230F, buoyancy is eliminated
entirely by omitting the gravity term (g = 0) in Eq. (2).
The resolution of the present DNS exceeds that of the previously used reference
DNS of Eggels et al. [8]. Therefore, the quality of the inflow turbulence is validated
against the better-resolved reference DNS data of Wu and Moin [17]. Their DNS was
obtained with a second-order finite difference method using 256 × 512 × 512 grid
points (r, θ and z direction) in a pipe of length L = 7.5D at Re = 5300. The
root-mean-square velocity fluctuations in the three directions, in dimensionless form
U+ = U/u_τ, are shown in Fig. 2. The best agreement is observed in the axial direction z,
because the current dimensionless resolution Δz+ = 4.5 is similar to, and even slightly
better than, that of the reference work (Δz+ = 5.3). In the circumferential direction θ,
a small difference is observed because of the coarser resolution ((RΔθ)+ = 6.5
compared with 2.2 in Wu and Moin [17]).
Fig. 2 Inflow turbulence validation: dimensionless velocity fluctuations U+rms in the r, θ and z
directions; lines: current DNS at Re0 = 5400, symbols: DNS data from Wu and Moin [17] at Re = 5300
Fig. 3 Development of Tw (a) and Cf (b) in downstream direction, forced-convection case SC230F
shows no differences in the circumferential direction
In the turbulence statistics below, the mean quantities are defined by Reynolds and
Favre averaging, where $\bar{\phi}$ is the Reynolds average of any quantity $\phi$ and
$\tilde{\phi} = \overline{\rho\phi}/\bar{\rho}$ is the mass-weighted (Favre) average. The corresponding
fluctuations are denoted by $\phi' = \phi - \bar{\phi}$ and $\phi'' = \phi - \tilde{\phi}$. Figure 4 shows the
downstream development of various mean flow quantities for SC230. From top to bottom,
the velocity $\tilde{U}_z/U_{z,0}$, temperature T (K), density ρ/ρ0 and specific heat capacity
Cp/Cp,0 are presented. In the following subsections, each case is discussed separately.
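The difference between the Reynolds and Favre averages defined above can be checked on sampled data. The minimal sketch below assumes a time series of density and some quantity at a single point. The synthetic samples (density taken as inversely proportional to temperature) are purely illustrative and are not DNS data; the function name is hypothetical.

```python
import numpy as np

def reynolds_favre(rho, phi):
    """Reynolds mean, Favre mean and both fluctuations of a sampled quantity phi."""
    rho = np.asarray(rho, dtype=float)
    phi = np.asarray(phi, dtype=float)
    phi_bar = phi.mean()                          # Reynolds average
    phi_tilde = (rho * phi).mean() / rho.mean()   # Favre (mass-weighted) average
    return phi_bar, phi_tilde, phi - phi_bar, phi - phi_tilde

# Illustrative samples: density taken as inversely proportional to temperature.
rng = np.random.default_rng(0)
temperature = 300.0 + 5.0 * rng.standard_normal(10_000)
density = 1.0 / temperature
t_bar, t_tilde, t_prime, t_dprime = reynolds_favre(density, temperature)
```

By construction, the Reynolds fluctuation has zero plain mean, while the Favre fluctuation has zero density-weighted mean. Because hot samples carry less mass here, the Favre-averaged temperature lies below the Reynolds average.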
Compared with SC160, the stronger buoyancy effect in SC230 leads to a deformation
of the mean velocity profile, as can be seen in the first row of Fig. 4. At
z = 10D, high-velocity flow with low density begins to concentrate in the bottom
section, while low-velocity flow with low density occupies the upper part of the pipe
cross section. At this position, the high-velocity flow takes a crescent shape. At
z = 15D and 20D, a small area of high-velocity flow develops close to the top wall,
and it connects with the main high-velocity region at z = 25D, where the high-velocity
flow takes an anchor shape. A quantitative analysis of the velocity field at z = 25D
is shown in Fig. 5a. At θ = 0°, a velocity peak is observed at about r/R = 0.75,
which corresponds to the high-velocity region near the top wall. In contrast, the
velocity profile at θ = 45° shows low values from r/R = 0.4 to r/R = 0.9, which is
also visible in Fig. 4. This can be explained by the secondary flow: low-velocity
fluid close to the circumferential wall flows upwards due to its low density and
drops down at about θ = 45°, so that a low-velocity region develops there. The
stratification of the temperature field is similar to that observed in SC160. The hot
fluid gathers near the top wall and differs significantly in temperature from the cold
fluid at the bottom. Compared with SC160, this hot layer is thicker. This change in
the temperature field is also reflected in the density field in the third row. Due to
buoyancy, high-temperature CO2 with low density concentrates in the upper part of the
cross section. With the continued input of wall heat, the low-density layer grows in
the downstream direction.
Vector plots of the 2-D mean velocity field over the cross section are given
in Fig. 6. The lines are colored by the normalized density field ρ/ρ0.
480 X. Chu et al.
Fig. 6 Vector plot of the two-dimensional average velocity field of SC230 at various downstream positions
Figure 7 shows the evolution of the turbulent kinetic energy $\mathrm{TKE} = \frac{1}{2}\widetilde{U_i'' U_i''}$,
which indicates the intensity of the velocity fluctuations, in the downstream direction.
In general, the TKE decreases in the downstream direction in all three cases. Because
the inlet Reynolds number Re0 = 5400 is the same, the cases are expected to give a
similar distribution of TKE in the inlet section. After five diameters downstream, the
TKE decreases fastest in SC260. Moreover, the TKE in SC260 is no longer homogeneous
in the circumferential direction: near the top wall, a region of low TKE appears, which
is less pronounced in SC230 at this position. In SC230, the ring of high TKE starts to
deform at about z = 10D. It is broken by the low-TKE region near the top wall and bent
toward the pipe center at the breakpoints. A similar distribution of the high-TKE ring
is observed further downstream at z = 15D, 20D and 25D. In SC160, a reduction of the
TKE near the top wall is also observed, starting at about z = 15D. The TKE
distribution in SC260 is qualitatively similar to that in SC230, but starting from
z = 20D a region of high TKE builds up near the top wall, which cannot be clearly
identified in SC230.
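Consistent with the Favre decomposition used here, the TKE can be evaluated from velocity samples as sketched below, assuming time series of density and the three velocity components at one point. This is an illustration, not the authors' post-processing code, and the function name is hypothetical.

```python
import numpy as np

def favre_tke(rho, u_r, u_theta, u_z):
    """TKE = 0.5 * Favre average of u_i'' u_i'' from time series at one point."""
    rho = np.asarray(rho, dtype=float)
    rho_bar = rho.mean()
    k = 0.0
    for u in (u_r, u_theta, u_z):
        u = np.asarray(u, dtype=float)
        u_tilde = (rho * u).mean() / rho_bar      # Favre mean of this component
        u_dd = u - u_tilde                        # Favre fluctuation u''
        k += (rho * u_dd * u_dd).mean() / rho_bar  # Favre average of u''^2
    return 0.5 * k
```

For constant density, the result reduces to half the sum of the ordinary variances of the three components.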
A quantitative analysis of the TKE at z = 25D in various circumferential
directions is shown in Fig. 8. The profile of the isothermal flow at z = 0D is given
as the line with symbols for reference. At z = 25D, the TKE in all circumferential
directions is reduced compared with that of the isothermal flow in all cases. In the
direction θ = 0°, the original near-wall TKE peak disappears in SC160 and SC230. In
SC260, the TKE shows two peaks instead of a single peak in this direction. The peak
near the wall (0.8 < r/R < 0.9) corresponds to the recovery of TKE seen in the last
panel of the third row of Fig. 7. This is also the position where a strong velocity
gradient caused by flow acceleration is observed in Fig. 4. In SC230 and SC260, a
broad peak away from the wall (0.6 < r/R < 0.8) is found in the direction θ = 45°,
which is absent in SC160.
Fig. 8 TKE/τw,0 at z = 25D of SC160 (a), SC230 (b) and SC260 (c); the legend is identical to that shown
in (a)
The shear production rate of turbulent kinetic energy (Pk) at various circumferential
positions at z = 25D is shown in Fig. 9, where Pk is defined as
$P_k = -\widetilde{U_i'' U_j''}\,\partial \tilde{U}_i/\partial x_j$.
The isothermal flow at z = 0D is marked with symbols as a reference. In SC230,
Pk almost vanishes at θ = 0°, which explains the significantly reduced TKE at this
position in Fig. 8. The profile at θ = 45° shows a sign change near r/R = 0.8,
which is related to the secondary flow at this position. At θ = 90°, Pk has a
reduced peak value, while at 180° it shows a higher peak. In the pipe bulk region,
0 < r/R < 0.9, Pk is significantly reduced at θ = 0°, 90° and 180°. In SC260,
Pk shows a slight double peak at θ = 0°. The first peak near the wall can
be explained by the increased velocity gradient caused by flow acceleration, as
shown in Fig. 5. At θ = 45°, the Pk peak shifts to r/R = 0.7 under the influence of
the secondary flow. At θ = 90° and 180°, a narrow peak with a maximum close to the
original value is observed.
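To make the definition of Pk concrete: along a single radial line, and assuming that the mean streamwise velocity varies only in r, the dominant contribution reduces to Pk ≈ −(Favre-averaged U_z''U_r'') dŨz/dr. The sketch evaluates this term with `np.gradient`. It deliberately neglects the circumferential gradients and secondary-flow contributions that the discussion above shows to matter at θ = 45°. Names and the synthetic profile in the usage are hypothetical.

```python
import numpy as np

def shear_production(r, uz_mean, uzur_stress):
    """Dominant shear-production term P_k = -<u_z'' u_r''> dU_z/dr on a radial line.

    r: radial positions, uz_mean: Favre-mean axial velocity at r,
    uzur_stress: Favre-averaged shear stress u_z'' u_r'' at r.
    """
    r = np.asarray(r, dtype=float)
    dudr = np.gradient(np.asarray(uz_mean, dtype=float), r)  # 2nd-order interior
    return -np.asarray(uzur_stress, dtype=float) * dudr

# Synthetic example: parabolic mean profile and a linearly growing shear stress.
r = np.linspace(0.0, 1.0, 101)
pk = shear_production(r, 1.0 - r**2, 0.1 * r)
```

With the mean velocity decreasing towards the wall and a positive stress, the production is positive, as in a fully developed isothermal pipe flow.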
Fig. 9 Circumferential distribution of Pk at z D 25D of SC230 (a), and SC260 (b), legend is
identical as shown in (a)
Fig. 10 Circumferential distribution of BPk at z D 25D of SC230 (a), and SC260 (b), legend is
identical as shown in (a)
4 Computational Performance
This section discusses the parallel computational performance. The computations
were run on Hazel Hen at the High Performance Computing Center Stuttgart (HLRS).
Hazel Hen is a Cray XC40 system that consists of 7712 compute nodes. Each node has
two Intel Haswell processors (E5-2680 v3, 12 cores each) and 128 GB of memory, and
the nodes are interconnected by a Cray Aries network with a Dragonfly topology. This
amounts to a total of 185,088 cores and a theoretical peak performance of 7.4 PFlops.
Fig. 11 HPC performance of the current DNS case (strong scaling, 80 million cells) on Hazel Hen:
speedup (a), efficiency (b)
The parallel scalability of the current numerical solver has been tested on the Hazel
Hen platform, as shown in Fig. 11. For the present mesh size (80 million cells), the
solver shows linear, even super-linear, scaling up to 700 cores. A considerable
speedup is still obtained at 1400 cores (80 % efficiency) and 2800 cores (60 %
efficiency). At 2800 cores, about 28,000 cells are assigned to a single core. In
production runs, simulating 10 flow-through times in the pipe takes about 4 days on
1400 cores. In the foreseeable future, the mesh will be refined to 300 million cells,
aiming at a higher Reynolds number and improved resolution of the Kolmogorov and
Batchelor scales.
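The figures quoted above can be checked with simple arithmetic. The sketch takes the mesh size, core counts and run time from the text. The speedup and efficiency helper uses the usual strong-scaling definitions relative to whatever baseline run is chosen; the text does not state the baseline used in Fig. 11.

```python
def strong_scaling(t_ref, n_ref, t_n, n):
    """Speedup and parallel efficiency of a run on n cores relative to n_ref cores."""
    speedup = t_ref / t_n
    efficiency = speedup * n_ref / n
    return speedup, efficiency

CELLS = 80e6  # mesh size quoted in the text
cells_per_core = {cores: CELLS / cores for cores in (700, 1400, 2800)}

# Production cost quoted in the text: about 4 days on 1400 cores for 10 flow-through times.
core_hours_total = 4 * 24 * 1400
core_hours_per_flow_through = core_hours_total / 10
```

This gives 80 × 10^6/2800 ≈ 28,600 cells per core, consistent with the "about 28,000" quoted. The 4 days on 1400 cores correspond to 134,400 core-hours, i.e. roughly 13,440 core-hours per flow-through time.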
5 Conclusions
In the current research, heat transfer to supercritical CO2 in a horizontal pipe has
been investigated using direct numerical simulation (DNS) for the first time. A
well-resolved DNS eliminates the uncertainty introduced by turbulence modeling and
makes it possible to examine the stratification of the turbulent flow field directly. The
small pipe diameters (D = 1 and 2 mm) and the moderately low inlet Reynolds number
(Re0 = 5400) are similar to those of the channels in printed circuit heat exchangers
(PCHE). The inlet flow temperature (T0) is slightly lower than the pseudo-critical
temperature Tpc. Several interesting results have been found and discussed. The
open-source code OpenFOAM scales well on the HPC platform Hazel Hen up to 2800
cores (60 % parallel efficiency) and shows potential for efficiently handling larger
problems with more computational resources. In contrast to the vertical orientation,
flow stratification was observed in the horizontal layout. In addition, the ‘M’-shaped
velocity profile that results from buoyancy in the vertical layout was absent in the
horizontal orientation. In the next step, the mesh will be refined to 300 million cells,
aiming at a higher Reynolds number and improved resolution of the Kolmogorov and
Batchelor scales.
1. NIST Chemistry Webbook: In: Lemmon, E., McLinden, M., Friend, D., Linstrom, P., Mallard,
W. (eds.) NIST Standard Reference Database Number 69. National Institute of Standards and
Technology, Gaithersburg (2011). http://webbook.nist.gov/chemistry/
2. Bae, J.H., Yoo, J.Y., Choi, H.: Direct numerical simulation of turbulent supercritical flows with
heat transfer. Phys. Fluids 17(10), 105104 (2005)
3. Cheng, X., Kuang, B., Yang, Y.: Numerical analysis of heat transfer in supercritical water
cooled flow channels. Nucl. Eng. Des. 237(3), 240–252 (2007)
4. Chu, X., Laurien, E.: Investigation of convective heat transfer to supercritical carbon dioxide
with direct numerical simulation. In: High Performance Computing in Science and Engineering ’15,
pp. 315–331. Springer, Cham (2016)
5. Chu, X., Laurien, E., McEligot, D.M.: Direct numerical simulation of strongly heated air flow
in a vertical pipe. Int. J. Heat Mass Transf. 101, 1163–1176 (2016)
6. Dostal, V., Driscoll, M.J., Hejzlar, P.: A supercritical carbon dioxide cycle for next generation
nuclear reactors. Ph.D. thesis, Massachusetts Institute of Technology (2004)
7. Duffey, R.B., Pioro, I.L.: Experimental heat transfer of supercritical carbon dioxide flowing
inside channels (survey). Nucl. Eng. Des. 235(8), 913–924 (2005)
8. Eggels, J.G., Unger, F., Weiss, M.H., Westerweel, J., Adrian, R.J., Friedrich, R., Nieuwstadt,
F.: Fully developed turbulent pipe flow: a comparison between direct numerical simulation and
experiment. J. Fluid Mech. 268, 175–210 (1994)
9. He, S., Kim, W.S., Bae, J.H.: Assessment of performance of turbulence models in predicting
supercritical pressure heat transfer in a vertical tube. Int. J. Heat Mass Transf. 51(19–20), 4659–
4675 (2008)
10. Jackson, J.D.: Fluid flow and convective heat transfer to fluids at supercritical pressure. Nucl.
Eng. Des. 264, 24–40 (2013)
11. Li, X., Hashimoto, K., Tominaga, Y., Tanahashi, M., Miyauchi, T.: Numerical study of heat
transfer mechanism in turbulent supercritical CO2 channel flow. J. Thermal Sci. Technol. 3(1),
112–123 (2008)
12. Lund, T.S., Wu, X., Squires, K.D.: Generation of turbulent inflow data for spatially-developing
boundary layer simulations. J. Comput. Phys. 140(2), 233–258 (1998). http://dx.doi.org/10.
13. Nemati, H., Patel, A., Boersma, B.J., Pecnik, R.: Mean statistics of a heated turbulent pipe flow
at supercritical pressure. Int. J. Heat Mass Transf. 83, 741–752 (2015)
14. Pandey, S., Laurien, E.: Heat transfer analysis at supercritical pressure using two layer theory.
J. Supercrit. Fluids 109, 80–86 (2016)
15. Schoppa, W., Hussain, F.: Coherent structure dynamics in near-wall turbulence. Fluid Dyn.
Res. 26(2), 119–139 (2000)
16. Weller, H.G., Tabor, G., Jasak, H., Fureby, C.: A tensorial approach to computational
continuum mechanics using object-oriented techniques. Comput. Phys. 12(6), 620–631 (1998)
17. Wu, X., Moin, P.: A direct numerical simulation study on the mean velocity characteristics in
turbulent pipe flow. J. Fluid Mech. 608, 81–112 (2008)
18. Yang, J., Oka, Y., Ishiwatari, Y., Liu, J., Yoo, J.: Numerical investigation of heat transfer in
upward flows of supercritical water in circular tubes and tight fuel rod bundles. Nucl. Eng.
Des. 237(4), 420–430 (2007)
19. Yoo, J.Y.: The turbulent flows of supercritical fluids with heat transfer. Ann. Rev. Fluid Mech.
45, 495–525 (2013)
CFD Analysis of Fast Transition from Pump
Mode to Generating Mode in a Reversible Pump Turbine
1 Introduction
Pumped storage power plants are an efficient way to store energy at a large scale.
Their importance increases with a growing share of renewables in the grid, as
surplus energy can be stored in times of high production and released when
demand exceeds production. An optimal storage cycle in terms of profit is around
6 h [8]. However, the current procedure for changing from one operating mode to the
other is still time-consuming. It is therefore desirable to develop faster manoeuvres.
To avoid damaging the machine, it is important to understand the flow
mechanisms during a change of operating mode. An overview of possible flow
phenomena in reversible pump turbines is given in [5]. CFD has proven to be a
suitable tool for gaining such information, and various authors have investigated
hydraulic machines under time-varying conditions such as runaway [3, 4, 6],
start-up [7] and speed-no-load [2] conditions.
This project investigates a fast transition from pump mode to generating mode
in a model-scale reversible pump turbine with a linear variation of rotational speed
and a fixed guide vane opening. Due to the comparatively large number of time steps
and the model size, a suitable parallelization is required. The present work focuses
on the simulation setup, a comparison of different meshes and the flow field analysis.
Results from a pre-study on a coarse mesh can be found in [10]. A preliminary
evaluation of related results from a model test is presented in [9], and a comparison
between simulation and experiment is published in [11].
2 Computational Mesh
Two block-structured meshes are generated for the analysis: a coarse mesh containing
approximately 2.5 million points (2.5M) and a refined mesh with 20 million
points (20M). The geometry is split into four domains: spiral case (SC), twin cascade
of stay vanes and guide vanes (SV/GV), runner (RUN) and draft tube (DT). The domains
are connected with each other via interfaces. The number of mesh points per domain is
listed in Table 1. During the meshing process, special attention was paid to keeping
the near-wall cells in the guide vanes and the runner comparable between the meshes.
Depending on the time, average y+ values lie between 20 and 65 in the runner and
between 20 and 50 in the guide vanes. In the draft tube, the coarse mesh showed
low y+ values, with the average never exceeding 30 during the transient. Therefore,
the wall distance of the first grid point was increased in the fine mesh, leading to
average values between 40 and 180.
Simulations are carried out using OpenFOAM® 2.3. The individual domains are
connected via the arbitrary mesh interface (AMI). The flow rate is prescribed at the
respective inlet, i.e. at the draft tube outlet in pump mode and at the spiral case
in generating mode, together with a zero-gradient condition for pressure. At the
remaining outlet, a constant average pressure and a zero-gradient condition for
velocity are applied. Time-varying values for flow rate and rotational speed during
the transient are prescribed via table files: a linear variation of rotational
speed is chosen, while the flow rate is determined by the test rig conditions. The
flow rate shows a large gradient during the first half of the transient, while the
change in operating mode is moderate. The guide vane angle is fixed at 25°.
CFD Analysis of Fast Transition from Pump Mode to Generating Mode 489
Figure 1 shows the transient in a four-quadrant plot. The transient starts in the
lower left corner with negative flow rate and negative rotational speed, i.e. normal
pump mode. As the rotational speed decreases, the machine passes to the next
quadrant. Flow direction is now from the spiral to the draft tube, while the runner
continues its rotation in the same direction as before (pump brake or dissipation
mode). Finally, the runner reverses its rotational direction and the upper right
quadrant is reached. This represents normal generating mode.
The solution procedure follows the SIMPLE algorithm, a semi-implicit segregated
approach. To account for the time dependent behaviour, the transient
solver transientSimpleDyMFoam is employed, where DyM indicates the solver’s
capability to deal with moving meshes. The k-omega-SST model is chosen for
turbulence modelling.
Discretisation is first order accurate in time, first/second order accurate for
the convection term and first order accurate for turbulent quantities. Higher order
schemes could not produce converged solutions under the highly unsteady flow
conditions.
The time step is constant throughout the complete transient to facilitate an FFT of
the result quantities. It is chosen to be 5 × 10⁻⁴ s, equalling 2.8° per time step at the maximum
rotational speed at the beginning of the transient. This leads to a maximum CFL
number of 110 at the beginning and 65 at the end of the transient. Peak values of
130 and high amplitude fluctuations are found at the beginning of generating mode
between 5.8 and 6.5 s.
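The time step figures above are linked by Δθ = 360° · n · Δt, with n the rotational speed in revolutions per second, and the convective Courant number is CFL = |u| Δt / Δx. The short sketch below inverts the first relation; the helper names are illustrative and not taken from the study. Because 2.8° is a rounded value, the speed it recovers (about 15.6 rev/s, roughly 933 rpm) is only an implied estimate, and the velocity and cell size in the last line are hypothetical.

```python
def angle_per_step_deg(n_rev_per_s: float, dt_s: float) -> float:
    """Runner rotation per time step in degrees: 360 * n * dt."""
    return 360.0 * n_rev_per_s * dt_s


def speed_from_angle(angle_deg: float, dt_s: float) -> float:
    """Invert the relation above to obtain the rotational speed in rev/s."""
    return angle_deg / (360.0 * dt_s)


def courant_number(velocity_m_s: float, dt_s: float, dx_m: float) -> float:
    """Convective Courant number |u| * dt / dx of a single cell."""
    return abs(velocity_m_s) * dt_s / dx_m


dt = 5e-4                          # constant time step from the text [s]
n_max = speed_from_angle(2.8, dt)  # implied maximum speed [rev/s]
print(f"n_max = {n_max:.2f} rev/s = {60.0 * n_max:.0f} rpm")
# Hypothetical cell: u = 20 m/s through dx = 0.1 mm.
print(f"CFL = {courant_number(20.0, dt, 1e-4):.0f}")
```

The same helpers show why halving the time step, as in the comparison run below, also halves both the angle per step and the Courant number.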
The influence of time step size is tested for the unstable operating regime in
pump mode between 2 and 3 s. A time step of 2 × 10⁻⁴ s, equalling 0.9° per time step,
is used for comparison. Although the fluctuations in head, torque and pressure at
the monitor points differ over time, the mean values remain unchanged. As an example,
head is presented in Fig. 2.
490 C. Stens and S. Riedelbauch
Fig. 2 Influence of time step size on simulated head for the fine mesh
The SIMPLE algorithm requires a number of so called outer corrections, i.e. the
equations for pressure, velocity and turbulent quantities need to be solved multiple
times during each time step until a converged solution is reached. The number of
iterations is dependent on mesh size and mesh quality. A study of the development of
head and torque over the number of outer corrections during an unstable phase of the
transient leads to the conclusion that seven steps are sufficient for the coarse mesh
and eleven are required for the fine mesh. The coarse mesh additionally requires two
corrector steps due to mesh non-orthogonality. In both cases, relaxation factors of
0.3 for pressure and 0.7 for velocity and turbulent quantities are employed.
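The role of outer corrections and relaxation factors can be illustrated on a scalar toy problem: each correction computes an unrelaxed update φ* and blends it with the previous iterate, φ_new = φ_old + α (φ* − φ_old). The sketch below is not OpenFOAM code; the update function `math.cos` is a hypothetical stand-in for one segregated pass over the equations, and α = 0.7 merely mirrors the velocity relaxation factor quoted above.

```python
import math
from typing import Callable, List


def relaxed_outer_loop(update: Callable[[float], float], x0: float,
                       alpha: float, n_outer: int) -> List[float]:
    """Apply n_outer under-relaxed corrections x <- x + alpha * (update(x) - x)
    and return the iterate after each correction."""
    history: List[float] = []
    x = x0
    for _ in range(n_outer):
        x_star = update(x)            # unrelaxed result of this correction
        x = x + alpha * (x_star - x)  # blend with the previous iterate
        history.append(x)
    return history


# Toy problem (hypothetical stand-in for one segregated pass): x = cos(x).
hist = relaxed_outer_loop(math.cos, x0=0.0, alpha=0.7, n_outer=11)
for k, x in enumerate(hist, start=1):
    print(f"outer correction {k:2d}: x = {x:.8f}")
```

A smaller α damps each update more strongly, which tends to stabilise strongly coupled problems at the cost of more corrections; monitoring how the result changes with the number of corrections mirrors the head and torque study described above.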
4 Computational Resources
Simulations were run on the ForHLR 1 cluster at SCC Karlsruhe. A speedup test was
carried out for both meshes in order to investigate the scalability of OpenFOAM®
2.3 and determine a suitable number of cores for the subsequent simulations. The
results are presented in Fig. 3 for both meshes. For the coarse mesh, scalability is
nearly ideal up to 40 cores, equalling 62,500 mesh points per core. The same test
for the fine mesh reveals that this load per core cannot be reached with the larger
total number of mesh points. The behaviour is ideal only up to 80 cores and acceptable up to
120 cores, equalling 250,000 and 167,000 mesh points per core, respectively. This
leads to computation times of approximately four days for the complete transient
with the coarse mesh and 15 days with the fine mesh.
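The load-per-core figures follow from dividing mesh points by cores, and speedup curves such as those in Fig. 3 are usually expressed relative to the smallest run. The sketch below collects these relations; the helper names are illustrative, the core-hour totals are rough estimates derived from the approximate run durations quoted above, and the timings passed to `speedup` and `efficiency` in the test are hypothetical rather than values read from Fig. 3.

```python
def points_per_core(n_points: float, n_cores: int) -> float:
    """Average number of mesh points assigned to each core."""
    return n_points / n_cores


def speedup(t_ref: float, t_p: float, p_ref: int = 1) -> float:
    """Relative speedup normalised so that S(p_ref) = p_ref."""
    return p_ref * t_ref / t_p


def efficiency(t_ref: float, t_p: float, p: int, p_ref: int = 1) -> float:
    """Parallel efficiency S(p) / p; 1.0 means ideal scaling."""
    return speedup(t_ref, t_p, p_ref) / p


def core_hours(n_cores: int, wall_days: float) -> float:
    """Consumed core-hours for a run of the given wall-clock duration."""
    return n_cores * wall_days * 24.0


for label, points, cores in [("coarse", 2.5e6, 40), ("fine", 20e6, 80), ("fine", 20e6, 120)]:
    print(f"{label}: {cores} cores -> {points_per_core(points, cores):,.0f} points/core")
print(f"coarse run: ~{core_hours(40, 4):,.0f} core-h, fine run: ~{core_hours(120, 15):,.0f} core-h")
```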
For evaluation, a number of pressure monitor points are defined along the machine
as shown in Fig. 4. There are two points on each side of each runner blade, one
at the top of each guide vane channel and four in the draft tube below the runner.
The points in the spiral case and at the draft tube outlets are used to calculate head.
Additionally, simulated head and torque are analysed and compared between the two meshes.
To ensure a converged solution at each time step, initial and final residuals
are monitored during the solution. For velocity, highest residuals appear at zero
flow rate, while pressure residuals show a minimum at zero rotational speed. The
predefined final residual is reached for all variables in every time step. The number
of iterations required to meet that target in the last outer correction is approximately six for
pressure during the transient and rises up to 20 during the pump instability. Judging from
residuals and number of iterations, flow direction from spiral to draft tube seems to
be numerically more stable than the opposite direction.
Fig. 5 Simulated head and dimensionless torque over time. A constant head is used for the
normalization of torque
Simulated head and torque give a first indication of the behaviour of the machine
and are presented in Fig. 5. A comparison of the results shows that the coarse mesh
is already able to capture the general trends and agrees well with results from the
finer mesh under unstable operating conditions. In generating mode, both head and
torque show a nearly constant offset, where the finer mesh gives a lower head and
a higher absolute value of torque. The offset in head is approximately 5 % of the
reference head.
As volume flow rate decreases to small values in pump mode, large fluctuations
occur in torque and pressure on the runner blades, which continue in the first half
of the pump brake quadrant. This is a result of stall in the guide vanes. While
flow is evenly distributed between the guide vane channels in pump mode, it
concentrates on single channels during the pump instability while other passages are
nearly blocked. Figure 6 shows the torque on the guide vanes during the instability.
Irregularities can be tracked across various adjacent channels, starting from 1.9 s.
This indicates rotating stall in the guide vanes, a phenomenon that has been found
before in centrifugal pumps and pump-turbines [1, 12]. In pump mode, a passing of
the disturbances from high to low channel numbers signifies that the phenomenon is
moving in the rotational direction of the runner. The constant distance between the
lines shows that the absolute values of torque are similar for all guide vanes, with
the exception of guide vane number four, which contained an error in the setup. This
behaviour is found independently of the mesh size, but the onset of the phenomenon
is earlier in the coarse mesh. The speed of propagation decreases with decreasing
rotational speed of the runner.
Fig. 6 Disturbances in the guide vane torque signal passing through the channels during the pump
instability. The torque signal has been offset by the respective guide vane number. Left picture:
coarse mesh, right: fine mesh
Fig. 7 Flow visualization in the guide vanes at t = 2.45 s. Arrows coloured by flow velocity from
1 to 7 m/s
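One way to quantify the propagation visible in Fig. 6 is to cross-correlate the torque signals of two adjacent guide vanes and take the lag of the correlation peak as the time the disturbance needs to cross one channel; a stall cell that advances one guide-vane pitch (1/z_GV of a revolution) per lag then rotates at (1/z_GV)/lag. This is a minimal sketch, not necessarily how the figure was evaluated: it assumes equally sampled signals of equal length, and both the guide vane count `z_gv = 20` and the Gaussian test pulses are hypothetical.

```python
import math
from typing import Sequence


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Lag k in [0, max_lag] (samples) that maximises the normalised
    cross-correlation of a[t] with b[t + k], i.e. how many samples later a
    disturbance seen in channel `a` reappears in channel `b`."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    best_k, best_r = 0, -math.inf
    for k in range(max_lag + 1):
        m = n - k  # number of overlapping samples at this lag
        num = sum((a[t] - mean_a) * (b[t + k] - mean_b) for t in range(m))
        den_a = math.sqrt(sum((a[t] - mean_a) ** 2 for t in range(m)))
        den_b = math.sqrt(sum((b[t + k] - mean_b) ** 2 for t in range(m)))
        r = num / (den_a * den_b) if den_a > 0.0 and den_b > 0.0 else 0.0
        if r > best_r:
            best_k, best_r = k, r
    return best_k


def stall_speed_rev_per_s(lag_samples: int, dt: float, z_gv: int) -> float:
    """Speed of a stall cell that advances one guide-vane pitch (1/z_gv of a
    revolution) per lag; z_gv is the number of guide vanes."""
    return (1.0 / z_gv) / (lag_samples * dt)


# Synthetic signals (hypothetical): the same pulse appears 30 samples later.
a = [math.exp(-((t - 100) / 5.0) ** 2) for t in range(300)]
b = [math.exp(-((t - 130) / 5.0) ** 2) for t in range(300)]
lag = best_lag(a, b, max_lag=60)
print(lag, stall_speed_rev_per_s(lag, dt=5e-4, z_gv=20))
```

Correlating the channels in both orders (`best_lag(a, b, ...)` versus `best_lag(b, a, ...)`) and checking which ordering gives the stronger peak indicates whether the disturbance travels towards higher or lower channel numbers.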
Another indicator of rotating stall is the flow rate through each of the guide vane
channels. It gives similar results, but is less easily visualized as the differences
between the channels interfere with the disturbances caused by stall. In single
channels, zero flow or even backflow occurs while the global flow rate is still at 40 %
of its initial value. Flow visualizations as in Fig. 7 show that at the beginning, stall
occurs near the bottom ring and in the middle of the channels, while flow near the
head cover side remains stable. At lower flow rates, outward flow concentrates near
the head cover and bottom ring meridionals, with almost no flow or slight backflow
in the middle of the guide vane channels. It is therefore interesting to evaluate the
third possible variable to track rotating stall, namely pressure at the top of the guide
vane channels. While torque and flow rate describe the integral result of the flow in
the entire channel, pressure is evaluated locally. Although located in a region where
flow stays stable for a longer time, the start of irregularities in the signal coincides
in time with those in flow rate and torque.
An FFT of short periods of the signal reveals that during the rest of the transient,
flow through the guide vanes is dominated by the passing of the runner blades,
especially in pump brake mode. Here, flow is forced outward (pump direction)
near the runner blades, but inward between the runner blades. This leads to large
fluctuations in pressure and torque on the guide vanes.
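A short-window spectral analysis of this kind can be sketched as follows: remove the mean of one window, compute its discrete Fourier transform and report the strongest non-zero bin. With the seven runner blades mentioned below, blade passing should appear at f/fn = 7, where fn is the runner rotational frequency. The sketch uses a naive O(N²) DFT from the standard library instead of an FFT, and the runner speed of 15 rev/s and the synthetic signal are hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def dominant_frequency(signal: Sequence[float], dt: float) -> Tuple[float, float]:
    """Return (frequency [Hz], single-sided amplitude) of the strongest
    non-zero DFT bin of one equally sampled window (naive O(N^2) DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the mean (zero-frequency) part
    best_f, best_amp = 0.0, -1.0
    # Bins 1 .. (n-1)//2 lie strictly below the Nyquist frequency.
    for k in range(1, (n + 1) // 2):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        amp = 2.0 * abs(coeff) / n
        if amp > best_amp:
            best_f, best_amp = k / (n * dt), amp
    return best_f, best_amp


# Synthetic window (hypothetical): 7 blades at n = 15 rev/s give 105 Hz,
# plus a weaker 20 Hz component and a mean offset.
dt = 5e-4
n_runner = 15.0
times = [i * dt for i in range(400)]  # 0.2 s window -> 5 Hz bin spacing
p = [3.0 + 1.5 * math.sin(2 * math.pi * 105.0 * ti)
     + 0.5 * math.sin(2 * math.pi * 20.0 * ti) for ti in times]
f, amp = dominant_frequency(p, dt)
print(f"dominant f = {f:.1f} Hz, f/fn = {f / n_runner:.1f}, amplitude = {amp:.3f}")
```

The bin spacing equals 1/(N Δt), so shorter windows resolve the transient better in time but give a coarser frequency resolution; the constant time step chosen above is what makes this windowing straightforward.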
In the runner, the fluctuations in the torque contribution for each blade change in the
period where rotating stall in the guide vanes is detected. However, the contribution
to overall torque is still evenly distributed between all seven blades during the
relevant period from 1.9 to 2.7 s. Only at very low flow rates do the curves start to deviate
from each other. The differences are random rather than passing from one blade to the
next. The pressure sensor on the pressure side near the guide vanes confirms this observation.
As for head, a constant offset exists between the two meshes in generating mode
at the pressure side sensor, as shown in Fig. 8. On the suction side, the offset disappears
between 8.5 and 8.6 s, where the curve for the finer mesh jumps back to that of
the coarser mesh. This sudden change in the simulated pressure results from the fact
that in the upper part of the suction side, flow is able to follow the blade contour,
while in the lower part of the channel, it detaches from the suction side. The jump
signals that the border between the two has moved further downward and the sensor
has passed from the stall zone to the one with attached flow. In the coarse mesh,
the general flow is comparable to the fine mesh, but the pressure gradient along the
channel height is less steep between the two zones, resulting in a more gradual rise
of mean pressure.
Fig. 8 Pressure on the pressure side of a runner blade (left) and the suction side (right). High
pressure side (HP) close to the guide vanes, low pressure side (LP) near the draft tube
Fig. 9 (a) t = 1.0 s (pump mode). (b) t = 2.6 s (pump mode, instability). (c) t = 4.0 s
(dissipation mode). (d) t = 6.0 s (generating mode, low flow rate)
Figure 9 gives an impression of the flow field in different operating regimes. It
shows the streamlines in the midplane of the runner. The first picture shows the
starting point of the transient in pump mode, with flow evenly distributed between
the channels. As detected by the pressure sensors on the runner blades, flow during
the pump instability is slightly influenced near the guide vanes, but stable in the
rest of the runner channel. In pump brake or dissipation mode at 4 s, flow hits the
runner blades at approximately one third of chord length. Flow is strongly
three-dimensional, as vortices form around both vertical and horizontal axes.
In generating mode, vortices still exist near the guide vanes, but are more stable
in size, form and location, leading to smaller fluctuations at the respective pressure
sensors.
In the draft tube, four monitor points are located below the runner, positioned at 90°
from each other. Figure 10 provides the signal of the first pressure sensor for both
meshes. As in the other domains, there is a good agreement between the results from
the different cell sizes concerning the general behaviour.
The behaviour itself is characterized by high fluctuations that start in pump
mode and disappear after a certain flow rate and rotational speed are reached in
generating mode. An FFT of the signal shows high amplitudes at low frequencies in
the middle of the transient, but without a singular identifiable frequency that could
give evidence for a vortex rope rotating at a defined speed. Compared to the coarse
mesh, the finer mesh gives higher amplitudes at a larger number of frequencies.
Figure 11 shows the axial velocity in the draft tube on a line parallel to the draft
tube channels in the plane of the pressure sensors below the runner at different
times. A positive velocity signifies an upward flow, i.e. towards the runner. At the
beginning of pump mode, flow direction is upward in the complete draft tube. With
decreasing flow rate, it starts to detach from the draft tube walls and a swirling
flow away from the runner develops at the draft tube wall, while in the middle of
the draft tube, the fluid is moving towards the runner. The detachment starts at the
bottom of the draft tube and expands upwards until reaching the evaluation plane
at approximately 2.7 s. This causes the large fluctuations observed in the pressure
monitor points. During the transient, the region with downward flow grows until
finally the flow direction has reversed in the complete cross section.
Fig. 11 Axial velocity in the draft tube normalized to mean velocity at t = 1.0 s. Radius is
normalized to the runner outlet radius
Investigations on the flow through a model scale reversible pump turbine during
a change from pump mode to generating mode are carried out using a coarse and
a fine mesh and the open source code OpenFOAM®. Input data for flow rate and
rotational speed were taken from the experiment.
Comparing pressure at several monitor points in the machine shows that the
coarse mesh is generally able to predict tendencies in mean pressure and amplitudes
of fluctuations. However, a refined mesh gives different values e.g. on the runner
blades in generating mode. This is important for a correct prediction of the
mechanical loads caused by the fluid. As shown in [11], the values obtained from
the fine mesh are in better agreement with experimental data for head and pressure
in the guide vane channels.
The simulations for the fine mesh were run on 120 cores with a wall-clock time
of approximately two weeks. The number of cores was chosen based on a speedup
test. In future work, the simulation is to be coupled with a 1D model of the test rig,
so that no experimental data is necessary to predict the behaviour of the machine.
Acknowledgements The authors would like to thank the European Commission for funding
within the HYPERBOLE project (ERC/FP7-ENERGY-2013-1-Grant 608532). Part of this work
was performed on the computational resource ForHLR Phase I funded by the Ministry of Science,
Research and the Arts Baden-Württemberg and DFG (“Deutsche Forschungsgemeinschaft”).
1. Braun, O.: Part load flow in radial centrifugal pumps. Ph.D. thesis, STI, Lausanne (2009)
2. Casartelli, E., Mangani, L., Romanelli, G., Staubli, T.: Transient simulation of speed-no load
conditions with an open-source based C++ code. In: Proceedings of 27th Symposium of
Hydraulic Machinery and Systems, Montreal (2014)
3. Cherny, S., Chirkov, D., Bannikov, D., Lapin, V., Skorospelov, V., Eshkunova, I., Avdushenko,
A.: 3D numerical simulation of transient processes in hydraulic turbines. IOP Conf. Ser. Earth
Environ. Sci. 12(1), 012071 (2010)
4. Fortin, M., Houde, S., Deschênes, C.: Validation of simulation strategies for the flow in a model
propeller turbine during a runaway event. In: Proceedings of 27th Symposium of Hydraulic
Machinery and Systems, Montreal (2014)
5. Kerschberger, P., Gehrer, A.: Hydraulic development of high specific-speed pump-turbines by
means of an inverse design method, numerical flow-simulation (CFD) and model testing. IOP
Conf. Ser. Earth Environ. Sci. 12(1), 012039 (2010)
6. Li, J., Yu, J., Wu, Y.: 3D unsteady turbulent simulations of transients of the Francis turbine. IOP
Conf. Ser. Earth Environ. Sci. 12(1), 012001 (2010)
7. Nicolle, J., Morissette, J.F., Giroux, A.M.: Transient CFD simulation of a Francis turbine
startup. In: 26th IAHR Symposium on Hydraulic Machinery and Systems, Beijing (2012)
8. Rapp, C., Zeiselmair, A., Halblaub, A.B.: Überlegungen zur Abschätzung der
Wirtschaftlichkeit von Pumpspeicherkraftwerken. WasserWirtschaft 2, 68–74 (2016)
9. Ruchonnet, N., Braun, O.: Reduced scale model test of pump-turbine transition. In: Lipej, A.,
Muhic, S. (eds.) Cavitation and Dynamic Problems: 6th IAHR Meeting of the Working Group,
pp. 264–272. IAHR, Ljubljana (2015)
10. Stens, C., Riedelbauch, S.: CFD simulation of the flow through a pump turbine during a
fast transition from pump to generating mode. In: Lipej, A., Muhic, S. (eds.) Cavitation and
Dynamic Problems: 6th IAHR Meeting of the Working Group, pp. 264–272. IAHR, Ljubljana
11. Stens, C., Riedelbauch, S.: Investigation of a fast transition from pump mode to generating
mode in a model scale reversible pump turbine. In: Proceedings of 28th IAHR Symposium of
Hydraulic Machinery and Systems, Grenoble (2016)
12. Xia, L.S., Cheng, Y.G., Zhang, X.X., Yang, J.D.: Numerical analysis of rotating stall
instabilities of a pump-turbine in pump mode. IOP Conf. Ser. Earth Environ. Sci. 22(3), 032020
Scale Resolving Flow Simulations of a Francis
Turbine Using Highly Parallel CFD Simulations
1 Introduction
In recent years, Francis turbines have increasingly been operated at off-design
conditions. Therefore, it is important to reach a better understanding of the flow
behaviour at such operating points, like part load conditions, which are the focus of this paper.
Taking advantage of increased computational resources, transient, turbulence resolving flow
simulations using thousands of cores in parallel [9, 16] are conducted.
The flow field in the draft tube of a Francis turbine at part load conditions is
dominated by the vortex rope phenomenon. This leads to a complex and
three-dimensional flow field, which has to be resolved properly in space and time and
requires turbulence models that are able to resolve a large part of the turbulence.
In the research field of hydraulic turbines, good results have been achieved with hybrid
RANS-LES models like the SAS turbulence model [7], which is also chosen
and investigated within this paper.
2 Numerical Methods
All flow simulations of the Francis turbine were performed using different versions
(16.0, 17.0-pre-release and 17.0) of the commercial CFD code Ansys CFX [1].
The CFD code is able to handle the rotation of the turbine runner and to couple
different meshes by a general grid interface (GGI). The finite-volume method is used for
discretisation based on an implicit pressure-based formulation, with the control
volumes constructed around the mesh nodes. A coupled algebraic multigrid (AMG)
linear solver [15] is used in combination with ILU factorisation.
Two turbulence models are applied in this work, namely the RANS-SST [10]
(Reynolds-averaged Navier-Stokes) and the SAS-SST [3, 4, 11] (Scale Adaptive
Simulation) turbulence model. Within the SAS framework, the unsteady SST
RANS turbulence model is able to operate in SRS (Scale Resolving Simulation)
mode [12], resolving small turbulent structures similar to an LES turbulence model.
This is achieved by introducing the source term Q_SAS into the transport equation of
the turbulence eddy frequency ω of the SST model. The additional source term leads
to a reduction of the turbulent eddy viscosity; on fine meshes, this reduction may be
overestimated at the smallest turbulent scales. Therefore, a high wave-number limit based on
the WALE model [14] is used in such a way that the effective eddy viscosity does not
fall below the LES eddy viscosity. Further details can be found in the references mentioned above.
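The limiter can be sketched as ν_t,eff = max(ν_t,SAS, ν_t,WALE), with the WALE eddy viscosity of Nicoud and Ducros, ν_t = (C_w Δ)² (S^d_ij S^d_ij)^(3/2) / ((S_ij S_ij)^(5/2) + (S^d_ij S^d_ij)^(5/4)), where S is the strain-rate tensor, g_ij = ∂u_i/∂x_j and S^d_ij = ½(g²_ij + g²_ji) − ⅓ δ_ij g²_kk. This is a minimal sketch of that idea, not the CFX implementation: the constant C_w = 0.325, the filter width Δ and the helper names are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product (a @ b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def wale_nu_t(g: Matrix, delta: float, c_w: float = 0.325) -> float:
    """WALE eddy viscosity from the velocity gradient g[i][j] = du_i/dx_j.
    c_w = 0.325 is an assumed model constant."""
    s = [[0.5 * (g[i][j] + g[j][i]) for j in range(3)] for i in range(3)]
    g2 = matmul(g, g)
    tr = g2[0][0] + g2[1][1] + g2[2][2]
    sd = [[0.5 * (g2[i][j] + g2[j][i]) - (tr / 3.0 if i == j else 0.0)
           for j in range(3)] for i in range(3)]
    ss = sum(s[i][j] ** 2 for i in range(3) for j in range(3))
    sdsd = sum(sd[i][j] ** 2 for i in range(3) for j in range(3))
    denom = ss ** 2.5 + sdsd ** 1.25
    if denom == 0.0:
        return 0.0  # no velocity gradient: no eddy viscosity
    return (c_w * delta) ** 2 * sdsd ** 1.5 / denom


def limited_nu_t(nu_t_sas: float, g: Matrix, delta: float) -> float:
    """High wave-number limit: the eddy viscosity may not fall below the WALE value."""
    return max(nu_t_sas, wale_nu_t(g, delta))


shear = [[0.0, 10.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(wale_nu_t(shear, 1e-3), limited_nu_t(1e-5, shear, 1e-3), wale_nu_t(rotation, 0.01))
```

The pure-shear case illustrates a known property of WALE: its eddy viscosity vanishes for simple shear, so there the limiter leaves the SAS value untouched.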
For temporal discretisation a second order backward Euler scheme is used. For
spatial discretisation different schemes are applied and investigated. For simulations
with RANS turbulence modelling a high-resolution scheme (HR) [2] is used. In
the framework of SAS turbulence modelling, less dissipative schemes, which are
formally second order, should be used to allow turbulent structures to evolve. The
first one is the bounded second order central differencing scheme [5] (BCD).
This scheme is based on the normalized variable diagram approach together with
the convection boundedness criterion. The second one is a hybrid convection
scheme [17] (hybCon), which is a combination of the HR-scheme and the central
differencing scheme (CD). The blending between those schemes is mainly based on
vortex detection parameters. For the turbulence quantities a bounded second order
backward Euler scheme is applied for the temporal discretisation and a first order
scheme for the spatial discretisation [13].
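The normalized variable idea behind a bounded central scheme can be illustrated in one dimension. With far-upwind node U, upwind node C and downwind node D, the normalized variable is φ̃_C = (φ_C − φ_U)/(φ_D − φ_U); central differencing gives φ̃_f = (1 + φ̃_C)/2 on a uniform mesh, which satisfies the convection boundedness criterion for 0 < φ̃_C < 1, and first-order upwind (φ̃_f = φ̃_C) is used elsewhere. This is a simplified textbook sketch, not the exact scheme of [5] or of CFX, which blend towards upwind more smoothly; the function name is hypothetical.

```python
def face_value_bcd(phi_u: float, phi_c: float, phi_d: float) -> float:
    """Face value between upwind cell C and downwind cell D using a simplified
    NVD-bounded central differencing: CD where the profile is monotonic
    (0 < phi~_C < 1), first-order upwind otherwise. phi_u is the far-upwind value."""
    span = phi_d - phi_u
    if span == 0.0:
        return phi_c  # flat profile: upwind value
    phi_c_norm = (phi_c - phi_u) / span  # normalised variable phi~_C
    if 0.0 < phi_c_norm < 1.0:
        # CD in normalised form on a uniform mesh; lies between phi~_C and 1,
        # so the convection boundedness criterion is satisfied.
        phi_f_norm = 0.5 * (1.0 + phi_c_norm)
    else:
        phi_f_norm = phi_c_norm  # upwind: phi~_f = phi~_C
    return phi_u + phi_f_norm * span


print(face_value_bcd(0.0, 1.0, 2.0))  # monotonic profile: central differencing
print(face_value_bcd(0.0, 2.0, 1.0))  # local extremum at C: upwind
```

In the monotonic case the result equals the plain central average (φ_C + φ_D)/2; at a local extremum the scheme falls back to the upwind value and therefore cannot create a new over- or undershoot.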
The geometry of the Francis turbine being used in this study is depicted in Fig. 1.
The different parts are the spiral casing, stay and guide vanes, runner and draft tube
with expansion tank (in streamwise direction). Accordingly, the computational domain
is divided into four domains of hexahedral meshes coupled via general grid
interfaces. At the inlet of the spiral casing typical steady-state boundary conditions
are applied for the velocity profile and the turbulent quantities.
The meshes used in this study are in the range between 16 and 300 million mesh
nodes (see Table 1). The 16M-mesh has a near-wall resolution of y+ = 9–16 and all
other meshes of y+ = 1. The mesh refinement between the meshes strongly focuses
on the draft tube domain, almost reaching LES-like resolution in the boundary layer.
Further details of the mesh can be found in [9]. The time step is chosen to keep the
Courant number below one in the whole computational domain.
Fig. 1 Visualisation of the computational domain of the Francis turbine, the red lines indicate the
evaluation lines, points D and G are used for wall-pressure evaluation
502 T. Krappel and S. Riedelbauch
Table 1 Description of different grid sizes for different domains in million nodes and the
corresponding time steps, see also [9]
Name   Spiral   Stay & guide   Runner   Draft    Total    Time steps   Δt in
       casing   vanes                   tube              per rev      °/time step
16M     1.02     3.70           3.78      8.09    16.20     180          2.0
50M     4.54    10.18          13.47     22.14    50.33     720          0.5
150M    7.29    17.95          29.90     98.81   153.95     840          0.43
300M   11.84    27.92          54.98    211.62   306.36    1000          0.36
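The last column of Table 1 follows from the number of time steps per revolution, Δθ = 360°/N, and the physical time step additionally depends on the runner speed n via Δt = 1/(n N). The sketch below reproduces the angular column; the runner speed of 10 rev/s used in the test is hypothetical, since the operating speed is not given here.

```python
STEPS_PER_REV = {"16M": 180, "50M": 720, "150M": 840, "300M": 1000}


def deg_per_step(steps_per_rev: int) -> float:
    """Runner rotation per time step in degrees."""
    return 360.0 / steps_per_rev


def dt_seconds(steps_per_rev: int, n_rev_per_s: float) -> float:
    """Physical time step for a runner speed n given in rev/s."""
    return 1.0 / (n_rev_per_s * steps_per_rev)


for name, steps in STEPS_PER_REV.items():
    print(f"{name}: {deg_per_step(steps):.2f} deg/time step")
```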
Fig. 2 Hydraulic losses for different simulation approaches: total machine (top, left), SVWG- and
RU-component (top, right), DT-component (bottom, left) and Euler head of the runner (bottom,
right) (axis: hydraulic losses H/Href [%])
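The hydraulic losses and the Euler head of the different simulations of
the Francis turbine are compared in Fig. 2. The Euler head is defined as

$$H_E = \frac{u_1 c_{u,1} - u_2 c_{u,2}}{g},$$

where u is the circumferential speed of the runner, c_u the circumferential component of the absolute velocity and g the gravitational acceleration; index 1 indicates the runner inlet and 2 the runner outlet.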
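The Euler head can be split into an inlet term u₁c_u,1/g and an outlet term u₂c_u,2/g, which is how the inlet and outlet contributions are discussed below. The sketch assumes the standard turbomachinery form H_E = (u₁c_u,1 − u₂c_u,2)/g with u = 2πrn; the function names and the velocity values in the usage line are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]


def circumferential_speed(radius_m: float, n_rev_per_s: float) -> float:
    """Blade speed u = 2*pi*r*n at radius r."""
    return 2.0 * math.pi * radius_m * n_rev_per_s


def euler_head(u1: float, cu1: float, u2: float, cu2: float, g: float = G):
    """Return (inlet term, outlet term, Euler head) in metres, using
    H_E = (u1*cu1 - u2*cu2) / g; index 1 = runner inlet, 2 = runner outlet."""
    h_in = u1 * cu1 / g
    h_out = u2 * cu2 / g
    return h_in, h_out, h_in - h_out


# Hypothetical velocity triangles with a swirl-free runner outlet.
h_in, h_out, h_e = euler_head(u1=20.0, cu1=9.81, u2=10.0, cu2=0.0)
print(f"inlet {h_in:.2f} m, outlet {h_out:.2f} m, Euler head {h_e:.2f} m")
```

With a nearly constant outlet term, any change of H_E between simulations is driven by the inlet term, which is the argument used below for the runner losses.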
The hydraulic losses of the total machine exhibit higher values for the simulations
applying the SST turbulence model. The simulations with the SAS-turbulence
model lead to lower hydraulic losses. The losses obtained with the BCD-scheme are
lower than those of the hybrid convection scheme (the reason is discussed later). For
both convection schemes with the SAS turbulence model, no strict grid convergence
is reached, even for the 300M-mesh, which yields the lowest loss values.
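One common way to quantify grid convergence is the observed order of convergence p = ln((f₃ − f₂)/(f₂ − f₁))/ln r from three solutions (fine f₁, medium f₂, coarse f₃) with a constant refinement ratio r, followed by Richardson extrapolation. The sketch below is only illustrative: it assumes a constant r and monotonic convergence, whereas the meshes in Table 1 do not have a constant ratio and are refined mainly in the draft tube. The test data are synthetic (f = 1 + h²), not losses from this study.

```python
import math


def observed_order(f_fine: float, f_medium: float, f_coarse: float, r: float) -> float:
    """Observed order p from three grids with constant refinement ratio r > 1.
    Requires monotonic convergence."""
    if f_medium == f_fine:
        raise ValueError("medium and fine solutions coincide")
    ratio = (f_coarse - f_medium) / (f_medium - f_fine)
    if ratio <= 0.0:
        raise ValueError("non-monotonic convergence: observed order undefined")
    return math.log(ratio) / math.log(r)


def richardson_extrapolate(f_fine: float, f_medium: float, r: float, p: float) -> float:
    """Richardson estimate of the grid-independent value."""
    return f_fine + (f_fine - f_medium) / (r ** p - 1.0)


def refinement_ratio(n_fine: float, n_coarse: float, dim: int = 3) -> float:
    """Effective refinement ratio from node counts in `dim` dimensions."""
    return (n_fine / n_coarse) ** (1.0 / dim)


# Synthetic check (hypothetical data): f(h) = 1 + h^2 on h = 1, 1/2, 1/4.
p = observed_order(f_fine=1.0625, f_medium=1.25, f_coarse=2.0, r=2.0)
print(p, richardson_extrapolate(1.0625, 1.25, 2.0, p))
```

If the differences between successive meshes change sign, as can happen with scale-resolving simulations, the function raises an error instead of returning a misleading order.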
The results with the SST-model, especially with the coarse mesh, predict the
highest losses in all components. This is explained by the dissipative character of
a RANS model and its inability to resolve the turbulent flow structures. The 16M-
SAS-hybCon-simulation predicts higher draft tube losses. This might be caused by
the deviating tangential velocity component in the draft tube cone (see Fig. 3).
The draft tube losses obtained by using the SAS-model decrease with larger meshes.
The losses of the upstream components stay and guide vanes (SVWG) and runner
(RU) depend on the convection scheme. The BCD-scheme predicts quite similar
results for all meshes. The losses using the hybCon-scheme decrease with larger
meshes. This might be explained by the nature of the convection scheme, as it
switches from the HR-scheme at the inlet to the CD-scheme (outside the boundary layer)
where turbulent structures are resolved. The coarse mesh simulations are closer to
the SST-results and the losses of the fine meshes are lower.
There is still a considerable offset between the BCD- and hybCon-scheme for the runner
losses. Whereas the losses obtained by the BCD-scheme are nearly constant for all
meshes, the losses for the hybCon-scheme decrease with larger mesh sizes. This
might be explained by the Euler head at the runner inlet, which shows a similar
trend. The Euler head at the runner outlet is nearly the same for the larger meshes,
so the flow distribution into the draft tube should be similar. The
Euler head difference between runner inlet and outlet indicates the resulting torque
predicted by the simulations. This trend is similar to the trend of the Euler head at
the inlet, since the Euler head at the outlet is nearly constant.
Fig. 3 Time-averaged normalised axial (left) and circumferential velocity component (right) in
the draft tube cone (axes: cax/cref and ctan/cref over radius R/Rref)
Fig. 4 Velocity distributions in the diffuser of the stream-wise velocity component for different
stream-wise positions (axes: cm/cref over length L/Lref)
In the draft tube cone and diffuser, the time-averaged velocity components are analysed;
they are depicted in Figs. 3 and 4. The positions of the evaluation
lines are shown in Fig. 1. For all configurations, time-averaging is performed over 40
runner revolutions.
The axial velocity component in the cone is quite similar for all results of the
16M-mesh. The simulations on the finer meshes predict a higher axial component
in the centre of the cone, except for the 50M-SST-simulation. For higher radii this
trend is inverted.
The tangential velocity component is broadly similar for all finer meshes.
The results of the 16M-SST and 16M-SAS-BCD simulation are almost the same
with lower values in the centre. The 16M-SAS-hybCon predicts an even lower swirl
in the centre of the cone.
At the end of the draft tube diffuser, the SST-simulations predict separation at the
upper wall (L/Lref = 1). The 16M-SAS-BCD simulation predicts lower values in the
lower part. The other simulation approaches show a quite similar flow distribution.
The vortex rope induced pressure pulsations are evaluated with the wall-pressure
signal at two positions in the draft tube cone: point D, slightly above the evaluation line in
Fig. 1, and point G, slightly below it. The time signals and
FFT results are shown in Fig. 5. As the results of the 150M-mesh are quite similar to the
results of the 50M-mesh, they are not discussed in this section.
Fig. 5 Wall-pressure evaluation in the draft tube cone with comparison to experimental results in
point D; top: time signal in point D, middle: time signal in point G, bottom: FFT-analysis in point
D (left) and point G (right); legend is the same for all figures (axes: relative Href-normalized
hydraulic head [%] over runner revolutions; pressure amplitude Δp/ρgH [%])
At the upper part of the draft tube cone, the results are compared with measure-
ments carried out at the closed-loop test rig of the laboratory of the Institute of Fluid
Mechanics and Hydraulic Machinery, University of Stuttgart. The wall-pressure
pulsation in point D is measured with piezo-resistive pressure transducers. At this
point the wall-pressure signal has an almost sinusoidal shape, mainly consisting of the
first and second mode of the vortex rope. The first mode is at around f/fn = 0.3.
The frequencies induced by the runner blade wakes at f/fn = 13 and f/fn = 26 can
only be resolved by the SAS turbulence model with larger meshes. The simulations
fit quite well with the experimental results. The low frequency pressure oscillations
of the first modes and of the runner blades are predicted quite similarly by the
simulations, except for the 16M-SAS-hybCon and 50M-SST-simulation.
At the end of the draft tube cone at point G, the wall-pressure signal consists of
several dominating modes, such as the first six to nine modes. The shape of the pressure
signal varies for different simulation approaches. The higher frequencies between
f/fn = 5 and f/fn = 10 are only predicted by the 300M-simulation, and even better
with the hybCon-scheme. These frequencies originate from a better resolution
of the vortex rope rotation around itself.
Fig. 6 Turbulent eddy viscosity ratio νt/ν in the draft tube cone (left, over radius R/Rref) and
diffuser (right, over length L/Lref); legend is the same as in Fig. 3
Fig. 7 Visualisation of flow structures with iso-surface of velocity invariant Q = 1, coloured with
a turbulent eddy viscosity ratio of νt/ν = 0–100; panels: 16M-SAS-hybCon, 50M-SAS-BCD,
300M-SAS-BCD, 300M-SAS-hybCon
more dissipative in the cone, only large structures are resolved. Further downstream
in the elbow, the model switches into SRS-mode. With finer meshes more details of
the flow can be resolved like the runner blade wakes. The results of the 300M-mesh
show very fine flow structures, whereas with the hybCon-scheme even smaller flow
structures can be resolved.
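The iso-surfaces in Fig. 7 use the second invariant of the velocity gradient tensor, Q = ½(‖Ω‖² − ‖S‖²), where S and Ω are the symmetric (strain-rate) and antisymmetric (rotation-rate) parts of g_ij = ∂u_i/∂x_j; positive Q marks regions where rotation dominates strain. The normalisation behind the value Q = 1 in Fig. 7 is not stated, so this minimal sketch works with the dimensional invariant, and the function name is illustrative.

```python
from typing import List

Matrix = List[List[float]]


def q_criterion(g: Matrix) -> float:
    """Q = 0.5 * (||Omega||^2 - ||S||^2) for the velocity gradient
    g[i][j] = du_i/dx_j, using Frobenius norms of the symmetric part S
    and the antisymmetric part Omega."""
    s2 = 0.0
    w2 = 0.0
    for i in range(3):
        for j in range(3):
            s = 0.5 * (g[i][j] + g[j][i])  # strain-rate component
            w = 0.5 * (g[i][j] - g[j][i])  # rotation-rate component
            s2 += s * s
            w2 += w * w
    return 0.5 * (w2 - s2)


rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # solid-body rotation
shear = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # simple shear
print(q_criterion(rotation), q_criterion(shear))
```

Simple shear gives Q = 0 because strain and rotation balance exactly, which is why boundary-layer shear alone does not appear on positive Q iso-surfaces while vortex cores such as the runner blade wakes do.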
The CFD solver is highly optimized for large scale parallel systems using the
SPMD (Single Program Multiple Data) parallelisation approach, combined with the
common MeTiS [8] domain decomposition method. The partitioning topology is
Fig. 8 Speedup tests for the transient flow simulations of the Francis turbine using the SAS
turbulence model for the 300M-mesh with different code versions; the dash-dotted line indicates
simulations with extensive data recording
to the performance of version CFX-v16.0, which decreases from around 3000 cores
on, both the pre-release and the final version of CFX-v17.0 still show
an increasing speedup up to at least 4000 cores. Speedup tests with larger core
counts were not possible due to licensing limitations. The parallel performance
is impaired for simulations with extensive data recording, especially for larger core counts.
Extensive data recording means here that physical data, such as velocity, pressure and
turbulent quantities, is recorded at every time step for around 33,000 defined points,
mostly in the draft tube domain.
5 Conclusion
4. Egorov, Y., Menter, F.R., Cokljat, D.: The scale-adaptive simulation method for unsteady
turbulent flow predictions. Part 2: application to aerodynamic flows. J. Flow Turbul. Combust.
85(1), 139–165 (2010)
5. Jasak, H., Weller, H.G., Gosman, A.D.: High resolution NVD differencing scheme for
arbitrarily unstructured meshes. Int. J. Numer. Methods Fluids 31, 431–449 (1999)
6. Jeong, J., Hussain, F.: On the identification of a vortex. J. Fluid Mech. 285, 69–94 (1995)
7. Jost, D., Skerlavaj, A., Lipej, A.: Numerical flow simulation and efficiency prediction for axial
turbines by advanced turbulence models. In: 26th IAHR Symposium on Hydraulic Machinery
and Systems, Beijing (2012)
8. Karypis, G., Kumar, V.: MeTiS: unstructured graph partitioning and sparse matrix ordering
system. University of Minnesota (1995)
9. Krappel, T., Ruprecht, A., Riedelbauch, S.: Turbulence resolving flow simulations of a Francis
turbine with a commercial CFD code. In: High Performance Computing in Science and
Engineering ’15. Springer, Berlin (2016)
10. Menter, F.R.: Two-equation eddy-viscosity turbulence models for engineering applications.
AIAA J. 32(8), 1598–1605 (1994)
11. Menter, F.R., Egorov, Y.: The scale-adaptive simulation method for unsteady turbulent flow
predictions. Part 1: theory and model description. Flow Turbul. Combust. 85(1), 113–138 (2010)
12. Menter, F.R., Schütze, J., Gritskevich, M.: Global vs. zonal approaches in hybrid RANS-LES
turbulence modelling. In: Fu, S., Haase, W., Peng, S.-H., Schwamborn, D. (eds.) Progress in
Hybrid RANS-LES Modelling: Papers Contributed to the 4th Symposium on Hybrid RANS-
LES Methods, Beijing. Notes on Numerical Fluid Mechanics and Multidisciplinary Design,
vol. 117, pp. 15–28. Springer, Berlin/Heidelberg (2012)
13. Menter, F.R.: Best practice: scale-resolving simulations in ANSYS CFD, version 1.0. ANSYS
Germany GmbH (April 2012)
14. Nicoud, F., Ducros, F.: Subgrid-scale stress modelling based on the square of the velocity
gradient tensor. Flow Turbul. Combust. 62, 183–200 (1999)
15. Raw, M.J.: Robustness of coupled algebraic multigrid for the Navier-Stokes equations. In:
AIAA Paper 96-0297, 34th Aerospace Sciences Meeting & Exhibit, Reno (1996)
16. Pacot, O., Kato, C., Avellan, F.: High-resolution LES of the rotating stall in a reduced scale
model pump-turbine. In: 27th IAHR Symposium on Hydraulic Machinery and Systems,
Montreal (2014)
17. Strelets, M.: Detached eddy simulation of massively separated flows. In: AIAA Paper 2001-
0879, 39th Aerospace Sciences Meeting and Exhibit, Reno (2001)
CFD Simulations of Thermal-Hydraulic Flows
in a Model Containment: Phase Change Model
and Verification of Grid Convergence
Abstract Two-phase flows with water droplets strongly affect the thermal-hydraulic
behaviour in the containment of a pressurized water reactor (PWR). Such flows
occur, inter alia, in French PWRs in the form of spray cooling. In case of a leak in
the primary circuit, spray cooling reduces the increase in pressure and temperature
in the containment caused by the released steam. The purpose of this paper is to
present an application-oriented CFD model of the heat and mass transfer between
droplets and gas during the spray cooling process, based on an Euler-Euler
two-fluid approach. In the current model, the resistance to droplet heating is
taken into account. A grid convergence index (GCI) study was also performed to
quantify the spatial discretization error of a three-dimensional natural convection
flow simulation using the commercial CFD package Ansys CFX 16.1. Five numerical
grids with up to $39.73 \times 10^6$ elements were considered in this study.
Low grid convergence indexes were obtained for the fine-mesh comparisons
$7.11 \times 10^6$–$16.85 \times 10^6$ and $16.85 \times 10^6$–$39.73 \times 10^6$, with averaged
GCI values of less than 1 % for temperature, relative humidity and pressure. The
parallel scalability of the simulations was also investigated in this work. Due to the
large size and complexity of containment simulations as well as the physically
complex flow phenomena in nuclear applications, numerical meshes with large cell
numbers may have to be generated in order to minimize the numerical errors. Hence,
efficient parallel computing is essential to achieve acceptable computing times. Good
scalability of CFX 16.1 is achieved up to 1800 computational cores on a mesh with
$83 \times 10^6$ elements and $24 \times 10^6$ nodes.
1 Introduction
droplet sizes in contrast provide droplets in the center of the spray due to less inertia
Various CFD containment simulations have been performed to investigate the
thermal-hydraulic behavior in nuclear reactor containments. At the Petten Research
Center, the commercial CFD code CFX 4.4 was applied to a model containment of a
nuclear power plant to calculate an accident scenario that had previously been
simulated with the lumped-parameter code SPECTRA [4]. For the 3D geometry, a mesh
with approx. 680,000 hexahedral cells was generated. The results of the CFD model
and the lumped-parameter code agreed qualitatively, although a quantitative
discrepancy was observed due to the absence of an evaporation model. The spatial
discretization error caused by the coarse numerical grid used in this investigation
may be another reason for this discrepancy. Because of the large computational
effort and the lack of hardware with sufficient computational capacity, many studies
could not quantify the discretization error in CFD simulations for nuclear
applications [1, 4, 9]. To quantify the spatial discretization error, Roache proposed
the Grid Convergence Index (GCI) [10]. This method is recommended for estimating
the discretization error, since it has been tested in many cases [11]. However,
because of the complexity of both the physics and the geometries in nuclear
applications, meshes with large element numbers have to be generated in order to
carry out a GCI study in containment calculations. Hence, efficient parallel computing
is essential to reduce the resulting high computing times. To study the parallel
performance of Ansys CFX 14.0, calculations of a PWR containment using a mesh with
approx. $10.2 \times 10^6$ elements were performed on the Cray XE6 Hermit cluster at
HLRS Stuttgart [6]. The speedup and efficiency of those parallel calculations were far
from the ideal values: the speedup was approx. 35 on 80 cores (ideal: 80) and 43 on
160 cores (ideal: 160). A comparison of the parallel efficiency of Ansys CFX 14.5 and
OpenFOAM-1.6-ext was also carried out on the Cray XE6 Hermit cluster through
transient CFD simulations of a Francis turbine [12]. A mesh with $40 \times 10^6$ cells
was used for this investigation. The results showed a relatively poor parallel behavior
of CFX compared with OpenFOAM. The optimum speedups were achieved at 192 cores
for CFX and 768 cores for OpenFOAM. However, the parallel performance is expected
to have improved considerably in newer versions of CFX, namely version 16.1 used in this
The aim of this paper is to present an application-oriented Euler-Euler model for
Ansys CFX 16.1 that describes the heat and mass transfer in containment spray
cooling applications with larger droplets (up to 1250 μm). In the present model,
droplet heating, which affects the phase change, is additionally taken into account.
This model will be used to simulate monosized as well as polysized sprays in the
model containment THAI. Another aim of this work is to estimate the numerical
discretization error of a natural convection flow simulation based on the theory of
the grid convergence index. The applicability of this theory to the two-room
geometry THAI+ and its complex convection flow is examined. In addition, the
parallel efficiency of Ansys CFX 16.1 is investigated.
514 A. Mansour et al.
2 Computational Model
The basic mathematical approach for this work is the Euler-Euler two-fluid model,
which was developed for multiphase flows by Ishii and Hibiki [13]. Each fluid is
considered as a continuous phase with a complete set of conservation equations
for mass, momentum and energy. Since the phases are treated as interpenetrating
continua, each phase is characterized by its volume fraction $\alpha_k$. The subscript
$k$ denotes the phase: gas (G) or liquid (L). The continuous gas phase (humid air) is a
mixture of dry air and water vapor. The liquid is treated as a dispersed phase with a
fixed droplet diameter. The contact area between the phases is described by the
interfacial area density $A_{KK}$, through which interactions between the phases are
taken into account. The proposed phase change model for evaporation and
condensation is implemented in Ansys CFX via source terms for mass ($\Gamma_k$) and
energy ($E_k$). The basic equations of Ishii's two-fluid model are explained in the
following. Mass conservation is described by
$$\frac{\partial(\rho_k\alpha_k)}{\partial t} + \nabla\cdot(\rho_k\alpha_k\vec{u}_k) = \Gamma_k, \qquad (1)$$
where $\rho_k$ is the density of phase $k$, $\vec{u}_k$ the averaged velocity of phase $k$ and
$t$ the physical time. Momentum conservation is described by the following equation:
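$$\frac{\partial(\rho_k\alpha_k u_{k,m})}{\partial t} + \nabla\cdot(\rho_k\alpha_k\vec{u}_k u_{k,m}) = -\frac{\partial(\alpha_k p)}{\partial x_m} + \nabla\cdot\left(\alpha_k\tau_k + \tau_{Re,k}\right)_m + u_{k,m}\Gamma_k + \alpha_k\rho_k g_m + M_{k,m}, \qquad (2)$$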
Here $p$ is the pressure, $\tau_k$ and $\tau_{Re,k}$ represent the molecular and the turbulent
(Reynolds) stresses of phase $k$, and $g$ is the gravitational acceleration. $M_{k,m}$ is the
interfacial momentum source term, which must be modeled. The energy equation is
formulated in terms of the enthalpy $e_k$:
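$$\frac{\partial(\rho_k\alpha_k e_k)}{\partial t} + \nabla\cdot(\rho_k\alpha_k\vec{u}_k e_k) = -\nabla\cdot\left[\alpha_k\left(\vec{q}_k + \vec{q}_{Re,k}\right)\right] + \Gamma_k e_k + E_k. \qquad (3)$$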
Here $\vec{q}_k$ and $\vec{q}_{Re,k}$ are the molecular and the turbulent heat fluxes, and $E_k$
is the energy source term.
Because of the two-fluid formulation, several additional constraints must hold to
ensure conservation: the volume fractions must sum to one, $\sum_k \alpha_k = 1$, and the
mass source terms must sum to zero, $\sum_k \Gamma_k = 0$.
The droplets are modelled as a dispersed phase with a fixed diameter $d$. The
interfacial momentum transfer term $M_{k,m}$ contains the interfacial drag force $M^D_{k,m}$,
which can be described by the following equation:
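$$M^D_{k,m} = M^D_{G,m} = -M^D_{L,m} = \frac{3\rho_G}{4d}\,c_D\,\alpha_L\left|\vec{u}_L - \vec{u}_G\right|\left(u_{L,m} - u_{G,m}\right), \qquad (4)$$
$$c_D = \frac{24}{Re}\left(1 + 0.15\,Re^{0.687}\right). \qquad (5)$$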
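The drag closure of Eqs. (4) and (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CFX implementation: the droplet Reynolds number $Re = \rho_G|\vec{u}_L - \vec{u}_G|d/\mu_G$ and the gas viscosity $\mu_G$ are assumptions (the text does not define $Re$), and the function names are hypothetical.

```python
import math


def drag_coefficient(re: float) -> float:
    """Schiller-Naumann drag coefficient c_D, Eq. (5), stated for Re <= 800."""
    if not 0.0 < re <= 800.0:
        raise ValueError(f"Re = {re:.1f} is outside the stated range 0 < Re <= 800")
    return 24.0 / re * (1.0 + 0.15 * re ** 0.687)


def drag_force_density(rho_g, mu_g, d, alpha_l, u_l, u_g):
    """Interfacial drag on the gas phase per unit volume (N/m^3), Eq. (4).

    u_l, u_g are velocity vectors given as sequences of components.
    Assumption: Re = rho_g * |u_L - u_G| * d / mu_g (mu_g is not in the text).
    The force on the liquid phase is the negative of the returned vector.
    """
    slip = [ul - ug for ul, ug in zip(u_l, u_g)]
    slip_mag = math.sqrt(sum(s * s for s in slip))
    if slip_mag == 0.0:
        return [0.0] * len(slip)  # no relative motion, no drag
    re = rho_g * slip_mag * d / mu_g
    coeff = 0.75 * drag_coefficient(re) / d * rho_g * alpha_l * slip_mag
    return [coeff * s for s in slip]
```

For example, `drag_force_density(1.0, 1e-3, 1e-3, 0.1, (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))` gives $Re = 1$, $c_D = 27.6$ and a gas-phase force density of about 2070 N/m³ in the x-direction. Raising an error outside $Re \le 800$ is a deliberate design choice of this sketch, reflecting the validity range stated in the text.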
The correlation is valid for Reynolds numbers up to 800. Furthermore, the Euler-
Euler two-fluid model is based on the unsteady Reynolds-averaged Navier-Stokes
(URANS) equations. Therefore, the Reynolds stresses $\tau_{Re,k}$ and the turbulent heat
fluxes $\vec{q}_{Re,k}$ must be modeled. In the current work this is done with the shear stress
transport (SST) turbulence model, which was developed by Menter [15] and is based on two
The Grid Convergence Index method was introduced by Roache [10] as a uniform
criterion for estimating the spatial discretization error in CFD applications.
The GCI is based on Richardson extrapolation:
$$f_{exact} \approx f_1 + \frac{f_1 - f_2}{r^p - 1}, \qquad (6)$$
where $f_1$ and $f_2$ are the solutions for the considered variables (in this investigation:
temperature, velocity, pressure and relative humidity) on two different grids with
discrete spacings $h_1$ (fine grid) and $h_2$ (coarse grid), respectively. $r = h_2/h_1$ is
the grid refinement ratio and $p$ is the order of accuracy of the numerical
method. The objective of the Richardson extrapolation, Eq. (6), is to provide a more
accurate estimate $f_{exact}$ of the exact solution from the two numerical solutions
$f_1$ and $f_2$. The relative error between the Richardson extrapolation estimate
$f_{exact}$ and the fine-grid solution $f_1$ is defined as follows:
$$E_1 = \frac{f_{exact} - f_1}{f_1} = \frac{\varepsilon}{r^p - 1}. \qquad (7)$$
In Eq. (7), $\varepsilon$ is the relative error between the fine and coarse grid solutions $f_1$ and $f_2$:
$$\varepsilon = \frac{f_1 - f_2}{f_1}. \qquad (8)$$
Most CFD users would accept the estimator $\varepsilon$ as a good error estimate only for
grid doubling/halving ($r = 2$) and a second-order accurate code ($p = 2$). In this
case, cumulative experience has demonstrated the reasonableness of the indicator
$\varepsilon$ [10]. For other cases, i.e. $r \neq 2$ or $p \neq 2$, $\varepsilon$ does not appear to be an
appropriate error estimator: on the one hand, it does not take $r$ or $p$ into account,
and on the other hand it is not always conservative with respect to $E_1$. The latter
issue, however, also applies to $E_1$ itself, which is equally likely (50 %) to be
conservative or optimistic. For this reason, $E_1$ cannot serve as a well-founded
criterion comparable to, for example, the $2\sigma$ indicator used by statisticians [10].
The idea behind the Grid Convergence Index is to combine both error estimators
$\varepsilon$ and $E_1$. Suppose a mesh study has been performed with arbitrary $r$ and $p$
and the error indicator $E_1$ has been determined. The GCI is then equal to the
equivalent $\varepsilon_{eq}$ that would produce the same $E_1$ for the same problem on the
same mesh, but with $r = 2$ and $p = 2$:
$$GCI = F_s\,\frac{|\varepsilon|}{r^p - 1}. \qquad (9)$$
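The two-grid estimates of Eqs. (6) to (9) fit into one small function. This is a minimal sketch for illustration; the function name and the example values are hypothetical, and all errors are returned as fractions rather than percentages.

```python
def two_grid_gci(f1: float, f2: float, r: float, p: float, fs: float = 1.25):
    """Two-grid error estimates of Eqs. (6)-(9).

    f1: fine-grid value, f2: coarse-grid value, r = h2/h1 > 1,
    p: order of accuracy (observed or theoretical), fs: safety factor.
    Returns (f_exact, eps, e1, gci); eps, e1 and gci are fractions, not %.
    """
    if r <= 1.0:
        raise ValueError("r = h2/h1 must be greater than 1")
    denom = r ** p - 1.0
    f_exact = f1 + (f1 - f2) / denom   # Richardson extrapolation, Eq. (6)
    eps = (f1 - f2) / f1               # relative fine/coarse difference, Eq. (8)
    e1 = eps / denom                   # relative error of f1 w.r.t. f_exact, Eq. (7)
    gci = fs * abs(eps) / denom        # grid convergence index, Eq. (9)
    return f_exact, eps, e1, gci


# Grid doubling with a second-order scheme: the classic reference case.
f_exact, eps, e1, gci = two_grid_gci(1.0, 0.97, r=2.0, p=2.0)
```

Here $\varepsilon = 3\,\%$, $E_1 = 1\,\%$ and the GCI is 1.25 %. With values of the kind used later in this study ($r \approx 1.33$ and $p = 1.926$), the factor $F_s/(r^p - 1)$ is about 1.7, i.e. the GCI is roughly 1.7 times $|\varepsilon|$.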
The safety factor $F_s$ is set to 1.25 since more than two meshes are used in the
current study [10]. The GCI can be understood as a measure of how far a computed
solution is from the asymptotic numerical value. The following procedure has been
adopted for the GCI study [11]. Suppose that three meshes have been generated for a
specific CFD calculation, and that $N_1$, $N_2$ and $N_3$ are the total cell numbers of
mesh 1 (fine), mesh 2 (medium) and mesh 3 (coarse). First, the averaged grid spacing
is calculated for each mesh:
$$h = \left[\frac{1}{N}\sum_{i=1}^{N}\Delta V_i\right]^{1/3}, \qquad (10)$$
where $\Delta V_i$ is the volume of the $i$-th cell and $N$ the total number of cells of the mesh.
After calculating the grid refinement ratios $r_{21} = h_2/h_1$ and $r_{32} = h_3/h_2$, the
observed order of accuracy $p$ is determined from the numerical solutions $f_1$, $f_2$ and $f_3$:
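$$p = \frac{1}{\ln r_{21}}\left|\,\ln\left|\frac{f_3 - f_2}{f_2 - f_1}\right| + \ln\left(\frac{r_{21}^p - s}{r_{32}^p - s}\right)\right|, \qquad (11)$$
$$s = \operatorname{sign}\left(\frac{f_3 - f_2}{f_2 - f_1}\right). \qquad (12)$$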
Equation (11) can be solved using fixed-point iteration. A negative ratio
$(f_3 - f_2)/(f_2 - f_1) < 0$ indicates oscillatory convergence, which should also be
reported. When the observed order of accuracy agrees with the theoretical order of the
numerical method, the grids are assumed to be within the asymptotic range. The next
step is to calculate the extrapolated value $f_{exact}$ for the fine mesh using Eq. 7 and the
$$GCI_{21} = F_s\,\frac{|\varepsilon_{21}|}{r_{21}^p - 1}, \qquad (13)$$
$$GCI_{32} = F_s\,\frac{|\varepsilon_{32}|}{r_{32}^p - 1}. \qquad (14)$$
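The three-grid procedure of Eqs. (10) to (14) can be sketched as follows. This is an illustrative Python version under stated assumptions (hypothetical function names; the starting value of the fixed-point iteration uses $q(p) = 0$), not the evaluation tool used by the authors. It assumes $f_2 \neq f_1$ and $f_3 \neq f_2$ at the evaluated point.

```python
import math


def grid_spacing(cell_volumes):
    """Representative grid spacing h of one mesh, Eq. (10)."""
    return (sum(cell_volumes) / len(cell_volumes)) ** (1.0 / 3.0)


def observed_order(f1, f2, f3, r21, r32, tol=1e-10, max_iter=500):
    """Observed order p from Eqs. (11)-(12) by fixed-point iteration.

    f1, f2, f3 are the fine, medium and coarse solutions at one point.
    Returns (p, oscillatory), where oscillatory is True if the ratio
    (f3 - f2)/(f2 - f1) is negative.
    """
    ratio = (f3 - f2) / (f2 - f1)
    s = math.copysign(1.0, ratio)                   # Eq. (12)
    log_ratio = math.log(abs(ratio))
    p = abs(log_ratio) / math.log(r21)              # start value with q(p) = 0
    for _ in range(max_iter):
        q = math.log((r21 ** p - s) / (r32 ** p - s))
        p_new = abs(log_ratio + q) / math.log(r21)  # Eq. (11)
        if abs(p_new - p) < tol:
            return p_new, ratio < 0.0
        p = p_new
    raise RuntimeError("fixed-point iteration for p did not converge")


def gci_fine(f1, f2, r21, p, fs=1.25):
    """GCI_21 of Eq. (13) with eps_21 = (f1 - f2)/f1 as in Eq. (8)."""
    return fs * abs((f1 - f2) / f1) / (r21 ** p - 1.0)


# Manufactured check: f(h) = 1 + 0.1 h^2 on h1 = 1.0, h2 = 1.3, h3 = 1.95.
h1, h2, h3 = 1.0, 1.3, 1.95
f1, f2, f3 = (1.0 + 0.1 * h * h for h in (h1, h2, h3))
p, osc = observed_order(f1, f2, f3, r21=h2 / h1, r32=h3 / h2)
```

For this manufactured second-order solution with unequal refinement ratios, the iteration recovers $p \approx 2$ and reports monotone convergence, which is a quick sanity check before applying the procedure to noisy CFD data.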
2.3 Parallelization
The flow simulations described below were performed on the Cray XC40 Hazel Hen
of the High Performance Computing Center Stuttgart (HLRS). This supercomputer has
7712 compute nodes, each with 24 processing cores and 128 GB of memory. To measure
the parallel performance of CFX 16.1, the speedup is defined as
$$S_p = \frac{T_{120}}{T_p}, \qquad (15)$$
where $T_{120}$ is the reference wall-clock time needed on 120 processing cores and $T_p$
is the wall-clock time on $p$ cores. For the partitioning, CFX uses a node-based
partitioning method; the default partitioner is the multilevel graph partitioning
software MeTiS [18]. To improve the parallel performance of CFX, the expert
parameter for the parallel optimization level was set to its upper bound of 3 and the
large-problem partitioner was selected.
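Because the speedup in Eq. (15) is normalized by the 120-core run rather than by a serial run, the ideal speedup on $p$ cores is $p/120$, not $p$. The small helper functions below (hypothetical names, a sketch only) make this explicit and compute the resulting parallel efficiency.

```python
def speedup(t_ref: float, t_p: float) -> float:
    """Speedup relative to the 120-core reference run, Eq. (15): S_p = T_120 / T_p."""
    return t_ref / t_p


def ideal_speedup(cores: int, ref_cores: int = 120) -> float:
    """Linear (ideal) speedup when the reference run uses ref_cores cores."""
    return cores / ref_cores


def parallel_efficiency(s_p: float, cores: int, ref_cores: int = 120) -> float:
    """Measured over ideal speedup; values above 1 indicate super-linear scaling."""
    return s_p / ideal_speedup(cores, ref_cores)


# Values quoted in the conclusions: S_p of about 11.4 on 1800 cores.
eff_1800 = parallel_efficiency(11.4, 1800)
```

With the reported speedup of about 11.4 on 1800 cores, `eff_1800` evaluates to 0.76, i.e. about 76 % of the ideal value of 15.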
The implemented phase change model is based on the heat transfer between the droplets
and the gas phase. Droplets cannot evaporate until they have heated up to the saturation
temperature. A uniform temperature within each droplet is assumed. Based on the
convective heat transfer between droplets and gas (radiation is neglected), the droplet
thermal response time can be estimated following Crowe et al. [14]. Assuming
spherical droplets, the droplet temperature obeys
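$$\frac{dT_L}{dt} = -\frac{6\,\mathrm{Nu}\,\lambda_G}{\rho_L c_L d_L^2}\left(T_L - T_G\right), \qquad (16)$$
where $\lambda_G$ is the thermal conductivity of the gas.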
$$\tau_T = \frac{\rho_L c_L d_L^2}{6\,\mathrm{Nu}\,\lambda_G}. \qquad (18)$$
The spray nozzle in the validation experiment is characterized by a Sauter mean
diameter of $d_{32} = 830$ μm. The additional fluid properties used to evaluate $\tau_T$
are listed in Table 1.
Under these assumptions, the thermal response time is of the order of one second.
Since the spray velocity at the nozzle outlet is roughly 20 m/s, each droplet travels
a significant distance before it reaches the saturation temperature. It is therefore
mandatory to consider droplet heating.
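The order-of-magnitude estimate can be reproduced with a short calculation of $\tau_T$ and of the exponential relaxation that follows from Eq. (16) for a constant gas temperature. The property values below (liquid density, heat capacity, gas conductivity and a Nusselt number of 20) are illustrative assumptions, not the values of Table 1, and the function names are hypothetical.

```python
import math


def thermal_response_time(rho_l, c_l, d_l, nu, lambda_g):
    """Droplet thermal response time tau_T of Eq. (18), in seconds.

    rho_l [kg/m^3], c_l [J/(kg K)], d_l [m], nu: Nusselt number [-],
    lambda_g: gas thermal conductivity [W/(m K)].
    """
    return rho_l * c_l * d_l ** 2 / (6.0 * nu * lambda_g)


def droplet_temperature(t, t_l0, t_g, tau):
    """Solution of Eq. (16) for constant T_G: T_L relaxes exponentially to T_G."""
    return t_g + (t_l0 - t_g) * math.exp(-t / tau)


# Illustrative property values (assumed, not the values of Table 1).
tau = thermal_response_time(rho_l=960.0, c_l=4200.0, d_l=830e-6, nu=20.0, lambda_g=0.03)
```

With these assumed values $\tau_T \approx 0.77$ s, consistent with the statement above. Note that $\tau_T$ scales with $d_L^2$, so doubling the droplet diameter quadruples the response time; in the model the exponential law only applies until $T_L$ reaches $T_{sat}$.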
The present phase change model is based on the Euler-Euler two-fluid model.
In the case of spray cooling, droplet heating with subsequent evaporation as well as
volume condensation due to supersaturation of the gas phase is possible. Fig. 1
shows schematically the temperature distribution around a single droplet for the
two thermodynamic processes. Fig. 1 (left) shows a cold droplet with temperature
$T_L$ entering a hot gas atmosphere with temperature $T_G$. When the droplets reach
$T_{sat}$, the heating process is finished and evaporation occurs. In the case of
condensation, Fig. 1 (right), the droplets are at $T_L = T_{sat}$. The gas temperature $T_G$
around the droplet is lower than $T_{sat}$, and therefore the latent heat due to the phase
change can be released.
Fig. 1 Temperature distribution during droplet heating and evaporation (left) and during
condensation (right)
The temperature difference between the droplets and the surrounding humid air
atmosphere (a mixture of water vapor and air as non-condensable gas) is essential for
the heat-up and evaporation process. The interfacial heat flux $E_L$ due to this
temperature difference, Eq. (19), drives the droplet heating and the subsequent
evaporation:
$$E_L = -E_G = \frac{6}{d_L}\,\alpha_{HT}\,\alpha_L\left(T_G - T_L\right). \qquad (19)$$
In Eq. (19), $\alpha_{HT}$ is the heat transfer coefficient, which is based on the
Nusselt number:
$$\alpha_{HT} = \frac{\lambda_G}{d_L}\left(2 + 0.6\,Re^{1/2}\,Pr^{1/3}\right). \qquad (20)$$
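The heat transfer closure can be sketched as below, assuming the usual form $\alpha_{HT} = (\lambda_G/d_L)\,\mathrm{Nu}$ with the Nusselt correlation $\mathrm{Nu} = 2 + 0.6\,Re^{1/2}Pr^{1/3}$ and the interfacial area density $6\alpha_L/d_L$ of spherical droplets. Function names are hypothetical; this illustrates Eqs. (19) and (20) and is not the CFX source code.

```python
def nusselt_ranz_marshall(re: float, pr: float) -> float:
    """Nu = 2 + 0.6 Re^(1/2) Pr^(1/3), the bracket in Eq. (20)."""
    return 2.0 + 0.6 * re ** 0.5 * pr ** (1.0 / 3.0)


def heat_transfer_coefficient(re, pr, lambda_g, d_l):
    """alpha_HT = lambda_G / d_L * Nu in W/(m^2 K) (assumed form of Eq. (20))."""
    return lambda_g / d_l * nusselt_ranz_marshall(re, pr)


def interfacial_heat_flux(alpha_ht, alpha_l, d_l, t_g, t_l):
    """E_L = -E_G = (6 / d_L) alpha_HT alpha_L (T_G - T_L) in W/m^3, Eq. (19).

    6 * alpha_L / d_L is the interfacial area density of spherical droplets.
    A positive value heats the liquid phase.
    """
    return 6.0 / d_l * alpha_ht * alpha_l * (t_g - t_l)
```

In the limit of a droplet at rest ($Re = 0$) the correlation reduces to $\mathrm{Nu} = 2$, the pure conduction value; the sign of $E_L$ follows the sign of $T_G - T_L$, so the same expression also cools droplets that are hotter than the gas.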
The interfacial heat flux $E_L$ heats the droplet up to the saturation temperature
$T_{sat}$, which is determined from the gas-atmosphere conditions by Antoine's
equation [17]:
$$T_{sat} = \left(\frac{1687.537}{5.11564 - \log_{10} p_{sat}} - 230.17 + 273.15\right)\,\mathrm{K}. \qquad (21)$$
In Eq. (21), $p_{sat}$ is the saturation pressure (in bar), which is calculated assuming
that the partial pressure of water vapor in the humid gas atmosphere is equal to the
saturation pressure:
$$p_{sat} = c_{vap}\,p, \qquad (22)$$
where $c_{vap}$ is the volume fraction of water vapor and $p$ is the absolute
pressure in the containment.
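The saturation temperature follows directly from the vapour volume fraction and the local pressure. The sketch below assumes Antoine constants for water in the form $\log_{10} p = A - B/(T + C)$ with $p$ in bar and $T$ in °C ($A = 5.11564$, $B = 1687.537$, $C = 230.17$); the constant $B$ and the unit convention are assumptions of this example, and the function names are hypothetical.

```python
import math

# Antoine constants for water (assumed: p in bar, T in deg C).
ANTOINE_A = 5.11564
ANTOINE_B = 1687.537
ANTOINE_C = 230.17


def vapour_partial_pressure(c_vap: float, p_abs: float) -> float:
    """p_sat = c_vap * p: vapour volume fraction times absolute pressure (bar)."""
    return c_vap * p_abs


def saturation_temperature(p_sat_bar: float) -> float:
    """T_sat in K from Antoine's equation with p_sat in bar."""
    return ANTOINE_B / (ANTOINE_A - math.log10(p_sat_bar)) - ANTOINE_C + 273.15


# Saturated humid air at the THAI+ initial pressure of 2 bar (c_vap assumed).
t_sat_initial = saturation_temperature(vapour_partial_pressure(0.386, 2.0))
```

At atmospheric pressure the function returns about 373.2 K, close to the normal boiling point. With the assumed vapour fraction of 0.386 at 2 bar, `t_sat_initial` is about 365.8 K (92.6 °C), i.e. the initial temperature of the saturated atmosphere described for the natural convection test.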
The following expression for the mass transfer $\Gamma_G$ due to evaporation or condensation
is only valid when the droplets have reached $T_{sat}$. If $T_L$ is less than $T_{sat}$, phase
change is not taken into account and the droplets are only heated by the temperature
difference between gas and droplets. When the droplets reach the saturation
temperature $T_{sat}$, they evaporate into the containment atmosphere, and the
temperature difference drives the phase change. Taking the latent heat $h_{LG}$ into
account, the evaporated mass $\Gamma_G$, which is added to the gas phase, is obtained as
$$\Gamma_G = -\Gamma_L = \frac{6\,\alpha_{HT}\,\alpha_L\left(T_G - T_{sat}\right)}{h_{LG}\,d_L}. \qquad (23)$$
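The switch between pure droplet heating and phase change described above can be sketched per computational cell as follows. This is a hypothetical helper illustrating the logic of Eq. (23), not the user routine implemented in Ansys CFX.

```python
def evaporation_rate(t_l, t_g, t_sat, alpha_ht, alpha_l, d_l, h_lg):
    """Interfacial mass transfer Gamma_G = -Gamma_L in kg/(m^3 s), Eq. (23).

    Phase change is switched off while the droplets are still below T_sat
    (pure heating regime). Once T_L has reached T_sat, Gamma_G > 0 means
    evaporation (T_G > T_sat) and Gamma_G < 0 means condensation (T_G < T_sat).
    """
    if t_l < t_sat:
        return 0.0  # droplet still heating up: no phase change
    return 6.0 * alpha_ht * alpha_l * (t_g - t_sat) / (h_lg * d_l)
```

The same expression covers both processes of Fig. 1: the sign of $T_G - T_{sat}$ decides whether mass is transferred to the gas (evaporation) or to the droplets (condensation).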
4 Numerical Method
Fig. 2 THAI+ facility with the two vessels THAI and PAD (left) [19]; unstructured grid at the
bottom of the THAI vessel (right)
bottom on the inner and outer wall of the inner cylinder. The THAI vessel with
these internal installations represents the plant room, while the PAD vessel corresponds
to the operating room. The geometry of THAI+ was created with Ansys
DesignModeler, based on a CAD model from Becker Technologies GmbH. The
3D mesh was generated using the Ansys Meshing tool. It is an unstructured
grid composed of tetrahedral elements in the volume and prism layers at the
walls; its structure is shown in Fig. 2. The initial state is described by an overall
temperature of 92.6 °C and a pressure of 2 bar. The air throughout the facility is
initially saturated, i.e. the relative humidity is 100 %. The THAI walls are then
heated up to 105 °C. Starting from this initial state, a transient simulation is run until
thermal equilibrium is reached, i.e. until the temperature is almost equal to 105 °C
everywhere in THAI+. Due to the high temperatures, no steam condensation occurs
during this simulation.
The transient simulations were carried out using the commercial CFD package
Ansys CFX 16.1. The SST turbulence model and the full buoyancy model were used
for turbulence and buoyancy, respectively. The high-resolution advection scheme was
employed for the spatial discretization of the URANS equations. This adaptive method
uses a blend factor $\beta$, which varies between 0 and 1, and aims to keep the scheme as
close to second-order accuracy as possible. In regions with low gradients, $\beta$ is set
to values close to 1, i.e. the method is second-order accurate. In regions with high
gradients, however, the first-order upwind scheme is used for robustness, i.e. $\beta = 0$.
The high-resolution scheme was also used for the spatial discretization of the
turbulence equations. The second-order backward Euler method was employed for the
temporal discretization of the transient terms, while a first-order scheme was used for
the transient terms of the turbulence equations. The total physical time for the
simulations was set to 3.36 h.
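The idea of a blend factor between first-order upwind and a second-order reconstruction can be illustrated with a generic 1D example. The sketch below uses a minmod-type limiter to compute $\beta$ for a positive velocity on a uniform grid; this limiter is a stand-in chosen for illustration and is not the formula CFX uses to compute $\beta$.

```python
def blended_face_values(phi):
    """Face values phi_{i+1/2} for i = 1..n-2, assuming flow in +x on a uniform 1D grid.

    beta = 1 gives a second-order (central) face value in smooth regions,
    beta = 0 falls back to first-order upwind near extrema and steep gradients.
    """
    faces = []
    for i in range(1, len(phi) - 1):
        uu, u, d = phi[i - 1], phi[i], phi[i + 1]  # far-upwind, upwind, downwind
        if d == u:
            beta = 1.0  # flat: every beta gives the same face value phi_U
        else:
            r = (u - uu) / (d - u)          # ratio of successive gradients
            beta = max(0.0, min(1.0, r))    # minmod-type limiter
        faces.append(u + 0.5 * beta * (d - u))
    return faces
```

For linear data every face gets $\beta = 1$ and the face value is the midpoint, while at a local extremum or the foot of a step $\beta = 0$ and the upwind value is used, which avoids new over- or undershoots.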
5 Results
To perform the GCI study, three unstructured meshes with approx. $1.26 \times 10^6$,
$3.02 \times 10^6$ and $7.11 \times 10^6$ tetrahedral and prism elements (see Fig. 2) were initially
generated, and a grid independence study was carried out in which temperature,
velocity, pressure and relative humidity were compared at different points. Fig. 3
shows temperature profiles over time at two measurement points (MP). No significant
difference is observed between the two coarse meshes.
Fig. 3 Grid independence study using 3 meshes with $1.26 \times 10^6$, $3.02 \times 10^6$ and $7.11 \times 10^6$ elements:
comparison of temporal temperature profiles at 2 measurement points in THAI and PAD (left); MP
locations in THAI+ (right)
In contrast, the temperature for the fine mesh with $7.11 \times 10^6$ elements deviates
strongly from the coarse meshes. Because this behavior was detected at almost all
evaluated points, further mesh refinement was necessary to make sure that the
asymptotic range had been reached with the $7.11 \times 10^6$-element mesh. For this
reason, two further grids with approx. $16.85 \times 10^6$ and $39.73 \times 10^6$ elements were
generated, see Table 2. The corresponding grid spacings according to Eq. (10) and the
refinement ratios are also reported in Table 2. Celik [11] recommended refinement
ratios greater than 1.3 for a better accuracy of the GCI
results. The five grids in this study were generated based on this recommendation.
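If all meshes fill the same domain volume $V$, Eq. (10) reduces to $h = (V/N)^{1/3}$, so the refinement ratio between two meshes follows from their element counts alone. The sketch below (hypothetical function name) applies this to the approximate element counts quoted in the text; the values in Table 2 are based on the actual cell volumes and may differ slightly.

```python
def refinement_ratios(cell_counts):
    """Refinement ratios r = h_coarse / h_fine between consecutive meshes.

    For meshes that fill the same domain volume V, Eq. (10) gives
    h = (V / N)^(1/3), so r = (N_fine / N_coarse)^(1/3).
    cell_counts must be ordered from the coarsest to the finest mesh.
    """
    return [(fine / coarse) ** (1.0 / 3.0)
            for coarse, fine in zip(cell_counts, cell_counts[1:])]


# Approximate element counts of meshes 5 (coarsest) to 1 (finest) from the text.
counts = [1.26e6, 3.02e6, 7.11e6, 16.85e6, 39.73e6]
ratios = refinement_ratios(counts)
```

All four ratios come out between about 1.33 and 1.34, i.e. above Celik's recommended minimum of 1.3 and in line with the value of 1.33 given in the text.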
In addition, the CFL stability condition is an important numerical issue related to
the refinement ratio:
$$\mathrm{CFL} = \frac{u\,\Delta t}{h}. \qquad (24)$$
When the mesh is refined, $h$ decreases and the CFL number may become very large,
which could affect the stability of the numerical method. The CFL condition is
especially important for explicit methods. Even though CFX uses an implicit solver,
the refinement ratios were chosen as small as possible
(1.33), so that the variations in the CFL number are kept in the small range of
In this study, the five grids are compared at approx. 936 points for the variables
temperature, relative humidity, pressure and vertical velocity. A major problem in
the GCI evaluation was the determination of the observed order of accuracy $p_{obs}$
according to Eqs. (11) and (12): these equations yield $p$ values significantly greater
than the maximum theoretical order of CFX, i.e. $p_{max} = 2$. Such high $p_{obs}$ values
result in very low, unrealistic GCI values. This problem has been encountered in
several complex turbulent flows in complicated geometries [20], as is the case for
this natural convection flow. The main causes of this so-called noisy grid
convergence could be the lack of geometric similarity between the unstructured
grids and the use of damping functions and switches in the turbulence models [20].
One approach to deal with this issue was suggested by Roache [20] and consists
in estimating the GCI with the theoretical order of accuracy [21]. The evaluation
of the blend factor $\beta$ of the high-resolution scheme (see Chap. 4.2) results in an
averaged theoretical order of accuracy $p_{theo} = 1.926$ [22]. This value is used
to assess the GCIs in the following. The results reported in Table 3 show that the
GCI values are overall more conservative than $\varepsilon$. Successive grid refinement
reduces both $\varepsilon$ and the GCI. An exception is the
Table 3 The averaged $\varepsilon$, GCI measures and percentage of oscillatory convergence (OC) over the
936 considered points for temperature $T$, relative humidity $\varphi$, pressure $p$ and vertical velocity $v_y$

       ε54 [%]  ε43 [%]  ε32 [%]  ε21 [%]  GCI54 [%]  GCI43 [%]  GCI32 [%]  GCI21 [%]
T      0.104    1.119    0.178    0.079    0.175      1.906      0.229      0.134
φ      0.385    4.138    0.351    0.285    0.645      7.047      0.513      0.485
p      0.016    0.298    0.011    0.002    0.026      0.507      0.018      0.003
v_y    372      534      139      180      1521       896        236        308

       OC 5-4-3 [%]  OC 4-3-2 [%]  OC 3-2-1 [%]
T      8.25          14.25         9.25
φ      8.79          13.33         12.52
p      25            100           62.50
v_y    70.83         47.92         59.03
Fig. 4 Comparison of the relative humidity profiles over time in MP Annulus THAI and in line 2
at 240 s (left); Locations of the MP Annulus THAI and Line 2 (right)
transition from grid no. 4 to grid no. 3, which was already mentioned at the beginning
of this section. The large deviation between these two grids results in large $\varepsilon$ and
GCI values (the GCI exceeds 7 % for the relative humidity), see Fig. 4. A possible
reason may be that vortices are numerically resolved on mesh no. 3 with
$7.11 \times 10^6$ elements that could not be resolved on the coarse meshes 4 and 5. For a
high-quality grid independence study, the asymptotic range is assumed to be reached
for a numerical error $\varepsilon$ of less than 1 % [21]. Although the GCIs are more
conservative than $\varepsilon$, grid convergence index values of less than 1 % have been
achieved for the grid pairs 3-2 and 2-1. The comparison of the relative humidity
profiles in Fig. 4 confirms this: no significant changes are detected between the fine
meshes 3, 2 and 1. This behavior has been identified for almost all considered
profiles. The only exception is the vertical velocity $v_y$, for which very large $\varepsilon$ and
GCI values (more than 300 % for the fine grid pair 2-1) are obtained. In fact, the
circulation in this natural convection flow is very slow, resulting in very low
velocities with an average value of $v_{ave} \approx 0.028$ m/s. In addition, the velocities
exhibit strong fluctuations due to the complexity of the geometry and the many
internal installations in the THAI vessel, see Fig. 5. For these reasons, the velocity
does not seem to be an appropriate field variable for quantifying the grid
convergence index in the current natural convection flow. However, the comparison
of the transient velocity profiles in Fig. 5 shows that the vertical velocity decreases
nearly to zero when the solution is close to the
Fig. 5 Comparison of the simulated velocity time diagrams for meshes 1–5 in a measurement
point in the THAI annulus
thermal equilibrium (after approx. 5000 s) in the fine grids 1-2-3, while it keeps
oscillating in the coarse grids 4-5. On the basis of the above results of the grid
convergence index and the mesh independence study, it can be concluded that the
asymptotic range has been reached with mesh no. 3 with approximately $7.11 \times 10^6$
elements.
5.2 Parallelization
Fig. 6 Speedup measurements for simulations with CFX 16.1 on a fine unstructured grid composed
of $83 \times 10^6$ elements and $24 \times 10^6$ nodes
6 Conclusions
To quantify the spatial discretization error, a Grid Convergence Index study was
carried out for a natural convection flow simulation using the commercial CFD
package Ansys CFX 16.1. This simulation is part of the initial operation test
TH27 of the newly constructed two-room facility THAI+. Five numerical grids with
approximately $1.26 \times 10^6$, $3.02 \times 10^6$, $7.11 \times 10^6$, $16.85 \times 10^6$ and $39.73 \times 10^6$
elements were employed to assess the GCI values at 936 consistent points in THAI+.
The averaged GCI values for the grid set 3-2 ($7.11 \times 10^6$–$16.85 \times 10^6$) were
0.229 %, 0.513 % and 0.018 % for the temperature, relative humidity and pressure,
respectively. These low GCI values indicate a good-quality, grid-independent solution
on mesh 3 with $7.11 \times 10^6$ elements. The parallel performance of CFX 16.1 was also
investigated in this work. The scaling tests on a numerical grid with $83 \times 10^6$
elements and $24 \times 10^6$ nodes showed a super-linear speedup up to 710 cores. For
higher core counts, the CFX speedup falls increasingly below the ideal one; however,
it remains relatively close to it. The maximum speedup was about 11.4 at 1800 cores,
compared with an ideal value of 15. At this point, the ratio of computational time to
physical time is 44:1.
Future work on the grid convergence index may focus on the applicability of the
GCI method to two-phase flows using the newly developed phase change model
presented in this publication. In addition, the parallel performance of CFX 16.1
with partitioning methods other than the multilevel graph partitioning software
MeTiS should be investigated in order to determine the most appropriate method for
containment simulations.
Acknowledgements This work was supported by the German Federal Ministry of Economic
Affairs and Energy (BMWi) on the basis of a decision by the German Bundestag, project number
1. Babić, M., Kljenak, I., Mavko, B.: Simulation of atmosphere mixing and stratification in the
THAI experimental facility with a CFD code. In: International Conference Nuclear Energy for
New Europe, Bled, Slovenia (2005)
2. Zirkel, A., Doebbener, G., Laurien, E.: CFD simulation of forced flow within the THAI model
containment. In: 17th International Conference on Nuclear Engineering, Brussels, Belgium
3. IAEA: Use of computational fluid dynamics codes for safety analysis of nuclear reactor
systems. Summary report of a technical meeting, Pisa, 11–14 Nov 2002
4. Houkema, M., Siccama, N.B., Lycklama à Nijeholt, J.A., Komen, E.: Validation of the CFX4
CFD code for containment thermal-hydraulics. Nucl. Eng. Des. 238, 590–599 (2008)
5. Babić, M., Kljenak, I., Mavko, B.: Prediction of light gas distribution in experimental
containment facilities using the CFX4 code. Nucl. Eng. Des. 238, 538–550 (2008)
6. Zhang, J., Laurien, E.: 3D numerical simulation of flow with volume condensation in presence
of non-condensable gases inside a PWR containment. In: Nagel, W.E., Kröner, D.H., Resch,
M.M. (eds.) High Performance Computing in Science and Engineering’14, pp. 479–497.
Springer, Cham (2015)
7. Mimouni, S., Lamy, J.-S., Lavieville, J., Guieu, S., Martin, M.: Modelling of sprays in
containment applications with a CMFD code. Nucl. Eng. Des. 240, 2260–2270 (2010)
8. Malet, J., Mimouni, S., Manzini, G., Xiao, J., Vyskocil, L., Siccama, N.B., Huhtanen, R.: Gas
Entrainment by one single French PWR spray, SARNET-2 spray benchmark. Nucl. Eng. Des.
282, 44–53 (2015)
9. Stewering, J., Schramm, B., Sonnenkalb, M.: Validation of CFD-models for natural convection,
heat transfer and turbulence phenomena. In: Computational Fluid Dynamics (CFD) for Nuclear
Reactor Safety Applications (NRS), Bethesda (2010)
10. Roache, P.J.: Verification and Validation in Computational Science and Engineering, 446 pp.
Hermosa, Albuquerque (1998)
11. Celik, I.B., Ghia, U., Roache, P.J., Freitas, C.J., Coleman, H., Raad, P.E.: Procedure for
estimation and reporting of uncertainty due to discretization in CFD applications. ASME J.
Fluids Eng. 130(7), 078001 (2008)
12. Krappel, T., Ruprecht, A., Riedelbauch, S.: Flow simulation of a Francis turbine using the
SAS turbulence model. In: Nagel, W.E., Kröner, D.H., Resch, M.M. (eds.) High Performance
Computing in Science and Engineering’13, pp. 455–463. Springer, Cham (2013)
13. Ishii, M., Hibiki, T.: Thermo-Fluid Dynamics of Two-Phase Flow. Springer, New York (2006)
14. Crowe, C.T., Sommerfeld, M., Tsuji, Y.: Multiphase Flows with Droplets and Particles. CRC-
Press, Boca Raton (1998)
15. Menter, F.R.: Two-equation eddy-viscosity turbulence models for engineering applications.
AIAA J. 32(8), 1598–1605 (1994)
16. Ranz, W.E., Marshall, W.R.: Evaporation from drops, Parts I & II. Chem. Eng. Prog. 48,
141–146, 173–180 (1952)
17. Poling, B.E., Prausnitz, J.M., O’Connell, J.P.: The Properties of Gases and Liquids. McGraw-
Hill, New York (2001)
18. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular
graphs. SIAM J. Sci. Comput. 20, 359–392 (1998)
19. Freitag, M., Schmidt, E., Gupta, S.: Specification GRS–Report for Double-Blind Simulations
of THAI Test TH27: Initial Operation Test of THAI+: Part 1 Natural Convection with Steam
Injection and Condensation (2015)
20. Eça, L., Hoekstra, M.: A procedure for the estimation of the numerical uncertainty of CFD
calculations based on grid refinement studies. J. Comput. Phys. 262, 104–130 (2014)
21. Longest, P.W., Vinchurkar, S.: Effects of mesh style and grid convergence on particle
deposition in bifurcating airway models with comparisons to experimental data. Med. Eng.
Phys. 29, 350–366 (2007)
22. Mansour, A., Laurien, E.: Simulation of a Natural Convection Flow with Humid Air in a
Two-Room Geometry. Computational Fluid Dynamics (CFD) for Nuclear Reactor Safety
Applications (NRS), Boston, 12–15 Sep 2016
Simulations of Unsteady Aerodynamic Effects
on Innovative Wind Turbine Concepts
1 Introduction
In order to meet the population's need for more energy on the one hand and to follow
the political demand to reduce emissions on the other hand, alternative sources of
energy are gaining in importance. Wind energy in particular has become one of the
most important and promising renewable sources of energy in recent years. To
improve the competitiveness of wind energy, the cost of energy (CoE) has to be
reduced. This can be achieved, among other things, by designing more effective and
durable wind turbines or by reducing material costs. Load alleviation systems such as
flaps can reduce fatigue loads and consequently increase the lifetime of wind
turbines, while turbines with a two-bladed rotor can save money because their
material and installation costs are lower.
To design wind turbines with load alleviation systems or two-bladed rotors
appropriately, a wide range of investigations has to be performed. Some can be done
using engineering models such as blade element momentum theory (BEMT), but others
require high-fidelity methods such as CFD. Unsteady aerodynamic effects in particular,
caused for example by flap movement or yaw misalignment, need to be investigated
carefully, as they occur frequently during the lifetime of a wind turbine. Wind
turbines are exposed to yaw misalignment for 2 % up to 10 % of their operating time [18],
and load fluctuations, caused for example by tower shadow, shear or gusts, occur
several times per revolution of the rotor.
In the present paper, a three-bladed wind turbine equipped with mechanically coupled
leading and trailing edge flaps and a two-bladed rotor under yawed inflow are
investigated. The three-bladed rotor with the load alleviation system is the
NREL 5MW wind turbine with some minor modifications introduced within the
KIC-OFFWINDTECH project [1, 2], whereas the two-bladed turbine is a prototype
developed by Skywind. For both setups, a grid convergence study according to
Celik [4] was performed in order to estimate the influence of the numerical grids
on the solution. For the three-bladed rotor, the temporal discretization of unsteady
effects caused by the flaps was investigated for a 1p excitation frequency,
representing an atmospheric boundary layer. To better understand the influence of
yawed inflow on a two-bladed rotor, the unsteady load fluctuations caused by yawed
inflow were investigated and compared to a reference case without yawed inflow.
The investigations presented in this paper were performed using the finite volume
solver FLOWer [14] by DLR (German Aerospace Center), which has already been
applied in several other wind energy projects [11, 20, 22, 25, 26] and is continuously
developed further by the present working group. FLOWer solves the unsteady
Reynolds-averaged Navier-Stokes (URANS) equations on block-structured grids. The
second-order central JST scheme [9] is used for spatial discretization, and the
temporal discretization is realized with an implicit dual time stepping scheme [8].
For turbulence modeling, the Menter SST model is applied. The overlapping grid
technique CHIMERA [3] enables the use of independently generated grids. For
the present studies, the components of the turbines have been meshed individually
with a fully resolved boundary layer, ensuring $y^+ = 1$ for the first cell. By applying
rotationally periodic boundary conditions, only one blade of a rotor needs to be simulated.
This allows computationally efficient studies of the rotor aerodynamics of wind
turbines under uniform inflow conditions.
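The dual time stepping approach [8] can be illustrated on a scalar model problem. The following sketch is not FLOWer's implementation: it assumes a second-order backward (BDF2) formula for the physical time derivative and plain explicit Euler updates in pseudo-time, whereas the scheme in [8] accelerates the inner iterations with multigrid. Each physical time step is converged by a fixed number of inner iterations, the quantity whose influence is studied in Sect. 5, and the error with respect to the fully converged BDF2 solution shrinks as the number of inner iterations grows. The names and step sizes are illustrative assumptions.

```python
def dual_time_step(rhs, u_n, u_nm1, dt, dtau, n_inner):
    """Advance du/dt = rhs(u) by one physical time step with dual time stepping.

    Illustrative assumptions: BDF2 in physical time and explicit Euler
    updates in pseudo-time. Each inner iteration reduces the unsteady
    residual R*(u) = rhs(u) - (3u - 4u_n + u_nm1) / (2 dt).
    """
    u = u_n  # initial guess for u^{n+1}: the previous time level
    for _ in range(n_inner):
        unsteady_residual = rhs(u) - (3.0 * u - 4.0 * u_n + u_nm1) / (2.0 * dt)
        u = u + dtau * unsteady_residual  # one pseudo-time step
    return u


def decay(u):
    """Model right-hand side du/dt = -u."""
    return -u


if __name__ == "__main__":
    u_n, u_nm1, dt = 1.0, 1.05, 0.05
    # Fully converged BDF2 solution of the model problem, for comparison.
    exact_bdf2 = (4.0 * u_n - u_nm1) / (3.0 + 2.0 * dt)
    for n_inner in (5, 30):
        u = dual_time_step(decay, u_n, u_nm1, dt, dtau=0.02, n_inner=n_inner)
        print(f"{n_inner:2d} inner iterations: error = {abs(u - exact_bdf2):.2e}")
```

With dtau = 0.02, each inner iteration multiplies the error by 0.38 for this model problem, so 30 inner iterations reach round-off level while 5 leave an error of a few times 10^-4.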
Table 1 gives an overview of the turbines and their operating conditions used for the
investigations in this paper.
The present investigations were performed using a one-third model of the modified
NREL 5 MW wind turbine. More information about the turbine can be found in [17].
For the present study, the turbine has been equipped with mechanically coupled
leading and trailing edge flaps using a coupling ratio of three. This means that a
leading edge flap deflection of 1° leads to a trailing edge flap deflection of
$\beta = 3°$. This combined movement increases the camber of the airfoil (see Fig. 1).
The concept, also known as the adaptive camber concept, was developed by Lambie and
Hufnagel at TU Darmstadt [15] as a passive load alleviation concept. For the present
study, the flaps extend from 60 % to 80 % radius, with the leading edge flap covering
20 % of the chord length and the trailing edge flap covering 30 % of the chord length.
In 2D, the concept has already been investigated experimentally under steady [15]
and dynamic inflow conditions [5, 13] and numerically under steady inflow
conditions [7]. In the present paper, prescribed flap deflections have been
applied by grid deformation as described in [11].
The computational domain extends over a length of 2700 m and a radius
of 720 m. It consists of four individual grids with 16.5 million cells in total, of
which 8.6 million cells belong to the blade grid. All simulations were started with a
steady computation in order to accelerate convergence, followed by an unsteady
computation.
The two-bladed turbine is the Skywind 3.4 MW prototype that has been the subject
of the German LARS project. It is embedded in a computational domain of
1536 m × 1024 m × 832 m (length × width × height). Six individual grids are placed in a
Cartesian background grid, which was refined using a hanging grid nodes technique.
Altogether, the computational setup consists of approx. 43.2 million cells. The first
simulation was performed under uniform inflow conditions. To investigate the influence
of yaw misalignment on the aerodynamics of the turbine, the turbine was rotated by up to
30° about the vertical tower axis. In these simulations, the time step corresponds to
2° azimuth. Additionally, to investigate the grid dependency, a half model of the
turbine has been generated. It consists of three independent grids for background,
blade and hub. The background grid has the shape of a half cylinder with a length of
2700 m and a diameter of 1440 m.
3 FLOWer at HLRS
As an example, the two-bladed full model case is considered. The problem size
of 43.2 million cells has been computed on 1536 cores. A simulation of 30
revolutions with 180 time steps per revolution and 50 inner iterations, as presented
in this paper, consumes approximately 42,000 CPU hours. Since 2015, only minor
changes have been made to FLOWer, and these do not affect the efficiency of the code.
Therefore, the weak scaling test from [19] can be used as a reference.
Additionally, a strong scaling test has been performed. Both results are shown in
Fig. 2. For these tests, FLOWer was compiled with ifort on a Cray XC40. For the
weak scaling test, the number of cells per core was kept constant at 32³. The
strong scaling test was performed using 4096 × 32³ cells. In both cases, the
number of cores was increased from 128 up to 4096, and for each run the time
for 1000 iterations was measured and compared to the simulation on 128 cores. Up
to 1024 cores, both cases show the same efficiency. Using 4096 cores, FLOWer
has an efficiency of 0.77 for weak scaling and 0.83 for strong scaling. This shows
the suitability of FLOWer for the simulations considered in this report and leaves
room for larger cases.
Fig. 2 Efficiency of FLOWer on Cray XC40 using the ifort Fortran compiler, with a constant
load of 32³ cells per MPI process for weak scaling and 4096 × 32³ cells in total for
strong scaling (efficiency [-] over number of cores [-])
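The figures above follow from a few lines of arithmetic. The sketch assumes the usual definitions of parallel efficiency relative to the 128-core reference run (weak scaling: E = t_128 / t_N with constant load per core; strong scaling: E = 128 t_128 / (N t_N) with constant total load) and breaks the quoted budget of 42,000 CPU hours down into wall-clock time per inner iteration. The function names are illustrative, and the timings passed to the strong scaling function are hypothetical placeholders, not the measurements behind Fig. 2.

```python
def weak_scaling_efficiency(t_ref, t_n):
    """Constant load per core: ideally the wall time stays at t_ref."""
    return t_ref / t_n


def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Constant total load: ideally the wall time drops by n_ref / n."""
    return (t_ref * n_ref) / (t_n * n)


def cost_breakdown(cpu_hours, cores, revolutions, steps_per_rev, inner_iters):
    """Wall-clock hours and wall-clock seconds per inner iteration."""
    wall_hours = cpu_hours / cores
    total_inner = revolutions * steps_per_rev * inner_iters
    return wall_hours, wall_hours * 3600.0 / total_inner


if __name__ == "__main__":
    # Values quoted for the two-bladed full model (43.2 million cells).
    wall_h, sec_per_iter = cost_breakdown(42_000, 1536, 30, 180, 50)
    print(f"{wall_h:.1f} h wall time, {sec_per_iter:.2f} s per inner iteration")
    # Hypothetical timings for 1000 iterations (not the data of Fig. 2).
    t128, t4096 = 100.0, 4.0
    eff = strong_scaling_efficiency(t128, 128, t4096, 4096)
    print(f"strong scaling efficiency: {eff:.2f}")
```

For the two-bladed case, this gives about 27.3 h of wall-clock time on 1536 cores for 270,000 inner iterations, i.e. roughly 0.36 s per inner iteration.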
4.1 Approach
For the three-bladed and the two-bladed turbine described in Sect. 2.2, grid
convergence studies according to Celik [4] have been performed to evaluate the
dependency of the CFD results on the grid resolution and to find suitable grids for
further simulations. For these studies, three setups of different resolution have been
generated for each turbine, using the half model in the two-bladed case (see
Table 2). As recommended by Celik, the grid refinement factor between two setups
has been chosen to be larger than 1.3. For the grid convergence study of the
three-bladed turbine, the flaps were not deflected and the grid had no
refinement in the flap area.
Table 3 shows an extract of the results of the grid convergence studies. For both
cases, two-bladed and three-bladed, the fine-grid convergence index $GCI_{fine}$ and
the extrapolated error of the medium grid $err_{ext}$ are listed. $GCI_{fine}$ is an
indicator of the grid dependency of the fine-grid solution, and $err_{ext}$ represents
the error of the medium-grid solution relative to the extrapolated value.
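The procedure of Celik et al. [4] can be sketched as follows. This is a minimal illustration under our own assumptions, not the evaluation used for Table 3: the apparent order p is obtained by fixed-point iteration, the safety factor is 1.25, the refinement factor is derived from cell counts via the representative cell size h = (V/N)^(1/3), and err_ext is interpreted as the relative deviation of the medium-grid solution from the extrapolated value, as described above. The input values are synthetic (φ = 1 + 0.01 h², for which p = 2 and the extrapolated value 1 are known exactly).

```python
import math


def refinement_factor(n_fine, n_coarse):
    """Grid refinement factor r = h_coarse / h_fine from 3D cell counts,
    using the representative cell size h = (V / N)^(1/3)."""
    return (n_fine / n_coarse) ** (1.0 / 3.0)


def celik_gci(phi1, phi2, phi3, h1, h2, h3, safety=1.25, tol=1e-12, max_iter=200):
    """Grid convergence index after Celik et al. [4].

    Index 1 = fine, 2 = medium, 3 = coarse grid (h1 < h2 < h3); assumes
    phi2 != phi1. Returns the apparent order p, the extrapolated value,
    GCI_fine (relative) and the relative error of the medium-grid solution
    with respect to the extrapolated value.
    """
    r21, r32 = h2 / h1, h3 / h2
    eps21, eps32 = phi2 - phi1, phi3 - phi2
    s = math.copysign(1.0, eps32 / eps21)  # +1 monotone, -1 oscillatory
    log_ratio = math.log(abs(eps32 / eps21))

    # Fixed-point iteration for p, starting from q(p) = 0.
    p = abs(log_ratio) / math.log(r21)
    for _ in range(max_iter):
        q = math.log((r21 ** p - s) / (r32 ** p - s))
        p_new = abs(log_ratio + q) / math.log(r21)
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break

    phi_ext = (r21 ** p * phi1 - phi2) / (r21 ** p - 1.0)
    e_a21 = abs((phi1 - phi2) / phi1)  # approximate relative error
    gci_fine = safety * e_a21 / (r21 ** p - 1.0)
    err_medium = abs((phi_ext - phi2) / phi_ext)
    return p, phi_ext, gci_fine, err_medium


if __name__ == "__main__":
    # Synthetic data phi = 1 + 0.01 h^2 on h = 1.0, 1.5, 2.1.
    p, phi_ext, gci, err = celik_gci(1.01, 1.0225, 1.0441, 1.0, 1.5, 2.1)
    print(f"p = {p:.3f}, phi_ext = {phi_ext:.4f}, GCI_fine = {gci:.2%}, err = {err:.2%}")
```

The refinement_factor helper can be used to check Celik's recommendation that the factor between two setups exceed 1.3.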
Figure 3 shows the normalized sectional driving force over the normalized radius.
The curves for the coarse, medium and fine setups are plotted together with the
extrapolated sectional driving force. Error bars indicate the local $GCI_{fine}$. In
both cases, the grid convergence is very good in the outer part of the rotor and worse
in the inner part. In the root region, there are large differences between the two
cases. For the three-bladed turbine, the medium and fine grid solutions lie very close
together and a low uncertainty is indicated. In contrast, the indicated uncertainty is
very high for the two-bladed turbine, as all setups overestimate the driving force close
to the root compared to the extrapolated solution. This also explains the high
$GCI_{fine}$ of the integral driving force. Nevertheless, the influence on power is very
low, as the sectional driving force is multiplied by the radius to obtain the torque, so
that sections at small radii contribute little. Overall, the medium setup best satisfies
the conflicting requirements of low computational effort and high solution accuracy.
Fig. 3 Averaged sectional driving force for all three grids. Left: two-bladed, right: three-bladed
5 Results
Load reduction on wind turbines by dynamic flap deflection is associated with the
occurrence of unsteady aerodynamic effects. Therefore, the temporal discretization
of the flap deflection has to be investigated in order to ensure that unsteady
aerodynamic effects are accurately resolved. The investigation of the temporal
discretization was done for the 1p frequency, representing, for example, the impact
of an atmospheric shear profile or a tilt angle of the rotor. In the middle of the flap,
which corresponds to 70 % of the rotor radius, the reduced frequency is k = 0.0338,
where $k = \omega c/(2V)$ is formed from the excitation angular frequency ω (here the
rotational frequency), the local chord length c and the local velocity V. According to
Leishman [16], for $0 \le k \le 0.05$ a flow can be considered quasi-steady, and in most
cases unsteady effects can be neglected. To verify this, 2D simulations of an airfoil
extracted at the middle of the flap were performed. However, it turned out that
unsteady effects do occur (Fig. 4) and that the shape of the hysteresis curves depends
on the time step. Consequently, those effects and their dependency on different
factors were investigated.
Fig. 4 $c_l$ over $\beta$ for different time steps. All calculations are performed with 30 inner
iterations per time step
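The quasi-steady criterion can be written as a small helper. The sketch below assumes the common definition k = ωc/(2V) and the regime boundaries usually quoted from Leishman [16] (k = 0 steady, up to 0.05 quasi-steady, below 0.2 unsteady, from 0.2 on highly unsteady); the function names and the section data in the example are hypothetical and not taken from the NREL 5MW blade. As the 2D results show, the classification is only a rule of thumb: k = 0.0338 falls into the quasi-steady range, yet a time-step-dependent hysteresis is still observed.

```python
def reduced_frequency(omega, chord, velocity):
    """k = omega * c / (2 V) with excitation angular frequency omega [rad/s],
    chord length c [m] and local flow velocity V [m/s]."""
    return omega * chord / (2.0 * velocity)


def flow_regime(k):
    """Flow regimes as commonly quoted from Leishman [16]."""
    if k < 0.0:
        raise ValueError("reduced frequency must be non-negative")
    if k == 0.0:
        return "steady"
    if k <= 0.05:
        return "quasi-steady"
    if k < 0.2:
        return "unsteady"
    return "highly unsteady"


if __name__ == "__main__":
    # Hypothetical section data, chosen only for illustration.
    k = reduced_frequency(omega=1.27, chord=3.0, velocity=50.0)
    print(f"k = {k:.4f} -> {flow_regime(k)}")
    print(flow_regime(0.0338))  # value quoted for 70 % radius
```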
If no complex separation occurs, 2D simulations are typically resolved with at least
50 to 100 time steps per convective time unit. Using 50 time steps per convective time
unit at 70 % of the rotor radius of the NREL 5MW wind turbine would lead to
approximately 4650 time steps per revolution of the rotor, which would not be
feasible in 3D. In 3D simulations, the time step is usually given as an azimuth angle
increment, and typical values are between 1.0° and 3.0° per time step [19]. Table 4
shows the number of time steps per convective time unit for a selection of the
time steps used in the 2D and 3D simulations.
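The figures quoted above can be related by simple arithmetic. If 50 steps per convective time unit correspond to about 4650 steps per revolution, one revolution spans roughly 4650/50 = 93 convective time units (c/V) at 70 % radius. For a 1p excitation, ω = Ω = 2π/T_rev, so the reduced frequency becomes k = ωc/(2V) = π/93 ≈ 0.0338, which agrees with the value given for the middle of the flap. The sketch below, which assumes that the convective time unit is based on the local relative velocity, converts azimuthal increments into time steps per convective time unit. The function names are illustrative, and the results are estimates from the rounded value of 4650, not the entries of Table 4.

```python
import math


def ctu_per_revolution(n_steps_per_ctu, steps_per_rev):
    """Number of convective time units (c / V) within one revolution."""
    return steps_per_rev / n_steps_per_ctu


def steps_per_ctu(dpsi_deg, ctu_per_rev):
    """Time steps per convective time unit for an azimuthal increment dpsi."""
    return (360.0 / dpsi_deg) / ctu_per_rev


def reduced_frequency_1p(ctu_per_rev):
    """1p excitation: k = Omega c / (2 V) = pi / (convective units per revolution)."""
    return math.pi / ctu_per_rev


if __name__ == "__main__":
    n_ctu = ctu_per_revolution(50, 4650)  # figures quoted for r/R = 70 %
    for dpsi in (0.1, 0.5, 1.0, 2.0):
        print(f"{dpsi:3.1f} deg: {steps_per_ctu(dpsi, n_ctu):5.1f} steps per CTU")
    print(f"k = {reduced_frequency_1p(n_ctu):.4f}")
```

For a 1° increment this gives only about 3.9 steps per convective time unit (about 38.7 for 0.1°), far below the 50 to 100 steps used in typical 2D simulations.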
In order to transfer the results from the 2D simulations to the 3D case, the 2D time
steps were chosen to correspond to azimuthal increments of 0.1° up to 2.0°, i.e. to the
order of the time steps used in full wind turbine simulations. The time step
corresponding to 0.1° serves as the reference in these simulations.
Figure 4 shows that the hysteresis curves for the different time steps converge to
one solution if the time step is small enough. This effect is independent of the angle
of attack and the flap angle as long as no separation occurs. It is also independent
of the leading edge extension and only a minor dependency on the trailing edge
extension was observed. Moreover, if the time step is small enough (in the present
case 1°), the number of inner iterations has only an insignificant influence on the
predicted time shift. As additional time for grid deformation and Chimera search is
needed at each time step, a larger time step with more inner iterations is preferable.
Fig. 5 $c_n$ over the azimuth for different time steps and numbers of inner iterations, extracted at
r/R = 70 %
For the 3D simulations, a steady computation with 45,000 iterations was performed,
followed by four revolutions without flaps and four revolutions with flaps,
of which only the last revolution was used for evaluation. Simulations with flap
deflections of $\beta = \pm 3°$ and time steps corresponding to 0.5°, 1° and 2° azimuthal
increment with 30, 60 and 90 inner iterations have been performed. The reference
calculations without flaps were performed with a time step corresponding to
1.5° and 30 inner iterations.
In the 3D case, the choice of the time step has a stronger influence on the
amplitude of the load coefficient than in the 2D case. Figure 5 shows the normal
force coefficient $c_n$ for the 3D case for different time steps and numbers of inner
iterations.
The simulation with a 0.5° time step and 30 inner iterations shows, as expected,
the same results as the simulation with a 1.0° time step and 60 inner iterations, since
both simulations use 60 inner iterations per 1.0° azimuth. Regarding the computational
costs, the case with 1.0° and 60 inner iterations is preferable. Up to a certain time step
size, the time shift of the solution is independent of the time step size and the
number of inner iterations: between 0.5° (30/60 inner iterations) and 1.0° (60/90
inner iterations) there is no difference in the time shift. However, if the number
of inner iterations is too small (1.0°/30 and 2.0°/30) or the time step is too
large (2.0°), the time shift varies depending on these parameters. Therefore, a time
step of 2° or larger, or a combination with fewer than 60 inner iterations per
1.0° azimuthal increment, is not recommended.
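The cost argument can be made explicit with a simple model in which each physical time step carries a fixed overhead (grid deformation, Chimera search) in addition to the cost of its inner iterations. The function names and cost values below are arbitrary placeholders, not FLOWer timings. The model reproduces the qualitative reasoning in the text: 0.5°/30 and 1.0°/60 both spend 60 inner iterations per degree (21,600 per revolution), so the variant with fewer time steps is cheaper as soon as the per-step overhead is non-zero.

```python
def inner_iters_per_degree(dpsi_deg, n_inner):
    """Inner iterations spent per 1 degree of rotor azimuth."""
    return n_inner / dpsi_deg


def cost_per_revolution(dpsi_deg, n_inner, step_overhead, iter_cost):
    """Hypothetical cost model: fixed overhead per physical time step
    (grid deformation, Chimera search) plus a cost per inner iteration."""
    n_steps = 360.0 / dpsi_deg
    return n_steps * (step_overhead + n_inner * iter_cost)


if __name__ == "__main__":
    # Placeholder costs in arbitrary units (assumed, not measured).
    overhead, iter_cost = 5.0, 1.0
    for dpsi, n_inner in [(0.5, 30), (1.0, 60), (1.0, 90), (2.0, 30)]:
        per_deg = inner_iters_per_degree(dpsi, n_inner)
        cost = cost_per_revolution(dpsi, n_inner, overhead, iter_cost)
        print(f"{dpsi:3.1f} deg / {n_inner:2d} inner: {per_deg:5.1f} per deg, cost {cost:8.0f}")
```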
The $c_n$ amplitude for the different cases can be found in Fig. 6. It becomes
obvious that the amplitude converges with an increasing number of inner
iterations per time step. Therefore, increasing the number of inner iterations per
1.0° beyond 90 is not worthwhile, as the gain in accuracy does not justify the
additional simulation time. The same effects as described above can be seen for other
force coefficients such as the tangential force coefficient $c_t$ and the lift coefficient $c_l$.
Fig. 6 $c_n$ amplitude for different numbers of inner iterations per time step corresponding to 1.0°
A closer look at other blade sections shows that the influence of the unsteady
effects caused by the 1p flap deflection extends over the whole blade. Even in the root
region, where the cylindrical part of the blade causes separation, the effects are
superimposed. Moreover, the effects seen in Fig. 5 are the same over the whole
radius. According to Table 4, the number of time steps per convective time unit for
a time step corresponding to 0.5° is approximately twice as large at 50 % radius as at
70 % radius, and approximately three times as large as at 90 % radius. Nonetheless, the
differences between the solutions caused by the time step size and the number
of inner iterations remain the same. This leads to the conclusion that no further
significant benefit can be expected for the whole blade from a further increase of
inner iterations or reduction of the time step size. Therefore, the optimum combination
of time step size and number of inner iterations regarding accuracy of the result and
computational time for the present case with a 1p excitation frequency is 1.0° and
90 inner iterations.
In this section, the full model simulation of the two-bladed turbine with 30° yaw is
compared to the baseline case with 0° yaw. Figure 7 shows the influence of the yawed
inflow on the wake of the wind turbine. Compared to the baseline case, the wake is
deflected towards the downwind side of the rotor.
Fig. 7 Influence of yawed inflow on the wake of the two-bladed wind turbine. $\lambda_2 = 2 \times 10^7$
isosurfaces for vortex visualization. Contour levels indicate relative velocity magnitudes. View
from top. Left: 0°, right: 30°
A common way to describe the decrease of power under yawed conditions is the $\cos^x$
law [18], where γ denotes the yaw angle:
$$P = P_0 \cos^{x}(\gamma) \qquad (1)$$
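Inverting Eq. (1) yields the exponent implied by a given power ratio, x = ln(P/P0)/ln(cos γ). The short sketch below, with illustrative function names, applies it to the 22.3 % power reduction at 30° yaw reported for the two-bladed turbine, which gives x ≈ 1.75, and evaluates the power ratio at 30° for the exponents 1.88, 2.38 and 5.14 quoted from the literature in the following paragraph.

```python
import math


def yaw_exponent(power_ratio, yaw_deg):
    """Exponent x in P = P0 cos^x(gamma) implied by a given P / P0."""
    return math.log(power_ratio) / math.log(math.cos(math.radians(yaw_deg)))


def power_ratio(exponent, yaw_deg):
    """P / P0 predicted by the cos^x law for a given exponent."""
    return math.cos(math.radians(yaw_deg)) ** exponent


if __name__ == "__main__":
    x_two_bladed = yaw_exponent(1.0 - 0.223, 30.0)  # 22.3 % power loss at 30 deg
    print(f"x = {x_two_bladed:.2f}")
    for x in (1.88, 2.38, 5.14):
        print(f"x = {x:.2f}: P/P0 = {power_ratio(x, 30.0):.3f}")
```

At 30° yaw, these literature exponents correspond to power ratios of about 0.76, 0.71 and 0.48, compared with 0.777 for the two-bladed turbine.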
Dahlberg [6] evaluated field and wind tunnel measurements and found exponents
ranging from 1.88 to 5.14. In [21], a three-bladed turbine of approximately the same
size as the two-bladed turbine has been investigated under yawed conditions using
FLOWer, and an exponent of 2.38 was found. The power of the investigated two-bladed
turbine is reduced by 22.3 % under 30° yaw. This corresponds to an exponent of
approximately x = 1.7, which is clearly lower than the exponents from the literature
stated above. The thrust and torque on one blade over the azimuth are displayed in
Fig. 8. The influence of the tower blockage can clearly be seen at 180° azimuth, where
thrust is reduced by approx. 5 % and torque by approx. 10 % in the non-yawed case.
Apart from this local effect, both loads are almost constant over the azimuth in the
non-yawed case. In the yawed case, both thrust and torque are reduced. The maximum
thrust is located between 90° and 120° azimuth, while the maximum torque is found at
the upright position of the blade at 0° azimuth. Compared to thrust, the amplitude in
torque is much higher but almost symmetrical. To understand the loads in more detail,
the spanwise load distributions at 90° and 270° azimuth are displayed in Fig. 9. In the
baseline case, the loads differ only close to the root, due to separation on the very
thick airfoils. In the yawed case, the sectional loads are lower than or equal to those
of the baseline case, except for the blade root region. Below 65 % relative radius, the
sectional loads are higher at 90° azimuth; above, they are higher at 270° azimuth.
Again, the blade root region is disregarded because of the strong fluctuation of the
loads there. At 90° azimuth, the sectional thrust is almost the same in both cases for a
relative radial position of 25 %
Fig. 8 Normalized blade thrust force and normalized blade torque for yawed and non-yawed
inflow
Fig. 9 Normalized sectional thrust force (sec. F_thrust,norm [-]) and normalized sectional driving
force (sec. F_driving,norm [-]) over r/R [-] for yawed and non-yawed inflow (0° and 30° yaw,
each at 90° and 270° azimuth)
Fig. 10 Normalized axial velocity (in direction of the rotor axis) in the rotor plane. View from
front in direction of the rotor axis. Left: 0°, right: 30°
averaged in circumferential direction. Because of this, the reduced axial velocity
method was adapted in the present study for full model simulations with steady
inflow. It is based on the assumption that the velocity in the rotor plane, averaged
over $1/n_{blades}$ revolutions, is representative of the flow state and can be used to
determine the local flow conditions.
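A minimal sketch of how a time-averaged rotor-plane velocity can be turned into local inflow conditions is given below. The simplifications are ours and not necessarily those of the adapted method: uniformly sampled axial velocity, a window of exactly 1/n_blades revolution (one blade-passage period in steady inflow), and a velocity triangle without tangential induction, so that φ = atan(u_ax/(Ωr)) and α = φ - θ, with θ the local twist plus pitch angle. The function names, signal, radius, rotational speed and twist in the example are hypothetical.

```python
import math


def window_average(samples, samples_per_rev, n_blades):
    """Mean over the last 1/n_blades revolution of uniformly sampled data.

    For an n-bladed rotor in steady inflow, this window spans one
    blade-passage period, so blade-passing fluctuations average out.
    """
    window = samples_per_rev // n_blades
    if window < 1 or window > len(samples):
        raise ValueError("not enough samples for a 1/n_blades window")
    tail = samples[-window:]
    return sum(tail) / len(tail)


def local_aoa_deg(u_axial, omega, radius, theta_deg):
    """AoA from the velocity triangle, neglecting tangential induction:
    phi = atan2(u_axial, omega * r), alpha = phi - theta (twist + pitch)."""
    phi_deg = math.degrees(math.atan2(u_axial, omega * radius))
    return phi_deg - theta_deg


if __name__ == "__main__":
    # Synthetic signal: mean 8 m/s plus a 2-per-revolution fluctuation (assumed).
    n_rev_samples = 360
    u = [8.0 + math.cos(math.radians(2.0 * i)) for i in range(2 * n_rev_samples)]
    u_mean = window_average(u, n_rev_samples, n_blades=2)
    print(f"averaged axial velocity: {u_mean:.3f} m/s")
    print(f"AoA: {local_aoa_deg(u_mean, omega=1.0, radius=40.0, theta_deg=5.0):.2f} deg")
```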
Figure 10 shows the normalized averaged axial velocity in the rotor plane for
both cases. For the non-yawed case, it is almost constant over the azimuth and only
shows fluctuations in the inner part and a decrease of velocity at 180° caused by
tower blockage. As expected, the axial velocity is reduced globally in the yawed
case. The strongest decrease can be found between 30° and 180° azimuth on the
downwind side of the rotor, with its maximum near the blade tip region. This
gives a first hint at the reduced loads at 90° in the outer region of the blade observed
in Fig. 9.
Figure 11 shows the AoA and the relative local velocity, normalized by the
rotational velocity, at two different spanwise positions for both cases. Compared to
the non-yawed case, the average AoA is reduced at both spanwise positions, but the
amplitude is higher at the inner position due to the lower rotational velocity. The
curves are shifted to the upper right in the case of the 40 % radial position and to
the upper left in the case of the 85 % radial position. The relative local velocity is
dominated by the rotational velocity in the non-yawed case: at the inner position, the
local velocity is approx. 4 % higher than the rotational velocity, while the difference
is only 1 % at the outer position. In the yawed case, the influence of the wind is much
stronger. The curves are shifted to the lower side of the rotor, where the blades move
against the wind, and again the amplitude is higher at the inner position, with the
relative velocity ranging between 90 % and 117 %.
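The larger amplitude at the inner position can be rationalized with a simple kinematic model, given here only as an illustration under our own assumptions: induction is neglected, the axial wind component is V∞ cos γ, and the in-plane component V∞ sin γ adds to or subtracts from the blade speed Ωr in proportion to cos ψ (the azimuthal phase convention is arbitrary). Normalized by Ωr, the azimuthal variation scales with V∞ sin γ/(Ωr) and therefore grows towards the root. The function names and the tip speed ratio of 8 are hypothetical, not the Skywind operating point.

```python
import math


def rel_velocity_ratio(psi_deg, yaw_deg, r_over_R, tsr):
    """|V_rel| / (Omega r) from a no-induction velocity triangle.

    tsr = Omega R / V_inf (tip speed ratio); velocities are scaled by V_inf.
    """
    gamma = math.radians(yaw_deg)
    omega_r = tsr * r_over_R  # Omega r / V_inf
    u_axial = math.cos(gamma)  # axial wind component
    u_tang = omega_r + math.sin(gamma) * math.cos(math.radians(psi_deg))
    return math.hypot(u_axial, u_tang) / omega_r


def ratio_range(yaw_deg, r_over_R, tsr, n=360):
    """Minimum and maximum of the ratio over one revolution."""
    values = [rel_velocity_ratio(360.0 * i / n, yaw_deg, r_over_R, tsr)
              for i in range(n)]
    return min(values), max(values)


if __name__ == "__main__":
    for r in (0.40, 0.85):
        lo, hi = ratio_range(30.0, r, tsr=8.0)  # hypothetical tip speed ratio
        print(f"r/R = {r:.2f}: {lo:.3f} .. {hi:.3f}")
```

With these assumptions, the ratio ranges from about 0.89 to 1.19 at r/R = 0.40 and from about 0.94 to 1.08 at r/R = 0.85, qualitatively similar to the trend in Fig. 11; since induction is ignored, the model should not be compared quantitatively.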
At 90° and 270° azimuth, the local velocity magnitude is almost independent of
yaw; the shapes of the sectional load graphs in Fig. 9 are therefore only a result of
the local angle of attack.
Fig. 11 Angle of attack and relative local velocity for yawed and non-yawed inflow
6 Conclusion
1. Bekiropoulos, D., Lutz, T., Baltazar, J., Lehmkuhl, O., Glodic, N.: D2013-3.1: comparison of
benchmark results from CFD-simulation. Deliverable report, KIC-OFFWINDTECH (2013)
2. Bekiropoulos, D., Rieß, R., Lutz, T., Krämer, E., Matha, D., Werner, M., Cheng, P.W.:
Simulation of unsteady aerodynamic effects on floating offshore wind turbines. In: DEWEK
3. Benek, J.A., Steger, J.L., Dougherty, F.C., Buning, P.G.: Chimera. A Grid-Embedding Tech-
nique. Arnold Engineering Development Center Arnold Air Force Station, Tennessee Air Force
Systems Command United States Air Force (1986)
4. Celik, I.B., Ghia, U., Roache, P.J., et al.: Procedure for estimation and reporting of uncertainty
due to discretization in CFD applications. J. Fluids Eng.-Trans. ASME. 130(7), 0780011–
0780014 (2008)
5. Cordes, U., Hufnagel, K., Tropea, C., Kampers, G., Hölling, M., Peinke, J.: Experimental
investigation of passive load reduction under dynamic inflow conditions. In: 33rd AIAA
Applied Aerodynamics Conference, p. 3313 (2015)
6. Dahlberg, J., Montgomerie, B.: Research program of the Utgrunden demonstration offshore
wind farm, final report part 2, wake effects and other loads. Swedish Defense Research Agency,
FOI, pp. 2–17 (2005)
7. Fischer, A., Jost, E., Lutz, T., Krämer, E.: Numerical investigations of a passive load alleviation
technique for wind turbines. In: 10th PhD Seminar on Wind Energy in Europe, Orléans, pp. 51–
54, EAWE, 28–31 Oct 2014
8. Jameson, A.: Time dependent calculations using multigrid, with applications to unsteady flows
past airfoils and wings. AIAA Paper 91-1596 (1991)
9. Jameson, A., Schmidt, W., Turkel, E.: Numerical solutions of the Euler equations by finite
volume methods using Runge-Kutta time-stepping schemes. AIAA Paper 81-1259 (1981)
10. Johansen, J., Sørensen, N.N.: Aerofoil characteristics from 3D CFD rotor computations. Wind
Energy 7(4), 283–294 (2004)
11. Jost, E., Fischer, A., Lutz, T., Krämer, E.: CFD studies of a 10 MW wind turbine equipped with
active trailing edge flaps. In: 10th PhD Seminar on Wind Energy in Europe, Orléans,
pp. 119–122, EAWE, 28–31 Oct 2014
12. Jost, E., Lutz, T., Krämer, E.: A parametric CFD study of morphing trailing edge flaps applied
on a 10 MW offshore wind turbine. In: 13th Deep Sea Offshore Wind R&D Conference, EERA
DeepWind’2016, Trondheim, 20–22 Jan 2016
13. Kampers, G., Peinke, J., Hölling, M., Cordes, U., Tropea, C.: Stochastic analysis of aero-
dynamic forces acting on a self-adaptive camber airfoil in turbulent inflow. In: 33rd AIAA
Applied Aerodynamics Conference, p. 2427 (2015)
14. Kroll, N., Rossow, C.-C., Becker, K., Thiele, F.: The MEGAFLOW project. Aerosp. Sci. Technol.
4(4), 223–237 (2000)
15. Lambie, B.: Aeroelastic investigation of a wind turbine airfoil with self-adaptive camber. PhD
thesis, Technical University of Darmstadt (2011)
16. Leishman, J.G.: Principles of Helicopter Aerodynamics. Cambridge Aerospace Series, Cam-
bridge, New York (2000)
17. Matha, D., Schuon, F., Lutz, T.: Baseline FOWT definition v4. Deliverable report d3.1, KIC-
18. Schepers, J.: Engineering models in wind energy aerodynamics. PhD thesis, TU Delft (2012)
19. Schulz, C., Fischer, A., Weihing, P., Lutz, T., Krämer, E.: Evaluation and control of loads on
wind turbines under different operating conditions by means of CFD. In: High Performance
Computing in Science and Engineering’15, pp. 463–478. Springer, Cham (2016)
20. Schulz, C., Klein, L., Weihing, P., Lutz, T., et al.: CFD studies on wind turbines in complex
terrain under atmospheric inflow conditions. J. Phys. Conf. Ser. 524, 012134 (2014). IOP
21. Schulz, C., Letzgus, P., Lutz, T., Krämer, E.: CFD study on the impact of yawed inflow
on loads, power and near wake of a generic wind turbine. Wind Energy (to be published).
22. Schulz, C., Meister, K., Lutz, T., Krämer, E.: Investigations on the wake development
of the Mexico rotor considering different inflow conditions. In: Contributions to the 19th
STAB/DGLR Symposium, Munich, Germany 2014. Notes on Numerical Fluid Mechanics and
Multidisciplinary Design. STAB. Springer, Nov 2014. Under review
23. Shen, W.Z., Hansen, M.O., Sørensen, J.N.: Determination of angle of attack (AOA) for rotating
blades. In: Wind Energy, pp. 205–209. Springer (2007)
24. Shen, W.Z., Hansen, M.O., Sørensen, J.N.: Determination of the angle of attack on rotor blades.
Wind Energy 12(1), 91–98 (2009)
25. Weihing, P., Meister, K., Schulz, C., Lutz, T., et al.: CFD simulations on interference effects
between offshore wind turbines. J. Phys. Conf. Ser. 524, 012143 (2014). IOP Publishing
26. Weihing, P., Schulz, C., Lutz, T., Krämer, E.: CFD performance analyses of wind turbines
operating in complex environments. In: Nagel, W.E., Kröner, D.H., Resch, M.M. (eds.) High
Performance Computing in Science and Engineering’14, pp. 403–415. Springer, Cham (2015)
Part V
Transport and Climate
Christoph Kottmeier
In the field of “Transport and Climate”, both the number and the CPU requirements
of high-performance computing projects making use of the HLRS in Stuttgart
and of the SCC in Karlsruhe have increased considerably in the last 2 years.
Currently, 11 projects are ongoing, mostly related to modelling large parts of the
climate system. The topics cover a broad range of objectives as well as geographic
regions. The CPU time requirements of such models increase strongly with
resolution, and ever higher resolutions are needed in oceanic and atmospheric models
to resolve the small scales (turbulent, convective, mesoscale) of ambient flows. It is
known from measurements that these highly energy-containing scales can strongly
control larger-scale processes. It is therefore important to represent their net
effects adequately. In coarsely resolved models, this is done by semi-empirical
parametrizations. This is not fully satisfying, however, since such parametrizations
can hardly be validated against measurements for the full parameter range of their
application.
Atmospheric and oceanic modellers therefore move to higher resolutions of down
to 1 km (and in part less), so that these processes are simulated directly. Another
general development is also reflected by the HLRS and SCC projects: submodules
for, e.g., the atmosphere, the ocean and ecosystems are increasingly being coupled.
This also holds for nested models, where one-way coupling (and, still rarely, two-way
coupling) is realized between a coarsely resolving model applied to a large domain,
such as a global model, and a limited area model run at high resolution. Developing
such coupling tools requires substantial effort in adaptation and testing, with many
model test runs. The domain size of the limited area model can have strong effects on
the outcome. Other critical issues arise from the coexistence of slow processes in some
of the coupled model systems (ocean and ice) and fast processes (atmosphere), which
require different time steps in the numerical schemes.
C. Kottmeier
Institut für Meteorologie und Klimaforschung, Karlsruher Institut für Technologie (KIT),
Wolfgang-Gaede-Straße 1, 76131 Karlsruhe, Germany
e-mail: christoph.kottmeier@kit.edu
Extensive testing of the model setups therefore has to be done before long runs,
spanning decades up to centuries in climate modelling, are performed. Despite
their high quality and very ambitious objectives, it was decided not to include the
short interim reports from projects at an early stage in the review.
The projects chosen for oral presentation and for the HLRS report reflect very
well the high importance of the HLRS and SCC computing facilities for highly
visible research programmes in current research.
The report on “Simulation of the rain belt of the West African Monsoon
(WAM) in high resolution CCLM simulation (WASCAL-CCLM)” by IMK-IFU in
Garmisch-Partenkirchen (KIT) addresses a vital problem for the population of the
semi-desert regions of West Africa. The monsoon brings the only rain for potable water,
agriculture and groundwater replenishment. It is highly important when the onset
occurs and how intense the rainfall is in a given year. Both result from complex
interactions between the atmosphere and surface processes, and accounting for
convection at high resolution is a major problem. For HPC, this means that mesh sizes
have to be rather small and that the related computing and storage requirements
increase substantially.
Another project at SCC, with a focus on Australia, is WA-AERO (Anthropogenic
aerosol emissions and rainfall decline in South-West Australia) by IMK-IFU. It also
addresses a desertification-threatened region and, in addition, an open debate in
climate science, namely the role of aerosols in cloud and precipitation formation.
In “High resolution climate projections using the WRF model on the HLRS
(WRFCLIM)”, a group from the University of Hohenheim uses Hazel Hen at HLRS
to go to km-scale resolution for smaller regions, but over decadal simulation periods.
In the project LUCCi, the biogeophysical impacts of the land surface on regional climate have been investigated for Central Vietnam (Vu Gia-Thu Bon basin) using the regional climate model (RCM) Weather Research and Forecasting (WRF) Model. It is demonstrated that replacing the land surface description with updated land-use/land-cover data leads to significant changes in the biogeophysical properties of the land surface, thereby altering the regional climate.
The aim of the project RUCACI is to investigate the impact of aerosol in high-resolution climate runs for major parts of Africa. Aerosols, particularly mineral dust in West Africa, and their interactions with radiation and clouds represent one of the major uncertainties in our understanding of the climate system at regional scales. The online coupled comprehensive chemistry model system COSMO-ART, developed at KIT, has already shown in several case studies the potential of closing the
gap between coarse global models and regionalized modelling. In order to apply
it on decadal climate time scales the use of high performance computing becomes
a necessity. In this study the effect of replacing the aerosol climatology usually
used in regional climate simulations with COSMO-CLM by online calculated dust
Simulation of the Rain Belt of the West African
Monsoon (WAM) in High Resolution CCLM
1 Introduction
Fig. 1 CCLM simulation domain and topography for the WASCAL 12 km resolution runs. The red lines represent the locations of the four longitudinal transects used in the study. A: lon = 15° W, B: lon = 10° W, C: lon = 0° and D: lon = 10° E
standard CCLM setup, MODIS provides more realistic information over deserts [10, 16]. The Runge-Kutta numerical integration scheme and the TKE advection scheme are selected, and a vertical stratification of 40 vertical levels up to 10 hPa with additional layers close to the surface is employed. The model is driven with ERA-Interim boundary forcing data (Fig. 1).
Earlier simulations performed by [1] showed that the choice of the simulation domain has an important impact on the modeled West African summer monsoon rainfall, because the ocean as well as the land surface and the atmosphere are important drivers of the monsoon circulation and need to be taken into account. To capture the large-scale atmospheric patterns in this region, we choose an extended high-resolution domain that covers all of West Africa, a fraction of the tropical Atlantic Ocean, the Cameroon and Fouta Djallon mountains, the Jos and Ethiopian plateaus, the Volta River basin as well as parts of the Sahara desert (27.5° W to 27.5° E, 7.5° N to 27.5° N). This high-resolution domain (CCLM11) is nested in a lower-resolution domain (CCLM44) at 0.44° (approx. 50 km) resolution with a significantly larger geometrical extent (37° W to 48° E, 18° N to 36° N), and is forced by its 3-hourly lateral boundary conditions.
550 D. Dieng et al.
The observational data for precipitation at monthly temporal resolution are obtained from the Tropical Rainfall Measuring Mission (TRMM) [7] at 0.25° spatial resolution, the Global Precipitation Climatology Centre (GPCC) [18] 0.5° gridded rain gauge analysis, the Climatic Research Unit (CRU) [14] 0.5° resolution data, and the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) [4] gridded precipitation time series. The monthly mean CHIRPS rainfall dataset is derived from a combination of satellite observations and rain gauge observations
and provides gridded data at 0.05° (approx. 5 km) resolution from 1981 to near-present.
3 Results
The simulated mean rainfall and the CRU, CHIRPS, GPCC, GPCP and TRMM observations for the rainy season June-July-August-September (JJAS) are depicted in Fig. 2. The CCLM11 spatial patterns are characterized by rainfall amounts increasing from both the north and the south and peaking in the monsoon rain band located between 2° and 15° N. The largest rainfall amounts are simulated over the highlands of Guinea and Ethiopia and in the Cameroon mountains. In those areas, CCLM11 shows the largest absolute deviations from the observations, of up to 3 mm/day. Relative biases, on the other hand, reach values of up to 60 % in the west of the modeled area with respect to CRU, CHIRPS, GPCC and GPCP, and up to 40 % with respect to TRMM. For the Sudan area, with very little absolute precipitation, the relative biases exceed +60 %. In general, CCLM11 underestimates the observed values and extends the rain band further north than observed. As a consequence, the CCLM11 run shows a large wet bias in the north, where the 200 mm isohyet is located at about 17.5° N (16 to 16.5° N for the observations). At the same time, it exhibits a dry bias in the Gulf of Guinea. An improvement is seen along the western coastline of Africa and near Mount Cameroon, where the rainfall observed in CRU and GPCC is better reproduced in CCLM11.
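The contrast between absolute and relative biases explains why the driest areas, such as the Sudan zone, show the largest percentages even where the absolute error is small. A minimal Python sketch of both metrics, assuming model output and observations have already been regridded to a common grid and flattened into equal-length lists; the function names are hypothetical and not part of the CCLM tool chain:

```python
def absolute_bias(model, obs):
    """Mean model-minus-observation difference, in the units of the inputs (e.g. mm/day)."""
    if not model or len(model) != len(obs):
        raise ValueError("model and obs must be non-empty lists of equal length")
    return sum(m - o for m, o in zip(model, obs)) / len(model)


def relative_bias_percent(model, obs):
    """Absolute bias divided by the observed mean, in percent (positive = wet bias)."""
    obs_mean = sum(obs) / len(obs)
    if obs_mean == 0:
        raise ValueError("relative bias is undefined for a zero observed mean")
    return 100.0 * absolute_bias(model, obs) / obs_mean


# Hypothetical values: a 0.3 mm/day error over a dry cell (0.5 mm/day observed)
# is a +60 % bias, while a 3 mm/day error over a wet cell (10 mm/day) is only +30 %.
dry = relative_bias_percent([0.8], [0.5])
wet = relative_bias_percent([13.0], [10.0])
```

The same absolute error therefore weighs twice as heavily, in relative terms, over the dry cell as over the wet one.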
In Fig. 3, we compare the observed and simulated latitudinal rainfall profiles in JJAS, averaged between 10° W and 10° E over the period 1979–2013. The CCLM11 simulations underestimate the maximum precipitation in the region between 5° N and about 12° N in all months considered. The underestimation ranges from 80 to 100 mm, with the greatest value in August. This shortcoming can be attributed to the limited ability of CCLM to transport moisture from the ocean to the inland regions [16]. The simulated rainfall intensity decreases between 14° and 25° N with a spatial pattern similar to the observations, but with a small bias of about 5 mm. In summary, the CCLM11 simulations reveal some errors in capturing the position of the rainfall peak, shifting it 3° further north than observed. The precipitation bias varies substantially from June to August and ranges from 10 mm in June to over 90 mm in August.
Fig. 2 Observed CRU (a), CHIRPS (b), GPCC (c), GPCP (d), TRMM (e) and simulated CCLM11 (f) JJAS rainfall distribution for the period 1979–2013
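The comparison in Fig. 3 rests on averaging the rainfall field over the 10° W–10° E band for every latitude and then locating the peak; the northward shift of the rain belt is the difference between the simulated and the observed peak latitudes. A minimal sketch on a regular latitude-longitude grid, assuming missing values have been removed beforehand; the function names are illustrative only:

```python
def band_average_by_latitude(field, lats, lons, lon_min=-10.0, lon_max=10.0):
    """Average field[i][j] over the columns with lon_min <= lons[j] <= lon_max.

    Longitudes are in degrees east (west is negative). Returns a list of
    (latitude, mean value) pairs, one per row of the field. On a regular
    latitude-longitude grid all cells along one latitude have the same area,
    so an unweighted mean is adequate for this profile.
    """
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    if not cols:
        raise ValueError("no grid columns inside the longitude band")
    return [(lat, sum(field[i][j] for j in cols) / len(cols))
            for i, lat in enumerate(lats)]


def peak_latitude(profile):
    """Latitude at which the band-averaged rainfall is largest."""
    return max(profile, key=lambda pair: pair[1])[0]


# Hypothetical 2 x 5 field: only the columns at -10, 0 and 10 degrees enter the mean.
lats = [5.0, 10.0]
lons = [-15.0, -10.0, 0.0, 10.0, 15.0]
field = [[9.0, 1.0, 2.0, 3.0, 9.0],
         [9.0, 4.0, 4.0, 4.0, 9.0]]
profile = band_average_by_latitude(field, lats, lons)  # [(5.0, 2.0), (10.0, 4.0)]
```

A shift such as the 3° reported above would then be `peak_latitude(simulated) - peak_latitude(observed)` for two profiles computed on the same grid.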
The mean seasonal rainfall (mm/day) at the four selected longitudes for the period 1979–2013 is shown in Fig. 4. Close to the western continental coast (A), the simulated seasonal rainfall matches the observations well, with peak values of up to 15 mm/day at the southern coastline. Further inland (B), the model underestimates the observed precipitation in the southern areas up to 12° N, but matches the observations further north. For the longitudes C and D, the model exhibits a northward shift of the rain band with a corresponding underestimation in the south. In both cases these deviations are a consequence of the orographic features, which are prominent in the south but less so in the north. Examples of such features are Lake Volta in Ghana and the Jos Plateau in Nigeria, which are located between 5° and 10° N along transects C and D, respectively. In the transitional arid zone between 12° and 18° N, the CCLM11 model run produces excessive rainfall amounts. This can potentially be attributed to inaccurate time-invariant data used in the model (e.g., land use and soil characteristics) as a result of poor observational coverage [6].
Fig. 3 Zonal average (10° W–10° E) of the observed and simulated mean rainfall (mm) for the months of June, July, August and September over the period 1979–2013
Fig. 4 Observed and simulated mean (June-July-August-September) rainfall (mm/day) for the period 1979–2013. Values averaged over 15° W (a), 10° W (b), 0° (c) and 10° E (d)
Fig. 5 Time-latitude diagrams of monthly mean precipitation (in mm) averaged between 10° W and 10° E from CRU (a, b), CHIRPS (c, d), GPCC (e, f) and CCLM11 (g, h) in the dry year 1983 (left panel) and the wet year 1999 (right panel)
Fig. 6 Interannual variability of precipitation amount (upper plot) and precipitation anomalies
(bottom plot) over West Africa from 1981 to 2010 for CHIRPS
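Fig. 6 expresses interannual variability as anomalies, i.e. departures of each year's precipitation from the 1981–2010 mean; this is the kind of diagnostic from which dry and wet years, such as 1983 and 1999 in Fig. 5, can be identified. A minimal sketch, assuming one regional-mean total per year is already available; the function names are hypothetical and the values in the example are placeholders, not CHIRPS data:

```python
def anomalies(series, ref_start=1981, ref_end=2010):
    """Departure of each year's value from the mean over ref_start..ref_end (inclusive).

    series: {year: regional-mean precipitation total}, e.g. in mm.
    """
    ref = [value for year, value in series.items() if ref_start <= year <= ref_end]
    if not ref:
        raise ValueError("no data in the reference period")
    climatology = sum(ref) / len(ref)
    return {year: value - climatology for year, value in series.items()}


def driest_and_wettest(anoms):
    """Years with the most negative and the most positive anomaly."""
    return min(anoms, key=anoms.get), max(anoms, key=anoms.get)


# Placeholder totals, using a short reference period for brevity.
demo = anomalies({1981: 500.0, 1982: 520.0, 1983: 400.0, 1984: 580.0},
                 ref_start=1981, ref_end=1984)
```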
4 Conclusion
Our results show that the high-resolution CCLM11 control run is able to reproduce the observed main climate characteristics, including the annual cycle of the West African Monsoon, within the range of comparable RCM evaluation studies. Despite the increased resolution, we find a northward shift of the monsoon rain band by about 3°, which results in a dry bias at the Guinea Coast and a wet bias in the northern areas around 15° N during the peak of the monsoon season in JJAS. The fact that the higher resolution of 12 km does not significantly improve the model results compared to existing lower-resolution experiments and our own CCLM44 simulation could be attributed to the dominance of convective rainfall in the monsoon precipitation over West Africa [2]. At grid spacings coarser than about 10 km, convective processes in the models are parameterized and therefore represented implicitly rather than calculated explicitly. We hypothesize that a convection-permitting resolution below 5–10 km will be able to address these deficiencies. However, the computational requirements prohibit such high resolution in long-term climate simulations.
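The cost argument can be quantified with a common rule of thumb: refining the horizontal grid by a factor r on a fixed domain multiplies the number of columns by r² and, through the CFL condition, the number of time steps by r, so the cost grows roughly as r³. A minimal sketch under those assumptions; it is not a measured CCLM cost model and ignores changes in the physics, such as switching off the convection parameterization:

```python
def relative_cost(dx_coarse_km, dx_fine_km):
    """Factor by which the cost of a run on a fixed domain grows when refining the grid.

    Assumptions (rule of thumb only):
    - the number of grid columns grows with the square of the refinement ratio,
    - the CFL-limited time step shrinks in proportion to the grid spacing,
    - vertical levels and the physics cost per column stay unchanged.
    """
    ratio = dx_coarse_km / dx_fine_km
    return ratio ** 3


def relative_output_volume(dx_coarse_km, dx_fine_km):
    """Storage per output time step grows only with the number of columns."""
    ratio = dx_coarse_km / dx_fine_km
    return ratio ** 2


# Going from the 12 km setup to a 3 km convection-permitting grid:
cost_factor = relative_cost(12.0, 3.0)              # 64.0
storage_factor = relative_output_volume(12.0, 3.0)  # 16.0
```

Under these assumptions, a multi-decadal run at 3 km would cost about 64 times as much as the same run at 12 km.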
A second aspect is the above-mentioned representation of time-invariant data sets
in the model, such as surface and soil characteristics. Work is underway within the
WASCAL program to address the poor data coverage on land and soil characteristics
over West Africa using high-resolution satellite-composite products [5]. It is
expected that integrating this high-resolution data in the physical parameterizations
and the microphysics of the CCLM model will reduce the model bias.
Fig. 7 Scaling plot for D2 Domain (black line indicates ideal scaling)
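Assuming Fig. 7 shows strong scaling (a fixed domain run on an increasing number of cores), the ideal line corresponds to a run time that halves whenever the core count doubles. A minimal sketch of how speedup and parallel efficiency follow from measured wall-clock times; the function name, core counts and timings are hypothetical and not ForHLR I measurements:

```python
def strong_scaling(timings):
    """timings: {cores: wall-clock seconds} for the same problem size.

    Speedup and efficiency are taken relative to the smallest core count n0:
    speedup(n) = T(n0) / T(n), ideal(n) = n / n0, efficiency(n) = speedup / ideal.
    Returns a list of (cores, speedup, ideal, efficiency) tuples sorted by cores.
    """
    n0 = min(timings)
    t0 = timings[n0]
    rows = []
    for n in sorted(timings):
        speedup = t0 / timings[n]
        ideal = n / n0
        rows.append((n, speedup, ideal, speedup / ideal))
    return rows


table = strong_scaling({20: 100.0, 40: 50.0, 80: 40.0})
# -> [(20, 1.0, 1.0, 1.0), (40, 2.0, 2.0, 1.0), (80, 2.5, 4.0, 0.625)]
```

An efficiency that drops well below 1 at high core counts marks the point where adding cores to this domain size stops paying off.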
Acknowledgements This study was funded by the German Federal Ministry of Education and Research (BMBF) within the WASCAL project. The authors thank the Steinbuch Centre for Computing (SCC) for providing access to the ForHLR I supercomputer.
1. Browne, N.A.K., Sylla, M.B.: Regional climate model sensitivity to domain size for the
simulation of the West African Summer Monsoon Rainfall. Int. J. Geophys. 2012, 17 (2012)
2. Diaconescu, E.P., Gachon, P., Scinocca, J., Laprise, R.: Evaluation of daily precipitation
statistics and monsoon onset retreat over Western Sahel in multiple data sets. Clim. Dyn. 45(5–
6), 1325–1354 (2014)
3. Doms, G., Förstner, J., Heise, E., Herzog, H., Mironov, D., Raschendorfer, M., Reinhardt,
T., Ritter, B., Schrodin, R., Schulz, J.-P., et al.: A description of the nonhydrostatic regional
COSMO model. Part II: Physical Parameterization, p. 154 (2011)
4. Funk, C.C., Peterson, P.J., Landsfeld, M.F., Pedreros, D.H., Verdin, J.P., Rowland, J.D.,
Romero, B.E., Husak, G.J., Michaelsen, J.C., Verdin, A.P., et al.: A quasi-global precipitation
time series for drought monitoring. U.S. Geolog. Surv. 832(4), 4 (2014)
5. Gessner, U., Niklaus, M., Kuenzer, C., Dech, S.: Intercomparison of leaf area index products
for a gradient of sub-humid to arid environments in West Africa. Remote Sens. 5(3), 1235–
1257 (2013)
6. Guillod, B.P., Davin, E.L., Kündig, C., Smiatek, G., Seneviratne, S.I.: Impact of soil map
specifications for European climate simulations. Clim. Dyn. 40(1–2), 123–141 (2013)
7. Huffman, G.J., Bolvin, D.T., Nelkin, E.J., Wolff, D.B., Adler, R.F., Gu, G., Hong, Y., Bowman,
K.P., Stocker, E.F.: The TRMM multisatellite precipitation analysis TMPA: quasi-global,
multiyear, combined-sensor precipitation estimates at fine scales. J. Hydrometeorol. 8(1), 38–
55 (2007)
8. Kaspar, F., Cubasch, U.: Simulation of East African precipitation patterns with the regional
climate model CLM. Meteorol. Z. 17(4), 511–517 (2008)
9. Kothe, S., Ahrens, B.: On the radiation budget in regional climate simulation for West Africa.
J. Geophys. Res. Atmos. 115(D23), 12 (2010)
10. Kotlarski, S., Keuler, K., Christensen, O.B., Colette, A., Déqué, M., Gobiet, A., Goergen, K.,
Jacob, D., Lüthi, D., van Meijgaard, E., et al.: Regional climate modeling on European scales:
a joint standard evaluation of the EURO-CORDEX RCM ensemble. Geosci. Model Dev. 7(4),
1297–1333 (2014)
11. Lawrence, P.J., Chase, T.N.: Representing a new MODIS consistent land surface in the
Community Land Model (CLM 3.0). J. Geophys. Res. 112(G1), 17 (2007)
12. Masson, V., Champeaux, J.L., Chauvin, F., Meriguet, C., Lacaze, R.: A global database of land
surface parameters at 1 km resolution in meteorological and climate models. J. Clim. 16(9),
1261–1282 (2003)
13. Masson, V., Champeaux, J.-L., Chauvin, F., Meriguet, C., Lacaze, R.: A global database of
land surface parameters at 1 km resolution. Meteorol. Appl. 12(1), 29–32 (2005)
14. Mitchell, T.D., Jones, P.D.: An improved method of constructing a database of monthly climate observations and associated high-resolution grids. Int. J. Climatol. 25(6), 693–712 (2005)
15. Nikulin, G., Jones, C., Giorgi, F., Asrar, G., Büchner, M., Cerezo-Mota, R., Christensen, O.B.,
Déqué, M., Fernandez, J., Hänsler, A., et al.: Precipitation climatology in an ensemble of
CORDEX-Africa regional climate simulations. J. Clim. 25(18), 6057–6078 (2012)
16. Panitz, H.-J., Dosio, A., Büchner, M., Lüthi, D., Keuler, K.: COSMO-CLM (CCLM) Climate
simulations over CORDEX-Africa domain: Analysis of the ERA-interim driven simulations at
0.44° and 0.22° resolution. Clim. Dyn. 42(11–12), 3015–3038 (2014)
17. Rockel, B., Geyer, B.: The performance of the regional climate model CLM in different climate
regions, based on the example of precipitation. Meteorol. Z. 17(4), 487–498 (2008)
18. Schneider, U., Becker, A., Meyer-Christoffer, A., Ziese, M., Rudolf, B.: Global precipitation
analysis products of the GPCC. Deutscher Wetterdienst (2011)
19. Smiatek, G., Rockel, B., Schättler, U.: Time resolution data preprocessor for climate version
of the COSMO Model COSMO-CLM. Meteorol. Z. 17(4), 395–405 (2008)
Anthropogenic Aerosol Emissions and Rainfall
Decline in South-West Australia
1 Introduction
for agriculture. Naturally, the question arises to what extent this decline in rainfall is
human-induced and how much global and local environmental changes contribute
to it. This topic motivated numerous studies and measurement campaigns already in
the 1970s and has been debated widely since then [4–6, 10, 11, 20].
Delworth et al. [10] used the global climate model GFDL (Geophysical Fluid
Dynamics Laboratory) CM2.5 to analyse the causes of the decline in precipitation.
In [11], they concluded that many aspects of the observed reduction in rainfall can
be attributed to anthropogenic changes in levels of greenhouse gases and ozone
in the atmosphere, whereas anthropogenic aerosols do not contribute significantly.
This stands in contrast to numerous studies of the impact of aerosols on the build-
up of clouds and precipitation through the formation of cloud particles and by
exerting persistent radiative forcing on the climate system that disturbs dynamics
[28]. Lee and Feingold [21] investigated aerosol effects on cloud field properties of
convective clouds and concluded that aerosols do have substantial influence on the
spatiotemporal distribution of convection and precipitation. The coarse resolution
of 50 km of the global model used by [10] and the simplified treatment of aerosols
therein may explain this discrepancy.
Bates et al. [4] reported from rainfall observations that the decrease in pre-
cipitation occurred in two distinctive steps around 1975 and 2000 rather than
continuously. Likewise, they demonstrated a clear stepwise decrease in stream flow
measurements at those times. Karoly [20] also stressed that the simulations of [10]
underestimate the decline in rainfall and that their identified drivers for this decline
are usually associated with changes to the Southern Hemisphere climate in summer,
while the bulk of the precipitation in this region occurs in austral winter.
Changes in atmospheric circulation due to a constant rise in greenhouse gases
and a constant depletion of ozone are large-scale features and usually induce a
continuous change in precipitation on longer time scales than observed in parts of
South-West Australia. Local changes to the environment, on the other hand, may
have the potential to alter the local climate on very short time scales. With respect
to the sudden drop in precipitation in the 1970s, several human-induced factors
occurred just before or at this time:
1. The conversion of natural forest to agricultural land after World War II led to
an almost complete deforestation of a 130,000 km² area by 1968, previously a
biodiversity hotspot and now known as the “wheatbelt” [8, 25]. The deforestation
had a strong impact on the aerosol concentration in this region through direct
effects [13] and indirect effects [17, 24] with a time lag of about 15 years. Andrich
and Imberger [3] compared coastal and inland rainfall and showed empirically
that land clearing alone can account for 55–62 % of the observed decline in
precipitation for the wheatbelt area south-east of the vermin fence. This is also
supported by modelling experiments by [18], who showed that deforestation has
been causing rainfall declines in this area, although their study was limited to two individual events.
2. In 1966, the Muja Power Station was commissioned 22 km east of Collie (see Fig. 1). The coal power plant had a total output of 974 MW and as such was the largest source of aerosols in this region. Further power stations burning coal from the Collie mine were added in 1973 (Kwinana, switching several times between coal, gas and oil), in 1999 (Collie B) and in 2009 (Bluewaters). Additionally, the Kwinana refinery was continuously enlarged and eventually became the largest refinery in Australia. Airborne measurements taken during several flight campaigns in the 1970s led to an estimated total flux of 4 × 10¹⁹ particles per second from the Perth/Fremantle area, including the Collie region 150–200 km to the south-east, equivalent to a CCN production rate of 1 × 10¹⁹ particles per second [2]. This value is close to the total natural CCN production of all of Australia at that time [6].
Fig. 1 Model topography (left, terrain height in m) and land-use classification (right) for a subset of the 3.3 km domain, labelled as West Australia (WA). Indicated are the three regions West Coast (WC), Perth/Fremantle (PF) and Back Country (BC) used in the analysis, as well as the locations of Perth (P) and the Muja Power Station (M). The black dots represent meteorological stations of the BOM [9] with a data availability of 90 % or more between 1920 and 2015. The dominant land-use categories are evergreen broadleaf forest (red), woody savannas (ochre) and croplands (cyan), which clearly mark the north-eastern border of the wheatbelt
Global circulation models suffer from a simplified treatment of aerosols and a
relatively coarse resolution. Key components for the generation of rainfall such
as convection and the interaction of cloud condensation nuclei (CCN) and ice
nuclei (IN) cannot be resolved and therefore are parameterised. For instance, in
GFDL CM2.5, only direct aerosol effects are included implicitly in the model [10].
Regional climate models, on the other hand, have been taken to higher and higher
resolution over recent years. New, sophisticated physics schemes have been added
to explicitly treat the transport, growth and interaction of CCN and IN.
The Weather Research and Forecasting model WRF [27] is widely used in
numerical weather prediction and regional climate simulations. Since version 3.6,
released in April 2014, the ARW (Advanced Research WRF) core of WRF contains
an aerosol-aware microphysics option, the Thompson and Eidhammer scheme. Its
main features are a fundamental, first order aerosol treatment and a direct coupling
with radiation for aerosol indirect effects, which allows it to simulate the impact of
aerosols on local weather and climate at a moderate increase in computational costs.
In a first test of their new aerosol-aware scheme, Thompson and Eidhammer [29]
562 D. Heinzeller et al.
2 Methods
We use the Bureau of Meteorology Daily Rainfall Climate Data [9] to analyse the decrease in precipitation over an extended period from 1920 to 2015.
The starting date is chosen to guarantee a sufficiently large number of recording
stations with high availability of data (90 % or more) for the entire period. Figure 1
displays all available stations in the area of study. A simple quality control is applied
to the data to filter stations with zero or excessive annual precipitation.
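The station screening described above, an availability threshold plus a plausibility check on annual totals, can be sketched as follows. This is a minimal Python illustration, not the procedure actually applied to the BOM data: the record layout, the measure of availability as a fraction of years, and the 5000 mm upper limit are assumptions, and the function name is hypothetical.

```python
def select_stations(records, start=1920, end=2015, min_availability=0.9,
                    max_annual_mm=5000.0):
    """Return the ids of stations that pass the availability and plausibility checks.

    records: {station_id: {year: annual total in mm, or None if missing}}.
    Availability is measured here as the fraction of years in start..end with a
    value (an assumption; daily availability could be used instead).
    max_annual_mm is a hypothetical threshold for "excessive" annual totals.
    """
    n_years = end - start + 1
    selected = []
    for station, annual in records.items():
        present = [annual[y] for y in range(start, end + 1)
                   if annual.get(y) is not None]
        if len(present) / n_years < min_availability:
            continue  # too many missing years
        if any(v <= 0.0 or v > max_annual_mm for v in present):
            continue  # zero or implausibly large annual totals fail the check
        selected.append(station)
    return sorted(selected)
```

A station with exactly 90 % of its years present is kept, in line with the "90 % or more" criterion.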
In addition, we use gridded rainfall and near-surface air temperature data from the University of Delaware (UDEL) long-term monthly means v3.01 [34] and from the Climatic Research Unit (CRU) high-resolution time series data set v3.23 [15] at 0.5° × 0.5° spatial resolution. Lastly, we include the HadSLP2 gridded global sea level pressure anomalies [1] at 5° × 5° spatial resolution in our analysis.
We employ version 3.6.1 of the regional climate model WRF-ARW, released August
2014, to study the effect of aerosols and changes in their concentration on local
weather and climate at very high resolution for a 4-year period from 1970 to 1974.
Fig. 2 Triple-nested domain configuration with 30, 10 and 3.3 km resolution. Lateral boundary conditions are taken from ERA40 re-analysis data at 1.0° × 1.0° spatial resolution (110 km)
CCN are referred to as water-friendly aerosol particles and are created by summing
sulphates, sea salts and organic carbon, while IN are referred to as ice-friendly
aerosol particles and are created by summing five size bins of dust. Aerosols
are treated in a fundamental, first order approach through activation of CCN
and IN, depletion of aerosols (precipitation scavenging) and simplistic aerosol
replenishment (surface emissions). The microphysics scheme is coupled directly to the RRTMG longwave (LW) and shortwave (SW) radiation schemes in order to account, in principle, for aerosol direct and indirect effects. It is important to note that in WRF version 3.6.1 this coupling is not complete: the calculation of the aerosol optical depth (AOD) is not informed by the new Thompson-Eidhammer scheme and thus assumes climatological aerosol concentrations (aerosol direct effect). However, the aerosol particles emitted by anthropogenic sources such as power plants and smelters in Australia, which represent the bulk of the increase in aerosol concentration in the 1970s, have been measured to be between 5 and 100 nm in size [16]. While their exact size depends on the distance from the source and the available time for coagulation and growth [7, 16, 17], they are well below the range in which direct effects through scattering and absorption are important (>300 nm).
Apart from the additional treatment of aerosols, physical consistency between the
Thompson scheme and this new microphysics scheme is ensured. This allows us
to assess the effect of the aerosol treatment or, more precisely, the aerosol indirect
effects of small, anthropogenic aerosol particles, on the model results.
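The size argument can be made more concrete with the dimensionless size parameter x = πD/λ. For x well below 1, particles scatter in the Rayleigh regime, where the scattering cross-section scales with D⁶, so 5–100 nm particles contribute far less per particle than particles of a few hundred nanometres. A minimal sketch, assuming a mid-visible wavelength of 550 nm (an illustrative value, not taken from the text); the function names are hypothetical:

```python
import math


def size_parameter(diameter_nm, wavelength_nm=550.0):
    """Mie size parameter x = pi * D / lambda (dimensionless)."""
    return math.pi * diameter_nm / wavelength_nm


def rayleigh_scattering_ratio(d1_nm, d2_nm):
    """Ratio of scattering cross-sections of two particles in the Rayleigh regime
    (x << 1), where the cross-section scales with the sixth power of the diameter."""
    return (d1_nm / d2_nm) ** 6


for d in (5.0, 100.0, 300.0):
    print(f"D = {d:5.0f} nm  ->  x = {size_parameter(d):.3f}")
```

Halving the diameter of a Rayleigh scatterer reduces its scattering cross-section by a factor of 2⁶ = 64, which is why the smallest anthropogenic particles matter mainly through their role as CCN.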
The aerosol scheme is not coupled to any cumulus scheme, which means that
there is no depletion of aerosols by convective precipitation and no sub-grid scale
aerosol activation. It is thus required to use a very high, convection-resolving
horizontal resolution. Further, a relatively high vertical resolution and the activation
of the new namelist variable scalar_pblmix are required to ensure that aerosols
get mixed in the vertical by sub-grid turbulence. The specification of aerosols can
be handled in two primary ways: (1) external data sets from climatology or other
(chemistry) models, or (2) simplified vertical profiles prescribed in the model.
To fulfil the above requirements, we use a horizontal resolution of 3.3 km for the
innermost domain. At this resolution, it can be assumed that convection is resolved
at grid scale [23, 33]. To achieve a sufficiently high vertical resolution, in particular
in the planetary boundary layer, we use 75 vertical levels with a lowermost level
height of 25 m and 20 levels in the first 1000 m above surface. Such a small vertical
grid spacing implies a reduction of the typical time step of 18 s (6 s per km horizontal
resolution) to 4 s for model stability.
The initial aerosol concentrations are specified as simplified vertical profiles.
The default vertical profile in WRFV3.6.1 depends on the terrain height and was
designed to fit the continental U. S., for which the near-surface value is found to exist
within an idealised boundary layer of approximately 200–1000 m, depending on
starting elevation. An exponential decay of aerosol number from the higher numer-
ical value in the boundary layer to the lower free tropospheric number is used to
complete the vertical profile (Greg Thompson, private communication). This profile
is adapted to describe different aerosol concentrations for South-West Australia:
First, a standard vertical profile was created based on the airborne measurements
and analysis of [5–7, 17], reflecting the clean environmental conditions prior to
the commissioning of the Muja, Kwinana and Collie coal power plants. The initial
profile is applied once at the starting time of the model integration and for every
grid point, both over land and sea. During the model integration, the CCN and IN
variables are advected and diffused exactly as other scalars (e.g. cloud ice number
concentration). A simplified surface aerosol emission tendency is computed as a 2D
field based on the horizontal grid spacing and starting aerosol number concentration
for the CCN variable [29]. No surface emission tendency is applied for IN in this
version of the code. The 2D tendency field is added each time step to the first model
vertical level CCN value.
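The initialization and surface-source treatment described above can be illustrated with a short sketch: a profile that is constant within an idealized boundary layer and decays exponentially towards a free-tropospheric value above it, applied once at the start, plus a per-step addition of a 2-D emission tendency to the lowest model level. This is not the WRF Fortran code; the function names, the boundary-layer depth and the decay scale are hypothetical, and the terrain dependence of the default profile is omitted:

```python
import math


def initial_ccn_profile(heights_m, n_surface, n_free, pbl_top_m=1000.0,
                        decay_scale_m=1000.0):
    """Simplified initial aerosol number profile (hypothetical parameters).

    Constant n_surface up to pbl_top_m, then an exponential decay from
    n_surface towards the lower free-tropospheric value n_free.
    """
    profile = []
    for z in heights_m:
        if z <= pbl_top_m:
            profile.append(n_surface)
        else:
            weight = math.exp(-(z - pbl_top_m) / decay_scale_m)
            profile.append(n_free + (n_surface - n_free) * weight)
    return profile


def apply_surface_emission(ccn, tendency_per_s, dt_s):
    """Add a 2-D emission tendency to the lowest model level only.

    ccn[k][i] holds the CCN number at level k and column i; tendency_per_s[i]
    is the per-column emission rate. Called once per model time step.
    """
    for i, rate in enumerate(tendency_per_s):
        ccn[0][i] += rate * dt_s
```

In a run with the 4 s time step used here, `apply_surface_emission` would be called with `dt_s=4.0` after each step, while the profile function is evaluated only once at initialization.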
In this study, we address two questions. Firstly, we investigate the changes in the
simulated weather and climate when aerosols are considered in the microphysics,
using the initial aerosol profile presented above. Secondly, we study the impact
of changes in the aerosol concentration from the clean environmental conditions
to a polluted environment through modifications of the initial aerosol profile and
the surface emission rates. In total, we conduct four model runs on the innermost
domain for the period 1970–1974:
1. Standard run (wrf-std): In this configuration, we use the default Thompson
microphysics scheme, which is coupled to the RRTMG LW/SW schemes, but
does not treat aerosols explicitly. Because of the physical consistency between
the Thompson and the Thompson-Eidhammer schemes, this run can be compared
directly to the following runs to assess the impact of adding aerosol physics to
the WRF model.
2. Aerosol run (wrf-aero): Here, we use the aerosol-aware Thompson-Eidhammer
microphysics scheme with the initial aerosol profile and surface emission rates
for South-West Australia. This run allows us to investigate the effect of aerosols
(natural and anthropogenic) on the model without the contribution of the Muja
Power Station or other large pollutants.
3. Aerosol boost run (wrf-aerox3): This configuration is identical to the aerosol
run, but uses an initial aerosol profile three times as large for both CCN and IN, and
accordingly, the CCN surface emission rate is also tripled. This run describes in
a simple way the increase in aerosol concentrations due to the commissioning
of the Muja Power Station and other sources of anthropogenic aerosols. The
increase by a factor of three is motivated by differences between aerosol concentrations measured near and far from the larger pollution sources in the Muja/Collie area.
4. Muja Power run (wrf-muja): This configuration is identical to the aerosol run, but
contains an additional source of anthropogenic aerosols injected into the model
at the location of the power plant and with an emission rate as derived from
observations. A total emission rate of 4.6 × 10⁸ particles/(kg s) is added to the surface emissions in a circle sector with 20 km radius and 35° opening angle in the north-east direction at the location of the Muja Power Station (116.26° E, 33.34° S). To account for the elevated emission from power plants at 250–400 m
above ground, this additional source term is distributed evenly across the first
1500 m in height at every grid point in this sector.
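The geometry of the additional wrf-muja source can be sketched as follows: a grid point belongs to the source region if it lies within 20 km of the plant and within ±17.5° of the sector axis, and the added emission is shared among the model layers according to their overlap with the lowest 1500 m. A minimal Python sketch, assuming planar east/north offsets in km, a sector axis pointing exactly north-east (bearing 45°), and that "distributed evenly" means shares proportional to layer thickness; the function names are hypothetical:

```python
import math


def in_sector(dx_km, dy_km, radius_km=20.0, opening_deg=35.0, bearing_deg=45.0):
    """True if a point offset (dx east, dy north) from the source lies in the sector."""
    dist = math.hypot(dx_km, dy_km)
    if dist > radius_km:
        return False
    if dist == 0.0:
        return True  # the source location itself
    bearing = math.degrees(math.atan2(dx_km, dy_km)) % 360.0  # clockwise from north
    diff = abs((bearing - bearing_deg + 180.0) % 360.0 - 180.0)  # smallest angle
    return diff <= opening_deg / 2.0


def layer_shares(interfaces_m, depth_m=1500.0):
    """Fraction of the 0..depth_m column that falls in each model layer.

    interfaces_m: layer interface heights above ground, from the bottom up.
    """
    shares = []
    for z_bot, z_top in zip(interfaces_m[:-1], interfaces_m[1:]):
        overlap = max(0.0, min(z_top, depth_m) - max(z_bot, 0.0))
        shares.append(overlap / depth_m)
    return shares
```

For the wrf-aerox3 run no geometry is needed: both the initial profile and the CCN surface emission rate are simply multiplied by three.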
We refer to previous work [12, 29] for a recommended WRF model configuration
for this specific region and research question, which is summarised in Table 2 in
the Appendix. It is important to remember that the differences between the standard
Thompson scheme and the aerosol-aware Thompson-Eidhammer scheme are in the
consideration of the aerosol indirect effects only: While the aerosol-aware runs
compute size and number concentration of aerosols and thus cloud droplet numbers
consistently, the standard run uses prescribed values for the cloud droplet number
in the microphysics scheme. For both schemes, the aerosol direct effect is included
through the calculation of the aerosol optical depth using climatological aerosol
concentrations, which are independent of the initial aerosol profiles used for the
different high-resolution model runs.
Figure 3 displays the annual precipitation compiled from the [9] Daily Rainfall Cli-
mate Data (BOM hereafter) for the entire year, the wet season April to September,
and the dry season October to March for the areas described in Fig. 1. Displayed
are the mean of all stations with a data availability of 90 % or more and the stations
with maximum and minimum rainfall over the entire period 1920–2015. Stations
with minimum rainfall show no decrease over the entire period, while stations
with maximum rainfall exhibit a dramatic decline in rainfall predominantly in the
wet season. Independent of the area, the decrease in mean annual precipitation is
observed entirely in the rainy season, while there is no change in the low amount
of rainfall during the dry season. We would like to note here that the selection of
stations used in the analysis matches that of [3] with the exception that the quality
control flags of the station data and possible relocations of stations are not taken into account.
Stepwise and linear fits to the mean annual rainfall are displayed with their corresponding r² coefficients of determination. At first glance, the data presented here and the fits to them contradict the current perception of a sudden, step-wise decrease in precipitation in the 1970s and at the beginning of the twenty-first century in South-West Australia [4]. A single step-wise fit to the data results in a step of 246 mm/a (PF) to 42 mm/a (WA) around 1970, with an r² similar to that of a continuous, linear decline by 430 mm/a (PF) to 71 mm/a (WA) from 1920 to 2015. For the three regions WA, WC and BC, the observations and our fits to them imply that a continuous decline in annual precipitation by 20 % between
1920 and 2015 matches the observations as well as a sudden drop by 10 % around 1970.
Fig. 3 Annual precipitation compiled from the [9] Daily Rainfall Climate Data for the areas
described in Fig. 1 (top to bottom: WA, WC, PF, BC) and for the entire year (left), the wet season
April to September (middle), and the dry season October to March (right). Displayed are the
mean of all stations with a data availability of 90 % or more and the data of the station with the
maximum/minimum rainfall over the entire period. Stepwise and linear fits to the mean annual
rainfall are displayed with their r² coefficients
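The comparison between a single step and a continuous linear decline, each judged by its coefficient of determination r², can be reproduced in a few lines. The exact fitting procedure behind Fig. 3 is not given in the text; the sketch below is one plausible choice, using ordinary least squares for the trend and an exhaustive search over breakpoints for a step between two constant levels. Function names are hypothetical and the years are assumed to be sorted:

```python
def r_squared(y, fitted):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    if ss_tot == 0:
        raise ValueError("r² is undefined for a constant series")
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def linear_fit(x, y):
    """Ordinary least-squares trend; returns (slope, r²)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, r_squared(y, [intercept + slope * a for a in x])


def step_fit(x, y):
    """Best single step between two constant levels; returns (first year after
    the step, step size, r²), choosing the breakpoint with the smallest squared error."""
    best = None
    for k in range(1, len(x)):
        before, after = y[:k], y[k:]
        m1, m2 = sum(before) / len(before), sum(after) / len(after)
        fitted = [m1] * len(before) + [m2] * len(after)
        sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
        if best is None or sse < best[0]:
            best = (sse, x[k], m2 - m1, fitted)
    _, year, step, fitted = best
    return year, step, r_squared(y, fitted)
```

For noisy station averages, both fits can reach similar r² values, which is why the data alone do not discriminate between a sudden and a gradual decline.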
The extreme numbers for PF are a result of the very small number of stations (only three) that meet our criteria on data availability. Two of these stations, 9010 Churchman Brook and 9031 Mundaring Weir, show a significant decrease in precipitation, from 1500 mm/a to about 1000 mm/a for Churchman Brook and from 1100 mm/a to about 800 mm/a for Mundaring Weir. For these particular stations, the long-term records indeed resemble a sudden decrease in annual rainfall around 1970. It is interesting to note that both Churchman Brook and Mundaring Weir are located in the Perth catchment area at the Canning/Mundaring surface water storages and that the sudden drop in precipitation around 1970 at these stations fits perfectly with the observed dam levels displayed in [20], Fig. 1a.
Hence, our general findings of a continuous decline in precipitation for the WA,
WC and BC areas do not contradict the sudden drop in observed river discharges
reported by [4] and [20] for the Perth catchment area, for which small changes
in circulation may have led to shifts in precipitation bands on a regional scale. In
addition, anthropogenic factors such as irrigation and deforestation [3] may have
influenced the dam water levels at that time.
Our findings in the previous section suggest that observational rainfall data can be
interpreted as a continuous decline in precipitation or as a sudden decrease around
1970, depending on the area. In the following, we address the question of whether a
sudden increase in small, anthropogenic particles can in principle cause such a drop
in precipitation through first and second aerosol indirect effects.
Figure 4 displays the CCN number concentrations for the three high-resolution model runs using aerosol-aware microphysics (wrf-aero, wrf-aerox3, wrf-muja) as contour-plot averages for the wet season (April–September) and the dry season (October–March) 1970–1974. Near-surface wind vectors are overlaid on the contour plots. The CCN number concentration of the wrf-muja run, averaged over the area WA, exceeds that of the wrf-aerox3 run, which assumes three times the CCN and IN concentrations prior to the commissioning of Muja Power and other large sources of anthropogenic aerosols. Despite being emitted from a tiny area around the location of Muja Power Station, the ultrafine aerosol particles are distributed widely and result in a higher CCN concentration along the West Coast and over the wheatbelt. While the annual average shows a symmetrical distribution around the emitting source (not displayed), the direction in which the ultrafine aerosol particles travel depends on the seasonality of the near-surface winds. In austral summer, the dominant near-surface wind direction over land is towards the north-west, while in austral winter the bulk of the CCN is pushed to the south-east of the area WA (Fig. 4). This suggests that the emissions from Muja Power are advected horizontally rather than mixed up into higher layers. The CCN number concentration varies with time around its initial values as a result of the continuous removal and replenishment through CCN activation, rain/snow/graupel collecting aerosols, cloud/rain evaporation and surface
Fig. 4 CCN number concentrations [kg⁻¹] for the wet season and the dry season averages 1970–1974, summed up over the entire column at each grid point. Overlaid are 10 m surface winds. The black dots represent the positions of the BOM stations Churchman Brook (9010) and Mundaring Weir (9031), the white dot the location of the Muja Power station
3.2.2 Precipitation
The main focus of our investigation is the change in model precipitation when
including aerosol physics in the microphysics schemes (i.e. the difference between
wrf-std and the aerosol-aware runs), and when changing the aerosol concentration
(i.e. the difference between wrf-aero and wrf-aerox3/wrf-muja). Figure 5 displays
the mean monthly rainfall for the different model runs and for observational data
from UDEL and BOM for the four regions of interest over land only, averaged over
the entire period 1970–1974. The monthly precipitation is derived as the average
of all stations in the region with a data availability of 90 % or more between 1970
and 1974 (BOM) or all grid points in the corresponding region (all others). Averaged
over the entire region WA and the Back Country region BC, the gridded observations
from UDEL and the station data from BOM agree well. For WC and even more so
for PF, the UDEL observations show consistently smaller values than the BOM
station data, which is a result of the small number of stations contributing to the
BOM average and the “outlier” stations Churchman Brook and Mundaring Weir.
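The station filter used for the BOM averages (stations with a data availability of 90 % or more) can be sketched as follows. The sketch assumes monthly values per station with `None` marking missing months, and that the regional value for each month is the mean over the retained stations that reported in that month; the authors may have handled gaps differently. The station names in the example are hypothetical.

```python
from statistics import fmean


def availability(series):
    """Fraction of non-missing entries (None marks a missing value)."""
    return sum(v is not None for v in series) / len(series)


def regional_mean(stations, threshold=0.9):
    """Mean over stations with availability >= threshold, per time step.

    stations: dict mapping station name -> list of monthly rainfall [mm],
    all lists covering the same months.
    Returns (list of regional means, list of retained station names).
    """
    kept = [name for name, s in stations.items() if availability(s) >= threshold]
    if not kept:
        raise ValueError("no station meets the availability threshold")
    n = len(stations[kept[0]])
    means = []
    for i in range(n):
        values = [stations[name][i] for name in kept if stations[name][i] is not None]
        means.append(fmean(values) if values else None)
    return means, kept


# Hypothetical stations with 10 monthly values each
stations = {
    "station_A": [50, 60, 70, 80, 90, 100, 110, 120, 130, 140],  # 100 % available
    "station_B": [30, 40, 50, 60, 70, 80, 90, 100, 110, None],   # 90 % available
    "station_C": [None, None, 10, 10, 10, 10, 10, 10, 10, 10],   # 80 %: excluded
}
means, kept = regional_mean(stations)
print(kept)       # ['station_A', 'station_B']
print(means[0])   # 40.0
print(means[-1])  # 140.0 (only station_A reported in the last month)
```

A region with only a handful of qualifying stations, as for PF, is strongly influenced by any single station, which is the effect described above for Churchman Brook and Mundaring Weir.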
Fig. 5 Mean monthly precipitation 1970–1974 for the different model runs and observational data
sets. For each region, the mean value is taken over all stations (bom) or over all grid points (all
others), respectively
Table 1 Accumulated precipitation [mm] at the end of the 4-year simulation period from 1
January 1970 to 1 January 1974 for the different model runs and observations from UDEL, for
each of the regions over land only and split into wet season and dry season
Region  Season  UDEL  wrf-30 km  wrf-10 km  wrf-std  wrf-aero  wrf-aerox3  wrf-muja
Accumulated precipitation [mm]
WA      Wet     1347  787        1075       1575     1758      1679        1737
WA      Dry     508   531        655        725      817       759         786
WC      Wet     2193  1226       1549       2207     2379      2321        2352
WC      Dry     535   383        492        562      633       598         610
PF      Wet     2342  1124       1560       2372     2603      2525        2587
PF      Dry     470   332        403        495      564       515         526
BC      Wet     940   675        918        1282     1435      1353        1412
BC      Dry     461   538        658        733      859       791         818
and for WA/BC during the dry season (not displayed). We speculate that this is due to the improved representation of the topography, i.e. the mountainous region in the north-east, in the 3.33 km models, while the coastal regions on average exhibit a smaller interannual variability due to the dominant wind direction from the south-east, i.e. over dry continental plains.
Table 1 compares the total amount of modelled precipitation, accumulated over
the entire 4-year period 1970–1974, to the UDEL observations for each of the four
regions and split into dry season and wet season. Averaged over the areas WA
and BC, the coarse-resolution runs wrf-30 km and wrf-10 km exhibit smaller errors
than the high-resolution runs. The coastal areas WC and PF are fit significantly
better by the high-resolution runs, in particular for the wet season in austral
winter. Among the four high-resolution runs, the standard run wrf-std, which produces the least precipitation, performs best and in particular closely matches the observed WC and PF winter precipitation.
By comparing the accumulated precipitation values of the wrf-std and the wrf-
aero runs, we find that the addition of aerosol physics with pre-1960s aerosol
concentrations to the microphysics scheme leads to an increase in rainfall by 8.1 %
(WC), 9.5 % (PF), 10.6 % (WA) and 12.1 % (BC). This effect is more pronounced
for the dry season (between 11.2 % and 14.6 %) than for the wet season (between
7.2 % and 10.6 %). Increasing the aerosol concentration, however, reduces the
amount of precipitation in all areas: Of the three aerosol runs, wrf-aero shows the
largest values of accumulated precipitation and wrf-aerox3 (with 3 times larger CCN
and IN concentrations) shows the smallest values: Compared to the pre-1960s run,
precipitation is reduced by 3.1 % for WC, 4.0 % for PF, 5.4 % for WA, and 6.5 %
for BC. Again, this effect is more pronounced for the dry season (between 5.5 %
and 8.7 %) than for the wet season (between 2.4 % and 4.4 %). The wrf-muja run
with additional CCN emissions from Muja Power, but otherwise standard aerosol
concentrations from wrf-aero, lies in between.
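The relative changes quoted above can be recomputed from the rounded totals in Table 1. The sketch below assumes that each percentage is expressed relative to the wrf-aero run; with that normalisation the rounded table values reproduce the quoted whole-year figures to within about 0.1 percentage points (e.g. 8.1 % for WC and 9.5 % for PF). The dictionary layout and function names are illustrative.

```python
# Accumulated precipitation [mm], 1970-1974, from Table 1: (wet season, dry season)
TABLE1 = {
    "WA": {"wrf-std": (1575, 725), "wrf-aero": (1758, 817),
           "wrf-aerox3": (1679, 759), "wrf-muja": (1737, 786)},
    "WC": {"wrf-std": (2207, 562), "wrf-aero": (2379, 633),
           "wrf-aerox3": (2321, 598), "wrf-muja": (2352, 610)},
    "PF": {"wrf-std": (2372, 495), "wrf-aero": (2603, 564),
           "wrf-aerox3": (2525, 515), "wrf-muja": (2587, 526)},
    "BC": {"wrf-std": (1282, 733), "wrf-aero": (1435, 859),
           "wrf-aerox3": (1353, 791), "wrf-muja": (1412, 818)},
}


def total(region, run):
    """Wet plus dry season accumulation [mm]."""
    return sum(TABLE1[region][run])


def pct_below_aero(region, run):
    """How much less rain `run` produces than wrf-aero, in % of wrf-aero (assumed normalisation)."""
    ref = total(region, "wrf-aero")
    return 100.0 * (ref - total(region, run)) / ref


for region in TABLE1:
    print(
        region,
        f"effect of aerosol physics (std vs aero): {pct_below_aero(region, 'wrf-std'):.1f} %",
        f"aerox3: {pct_below_aero(region, 'wrf-aerox3'):.1f} %",
        f"muja: {pct_below_aero(region, 'wrf-muja'):.1f} %",
    )
```

In every region the wrf-muja deficit comes out smaller than the wrf-aerox3 deficit, consistent with the statement that wrf-muja lies in between.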
Acknowledgements The modelling experiments presented here required more than 2 million CPU hours and were conducted on the Karlsruhe Institute of Technology Steinbuch Centre for Computing (KIT-SCC) ForHLR1 supercomputer. The authors acknowledge the European Centre for Medium-Range Weather Forecasts (ECMWF) for the dissemination of ERA40, the NOAA/OAR/ESRL PSD, Boulder for providing the UDEL air temperature and precipitation data and the HadSLP2
sea level pressure data, the University of East Anglia, Climate Research Unit, for access to the
CRU air temperature and precipitation data, and the Bureau of Meteorology, Australia, for the
dissemination of the daily rainfall climate data. The authors are particularly grateful for the support
of Greg Thompson (NCAR) in the design of the experiment and the setup of the WRF model.
Table 2 WRF model configuration for the different domains at 30 km, 10 km and 3.33 km
resolution and for the different types of high-resolution runs
Run                  wa-30 km         wa-10 km         wa-std           wa-aero/
Microphysics         Thompson         Thompson         Thompson         Thompson-
Cumulus              BMJ              BMJ              Off              Off
Surface layer        Janjic Eta       Janjic Eta       Janjic Eta       Janjic Eta
Land-surface         Noah LSM         Noah LSM         Noah LSM         Noah LSM
Scalar PBL mix       On               On               On               On
Grid FDDA            Above PBL        Off              Off              Off
o3input              2                2                2                2
aer_opt              1                1                1                1
Domain size          190 × 132 × 75   199 × 199 × 75   304 × 298 × 75   304 × 298 × 75
Time step            120 s            40 s             4 s              4 s
Rad. time step       30 min           10 min           3 min            3 min
Forcing interval     6 h              3 h              3 h              3 h
Computational setup
Nodes                4                4                24               24
Total tasks          80               80               480              480
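A back-of-the-envelope comparison of the three configurations in Table 2 shows why the 3.33 km runs dominate the computing budget. The sketch assumes that the cost per simulated day scales with the number of grid cells times the number of dynamical time steps, ignoring radiation calls, physics differences, nesting and I/O; it is a crude proxy, not a measured timing.

```python
# Domain sizes (nx, ny, nz) and dynamical time steps [s] from Table 2
CONFIGS = {
    "30 km": ((190, 132, 75), 120),
    "10 km": ((199, 199, 75), 40),
    "3.33 km": ((304, 298, 75), 4),
}
TASKS = {"30 km": 80, "10 km": 80, "3.33 km": 480}  # total MPI tasks, Table 2


def cells(dims):
    """Total number of grid cells nx * ny * nz."""
    nx, ny, nz = dims
    return nx * ny * nz


def cell_updates_per_day(name):
    """Grid cells x time steps needed for one simulated day (crude cost proxy)."""
    dims, dt = CONFIGS[name]
    return cells(dims) * (86400 // dt)


base = cell_updates_per_day("30 km")
for name, (dims, dt) in CONFIGS.items():
    print(
        f"{name:>8}: {cells(dims):>9,} cells, {86400 // dt:>6} steps/day, "
        f"relative cost {cell_updates_per_day(name) / base:6.1f}, "
        f"{cells(dims) // TASKS[name]:,} cells per task"
    )
```

Under this proxy, one simulated day at 3.33 km costs roughly 100 times as much as at 30 km, while the number of tasks grows only six-fold.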
1. Allan, R., Ansell, T.: A new globally complete monthly historical gridded mean sea level
pressure dataset (HadSLP2): 1850–2004. J. Clim. 19, 5816–5842 (2006)
2. Andreae, M.O.: Correlation between cloud condensation nuclei concentration and aerosol
optical thickness in remote and polluted regions. Atmos. Chem. Phys. Discuss. 8(3), 11293–
11320 (2008)
3. Andrich, M.A., Imberger, J.: The effect of land clearing on rainfall and fresh water resources in
Western Australia: a multi-functional sustainability analysis. Int. J. Sustain. Dev. World Ecol.
20(6), 549–563 (2013)
4. Bates, B.C., Hope, P., Ryan, B., Smith, I., Charles, S.: Key findings from the Indian Ocean
climate initiative and their impact on policy development in Australia. Clim. Change 89(3–4),
339–354 (2008)
5. Bigg, E., Soubeyrand, S., Morris, C.: Persistent after-effects of heavy rain on concentrations of
ice nuclei and rainfall suggest a biological cause. Atmos. Chem. Phys. 15, 2313–2326 (2015)
6. Bigg, E., Turvey, D.: Sources of atmospheric particles over Australia. Atmos. Environ. 12(8),
1643–1655 (1978)
7. Bigg, E.K.: Ice nucleus concentrations in remote areas. J. Atmos. Sci. 30(6), 1153–1157 (1973)
8. Bradshaw, C.J.A.: Little left to lose: deforestation and forest degradation in Australia since
European colonization. J. Plant Ecol. 5(1), 109–120 (2012)
9. Bureau of Meteorology: Daily rainfall climate data: product code IDCJAC0009 (2015)
10. Delworth, T.L., Rosati, A., Anderson, W., Adcroft, A.J., Balaji, V., Benson, R., Dixon, K.,
Griffies, S.M., Lee, H.C., Pacanowski, R.C., Vecchi, G.A., Wittenberg, A.T., Zeng, F., Zhang,
R.: Simulated climate and climate change in the GFDL CM2.5 high-resolution coupled climate
model. J. Clim. 25(8), 2755–2781 (2012)
11. Delworth, T.L., Zeng, F.: Regional rainfall decline in Australia attributed to anthropogenic
greenhouse gases and ozone levels. Nat. Geosci. 7, 583–587 (2014)
12. Fersch, B., Kunstmann, H.: Atmospheric and terrestrial water budgets: sensitivity and per-
formance of configurations and global driving data for long term continental scale WRF
simulations. Clim. Dyn. 42(9–10), 2367–2396 (2013)
13. Gallagher, M.W., Nemitz, E., Dorsey, J.R., Fowler, D., Sutton, M.A., Flynn, M., Duyzer, J.H.:
Measurements and parameterizations of small aerosol deposition velocities to grassland, arable
crops, and forest: influence of surface roughness length on deposition. J. Geophys. Res. Atmos.
107, AAC 8-1–AAC 8-10 (2002)
14. Grabowski, W.W., Morrison, H.: Indirect impact of atmospheric aerosols in idealized simula-
tions of convective-radiative quasi equilibrium. Part II: Double-moment microphysics. J. Clim.
24(7), 1897–1912 (2011)
15. Harris, I., Jones, P.D., Osborn, T.J., Lister, D.H.: Updated high-resolution grids of monthly
climatic observations – the CRU TS3.10 Dataset. Int. J. Climatol. 34, 623–642 (2014)
16. Junkermann, W., Hacker, J.M.: Ultrafine particles over Eastern Australia: an airborne survey.
Tellus B 67, 25308 (2015)
17. Junkermann, W., Hacker, J.M., Lyons, T., Nair, U.: Land use change suppresses precipitation.
Atmos. Chem. Phys. Discuss. 9(3), 11481–11500 (2009)
18. Kala, J., Lyons, T., Nair, U.: Numerical simulations of the impacts of land-cover change on
cold fronts in south-west Western Australia. Boun. Layer Meteorol. 138, 121–138 (2010)
19. Kamilli, K.A., Ofner, J., Lendl, B., Schmitt-Kopplin, P., Held, A.: New particle formation
above a simulated salt lake in aerosol chamber experiments. Environ. Chem. 12(4), 489–503
20. Karoly, D.J.: Climate change: human-induced rainfall changes. Nat. Geosci. 7(8), 551–552
21. Lee, S.S., Feingold, G.: Aerosol effects on the cloud-field properties of tropical convective
clouds. Atmos. Chem. Phys. 13(14), 6713–6726 (2013)
22. Miguez-Macho, G., Stenchikov, G.L., Robock, A.: Spectral nudging to eliminate the effects
of domain position and geometry in regional climate model simulations. J. Geophys. Res. D
Atmos. 109, D13104 (2004)
23. Prein, A.F., Gobiet, A., Suklitsch, M., Truhetz, H., Awan, N.K., Keuler, K., Georgievski, G.:
Added value of convection permitting seasonal simulations. Clim. Dyn. 41(9–10), 2655–2677
24. Ruprecht, J., Schofield, N.: Effects of partial deforestation on hydrology and salinity in high
salt storage landscapes. II. Strip, soils and parkland clearing. J. Hydrol. 129(1–4), 39–55 (1991)
25. Saunders, D.: Changes in the Avifauna of a region, district and remnant as a result of
fragmentation of native vegetation: the wheatbelt of western Australia. A case study. Biol.
Conserv. 50(1–4), 99–135 (1989)
26. Seifert, A., Köhler, C., Beheng, K.D.: Aerosol-cloud-precipitation effects over Germany as
simulated by a convective-scale numerical weather prediction model. Atmos. Chem. Phys.
12(2), 709–725 (2012)
27. Skamarock, W., Klemp, J., Dudhia, J., Gill, D., Barker, D., Duda, M., Huang, X.-Y., Wang,
W., Powers, J.: A description of the advanced research WRF version 3, NCAR/TN-475+STR.
Technical report (2008)
28. Tao, W.-K., Chen, J.-P., Li, Z., Wang, C., Zhang, C.: Impact of aerosols on convective clouds
and precipitation. Rev. Geophys. 50, RG2001 (2012). doi:10.1029/2011RG000369
29. Thompson, G., Eidhammer, T.: A study of aerosol impacts on clouds and precipitation
development in a large winter cyclone. J. Atmos. Sci. 71, 3636–3658 (2014)
30. Uppala, S., Kållberg, P.W., Simmons, A.J., Andrae, U., Bechtold, V.D.C., Fiorino, M., Gibson,
J.K., Haseler, J., Hernandez, A., Kelly, G.A., Li, X., Onogi, K., Saarinen, S., Sokka, N., Allan,
R., Andersson, E., Arpe, K., Balmaseda, M.A., Beljaars, A.C.M., Berg, L.V.D., Bidlot, J.,
Bormann, N., Caires, S., Chevallier, F., Dethof, A., Dragosavac, M., Fisher, M., Fuentes, M.,
Hagemann, S., Hólm, E., Hoskins, B.J., Isaksen, L., Janssen, P.A.E.M., Jenne, R., McNally,
A.P., Mahfouf, J.-F., Morcrette, J.-J., Rayner, N.A., Saunders, R.W., Simon, P., Sterl, A.,
Trenberth, K.E., Untch, A., Vasiljevic, D., Viterbo, P., Woollen, J.: The ERA-40 re-analysis. Q.
J. R. Meteorol. Soc. 131, 2961–3012 (2005)
31. van den Heever, S.C., Stephens, G.L., Wood, N.B.: Aerosol indirect effects on tropical
convection characteristics under conditions of radiative-convective equilibrium. J. Atmos. Sci.
68(4), 699–718 (2011)
32. von Storch, H., Langenberg, H., Feser, F.: A spectral nudging technique for dynamical
downscaling purposes. Mon. Weather Rev. 128, 3664–3673 (2000)
33. Weisman, M.L., Skamarock, W.C., Klemp, J.B.: The resolution dependence of explicitly
modeled convective systems. Mon. Weather Rev. 125(4), 527–548 (1997)
34. Willmott, C.J., Matsuura, K.: University of Delaware Terrestrial Air Temperature and Precipitation: Monthly and Annual Time Series (1950–1999) v3.01. http://www.esrl.noaa.gov/psd/data/gridded/data.UDel_AirT_Precip.html (2014). Accessed 6 Nov 2016
High-Resolution Climate Projections Using
the WRF Model on the HLRS
Abstract The latest generation of climate projections for the twenty-first century is built on new emission scenarios based on Representative Concentration Pathways (RCPs). Within the worldwide coordinated effort of the Coupled Model Intercomparison Project Phase 5 (CMIP5), their impact on climate is simulated with global general circulation models (GCMs) of the climate system on a spatial grid of 100–200 km resolution. High-resolution information from a robust multi-model ensemble on possible ranges of future climate change is essential for climate impact research and as background information for policy and economy. Within the Coordinated Regional Downscaling EXperiment (CORDEX), the global climate simulations are downscaled for most continental regions; for Europe, for example, a unique set of high-resolution climate change simulations is currently being established. This project contributes to this ensemble by downscaling five GCM simulations from 1958 to 2100 with the Weather Research and Forecasting (WRF) model. The WRF simulations are currently performed at 0.44° and 0.11° resolution on the CRAY XC40 at the High Performance Computing Center Stuttgart (HLRS).
First results of the simulations on the 0.44° grid for the “historical” period 1971–2000, compared with two different future scenarios for 2071–2099, show an increase of the average temperature by 2–4 °C, depending on the chosen emission scenario, especially in the southeastern and northeastern parts of Europe. In the future scenario with a moderate increase in greenhouse gas emissions, the annual average precipitation in Germany is projected to decrease by 50–100 l/m². Considering the future scenario with a high projected emission increase, only marginal changes of the annual average precipitation are
The projected increase in anthropogenic emissions of CO₂ and other greenhouse gases within the next decades will have a considerable influence on the future climate and consequently on society. Although climate change is a global issue, its regional impacts will be much more diverse. General circulation models (GCMs) are currently the most advanced tools for simulating the response of the global climate system to increasing greenhouse gas concentrations. GCMs typically have a horizontal resolution of 100–200 km. However, to better understand regional climate phenomena such as local extremes and to assess the effects of the expected climate change, scientists and end users such as federal agencies and climate impact and adaptation researchers require projections on the regional scale with a higher horizontal resolution.
The Coupled Model Intercomparison Project Phase 5 (CMIP5) [1] provides a framework for coordinated global climate change experiments, in which 20 different modelling groups performed global climate projections with their GCMs. CMIP5 contributed to the latest assessment report of the Intergovernmental Panel on Climate Change (IPCC). These projections are based on Representative Concentration Pathways (RCPs) [2], representing four different possible greenhouse gas (GHG) concentration scenarios of the future climate: RCP8.5, RCP6, RCP4.5 and RCP2.6. The number gives a rough estimate of the change in radiative forcing (in W/m²) by the year 2100 relative to pre-industrial values.
CORDEX (http://wcrp-cordex.ipsl.jussieu.fr), the COordinated Regional Downscaling EXperiment, was established by the World Climate Research Programme (WCRP) to provide ensembles of regional climate simulations at a higher spatial resolution [3]. The task within CORDEX is to downscale the GCMs that contributed to the CMIP5 database to continental-scale regions with regional climate models. For EURO-CORDEX, the European branch of the CORDEX initiative, simulations are performed at 0.44° and 0.11° resolution (e.g. [4–6]). The higher resolution allows the simulation of smaller-scale processes and feedback mechanisms and provides results on a finer spatial scale for end users.
The project ReKliEs-De (Regional Climate Ensembles Germany) (http://reklies.hlnug.de/), funded by the BMBF (Federal Ministry for Education and Research), contributes to EURO-CORDEX by carrying out a number of regional climate projections. ReKliEs-De is a nationwide project for a more accurate assessment of regional climate change. The project aims to identify bandwidths and extreme values from the results of high-resolution regional climate projections for Germany and to prepare them for climate impact research and policy consulting. On this basis, more detailed studies on changes in the occurrence of extreme precipitation, drought or extreme heat can be carried out. The project is intended to improve the assessment of the possible bandwidth and extreme expressions of these weather events. This will provide more robust statements, making the results more usable for policy advice and climate impact research.
High-Resolution Climate Projections 579
For the future climate projections, four different GCMs and two different RCP scenarios of the CMIP5 project are applied as boundary forcing for the WRF model. The “historical” runs of the GCMs cover the period from 1850 to 2005 and are forced by observed changes in atmospheric composition from anthropogenic and natural sources. The “RCP” scenarios of the GCMs cover the period from 2006 to 2100. They represent mitigation scenarios that assume policy actions will be taken to achieve certain emission targets [1]. The numbers of the RCPs give a rough estimate of the change in radiative forcing by the year 2100 relative to pre-industrial values. The forcing data we applied, the resolution of the GCMs, their scenarios and the chosen simulation periods are presented in Table 1.
The simulations are performed with WRF model version 3.6.1, using the CRAY
XC40 System at the HLRS. WRF is coupled with the land surface model NOAH
[8] and applied with the following parameterizations: the Morrison two-moment
microphysics scheme [9], the Yonsei University (YSU) planetary boundary layer
scheme [10], the Kain-Fritsch-Eta convection scheme [11] and the radiation trans-
port scheme CAM for longwave and shortwave radiation [12].
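For readers who want to reproduce the physics setup, the parameterizations listed above correspond to the following entries of the `&physics` group of WRF's `namelist.input`. The option numbers are taken from the WRF (ARW) User's Guide for version 3.x and should be checked against the documentation of the release actually used; all other settings (surface layer, time steps, nesting) are omitted here. The small Python helper that prints the group is purely illustrative and not part of WRF.

```python
# WRF v3.x &physics option numbers for the schemes named in the text
PHYSICS = {
    "mp_physics": 10,         # Morrison two-moment microphysics
    "bl_pbl_physics": 1,      # Yonsei University (YSU) PBL
    "cu_physics": 1,          # Kain-Fritsch (new Eta) convection
    "ra_lw_physics": 3,       # CAM longwave radiation
    "ra_sw_physics": 3,       # CAM shortwave radiation
    "sf_surface_physics": 2,  # Noah land surface model
}


def physics_namelist(options, max_dom=1):
    """Render a Fortran-namelist &physics group with one value per domain."""
    lines = ["&physics"]
    for key, value in options.items():
        values = ", ".join([str(value)] * max_dom)
        lines.append(f" {key:<20} = {values},")
    lines.append("/")
    return "\n".join(lines)


print(physics_namelist(PHYSICS))
```

With `max_dom=2` every entry is repeated once per domain, which is how WRF expects per-domain settings.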
WRF was applied to create climate projections for Europe on the domain specified by CORDEX (Fig. 1a). Within ReKliEs-De, the focal region of the analyses is Germany and the catchments of its major rivers (Fig. 1b). The simulations are forced 6-hourly at the lateral boundaries with GCM data, which are generally available at a grid resolution of approx. 1°–2° (see Table 1). WRF is applied in a one-way nesting approach from 0.44° to 0.11° resolution. WRF was compiled at HLRS with PGI 14.7 and run in a hybrid MPI/OpenMP configuration to optimize the speed of the simulation. The simulations produce a huge amount of output data. As this volume cannot be stored as raw output, it will be extracted and reduced during postprocessing. Table 2 shows the technical details of the different simulations performed on Hazel Hen.
Within 24 h walltime, it was possible to simulate approximately 4 years of the 50 km domain projections and 1 year of the 12 km domain projections, respectively. So far (status April 2016), we have been able to downscale the majority of the GCMs from 1958–2100 (historical + RCPs) to the 50 km grid. The simulations on the 12 km grid for the “historical” period (1958–2005) are expected to be finalized within the next weeks.
Fig. 1 EURO-CORDEX domain (a), and ReKliEs-De area of investigation with orography on
12 km resolution: Germany (red) and river catchments of Danube, Rhine, Elbe, Weser and Ems
(colours) (b)
Table 2 Technical details of the WRF simulations from May 2015 to April 2016
Simulation                 EURO-CORDEX/ReKliEs-De/      EURO-CORDEX/
                           FOR 1696 0.44° historical
Nr. of CPUs (openMPI)      1536                         1536
Simulation period          01.01.1958–31.12.2005        01.01.2006–31.12.2100
Nr. of grid cells          129 × 139 × 50               129 × 139 × 50
Δt (s)                     180                          180
Status (April 2016)        Finalized                    Soon
Walltime                   300 h                        570 h
Nr. of simulations         4                            5
Raw output size            42 TB (210 GB/year)          99 TB (210 GB/year)
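The walltime and storage figures in Table 2 follow from the throughput quoted above (about 4 simulated years per 24 h job on the 50 km grid, about 1 year on the 12 km grid) and the raw output rate of 210 GB per simulated year. The sketch below reproduces that arithmetic. It assumes jobs are chained back to back and ignores the queueing times, which the authors report can reach 2–7 days; it also uses 1 TB = 1000 GB.

```python
import math

HOURS_PER_JOB = 24                          # walltime limit per job [h]
THROUGHPUT = {"50 km": 4.0, "12 km": 1.0}   # simulated years per 24 h job (from the text)
RAW_GB_PER_YEAR = 210                       # raw output of the 50 km runs, Table 2


def walltime_hours(years, grid):
    """Pure compute time, assuming back-to-back jobs and no queueing."""
    return years / THROUGHPUT[grid] * HOURS_PER_JOB


def jobs_needed(years, grid):
    """Number of 24 h jobs needed to cover `years` simulated years."""
    return math.ceil(years / THROUGHPUT[grid])


def raw_output_tb(years, n_simulations, gb_per_year=RAW_GB_PER_YEAR):
    """Raw output volume in TB (decimal units)."""
    return years * n_simulations * gb_per_year / 1000


historical = 2005 - 1958 + 1  # 48 years
rcp = 2100 - 2006 + 1         # 95 years
print(walltime_hours(historical, "50 km"))  # 288.0, Table 2 lists about 300 h
print(walltime_hours(rcp, "50 km"))         # 570.0
print(raw_output_tb(historical, 4))         # 40.32, Table 2 lists 42 TB
print(raw_output_tb(rcp, 5))                # 99.75, Table 2 lists 99 TB
print(jobs_needed(historical, "12 km"))     # 48 daily jobs for one 12 km historical run
```

The 12 km figure illustrates why queueing, rather than compute time, dominates the time to solution: each of the 48 jobs may wait several days before it starts.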
3 Results
Since the model simulations are still running and the first results became available only recently, the following analysis is preliminary. Note that this section describes, as an example, only the results of one GCM forcing one RCM. For a complete analysis of climate projections including bandwidth estimates, it is essential to analyse multi-model ensembles of different GCMs and RCMs, as will be done within ReKliEs-De.
To simulate the state of the atmosphere, the WRF model requires a number of input variables, among them the 3-D temperature field at the lateral boundaries. Figure 2a displays the average temperature of the lowest model level of the MPI-ESM-LR GCM at its coarse model resolution (150 km) for Europe. For the “historical” 30-year average 1971–2000, the temperature ranges from about 270 K (−3 °C) in northern Europe to 290 K (17 °C) in southern Europe. Figure 2b shows the near-surface temperature simulated by the WRF model at 50 km resolution for the same period and area, forced by the MPI-ESM-LR model (Fig. 2a). As mentioned above, only the EURO-CORDEX domain is simulated, which is given on a rotated grid with lateral borders from around 25° N to 75° N and 30° W to 50° E. In the downscaled simulation, the temperature ranges from around 268 K (−5 °C) in the north to 290 K (17 °C) in the southern part of the domain. The mountain elevations increase with model resolution, which explains why the temperature over Scandinavia and the Alpine region is lower than in the coarse model input. Owing to the better resolved orography, especially of the Pyrenees, the Alps and the Scandinavian mountains, the WRF 50 km model shows more details of the temperature pattern. This example illustrates well that GCMs are able to provide a basic representation of the characteristics of the global climate on a large spatial scale, whereas on the regional scale some important features are lost because the model resolution is too coarse.
Fig. 2 Average near-surface temperature from 1971–2000 from MPI-ESM-LR raw model output (a) and the downscaled simulation of MPI-ESM-LR from WRF (50 km) (b)
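The link between better resolved orography and colder mountain temperatures can be quantified with a rough lapse-rate estimate. Assuming, for illustration only, the standard-atmosphere lapse rate Γ of about 6.5 K per km and a hypothetical increase Δz of 500 m in the grid-cell mean elevation of a mountain range when going from the GCM grid to the 50 km grid:

```latex
\Delta T \approx -\Gamma\,\Delta z
  = -\left(6.5\ \mathrm{K\,km^{-1}}\right)\times\left(0.5\ \mathrm{km}\right)
  = -3.25\ \mathrm{K}
```

A cooling of a few kelvin over the mountains is thus plausible from orography alone; the actual elevation differences between the two grids are not quantified in the text.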
Figure 3 shows the simulated change in average temperature between the historical period 1971–2000 and the projection period 2071–2099, simulated with MPI-ESM-LR for two GHG emission scenarios. The GCM raw model output is given in Fig. 3a, b, whereas Fig. 3c, d depict the WRF downscaled simulations on the 50 km grid. Following RCP2.6 on the coarse GCM grid (Fig. 3a), the temperature will increase by 0.5–1 °C in Europe. In the WRF simulation with the higher resolution, the temperature increases by 0.5–2 °C in the EURO-CORDEX domain. The coarse GCM for RCP8.5 (Fig. 3b) gives a temperature increase of 1–3 °C, whereas the downscaled projections on the 50 km grid reveal an increase of up to 4 °C in the southeastern part of Europe. For both scenarios, the simulated temperature increase is stronger in the high-resolution projections on the 50 km grid. Another distinctive feature of the RCMs (Fig. 3c, d) compared to the GCMs (Fig. 3a, b) is the opposite east–west gradient of the warming: the RCMs show a generally stronger increase of the average temperature in the eastern part of the domain than in the western part and over the Atlantic Ocean.
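The change signals in Fig. 3 are differences between two multi-year means at each grid cell (future period 2071–2099 minus historical period 1971–2000). A minimal sketch, assuming annual-mean fields stored as nested lists indexed by [year][cell]; the toy numbers are made up and are not model output.

```python
from statistics import fmean


def period_mean(fields, years, start, end):
    """Average annual-mean fields over start..end (inclusive), cell by cell.

    fields: list of per-year lists of grid-cell values, aligned with `years`.
    """
    selected = [f for f, y in zip(fields, years) if start <= y <= end]
    return [fmean(cell) for cell in zip(*selected)]


def change_signal(fields, years, hist=(1971, 2000), future=(2071, 2099)):
    """Future-minus-historical difference of period means for each grid cell."""
    h = period_mean(fields, years, *hist)
    f = period_mean(fields, years, *future)
    return [fv - hv for fv, hv in zip(f, h)]


# Toy example: two grid cells, 10/12 degC historically and 12/16 degC in the future
years = list(range(1971, 2001)) + list(range(2071, 2100))
fields = [[10.0, 12.0] if y <= 2000 else [12.0, 16.0] for y in years]
print(change_signal(fields, years))  # [2.0, 4.0]
```

Applying the same function to the GCM fields and to the WRF fields yields the two sets of panels being compared; the spatial pattern of the differences is what reveals the east–west contrast discussed above.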
Fig. 3 Difference in near-surface temperature between 1971–2000 and 2071–2099 from raw GCM output for RCP2.6 (a) and RCP8.5 (b), and the corresponding differences from the WRF model (50 km) forced by MPI-ESM-LR for RCP2.6 (c) and RCP8.5 (d)
in Fig. 5a for RCP2.6 and Fig. 5b for RCP8.5. Under RCP2.6, a scenario with a moderate increase in GHG emissions, the annual average precipitation in Germany is projected to decrease slightly, by about 50–100 l/m² per year.
Fig. 4 Annual average precipitation from 1971–2000, simulated by WRF on the 50 km grid, forced with MPI-ESM-LR
Fig. 5 Difference of the annual average precipitation between 1971–2000 and 2071–2099 simulated by WRF on the 50 km grid, with MPI-ESM-LR RCP2.6 forcing (a) and RCP8.5 forcing (b)
586 V. Mohr et al.
For RCP8.5, where a rather strong increase in GHG emissions is assumed, the precipitation changes projected for Germany are moderate and smaller than in the RCP2.6 scenario, varying from −50 to +50 l/m² per year. Especially in the southeastern part of Europe a strong decrease of the annual precipitation is
Although precipitation changes appear to be small for Germany on the annual
average, on the seasonal scale the changes might be more intense and hence much
more significant for the environment and the society than indicated by the annual
average precipitation.
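That a small annual change can hide larger seasonal changes is easy to illustrate: opposite changes in winter and summer can cancel in the annual sum. The sketch below aggregates monthly precipitation into the meteorological seasons DJF, MAM, JJA and SON (a common convention assumed here, not stated in the text) and compares relative changes; the monthly climatologies are invented for illustration.

```python
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}


def seasonal_totals(monthly):
    """monthly: dict month number (1-12) -> mean monthly precipitation [mm]."""
    return {name: sum(monthly[m] for m in months) for name, months in SEASONS.items()}


def relative_change(hist, future):
    """Percentage change of seasonal and annual totals, future vs historical."""
    h, f = seasonal_totals(hist), seasonal_totals(future)
    out = {name: 100.0 * (f[name] - h[name]) / h[name] for name in SEASONS}
    out["year"] = 100.0 * (sum(future.values()) - sum(hist.values())) / sum(hist.values())
    return out


# Invented climatologies: 60 mm in every month historically; wetter winter, drier summer
hist = {m: 60.0 for m in range(1, 13)}
future = dict(hist)
for m in (12, 1, 2):
    future[m] = 66.0  # +10 % in DJF
for m in (6, 7, 8):
    future[m] = 54.0  # -10 % in JJA
print(relative_change(hist, future))  # DJF +10 %, JJA -10 %, annual 0 %
```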
This section presented preliminary results of downscaling one GCM (MPI-ESM-LR) from its coarse model grid (150 km) with WRF to a refined grid (50 km) for a European domain and two scenarios. The differences between the GCM and RCM simulations demonstrate the need for projections at a higher spatial resolution in order to evaluate possible changes of the climate on a regional scale. The scope of the WRFCLIM project is to investigate the performance and benefit of the higher resolution of 12 km required by EURO-CORDEX. Simulations at higher resolution are expected to improve projection skill owing to a better representation of orographic effects. Variables with a high spatial variability in particular should be better represented, as was shown for precipitation in [4] and [6].
High-resolution simulations with WRF forced by four different GCMs and two RCP scenarios are currently being downscaled from 2000 to 2100 onto the 12 km grid. Due to variable and often long queueing times (2–7 days), it is difficult to predict when the simulations will be finished. The final projections should be completed by the end of the current year (2016) to fulfil the ReKliEs-De objectives in
From the scientific point of view, besides the evaluation of the climate projections on the high-resolution grid, monthly and seasonal timescales also need to be investigated in more detail, as temperature and precipitation extremes are highly variable throughout the year within Germany.
The main objective of ReKliEs-De within the WRFCLIM project is the provision of “easy to use” special climate indices for end users to assess the climate impacts on Germany. This will also be done by analyzing model ensembles, which makes it possible to estimate the uncertainty and robustness of the simulation results. The preparation of climate indices, accompanied by the extraction and reduction of the simulation output, and further investigations addressing the impact of climate change will be the main tasks within the next months.
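As an example of the kind of end-user index and ensemble bandwidth mentioned above, the sketch below counts summer days (daily maximum temperature of at least 25 °C) and hot days (at least 30 °C) and summarises an ensemble by its minimum, median and maximum. The thresholds follow common German (DWD) usage and are an assumption here; the indices actually provided by ReKliEs-De are not specified in this text, and the ensemble member names and values are invented.

```python
from statistics import median


def count_days(tmax, threshold):
    """Number of days with daily maximum temperature >= threshold [degC]."""
    return sum(t >= threshold for t in tmax)


def summer_days(tmax):
    """Days with Tmax >= 25 degC (assumed definition)."""
    return count_days(tmax, 25.0)


def hot_days(tmax):
    """Days with Tmax >= 30 degC (assumed definition)."""
    return count_days(tmax, 30.0)


def bandwidth(values):
    """Ensemble spread summarised as (minimum, median, maximum)."""
    return min(values), median(values), max(values)


# Invented change in annual summer days (future minus historical) for five members
change = {"member_1": 12, "member_2": 18, "member_3": 9, "member_4": 25, "member_5": 15}
print(bandwidth(list(change.values())))  # (9, 15, 25)

tx = [22.0, 25.0, 27.5, 31.0, 24.9]  # a few invented daily maxima [degC]
print(summer_days(tx), hot_days(tx))  # 3 1
```

Reporting the minimum and maximum alongside the median is one simple way to communicate the ensemble bandwidth rather than a single best estimate.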
Acknowledgements This work is part of the ReKliEs-De project funded by the BMBF (Federal
Ministry for Education and Research) and the Research Unit 1695 funded by the DFG (Deutsche
Forschungsgemeinschaft). We are thankful for the support from the staff of the DKRZ (Deutsches
Klimarechenzentrum), to be able to access GCM data. Computational Resources for the model
simulations on the HLRS CRAY XC40 within WRFCLIM were kindly provided by HLRS. We
would like to thank the staff for their great support.
High-Resolution Climate Projections 587
1. Taylor, K.E., Stouffer, R.J., Meehl, G.A.: Bull. Am. Meteorol. Soc. 93(4), 485 (2012)
2. Van Vuuren, D.P., Edmonds, J., Kainuma, M., Riahi, K., Thomson, A., Hibbard, K., Hurtt,
G.C., Kram, T., Krey, V., Lamarque, J.F., et al.: Clim. Chang. 109, 5 (2011)
3. Giorgi, F., Jones, C., Asrar, G.R., et al.: World Meteorol. Organ. (WMO) Bull. 58(3), 175
4. Warrach-Sagi, K., Schwitalla, T., Wulfmeyer, V., Bauer, H.S.: Clim. Dyn. 41(3–4), 755 (2013)
5. Kotlarski, S., Keuler, K., Christensen, O.B., Colette, A., Déqué, M., Gobiet, A., Goergen, K.,
Jacob, D., Lüthi, D., van Meijgaard, E., Nikulin, G., Schär, C., Teichmann, C., Vautard, R.,
Warrach-Sagi, K., Wulfmeyer, V.: Geosci. Model Dev. 7(4), 1297 (2014)
6. Prein, A., Gobiet, A., Truhetz, H., Keuler, K., Goergen, K., Teichmann, C., Maule, C.F., van
Meijgaard, E., Déqué, M., Nikulin, G., et al.: Clim. Dyn. 46(1–2), 383 (2016)
7. Skamarock, W.C., Klemp, J.B., Dudhia, J., Gill, D.O., Barker, D.M., Wang, W., Powers, J.G.:
A description of the Advanced Research WRF version 2. Technical report, DTIC Document (2005)
8. Chen, F., Dudhia, J.: Mon. Weather Rev. 129, 569 (2001)
9. Morrison, H., Thompson, G., Tatarskii, V.: Mon. Weather Rev. 137, 991 (2009)
10. Hong, S.Y., Noh, Y., Dudhia, J.: Mon. Weather Rev. 134, 2318 (2006)
11. Kain, J.S.: J. Appl. Meteorol. 43, 170 (2004)
12. Collins, W.D., Rasch, P.J., Boville, B.A., McCaa, J.R., Williamson, D.L., Kiehl, J.T., Briegleb,
B., Bitz, C., Lin, S.J., Zhang, M., Dai, Y.: Description of the NCAR Community Atmosphere
Model (CAM 3.0), 226 pp. NCAR Technical Note NCAR/TN-464+STR, NCAR, Boulder (2004)
Biogeophysical Impacts of Land Surface
on Regional Climate in Central Vietnam
1 Introduction
Many investigations have shown that human-induced land surface change strongly
affects climate at local and regional scales [2, 6, 12]. These impacts result
from alterations in the biogeophysical processes of the land surface. In this study, we assess
the biogeophysical impacts of land surface on the regional climate of the Vu Gia-Thu Bon
(VGTB) basin of Central Vietnam.
[Fig. 1 pie chart: shares of the VGTB basin by conversion type (cropland to mixed forest, woodland to mixed forest, woodland to cropland, broadleaf forest to grassland, woodland to grassland, no change)]
Fig. 1 Converted LULC types from WRF LULC-default to WRF LULC-LUCCi. Percentage
changes for the VGTB basin
3 Results
The changes in monthly surface heat fluxes were analysed for all types of
LULC replacement (Fig. 2). Results were obtained by averaging all grid points
representing each LULC replacement type in the study area. In general, ground heat fluxes were two
orders of magnitude smaller than latent and sensible heat fluxes, and their
modification was close to zero. The changes in latent and sensible heat
fluxes range from −20 to 20 W m⁻², depending on the characteristics of the LULC replacement.
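The class-wise averaging over grid points can be sketched as follows, assuming the model grid has been flattened into parallel lists of replacement-class labels and monthly flux values from the two runs (WRF LULC-default and WRF LULC-LUCCi). The class labels and flux values are hypothetical; the function only illustrates a per-class mean of the difference.

```python
from collections import defaultdict

def mean_change_by_class(classes, flux_default, flux_lucci):
    """Average the flux difference (LUCCi minus default) over all grid points of each class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cls, f_def, f_new in zip(classes, flux_default, flux_lucci):
        sums[cls] += f_new - f_def
        counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}

# Hypothetical flattened grid: one monthly latent heat flux value (W m^-2) per grid point
classes = ["woodland->cropland", "woodland->cropland", "broadleaf->grassland",
           "broadleaf->grassland", "no change", "woodland->cropland"]
flux_default = [100.0, 110.0, 120.0, 130.0, 90.0, 105.0]
flux_lucci = [108.0, 116.0, 100.0, 112.0, 90.0, 110.0]

result = mean_change_by_class(classes, flux_default, flux_lucci)
print(result)
```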
Figure 2a shows that replacing woodland by cropland increased latent heat
fluxes in the dry season (March to August). The maximum increase in latent
heat flux was about 10 W m⁻² in June. The increased latent heat fluxes
correspond to enhanced evapotranspiration, which resulted in surface cooling.
However, the replacement decreased sensible heat fluxes during most of the year
because of the increase in albedo and the decrease in roughness length. The maximum
decrease in sensible heat flux, about 15 W m⁻², occurred in April and May.
The reduction in sensible heat fluxes decreased heat loss to the
atmosphere, thereby counteracting the surface cooling. In the rainy season (October to
November), latent heat fluxes decreased by about 10 W m⁻², while sensible heat
fluxes increased weakly, by about 2 W m⁻². The reduction of the total heat flux
tended to warm the surface in the rainy season. Overall, the total turbulent heat
fluxes were reduced over the year, thereby warming the surface.
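As a rough bookkeeping aid, the net change in turbulent heat flux is the sum of the latent and sensible changes. The sketch below applies this to the rainy-season values for woodland to cropland; it assumes, as a simplification, that the available energy at the surface is unchanged, so that a smaller turbulent flux implies a warming tendency. Radiative (albedo) changes and the ground heat flux are ignored, and the function names are illustrative.

```python
def turbulent_flux_change(d_latent, d_sensible):
    """Net change in turbulent heat flux (W m^-2); negative means less energy leaves the surface."""
    return d_latent + d_sensible

def tendency(d_turbulent):
    """Qualitative surface-temperature tendency, assuming unchanged available energy."""
    if d_turbulent < 0:
        return "warming"
    if d_turbulent > 0:
        return "cooling"
    return "neutral"

# Rainy-season changes for woodland -> cropland (W m^-2)
d_total = turbulent_flux_change(d_latent=-10.0, d_sensible=2.0)
print(d_total, tendency(d_total))
```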
The larger reduction in roughness length and vegetation greenness when
broad-leaf forest is replaced by grassland results in markedly decreased
[Fig. 2 panels (a)–(e): monthly changes (Jan–Dec) in latent, sensible and ground heat flux, Δ surface heat fluxes (W m⁻²)]
Fig. 2 Alteration of monthly heat fluxes based on LULC conversion types over the VGTB basin
(WRF LULC-LUCCi minus WRF LULC-default). (a) Woodland to cropland. (b) Broadleaf
forest to grassland. (c) Cropland to mixed forest. (d) Woodland to mixed forest. (e)
Woodland to grassland
Biogeophysical Impacts of Land Surface on Regional Climate in Central Vietnam 593
evapotranspiration and increased albedo (Fig. 2b). Consequently, latent and sensible
heat fluxes were reduced throughout the year, except for a weak increase in sensible heat flux
of about 3 W m⁻² in November. The maximum decrease in surface heat fluxes,
about 30 W m⁻², occurred in April. The decrease in total heat fluxes
warms the surface and reduces convective cloud, indicating a decrease in
Replacing cropland by mixed forest changes the physical properties in the opposite
direction to the replacement of woodland by cropland or of broad-leaf forest by grassland.
Figure 2c indicates that sensible heat fluxes increased owing to the decrease in albedo
and the warmer surface. Latent heat fluxes were reduced during the year; as a result,
evapotranspiration decreased, warming the surface. Latent heat fluxes
decreased by about 3 W m⁻² in most months, except in June, November
and December, when they did not change. Sensible heat fluxes increased by about 5 W m⁻²
from January to September and fluctuated around zero in November and December.
The replacements of woodland by mixed forest (Fig. 2d) and of woodland by grassland
(Fig. 2e) are the main LULC replacements over the VGTB basin, accounting for
31 % and 18 % of its area, respectively. The alterations of heat fluxes due to these
conversions are similar. Latent heat fluxes were reduced throughout the year except
in June, when they increased by about 5 W m⁻². The maximum decrease in latent heat flux
was about 10 W m⁻² in September. The decreased latent heat fluxes might result in surface
warming. A weak decrease in sensible heat fluxes was found from January to May;
from June to December, the changes in sensible heat fluxes ranged from −5 to 5 W m⁻².
The changes in the surface heat fluxes show a clear seasonality, characterized
by a dry season from February to August and a rainy season from September to
November. The alteration of surface heat fluxes may affect the seasonal variability of
climate variables.
Figure 3 shows the alteration of roughness length and albedo, as well as the changes
in annual soil moisture and surface heat fluxes, caused by the updated LULC data in
the VGTB basin. In the Eastern lowland basin, most of the woodland was replaced by
cropland, thereby decreasing roughness length and increasing albedo. The decrease
in roughness length has a warming effect, while the increase in albedo tends to cool
the surface by decreasing the net absorbed radiation. The cooler surface reduced the
transfer of sensible heat into the atmosphere, whereas latent heat fluxes did not change
for this LULC replacement. As a result, total turbulent heat fluxes decreased, so that
less heat was transported away from the surface, causing a warming effect. Sensible
heat fluxes decreased by about 14 W m⁻² and surface soil moisture decreased by
about 0.1 m³ m⁻³.
In the Western basin, a large area of broad-leaf forest was replaced by
grassland, which is similar to deforestation. Marked decreases in roughness length and
evapotranspiration warmed the surface, while the increase in albedo had
594 N.B.P. Nguyen et al.
[Fig. 3 panels (a)–(e): maps of annual-mean changes; colour scales in m (roughness length), m³ m⁻³ (soil moisture) and W m⁻² (latent and sensible heat flux)]
Fig. 3 Alteration of annual roughness length, albedo, soil moisture and heat fluxes over the
VGTB basin due to updated LULC map (WRF LULC-LUCCi minus WRF LULC-default). (a)
Roughness length. (b) Albedo. (c) Soil moisture at surface. (d) Latent heat flux. (e)
Sensible heat flux
the effect of cooling. Roughness length was reduced markedly, by 0.5 m, and
albedo increased by about 0.09. The replacement decreased both latent
and sensible heat fluxes: sensible heat fluxes decreased by about 15 W m⁻², and the
decrease in latent heat fluxes exceeded that in sensible heat fluxes by about 5 W m⁻².
The decreases in turbulent heat fluxes tended to warm the land surface and decrease
precipitation (Fig. 5). Soil moisture decreased by about 0.1 m³ m⁻³ over the area.
This result is consistent with the surface warming and reduced precipitation caused
by deforestation [e.g. 5, 9, 16].
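The competing albedo and turbulent-flux effects can be read off the standard surface energy balance, given here in its common textbook form (not taken from this chapter), with net radiation R_n, albedo α, incoming shortwave radiation S↓, downward and upward longwave radiation L↓ and L↑, surface emissivity ε, Stefan-Boltzmann constant σ, surface temperature T_s, sensible heat flux H, latent heat flux LE and ground heat flux G:

```latex
\begin{aligned}
R_n &= (1-\alpha)\,S_{\downarrow} + L_{\downarrow} - L_{\uparrow} = H + LE + G,\\
\Delta\!\left[(1-\alpha)\,S_{\downarrow}\right] &\approx -\,\Delta\alpha\,S_{\downarrow} \quad (\text{for fixed } S_{\downarrow}),\\
L_{\uparrow} &= \varepsilon\,\sigma\,T_s^{4}.
\end{aligned}
```

For an albedo increase of Δα ≈ 0.09, as in the Western basin, and an assumed (hypothetical) mean incoming shortwave flux of 200 W m⁻², absorbed shortwave radiation would fall by about 0.09 × 200 ≈ 18 W m⁻². If H + LE drops by more than the available energy, one way to restore the balance is a higher T_s, which raises L↑; this is one reading of the net warming described above.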
In the Northern and Southern basin, the replacement of cropland by mixed forest
increased roughness length and decreased albedo. Changes in
albedo and roughness length tended to increase sensible heat fluxes, cooling the
land surface. However, the replacement of cropland also decreased latent heat
fluxes, i.e. reduced evapotranspiration, which warms the surface. The modifications
slightly increased surface air temperatures as a result of the competition between
the warming due to decreased albedo and evapotranspiration and the cooling due to
increased sensible heat fluxes.
Although the main LULC replacements in the Central basin and scattered over
the highlands of the VGTB basin are woodland replaced by mixed forest (31 % of the area of
the VGTB basin) and woodland replaced by grassland (18 % of the area of the VGTB
basin), these replacements had no significant effect on albedo and
surface heat fluxes. Consequently, there were no significant changes in surface air
temperature. Albedo, soil moisture and sensible heat fluxes stayed at the same
levels; latent heat fluxes decreased weakly, by 2–5 W m⁻².
Some areas along the coastline were converted to urban land. Although the
urban extension covered only about 1 % of the basin area, it caused marked
increases in sensible heat fluxes and considerable decreases in latent heat
fluxes. Consequently, surface air temperature was strongly affected.
The LULC replacements over the VGTB basin alter the physical properties of
the land surface and thereby the climate variables. The role of LULC as a
first-order climate forcing on regional climate has been well documented
[e.g. 1, 3, 9].
Figure 4 shows the alteration of climate variables due to the updated LULC data.
Soil moisture at the first level (10 cm), which dominates the variability of the latent
heat fluxes, decreased by about 0.03 (4 %)–0.03 m³ m⁻³
(17 %) for all LULC replacement types in the VGTB basin. The replacement of
broad-leaf forest by grassland caused the largest decrease in soil moisture because
of the reduction in convective cloud. In the dry months (April and May), the
decrease in soil moisture varied among LULC replacement types, while for most
LULC replacements it stayed at a similar level of about 15 %. This indicates that
the changes in soil moisture depend on the LULC type and on seasonal variability.
The changes in surface air temperature are related to the modification
of surface heat fluxes (Fig. 4b). Overall, all replacements tended to increase surface
air temperature by about 0.3 °C throughout the year. The increase in surface air
temperature results from the decrease in net surface heat fluxes. In May, these
replacements produce smaller increases in surface air temperature than in the other
months because of the increased sensible heat fluxes. In October, surface air
temperature increased by 0.3–0.5 °C. The changes in surface air temperature depended
on the type of LULC replacement. The maximum increase in surface temperature,
0.5 °C, occurred in October when woodland was replaced by grassland.
Precipitation decreased by about 50 mm in September and increased by about 50 mm
in October for all LULC replacements, and fluctuated within ±20 mm in the other months.
From January to August, precipitation changed by about 50 mm for all LULC
replacements; relative to the Control, however, this corresponds to about 20 % of the
precipitation amount in these months.
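Relative changes expressed as a percentage of the Control follow the usual definition 100 × (perturbed − control) / control. The sketch below, using hypothetical monthly totals, shows why the same absolute change of 50 mm is a large relative change in a dry month but a smaller one in a wet month.

```python
def relative_change(perturbed, control):
    """Change of a perturbed value relative to the Control, in percent."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return 100.0 * (perturbed - control) / control

# Hypothetical monthly precipitation totals (mm): same +50 mm change, different Control
dry_month = relative_change(300.0, 250.0)
wet_month = relative_change(650.0, 600.0)
print(f"dry month: {dry_month:.1f} %, wet month: {wet_month:.1f} %")
```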
[Fig. 4 panels (a)–(e): monthly changes, Jan–Dec, for the conversion types woodland to cropland, broadleaf forest to grassland, cropland to mixed forest, woodland to mixed forest and woodland to grassland]
Fig. 4 Alteration of climate variables based on LULCC types over the VGTB basin due to
the updated LULC map (WRF LULC-LUCCi minus WRF LULC-default). (a) Soil moisture
(m³ m⁻³). (b) Surface temperature (°C). (c) Precipitation (mm). (d) Wind speed (m s⁻¹).
(e) Maximum CAPE (J kg⁻¹)
Figure 4d shows that the changes in wind speed resulted from the alteration
of roughness length. The replacement of broad-leaf forest by grassland
strengthened wind speed throughout the year by 0–1.2 m s⁻¹, while the
replacement of cropland by mixed forest decreased wind speed by about
1 m s⁻¹ from August to November.
The alteration of surface heat fluxes and of the surface energy balance tends to
modify the structure and stability of the overlying troposphere, which can
be expressed through the convective available potential energy (CAPE). To provide
insight into the genesis and intensity of convection, the averaged maximum CAPE was
calculated. Its magnitude is much smaller than would be expected for an individual
storm, because the average is taken regardless of weather conditions [17]. The changes
in the monthly mean of daily maximum CAPE are shown in Fig. 4e. All replacements
increased maximum CAPE in March, by about 20 J kg⁻¹ (20 % of the Control).
In the other months, the changes in maximum CAPE ranged from −80 to 40 J kg⁻¹,
which corresponds to only about 10 % of the Control.
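One plausible way to form the monthly mean of daily maximum CAPE is to take the maximum of the sub-daily CAPE values of each day and average those maxima over the month. The chapter does not detail the procedure, so the data layout and the values below are hypothetical.

```python
from statistics import mean

def monthly_mean_daily_max(cape_by_day):
    """Monthly mean of daily maxima; each inner list holds one day's sub-daily CAPE (J kg^-1)."""
    return mean([max(day) for day in cape_by_day])

# Hypothetical sub-daily CAPE values for two days from each run
default_run = [[100.0, 350.0, 200.0], [50.0, 400.0, 300.0]]
lucci_run = [[120.0, 380.0, 210.0], [60.0, 420.0, 310.0]]

control = monthly_mean_daily_max(default_run)
perturbed = monthly_mean_daily_max(lucci_run)
delta = perturbed - control
print(control, perturbed, delta, 100.0 * delta / control)
```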
The consequences of the updated LULC data for climate variables are shown in
Fig. 5. Overall, the updated LULC caused increases in surface air temperatures of
about 0.8 °C over the VGTB basin due to the reduction in turbulent heat fluxes. The
impacts of land forcing on the regional climate are dominated by the modification
of turbulent heat fluxes. This result is consistent with previous studies, in which
changes in albedo dominate the climate response in temperate regions while the
Fig. 5 Alteration of annual surface air temperature, precipitation, wind speed and maximum
CAPE over the VGTB basin due to updated LULC map (WRF LULC-LUCCi minus WRF LULC-
default). (a) Surface air temperature. (b) Precipitation
changes in evapotranspiration, heat fluxes and roughness length drive it in the tropics
[5, 9, 16].
In the Eastern basin, the decrease in surface heat fluxes due to the decrease in
roughness length tended to warm the surface, while the increase in albedo tended to
cool it. The net effect was a slight warming over the area: surface air
temperatures increased by about 0.3 °C. The change in precipitation was not
directly driven by the land forcing; precipitation increased by 50–200 mm.
The wind speed changes are consistent with the decrease in roughness length:
wind speed increased by about 0.5 m s⁻¹ over the area. Convective available
potential energy (CAPE) is the amount of energy a parcel of air would have if lifted
a certain distance vertically through the atmosphere, and the daily maximum CAPE
indicates the magnitude of atmospheric instability. The maximum CAPE increased
by about 18 J kg⁻¹.
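For reference, CAPE is commonly defined through parcel theory as the buoyancy integral between the level of free convection (LFC) and the equilibrium level (EL). The virtual-temperature form below is the textbook definition and is an assumption about how the model diagnostic was computed; g is the gravitational acceleration, T_v,p the parcel and T_v,e the environmental virtual temperature:

```latex
\mathrm{CAPE} = g \int_{z_{\mathrm{LFC}}}^{z_{\mathrm{EL}}}
  \frac{T_{v,p}(z) - T_{v,e}(z)}{T_{v,e}(z)}\,\mathrm{d}z
```

The integrand is dimensionless, so CAPE has units of m s⁻² × m = m² s⁻² = J kg⁻¹, matching the units used in Figs. 4e and 5.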
In the Western basin, the replacement of broad-leaf forest by grassland and cropland
decreased convective cloud owing to the reduction in surface heat
fluxes. Precipitation decreased by about 600 mm over this area. The sensitivity of
convective cloud and precipitation to tropical deforestation has been reported
in a large number of studies [e.g. 4, 18, 20]. These studies showed that
deforestation decreases precipitation by 1 to 20 % of the Control, depending on
topographic effects and the natural spatial variability of precipitation. Sen et al. [18]
showed that the East Asian summer monsoon is sensitive to deforestation in the
Indochina region. The replacement results in considerable increases in albedo and
marked decreases in turbulent heat fluxes. These two drivers act in opposite
directions, leading to a slight warming over the area. Wind speed and maximum CAPE
were affected by the reduction in roughness length: wind speed increased by
0.5–0.8 m s⁻¹ and maximum CAPE increased by about 10 J kg⁻¹.
In the Northern and Southern basin, although the replacement of cropland by
mixed forest increased sensible heat fluxes, surface air temperature increased
slightly owing to the decreases in latent heat fluxes and albedo. Some studies indicated
that replacing cropland by forest decreases near-surface temperature
and increases latent heat fluxes [e.g. 13, 15]. However, the water available for
evapotranspiration is limited during the dry season, which results in decreased latent
heat fluxes and thereby warms the land surface. In Da Nang city, surface air
temperature increased by about 1.5 °C because of urban expansion.
Precipitation increased by about 1,000 mm over these areas, which may be associated
with the increased surface air temperature and the increased sensible heat flux into
the atmosphere. Owing to the increase in roughness length, wind speed and maximum
CAPE decreased by about 1.5 m s⁻¹ and 20 J kg⁻¹, respectively. The decreases
in wind speed and maximum CAPE contributed to the increased sensible heat fluxes
and surface temperature over these areas.
Table 1 Number of simulated months using the regional climate model WRF
Experiment          Perturbed runs   Months per perturbed run   Total months
WRF LULC-default    5                72                         360
WRF LULC-LUCCi      5                72                         360
Total                                                           720 simulated months
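The totals in Table 1 follow from the number of perturbed runs times the months per run; a quick consistency check (the dictionary layout is illustrative):

```python
# Table 1: (perturbed runs, months per perturbed run) for each WRF experiment
EXPERIMENTS = {
    "WRF LULC-default": (5, 72),
    "WRF LULC-LUCCi": (5, 72),
}

def simulated_months(runs, months_per_run):
    """Total simulated months for one experiment."""
    return runs * months_per_run

totals = {name: simulated_months(*cfg) for name, cfg in EXPERIMENTS.items()}
grand_total = sum(totals.values())
print(totals, grand_total)
```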
Acknowledgements This research is funded by the Federal Ministry of Education and Research
(research project: Land Use and Climate Change Interactions in Central Vietnam (LUCCi),
reference number 01LL0908C). The provision of CPU and storage capacities at the Karlsruhe Institute
of Technology (KIT), Steinbuch Centre for Computing (SCC), and the Karlsruhe Institute of Technology
(KIT), Institute of Meteorology and Climate Research (IMK-IFU), is gratefully acknowledged.
1. Bonan, G.B.: Forests and climate change: forcings, feedbacks, and the climate benefits of
forests. Science 320, 1444–1449 (2008)
2. Caldas, M.M., Goodin, D., Sherwood, S., Campos Krauer, J.M., Wisely, S.M.: Land-cover
change in the Paraguayan Chaco: 2000–2011. J. Land Use Sci. 10, 1–18 (2015)
3. Charney, J., Quirk, W.J., Chow, S.-H., Kornfield, J.: A comparative study of the effects of
albedo change on drought in semi-arid regions. J. Atmos. Sci. 34, 1366–1385 (1977)
4. Costa, M.H., Pires, G.F.: Effects of Amazon and Central Brazil deforestation scenarios on the
duration of the dry season in the arc of deforestation. Int. J. Climatol. 30, 1970–1979 (2010)
5. Davin, E.L., de Noblet-Ducoudré, N.: Climatic impact of global-scale deforestation: radiative
versus nonradiative processes. J. Clim. 23, 97–112 (2010)
6. Deng, X., Zhao, C., Yan, H.: Systematic modeling of impacts of land use and land cover
changes on regional climate: a review. Adv. Meteorol. 2013, 1–11 (2013)
7. Ezber, Y., Lutfi Sen, O., Kindap, T., Karaca, M.: Climatic effects of urbanization in Istanbul: a
statistical and modeling analysis. Int. J. Clim. 27, 667–679 (2007)
8. Giorgi, F., Bi, X.: A study of internal variability of a regional climate model. J. Geophys. Res.
Atmos. (1984–2012) 105, 29503–29521 (2000)
9. Kvalevåg, M.M., Myhre, G., Bonan, G., Levis, S.: Anthropogenic land cover changes in a
GCM with surface albedo changes based on MODIS data. Int. J. Climatol. 30, 2105–2117
10. Laux, P., Lorenz, C., Thuc, T., Ribbe, L., Kunstmann, H., et al.: Setting up regional climate
simulations for Southeast Asia. In: High Performance Computing in Science and Engineering,
vol. 12, pp. 391–406. Springer, Berlin/New York (2013)
11. Lee, S.-J., Berbery, E.H.: Land cover change effects on the climate of the La Plata Basin. J.
Hydrometeorol. 13, 84–102 (2012)
12. Mahmood, R., Pielke, R.A., Hubbard, K.G., Niyogi, D., Dirmeyer, P.A., McAlpine, C.,
Carleton, A.M., Hale, R., Gameda, S., Beltrán-Przekurat, A., et al.: Land cover changes and
their biogeophysical effects on climate. Int. J. Climatol. 34, 929–953 (2014)
13. Nagendra, H., Southworth, J.: Reforesting landscapes: linking pattern and process, vol. 10.
Springer, Dordrecht (2009)
14. Nguyen, N.B.P., Laux, P., Cullmann, J., Kunstmann, H.: Do we have to update the land-use/land-cover
data in RCM simulations? A case study for the Vu Gia-Thu Bon River Basin of Central Vietnam. In:
High Performance Computing in Science and Engineering ’15: Transactions of the High Performance
Computing Center, Stuttgart (HLRS) 2015, pp. 623–635. Springer, Berlin/New York (2016)
15. Pielke, R.A., Pitman, A., Niyogi, D., Mahmood, R., McAlpine, C., Hossain, F., Goldewijk,
K.K., Nair, U., Betts, R., Fall, S., et al.: Land use/land cover changes and climate: modeling
analysis and observational evidence. Wiley Interdiscip. Rev. Clim. Change 2, 828–850 (2011)
16. Pongratz, J., Reick, C., Raddatz, T., Claussen, M.: Biogeophysical versus biogeochemical
climate response to historical anthropogenic land cover change. Geophys. Res. Lett. 37(8)
(2010). doi:10.1029/2010GL043010
17. Riemann-Campe, K., Fraedrich, K., Lunkeit, F.: Global climatology of convective available
potential energy (CAPE) and convective inhibition (CIN) in ERA-40 reanalysis. Atmos. Res.
93, 534–545 (2009)
18. Sen, O.L., Wang, Y., Wang, B.: Impact of Indochina deforestation on the East Asian summer
monsoon. J. Clim. 17, 1366–1380 (2004)
19. Sertel, E., Robock, A., Ormeci, C.: Impacts of land cover data quality on regional climate
simulations. Int. J. Climatol. 30, 1942–1953 (2010)
20. Wang, J., Chagnon, F.J., Williams, E.R., Betts, A.K., Renno, N.O., Machado, L.A., Bisht, G.,
Knox, R., Bras, R.L.: Impact of deforestation in the Amazon basin on cloud climatology. Proc.
Natl. Acad. Sci. 106, 3670–3674 (2009)
Reducing the Uncertainties of Climate
Projections: High-Resolution Climate Modeling
of Aerosol and Climate Interactions
on the Regional Scale Using COSMO-ART:
Interaction of Mineral Dust with Atmospheric
Radiation over West-Africa
Abstract The aim of this project is to investigate the impact of aerosol in high-resolution
climate runs. At present, aerosols and their interactions with radiation and
clouds represent one of the major uncertainties in our understanding of the climate
system, as they can be described only roughly in coarse-resolution global models.
The online coupled comprehensive chemistry model system COSMO-ART has already
shown in several case studies its potential to close this gap. Applying it
on decadal climate time scales makes high performance computing a
necessity. In this study we quantified the effect of replacing the aerosol climatology
usually used in regional climate simulations with CLM by online calculated dust
concentrations. The model domain covered the DEPARTURE region. Only the radiative
feedback was accounted for; aerosol-cloud interactions were neglected. Interactive dust
improved the agreement between simulated and observed precipitation.
1 Motivation
Aerosols and their interactions with radiation and clouds represent one of the major
uncertainties in our understanding of the climate system. While much modeling activity
on the global scale aims to narrow this uncertainty, comparatively little
effort is visible on the regional climate scale. On the other hand, owing to the coarse
spatial resolution of global models, aerosol and microphysical processes and
their interactions are described only roughly. The online coupled comprehensive
chemistry model system COSMO-ART has already shown in several case studies its
potential to close this gap. We used this model system to perform a sensitivity
study quantifying the effect of different mineral dust climatologies from the
literature, and of online calculated dust and its feedback with radiation, which in turn
alters precipitation over selected domains within Africa. This elucidates the
relative importance of online coupled aerosol-radiation interactions for the regional
scale climate.
In order to quantify the feedback processes between aerosols and the state of the
atmosphere on the continental to regional scale, the fully online integrated model
system COSMO-ART, with two-way interactions between different atmospheric
processes, has been developed [1–3]. The operational weather forecast model
COSMO of the Deutscher Wetterdienst [4] was extended to treat secondary aerosols
as well as directly emitted components such as soot, mineral dust, sea salt and biological
material, and their feedback with radiation and clouds. The gas-phase chemistry
module (RADMKA) is based on RADM2 and includes several improvements: we
updated the rate constants according to IUPAC, updated the mechanism for
biogenic VOCs, extended the treatment of N₂O₅ hydrolysis and included new
sources of HONO. The Kinetic PreProcessor (KPP) can be used to modify
the chemical mechanism flexibly. COSMO-ART uses the modal approach to describe
the aerosol size distribution. New particles can be formed by nucleation of sulfuric acid.
Condensation, coagulation, sedimentation and washout are taken into
account. A volatility basis set approach is used to describe secondary organic
aerosol [5]. The thermodynamic module ISORROPIA II [6] is applied. Emissions of
mineral dust are calculated at each grid point and each time step for three individual
modes, depending on the simulated friction velocity and surface parameters [7].
Figure 1 shows the feedback processes that are realized in COSMO-ART.
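The dependence of dust emission on friction velocity can be illustrated with a generic threshold saltation law of the White (1979) type, Q = c (ρ/g) u*³ (1 + u*t/u*)(1 − (u*t/u*)²) for u* > u*t and zero otherwise. This is a swapped-in textbook form, not the COSMO-ART scheme of [7]. The constant c = 2.61, the air density and the threshold value are assumptions; in a full scheme the threshold u*t would depend on surface parameters such as soil moisture and roughness, and deriving the vertical dust flux distributed over the three modes would require a further emission efficiency.

```python
G = 9.81  # gravitational acceleration (m s^-2)

def saltation_flux(u_star, u_star_t, rho_air=1.2, c=2.61):
    """Illustrative horizontal saltation flux (kg m^-1 s^-1) of White (1979) type.

    u_star   : friction velocity (m s^-1)
    u_star_t : threshold friction velocity (m s^-1), assumed known
    Returns 0 when the friction velocity does not exceed the threshold.
    """
    if u_star <= u_star_t:
        return 0.0
    r = u_star_t / u_star
    return c * rho_air / G * u_star ** 3 * (1.0 + r) * (1.0 - r ** 2)

# Hypothetical threshold of 0.3 m/s; flux switches on above it and grows roughly with u*^3
for u in (0.2, 0.4, 0.6, 0.8):
    print(f"u* = {u:.1f} m/s -> Q = {saltation_flux(u, 0.3):.4f} kg/m/s")
```

The strong nonlinearity is why emissions must be evaluated at every grid point and time step from the simulated friction velocity rather than from averaged winds.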
3 Sensitivity Study
We carried out three individual model runs for the model domain shown in Fig. 2.
In two of them (Tanre and AeroCom), different dust climatologies were used to
calculate the radiative fluxes. In the third simulation (ART), the online calculated dust
concentrations were used to calculate the optical properties of the mineral dust.
Aerosol-cloud interactions were not accounted for in any of the three model runs. For each
scenario a period of 10 years was simulated.
Climate Modeling with COSMO-ART 603
Fig. 2 Model domain. The red boxes indicate subdomains that were used for the evaluation of the
model results. CS = Central Sahel, WS = Western Sahel, GC = Guinea coast
604 B. Vogel et al.
Although we enabled only the interaction of mineral dust with radiation and
disabled its interaction with cloud formation, we still have to expect changes in
precipitation, which is a key meteorological quantity for West Africa. They are caused
by the following process chain: dust aerosol modifies the radiative fluxes, which
induces temperature changes; these temperature changes modify the flow field
on the local scale as well as on synoptic scales, which in turn alters cloud
formation and consequently precipitation. Here, we concentrate on the
modifications of precipitation caused by the different scenarios.
Figure 3 shows the results for the subdomain Central Sahel. A comparison
with observed precipitation (red curve) shows that all model results are in close
agreement with the observations. In terms of the root mean square error,
the fully interactive model run (ART) gives better results than the
scenarios with prescribed climatology. For the subdomain Guinea coast (Fig. 4),
all scenarios overestimate precipitation. However, again the interactive
[Fig. 3 panels, Central Sahel: monthly precipitation, 2001–2010, for the Willmott-Matsuura observations and the runs CCLM_2000_DS2R4E54_Tanre, CCLM_2000_DS2R4E55_AeroCom and CCLM_2000_DS2R4E56_ART (top); annual RMSE (mm) per hindcast year 1–10 (bottom)]
Fig. 3 Results for subdomain CS. Top: Simulated sum of precipitation per month for the different
scenarios. The red curve shows the observations. Bottom: Annual root mean square error
[Fig. 4 panels, Guinea Coast: monthly precipitation, 2001–2010, for the Willmott-Matsuura observations and the runs CCLM_2000_DS2R4E54_Tanre, CCLM_2000_DS2R4E55_AeroCom and CCLM_2000_DS2R4E56_ART (top); annual RMSE (mm) per hindcast year 1–10 (bottom)]
Fig. 4 Results for subdomain GC. Top: Simulated sum of precipitation per month for the different
scenarios. The red curve shows the observations. Bottom: Annual root mean square error
scenario improves the agreement with observations for all years (bottom of Fig. 4).
The results for the subdomain West Sahel (Fig. 5) again show close agreement between
all scenarios and the observations. In terms of the root mean square error, the quality
of the model results is comparable for all scenarios.
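The annual RMSE values in the bottom panels can be computed as sketched below, under the assumption (not stated in the chapter) that each year's value is taken over the 12 monthly precipitation sums compared with the Willmott-Matsuura observations. Function names and data are hypothetical.

```python
import math

def rmse(simulated, observed):
    """Root mean square error between two equally long, non-empty series."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))

def annual_rmse(sim_monthly, obs_monthly):
    """RMSE per year from consecutive blocks of 12 monthly values."""
    if len(sim_monthly) != len(obs_monthly) or len(sim_monthly) % 12:
        raise ValueError("need whole years of paired monthly values")
    return [rmse(sim_monthly[i:i + 12], obs_monthly[i:i + 12])
            for i in range(0, len(sim_monthly), 12)]

# Hypothetical two-year example (mm per month)
sim = [10.0] * 12 + [20.0] * 12
obs = [10.0] * 24
print(annual_rmse(sim, obs))
```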
4 Resources
The simulations were carried out on a 275 × 207 × 35 grid with a time step of 240 s and
a third-order Runge-Kutta time integration scheme. In each case 51 nodes were used.
The simulation of one scenario with a prescribed mineral dust climatology required
1500 node hours on the CRAY XC40 ‘Hazel Hen’, whereas a simulation with interactive
mineral dust required 5610 node hours, an increase by a factor of almost four (about 3.7).
In addition to the increased computational costs, 1650 GB more data storage was needed
than for a pure COSMO run (with dust climatology).
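The resource figures can be cross-checked with simple arithmetic. The sketch assumes that node hours equal the number of nodes times the wall-clock time of an uninterrupted run, and it derives the number of model steps per simulated day from the 240 s time step; queueing, restarts and I/O are ignored, and the names are illustrative.

```python
NODES = 51            # nodes used per simulation
TIME_STEP_S = 240     # model time step (s)
NODE_HOURS = {
    "prescribed dust climatology": 1500,
    "interactive dust (ART)": 5610,
}

def wall_clock_hours(node_hours, nodes=NODES):
    """Wall-clock hours, assuming all nodes run for the whole job."""
    return node_hours / nodes

cost_ratio = NODE_HOURS["interactive dust (ART)"] / NODE_HOURS["prescribed dust climatology"]
steps_per_day = 24 * 3600 // TIME_STEP_S

for name, nh in NODE_HOURS.items():
    print(f"{name}: {wall_clock_hours(nh):.1f} h wall clock")
print(f"cost ratio: {cost_ratio:.2f}, steps per simulated day: {steps_per_day}")
```

The ratio of about 3.7 is what the text rounds to a factor of four.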
[Fig. 5 panels, West Sahel: monthly precipitation, 2001–2010, for the Willmott-Matsuura observations and the runs CCLM_2000_DS2R4E54_Tanre, CCLM_2000_DS2R4E55_AeroCom and CCLM_2000_DS2R4E56_ART (top); annual RMSE (mm) per hindcast year 1–10 (bottom)]
Fig. 5 Results for subdomain WS. Top: Simulated sum of precipitation per month for the different
scenarios. The red curve shows the observations. Bottom: Annual root mean square error
5 Summary
1. Vogel, B., Vogel, H., Bäumer, D., Bangert, M., Lundgren, K., Rinke, R., Stanelle, T.: The comprehensive
model system COSMO-ART: Radiative impact of aerosol on the state of the atmosphere
on the regional scale. Atmos. Chem. Phys. 9(22), 8661–8680 (2009). doi:10.5194/acp-9-8661-
2. Knote, C., et al.: Towards an online-coupled chemistry-climate model: evaluation of trace gases
and aerosols in COSMO-ART. Geosci. Model Dev. 4(4), 1077–1102 (2011). doi:10.5194/gmd-
3. Bangert, M., et al.: Saharan dust event impacts on cloud formation and radiation over Western
Europe. Atmos. Chem. Phys. 12(9), 4045–4063 (2012). doi:10.5194/acp-12-4045-2012
4. Baldauf, M., Seifert, A., Foerstner, J., Majewski, D., Raschendorfer, M., Reinhardt, T.:
Operational convective-scale numerical weather prediction with the COSMO model: description
and sensitivities. Mon. Weather Rev. 139(12), 3887–3905 (2011). doi:10.1175/MWR-D-10-
5. Athanasopoulou, E., et al.: Fire risk, atmospheric chemistry and radiative forcing assess-
ment of wildfires in eastern Mediterranean. Atmos. Environ. 95, 113–125 (2014).
6. Fountoukis, C., Nenes, A.: ISORROPIA II: a computationally efficient thermodynamic equilib-
rium model for ,Ca2+,Mg2+,NH4+,Na+,SO42,NO3,Cl,H2O aerosols. Atmos. Chem. Phys. 7,
4639–4659 (2007). doi:10.5194/acp-7-4639-2007
7. Stanelle, T., Vogel, B., Vogel, H., Bumer, D., Kottmeier, C.: Feedback between dust particles
and atmospheric processes over West Africa during dust episodes in March 2006 and June 2007.
Atmos. Chem. Phys. 10(22), 10771–10788 (2010). doi:10.5194/acp-10-10771-2010
Part VI
Miscellaneous Topics
Wolfgang Schröder
W. Schröder
Institute of Aerodynamics, RWTH Aachen University, Wüllnerstr. 5a, 52062 Aachen, Germany
e-mail: office@aia.rwth-aachen.de
temperature and pressure, corresponding to a set of more than 200 transport data
points. Each simulation point was calculated with 4000 molecules and production runs of
$10^7$ time steps. The simulation results were compared, wherever
possible, to experimental data and to a set of predictive equations.
The Institute of Applied Materials, Reliability of Components and Systems,
Karlsruhe Institute of Technology deals with large-scale phase-field simulations.
Combining different chemical elements makes it possible to obtain new and improved
materials, as required for novel applications. Directionally solidified multicomponent
eutectic alloys in particular exhibit a wide range of microstructural patterns,
which are correlated with the mechanical properties. The pattern formation during
solidification depends on the chemical elements and the applied process parameters.
Large-scale phase-field simulations are used to study the pattern formation of
directionally solidified ternary eutectics. Three different systems, ranging from a
model system to the system Al-Ag-Cu, are investigated at three growth
velocities. The three-dimensional simulation results are compared quantitatively, and
a broad variety of patterns is found for the studied systems. The results of the
velocity variation follow the predictions of the analytical Jackson-Hunt approach.
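For orientation, the classical Jackson-Hunt analysis predicts that, at the operating point of minimum undercooling, the eutectic spacing λ and the growth velocity v satisfy λ²v = const. The Python sketch below is illustrative only: the helper name and the reference spacing and velocity are hypothetical, not values from the study. It merely shows how this scaling extrapolates a spacing observed at one velocity to other velocities.

```python
import math


def jackson_hunt_spacing(lambda_ref: float, v_ref: float, v: float) -> float:
    """Eutectic spacing at growth velocity v from the scaling lambda**2 * v = const.

    lambda_ref: spacing observed at the reference velocity v_ref (hypothetical input).
    """
    if lambda_ref <= 0.0 or v_ref <= 0.0 or v <= 0.0:
        raise ValueError("spacings and velocities must be positive")
    # lambda**2 * v = lambda_ref**2 * v_ref  =>  lambda = lambda_ref * sqrt(v_ref / v)
    return lambda_ref * math.sqrt(v_ref / v)


if __name__ == "__main__":
    # Illustrative numbers in arbitrary units: quadrupling v halves the spacing.
    for v in (1.0, 2.0, 4.0):
        print(f"v = {v:3.1f}  ->  lambda = {jackson_hunt_spacing(10.0, 1.0, v):.3f}")
```

Comparing simulated spacings at the three growth velocities against this λ ∝ v^(-1/2) trend is one simple way to check agreement with the analytical prediction.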
The Geophysics Institute, Karlsruhe Institute of Technology presents applications
of full waveform inversion (FWI) to field data. FWI is a powerful imaging
technique that exploits the richness of seismic waveforms. It is being developed further
to obtain multi-parameter images at high resolution. The physical parameters
involved, such as the velocities and attenuation of seismic waves as well as the mass
density, are essential for a reliable petrophysical characterization of
subsurface structures in hydrocarbon exploration, geotechnical applications and
underground construction. To this end, FWI is successfully applied to field
datasets recorded in the Black Sea and in the shallow-water area of a river delta
in the Atlantic Ocean. The resulting detailed subsurface images contain rock
formations that might be potential gas deposits. Additionally, synthetic studies
are performed as preparatory steps to verify methodological improvements for
further field-data applications. The resolution capabilities of FWI are demonstrated for
imaging geological structures beneath salt bodies. Strategies to recover attenuation
information from seismic data are investigated, and a joint inversion of surface waves
is performed to image the very shallow subsurface.
The Goethe Center for Scientific Computing, Goethe University Frankfurt has
developed a massively parallel multigrid solver with level-dependent smoothers. An
issue that had not been fully addressed in previous studies is the difficulty of solving
elliptic partial differential equations (PDEs) on massively parallel computers in
the presence of anisotropic coefficients or anisotropic elements in the underlying
grid. While parallelism was considered in previous studies, massively parallel
systems such as today’s supercomputers with hundreds of thousands of computing cores did
not exist at that time, and thus no optimization regarding massive scalability was
performed. Recently, massively parallel multigrid has been described for the
solution of elliptic PDEs for the special case of strong vertical anisotropies on
structured grids. Considering the real-world problem of drug diffusion through
the human skin, the former approaches have been extended to construct a method
that employs geometric multigrid on massively parallel computers for problems
with highly anisotropic elements, using a combination of specialized refinement
techniques and smoothers. The result is a robust and highly scalable solver for
anisotropic problems. The special grid layout of the model problem requires
a solver that can handle anisotropies in all spatial directions on unstructured grids.
Molecular Simulation Study of Transport
Properties for 20 Binary Liquid Mixtures
and New Force Fields for Benzene, Toluene
and CCl4
1 Introduction
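$$u(\mathbf{r}_{kl},\boldsymbol{\omega}_k,\boldsymbol{\omega}_l) = \sum_{a=1}^{S_k^{LJ}}\sum_{b=1}^{S_l^{LJ}} 4\varepsilon_{klab}\left[\left(\frac{\sigma_{klab}}{r_{klab}}\right)^{12}-\left(\frac{\sigma_{klab}}{r_{klab}}\right)^{6}\right] + \sum_{c=1}^{S_k^{e}}\sum_{d=1}^{S_l^{e}}\frac{1}{4\pi\varepsilon_0}\left[\frac{q_{kc}q_{ld}}{r_{klcd}} + \frac{q_{kc}\mu_{ld}+\mu_{kc}q_{ld}}{r_{klcd}^{2}}f_1(\boldsymbol{\omega}_k,\boldsymbol{\omega}_l) + \frac{q_{kc}Q_{ld}+Q_{kc}q_{ld}}{r_{klcd}^{3}}f_2(\boldsymbol{\omega}_k,\boldsymbol{\omega}_l) + \frac{\mu_{kc}\mu_{ld}}{r_{klcd}^{3}}f_3(\boldsymbol{\omega}_k,\boldsymbol{\omega}_l) + \frac{\mu_{kc}Q_{ld}+Q_{kc}\mu_{ld}}{r_{klcd}^{4}}f_4(\boldsymbol{\omega}_k,\boldsymbol{\omega}_l) + \frac{Q_{kc}Q_{ld}}{r_{klcd}^{5}}f_5(\boldsymbol{\omega}_k,\boldsymbol{\omega}_l)\right] \qquad (1)$$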
Transport Properties of 20 Binary Mixtures 615
where $r_{klab}$, $\varepsilon_{klab}$ and $\sigma_{klab}$ are the distance, the LJ energy parameter and the LJ
size parameter, respectively, for the pairwise interaction between LJ site a on
molecule k and LJ site b on molecule l. The vacuum permittivity is $\varepsilon_0$,
whereas $q_{kc}$, $\mu_{kc}$ and $Q_{kc}$ denote the point charge magnitude, the dipole moment and the
quadrupole moment of the electrostatic interaction site c on molecule k. The
expressions $f_i(\boldsymbol{\omega}_k,\boldsymbol{\omega}_l)$ stand for the dependence of the electrostatic interactions on
the orientations $\boldsymbol{\omega}_k$ and $\boldsymbol{\omega}_l$ of the molecules k and l [21]. The summation limits $S_k^{LJ}$
and $S_k^{e}$ indicate the numbers of LJ and electrostatic sites, respectively. Note
that a point quadrupole can be approximated by three collinear point charges
$q$, $-2q$ and $q$ separated by a distance $l$ each, where $Q = 2ql^2$.
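The three-charge representation can be checked with a few lines of Python. For point charges $q_i$ at positions $z_i$ on one axis, the axial quadrupole moment is $Q = \sum_i q_i z_i^2$, which is the convention that yields $Q = 2ql^2$ above (this convention is an assumption of the sketch). The helper names and numerical values are hypothetical.

```python
def multipoles_on_axis(charges, positions):
    """Return (net charge, dipole, quadrupole) of point charges on the z axis.

    Uses Q = sum(q_i * z_i**2), which for collinear charges equals the
    Buckingham component Q_zz = 1/2 * sum(q_i * (3 z_i**2 - r_i**2)).
    """
    total = sum(charges)
    dipole = sum(q * z for q, z in zip(charges, positions))
    quad = sum(q * z * z for q, z in zip(charges, positions))
    return total, dipole, quad


def three_charge_quadrupole(q: float, l: float):
    """Charges q, -2q, q at -l, 0, +l (l is the charge separation, not a molecule index)."""
    return multipoles_on_axis([q, -2.0 * q, q], [-l, 0.0, l])


if __name__ == "__main__":
    total, dipole, quad = three_charge_quadrupole(q=0.5, l=0.3)
    # Net charge and dipole vanish; quad matches 2 * q * l**2.
    print(total, dipole, quad, 2 * 0.5 * 0.3**2)
```

Because the arrangement carries neither net charge nor a dipole, the leading electrostatic term is the quadrupole, which is why it can stand in for a point quadrupole site.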
The force fields for benzene and toluene were obtained with the parameterization
procedure proposed by Muñoz-Muñoz et al. [35]. Benzene was modeled by six LJ
sites with superimposed point quadrupole sites. Internal bond angles of 120° and
dihedral angles of 0° were kept constant so that all sites were located in a plane.
The quadrupole moment of benzene was equally distributed among all LJ sites to
avoid artifacts when mixtures with small molecules are considered. Initially, the
quadrupole sites were located at the carbon positions and their value was set to
$Q = Q_T/6$, where $Q_T$ is the quadrupole moment magnitude of the benzene model
by Bonnaud et al. [4]. The site-site distances between the LJ and quadrupole sites
were modified in small steps as described in Ref. [35] until a suitable combination
of parameters was obtained. Finally, the reduced units method by Merker et al. [34]
was applied to obtain the final force field parameters.
Toluene was modeled on the basis of the benzene model with an additional LJ
site, representing the methyl group, located at a distance δ from the ring. The LJ
parameters of the methyl site were taken from Schnabel et al. [46]. All site-site
distances were optimized until accurate results for the thermodynamic properties
were obtained, following the procedure described above.
The critical point of the new benzene force field is located at $T_c$ = 561 K, $\rho_c$ =
4.01 mol l⁻¹ and $p_c$ = 5.0 MPa. The relative deviations from experimental
data [49] for the critical temperature, critical density and critical pressure of benzene are
−0.1 %, +2.9 % and +2.5 %, respectively. The predicted VLE properties exhibit
average deviations of 7.3 % for the vapor pressure, 1.0 % for the saturated liquid density
and 2.4 % for the enthalpy of vaporization in the considered temperature range. In
the temperature range between 280 and 333 K at ambient pressure, the self-diffusion
coefficient and shear viscosity obtained with the new benzene force field deviate
on average by 5.7 % and 6.2 %, respectively, from correlations of experimental data by
Fischer and Weiss [15]. The thermal conductivity deviates on average by 11 %
from a correlation of experimental data [39, 45]. Figure 1 shows the calculated VLE
and transport properties in comparison with the corresponding reference equations
of state or experimental data. The critical temperature of the new toluene force field
is 594 K, the critical density is 3.24 mol l⁻¹ and the critical pressure is 4.4 MPa. The
corresponding relative deviations from experiment [26] are +0.5 %, +2.1 % and +6.4 %,
respectively. The VLE properties exhibit average deviations of 18.3 % for the vapor
pressure, 1.1 % for the saturated liquid density and 3.7 % for the enthalpy of vaporization
in the studied temperature range. The ability of the new toluene force field to predict
616 G. Guevara-Carrion et al.
3 Methodology
Transport properties were sampled with equilibrium molecular dynamics (EMD) and the
Green-Kubo formalism. The Green-Kubo expression relates the self-diffusion coefficient
$D_i$ to the velocity autocorrelation function of the individual molecules

$$D_i = \frac{1}{3N_i}\int_0^\infty dt\,\left\langle \sum_{k=1}^{N_i} \mathbf{v}_k^i(t)\cdot\mathbf{v}_k^i(0)\right\rangle . \qquad (2)$$

Here, $\mathbf{v}_k^i(t)$ is the center of mass velocity vector of molecule k of component i at
time t and $N_i$ is the number of molecules of component i. The brackets $\langle\ldots\rangle$
denote the canonical (NVT) ensemble average. Equation (2) is an average over all
$N_i$ molecules in the ensemble because all of them contribute to the self-diffusion coefficient.
The self-diffusion coefficient that describes the mobility of species i in a mixture is
also termed intradiffusion coefficient.
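Equation (2) can be evaluated from a stored velocity trajectory as sketched below in Python with NumPy. This is a minimal illustration, not the ms2 implementation: the array layout, the function names and the trapezoidal quadrature are assumptions, the infinite upper limit is replaced by a finite correlation length, and the autocorrelation function is averaged over several time origins.

```python
import numpy as np


def vacf(vel: np.ndarray, n_corr: int, origin_stride: int) -> np.ndarray:
    """Molecule-averaged velocity autocorrelation <v_k(t0 + tau) . v_k(t0)>.

    vel: center-of-mass velocities, shape (n_steps, n_molecules, 3).
    n_corr: number of time lags tau that are sampled (length of the ACF).
    origin_stride: spacing between successive time origins t0, in steps.
    """
    n_steps, n_mol, _ = vel.shape
    origins = range(0, n_steps - n_corr + 1, origin_stride)
    if len(origins) == 0:
        raise ValueError("trajectory shorter than the correlation length")
    acf = np.zeros(n_corr)
    for t0 in origins:
        # Sum over molecules k and Cartesian components of v_k(t0 + tau) . v_k(t0)
        acf += np.einsum("tkd,kd->t", vel[t0:t0 + n_corr], vel[t0])
    # Average over time origins and over the N_i molecules (the 1/N_i sum in Eq. 2)
    return acf / (len(origins) * n_mol)


def self_diffusion(vel: np.ndarray, dt: float, n_corr: int, origin_stride: int) -> float:
    """Green-Kubo estimate of Eq. (2): one third of the integral of the averaged VACF."""
    c = vacf(vel, n_corr, origin_stride)
    integral = dt * (c.sum() - 0.5 * (c[0] + c[-1]))  # trapezoidal rule
    return integral / 3.0
```

In practice `vel` would hold the velocities of the molecules of one component only, and the correlation length and origin spacing would be chosen so that the VACF has decayed within the integration window.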
Fig. 3 Temperature dependence of the vapor-liquid equilibrium and transport properties of CCl4.
Simulation results (blue solid circle) for (a) saturated liquid density, (b) vapor pressure and (c)
enthalpy of vaporization are compared with experimental data (plus) and an equation of state (solid
curve). (d) Simulation results for the self-diffusion coefficient (blue solid circle) are compared
with experimental data by Fischer and Weiss [15] (black plus), McCool et al. [32] (green
plus), Collings and Mills [6] (red plus), Harris et al. [19] (blue plus) and Rathbun and Babb [41]
(dark blue plus). (e) Simulation results for the shear viscosity (blue solid circle) are compared with
experimental data by Luchinskii [29] (black plus) and Ikeuchi et al. [22] (red plus). (f) Simulation
results for the thermal conductivity (blue solid circle) are shown together with experimental data
by Rowley et al. [43] (black plus), Fischer [13] (blue plus) and Lei et al. [25] (red plus)
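The MS diffusion coefficient $Đ_{ij}$ can be determined from the Onsager coefficients
$L_{ij}$ with the Green-Kubo expression [23]

$$L_{ij} = \frac{1}{3N}\int_0^\infty dt\,\left\langle \sum_{k=1}^{N_i}\mathbf{v}_k^i(0)\cdot\sum_{l=1}^{N_j}\mathbf{v}_l^j(t)\right\rangle . \qquad (3)$$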
In this way, the MS diffusion coefficient can be sampled directly; unlike the Fick diffusion
coefficient $D_{ij}$, however, it cannot be measured experimentally in the laboratory.
Both diffusion coefficients are related by $D_{ij} = Đ_{ij}\,\Gamma$, where $\Gamma$ is the
thermodynamic factor given by

$$\Gamma = 1 + x_1\left(\frac{\partial \ln\gamma_1}{\partial x_1}\right)_{T,p} = 1 + x_2\left(\frac{\partial \ln\gamma_2}{\partial x_2}\right)_{T,p} , \qquad (5)$$

where $\gamma_i$ and $x_i$ stand for the activity coefficient and mole fraction of component
i, respectively. The MS diffusion coefficient can thus be transformed into the Fick
diffusion coefficient and vice versa if the thermodynamic factor is known. In
this work, the thermodynamic factor was obtained from excess Gibbs energy ($G^E$)
models fitted to experimental VLE data. Because the thermodynamic factor is
sensitive to the underlying thermodynamic model, it was calculated for all studied
mixtures with three different $G^E$ models, i.e., Wilson [54], NRTL [42] and UNIQUAC [1].
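To illustrate Eq. (5), the Python sketch below evaluates the thermodynamic factor of a binary mixture from the Wilson $G^E$ model by central finite differences and converts an MS diffusion coefficient into a Fick diffusion coefficient. The Wilson parameters Λ12 and Λ21 and all function names are hypothetical placeholders; in this work the $G^E$ model parameters were fitted to experimental VLE data. Evaluating Γ from component 1 and from component 2 must give the same value because of the Gibbs-Duhem equation, which makes a useful consistency check.

```python
import math


def wilson_ln_gammas(x1: float, lam12: float, lam21: float) -> tuple[float, float]:
    """ln(gamma_1) and ln(gamma_2) of the binary Wilson model, with x2 = 1 - x1."""
    x2 = 1.0 - x1
    s1 = x1 + lam12 * x2
    s2 = x2 + lam21 * x1
    term = lam12 / s1 - lam21 / s2
    return -math.log(s1) + x2 * term, -math.log(s2) - x1 * term


def _check(x1: float, h: float) -> None:
    if not h < x1 < 1.0 - h:
        raise ValueError("x1 must lie strictly inside (0, 1)")


def thermodynamic_factor(x1: float, lam12: float, lam21: float, h: float = 1e-6) -> float:
    """Gamma = 1 + x1 (d ln gamma_1 / d x1)_{T,p}, first form of Eq. (5)."""
    _check(x1, h)
    d = (wilson_ln_gammas(x1 + h, lam12, lam21)[0]
         - wilson_ln_gammas(x1 - h, lam12, lam21)[0]) / (2.0 * h)
    return 1.0 + x1 * d


def thermodynamic_factor_from_component2(x1: float, lam12: float, lam21: float,
                                         h: float = 1e-6) -> float:
    """Gamma = 1 + x2 (d ln gamma_2 / d x2)_{T,p}; since x2 = 1 - x1, d/dx2 = -d/dx1."""
    _check(x1, h)
    d = (wilson_ln_gammas(x1 + h, lam12, lam21)[1]
         - wilson_ln_gammas(x1 - h, lam12, lam21)[1]) / (2.0 * h)
    return 1.0 - (1.0 - x1) * d


def fick_from_ms(d_ms: float, gamma: float) -> float:
    """Fick diffusion coefficient D_12 = MS diffusion coefficient * Gamma."""
    return d_ms * gamma
```

For example, `fick_from_ms(d_ms, thermodynamic_factor(0.3, 0.4, 0.7))` converts a sampled MS coefficient at x1 = 0.3. Repeating the calculation with different $G^E$ models gives the spread that appears as the shaded areas in the figures.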
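The shear viscosity $\eta$ is associated with the autocorrelation function of the
off-diagonal elements of the stress tensor, e.g. $J_p^{xy}$

$$\eta = \frac{1}{V k_B T}\int_0^\infty dt\,\left\langle J_p^{xy}(t)\,J_p^{xy}(0)\right\rangle , \qquad (6)$$

where V stands for the volume, $k_B$ for the Boltzmann constant and T for the temperature.
The component $J_p^{xy}$ of the microscopic stress tensor $\mathbf{J}_p$ is given by [18]

$$J_p^{xy} = \sum_{k=1}^{N} m_k v_k^x v_k^y - \frac{1}{2}\sum_{k=1}^{N}\sum_{l\neq k}^{N} r_{kl}^x\,\frac{\partial u(r_{kl})}{\partial r_{kl}^y} . \qquad (7)$$

Here, k and l denote different molecules of any species and $m_k$ is the mass of molecule k.
The upper indices x and y stand for the spatial vector components, e.g., of the velocity
$v_k^x$ or the site-site distance $r_{kl}^x$.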
Equations (6) and (7) may be applied directly to mixtures. Five independent terms
of the stress tensor, $J_p^{xy}$, $J_p^{xz}$, $J_p^{yz}$, $(J_p^{xx}-J_p^{yy})/2$ and $(J_p^{yy}-J_p^{zz})/2$, were considered to
improve statistics [2].
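A minimal Python sketch of Eq. (6), including the averaging over the five independent stress terms, is given below. It assumes that the microscopic stress tensor of Eq. (7) has already been computed at every time step and stored as an array of shape (n_steps, 3, 3); the function names, the finite correlation length and the trapezoidal quadrature are illustrative choices, not the ms2 implementation.

```python
import numpy as np


def autocorr(x: np.ndarray, n_corr: int, stride: int) -> np.ndarray:
    """<x(t0 + tau) x(t0)> averaged over time origins t0 spaced by `stride` steps."""
    origins = range(0, len(x) - n_corr + 1, stride)
    if len(origins) == 0:
        raise ValueError("time series shorter than the correlation length")
    return np.mean([x[t0:t0 + n_corr] * x[t0] for t0 in origins], axis=0)


def shear_viscosity(jp: np.ndarray, volume: float, kT: float, dt: float,
                    n_corr: int, stride: int) -> float:
    """Green-Kubo shear viscosity (Eq. 6) averaged over five independent stress terms.

    jp: array (n_steps, 3, 3) holding the microscopic stress tensor J_p of Eq. (7)
        (extensive, i.e. not divided by the volume) at every time step.
    kT: Boltzmann constant times temperature, in units consistent with jp and volume.
    """
    terms = [
        jp[:, 0, 1], jp[:, 0, 2], jp[:, 1, 2],   # J^xy, J^xz, J^yz
        (jp[:, 0, 0] - jp[:, 1, 1]) / 2.0,       # (J^xx - J^yy) / 2
        (jp[:, 1, 1] - jp[:, 2, 2]) / 2.0,       # (J^yy - J^zz) / 2
    ]
    acf = np.mean([autocorr(t, n_corr, stride) for t in terms], axis=0)
    integral = dt * (acf.sum() - 0.5 * (acf[0] + acf[-1]))  # trapezoidal rule
    return integral / (volume * kT)
```

Averaging the five equivalent autocorrelation functions of an isotropic fluid reduces the statistical noise of the integral at no extra simulation cost.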
The thermal conductivity $\lambda$ is given by the autocorrelation function of the
elements of the microscopic heat flow, e.g. $J_q^x$

$$\lambda = \frac{1}{V k_B T^2}\int_0^\infty dt\,\left\langle J_q^x(t)\,J_q^x(0)\right\rangle . \qquad (8)$$
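In mixtures, energy and mass transport occur in a coupled manner; thus, the heat
flow for a mixture of n components is given by [10]

$$\mathbf{J}_q = \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{N_i}\left[m_i\left(\mathbf{v}_k^i\right)^2 + \boldsymbol{\omega}_k^i\cdot\mathbf{I}_k^i\cdot\boldsymbol{\omega}_k^i + \sum_{j=1}^{n}\sum_{l\neq k}^{N_j}u\!\left(r_{kl}^{ij}\right)\right]\mathbf{v}_k^i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{N_i}\sum_{l\neq k}^{N_j}\left[\mathbf{r}_{kl}^{ij}\,\frac{\partial u\!\left(r_{kl}^{ij}\right)}{\partial \mathbf{r}_{kl}^{ij}}\cdot\mathbf{v}_k^i + \boldsymbol{\tau}_{kl}^{ij}\cdot\boldsymbol{\omega}_k^i\right] - \sum_{i=1}^{n} h_i\sum_{k=1}^{N_i}\mathbf{v}_k^i , \qquad (9)$$

where $\boldsymbol{\omega}_k^i$ is the angular velocity vector of molecule k of component i and $\mathbf{I}_k^i$ its
tensor of moments of inertia. $u(r_{kl}^{ij})$ is the intermolecular potential energy
and $\boldsymbol{\tau}_{kl}^{ij}$ is the torque due to the interaction between molecules k and l. The
indices i and j denote the components of the mixture and $h_i$ is the partial molar
enthalpy of component i.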
Molecular dynamics simulations were performed with the program ms2 [9, 16].
In a first step, a simulation in the isobaric-isothermal (NpT) ensemble was carried
out to calculate the density and enthalpy at the desired temperature, pressure and
composition. In the second step, an NVT ensemble simulation was performed at
this temperature, density and composition to determine the transport properties.
Newton’s equations of motion were solved with a fifth-order Gear predictor-corrector
numerical integrator. The temperature was controlled by velocity scaling.
In all simulations, the integration time step was 0.877 fs. The simulations contained
4000 molecules and were carried out in a cubic volume with periodic boundary
conditions; the cut-off radius was set to $r_c$ = 17.5 Å. LJ long-range
interactions were considered using angle averaging [30]. Electrostatic long-range
corrections were approximated by the reaction field technique with conducting
boundary conditions ($\varepsilon_{RF} = \infty$). Analogous NVT simulations with an extended
cut-off radius that reached half of the edge length of the cubic simulation volume
were employed to calculate the radial distribution function (RDF). Here, starting
from well-equilibrated configurations, production runs had a duration between
$5\times10^4$ and $10^5$ time steps.
The simulations in the NpT ensemble were equilibrated over $1.2\times10^5$ time steps,
followed by a production run over $5\times10^5$ time steps. In the NVT ensemble, the
simulations were equilibrated over $3\times10^5$ time steps, followed by production runs of
$10^7$ time steps. The self- and MS diffusion coefficients, shear viscosity and thermal
conductivity were calculated by Eqs. (2), (3), (4), (6) and (8) with up to $4\times10^4$
independent time origins of the autocorrelation functions. The sampling length of
the autocorrelation functions was 17.5 ps for all mixtures. This length was chosen
long enough that long-time tail corrections were not necessary. The separation
between the time origins was chosen so that all autocorrelation functions had decayed
to at least 1/e of their normalized value, which ensures that the time origins are
independent [48]. Statistical uncertainties of the predicted
values were estimated with a block averaging method [3].
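Block averaging, as described by Allen and Tildesley [3], splits a correlated time series into consecutive blocks whose means are treated as approximately independent samples. The sketch below is a generic Python illustration; the function name and the number of blocks are hypothetical (the block count used in this work is not stated), and the blocks must be longer than the correlation time of the data for the error estimate to be meaningful.

```python
import numpy as np


def block_average(samples, n_blocks: int) -> tuple[float, float]:
    """Mean of a time series and its standard error from non-overlapping blocks."""
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    samples = np.asarray(samples, dtype=float)
    block_len = len(samples) // n_blocks
    if block_len < 1:
        raise ValueError("fewer samples than blocks")
    # Discard the remainder so that all blocks have the same length
    blocks = samples[: n_blocks * block_len].reshape(n_blocks, block_len)
    means = blocks.mean(axis=1)
    # Standard error of the mean, treating block means as independent samples
    return float(means.mean()), float(means.std(ddof=1) / np.sqrt(n_blocks))
```

A common check is to repeat the estimate for several block lengths: once the blocks are longer than the correlation time, the estimated uncertainty levels off.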
4 Simulation Results
Fig. 4 Average relative deviation (ARD) of present simulation results for Fick diffusion coeffi-
cient (top), shear viscosity (center) and thermal conductivity (bottom) from the best polynomial fit
of the available experimental data
Fig. 5 Results for benzene (1) + toluene at 298.15 K and 0.1 MPa. (a) Simulation results for
the density (blue open circle) are compared with experimental data. (b) Thermodynamic factor.
(c) Simulation results for the Maxwell-Stefan diffusion coefficient (blue solid circle) are
compared with the models by Darken [8] (blue open circle), Vignes [51] (black dashed curve),
Li et al. [27] (blue curve with diamonds) and Zhou et al. [57] (black solid curve) based on present
Because the components of this mixture are chemically similar, it behaves nearly
ideally. This can be clearly observed in the calculated thermodynamic factor,
cf. Fig. 5b, which is approximately unity over the whole composition range; by
definition, an ideal mixture has a thermodynamic factor equal to unity. Figure 5
also shows the simulation results for the density, MS and self-diffusion coefficients,
shear viscosity and thermal conductivity, as well as the calculated Fick diffusion
coefficient, in comparison with experimental data and some predictive equations
from the literature. All properties are almost linear functions of the mole fraction,
so simple interpolative predictive equations are able to predict the mixture
properties accurately. The transport data reported here agree well with the
experiments: the average relative deviations for the Fick diffusion coefficient,
shear viscosity and thermal conductivity are 6.9 %, 4.7 % and 7.9 %, respectively.
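The average relative deviations quoted above compare simulation results with a polynomial fit of the available experimental data (cf. Fig. 4). A hedged Python sketch of such a comparison follows; the function name, the polynomial degree and the sample numbers are hypothetical, since the degree of the best polynomial fit is not specified.

```python
import numpy as np


def ard_percent(x_sim, y_sim, x_exp, y_exp, degree: int = 2) -> float:
    """Average relative deviation (%) of simulation data from a polynomial fit to experiments."""
    coeffs = np.polyfit(x_exp, y_exp, degree)   # polynomial fit of the experimental data
    y_ref = np.polyval(coeffs, x_sim)           # reference values at the simulated mole fractions
    y_sim = np.asarray(y_sim, dtype=float)
    return float(100.0 * np.mean(np.abs(y_sim - y_ref) / np.abs(y_ref)))


if __name__ == "__main__":
    # Made-up data: experiments lie on y = 1 + x; two simulation points deviate by 10 % each.
    x_exp = np.linspace(0.0, 1.0, 5)
    y_exp = 1.0 + x_exp
    print(ard_percent([0.0, 0.5], [0.9, 1.65], x_exp, y_exp, degree=1))
```

Fitting the experimental data first allows simulation points at arbitrary mole fractions to be compared even when no measurement exists at exactly that composition.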
Fig. 5 (continued) simulation data. (d) Simulation results for the Fick diffusion coefficient (blue
solid circle) are compared with experimental data (plus). The models by Li et al. [27] (blue curve
with diamonds), Zhou et al. [57] (black solid curve) and Zhu et al. [58] (green curve with diamonds)
based on present simulation data are also shown. (e) Simulation results for the self-diffusion
coefficients of benzene (black solid circle) and toluene (blue solid circle) are compared with the
models by Li et al. [27] (dashed curve) and Liu et al. [28] (solid curve). (f) Simulation results for
the shear viscosity (blue solid circle) are shown together with the viscosity of the ideal mixture
(dashed curve) and experimental data (plus). (g) Simulation results for the thermal conductivity
(blue solid circle) are compared with the predictions from the Filippov relation [12] (dashed curve)
and experimental data (plus)
Fig. 6 Results for acetone (1) + benzene at 298.15 K and 0.1 MPa. (a) Simulation results for the
density (blue open circle) are compared with experimental data. (b) Thermodynamic factor; the
shaded area represents the range of the results of the three considered GE models. (c) Simulation
results for the Maxwell-Stefan diffusion coefficient (blue solid circle) are compared with the
models by Darken [8] (blue open circle), Vignes [51] (dashed curve), Li et al. [27] (blue curve
with diamonds) and Zhou et al. [57] (solid curve) based on present simulation data. (d) Simulation
Mixtures containing an alcohol usually show strong non-idealities in various
thermodynamic and transport properties, e.g., the MS, Fick and self-diffusion
coefficients, the shear viscosity or the excess volume, as in the case of ethanol +
cyclohexane. This behavior can be explained by association effects due to hydrogen
bonding. The thermodynamic factor of this mixture, shown in Fig. 7b, can reach
values close to zero and exhibits a strong composition dependence. The shaded area,
which reflects the spread among the three $G^E$ models, is large, so that the Fick diffusion
coefficient can only be calculated with a relatively large uncertainty. Other binary
mixtures showing similar behavior are those of methanol and ethanol with benzene,
toluene and CCl4.
Figure 7 shows the predicted transport properties of ethanol + cyclohexane
together with experimental data and selected predictive equations from the
literature. The shear viscosity, the Fick diffusion coefficient and the
self-diffusion coefficient of ethanol show a strong negative deviation from linearity,
with a minimum located around 0.2 mol mol⁻¹ of ethanol. This sharp decrease of
the diffusion coefficients at low alcohol concentration has been attributed to cluster
formation due to solute self-association [44].
To study the microscopic structure responsible for this behavior, RDFs
were sampled. The RDFs $g_{AB}(r)$ between like and unlike sites were calculated
together with the running coordination number

$$N_{AB}(r) = 4\pi\rho_B\int_0^r r'^2\,g_{AB}(r')\,dr' , \qquad (10)$$

where r is the distance from the reference site and $\rho_B$ is the bulk number density of
site B.
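Equation (10) amounts to a cumulative radial integral of the RDF. The Python sketch below evaluates it with the trapezoidal rule on a tabulated $g_{AB}(r)$; the function name and the grid are hypothetical. For a structureless fluid with $g_{AB} \equiv 1$ the result reduces to $\frac{4}{3}\pi\rho_B r^3$, which serves as a check.

```python
import numpy as np


def running_coordination_number(r: np.ndarray, g: np.ndarray, rho_b: float) -> np.ndarray:
    """N_AB(r) = 4 pi rho_B * integral_0^r r'^2 g_AB(r') dr' (Eq. 10), cumulative trapezoid.

    r: increasing radial grid starting at 0.
    g: tabulated RDF g_AB on that grid.
    rho_b: bulk number density of site B, in units consistent with r.
    """
    integrand = 4.0 * np.pi * rho_b * r**2 * g
    increments = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)
    return np.concatenate(([0.0], np.cumsum(increments)))
```

Reading $N_{AB}$ at the first minimum of $g_{AB}(r)$ gives the number of nearest neighbors of type B around a reference site A, which is how the hydrogen-bonding structure can be quantified as a function of composition.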
Three selected RDFs for this mixture at different compositions are shown in
Fig. 8. The double peak related to the hydrogen-bonding structure of ethanol, $g_{OH}$,
is enhanced as the ethanol concentration decreases. Thus, the nearest neighbor
Fig. 6 (continued) results for the Fick diffusion coefficient (blue solid circle) are compared with
experimental data (plus). The models by Li et al. [27] (blue curve with diamonds), Zhou et al. [57]
(solid curve) and Zhu et al. [58] (green curve with diamonds) based on present simulation data are
also shown. (e) Simulation results for the self-diffusion coefficients of acetone (black solid circle)
and benzene (blue solid circle) are compared with experimental data (plus) and the models by
Li et al. [27] (dashed curve) and Liu et al. [28] (solid curve). (f) Simulation results for the shear
viscosity (blue solid circle) are shown together with the viscosity of the ideal mixture (dashed
curve) and experimental data (plus). (g) Simulation results for the thermal conductivity (blue solid
circle) are compared with the predictions from the Filippov relation [12] (dashed curve)
Fig. 7 Results for ethanol (1) + cyclohexane at 298.15 K and 0.1 MPa. (a) Simulation results for
the density (blue open circle) are compared with experimental data. (b) Thermodynamic factor; the
shaded area represents the range of the results of the three considered GE models. (c) Simulation
results for the Maxwell-Stefan diffusion coefficient (blue solid circle) are compared with the
models by Darken [8] (blue open circle), Vignes [51] (dashed curve), Li et al. [27] (blue curve
with diamonds) and Zhou et al. [57] (solid curve) based on present simulation data. (d) Simulation
5 Conclusion
Fig. 7 (continued) results for the Fick diffusion coefficient (blue solid circle) are compared with
experimental data (plus). The models by Li et al. [27] (blue curve with diamonds), Zhou et al. [57]
(solid curve) and Zhu et al. [58] (green curve with diamonds) based on present simulation data are
also shown. (e) Simulation results for the self-diffusion coefficients of ethanol (black solid circle)
and cyclohexane (blue solid circle) are compared with the models by Li et al. [27] (dashed curve)
and Liu et al. [28] (solid curve). (f) Simulation results for the shear viscosity (blue solid circle)
are shown together with the viscosity of the ideal mixture (dashed curve) and experimental data
(plus). (g) Simulation results for the thermal conductivity (blue solid circle) are compared with the
predictions from the Filippov relation [12] (dashed curve) and experimental data (plus)
Fig. 8 Selected radial distribution functions (left) and the corresponding running coordination
numbers (right) of ethanol (1) + cyclohexane at 298.15 K and 0.1 MPa between the oxygen and
hydroxyl hydrogen sites of ethanol $g_{OH}$ (a, b), the methylene and methyl sites of ethanol and
cyclohexane $g_{CH3CH2}$ (c, d) and the methylene sites of cyclohexane $g_{CH2CH2}$ (e, f). Data for pure
ethanol and cyclohexane (black dotted curve) as well as for the mixtures with $x_1$ = 0.1 (red curve),
0.3 (green curve), 0.5 (blue curve) and 0.9 mol mol⁻¹ (black curve) are depicted
Fig. 9 Snapshots of toluene (1) + benzene (top), methanol (1) + acetone (center) and ethanol
(1) + CCl4 (bottom), at three mole fractions $x_1$ = 0.1 (left), 0.5 (center) and 0.9 mol mol⁻¹ (right).
At mole fractions of 0.1 and 0.9 the molecules of the solvent were suppressed to improve visibility.
The methyl and methylene groups are shown in orange, the methine sites in brown, the oxygen
atoms in red and the chlorine atoms in green
1. Abrams, D.S., Prausnitz, J.M.: Statistical thermodynamics of liquid mixtures: a new expression
for the excess Gibbs energy of partly or completely miscible systems. AIChE J. 21, 116–128 (1975)
2. Alfè, D., Gillan, M.J.: First-principles calculation of transport coefficients. Phys. Rev. Lett. 81,
5161–5164 (1998)
3. Allen, M.P., Tildesley, D.J.: Computer simulation of liquids. Clarendon Press, Oxford (1987)
4. Bonnaud, P., Nieto-Draghi, C., Ungerer, P.: Anisotropic united atom model including the
electrostatic interactions of benzene. J. Phys. Chem. B 111, 3730–3741 (2007)
5. Campbell, A., Chatterjee, R.: The critical constants and orthobaric densities of acetone,
chloroform, benzene, and carbon tetrachloride. Can. J. Chem. 47, 3893–3898 (1969)
6. Collings, A., Mills, R.: Temperature-dependence of self-diffusion for benzene and carbon
tetrachloride. Trans. Faraday Soc. 66, 2761–2766 (1970)
7. Computational Chemistry Comparison and Benchmark Data Base, Standard Reference Data
Base No. 101. The National Institute of Standards and Technology. http://cccbdb.nist.gov/
mulliken2.asp (2015)
8. Darken, L.S.: Diffusion, mobility and their interrelation through free energy in binary metallic
systems. Trans. Am. Inst. Min. Met. Eng. 175, 184–201 (1948)
9. Deublein, S., Eckl, B., Stoll, J., Lishchuk, S.V., Guevara-Carrion, G., Glass, C.W., Merker, T.,
Bernreuther, M., Hasse, H., Vrabec, J.: ms2: a molecular simulation tool for thermodynamic
properties. Comput. Phys. Commun. 182, 2350–2367 (2011)
10. Evans, D.J., Morris, G.P.: Statistical Mechanics of Nonequilibrium Liquids. Academic, London (1990)
11. Falcone, D.R., Douglass, D.C., McCall, D.W.: Self-diffusion in benzene. J. Phys. Chem. 71,
2754–2755 (1967)
12. Filippov, L.P.: Teploprovodnost’ rastvorov associirovannyh zhidkostej. Vest. Mosk. Univ., Ser.
Fiz. Mat. Estestv. Nauk 10, 67–69 (1955)
13. Fischer, S.: Experimentelle und theoretische Untersuchung des Einflusses der thermischen
Strahlung auf die effektive Wärmeleitfähigkeit von Flüssigkeiten. Ph.D. thesis, Universität
Siegen, Germany (1984)
14. Fischer, J.D.: Transporteigenschaften reiner Flüssigkeiten und binärer Mischungen mit unter-
schiedlichen Wechselwirkungsparametern. Ph.D. thesis, TH Darmstadt (1986)
15. Fischer, J., Weiss, A.: Transport properties of liquids. V. Self diffusion, viscosity, and mass
density of ellipsoidal shaped molecules in the pure liquid phase. Ber. Bunsenges. Phys. Chem.
90, 896–905 (1986)
16. Glass, C.W., Reiser, S., Rutkai, G., Deublein, S., Köster, A., Guevara-Carrion, G., Wafai, A.,
Horsch, M., Bernreuther, M., Windmann, T., Hasse, H., Vrabec, J.: ms2: a molecular simulation
tool for thermodynamic properties, new version release. Comp. Phys. Commun. 185, 3302–
3306 (2014)
17. Graupner, K., Winter, E.R.S.: Some measurements of the self-diffusion coefficients of liquids.
J. Chem. Soc. (Resumed) 1, 1152–1150 (1952)
18. Gubbins, K.E., Quirke, N.: Introduction to Molecular Simulation and Industrial Applications:
Methods, Examples and Prospects. Gordon and Breach Science Publishers, Amsterdam (1996)
19. Harris, K.R., Alexander, J.J., Goscinska, T., Malhotra, R., Woolf, L.A., Dymond, J.H.:
Temperature and density dependence of the selfdiffusion coefficients of liquid n-octane and
toluene. Mol. Phys. 78, 235–248 (1993)
20. Hiraoka, H.: Self-diffusion of benzene under pressure. Bull. Chem. Soc. Jpn. 32, 423–424
21. Hirschfelder, J.O., Curtiss, C.F., Bird, R.B.: Molecular theory of gases and liquids. Wiley, New
York (1954)
22. Ikeuchi, H., Kanakubo, M., Okuno, S., Sato, R., Fujita, K., Hamada, M., Shoda, N., Fukai, K.,
Okada, K., Kanazawa, H.: Densities and viscosities of tris(acetylacetonato)cobalt(III) complex
solutions in various solvents. J. Solut. Chem. 39, 1428–1453 (2010)
23. Krishna, R., van Baten, J.M.: The Darken relation for multicomponent diffusion in liquid
mixtures of linear alkanes: an investigation using molecular dynamics (MD) simulations. Ind.
Eng. Chem. Res. 44, 6939–6947 (2005)
24. Krüger, G., Weiss, R.: Diffusionskonstanten einiger organischer Flüssigkeiten. Z. Naturforsch.
A 25, 777–780 (1970)
25. Lei, Q.F., Lin, R.-S., Ni, D.Y., Hou, Y.C.: Thermal conductivities of some organic solvents and
their binary mixtures. J. Chem. Eng. Data 42, 971–974 (1997)
26. Lemmon, E.W., Span, R.: Short fundamental equations of state for 20 industrial fluids. J.
Chem. Eng. Data 51, 785–850 (2006)
27. Li, J., Liu, H., Hu, Y.: A mutual-diffusion-coefficient model based on local composition. Fluid
Phase Equilib. 187–188, 193–208 (2001)
28. Liu, X., Schnell, S.K., Simon, J.M., Bedeaux, D., Kjelstrup, S., Bardow, A., Vlugt, T.J.H.:
Fick diffusion coefficients of liquid mixtures directly obtained from equilibrium molecular
dynamics. J. Phys. Chem. B 115, 12921–12929 (2011)
29. Luchinskii, G.: Mechanical characteristics of Halogene anhydride’s molecules. Zh. Obshch.
Khim. 7, 2116–2127 (1937)
30. Lustig, R.: Angle-average for the powers of the distance between two separated vectors. Mol.
Phys. 65, 175–179 (1988)
31. Maginn, E.J., Elliot, J.R.: Historical perspective and current outlook for molecular dynamics
as a chemical engineering tool. Ind. Eng. Chem. Res. 49, 3059–3078 (2010)
32. McCool, M.A., Collings, A.F., Woolf, L.A.: Pressure and temperature dependence of the self-
diffusion of benzene. J. Chem. Soc. Faraday Trans. 1 68, 1489–1497 (1972)
33. Merker, T., Engin, C., Vrabec, J., Hasse, H.: Molecular model for carbon dioxide optimized to
vapor-liquid equilibria. J. Chem. Phys. 132, 234512 (2010)
34. Merker, T., Vrabec, J., Hasse, H.: Engineering molecular models: efficient parameterization
procedure and cyclohexanol as case study. Soft Matter 10, 3–25 (2012)
35. Muñoz-Muñoz, Y.M., Guevara-Carrion, G., Llano-Restrepo, M., Vrabec, J.: Lennard-Jones
force field parameters for cyclic alkanes from cyclopropane to cyclohexane. Fluid Phase
Equilib. 404, 150–160 (2015)
36. Nieto-Draghi, C., Bonnaud, P., Ungerer, P.: Anisotropic united atom model including the
electrostatic interactions of methylbenzenes. I. Thermodynamic and structural properties. J.
Phys. Chem. C 111, 15686–15699 (2007)
37. Nieto-Draghi, C., Bonnaud, P., Ungerer, P.: Anisotropic united atom model including the
electrostatic interactions of methylbenzenes. II. Transport properties. J. Phys. Chem. C 111,
15942–15951 (2007)
38. Pickup, S., Blum, F.D.: Self-diffusion of toluene in polystyrene solutions. Macromolecules 22,
3961–3968 (1989)
39. Poling, B.E., Thomson, D.W., Friend, D.G., Rowley, R.L., Wilding, W.V.: Section 2. Physical
and chemical data. In: Perry, R.H., Green, D.W. (eds.) Perry’s Chemical Engineers’ Handbook,
8th edn. McGraw-Hill, New York (2008)
40. Požar, M., Seguier, J.B., Guerche, J., Mazighi, R., Zoranić, L., Mijaković, M.,
Kežić-Lovrinčević, B., Sokolić, F., Perera, A.: Simple and complex disorder in binary mixtures
with benzene as a common solvent. Phys. Chem. Chem. Phys. 17, 9885–9898 (2015)
41. Rathbun, R., Babb, A.: Self-diffusion in liquids. III. Temperature dependence in pure liquids.
J. Phys. Chem. 65, 1072–1074 (1961)
42. Renon, H., Prausnitz, J.M.: Local compositions in thermodynamic excess functions for liquid
mixtures. AIChE J. 14, 135–144 (1968)
43. Rowley, R., White, G.: Thermal conductivities of ternary liquid mixtures. J. Chem. Eng. Data
32, 63–69 (1987)
44. Rutten, P.W.M.: Diffusion in Liquids. Delft University Press, Delft (1992)
45. Santos, F.J.V., Nieto de Castro, C.A., Dymond, J.H., Dalaouti, N.K., Assael, M.J., Nagashima,
A.: Standard reference data for the viscosity of toluene. J. Phys. Chem. Ref. Data 35, 1–8 (2006)
46. Schnabel, T., Vrabec, J., Hasse, H.: Henry’s law constants of methane, nitrogen, oxygen and
carbon dioxide in ethanol from 273 to 498 K: prediction from molecular simulation. Fluid
Phase Equilib. 233, 134–143 (2005)
47. Schnabel, T., Srivastava, A., Vrabec, J., Hasse, H.: Hydrogen bonding of methanol in
supercritical CO2: comparison between 1H-NMR spectroscopic data and molecular simulation
results. J. Phys. Chem. B 111, 9871–9878 (2007)
48. Schoen, M., Hoheisel, C.: The mutual diffusion coefficient D_12 in binary liquid model
mixtures. Molecular dynamics calculations based on Lennard-Jones (12-6) potentials. Mol.
Phys. 52, 33–56 (1984)
49. Thol, M., Lemmon, E.W., Span, R.: Equation of state for benzene for temperatures from the
melting line up to 725 K with pressures up to 500 MPa. High Temp. High Press. 41, 81–97
50. Trepǎdus, V., Rǎpeanu, S., Pǎdureanu, I., Parfenov, V.A., Novikov, A.G.: Study of molecular
rotations in some aromatic compounds by cold neutron scattering. J. Chem. Phys. 60, 2832–
2839 (1974)
51. Vignes, A.: Diffusion in binary solutions. Variation of diffusion coefficient with composition.
Ind. Eng. Chem. Fundam. 5, 189–199 (1966)
52. Wakeham, W.A.: Transport properties and industry. In: Letcher, T.M. (ed.) Chemical
Thermodynamics for Industry. The Royal Society of Chemistry, London (2004)
53. Wensink, E.J.W., Hoffmann, A.C., van Maaren, P.J., van der Spoel, D.: Dynamic properties
of water/alcohol mixtures studied by computer simulation. J. Chem. Phys. 119, 7308–7317
54. Wilson, G.M.: Vapor-liquid equilibrium. A new expression for the excess free energy of
mixing. J. Am. Chem. Soc. 86, 127–130 (1964)
55. Windfield, D.J.: Measurement of the apparent diffusion coefficient of toluene by quasielastic
neutron scattering. J. Chem. Phys. 54, 3643–3645 (1971)
56. Windmann, T., Linnemann, M., Vrabec, J.: Fluid phase behavior of nitrogen + acetone and
oxygen + acetone by molecular simulation, experiment and the Peng-Robinson equation of
state. J. Chem. Eng. Data 59, 28–38 (2014)
57. Zhou, M., Yuan, X., Zhang, Y., Yu, K.T.: Local composition based Maxwell–Stefan diffusivity
model for binary liquid systems. Ind. Eng. Chem. Res. 52, 10845–10852 (2013)
58. Zhu, Q., Moggridge, G.D., D’Agostino, C.: A local composition model for the prediction of
mutual diffusion coefficients in binary liquid mixtures from tracer diffusion coefficients. Chem.
Eng. Sci. 132, 250–258 (2015)
Large-Scale Phase-Field Simulations
of Directional Solidified Ternary Eutectics
Using High-Performance Computing
1 Introduction
not yet fully understood. To study, for example, the influence of gravity on pattern
formation, directional solidification experiments of Al-Ag-Cu are conducted on the
International Space Station (ISS) [19]. To investigate the pattern evolution numerically,
we use large-scale phase-field simulations. A thermodynamically consistent phase-field
model based on the grand potential approach is applied [4, 13, 18]. Previous studies
showed that large-scale simulations are required to study pattern formation in
different systems in order to avoid effects of the domain boundary on the morphology
[2, 12, 15, 21–24]. In systematic simulation studies, we analyze the influence of the
equilibrium concentrations, the interface energies and the solidification velocity on the
microstructure evolution. We start from an idealized symmetric ternary system and change
it towards the real asymmetric system Al-Ag-Cu. To classify and compare the arising
patterns quantitatively, a novel analysis method based on the second moment of inertia
is introduced.
2 Methods
In the following, the phase-field model and the novel quantitative analysis method
are presented.
The differences of the grand potentials, acting as driving forces for the phase
transitions, can be derived from parabolic fits of the Gibbs energies provided by
thermodynamic CALPHAD databases [3]. The model is implemented in the massively
parallel framework waLBerla [2, 11]. A detailed description of the model is given
in [13], the discretisation in [14], and the applied computational optimizations and
the scaling behaviour in [2].
In cross sections parallel to the growth front, rods and lamellae of the three solid
phases arrange in different patterns during directional solidification. To compare
single rods and lamellae from experiments and simulations, a quantitative method
based on the second moment of inertia is presented. Adapting the method of [17]
for quantifying grains and combining it with the algorithm of [1] enables the automated
analysis of phase-field simulations with periodic boundaries. The second moments of
area are defined as

$$\mu_{p,q} = \int_A x^p y^q \,\mathrm{d}A \qquad \forall\, p, q \in \{0, 1, 2\} \mid p + q = 2 . \tag{3}$$

From these, the inertia tensor in the principal axis system

$$J = \begin{pmatrix} \mu_{02} & -\mu_{11} \\ -\mu_{11} & \mu_{20} \end{pmatrix} = \begin{pmatrix} \int_A y^2 \,\mathrm{d}A & -\int_A xy \,\mathrm{d}A \\ -\int_A xy \,\mathrm{d}A & \int_A x^2 \,\mathrm{d}A \end{pmatrix} \tag{4}$$

is derived. The first and second principal invariants of J (its trace and its determinant)
are made dimensionless by dividing them by A² and A⁴, respectively, where A is the area
of the structure. Subsequently, the dimensionless invariants are inverted, following [17],
leading to

$$\hat{\omega}_1 = \frac{A^2}{\mu_{02} + \mu_{20}} , \tag{5}$$

$$\hat{\omega}_2 = \frac{A^4}{\mu_{02}\,\mu_{20} - \mu_{11}^2} . \tag{6}$$

After scaling ω̂1 and ω̂2 with the invariants of a circle, the characteristic numbers Ω1
and Ω2 are obtained, as presented in [17]:

$$\Omega_1 = \frac{\hat{\omega}_1}{\hat{\omega}_{1,\mathrm{circle}}} = \frac{A^2}{2\pi\,(\mu_{02} + \mu_{20})} , \tag{7}$$

$$\Omega_2 = \frac{\hat{\omega}_2}{\hat{\omega}_{2,\mathrm{circle}}} = \frac{A^4}{16\pi^2\,(\mu_{02}\,\mu_{20} - \mu_{11}^2)} . \tag{8}$$
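As a minimal sketch of Eqs. (3)–(8), assuming a single structure stored as a 2D boolean mask of square cells with edge length dx (the function name and the mask representation are illustrative, not the authors' implementation), the characteristic numbers can be evaluated as follows. Periodic wrapping is ignored here.

```python
import math

import numpy as np


def omega_numbers(mask, dx=1.0):
    """Return (Omega1, Omega2) for one structure given as a 2D boolean mask.

    Each True cell is treated as a filled square of edge length dx; the
    second moments of area are taken about the barycenter of the structure.
    """
    ys, xs = np.nonzero(mask)
    n = xs.size
    cell_area = dx * dx
    area = n * cell_area
    x, y = xs * dx, ys * dx
    xc, yc = x.mean(), y.mean()
    # Central moments mu_pq = int x^p y^q dA. The n * dx**2 / 12 term adds
    # each square cell's moment about its own center (Steiner's theorem).
    mu20 = cell_area * (np.sum((x - xc) ** 2) + n * dx * dx / 12.0)
    mu02 = cell_area * (np.sum((y - yc) ** 2) + n * dx * dx / 12.0)
    mu11 = cell_area * np.sum((x - xc) * (y - yc))
    omega1 = area ** 2 / (2.0 * math.pi * (mu02 + mu20))                    # Eq. (7)
    omega2 = area ** 4 / (16.0 * math.pi ** 2 * (mu02 * mu20 - mu11 ** 2))  # Eq. (8)
    return omega1, omega2


# A square gives exactly 3/pi and 9/pi**2; a rasterized disk gives values close to 1.
square = np.ones((40, 40), dtype=bool)
yy, xx = np.mgrid[-60:61, -60:61]
disk = xx ** 2 + yy ** 2 <= 50 ** 2
print(omega_numbers(square))
print(omega_numbers(disk))
```

For a continuous regular hexagon, Eqs. (7) and (8) give Ω1 = 27/(5√3π) ≈ 0.9924 and Ω2 = Ω1² ≈ 0.9848, the values quoted for an undistorted hexagon in the results.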
Ω2 is related to the shape of the considered object and Ω1 to its distortion. To
determine the inertia tensor in a principal axis system, the barycenter of the considered
structure has to be calculated. Due to the periodic boundary conditions in the
simulations, structures can extend across the domain boundaries. To calculate the
barycenters of such structures, the algorithm of Bai and Breen [1] is applied. Initially,
all cells with coordinates x_i, y_i belonging to the structure are projected onto a line
with the normalized coordinate s_i = x_i/x_max, s_i ∈ [0, 1]. This line is defined normal
to the corresponding periodic boundaries, and x_max is the domain width in this direction.
Afterwards, the points on the line are mapped onto a circle by transforming each point
s_i to x̂_i, ŷ_i with

$$\hat{x}_i = \cos(2\pi s_i) , \tag{9}$$

$$\hat{y}_i = \sin(2\pi s_i) . \tag{10}$$

The mapped coordinates of all N cells are then averaged:

$$\bar{\hat{x}} = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i , \tag{11}$$

$$\bar{\hat{y}} = \frac{1}{N} \sum_{i=1}^{N} \hat{y}_i . \tag{12}$$

The two average values are projected back into the original coordinate system using

$$\bar{x} = x_{\max} \left( \frac{\operatorname{atan2}(-\bar{\hat{y}}, -\bar{\hat{x}})}{2\pi} + \frac{1}{2} \right) . \tag{13}$$

This algorithm is repeated for each dimension with periodic boundaries, and the resulting
barycenter can also be used to calculate the inertia tensor in the principal axis system
by means of Steiner's theorem.
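The barycenter step of Eqs. (9)–(13) can be sketched for one periodic dimension as below. The helper `unwrap`, which moves cells to the periodic image nearest the barycenter before the moments are evaluated with Steiner's theorem, is a hypothetical illustration and not part of [1].

```python
import math


def periodic_mean(coords, x_max):
    """Barycenter of 1D coordinates on a periodic axis [0, x_max), Eqs. (9)-(13)."""
    n = len(coords)
    # Eqs. (9)-(12): map s_i = x_i / x_max onto the unit circle and average.
    x_hat = sum(math.cos(2.0 * math.pi * x / x_max) for x in coords) / n
    y_hat = sum(math.sin(2.0 * math.pi * x / x_max) for x in coords) / n
    # Eq. (13): atan2 returns a value in [-pi, pi], so the result lies in [0, x_max].
    x_bar = x_max * (math.atan2(-y_hat, -x_hat) / (2.0 * math.pi) + 0.5)
    return x_bar % x_max


def unwrap(coords, center, x_max):
    """Shift each coordinate to its periodic image closest to `center`."""
    return [center + ((x - center + 0.5 * x_max) % x_max) - 0.5 * x_max for x in coords]


cells = [9.0, 0.0]              # a structure crossing the boundary of a domain of width 10
c = periodic_mean(cells, 10.0)
print(c)                        # close to 9.5
print(unwrap(cells, c, 10.0))   # close to [9.0, 10.0]
```

A plain arithmetic mean of the two cells would give 4.5, in the middle of the domain; the circle mapping places the barycenter on the boundary-crossing structure instead.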
With this method, rods in microstructures from both simulations and experiments
can be analyzed automatically and compared quantitatively, independent of the rod
size. The method can also be applied to investigate the shape evolution of a single
rod during the solidification process. This makes it possible to identify a stationary
growth state and to derive a criterion for stopping the simulation. With this in-situ
analysis, the required computational time can be reduced.
The influence of the process conditions on the pattern formation of ternary eutectics
during directional solidification is investigated with large-scale phase-field
simulations. For this, the growth velocity is systematically varied for three different
systems.
The simulations use the following setup: starting from an initial Voronoi tessellation,
the nuclei of the three solid phases grow in a coupled manner in the direction of an
imposed temperature gradient. This gradient is moved in a defined direction with the
velocity v. To approximate an infinite domain, periodic boundaries are applied on the
sides parallel to the growth direction, and a Dirichlet boundary condition is used
above the solidification front to model an infinite liquid reservoir.
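One simple way to build such a starting configuration is a nearest-nucleus labeling of a 2D cross section. This sketch assumes periodic side boundaries and phases assigned to the nuclei in turn; both are assumptions, the authors' initialization in waLBerla may differ, and a phase-field code would additionally smooth the sharp labels into diffuse interfaces.

```python
import numpy as np


def voronoi_initial_phases(nx, ny, n_seeds, n_phases=3, seed=0):
    """Label each cell of a periodic ny-by-nx plane with the phase of its nearest nucleus."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(nx * ny, size=n_seeds, replace=False)  # distinct nucleus cells
    sy, sx = np.unravel_index(flat, (ny, nx))
    seed_phase = np.arange(n_seeds) % n_phases                # alpha, beta, gamma in turn
    yy, xx = np.mgrid[0:ny, 0:nx]
    best = np.full((ny, nx), np.inf)
    labels = np.zeros((ny, nx), dtype=int)
    for k in range(n_seeds):
        # Minimum-image distances account for the periodic side boundaries.
        ddx = np.abs(xx - sx[k])
        ddx = np.minimum(ddx, nx - ddx)
        ddy = np.abs(yy - sy[k])
        ddy = np.minimum(ddy, ny - ddy)
        d2 = ddx ** 2 + ddy ** 2
        closer = d2 < best
        best[closer] = d2[closer]
        labels[closer] = seed_phase[k]
    return labels, sx, sy, seed_phase


labels, sx, sy, phases = voronoi_initial_phases(64, 48, 30)
print(np.bincount(labels.ravel(), minlength=3))  # number of cells per phase
```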
The equilibrium concentrations of the three systems S1, S2 and S3 are depicted in the
isothermal ternary concentration diagram in Fig. 1. The ternary eutectic point is
described by the equilibrium concentration of the liquid. These systems represent the
transition from an ideal system (S1), as investigated in [22], to a model of the real
system Al-Ag-Cu (S3), as studied in [13, 21, 24]. In system S1, the equilibrium
concentrations of the solid phases α, β and γ are distributed equally around the liquid
equilibrium concentration, and all interface energies are set equal. For the systems S2
and S3, the arrangement of the solid equilibrium concentrations around the eutectic point
is chosen similar to [13]. For system S2, the same equal interface energies as in system
S1 are applied, while system S3 is modeled with the interface energies from [13, 22].

Fig. 1 Equilibrium concentrations of the solid phases and the liquid for system S1 (marked with
circles) and the systems S2, S3 (marked with triangles)
For these three systems, simulations with the velocities v1 = 1.74 × 10⁻³,
v2 = 2.76 × 10⁻³ and v3 = 2.61 × 10⁻³ are conducted. A selection of the parameters
common to all simulations is summarized in Table 1.
The simulations are conducted with 3 million time steps in a domain of
800 × 800 × 250 cells. Due to the applied moving-window technique [2, 25], this
corresponds to a growth height of approximately 6300 cells. Stationary growth is
ensured for all presented simulations. To reduce the output data, only the surface
meshes are stored. Each simulation used 13,600 cores for 8 h, resulting in
approximately 200 GB of simulation data. In all simulation results, phase α is shown
in red, β in green and γ in blue. In the following, the simulation results are described
and discussed.
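The resource figures above can be put in perspective with a short back-of-the-envelope calculation. Only derived quantities are shown; an even distribution of cells over cores is an assumption.

```python
cells = 800 * 800 * 250           # lattice cells in the moving window
cores = 13_600
wall_hours = 8

core_hours = cores * wall_hours   # compute budget per simulation
cells_per_core = cells / cores    # assumes an even domain decomposition
print(f"{cells:,} cells, {core_hours:,} core-h, {cells_per_core:,.0f} cells per core")
# -> 160,000,000 cells, 108,800 core-h, 11,765 cells per core
```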
3.1 System S1
The simulated solidification fronts after 3 million time steps for the three velocities
v1 to v3 of system S1 are depicted in Fig. 2. In all simulations, hexagonal structures
with different alignments evolve, separated by contact zones, as discussed in previous
work [22]. With increasing velocity, the microstructure becomes finer. This is
accompanied by an increase in both the number of rods and the interface length. The
observed refinement agrees with the analytical Jackson-Hunt approach [16]. Therefore,
the characteristics of the two-dimensional Jackson-Hunt approach are also fulfilled for
three-dimensional growth, as shown in [22] with phase-field simulations. For growth at
minimum undercooling, this approach predicts that the lamellar spacing λ and the
velocity v are related as

$$\lambda^2 v = \text{constant} . \tag{14}$$
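Eq. (14) implies λ ∝ v^(−1/2), so the expected change in spacing between two runs depends only on the velocity ratio. A minimal sketch using the velocities v1 and v2 quoted above; this evaluates the idealized scaling only and is not a result extracted from the simulations.

```python
import math


def spacing_ratio(v_ref, v):
    """lambda(v) / lambda(v_ref) implied by lambda**2 * v = constant, Eq. (14)."""
    return math.sqrt(v_ref / v)


v1, v2 = 1.74e-3, 2.76e-3
print(f"{spacing_ratio(v1, v2):.3f}")  # 0.794
```

Because only the ratio enters, the result does not depend on the (dimensionless) velocity units.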
Fig. 2 Solidification front of system S1 after 3 million time steps for the three velocities
v1 = 1.74 × 10⁻³ (a), v2 = 2.76 × 10⁻³ (b) and v3 = 2.61 × 10⁻³ (c)
Fig. 3 Left: selected rods from the simulated micrograph of system S1 for the velocity v2,
marked with A–I; right: positions of these rods in a 1 − Ω1 over 1 − Ω2 diagram
Fig. 4 Solidification front of system S2 after 3 million time steps for the three velocities v1 (a),
v2 (b) and v3 (c)
emphasizes the visual observation that the rods do not align in a regular hexagonal
3.2 System S2
In the next step, the parabolic free energies are shifted to those of the Al-Ag-Cu
system, as given in [13], but equal interface energies are still applied. The location
and arrangement of the equilibrium concentrations are shown in Fig. 1.
In Fig. 4, for all three velocities, the rods of β and γ arrange in chain-like structures
embedded in a matrix of phase α. As also observed in the simulations of [13], junctions
and ring-like structures occur. This behavior has also been reported in experiments on
directionally solidified ternary eutectics [5–10]. Due to the asymmetric arrangement of
the equilibrium concentrations around the eutectic point, different phase fractions
evolve for the three simulated growth velocities. This is caused by an adjustment of the
front undercooling to the growth velocity, as predicted by the analytical approach of
Jackson and Hunt [16]. Consequently, different concentrations are established in the
solid phases. As in system S1, the structures become finer with increasing velocity.
3.3 System S3
To approximate the real material system more accurately, interface energies similar
to [13, 21] are applied for system S3. The solidification fronts at the end of the
simulations are shown in Fig. 5 for the three velocities v1, v2 and v3.
Fig. 5 Solidification front of system S3 after 3 million time steps for the three velocities v1 (a),
v2 (b) and v3 (c)
For a quantitative comparison of the patterns arising in the three systems S1, S2 and
S3, the previously introduced method of the second moment of inertia is applied.
Fig. 6 shows, for the velocity v1, the probability of the Ω1 and Ω2 values that
characterize the rod shapes: the left side shows the β rods and the right side the γ rods.
For system S1, a pronounced peak near Ω1 = 0.9924 and Ω2 = 0.9848 can be seen for
both phases. These values correspond to an undistorted hexagon. For system S2, no peak
can be observed for the β rods, and for the γ rods the peak is smaller than for system S1.
This trend continues for system S3. The variance of the probabilities over Ω1 and Ω2
increases from system S1 to system S3. The analysis with the second moment of inertia
thus reflects the visual observation of different rod shapes in a quantitative manner.
Fig. 6 Probability of Ω1 and Ω2 values for the rod shapes of the phases β (left column) and
γ (right column)
4 Conclusion
In this work, we investigated the influence of process and physical parameters on the
pattern formation of directionally solidified ternary eutectics. An idealized ternary
eutectic alloy (system S1) is systematically changed towards the real system Al-Ag-Cu
(system S3). The pattern formation in all systems is investigated for three different
velocities and compared quantitatively with the presented method of the second
moment of inertia.
The simulations follow the prediction of the analytical Jackson-Hunt approach that
higher velocities result in finer microstructures. The systems S2 and S3 have
asymmetrically arranged equilibrium concentrations, and we observe that the phase
fractions change with the velocity. This effect of the velocity on the phase fractions
has also been observed in experiments (Dennstedt, A.: 2016-03-21. Private
communication). In our simulations, a variation of the interface energies leads to
better visual agreement with experimental micrographs. With the method of the second
moment of inertia, it is shown quantitatively that a larger deviation from the idealized
system results in a higher variance of the rod shapes. We conclude
that the analysis method is suited to classify single rod shapes as well as the total
1. Bai, L., Breen, D.: Calculating center of mass in an unbounded 2d environment. J. Gr. GPU
Game Tools 13(4), 53–60 (2008)
2. Bauer, M., Hötzer, J., Steinmetz, P., Jainta, M., Berghoff, M., Schornbaum, F., Godenschwager,
C., Köstler, H., Nestler, B., Rüde, U.: Massively parallel phase-field simulations for ternary
eutectic directional solidification (2015). arXiv preprint arXiv:1506.01684, accepted at
SuperComputing 2015
3. Choudhury, A., Kellner, M., Nestler, B.: A method for coupling the phase-field model based
on a grand-potential formalism to thermodynamic databases. Curr Opin Solid State Mater Sci
19(0), 287–300 (2015)
4. Choudhury, A., Nestler, B.: Grand-potential formulation for multicomponent phase transfor-
mations combined with thin-interface asymptotics of the double-obstacle potential. Phys. Rev.
E 85:021602 (2012)
5. Dennstedt, A., Choudhury, A., Ratke, L., Nestler, B.: Microstructures in a ternary eutectic
alloy: devising metrics based on neighbourhood relationships. In: IOP Conference Series:
Materials Science and Engineering, Kazan (2014)
6. Dennstedt, A., Helfen, L., Steinmetz, P., Nestler, B., Ratke, L.: 3D synchrotron imaging of a
directionally solidified ternary eutectic. Metall. Mater. Trans. A 47, 981–984 (2015)
7. Dennstedt, A., Ratke, L.: Microstructures of directionally solidified Al-Ag-Cu ternary
eutectics. Trans. Indian Inst. Met. 65(6), 777–782 (2012)
8. Dennstedt, A., Ratke, L., Choudhury, A., Nestler, B.: New metallographic method for
estimation of ordering and lattice parameter in ternary eutectic systems. Metall. Microstruct.
Anal. 2(3), 140–147 (2013)
9. Genau, A.L., Ratke, L.: Crystal orientation and morphology in Al–Ag–Cu ternary eutectic.
IOP Conf. Ser.: Mater. Sci. Eng. 27(1), 012032 (2012)
10. Genau, A., Ratke, L.: Morphological characterization of the Al-Ag-Cu ternary eutectic. Int. J.
Mater. Res. 103(4), 469–475 (2012)
11. Godenschwager, C., Schornbaum, F., Bauer, M., Köstler, H., Rüde, U.: A framework for
hybrid parallel flow simulations with a trillion cells in complex geometries. In: Proceedings of
SC13: International Conference for High Performance Computing, Networking, Storage and
Analysis, p. 35. ACM, New York (2013)
12. Hötzer, J., Jainta, M., Steinmetz, P., Dennstedt, A., Nestler, B.: Die Vielfalt der Musterbildung
in Metallen (2015)
13. Hötzer, J., Jainta, M., Steinmetz, P., Nestler, B., Dennstedt, A., Genau, A., Bauer, M., Köstler,
H., Rüde, U.: Large scale phase-field simulations of directional ternary eutectic solidification.
Acta Materialia 93, 194–204 (2015)
14. Hötzer, J., Tschukin, O., Said, M.B., Berghoff, M., Jainta, M., Barthelemy, G., Smorchkov, N.,
Schneider, D., Selzer, M., Nestler, B.: Calibration of a multi-phase field model with quantitative
angle measurement. J. Mater. Sci. 51(4), 1788–1797 (2016)
15. Hötzer, J., Steinmetz, P., Jainta, M., Schulz, S., Kellner, M., Nestler, B., Genau, A., Dennstedt,
A., Bauer, M., Köstler, H., Rüde, U.: Phase-field simulations of spiral growth during directional
ternary eutectic solidification. Acta Materialia 106, 249–259 (2016)
16. Jackson, K.A., Hunt, J.D.: Lamellar and rod eutectic growth. Trans. Metall. Soc. AIME 236,
1129–1142 (1966)
17. MacSleyne, J.P., Simmons, J.P., De Graef, M.: On the use of 2-d moment invariants for the
automated classification of particle shapes. Acta Materialia 56(3), 427–437 (2008)
18. Plapp, M.: Unified derivation of phase-field models for alloy solidification from a grand-
potential functional. Phys. Rev. E 84, 031601 (2011)
19. Rex, S.: ACCESS e.V., RWTH Aachen, SETA – Das Erstarrungsverhalten von mehrkompo-
nentigen Legierungen, 2014-03-07. Accessed 25 Feb 2016
20. Rinaldi, M.D., Sharp, R.M., Flemings, M.C.: Growth of ternary composites from the melt: Part
II. Metall. Trans. 3(12), 3139–3148 (1972)
21. Steinmetz, P., Yabansu, Y.C., Hötzer, J., Jainta, M., Nestler, B., Kalidindi, S.R.: Analytics for
microstructure datasets produced by phase-field simulations. Acta Materialia 103, 192–203
22. Steinmetz, P., Hötzer, J., Kellner, M., Dennstedt, A., Nestler, B.: Large-scale phase-field
simulations of ternary eutectic microstructure evolution. Comput. Mater. Sci. 117, 205–214
23. Steinmetz, P., Hötzer, J., Nestler, B.: Charakterisierung mehrkomponentiger Materialstrukturen
durch den Einsatz von Höchstleistungsrechnern und Data Mining Konzepte (2015)
24. Steinmetz, P., Kellner, M., Hötzer, J., Dennstedt, A., Nestler, B.: Phase-field study of the
pattern formation in Al–Ag–Cu under the influence of the melt concentration. Comput. Mater.
Sci. 121, 6–13 (2016)
25. Vondrous, A., Selzer, M., Hötzer, J., Nestler, B.: Parallel computing for phase-field models.
Int. J. High Perform. Comput. Appl. 28(1), 61–72 (2014)
Seismic Applications of Full Waveform Inversion
1 Introduction
First implementations of full waveform inversion (FWI) were conducted in the time domain
in the 1980s by [23] and [15], and in the frequency domain in the 1990s by [19] (see [25]
for a general overview of FWI). Largely owing to huge improvements in high-performance
computing, these FWI strategies have emerged as an efficient imaging tool. In
our work we concentrate on the implementation of the time-domain FWI and its
application to seismic field-data problems. It comprises two- and three-dimensional
(e.g., [4]) modeling of viscoelastic wavefields and exploits – as a main advantage
– straightforward and efficient parallelization by domain decomposition [2] and
source parallelization [12] leading to a significant speedup on parallel computers.
Within the scope of the HPC project KITFWT, we present applications of FWI to
field-data (Sects. 3.1, 3.2, and 3.3) as well as further methodological developments
illustrated by synthetic examples (Sects. 3.4 and 3.5).
2 Methodology
FWI aims to find the optimal subsurface model by iteratively minimizing the misfit
function between recorded and synthetic seismic data. That is, the synthetic data
obtained by solving the “forward problem” for this model have to explain the recorded
seismic data. The iterative
optimization scheme of FWI – combining “forward problem” and “inverse problem”
– comprises several steps shown in Fig. 1.
In detail, the method is initialized by two main inputs. First, we choose 2D or
3D initial parameter models of the subsurface, such as seismic velocities vP of
compressional wave (P-wave) and vS of shear wave (S-wave), mass density as
well as attenuation represented by quality factors QP and QS for both wave types.
They are assigned to the starting model at the first FWI iteration. The initial model
can be estimated from a-priori information or computed by conventional seismic
imaging methods. Second, the recorded data is obtained from seismic measurements
involving many source locations and receivers. Typical acquisitions are performed
either offshore, using air guns as sources and hydrophones located at the sea surface
or sea floor, or onshore, using hammer-blow sources and geophones.
Within the FWI framework, seismic modeling is carried out for each source of the
acquisition geometry (solution of the “forward problem”, see [2, 24]). That is, using
the initial source wavelet, the wavefield is emitted by the source and propagates
forward through the medium. A time series of spatial wavefield volumes has to be
stored in memory. Synthetic seismic data are recorded at the receivers, and the
difference between synthetic and recorded data is calculated, resulting in residuals.
To improve the minimization of the misfit function, a comprehensive multi-stage
workflow successively addresses different model scales, either by applying frequency
filtering to the data (e.g., [3, 21]) or by choosing subsets of the data.
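A minimal sketch of the residuals and a plain least-squares (L2) misfit, assuming synthetic and recorded data stored as arrays of shape (receivers, time samples). The factor 1/2 and the function name are conventions chosen here, not necessarily those of the authors' code; the returned residuals are what the next step back-propagates.

```python
import numpy as np


def l2_misfit(synthetic, recorded):
    """Return (misfit, residuals) for data arrays of shape (n_receivers, n_samples)."""
    residuals = synthetic - recorded          # "synthetic data" - "recorded data"
    misfit = 0.5 * float(np.sum(residuals ** 2))
    return misfit, residuals


recorded = np.array([[1.0, 2.0], [0.0, 1.0]])
synthetic = 3.0 * recorded
misfit, res = l2_misfit(synthetic, recorded)
print(misfit)  # 12.0
```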
For each source, the residual wavefield is back-propagated from the receivers
to the source position. The cross-correlation of forward- and back-propagated
wavefields yields source-specific steepest-descent gradients (solution of the “inverse
problem”, see [10, 15, 18, 23]).

Fig. 1 General FWI scheme used for the iterative improvement of physical model parameters of the
subsurface by minimizing the misfit between modeled and recorded seismic data

The computation of the global gradient for the
entire acquisition geometry is given by the summation of all source-specific gradients.
Optimization methods such as the preconditioned conjugate-gradient method and the
L-BFGS method are then applied. The update of the model parameter(s) is the final step
of an FWI iteration. The gradient has to be scaled by an optimal step length to obtain a
proper model update. Estimating the step length may require a significant number of
additional forward simulations. Furthermore, an analogous inverse problem arises for the
source wavelet, because the true wavelet is not known. The initial wavelet (a rough
estimate or a synthetic signal) is therefore also subject to FWI and is optimized during
the inversion by a least-squares method [7, 19].
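The gradient assembly and the parabolic line search can be sketched as follows. The sketch assumes forward and back-propagated snapshots stored on the same time axis as arrays of shape (nt, nz, nx); parameter-specific gradient scaling, preconditioning and the conjugate-gradient or L-BFGS direction are omitted, and all names are hypothetical.

```python
import numpy as np


def source_gradient(forward, backward, dt):
    """Zero-lag cross-correlation of forward and back-propagated snapshots.

    Both arrays have shape (nt, nz, nx) and are aligned in time; the result
    has shape (nz, nx).
    """
    return np.sum(forward * backward, axis=0) * dt


def global_gradient(snapshot_pairs, dt):
    """Sum the source-specific gradients over all sources."""
    return sum(source_gradient(fw, bw, dt) for fw, bw in snapshot_pairs)


def parabolic_step(steps, misfits):
    """Step length at the vertex of the parabola through three (step, misfit) points."""
    (a, b, c), (fa, fb, fc) = steps, misfits
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den


# Toy check: for a misfit that is exactly quadratic in the step length,
# three trial steps recover its minimum.
def phi(alpha):
    return (alpha - 0.3) ** 2 + 1.0


trial = (0.0, 0.2, 0.5)
print(parabolic_step(trial, tuple(phi(a) for a in trial)))  # close to 0.3
```

In an actual FWI, each evaluation of φ(α) is a full set of forward simulations with the updated model, which is why step-length estimation is costly; a robust implementation would also guard against `den == 0` for collinear misfit values.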
Seismic modeling represents the fundamental part of FWI. Depending on the field of
application, the wave-propagation physics for an underlying subsurface model has to
be described by an appropriate wave equation. On the one hand, this involves choosing
between the (visco-)acoustic and the (visco-)elastic wave equation. On the other hand,
the problem has to be solved for two- or three-dimensional subsurface models. The wave
equations are implemented numerically with a time-domain finite-difference (FD)
time-stepping method in Cartesian coordinates. In detail, the FD scheme solves the
stress-velocity formulation using stress and particle-velocity wavefields. Because the
model size is finite, the wave equations are extended by perfectly matched layer (PML)
terms to avoid artificial boundary reflections. Finally, at each FWI iteration, a 2D or 3D
wave equation has to be solved in forward- and back-propagation for a certain number
of sources.
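To illustrate the stress-velocity formulation, the following minimal 1D sketch uses a second-order staggered-grid leapfrog scheme without PML and with stress-free ends. The authors' 2D/3D viscoelastic code uses its own operators, so the stencil order, grid and parameters here are assumptions.

```python
import numpy as np


def propagate_1d(sigma0, rho, modulus, dx, dt, nt):
    """1D stress-velocity leapfrog on a staggered grid (no PML, stress-free ends)."""
    sigma = np.asarray(sigma0, dtype=float).copy()  # stress at integer grid points
    v = np.zeros(sigma.size - 1)                     # particle velocity at half points
    for _ in range(nt):
        v += dt / rho * np.diff(sigma) / dx              # rho dv/dt = d(sigma)/dx
        sigma[1:-1] += dt * modulus * np.diff(v) / dx    # d(sigma)/dt = M dv/dx
    return sigma, v


c, rho, dx = 1000.0, 1.0, 1.0
dt = 0.5 * dx / c                                # Courant number c*dt/dx = 0.5 (stable)
x = np.arange(400) * dx
sigma0 = np.exp(-((x - 200.0) / 5.0) ** 2)       # initial stress pulse, medium at rest
sigma, v = propagate_1d(sigma0, rho, rho * c ** 2, dx, dt, nt=200)
right_peak = 200 + int(np.argmax(sigma[200:]))
print(right_peak)  # near 200 + c * nt * dt = 300
```

The initial pulse splits into two halves travelling in opposite directions at speed c; the boundaries are not reached within 200 steps, so the missing PML does not matter in this toy setting.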
Among other factors, the success of FWI depends on a sufficient illumination of the
model area. Thus, many source and receiver positions are necessary (reasonable numbers
range from 20 to more than 500). The simulations have to be performed separately for
each source and require most of the total computation time of FWI. This results in huge
computational efforts, which can be handled by massive parallelization. Our FWI
implementation offers two types of parallelization. On the one hand, the model area can
be decomposed into subdomains, which are distributed among all available cores [2].
Additional padding layers with half the size of the spatial differential operator are
located around each subdomain. At each time step, these subdomain boundaries are
exchanged via the Message Passing Interface (MPI). On the other hand, because of the
increasing communication overhead, simulations cannot benefit from decomposing a
model into a large number of very small subdomains. Hence, domain decomposition
should be supplemented by parallelizing the simulations with respect to the sources
[11, 12]. The combination of domain decomposition and source parallelization results
in a nearly perfect speedup on supercomputers.
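The padding-layer exchange of the domain decomposition can be emulated serially in a few lines. This sketch assumes a 1D grid and a fourth-order central first-derivative stencil, which reaches two points to each side and therefore needs two padding cells; plain array copies stand in for the MPI messages. Away from the outer boundaries, the distributed result matches the global computation.

```python
import numpy as np

HALO = 2  # padding cells: the 4th-order stencil reaches 2 points to each side


def d_dx(u, dx):
    """4th-order central first derivative, evaluated at u[HALO:-HALO]."""
    return (-u[4:] + 8.0 * u[3:-1] - 8.0 * u[1:-3] + u[:-4]) / (12.0 * dx)


def split_with_halo(u, n_parts):
    """Cut u into contiguous subdomains, each padded with HALO empty cells per side."""
    bounds = np.linspace(0, u.size, n_parts + 1).astype(int)
    subdomains = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        local = np.zeros(hi - lo + 2 * HALO)
        local[HALO:-HALO] = u[lo:hi]
        subdomains.append(local)
    return subdomains


def exchange_halos(subdomains):
    """Copy boundary cells between neighbours (what MPI send/receive calls would do)."""
    for left, right in zip(subdomains[:-1], subdomains[1:]):
        left[-HALO:] = right[HALO:2 * HALO]
        right[:HALO] = left[-2 * HALO:-HALO]


dx = 0.1
u = np.sin(np.arange(64) * dx)
parts = split_with_halo(u, 4)
exchange_halos(parts)
distributed = np.concatenate([d_dx(p, dx) for p in parts])
reference = d_dx(u, dx)  # global result for grid points 2 .. 61
print(np.max(np.abs(distributed[HALO:-HALO] - reference)))  # zero up to round-off
```

Source parallelization needs no such exchange: each source group runs an independent simulation, and only the resulting gradients are summed.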
To study gas hydrate deposits in the Danube Deep Sea Fan in the Black Sea off the
coast of Romania, several geophysical experiments have been carried out, including
reflection and refraction seismic measurements. Within the SUGAR-III project
(SUGAR – SUbmarine GAs Hydrate Resources) 15 ocean-bottom seismometer
(OBS) stations equipped with pressure and particle velocity sensors have been
deployed and have been covered by eight profiles of about 14 km length each. In this
project, we apply FWI to data of a profile covering five stations to study subseafloor
deposits of hydrated sediments which are possibly underlain by free gas.
For the acoustic FWI the data of the pressure sensors is utilized. As a starting
model we use the resulting compressional-wave velocity model of a travel time
tomography and a density model is derived by an empirical relation.
To account for the unknown source signature, a correction filter is derived that
matches the signature of the main events in the seismograms modeled for the starting
model to that in the measured field data. At the beginning of each frequency stage, this
filter is determined by water-level deconvolution and is then convolved with the
original wavelet used for the forward propagation. Another forward propagation is then
performed with the corrected wavelet. Within one iteration, the residuals of the
measured and modeled data are minimized using the objective function suggested by
[5]. In practice, this means that the traces are normalized, which ensures the
comparability of field and synthetic data. The gradients are spatially preconditioned to
suppress undesired model updates, e.g., in the water column and close to the OBS
positions. Further preconditioning is applied to enhance updates in the deeper parts of
the model.
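A minimal frequency-domain sketch of the two ingredients named above: a water-level-stabilized deconvolution that estimates the correction filter from one modeled and one observed trace, and a least-squares misfit between traces normalized to unit energy. The water-level value, the circular (FFT) convolution and the exact normalization are assumptions; [5] may define its objective function differently.

```python
import numpy as np


def correction_filter(observed, modeled, water_level=0.01):
    """Filter f with observed ~ modeled (*) f, via water-level deconvolution."""
    n = len(observed)
    D = np.fft.rfft(observed, n)
    S = np.fft.rfft(modeled, n)
    power = np.abs(S) ** 2
    # Raise spectral values below water_level * max(power) to avoid division by ~0.
    denom = np.maximum(power, water_level * power.max())
    return np.fft.irfft(D * np.conj(S) / denom, n)


def apply_filter(wavelet, f):
    """Circular convolution of the wavelet with the correction filter."""
    n = len(wavelet)
    return np.fft.irfft(np.fft.rfft(wavelet, n) * np.fft.rfft(f, n), n)


def normalized_misfit(synthetic, recorded, eps=1e-12):
    """0.5 * sum of squared differences after scaling every trace to unit L2 norm."""
    s = synthetic / (np.linalg.norm(synthetic, axis=1, keepdims=True) + eps)
    r = recorded / (np.linalg.norm(recorded, axis=1, keepdims=True) + eps)
    return 0.5 * float(np.sum((s - r) ** 2))


# Toy check: the "observed" trace is the modeled one delayed by 5 samples and doubled.
modeled = np.zeros(64)
modeled[0], modeled[1] = 1.0, 0.5
observed = 2.0 * np.roll(modeled, 5)
f = correction_filter(observed, modeled)
print(np.argmax(f), round(f.max(), 6))  # 5 2.0
```

Because the normalized misfit ignores the overall amplitude of each trace, the factor of 2 in the toy example would not contribute to it; only differences in waveform shape and timing do.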
To prepare the field data for the inversion, a data transformation is required to
correct for 3D geometrical spreading effects when applying a 2D FWI approach. This is
accomplished by convolving each trace with 1/√t (t: traveltime) and multiplying it by
v√(2t) (v: velocity).
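The transformation described above can be sketched directly in the time domain. The half-sample shift that avoids the singularity of 1/√t at t = 0, the truncated causal convolution, and the use of a single constant velocity v are assumptions of this illustration; practical implementations may differ.

```python
import numpy as np


def spreading_correction(trace, dt, v):
    """Convolve a trace with 1/sqrt(t) and multiply the result by v*sqrt(2t)."""
    n = len(trace)
    t = np.arange(n) * dt                                  # trace time axis
    kernel = 1.0 / np.sqrt((np.arange(n) + 0.5) * dt)      # 1/sqrt(t), half-sample shift
    filtered = np.convolve(trace, kernel)[:n] * dt         # causal convolution, first n samples
    return filtered * v * np.sqrt(2.0 * t)


dt, v = 0.004, 1500.0
impulse = np.zeros(250)
impulse[0] = 1.0
out = spreading_correction(impulse, dt, v)
print(out[:3])
```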
Examination of the field data showed strong ringing following the arrival of the direct
waves, which is possibly caused by receiver characteristics. To reduce the influence of
these dominant signals, the time window containing the direct wave and the subsequent
single reflections has been excluded from the inversion. We therefore only use refracted
wave signals and multiple reflections.
3.1.4 Results
The inverted model of wave velocity vP shows mainly horizontally oriented structures.
In the central part of the profile, high-velocity anomalies become visible (see Fig. 2).
The depth of the recovered anomalies coincides with the location of the
bottom-simulating reflector (BSR) horizon. This, together with a velocity increase of
about 200 m/s, indicates the presence of hydrated sediments. However, no indication
of a gas layer is observed.
Fig. 2 Top: inverted vP model, OBS locations (white circles), location of depth profiles shown
below (red dashed lines); bottom: depth profiles of starting (blue) and final (red) vP -model, seafloor
(white line), depth of BSR horizon (black dashed lines)
The inversion achieves a good overall match between the field data and the seismograms
modeled for the final model (see Fig. 3). The direct-wave phases, which were not
considered in the inversion, are also fitted well.
3.1.5 Summary
Fig. 3 Exemplary seismograms of OBS 3: measured field data (red), synthetic data for the final
vP -model (black); time window used for the inversion is marked by the shaded red area
The field-data application of FWI is still challenging and not yet common practice. 2D
acoustic FWI is usually used to update a kinematic subsurface model and to improve
the results of standard seismic imaging methods. In this work, we apply 2D acoustic
FWI to a marine seismic data set acquired in a river delta using ocean bottom cables
(OBC). The aim of the seismic survey was to characterise a deep oil and gas deposit.
The objective of this work is an improved reconstruction of the near-surface region.
Here, rising gases may be trapped beneath impermeable sediments and accumulate in
gas “pockets” that reduce seismic velocities, which is a potential source of difficulties
for standard imaging methods.
The field data were acquired in the OBC geometry illustrated in Fig. 5a. Two hundred
and forty hydrophones were placed on the sea floor, whereas the source array (airguns)
was towed by a ship and triggered near the sea surface. Since the forward modeling
solves the 2D wave equation, we perform a 3D-to-2D transformation [17] of the
recorded field data (example shown in Fig. 4 (left)).
[Fig. 4 panels (left, right): time in s versus offset in km]
Fig. 4 Left: exemplary trace-normalized field data seismogram; right: windowed and filtered field
data seismogram used in FWI; both seismograms belong to source 13, located at x = 2.5 km (see
Fig. 5a)
The final vP model, shown in Fig. 5b, has a satisfactory resolution in the centre
of the model (3 km ≤ x ≤ 9 km and depths up to 1.2 km). Here, FWI recovered
several geological structures that can be interpreted meaningfully. The remaining
model areas are poorly illuminated. In shallow parts beneath the receiver array we
identify layered structures including two types of significant low-velocity zones.
On the one hand, rising gases may form accumulations at impenetrable layers (e.g., “gas pockets” close
to the seafloor between x = 4 km and x = 6 km), reducing the seismic velocities. On the other hand,
we find dipping structures representing geological fault zones, which cause both the rising gases
and the accumulations.
Seismic Applications of Full Waveform Inversion 655
[Fig. 5 panels (a) and (b): depth versus x in km (0 to 12 km); source positions marked in (a)]
Fig. 5 (a) Starting vP model based on a provided traveltime tomography. The coloured markers
illustrate the acquisition geometry. Sources (gray) were triggered at the sea surface and
hydrophones (red) were located at the sea floor. (b) Final vP model obtained from FWI. The
dashed rectangle represents the section in (c). (c) Overlay of final vP model and the provided
result of a reflection seismic imaging method. Examples of high similarity between both methods
are highlighted by red markers
We validated the quality of the velocity model by comparing it with the result
of a conventional reflection seismic imaging method (pre-stack depth migration),
shown in Fig. 5c. Apart from their different wavelength (or frequency) content,
the two results match very well, in particular at fault zones and in areas of
strong seismic contrast, such as gas accumulations.
3.2.4 Summary
Salt bodies have proved to be promising sites in the search for hydrocarbons. For
classical imaging techniques, the reconstruction of structures beneath or near salt
bodies is challenging. One reason is the high reflection coefficient at the
salt-sediment interface, which results in only weak scattered energy returning from
the subsalt regions. Additional reasons are the complex shape of the salt bodies,
sediments trapped in the salt body and a rugose surface. These characteristics lead
to complex wavefields and regions with poor illumination. FWI, which is capable of
using weakly scattered waves travelling in complex velocity models, can offer a
solution to these problems. In this work we explore the performance of 2D acoustic
and elastic FWI in the time domain for subsalt imaging.
For this we use field data provided by PGS (a marine 2D line). The profile is
265 km long with a total of 5300 shots. The model is 15 km deep and, for easier
handling, it was divided into three subparts. Only the right subpart is shown in
the following. This subpart has a size of 88.5 × 12 km and a grid spacing of
12.5 m; the record length is 12 s, and 804 receivers are used for each of the
99 source points (moving-streamer geometry).
3.3.2 Modeling
The initial velocity model (provided by PGS) includes a salt layer, a salt body and
a velocity gradient as background (Fig. 7). The original acquisition geometry was
extracted and used for the modeling. To validate the acquisition geometry and to verify
the suitability of the starting model, Fig. 6 compares one acoustically modeled shot
with the corresponding field-data shot. The good match of the main events shows
that the starting model is sufficient.
The resolution in different parts of the model is influenced by the wave coverage
and the velocity in the model. To estimate the resolution, the true model is perturbed
with a chequerboard pattern (Fig. 7). Each block has a size of 300 m × 300 m and
carries a velocity perturbation of plus or minus 2 % of the true velocity. The starting
model corresponds to the true model without the perturbations. The inversion
is performed purely acoustically in a frequency band of 3–10 Hz. The result is displayed
in Fig. 8. The precise imaging of the chequerboard shows good illumination of
the target area (subsalt) for the given acquisition geometry.
Fig. 6 Comparison of field data (a) and acoustically modeled data (b)
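The chequerboard test described above can be set up in a few lines. The sketch assumes the velocity model is stored as a NumPy array (rows = depth, columns = distance) on a uniform grid with the 12.5 m spacing of the field-data model, so that each 300 m block spans 24 grid cells; the function name and array layout are our own illustrative choices, not the authors' code.

```python
import numpy as np


def checkerboard_perturbation(vp, dx, block=300.0, amplitude=0.02):
    """Perturb a 2D velocity model with a checkerboard of square blocks.

    vp: 2D array (rows = depth, columns = distance), dx: grid spacing in m,
    block: block edge length in m, amplitude: relative perturbation (+/-).
    """
    nz, nx = vp.shape
    iz = (np.arange(nz) * dx // block).astype(int)  # block index along depth
    ix = (np.arange(nx) * dx // block).astype(int)  # block index along distance
    # Alternate the sign of the perturbation from block to block.
    sign = np.where((iz[:, None] + ix[None, :]) % 2 == 0, 1.0, -1.0)
    return vp * (1.0 + amplitude * sign)
```

Inverting data computed in the perturbed model, starting from the unperturbed one, then shows directly where the blocks are (or are not) recovered.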
3.4.1 Motivation
Attenuation and dispersion of seismic waves play an important role and need to be
taken into account. From the first numerical implementations of viscoacoustic FWI [22]
to the most recent ones, both modeling and inversion have mostly been developed in
the frequency domain, exploiting its benefits, such as the straightforward implementation
of attenuation and the computation of gradients at no extra cost. Time-domain
FWI in attenuative media is less popular, because strictly constant Q within a wide
frequency band is not easy to implement; instead, we have to consider a sum of
relaxation mechanisms [14]. However, an advantage of time-domain implementations
is their efficient parallelizability. Our understanding of attenuation mechanisms and
our ability to obtain reliable Q estimates [9] are still limited. The problem
is compounded by the fact that scattering effects mimic intrinsic attenuation. This
raises the main question of this work: can spatial distributions of velocity and
attenuation be accurately recovered by applying FWI to synthetic marine reflection
data? We investigate the applicability of time-domain viscoacoustic FWI and show
numerical results using spatially uncorrelated models of velocity and attenuation.
3.4.4 Summary
[Fig. 9 panels (a) to (f)]
Fig. 9 vP model: true (a), initial (b), final (c); QP model: true (d), initial (e), final (f)
can be interpreted either as low sensitivity of the synthetic data to deeper parts or
as a cross-talk effect in which the QP-related data misfit is explained by the vP model.
In viscoacoustic inversion, the conventional assumption of correlated velocity
and attenuation structures in the subsurface may lead to an incorrect interpretation.
Our results make clear that (a) further development of inversion strategies is
necessary to extract the desired attenuation information from seismic data, and
(b) the investigation of multiparameter inverse problems with spatially uncorrelated
parameters has to be considered as a necessary step to verify the reliability of these
Shallow-seismic Rayleigh and Love waves are attractive for geotechnical site
investigations. They exhibit a high signal-to-noise ratio in field data recordings and
a high sensitivity to the S-wave velocity, an important geotechnical parameter
for characterizing the very shallow subsurface. Recently, full waveform inversion
(FWI) has been successfully applied to reconstruct shallow 2-D shear wave velocity
models using either Rayleigh waves (e.g., [8, 20]) or Love waves (e.g., [6, 16]). In
most publications Rayleigh waves have been used. The aims of this synthetic
study are (1) to compare the performance of individual waveform inversions of
Rayleigh or Love waves and (2) to explore the benefits of a simultaneous joint
inversion of both types of surface waves.
3.5.3 Results
The results of the FWI reconstruction tests are shown in Fig. 10. The 1-D
starting models consist of a linear gradient down to a depth of 9 m; below this
depth, the true and starting models are identical.
Individual Love wave FWI reconstructs the vS model satisfactorily; the
shallow anomaly is also reconstructed well. The ρ model is recovered surprisingly
well, especially since normalized seismograms were used and density mainly
affects the absolute wave amplitude as a function of offset.
The objective function is reduced by four orders of magnitude and the velocity
seismograms are fitted very well, leaving only small residuals. The
final Love wave FWI results for vS and ρ appear as smoothed versions of the true models.
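The remark about normalized seismograms can be made concrete with a trace-normalized L2 misfit. The sketch below assumes that each trace is scaled to unit L2 norm before comparison, which is one common choice; the exact normalization used in this study is not specified in this section, and the function name is hypothetical. Because a trace-wise amplitude factor cancels, only the waveform shape and the relative amplitudes within each trace constrain the model.

```python
import numpy as np


def normalized_l2_misfit(d_syn, d_obs, eps=1e-12):
    """Trace-normalized L2 misfit: sum over traces of 0.5*||s/|s| - o/|o|||^2.

    d_syn, d_obs: arrays of shape (n_traces, n_samples). Each trace is scaled
    to unit L2 norm (eps avoids division by zero for empty traces), so any
    positive scale factor applied to a whole trace does not change the misfit.
    """
    syn = d_syn / (np.linalg.norm(d_syn, axis=1, keepdims=True) + eps)
    obs = d_obs / (np.linalg.norm(d_obs, axis=1, keepdims=True) + eps)
    return 0.5 * np.sum((syn - obs) ** 2)
```

Scaling every synthetic trace by a different positive factor leaves this misfit at (numerically) zero, whereas a polarity flip of all traces yields the maximum value of 2 per trace.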
Individual Rayleigh wave FWI reconstructs the vS model similarly well
as the Love wave FWI. However, vS and ρ suffer from vertically oriented
artifacts underneath each source. These artifacts are most likely caused by the high
amplitudes and specific radiation of Rayleigh waves below the source locations
and are further enhanced in the gradients of vS by wrong P-wave velocities in the
vicinity of the sources. For the presented starting model, the reconstruction of vP was
limited to the upper two meters. The inverted seismograms show small residuals,
especially for the first arrivals. The objective function could be reduced by one order
of magnitude.
The results of the simultaneous joint FWI are also affected by the source artifacts
of the Rayleigh wave FWI. Nevertheless, the shallow trench is successfully reconstructed
in vS. However, the trench is not visible in vP and only partly visible in the
inverted ρ model. The general appearance of the final vP and ρ models is quite
similar to that of individual Rayleigh wave FWI. The fit of the velocity seismograms vx
and vz is slightly better than for individual Rayleigh wave FWI.
[Fig. 10 panels: columns True, Start, Love FWI, Rayleigh FWI, Joint FWI; depth in m; ρ in kg/m³; seismograms (vx, vy, vz) versus T in s; normalized L2 norm versus iteration]
Fig. 10 Results of surface wave reconstruction tests. True and starting models are shown in column 1 and 2, respectively. The final FWI results for Love wave,
Rayleigh wave and the joint FWI are shown in columns 3, 4 and 5, respectively. Below each FWI result, the seismograms for the true, starting and final models
are given. The source (receiver) position corresponding to the seismograms is labeled by a yellow star (red triangle). The evolution of the L2-norm (lower left
corner) is normalized to the maximal value in each case
3.5.4 Summary
Based on the field-data example shown in Sect. 3.2, we estimated the resource
consumption of one FWI application as follows:
• finite-difference discretization and 2D wave simulations:
– spatial discretization: 12,160 × 2,160 grid points (26 million grid points)
– 60,000 time steps for each simulation
– number of seismic sources: 50
– a total of 10,280 simulations within the whole FWI
• parallelization:
– domain decomposition: the 2D model is divided into 20 × 24 subdomains
– source parallelization: simulations for 5 sources are computed at once
– allocation of 2,400 CPU cores
• resource consumption and computational performance of the FWI framework:
– number of iterations: 72
– computation time for the whole FWI job: 38.6 h (92,640 core hours)
– memory consumption for wavefield storage: 835 MB/core (total: 1.9 TB)
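The figures in the list above are mutually consistent, as the short check below shows. All numbers are taken from the list; the only assumption is that MB and TB refer to binary units (MiB, TiB), under which 2,400 cores × 835 MB gives the quoted 1.9 TB (in decimal units the total would be about 2.0 TB).

```python
# Consistency check of the FWI resource estimate (all inputs from the list above).
nx, nz = 12_160, 2_160
grid_points = nx * nz                       # about 26 million grid points

subdomains = 20 * 24                        # domain decomposition per simulation
sources_at_once = 5                         # source parallelization
cores = subdomains * sources_at_once        # CPU cores allocated

wall_hours = 38.6
core_hours = wall_hours * cores             # total core hours of the FWI job

mem_per_core_mb = 835
total_mem_tib = mem_per_core_mb * cores / 1024**2  # assuming binary units

print(grid_points, cores, round(core_hours), round(total_mem_tib, 2))
```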
Acknowledgements The scientific projects in this work are kindly supported by the sponsors
of the Wave Inversion Technology (WIT) Consortium and funded by BWWi (grant number
03SX381C). We also gratefully acknowledge financial support by the Deutsche Forschungsge-
meinschaft (DFG) through CRC 1173. The computations were performed on the computational
resource “ForHLR Phase I” funded by the Ministry of Science, Research and the Arts Baden-
Württemberg and DFG (“Deutsche Forschungsgemeinschaft”).
1. Blanch, J., Robertsson, J., Symes, W.: Modeling of a constant Q: methodology and algorithm
for an efficient and optimally inexpensive viscoelastic technique. Geophysics 60(1), 176–184 (1995)
2. Bohlen, T.: Parallel 3-D viscoelastic finite difference seismic modeling. Comput. Geosci. 28,
887–899 (2002)
3. Bunks, C., Saleck, F.M., Zaleski, S., Chavent, G.: Multiscale seismic waveform inversion.
Geophysics 60(5), 1457–1473 (1995)
4. Butzer, S., Kurzmann, A., Bohlen, T.: 3D elastic full-waveform inversion of small-scale
heterogeneities in transmission geometry. Geophys. Prospect. 61(6), 1238–1251 (2013)
5. Choi, Y., Alkhalifah, T.: Application of multi-source waveform inversion to marine streamer
data using the global correlation norm. Geophys. Prospect. 60(4), 748–758 (2012)
6. Dokter, E., Köhn, D., Wilken, D., Rabbel, W.: Application of elastic 2D waveform inversion to
a near surface SH-wave dataset. In: 76th EAGE Conference and Exhibition (2014)
7. Forbriger, T.: Inversion flachseismischer Wellenfeldspektren. Dissertation, Stuttgart, Univer-
sity of Stuttgart (2001)
8. Groos, L., Schäfer, M., Forbriger, T., Bohlen, T.: The role of attenuation in 2D full-waveform
inversion of shallow-seismic body and Rayleigh waves. Geophysics 79(6), R247–R261 (2014)
9. Kamei, R., Pratt, R.G.: Inversion strategies for visco-acoustic waveform inversion. Geophys. J.
Int. 194(2), 859–884 (2013)
10. Köhn, D., De Nil, D., Kurzmann, A., Przebindowska, A., Bohlen, T.: On the influence of model
parametrization in elastic full waveform tomography. Geophys. J. Int. 191(1), 325–345 (2012)
11. Kurzmann, A.: Applications of 2D and 3D full waveform tomography in acoustic and
viscoacoustic complex media. Dissertation, Karlsruhe Institute of Technology, Karlsruhe
12. Kurzmann, A., Köhn, D., Przebindowska, A., Nguyen, N., Bohlen, T.: 2D acoustic full wave-
form tomography: performance and optimization. In: 71st EAGE Conference and Technical
Exhibition (2009)
13. Kurzmann, A., Shigapov, R., Bohlen, T.: Viscoacoustic full waveform inversion for spatially
correlated and uncorrelated problems in reflection seismics. In: 77th EAGE Conference and
Exhibition (2015)
14. Liu, H.-P., Anderson, D.L., Kanamori, H.: Velocity dispersion due to anelasticity; implications
for seismology and mantle composition. Geophys. J. Int. 47(1), 41–58 (1976)
15. Mora, P.: Nonlinear two-dimensional elastic inversion of multioffset seismic data. Geophysics
52, 1211–1228 (1987)
16. Pan, Y., Xia, J., Xu, Y., Gao, L., Xu, Z.: Love-wave waveform inversion in time domain for
shallow shear-wave velocity. Geophysics 81(1), R1–R14 (2016)
17. Pica, A., Diet, J.P., Tarantola, A.: Nonlinear inversion of seismic reflection data in a laterally
invariant medium. Geophysics 55(3), 284–292 (1990)
18. Plessix, R.-E.: A review of the adjoint-state method for computing the gradient of a functional
with geophysical applications. Geophys. J. Int. 167(2), 495–503 (2006)
19. Pratt, R.: Seismic waveform inversion in the frequency domain, Part 1: theory and verification
in a physical scale model. Geophysics 64, 888–901 (1999)
20. Schäfer, M., Groos, L., Forbriger, T., Bohlen, T.: Line-source simulation for shallow-seismic
data. Part 2: full-waveform inversion – a synthetic 2-D case study. Geophys. J. Int. 198(3),
1405–1418 (2014)
21. Sirgue, L., Pratt, R.G.: Efficient waveform inversion and imaging: a strategy for selecting
temporal frequencies. Geophysics 69(1), 231–248 (2004)
22. Song, Z., Williamson, P., Pratt, R.: Frequency-domain acoustic-wave modeling and inversion
of crosshole data: Part II – inversion method, synthetic experiments and real-data results.
Geophysics 60(3), 796–809 (1995)
23. Tarantola, A.: Inversion of seismic reflection data in the acoustic approximation. Geophysics
49, 1259–1266 (1984)
24. Virieux, J.: P-SV wave propagation in heterogeneous media: velocity-stress finite-difference
method. Geophysics 51(4), 889–901 (1986)
25. Virieux, J., Operto, S.: An overview of full-waveform inversion in exploration geophysics.
Geophysics 74, WCC1–WCC26 (2009)
A Massively Parallel Multigrid Method
with Level Dependent Smoothers for Problems
with High Anisotropies
1 Introduction
2 Problem Description
The biological question motivating the construction of our solver is the numerical
simulation of substance transport through the human skin. Such simulations support
the risk assessment of chemical exposures and, at the same time, can reduce the
need for in vitro and in vivo testing. However, the
special structure of the human skin poses several numerical challenges due to the
anisotropic geometry and physical coefficients varying by orders of magnitude, cf.,
e.g., [8, 11]. The uppermost part of the skin, called stratum corneum (SC), consists
of multiple layers of cells (corneocytes) which are connected by thin channels
(lipid layers) (cf. Fig. 1). Since the parameters affecting transport and diffusivity of
a substance vary strongly between these subdomains, both have to be considered in a
simulation. For benchmark purposes we solve a modified heat equation

$$\partial_t (K u) + \nabla \cdot (-D K \nabla u) = 0$$

in a computational domain $\Omega = \Omega_{\text{lip}} \cup \Omega_{\text{cor}}$. Here, $u$ corresponds to the chemical
activity, and $K = K(x)$ and $D = D(x)$ are the spatially dependent partition and diffusion
coefficients, respectively. Concentrations are given by $c = K u$ and may exhibit
discontinuities, which reflect domain-dependent variations in lipophilicity and
hydrophilicity. For simplicity we used $K = 1$ and

$$D(x) = \begin{cases} 1 & \text{if } x \in \Omega_{\text{lip}}, \\ 10 & \text{if } x \in \Omega_{\text{cor}}. \end{cases}$$
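Because D jumps by an order of magnitude between lipid and corneocyte regions, the discretization has to treat the material interfaces consistently. As a hedged 1D toy analogue (not the 3D finite element discretization used in this work), the sketch below solves the steady-state problem −(D u′)′ = 0 with K = 1 across alternating layers, using cell-centred finite volumes with harmonic averaging of D at cell faces. For piecewise constant D this yields the exact flux, so the effective diffusivity across the layers is the thickness-weighted harmonic mean of D. The function name and boundary values are illustrative assumptions.

```python
import numpy as np


def steady_state_flux(D, length=1.0, u_left=1.0, u_right=0.0):
    """Solve -(D u')' = 0 on [0, length] for piecewise constant D (one value per cell).

    Cell-centred finite volumes on equal cells; Dirichlet values are imposed on the
    two outer faces. Returns the cell values u and the (constant) flux -D u'.
    """
    D = np.asarray(D, dtype=float)
    n = D.size
    h = length / n
    # Interior faces: two half cells in series, i.e. harmonic averaging of D.
    t_int = 1.0 / (h / (2.0 * D[:-1]) + h / (2.0 * D[1:]))
    # Boundary faces: half a cell between the cell centre and the boundary.
    t_left = 2.0 * D[0] / h
    t_right = 2.0 * D[-1] / h

    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(n):
        if i == 0:
            A[i, i] += t_left
            b[i] += t_left * u_left
        else:
            A[i, i] += t_int[i - 1]
            A[i, i - 1] -= t_int[i - 1]
        if i == n - 1:
            A[i, i] += t_right
            b[i] += t_right * u_right
        else:
            A[i, i] += t_int[i]
            A[i, i + 1] -= t_int[i]

    u = np.linalg.solve(A, b)
    flux = t_left * (u_left - u[0])  # flux through the left boundary face
    return u, flux
```

With ten cells of alternating D = 1 and D = 10 (unit length, u = 1 on the left and u = 0 on the right), the flux equals 1/0.55 ≈ 1.82, the harmonic mean of D, far below the arithmetic mean of 5.5 that would govern transport parallel to the layers.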
Massive Parallel Multigrd Method for Problems with High Anisotropies 669
Fig. 2 Left: Individual hexahedral elements used for the brick-and-mortar FE-grid. Right:
Anisotropic coarse grid of the 3d brick-and-mortar geometry (1280 elements)
For this study we focused on the idealized but still realistic brick-and-mortar
domain. To this end, we used finite element grids consisting of highly anisotropic
elements. This level of anisotropy is required to reduce the number of elements
in the coarse grids, while still capturing the topology and morphology of the
underlying domain.
For the considered geometry, only hexahedral elements with varying degrees
of anisotropy were used. The elements used to construct the coarse grid and an
overview of the resulting FE grid are depicted in Fig. 2. In the given example the
highest aspect ratio is 1:15; however, the presented methods also work for much higher
aspect ratios.
The steady state solution of the regarded problem setup is depicted in Fig. 3.
670 S. Reiter et al.
Fig. 3 Cut through the domain showing the steady state solution on level 5
3 Solver Setup
In [13] we employed a massively parallel geometric multigrid solver for the Poisson
problem on grids with isotropic cells. While the rather simple Jacobi smoother
used in those studies is fast and perfectly scalable, it is not suitable for anisotropic
problems, since its smoothing properties deteriorate in this case. Iterative methods
like the ILU method on the other hand are known to possess good smoothing
properties also for highly anisotropic problems (cf. [2, 5, 18]). However, for most
applicable methods an efficient parallel implementation is not feasible. The typical
strategy for parallelization is then to employ those methods in Block-Jacobi-type
fashion, i.e., on each process the more sophisticated smoother is executed locally,
while interprocess couplings are treated using a Jacobi method.
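The deterioration of Jacobi smoothing for anisotropic problems can be quantified with local Fourier analysis. The sketch below assumes the standard 2D model problem −ε u_xx − u_yy = f discretized with the 5-point stencil on a uniform grid, a stand-in for (not a description of) the 3D brick-and-mortar problem; the damping ω = 0.8 and the function name are illustrative choices.

```python
import numpy as np


def jacobi_smoothing_factor(eps, omega=0.8, n=129):
    """Smoothing factor of damped Jacobi for -eps*u_xx - u_yy (5-point stencil).

    Local Fourier analysis: the error-propagation symbol is
        S(t1, t2) = 1 - omega * (1 - (eps*cos(t1) + cos(t2)) / (1 + eps)),
    and the smoothing factor is the maximum of |S| over the high frequencies,
    max(|t1|, |t2|) >= pi/2, sampled on an n x n grid of [-pi, pi]^2.
    """
    t = np.linspace(-np.pi, np.pi, n)
    t1, t2 = np.meshgrid(t, t, indexing="ij")
    symbol = 1.0 - omega * (1.0 - (eps * np.cos(t1) + np.cos(t2)) / (1.0 + eps))
    high = np.maximum(np.abs(t1), np.abs(t2)) >= np.pi / 2 - 1e-12
    return float(np.max(np.abs(symbol[high])))
```

For ε = 1 the smoothing factor is 0.6, whereas for ε = 10⁻³ it is about 0.998: high-frequency error components along the weakly coupled direction are hardly damped, which is the behavior that motivates ILU-type smoothers for strongly anisotropic problems.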
While this setup works nicely for smaller numbers of processes, the iteration counts
typically increase with the number of processes involved. Moreover,
load imbalances can have a severe impact on the runtime of solver initializations,
and such imbalances are typical of massively parallel simulations on
unstructured grids. This holds in particular when setup times do not grow like
O(n), as is the case, e.g., for a threshold-based ILUT.
In order to construct an optimal solver which can handle high anisotropies
while still providing nearly optimal scalability, we combined the benefits of the
Jacobi smoother for isotropic elements with the efficiency and robustness of the
ILU smoothers for anisotropic problems. This combination is possible using a
special refinement technique, which reduces the anisotropy of elements with each
refinement until the resulting grid can be considered isotropic. The anisotropic and
isotropic refinement rules used for the elements from Fig. 2 are depicted in Fig. 4.
In each refinement step, only edges that are longer than a certain threshold
are refined. Starting with a threshold of half the length of the longest edge,
the threshold is halved after each step so that shorter edges will be refined in the
next iteration. For the shown brick-and-mortar geometry, this technique leads to an
isotropic grid after a certain number of refinements, depending on the highest aspect
ratio.
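The threshold rule can be simulated for a single element. The sketch assumes that an element is characterized by one representative edge length per direction and that the grid counts as isotropic from the first step in which every edge is refined: from then on, edge lengths and threshold halve together, so the rule coincides with regular refinement. The function name is hypothetical.

```python
def anisotropic_refinement_steps(edge_lengths):
    """Simulate the threshold-based anisotropic refinement for one element.

    edge_lengths: one representative edge length per direction (assumption).
    Only edges longer than the threshold are bisected; the threshold starts at
    half the longest edge and is halved after every step. Returns the number of
    anisotropic steps performed before the first step that refines every edge,
    together with the edge lengths at that point.
    """
    edges = [float(e) for e in edge_lengths]
    threshold = max(edges) / 2.0
    steps = 0
    while not all(e > threshold for e in edges):
        edges = [e / 2.0 if e > threshold else e for e in edges]
        threshold /= 2.0
        steps += 1
    # From here on every edge exceeds the threshold; since both halve together,
    # all subsequent steps are regular (isotropic) refinements.
    return steps, edges
```

For the aspect ratio of 1:15 quoted above, `anisotropic_refinement_steps([15, 1])` performs 3 anisotropic steps, leaving an aspect ratio of 1.875 before regular refinement takes over. Using ILU on the levels produced by the anisotropic steps and Jacobi on the later ones is how we read the level-dependent smoother idea, not a statement of the exact configuration used.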
Fig. 4 Anisotropic refinement schemes for different shapes. Black edges are introduced during
4 Parallelization
5 Results
A weak scaling study was performed on the Cray XC40 supercomputer Hazel
Hen at HLRS Stuttgart, which features 7712 compute nodes, each with 128 GB
of memory and 24 cores (48 virtual cores with hyperthreading), and a peak
performance of 7420 TFlops.
The study was performed by solving the brick-and-mortar model of the human skin
described above with the presented solver setup. To allow for better comparability
of the different runs, the number of outer CG iterations was fixed to 12,
which resulted in a relative reduction of the defect by approximately 10^6 in all
runs. The study starts on 1 process, and for each subsequent run we refine the grid
once more using regular refinement, thus increasing the number of elements by a
factor of 8. At the same time we also increase the number of involved processes by
a factor of 8 to guarantee a constant workload per process for all runs. Since we
executed the parallel base-solver of the outer multigrid method on level 4, our study
starts with level 5. Table 2 shows the number of unknowns and the run times of the
different runs. The scaling behavior of assembly, solver initialization, and solving is
also shown in Fig. 5.
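The weak-scaling behavior in Table 2 can be summarized by two numbers per run: the load per process and the solver efficiency relative to the smallest run. The short script below uses only values copied from Table 2; defining efficiency as T_sol(8)/T_sol(P) is our choice of metric.

```python
# Weak-scaling metrics from Table 2 (values copied from the table).
pes = [8, 64, 512, 4096, 32_768]
dofs = [522_720, 4_181_760, 33_454_080, 267_632_640, 2_141_061_120]
t_sol = [6.58, 6.95, 7.25, 7.15, 7.60]

dofs_per_pe = [d // p for d, p in zip(dofs, pes)]  # load per process
efficiency = [t_sol[0] / t for t in t_sol]         # 1.0 = ideal weak scaling

for p, n, e in zip(pes, dofs_per_pe, efficiency):
    print(f"{p:>6} PEs  {n} DoFs/PE  solver efficiency {e:.2f}")
```

Every run has exactly 65,340 DoFs per process, and the solver efficiency at 32,768 processes is about 0.87 relative to the 8-process run.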
Table 3 gives an overview of the levelwise distribution qualities. The distribution
quality q_l of a level l of the hierarchy is computed as

$$q_l := \frac{n_l^{\text{total}} - n_l^{\max}}{n_l^{\max}\,(P_l - 1)}, \qquad n_l^{\text{total}} := \sum_{p} n_l^{p}, \qquad n_l^{\max} := \max_{p} n_l^{p},$$

where $n_l^{p}$ is the local size of level l on process p and $P_l$ is the number of processes used on level l (cf. Table 4).

Table 2 Each line corresponds to an individual run. Recorded are the number of processes (PEs),
the number of levels (Levels), the number of unknowns (DoFs), the run times of assembly (Tass),
solver initialization (Tini), and solving (Tsol)
PEs     Levels  DoFs           Tass (s)  Tini (s)  Tsol (s)
8       6       522,720        0.48      1.38      6.58
64      7       4,181,760      0.80      2.03      6.95
512     8       33,454,080     0.85      2.10      7.25
4096    9       267,632,640    0.87      2.10      7.15
32,768  10      2,141,061,120  0.86      2.13      7.60
[Fig. 5 plot: time (s) versus processes (8, 64, 512, 4k, 32k; k = 1024)]
Fig. 5 Scaling of the run times of assembly (ass), solver initialization (ini), and solving (sol)
Table 3 Distribution qualities for each level of the multigrid hierarchy for the different runs
PEs \ Level  0  1  2     3     4     5     6     7     8     9
8            1  1  1     1     1     1     –     –     –     –
64           1  1  0.92  0.93  0.94  0.99  0.99  –     –     –
512          1  1  1     0.91  0.95  0.94  0.94  0.94  –     –
4096         1  1  1     0.91  0.95  0.89  0.89  0.89  0.89  –
32,768       1  1  1     0.74  0.83  0.93  0.78  0.78  0.8   0.8
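The distribution quality can be evaluated directly from the per-process sizes of a level. The sketch assumes q = (n_total − n_max)/(n_max (P − 1)), our reading of the distribution-quality formula given above, whose printed form was damaged in extraction; it returns 1 for a perfectly balanced level and 0 if a single process holds the whole level. Returning 1 for a level held by a single process matches levels 0 and 1 in Table 3 but is our assumption, as is the function name.

```python
def distribution_quality(loads):
    """Distribution quality of one level from its per-process sizes.

    Assumed formula: q = (n_total - n_max) / (n_max * (P - 1)), where P is the
    number of processes holding the level. q = 1 means perfect balance, q = 0
    means one process holds everything. Single-process levels return 1.
    """
    p = len(loads)
    if p == 1:
        return 1.0
    n_total, n_max = sum(loads), max(loads)
    return (n_total - n_max) / (n_max * (p - 1))
```

For example, four processes holding 10, 10, 10 and 20 elements give q = 0.5: the overloaded process dominates the runtime of every level-local operation.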
Table 4 Number of processes used on each level for the individual runs
PEs \ Level  0  1  2   3    4    5     6       7       8       9
8            1  1  8   8    8    8     –       –       –       –
64           1  1  64  64   64   64    64      –       –       –
512          1  1  1   256  256  512   512     512     –       –
4096         1  1  1   256  256  4096  4096    4096    4096    –
32,768       1  1  1   256  256  256   16,384  16,384  32,768  32,768
Both matrix assembly and solver initialization are performed process-locally.
The increase in the run times Tass and Tini from 8 to 64 processes is related to the slight
load imbalance that can be observed for higher process numbers (cf. Table 3).
Nevertheless, the scaling behavior of both assembly and initialization is very good
and perfectly suited for large scale parallel runs.
The solver scalability is satisfactory as well. The run times Tsol increase slightly
with the number of processes, and two effects are at play here. First, the distribution
quality of the grid hierarchy deteriorates as the number of processes grows. This
means that, due to the slight load imbalance, some processes have more work to do than
others in each program section. Such a slight imbalance is to be expected for an
unstructured grid in which no special properties can be exploited for partitioning.
On the other hand, for runs up to 256 processes the increasing parallelization
of the intermediate base-solver on level 4 has a positive effect on the total solver run time.
As demonstrated in [13], the underlying multigrid implementation in UG4 has
nearly optimal scaling properties for perfectly balanced grids. The slightly worse
scaling properties in the study at hand are thus likely to be linked to the observed
load-imbalance. Nevertheless, given the complexity of the problem at hand we think
that the achieved run-times are still convincing. The achieved results demonstrate
the applicability of the presented approach to gain insight into complex biological
processes through high-resolution numerical simulations on massively parallel
Acknowledgements We thank the HLRS for the opportunity to use Hazel Hen and their kind
1. Baker, A.H., Falgout, R.D., Kolev, T.V., Yang, U.M.: Multigrid smoothers for ultra-parallel
computing. SIAM J. Sci. Comput. 33, 2864–2887 (2011)
2. Bastian, P., Wittum, G.: Adaptive multigrid methods: the UG concept. In: Adaptive Methods –
Algorithms, Theory and Applications: Proceedings of the Ninth GAMM-Seminar, Kiel, 22–24
Jan 1993, pp. 17–37. Vieweg+Teubner Verlag, Wiesbaden (1994)
3. Bastian, P., Blatt, M., Scheichl, R.: Algebraic multigrid for discontinuous Galerkin discretiza-
tions of heterogeneous elliptic problems. Numer. Linear Algebra Appl. 19(2), 367–388 (2012)
4. Bergen, B., Gradl, T., Rüde, U., Hülsemann, F.: A massively parallel multigrid method for
finite elements. Comput. Sci. Eng. 8(6), 56–62 (2006)
5. Bramble, J., Zhang, X.: Uniform convergence of the multigrid v-cycle for an anisotropic
problem. Math. Comput. 70(234), 453–470 (2001)
6. Gmeiner, B., Köstler, H., Stürmer, M., Rüde, U.: Parallel multigrid on hierarchical hybrid grids:
a performance study on current high performance computing clusters. Concurr. Comput.: Pract.
Exp. 26(1), 217–240 (2014)
7. Hackbusch, W.: Multi-grid Methods and Applications, vol. 4. Springer, Berlin/New York
8. Heisig, M., Lieckfeldt, R., Wittum, G., Mazurkevich, G., Lee, G.: Non steady-state descriptions
of drug permeation through stratum corneum. I. The biphasic brick-and-mortar model. Pharm.
Res. 13(3), 421–426 (1996)
9. Heppner, I., Lampe, M., Nägel, A., Reiter, S., Rupp, M., Vogel, A., Wittum, G.: Software
framework ug4: parallel multigrid on the hermit supercomputer. In: High Performance
Computing in Science and Engineering 12, pp. 435–449. Springer, Berlin/London (2013)
10. Müller, E.H., Scheichl, R.: Massively parallel solvers for elliptic partial differential equations
in numerical weather and climate prediction. Q. J. R. Meteorol. Soc. 140(685), 2608–2624
11. Nägel, A., Heisig, M., Wittum, G.: Detailed modeling of skin penetration–an overview. Adv.
Drug Deliv. Rev. 65(2), 191–207 (2013) Modeling the human skin barrier – towards a better
understanding of dermal absorption
12. Reiter, S.: Effiziente Algorithmen und Datenstrukturen für die Realisierung von adaptiven,
hierarchischen Gittern auf massiv parallelen Systemen. PhD thesis, Universität Frankfurt am
Main (2014)
13. Reiter, S., Vogel, A., Heppner, I., Rupp, M., Wittum, G.: A massively parallel geometric
multigrid solver on hierarchically distributed grids. Comput. Vis. Sci. 16(4), 151–164 (2013)
14. Sampath, R.S., Biros, G.: A parallel geometric multigrid method for finite elements on octree
meshes. SIAM J. Sci. Comput. 32, 1361–1392 (2010)
15. Sundar, H., Biros, G., Burstedde, C., Rudi, J., Ghattas, O., Stadler, G.: Parallel geometric-
algebraic multigrid on unstructured forests of octrees. In: Proceedings of the International
Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12,
pp. 43:1–43:11, Los Alamitos. IEEE Computer Society Press (2012)
16. Vogel, A., Reiter, S., Rupp, M., Nägel, A., Wittum, G.: UG 4: a novel flexible software system
for simulating PDE based models on high performance computers. Comput. Vis. Sci. 16(4),
165–179 (2013)
17. Williams, S., Lijewski, M., Almgren, A., Van Straalen, B., Carson, E., Knight, N., Demmel,
J.: s-step Krylov subspace methods as bottom solvers for geometric multigrid. In:
28th International Parallel and Distributed Processing Symposium, pp. 1149–1158. IEEE,
Piscataway (2014)
18. Wittum, G.: On the robustness of ILU smoothing. SIAM J. Sci. Stat. Comput. 10(4), 699–717