Search | arXiv e-print repository

Energy efficiency of finite difference algorithms on multicore CPUs, GPUs, and Intel Xeon Phi processors

Authors: Satya P. Jammy, Christian T. Jacobs, David J. Lusher, Neil D. Sandham

Abstract: In addition to hardware wall-time restrictions commonly seen in high-performance computing systems, it is likely that future systems will also be constrained by energy budgets. In the present work, finite difference algorithms of varying computational and memory intensity are evaluated with respect to both energy efficiency and runtime on an Intel Ivy Bridge CPU node, an Intel Xeon Phi Knights Landing processor, and an NVIDIA Tesla K40c GPU. The conventional way of storing the discretised derivatives to global arrays for solution advancement is found to be inefficient in terms of energy consumption and runtime. In contrast, a class of algorithms in which the discretised derivatives are evaluated on-the-fly or stored as thread-/process-local variables (yielding high compute intensity) is optimal both with respect to energy consumption and runtime. On all three hardware architectures considered, a speed-up of ~2 and an energy saving of ~2 are observed for the highly compute-intensive algorithms compared to the memory-intensive algorithm. The energy consumption is found to be proportional to runtime, irrespective of the power consumed, and the GPU has an energy saving of ~5 compared to the same algorithm on a CPU node.

Submitted 27 September, 2017; originally announced September 2017.

Comments: Submitted to Computers and Fluids

arXiv:1704.08368 [pdf, ps, other]

doi 10.1002/fld.4395

Surface-sampled simulations of turbulent flow at high Reynolds number

Authors: Neil D. Sandham, Roderick Johnstone, Christian T. Jacobs

Abstract: A new approach to turbulence simulation, based on a combination of large-eddy simulation (LES) for the whole flow and an array of non-space-filling quasi-direct numerical simulations (QDNS), which sample the response of near-wall turbulence to large-scale forcing, is proposed and evaluated. The technique overcomes some of the cost limitations of turbulence simulation, since the main flow is treated with a coarse-grid LES, with the equivalent of wall functions supplied by the near-wall sampled QDNS. Two cases are tested, at friction Reynolds number Re$_τ$=4200 and 20,000. The total grid node count is less than half a million for the first case and less than two million for the second case, with the calculations only requiring a desktop computer. Good agreement with published DNS is found at Re$_τ$=4200, both in terms of the mean velocity profile and the streamwise velocity fluctuation statistics, which correctly show a substantial increase in near-wall turbulence levels due to a modulation of near-wall streaks by large-scale structures. The trend continues at Re$_τ$=20,000, in agreement with experiment, which represents one of the major achievements of the new approach. A number of detailed aspects of the model, including numerical resolution, LES-QDNS coupling strategy and sub-grid model, are explored. A low level of grid sensitivity is demonstrated for both the QDNS and LES aspects. Since the method does not assume a law of the wall, it can in principle be applied to flows that are out of equilibrium.

Submitted 26 April, 2017; originally announced April 2017.

Comments: Author accepted version. Accepted for publication in the International Journal for Numerical Methods in Fluids on 26 April 2017

Journal ref: International Journal for Numerical Methods in Fluids 85(9):525-537, 2017

arXiv:1610.09146 [pdf, other]

Performance evaluation of explicit finite difference algorithms with varying amounts of computational and memory intensity

Authors: Satya P. Jammy, Christian T. Jacobs, Neil D. Sandham

Abstract: Future architectures designed to deliver exascale performance motivate the need for novel algorithmic changes in order to fully exploit their capabilities. In this paper, the performance of several numerical algorithms, characterised by varying degrees of memory and computational intensity, is evaluated in the context of finite difference methods for fluid dynamics problems. It is shown that, by storing some of the evaluated derivatives as single thread- or process-local variables in memory, or recomputing the derivatives on-the-fly, a speed-up of ~2 can be obtained compared to traditional algorithms that store all derivatives in global arrays.

Submitted 28 October, 2016; originally announced October 2016.

Comments: Author accepted version. Accepted for publication in Journal of Computational Science on 27 October 2016
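
Note: the contrast drawn in the abstract above, between storing derivatives in global arrays and evaluating them on-the-fly as local variables, can be illustrated with a minimal sketch. This is not the authors' code. The model equation (1D advection-diffusion on a periodic grid with second-order central differences) and all names (`rhs_stored`, `rhs_on_the_fly`, `c`, `nu`) are assumptions chosen only for illustration.

```python
import math


def rhs_stored(u, dx, c, nu):
    """Memory-intensive variant: each derivative is first written to a
    full-size work array, then the arrays are combined in a final pass."""
    n = len(u)
    # u[i - 1] wraps to u[-1] at i = 0, giving a periodic left neighbour.
    dudx = [(u[(i + 1) % n] - u[i - 1]) / (2.0 * dx) for i in range(n)]
    d2udx2 = [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / dx**2 for i in range(n)]
    return [-c * dudx[i] + nu * d2udx2[i] for i in range(n)]


def rhs_on_the_fly(u, dx, c, nu):
    """Compute-intensive variant: derivatives are local scalars evaluated
    inside the update loop, so no derivative work arrays are stored."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        left, right = u[i - 1], u[(i + 1) % n]
        dudx = (right - left) / (2.0 * dx)
        d2udx2 = (right - 2.0 * u[i] + left) / dx**2
        out[i] = -c * dudx + nu * d2udx2
    return out


if __name__ == "__main__":
    n = 32
    dx = 2.0 * math.pi / n
    u = [math.sin(i * dx) for i in range(n)]
    a = rhs_stored(u, dx, c=1.0, nu=0.1)
    b = rhs_on_the_fly(u, dx, c=1.0, nu=0.1)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

Both functions return the same right-hand side; they differ only in how much intermediate data is written to and read back from memory. Interpreted Python will not reproduce the reported speed-up, which concerns compiled kernels on the hardware named in the papers.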

arXiv:1609.01277 [pdf, other]

doi 10.1016/j.jocs.2016.11.001

OpenSBLI: A framework for the automated derivation and parallel execution of finite difference solvers on a range of computer architectures

Authors: Christian T. Jacobs, Satya P. Jammy, Neil D. Sandham

Abstract: Exascale computing will feature novel and potentially disruptive hardware architectures. Exploiting these to their full potential is non-trivial. Numerical modelling frameworks involving finite difference methods are currently limited by the 'static' nature of the hand-coded discretisation schemes and may have to be rewritten repeatedly to run efficiently on new hardware. In contrast, OpenSBLI uses code generation to derive the model's code from a high-level specification. Users focus on the equations to solve, whilst not concerning themselves with the detailed implementation. Source-to-source translation is used to tailor the code and enable its execution on a variety of hardware.

Submitted 14 November, 2016; v1 submitted 5 September, 2016; originally announced September 2016.

Comments: Author accepted version, with a small amendment: the link in the "Code Availability" section has been updated, and now refers to the OpenSBLI source code repository on GitHub. Accepted for publication in the Journal of Computational Science on 8 November 2016

Journal ref: Journal of Computational Science 18 (2017) 12-23

Showing 1–4 of 4 results for author: Sandham, N D