-
Automated MPI code generation for scalable finite-difference solvers
Authors:
George Bisbas,
Rhodri Nelson,
Mathias Louboutin,
Paul H. J. Kelly,
Fabio Luporini,
Gerard Gorman
Abstract:
Partial differential equations (PDEs) are crucial in modelling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs on a large scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed memory parallelism (DMP) to solve explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers based on a high-level symbolic math input. Users benefit from modelling simulations at a high-level symbolic abstraction and effortlessly harnessing HPC-ready distributed-memory parallelism without altering their source code. This results in drastic reductions both in execution time and developer effort. While the contributions of this work are implemented and integrated within the Devito framework, the DMP concepts and the techniques applied are generally applicable to any FD solver. A comprehensive performance evaluation of Devito's DMP via MPI shows highly competitive weak and strong scaling on the Archer2 supercomputer, demonstrating the effectiveness of the proposed approach in meeting the demands of large-scale scientific simulations.
Submitted 7 May, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
A Novel Immersed Boundary Approach for Irregular Topography with Acoustic Wave Equations
Authors:
Edward Caunt,
Rhodri Nelson,
Fabio Luporini,
Gerard Gorman
Abstract:
Irregular terrain has a pronounced effect on the propagation of seismic and acoustic wavefields but is not straightforwardly reconciled with structured finite-difference (FD) methods used to model such phenomena. Methods currently detailed in the literature are generally limited in scope application-wise or non-trivial to apply to real-world geometries. With this in mind, a general immersed boundary treatment capable of imposing a range of boundary conditions in a relatively equation-agnostic manner has been developed, alongside a framework implementing this approach, intending to complement emerging code-generation paradigms. The approach is distinguished by the use of N-dimensional Taylor-series extrapolants constrained by boundary conditions imposed at some suitably-distributed set of surface points. The extrapolation process is encapsulated in modified derivative stencils applied in the vicinity of the boundary, utilizing hyperspherical support regions. This method ensures boundary representation is consistent with the FD discretization: both must be considered in tandem. Furthermore, high-dimensional and vector boundary conditions can be applied without approximation prior to discretization. A consistent methodology can thus be applied across free and rigid surfaces with the first and second-order acoustic wave equation formulations. Application to both equations is demonstrated, and numerical examples based on analytic and real-world topography implementing free and rigid surfaces in 2D and 3D are presented.
Submitted 7 September, 2023;
originally announced September 2023.
-
Stride: a flexible platform for high-performance ultrasound computed tomography
Authors:
Carlos Cueto,
Oscar Bates,
George Strong,
Javier Cudeiro,
Fabio Luporini,
Oscar Calderon Agudo,
Gerard Gorman,
Lluis Guasch,
Meng-Xing Tang
Abstract:
Advanced ultrasound computed tomography techniques like full-waveform inversion are mathematically challenging and orders of magnitude more computationally expensive than conventional ultrasound imaging methods. This computational and algorithmic complexity, and a lack of open-source libraries in this field, represent a barrier preventing the generalised adoption of these techniques, slowing the pace of research and hindering reproducibility. Consequently, we have developed Stride, an open-source Python library for the solution of large-scale ultrasound tomography problems. On one hand, Stride provides high-level interfaces and tools for expressing the types of optimisation problems encountered in medical ultrasound tomography. On the other, these high-level abstractions seamlessly integrate with high-performance wave-equation solvers and with scalable parallelisation routines. The wave-equation solvers are generated automatically using Devito, a domain specific language, and the parallelisation routines are provided through the custom actor-based library Mosaic. Through a series of examples, we show how Stride can handle realistic tomographic problems, in 2D and 3D, providing intuitive and flexible interfaces that scale from a local multi-processing environment to a multi-node high-performance cluster.
Submitted 18 May, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources
Authors:
George Bisbas,
Fabio Luporini,
Mathias Louboutin,
Rhodri Nelson,
Gerard Gorman,
Paul H. J. Kelly
Abstract:
Stencil kernels dominate a range of scientific applications, including seismic and medical imaging, image processing, and neural networks. Temporal blocking is a performance optimization that aims to reduce the required memory bandwidth of stencil computations by re-using data from the cache for multiple time steps. It has already been shown to be beneficial for this class of algorithms. However, applying temporal blocking to practical applications' stencils remains challenging. These computations often consist of sparsely located operators not aligned with the computational grid ("off-the-grid"). Our work is motivated by modeling problems in which source injections result in wavefields that must then be measured at receivers by interpolation from the gridded wavefield. The resulting data dependencies make the adoption of temporal blocking much more challenging. We propose a methodology to inspect these data dependencies and reorder the computation, leading to performance gains in stencil codes where temporal blocking has not been applicable. We implement this novel scheme in the Devito domain-specific compiler toolchain. Devito implements a domain-specific language embedded in Python to generate optimized partial differential equation solvers using the finite-difference method from high-level symbolic problem definitions. We evaluate our scheme using isotropic acoustic, anisotropic acoustic, and isotropic elastic wave propagators of industrial significance. After auto-tuning, performance evaluation shows that temporal blocking enables speed-ups of up to 1.6x over highly-optimized, vectorized, spatially-blocked code.
Submitted 25 February, 2021; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Scaling through abstractions -- high-performance vectorial wave simulations for seismic inversion with Devito
Authors:
Mathias Louboutin,
Fabio Luporini,
Philipp Witte,
Rhodri Nelson,
George Bisbas,
Jan Thorbecke,
Felix J. Herrmann,
Gerard Gorman
Abstract:
Devito is an open-source Python project based on domain-specific language and compiler technology. Driven by the requirements of rapid HPC applications development in exploration seismology, the language and compiler have evolved significantly since inception. Sophisticated boundary conditions, tensor contractions, sparse operations and features such as staggered grids and sub-domains are all supported; operators of essentially arbitrary complexity can be generated. To accommodate this flexibility whilst ensuring performance, data dependency analysis is utilized to schedule loops and detect computational properties such as parallelism. In this article, the generation and simulation of MPI-parallel propagators (along with their adjoints) for the pseudo-acoustic wave-equation in tilted transverse isotropic media and the elastic wave-equation are presented. Simulations are carried out on industry scale synthetic models on an HPC cloud system and reach a performance of 28 TFLOP/s, hence demonstrating Devito's suitability for production-grade seismic inversion problems.
Submitted 22 April, 2020;
originally announced April 2020.
-
GPU Support for Automatic Generation of Finite-Differences Stencil Kernels
Authors:
Vitor Hugo Mickus Rodrigues,
Lucas Cavalcante,
Maelso Bruno Pereira,
Fabio Luporini,
István Reguly,
Gerard Gorman,
Samuel Xavier de Souza
Abstract:
The growth of data to be processed in the Oil & Gas industry matches the requirements imposed by evolving algorithms based on stencil computations, such as Full Waveform Inversion and Reverse Time Migration. Graphical processing units (GPUs) are an attractive architectural target for stencil computations because of their high degree of data parallelism. However, the rapid architectural and technological progression makes it difficult for even the most proficient programmers to remain up-to-date with the technological advances at a micro-architectural level. In this work, we present an extension for Devito, an open source compiler designed to produce highly optimized finite difference kernels for use in inversion methods. We embed it with the Oxford Parallel Domain Specific Language (OP-DSL) in order to enable automatic code generation for GPU architectures from a high-level representation. We aim to enable users coding at a symbolic representation level to effortlessly leverage the processing capacity of GPU architectures. The implemented backend is evaluated on an NVIDIA GTX Titan Z, and on an NVIDIA Tesla V100 in terms of operational intensity through the roof-line model for varying space-order discretization levels of 3D acoustic isotropic wave propagation stencil kernels with and without symbolic optimizations. It achieves approximately 63% of V100's peak performance and 24% of Titan Z's peak performance for stencil kernels over grids with 256 points. Our study reveals that improving memory usage should be the most efficient strategy for leveraging the performance of the implemented solution on the evaluated architectures.
Submitted 2 December, 2019;
originally announced December 2019.
-
Performance of Devito on HPC-Optimised ARM Processors
Authors:
Hermes Senger,
Jaime F. de Souza,
Edson S. Gomi,
Fabio Luporini,
Gerard J. Gorman
Abstract:
We evaluate the performance of Devito, a domain specific language (DSL) for finite differences, on Arm ThunderX2 processors. Experiments with two common seismic computational kernels demonstrate that Arm processors can deliver performance competitive with Intel Xeon processors.
Submitted 19 August, 2019; v1 submitted 9 August, 2019;
originally announced August 2019.
-
Automatic Differentiation for Adjoint Stencil Loops
Authors:
Jan Hückelheim,
Navjot Kukreja,
Sri Hari Krishna Narayanan,
Fabio Luporini,
Gerard Gorman,
Paul Hovland
Abstract:
Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable.
In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications.
Submitted 5 July, 2019;
originally announced July 2019.
-
Combining Checkpointing and Data Compression to Accelerate Adjoint-Based Optimization Problems
Authors:
Navjot Kukreja,
Jan Hueckelheim,
Mathias Louboutin,
Fabio Luporini,
Paul Hovland,
Gerard Gorman
Abstract:
Seismic inversion and imaging are adjoint-based optimization problems that process up to terabytes of data, regularly exceeding the memory capacity of available computers. Data compression is an effective strategy to reduce this memory requirement by a certain factor, particularly if some loss in accuracy is acceptable. A popular alternative is checkpointing, where data is stored at selected points in time, and values at other times are recomputed as needed from the last stored state. This allows arbitrarily large adjoint computations with limited memory, at the cost of additional recomputations.
In this paper, we combine compression and checkpointing for the first time to compute a realistic seismic inversion. The combination of checkpointing and compression allows larger adjoint computations compared to using only compression, and reduces the recomputation overhead significantly compared to using only checkpointing.
Submitted 20 September, 2021; v1 submitted 11 October, 2018;
originally announced October 2018.
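The checkpointing strategy described in the abstract above (store the state at selected times, recompute the rest from the last stored state) can be illustrated with a minimal sketch. This is not the authors' implementation: the `step` recurrence is a toy stand-in for a wave propagator, the evenly spaced checkpoint `interval` is an assumed schedule, and all function names are hypothetical. Compression is not modelled.

```python
def step(state):
    """One forward time step of a toy recurrence (stand-in for a propagator)."""
    return 0.5 * state + 1.0


def forward_with_checkpoints(initial, n_steps, interval):
    """Run forward in time, storing only every `interval`-th state."""
    checkpoints = {0: initial}
    state = initial
    for t in range(1, n_steps + 1):
        state = step(state)
        if t % interval == 0:
            checkpoints[t] = state
    return checkpoints


def state_at(t, checkpoints, interval):
    """Recompute the state at time t from the last checkpoint at or before t."""
    base = (t // interval) * interval
    state = checkpoints[base]
    recomputed = 0
    for _ in range(t - base):
        state = step(state)
        recomputed += 1
    return state, recomputed


def reverse_sweep(n_steps, checkpoints, interval):
    """Visit states in reverse time order, as an adjoint computation would."""
    states, total_recomputed = [], 0
    for t in range(n_steps, -1, -1):
        state, recomputed = state_at(t, checkpoints, interval)
        states.append(state)
        total_recomputed += recomputed
    return states, total_recomputed
```

A larger `interval` stores fewer states but increases the number of recomputed steps in the reverse sweep, which is the memory-versus-recomputation trade-off the abstract refers to.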
-
Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration
Authors:
Mathias Louboutin,
Michael Lange,
Fabio Luporini,
Navjot Kukreja,
Philipp A. Witte,
Felix J. Herrmann,
Paulius Velesko,
Gerard J. Gorman
Abstract:
We introduce Devito, a new domain-specific language for implementing high-performance finite difference partial differential equation solvers. The motivating application is exploration seismology where methods such as Full-Waveform Inversion and Reverse-Time Migration are used to invert terabytes of seismic data to create images of the earth's subsurface. Even using modern supercomputers, it can take weeks to process a single seismic survey and create a useful subsurface image. The computational cost is dominated by the numerical solution of wave equations and their corresponding adjoints. Therefore, a great deal of effort is invested in aggressively optimizing the performance of these wave-equation propagators for different computer architectures. Additionally, the actual set of partial differential equations being solved and their numerical discretization are under constant innovation as increasingly realistic representations of the physics are developed, further ratcheting up the cost of practical solvers. By embedding a domain-specific language within Python and making heavy use of SymPy, a symbolic mathematics library, we make it possible to develop finite difference simulators quickly using a syntax that strongly resembles the mathematics. The Devito compiler reads this code and applies a wide range of analyses to generate highly optimized and parallel code. This approach can reduce the development time of a verified and optimized solver from months to days.
Submitted 9 August, 2019; v1 submitted 6 August, 2018;
originally announced August 2018.
-
Architecture and performance of Devito, a system for automated stencil computation
Authors:
Fabio Luporini,
Michael Lange,
Mathias Louboutin,
Navjot Kukreja,
Jan Hückelheim,
Charles Yount,
Philipp Witte,
Paul H. J. Kelly,
Felix J. Herrmann,
Gerard J. Gorman
Abstract:
Stencil computations are a key part of many high-performance computing applications, such as image processing, convolutional neural networks, and finite-difference solvers for partial differential equations. Devito is a framework capable of generating highly-optimized code given symbolic equations expressed in Python, specialized in, but not limited to, affine (stencil) codes. The lowering process, from mathematical equations down to C++ code, is performed by the Devito compiler through a series of intermediate representations. Several performance optimizations are introduced, including advanced common sub-expressions elimination, tiling and parallelization. Some of these are obtained through well-established stencil optimizers, integrated in the back-end of the Devito compiler. The architecture of the Devito compiler, as well as the performance optimizations that are applied when generating code, are presented. The effectiveness of such performance optimizations is demonstrated using operators drawn from seismic imaging applications.
Submitted 7 February, 2020; v1 submitted 9 July, 2018;
originally announced July 2018.
-
Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modelling
Authors:
Fabio Luporini,
Michael Lange,
Christian T. Jacobs,
Gerard J. Gorman,
J. Ramanujam,
Paul H. J. Kelly
Abstract:
Sparse tiling is a technique to fuse loops that access common data, thus increasing data locality. Unlike traditional loop fusion or blocking, the loops may have different iteration spaces and access shared datasets through indirect memory accesses, such as A[map[i]] -- hence the name "sparse". One notable example of such loops arises in discontinuous-Galerkin finite element methods, because of the computation of numerical integrals over different domains (e.g., cells, facets). The major challenge with sparse tiling is implementation -- not only is it cumbersome to understand and synthesize, but it is also onerous to maintain and generalize, as it requires a complete rewrite of the bulk of the numerical computation. In this article, we propose an approach to extend the applicability of sparse tiling based on raising the level of abstraction. Through a sequence of compiler passes, the mathematical specification of a problem is progressively lowered, and eventually sparse-tiled C for-loops are generated. Besides automation, we advance the state-of-the-art by introducing: a revisited, more efficient sparse tiling algorithm; support for distributed-memory parallelism; a range of fine-grained optimizations for increased run-time performance; implementation in a publicly-available library, SLOPE; and an in-depth study of the performance impact in Seigen, a real-world elastic wave equation solver for seismological problems, which shows speed-ups up to 1.28x on a platform consisting of 896 Intel Broadwell cores.
Submitted 19 June, 2019; v1 submitted 10 August, 2017;
originally announced August 2017.
-
Optimised finite difference computation from symbolic equations
Authors:
Michael Lange,
Navjot Kukreja,
Fabio Luporini,
Mathias Louboutin,
Charles Yount,
Jan Hückelheim,
Gerard J. Gorman
Abstract:
Domain-specific high-productivity environments are playing an increasingly important role in scientific computing due to the levels of abstraction and automation they provide. In this paper we introduce Devito, an open-source domain-specific framework for solving partial differential equations from symbolic problem definitions by the finite difference method. We highlight the generation and automated execution of highly optimized stencil code from only a few lines of high-level symbolic Python for a set of scientific equations, before exploring the use of Devito operators in seismic inversion problems.
Submitted 12 July, 2017;
originally announced July 2017.
-
TSFC: a structure-preserving form compiler
Authors:
Miklós Homolya,
Lawrence Mitchell,
Fabio Luporini,
David A. Ham
Abstract:
A form compiler takes a high-level description of the weak form of partial differential equations and produces low-level code that carries out the finite element assembly. In this paper we present the Two-Stage Form Compiler (TSFC), a new form compiler with the main motivation to maintain the structure of the input expression as long as possible. This facilitates the application of optimizations at the highest possible level of abstraction. TSFC features a novel, structure-preserving method for separating the contributions of a form to the subblocks of the local tensor in discontinuous Galerkin problems. This enables us to preserve the tensor structure of expressions longer through the compilation process than other form compilers. This is also achieved in part by a two-stage approach that cleanly separates the lowering of finite element constructs to tensor algebra in the first stage, from the scheduling of those tensor operations in the second stage. TSFC also efficiently traverses complicated expressions, and experimental evaluation demonstrates good compile-time performance even for highly complex forms.
Submitted 9 April, 2018; v1 submitted 10 May, 2017;
originally announced May 2017.
-
Devito: Towards a generic Finite Difference DSL using Symbolic Python
Authors:
Michael Lange,
Navjot Kukreja,
Mathias Louboutin,
Fabio Luporini,
Felippe Vieira,
Vincenzo Pandolfo,
Paulius Velesko,
Paulius Kazakas,
Gerard Gorman
Abstract:
Domain specific languages (DSLs) have been used in a variety of fields to express complex scientific problems in a concise manner and provide automated performance optimization for a range of computational architectures. As such, DSLs provide a powerful mechanism to speed up scientific Python computation that goes beyond traditional vectorization and pre-compilation approaches, while allowing domain scientists to build applications within the comforts of the Python software ecosystem. In this paper we present Devito, a new finite difference DSL that provides optimized stencil computation from high-level problem specifications based on symbolic Python expressions. We demonstrate Devito's symbolic API and performance advantages over traditional Python acceleration methods before highlighting its use in the scientific context of seismic inversion problems.
Submitted 12 September, 2016;
originally announced September 2016.
-
Devito: automated fast finite difference computation
Authors:
Navjot Kukreja,
Mathias Louboutin,
Felippe Vieira,
Fabio Luporini,
Michael Lange,
Gerard Gorman
Abstract:
Domain specific languages have successfully been used in a variety of fields to cleanly express scientific problems as well as to simplify implementation and performance optimization on different computer architectures. Although a large number of stencil languages are available, finite difference domain specific languages have proved challenging to design because most practical use cases require additional features that fall outside the finite difference abstraction. Inspired by the complexity of real-world seismic imaging problems, we introduce Devito, a domain specific language in which high level equations are expressed using symbolic expressions from the SymPy package. Complex equations are automatically manipulated, optimized, and translated into highly optimized C code that aims to perform comparably or better than hand-tuned code. All this is transparent to users, who only see concise symbolic mathematical expressions.
Submitted 10 October, 2016; v1 submitted 30 August, 2016;
originally announced August 2016.
-
A structure-exploiting numbering algorithm for finite elements on extruded meshes, and its performance evaluation in Firedrake
Authors:
Gheorghe-Teodor Bercea,
Andrew T. T. McRae,
David A. Ham,
Lawrence Mitchell,
Florian Rathgeber,
Luigi Nardi,
Fabio Luporini,
Paul H. J. Kelly
Abstract:
We present a generic algorithm for numbering and then efficiently iterating over the data values attached to an extruded mesh. An extruded mesh is formed by replicating an existing mesh, assumed to be unstructured, to form layers of prismatic cells. Applications of extruded meshes include, but are not limited to, the representation of 3D high aspect ratio domains employed by geophysical finite element simulations. These meshes are structured in the extruded direction. The algorithm presented here exploits this structure to avoid the performance penalty traditionally associated with unstructured meshes. We evaluate the implementation of this algorithm in the Firedrake finite element system on a range of low compute intensity operations which constitute worst cases for data layout performance exploration. The experiments show that having structure along the extruded direction enables the cost of the indirect data accesses to be amortized after 10-20 layers as long as the underlying mesh is well-ordered. We characterise the resulting spatial and temporal reuse in a representative set of both continuous-Galerkin and discontinuous-Galerkin discretisations. On meshes with realistic numbers of layers the performance achieved is between 70% and 90% of a theoretical hardware-specific limit.
Submitted 28 October, 2016; v1 submitted 20 April, 2016;
originally announced April 2016.
-
An algorithm for the optimization of finite element integration loops
Authors:
Fabio Luporini,
David A. Ham,
Paul H. J. Kelly
Abstract:
We present an algorithm for the optimization of a class of finite element integration loop nests. This algorithm, which exploits fundamental mathematical properties of finite element operators, is proven to achieve a locally optimal operation count. In specified circumstances the optimum achieved is global. Extensive numerical experiments demonstrate significant performance improvements over the state of the art in finite element code generation in almost all cases. This validates the effectiveness of the algorithm presented here, and illustrates its limitations.
Submitted 20 April, 2016;
originally announced April 2016.
-
Firedrake: automating the finite element method by composing abstractions
Authors:
Florian Rathgeber,
David A. Ham,
Lawrence Mitchell,
Michael Lange,
Fabio Luporini,
Andrew T. T. McRae,
Gheorghe-Teodor Bercea,
Graham R. Markall,
Paul H. J. Kelly
Abstract:
Firedrake is a new tool for automating the numerical solution of partial differential equations. Firedrake adopts the domain-specific language for the finite element method of the FEniCS project, but with a pure Python runtime-only implementation centred on the composition of several existing and new abstractions for particular aspects of scientific computing. The result is a more complete separation of concerns which eases the incorporation of separate contributions from computer scientists, numerical analysts and application specialists. These contributions may add functionality, or improve performance.
Firedrake benefits from automatically applying new optimisations. This includes factorising mixed function spaces, transforming and vectorising inner loops, and intrinsically supporting block matrix operations. Importantly, Firedrake presents a simple public API for escaping the UFL abstraction. This allows users to implement common operations that fall outside pure variational formulations, such as flux-limiters.
Submitted 1 July, 2016; v1 submitted 8 January, 2015;
originally announced January 2015.
-
COFFEE: an Optimizing Compiler for Finite Element Local Assembly
Authors:
Fabio Luporini,
Ana Lucia Varbanescu,
Florian Rathgeber,
Gheorghe-Teodor Bercea,
J. Ramanujam,
David A. Ham,
Paul H. J. Kelly
Abstract:
The numerical solution of partial differential equations using the finite element method is one of the key applications of high performance computing. Local assembly is its characteristic operation. This entails the execution of a problem-specific kernel to numerically evaluate an integral for each element in the discretized problem domain. Since the domain size can be huge, executing efficient kernels is fundamental. Their optimization is, however, a challenging issue. Even though affine loop nests are generally present, the short trip counts and the complexity of mathematical expressions make it hard to determine a single or unique sequence of successful transformations. Therefore, we present the design and systematic evaluation of COFFEE, a domain-specific compiler for local assembly kernels. COFFEE manipulates abstract syntax trees generated from a high-level domain-specific language for PDEs by introducing domain-aware composable optimizations aimed at improving instruction-level parallelism, especially SIMD vectorization, and register locality. It then generates C code including vector intrinsics. Experiments using a range of finite-element forms of increasing complexity show that significant performance improvement is achieved.
Submitted 4 July, 2014; v1 submitted 3 July, 2014;
originally announced July 2014.
-
A Survey on Open Problems for Mobile Robots
Authors:
Alberto Bandettini,
Fabio Luporini,
Giovanni Viglietta
Abstract:
Gathering mobile robots is a widely studied problem in robotics research. This survey first introduces the related work, summarizing models and results. Then, the focus shifts to the open problem of gathering fat robots. In this context, "fat" means that the robot is not represented by a point in a two-dimensional space, but has an extent. Moreover, it can be opaque in the sense that other robots cannot "see through" it. All these issues lead to a redefinition of the original problem and an extension of the CORDA model. For at most 4 robots an algorithm is provided in the literature, but is gathering always possible for n>4 fat robots? Another open problem is considered: Boundary Patrolling by mobile robots. A set of mobile robots with constraints only on speed and visibility is working in a polygonal environment having a boundary and possibly obstacles. The robots have to perform a perpetual movement (possibly within the environment) so that the maximum timespan in which a point of the boundary is not being watched by any robot is minimized.
Submitted 7 November, 2011;
originally announced November 2011.