Automatic Code Generation for High-performance Discontinuous Galerkin Methods on Modern Architectures

Authors:

Steffen Müthing,

Peter Bastian

ACM Transactions on Mathematical Software (TOMS), Volume 47, Issue 1

Article No.: 6, Pages 1 - 31

https://doi.org/10.1145/3424144

Published: 08 December 2020

Abstract

SIMD vectorization has lately become a key challenge in high-performance computing. However, hand-written, explicitly vectorized code often poses a threat to the software’s sustainability. In this publication, we solve this sustainability and performance portability issue by enriching the simulation framework dune-pdelab with a code generation approach. The approach is based on the well-known domain-specific language UFL but combines it with loopy, a more powerful intermediate representation for the computational kernel. Given this flexible tool, we present and implement a new class of vectorization strategies for the assembly of Discontinuous Galerkin methods on hexahedral meshes that exploit the finite element’s tensor product structure. The code generator chooses the performance-optimal variant from this class through an auto-tuning approach. The implementation is done within the open-source PDE software framework Dune and its discretization module dune-pdelab. The strength of the proposed approach is illustrated with performance measurements for DG schemes for a scalar diffusion-reaction equation and the Stokes equation. In our measurements, we utilize both the AVX2 and the AVX512 instruction sets, achieving 30% to 40% of the machine’s theoretical peak performance for one matrix-free application of the operator.
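The tensor product structure mentioned in the abstract is commonly exploited through sum factorization: a three-dimensional operator built from one-dimensional matrices is applied one direction at a time rather than as a single large contraction. The abstract does not spell out the kernels the generator emits, so the following is only a minimal sketch of the general technique in plain Python. All names (`apply_naive`, `contract_first`, `apply_sum_factorized`) and the small integer data are hypothetical and chosen for illustration.

```python
from itertools import product


def apply_naive(A, x, n):
    """Direct evaluation of y[i,j,k] = sum_{a,b,c} A[i][a] A[j][b] A[k][c] x[a,b,c]."""
    m = len(A)
    return {
        (i, j, k): sum(A[i][a] * A[j][b] * A[k][c] * x[a, b, c]
                       for a, b, c in product(range(n), repeat=3))
        for i, j, k in product(range(m), repeat=3)
    }


def contract_first(A, x, shape):
    """Contract the first index of x with A and rotate the result index to the back."""
    n0, n1, n2 = shape
    m = len(A)
    out = {
        (j, k, i): sum(A[i][a] * x[a, j, k] for a in range(n0))
        for j, k, i in product(range(n1), range(n2), range(m))
    }
    return out, (n1, n2, m)


def apply_sum_factorized(A, x, n):
    """Apply the same operator as three successive one-dimensional contractions."""
    shape = (n, n, n)
    for _ in range(3):
        x, shape = contract_first(A, x, shape)
    return x


# Hypothetical data: a 3x2 one-dimensional matrix and a 2x2x2 coefficient tensor.
A = [[1, 2], [3, 4], [5, 6]]
n = 2
x = {(a, b, c): 4 * a + 2 * b + c + 1 for a, b, c in product(range(n), repeat=3)}

y_naive = apply_naive(A, x, n)
y_fact = apply_sum_factorized(A, x, n)
print(y_naive == y_fact)
```

For a square one-dimensional matrix of size n, the direct evaluation costs on the order of n^6 operations per cell in three dimensions, while the factorized variant needs three passes of order n^4 each. The inner one-dimensional contractions are small and regular, which is what makes the choice of SIMD vectorization strategy a relevant design question.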

Published In

ACM Transactions on Mathematical Software Volume 47, Issue 1

March 2021

219 pages

ISSN:0098-3500

EISSN:1557-7295

DOI:10.1145/3441641

Editors: Zhaojun Bai (University of California at Davis, USA), Wolfgang Bangerth (Colorado State University, USA)

Copyright © 2020 Owner/Author.

This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2020

Accepted: 01 September 2020

Revised: 01 April 2020

Received: 01 December 2018

Published in TOMS Volume 47, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Bundesministerium für Bildung und Forschung
Deutsche Forschungsgemeinschaft
