research-article

Open access

Yet Another Tensor Toolbox for Discontinuous Galerkin Methods and Other Applications

Authors: Carsten Uphoff, Michael Bader

ACM Transactions on Mathematical Software (TOMS), Volume 46, Issue 4

Article No.: 34, Pages 1 - 40

https://doi.org/10.1145/3406835

Published: 16 October 2020

Abstract

The numerical solution of partial differential equations is at the heart of many grand challenges in supercomputing. Solvers based on high-order discontinuous Galerkin (DG) discretisation have been shown to scale on large supercomputers with excellent performance and efficiency, provided that the implementation exploits all levels of parallelism and is tailored to the specific architecture. However, new supercomputers emerge every year, and the list of hardware-specific considerations grows along with the list of desired features in a DG code. We therefore believe that a sustainable DG code needs an abstraction layer in which the numerical scheme is implemented in a suitable language. We explore the possibility of abstracting the numerical scheme as small tensor operations, describing them in a domain-specific language (DSL) resembling Einstein notation, and mapping them to small General Matrix-Matrix Multiplication (GEMM) routines. The compiler for our DSL implements classic optimisations that are used for large tensor contractions, and we present novel optimisation techniques such as equivalent sparsity patterns and optimal index permutations for temporary tensors. Our application examples, which include the earthquake simulation software SeisSol, show that the generated kernels achieve over 50% of the peak performance of a recent 48-core Skylake system, while the DSL considerably simplifies the implementation.
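
To make the idea of mapping an Einstein-notation expression to small GEMM calls more concrete, the following is a minimal sketch in plain Python. It is not the DSL or the code generator described in the abstract. The contraction C[i][j][k] = sum over l of B[i][l][k] * A[l][j], the function names, and the nested-list storage are all assumptions chosen for illustration. The sketch evaluates the contraction once directly and once as a loop over small matrix products, one per value of k, and the two results can be compared.

```python
from itertools import product


def matmul(X, Y):
    """Plain dense matrix product on nested lists (stand-in for a GEMM call)."""
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][l] * Y[l][j] for l in range(m)) for j in range(p)]
            for i in range(n)]


def contract_reference(A, B):
    """Direct evaluation of C[i][j][k] = sum_l B[i][l][k] * A[l][j]."""
    I, L, K = len(B), len(B[0]), len(B[0][0])
    J = len(A[0])
    C = [[[0] * K for _ in range(J)] for _ in range(I)]
    for i, j, k, l in product(range(I), range(J), range(K), range(L)):
        C[i][j][k] += B[i][l][k] * A[l][j]
    return C


def contract_loop_over_gemm(A, B):
    """Same contraction, evaluated as one small matrix product per index k."""
    I, L, K = len(B), len(B[0]), len(B[0][0])
    J = len(A[0])
    C = [[[0] * K for _ in range(J)] for _ in range(I)]
    for k in range(K):
        # Fix k: the slice B[:, :, k] is an I x L matrix.
        B_k = [[B[i][l][k] for l in range(L)] for i in range(I)]
        # (I x L) times (L x J) gives the I x J slice C[:, :, k].
        C_k = matmul(B_k, A)
        for i in range(I):
            for j in range(J):
                C[i][j][k] = C_k[i][j]
    return C


# Hypothetical example data: A has shape 2 x 3, B has shape 2 x 2 x 2.
A = [[1, 2, 3], [4, 5, 6]]
B = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

C_ref = contract_reference(A, B)
C_gemm = contract_loop_over_gemm(A, B)
print(C_ref == C_gemm)
```

In this sketch the slices are copied explicitly to keep the code readable. A real implementation would more likely pass pointers and strides into a GEMM routine so that no copy is needed, and whether that is possible depends on the memory layout of the tensors.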

Published In

ACM Transactions on Mathematical Software Volume 46, Issue 4

December 2020

272 pages

ISSN:0098-3500

EISSN:1557-7295

DOI:10.1145/3430683

Editors: Zhaojun Bai (University of California at Davis, USA), Wolfgang Bangerth (Colorado State University, USA)

Copyright © 2020 Owner/Author.

This work is licensed under a Creative Commons Attribution-NoDerivatives International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2020

Accepted: 01 June 2020

Revised: 01 February 2020

Received: 01 March 2019

Published in TOMS Volume 46, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Volkswagen Foundation

Cited By

Palgunadi K, Gabriel A, Garagash D, Ulrich T, Mai P (2024). Rupture Dynamics of Cascading Earthquakes in a Multiscale Fracture Network. Journal of Geophysical Research: Solid Earth 129:3. Online publication date: 19-Mar-2024
https://doi.org/10.1029/2023JB027578
Dorozhinskii R, Gadeschi G, Bader M (2024). Fused GEMMs towards an efficient GPU implementation of the ADER‐DG method in SeisSol. Concurrency and Computation: Practice and Experience 36:12. Online publication date: 13-Feb-2024
https://doi.org/10.1002/cpe.8037
Heisler E, Deshmukh A, Mazumder S, Sadayappan P, Sundar H (2023). Multi-discretization domain specific language and code generation for differential equations. Journal of Computational Science 68 (101981). Online publication date: May-2023
https://doi.org/10.1016/j.jocs.2023.101981
Breuer A, Heinecke A (2022). Next-Generation Local Time Stepping for the ADER-DG Finite Element Method. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (402-413). Online publication date: May-2022
https://doi.org/10.1109/IPDPS53621.2022.00046
Uphoff C, May D, Gabriel A (2022). A discontinuous Galerkin method for sequences of earthquakes and aseismic slip on multiple faults using unstructured curvilinear grids. Geophysical Journal International 233:1 (586-626). Online publication date: 25-Nov-2022
https://doi.org/10.1093/gji/ggac467
Wolf S, Galis M, Uphoff C, Gabriel A, Moczo P, Gregor D, Bader M (2022). An efficient ADER-DG local time stepping scheme for 3D HPC simulation of seismic waves in poroelastic media. Journal of Computational Physics 455:C. Online publication date: 15-Apr-2022
https://dl.acm.org/doi/10.1016/j.jcp.2021.110886
Heisler E, Deshmukh A, Sundar H (2022). Finch: Domain Specific Language and Code Generation for Finite Element and Finite Volume in Julia. Computational Science – ICCS 2022 (118-132). Online publication date: 21-Jun-2022
https://dl.acm.org/doi/10.1007/978-3-031-08751-6_9
Aniko Wirp S, Gabriel A, Schmeller M, H. Madden E, van Zelst I, Krenz L, van Dinther Y, Rannabauer L (2021). 3D Linked Subduction, Dynamic Rupture, Tsunami, and Inundation Modeling: Dynamic Effects of Supershear and Tsunami Earthquakes, Hypocenter Location, and Shallow Fault Slip. Frontiers in Earth Science 9. Online publication date: 24-Jun-2021
https://doi.org/10.3389/feart.2021.626844
Krenz L, Uphoff C, Ulrich T, Gabriel A, Abrahams L, Dunham E, Bader M, de Supinski B, Hall M, Gamblin T (2021). 3D acoustic-elastic coupling with gravity. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-14). Online publication date: 14-Nov-2021
https://dl.acm.org/doi/10.1145/3458817.3476173
Dorozhinskii R, Bader M (2021). SeisSol on Distributed Multi-GPU Systems: CUDA Code Generation for the Modal Discontinuous Galerkin Method. The International Conference on High Performance Computing in Asia-Pacific Region (69-82). Online publication date: 20-Jan-2021
https://dl.acm.org/doi/10.1145/3432261.3436753
