research-article
Open access

Yet Another Tensor Toolbox for Discontinuous Galerkin Methods and Other Applications

Published: 16 October 2020

    Abstract

    The numerical solution of partial differential equations is at the heart of many grand challenges in supercomputing. Solvers based on high-order discontinuous Galerkin (DG) discretisation have been shown to scale on large supercomputers with excellent performance and efficiency, provided the implementation exploits all levels of parallelism and is tailored to the specific architecture. However, new supercomputers emerge every year, and the list of hardware-specific considerations grows together with the list of desired features in a DG code. We therefore believe that a sustainable DG code needs an abstraction layer that allows the numerical scheme to be implemented in a suitable language. We explore the possibility of abstracting the numerical scheme as small tensor operations, describing them in a domain-specific language (DSL) resembling Einstein notation, and mapping them to small General Matrix-Matrix Multiplication (GEMM) routines. The compiler for our DSL implements classic optimisations that are used for large tensor contractions, and we present novel optimisation techniques such as equivalent sparsity patterns and optimal index permutations for temporary tensors. Our application examples, which include the earthquake simulation software SeisSol, show that the generated kernels achieve over 50% of the peak performance of a recent 48-core Skylake system, while the DSL considerably simplifies the implementation.
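
    To make the idea of mapping a small tensor operation to a matrix-matrix product concrete, the following is a minimal sketch in plain Python. It is not the DSL or the generated code described in the abstract; the function names (gemm, contract_reference, contract_via_gemm) and the chosen contraction C[i][j][k] = sum over l of A[i][l] * B[l][j][k] are assumptions made purely for illustration. The sketch fuses the two free indices of B into a single column index, so that the whole contraction becomes one matrix product.

```python
# Illustrative sketch only: plain Python, not the DSL or the generated
# kernels described in the abstract. All names here are hypothetical.

def gemm(a, b):
    """Return the matrix product of a and b (matrices stored as lists of rows)."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[l] * b[l][c] for l in range(inner)) for c in range(cols)]
        for row in a
    ]


def contract_reference(a, b):
    """Direct loops for C[i][j][k] = sum over l of A[i][l] * B[l][j][k]."""
    n_l, n_j, n_k = len(b), len(b[0]), len(b[0][0])
    return [
        [
            [sum(a_row[l] * b[l][j][k] for l in range(n_l)) for k in range(n_k)]
            for j in range(n_j)
        ]
        for a_row in a
    ]


def contract_via_gemm(a, b):
    """Same contraction, expressed as a single matrix-matrix product."""
    n_l, n_j, n_k = len(b), len(b[0]), len(b[0][0])
    # Fuse the free indices (j, k) of B into one column index j * n_k + k.
    b_mat = [
        [b[l][j][k] for j in range(n_j) for k in range(n_k)]
        for l in range(n_l)
    ]
    c_mat = gemm(a, b_mat)  # shape: I x (J * K)
    # Split the fused column index back into (j, k).
    return [
        [[c_row[j * n_k + k] for k in range(n_k)] for j in range(n_j)]
        for c_row in c_mat
    ]


# Small integer tensors so both results can be compared exactly.
A = [[1, 2], [3, 4], [5, 6]]                          # indices (i, l), shape 3 x 2
B = [[[1, 0, 2], [0, 1, 3]], [[2, 1, 0], [1, 1, 1]]]  # indices (l, j, k), shape 2 x 2 x 3

C = contract_via_gemm(A, B)
assert C == contract_reference(A, B)
```

    In Einstein-style index notation the same operation would be written with the repeated index l summed implicitly; with NumPy, for instance, it could be expressed as numpy.einsum("il,ljk->ijk", A, B). Fusing indices is only free when they are adjacent in memory, which is one reason index permutations of temporary tensors matter for performance.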

    Published In

    ACM Transactions on Mathematical Software  Volume 46, Issue 4
    December 2020
    272 pages
    ISSN:0098-3500
    EISSN:1557-7295
    DOI:10.1145/3430683
    This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 October 2020
    Accepted: 01 June 2020
    Revised: 01 February 2020
    Received: 01 March 2019
    Published in TOMS Volume 46, Issue 4

    Author Tags

    1. ADER-DG
    2. Tensor operations
    3. finite element method
    4. high-performance computing

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (Last 12 months): 197
    • Downloads (Last 6 weeks): 20
    Reflects downloads up to 26 Jul 2024

    Cited By

    • (2024) Rupture Dynamics of Cascading Earthquakes in a Multiscale Fracture Network. Journal of Geophysical Research: Solid Earth, 129:3. DOI: 10.1029/2023JB027578. Online publication date: 19-Mar-2024
    • (2024) Fused GEMMs towards an efficient GPU implementation of the ADER‐DG method in SeisSol. Concurrency and Computation: Practice and Experience, 36:12. DOI: 10.1002/cpe.8037. Online publication date: 13-Feb-2024
    • (2023) Multi-discretization domain specific language and code generation for differential equations. Journal of Computational Science, 68 (101981). DOI: 10.1016/j.jocs.2023.101981. Online publication date: May-2023
    • (2022) Next-Generation Local Time Stepping for the ADER-DG Finite Element Method. 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 402-413. DOI: 10.1109/IPDPS53621.2022.00046. Online publication date: May-2022
    • (2022) A discontinuous Galerkin method for sequences of earthquakes and aseismic slip on multiple faults using unstructured curvilinear grids. Geophysical Journal International, 233:1, pp. 586-626. DOI: 10.1093/gji/ggac467. Online publication date: 25-Nov-2022
    • (2022) An efficient ADER-DG local time stepping scheme for 3D HPC simulation of seismic waves in poroelastic media. Journal of Computational Physics, 455:C. DOI: 10.1016/j.jcp.2021.110886. Online publication date: 15-Apr-2022
    • (2022) Finch: Domain Specific Language and Code Generation for Finite Element and Finite Volume in Julia. Computational Science – ICCS 2022, pp. 118-132. DOI: 10.1007/978-3-031-08751-6_9. Online publication date: 21-Jun-2022
    • (2021) 3D Linked Subduction, Dynamic Rupture, Tsunami, and Inundation Modeling: Dynamic Effects of Supershear and Tsunami Earthquakes, Hypocenter Location, and Shallow Fault Slip. Frontiers in Earth Science, 9. DOI: 10.3389/feart.2021.626844. Online publication date: 24-Jun-2021
    • (2021) 3D acoustic-elastic coupling with gravity. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1145/3458817.3476173. Online publication date: 14-Nov-2021
    • (2021) SeisSol on Distributed Multi-GPU Systems: CUDA Code Generation for the Modal Discontinuous Galerkin Method. The International Conference on High Performance Computing in Asia-Pacific Region, pp. 69-82. DOI: 10.1145/3432261.3436753. Online publication date: 20-Jan-2021
