On the arithmetic intensity of high-order finite-volume discretizations for hyperbolic systems of conservation laws

Published: 01 January 2019
    Abstract

    It has been conjectured that higher-order discretizations for partial differential equations will have advantages over the lower-order counterparts commonly used today. The reasoning is that the increase in arithmetic operations will be more than offset by the reduction in data transfers and the increase in concurrent floating-point units. To evaluate this conjecture, the arithmetic intensity of a class of high-order finite-volume discretizations for hyperbolic systems of conservation laws is analyzed theoretically for spatial discretizations of orders three through eight in arbitrary dimensions. Three cache models are considered: the limiting cases of no cache and infinite cache, as well as a finite-sized cache model. The models are validated experimentally by measuring floating-point operations and data transfers on an IBM Blue Gene/Q node. Theory and experiments demonstrate that high-order finite-volume methods can provide the increases in arithmetic intensity needed to better utilize on-node floating-point capability.
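The quantity at the center of the abstract, arithmetic intensity, is the ratio of floating-point operations to bytes moved to and from memory. The following is a minimal sketch of that ratio and of the simple roofline-style bound it implies. It is not the paper's cache model: the function names, the machine peaks, and the per-cell operation and byte counts are all hypothetical values chosen for illustration only.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return floating-point operations per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def roofline_bound(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Attainable FLOP/s: limited by compute peak or by bandwidth * intensity."""
    return min(peak_flops, intensity * peak_bandwidth)


# Hypothetical machine peaks (assumed, not measured values for any real node).
PEAK_FLOPS = 200.0e9      # FLOP/s
PEAK_BANDWIDTH = 40.0e9   # bytes/s

# Hypothetical per-cell costs: the higher-order scheme does more arithmetic
# for the same memory traffic (illustrative counts only).
low_order = arithmetic_intensity(flops=100.0, bytes_moved=200.0)
high_order = arithmetic_intensity(flops=800.0, bytes_moved=200.0)

low_bound = roofline_bound(low_order, PEAK_FLOPS, PEAK_BANDWIDTH)
high_bound = roofline_bound(high_order, PEAK_FLOPS, PEAK_BANDWIDTH)

print(f"low-order:  {low_order:.1f} flop/byte, bound {low_bound / 1e9:.0f} GFLOP/s")
print(f"high-order: {high_order:.1f} flop/byte, bound {high_bound / 1e9:.0f} GFLOP/s")
```

With these assumed numbers both schemes are bandwidth-bound, so the attainable floating-point rate scales with the intensity. This mirrors the conjecture under evaluation: extra arithmetic per byte transferred lets a kernel use more of the available floating-point capability.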


    Published In

    International Journal of High Performance Computing Applications, Volume 33, Issue 1
    January 2019
    221 pages

    Publisher

    Sage Publications, Inc.

    United States

    Author Tags

    1. Arithmetic intensity
    2. algorithmic balance
    3. high-order finite-volume methods
    4. hyperbolic systems of conservation laws
    5. processor-memory performance gap

    Qualifiers

    • Research-article
