DOI: 10.1145/3607947.3608041
research-article

Hardware Counter-based Performance Analysis of ANUGA Flood Simulator

Published: 28 September 2023 Publication History

Abstract

ANUGA is a Python-based finite volume solver for the two-dimensional shallow water model on unstructured grids, used for flood modelling. This paper reports a hardware counter-based performance study of how the memory hierarchy, and in particular the relative cache sizes, of two different generations of Intel architecture affects the performance of this application. Memory access in the compute-intensive portion of the solver (the time evolution of the quantities associated with the fluid flow on the domain) is analysed quantitatively by instrumenting the code with the PAPI library. The hit rates at different levels of cache and memory, measured for various decompositions of the region of interest (the computational domain) across processes, bring out the impact of optimal sub-domain sizes. A suitably sized working set ensures temporal locality for the iterative loop computations handled by the individual processes, leading to better parallel performance on distributed memory systems. The study further shows that better performance can be achieved with larger data sets through a suitable decomposition in which the working set fits into the various levels of the cache hierarchy.
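The relationship between decomposition and cache fit described above can be illustrated with a minimal Python sketch. It is not the authors' instrumentation: the cache sizes, the bytes-per-triangle figure, the even split of the mesh, and all function names are hypothetical assumptions chosen for illustration. The counter values passed to `hit_rate` stand in for access and miss counts such as those reported by PAPI preset events (for example `PAPI_L2_TCA` and `PAPI_L2_TCM`, where the platform supports them).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    name: str
    size_bytes: int


def hit_rate(accesses: int, misses: int) -> float:
    """Fraction of accesses served by a cache level: 1 - misses / accesses."""
    if accesses <= 0:
        raise ValueError("accesses must be positive")
    return 1.0 - misses / accesses


def working_set_bytes(n_triangles: int, n_procs: int, bytes_per_triangle: int) -> int:
    """Approximate per-process working set, assuming an even split of the mesh."""
    per_proc = -(-n_triangles // n_procs)  # ceiling division
    return per_proc * bytes_per_triangle


def smallest_fitting_level(ws_bytes: int, levels: list) -> str:
    """Name of the smallest cache level that holds the working set, else 'DRAM'."""
    for level in sorted(levels, key=lambda lv: lv.size_bytes):
        if ws_bytes <= level.size_bytes:
            return level.name
    return "DRAM"


# Hypothetical cache hierarchy (not the machines studied in the paper).
LEVELS = [
    CacheLevel("L1", 32 * 1024),
    CacheLevel("L2", 1024 * 1024),
    CacheLevel("L3", 32 * 1024 * 1024),
]

if __name__ == "__main__":
    n_triangles = 1_000_000      # hypothetical mesh size
    bytes_per_triangle = 200     # hypothetical per-cell storage
    for n_procs in (1, 8, 256):
        ws = working_set_bytes(n_triangles, n_procs, bytes_per_triangle)
        print(n_procs, ws, smallest_fitting_level(ws, LEVELS))
```

The sketch ignores halo cells, shared last-level cache, and uneven partitions, so it gives only a first-order indication of which decomposition lets the per-process working set fit a given cache level.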

    Published In

    IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing
    August 2023
    783 pages
    ISBN:9798400700224
    DOI:10.1145/3607947
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. HPC
    2. cache misses
    3. finite volume solver
    4. instrumentation
    5. performance
    6. unstructured grid

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    IC3 2023
