Hardware Counter-based Performance Analysis of ANUGA Flood Simulator

Authors: Harshada A Jadhav, Sandeep K Joshi, V Venkatesh Shenoi

IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing

Pages 412 - 418

https://doi.org/10.1145/3607947.3608041

Published: 28 September 2023

Abstract

ANUGA is a Python-based finite volume solver for the two-dimensional shallow water equations on unstructured grids, used for flood modelling. This paper reports a hardware counter-based performance study of how the memory hierarchy, in particular the relative cache sizes, of two generations of Intel architecture affects the performance of this application. Memory accesses in the compute-intensive portion of the solver (the time evolution of the fluid-flow quantities over the domain) are analysed quantitatively by instrumenting the code with the PAPI library. Hit rates at each level of the cache/memory hierarchy, measured for various decompositions of the computational domain (the region of interest) across processes, show the impact of choosing an optimal sub-domain size. A working set small enough to stay in cache ensures temporal locality in the iterative loop computations handled by each process, leading to better parallel performance on distributed-memory systems. Further, this study shows that better performance is possible with larger data sets through a decomposition under which each process's working set fits into a given level of the cache hierarchy.
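The abstract's two quantitative ideas, per-level hit rates derived from hardware counters and sizing sub-domains so that a process's working set fits a cache level, can be illustrated with a short Python sketch (Python because ANUGA itself is written in Python). It is not the authors' instrumentation. The field names are hypothetical and assume counter totals taken from the PAPI preset events PAPI_L1_DCA, PAPI_L1_DCM, PAPI_L2_DCM and PAPI_L3_TCM, whose availability varies by processor. The model simply treats the misses at one level as the accesses seen by the next, ignoring prefetching and instruction-cache traffic, and the bytes per triangle and cache capacities are caller-supplied assumptions, not values from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class CounterTotals:
    """Hypothetical per-process counter totals over the timed evolve loop.

    Each field is assumed to come from the PAPI preset named in its comment;
    which presets exist depends on the processor.
    """
    l1_data_accesses: int  # assumed source: PAPI_L1_DCA
    l1_data_misses: int    # assumed source: PAPI_L1_DCM
    l2_data_misses: int    # assumed source: PAPI_L2_DCM
    l3_total_misses: int   # assumed source: PAPI_L3_TCM


def _hit_rate(misses: int, accesses: int) -> float:
    # Undefined (NaN) when a level saw no accesses at all.
    return 1.0 - misses / accesses if accesses else math.nan


def hit_rates(c: CounterTotals) -> dict:
    """Approximate hit rate per level, treating the misses at one level as
    the accesses seen by the next (ignores prefetch and instruction traffic)."""
    return {
        "L1": _hit_rate(c.l1_data_misses, c.l1_data_accesses),
        "L2": _hit_rate(c.l2_data_misses, c.l1_data_misses),
        "L3": _hit_rate(c.l3_total_misses, c.l2_data_misses),
        # Share of all L1 data accesses that are finally served by DRAM.
        "to_memory": (c.l3_total_misses / c.l1_data_accesses
                      if c.l1_data_accesses else math.nan),
    }


def cells_per_process(total_cells: int, n_procs: int,
                      ghost_fraction: float = 0.0) -> int:
    """Triangles owned by one process under an even split, inflated by an
    assumed fraction of ghost (halo) cells copied from neighbours."""
    owned = -(-total_cells // n_procs)  # ceiling division
    return int(owned * (1.0 + ghost_fraction))


def smallest_level_that_fits(cells: int, bytes_per_cell: int,
                             cache_sizes: dict) -> str:
    """Smallest cache level (name -> capacity in bytes) that holds the
    working set cells * bytes_per_cell, or 'DRAM' if none does."""
    working_set = cells * bytes_per_cell
    for level, size in sorted(cache_sizes.items(), key=lambda kv: kv[1]):
        if working_set <= size:
            return level
    return "DRAM"
```

For example, with a hypothetical 800 bytes per triangle and per-process cache shares of 32 KiB, 1 MiB and 32 MiB, splitting 1,000,000 triangles over 64 processes gives 15625 triangles each (about 12.5 MB, which lands in L3), while 1024 processes give 977 triangles each (about 0.78 MB, which fits in L2). For a shared L3, the per-process share of that cache is the relevant capacity.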

Index Terms

1. Mathematics of computing
  1. Mathematical software
    1. Mathematical software performance

Published In

IC3-2023: Proceedings of the 2023 Fifteenth International Conference on Contemporary Computing

August 2023

783 pages

ISBN:9798400700224

DOI:10.1145/3607947

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IC3 2023: 2023 Fifteenth International Conference on Contemporary Computing

August 3 - 5, 2023

Noida, India
