DOI: 10.1145/3605731.3605884
Research Article

Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery

Published: 07 September 2023

Abstract

Early-bird communication is a communication/computation overlap technique that combines fine-grained communication with partitioned communication to reduce application run time. Communication is divided among the compute threads so that each thread can initiate transmission of its portion of the data as soon as its computation completes, rather than waiting for all of the threads to finish. However, the benefit of early-bird communication depends on the completion timing of the individual threads.
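
For illustration, the sketch below shows one way this pattern can be expressed with MPI 4.0 partitioned communication (MPI_Psend_init/MPI_Pready), with each OpenMP thread marking its partition ready as soon as its share of the work is done. This is a minimal sketch under stated assumptions, not the paper's code: the two-rank setup, one-partition-per-thread layout, and compute_partition() are illustrative.

```c
/*
 * Minimal sketch of early-bird delivery via MPI 4.0 partitioned
 * communication. Assumptions (not from the paper): two ranks, one
 * partition per compute thread, and a placeholder compute_partition().
 */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define NPART 8    /* one partition per compute thread (assumption) */
#define COUNT 1024 /* doubles per partition (assumption) */

/* Placeholder for one thread's share of the computation. */
static void compute_partition(double *buf, int t)
{
    for (int i = 0; i < COUNT; i++)
        buf[t * COUNT + i] = t + i * 1e-6;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(NPART * COUNT * sizeof *buf);
    MPI_Request req;

    if (rank == 0) { /* sender */
        MPI_Psend_init(buf, NPART, COUNT, MPI_DOUBLE, 1, 0,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        #pragma omp parallel num_threads(NPART)
        {
            int t = omp_get_thread_num();
            compute_partition(buf, t);
            /* Early-bird: this partition may be transmitted now,
             * without waiting for the other threads. */
            MPI_Pready(t, req);
        }
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    } else if (rank == 1) { /* receiver */
        MPI_Precv_init(buf, NPART, COUNT, MPI_DOUBLE, 0, 0,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        MPI_Start(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Because MPI_Pready is called per thread, the library is free to begin transmitting a partition while laggard threads are still computing, which is exactly the overlap window this paper quantifies.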
In this paper, we measure and evaluate the potential overlap, i.e., the idle time each thread experiences between finishing its computation and the final thread finishing. These measurements help us understand whether a given application could benefit from early-bird communication. We present our technique for gathering this data and evaluate data collected from three proxy applications: MiniFE, MiniMD, and MiniQMC. To characterize the behavior of these workloads, we study the thread timings at both a macro level, i.e., across all threads in all runs of an application, and a micro level, i.e., within a single process of a single run. We observe that these applications exhibit significantly different behavior. While MiniFE and MiniQMC appear well-suited for early-bird communication because of their wider thread-completion distributions and more frequent laggard threads, the behavior of MiniMD may limit its ability to leverage early-bird communication.
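
The following is a minimal sketch of this kind of measurement, assuming OpenMP and omp_get_wtime(); the skewed dummy workload and the thread count are placeholders, not the paper's actual instrumentation.

```c
/*
 * Sketch of the per-thread timing measurement described above: record
 * when each thread finishes its share, then report how long each thread
 * would idle before the laggard finishes (the potential overlap).
 * The skewed dummy workload and NTHREADS are illustrative assumptions.
 */
#include <omp.h>
#include <stdio.h>

#define NTHREADS 8

int main(void)
{
    double finish[NTHREADS];

    #pragma omp parallel num_threads(NTHREADS)
    {
        int t = omp_get_thread_num();
        volatile double x = 0.0;

        /* Dummy workload, deliberately skewed so threads finish at
         * different times. */
        for (long i = 0; i < (long)(t + 1) * 10000000L; i++)
            x += 1.0;

        finish[t] = omp_get_wtime(); /* timestamp at completion */
    } /* implicit barrier: every thread has recorded its finish time */

    /* The laggard defines when a conventional (non-early-bird)
     * transmission could begin. */
    double last = finish[0];
    for (int t = 1; t < NTHREADS; t++)
        if (finish[t] > last)
            last = finish[t];

    for (int t = 0; t < NTHREADS; t++)
        printf("thread %d: idle %.6f s before laggard\n",
               t, last - finish[t]);
    return 0;
}
```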


Cited By

  • (2024) Taking the MPI standard and the Open MPI library to exascale. The International Journal of High Performance Computing Applications. https://doi.org/10.1177/10943420241265936. Online publication date: 23-Jul-2024.
  • (2023) Modeling and Benchmarking the Potential Benefit of Early-Bird Transmission in Fine-Grained Communication. Proceedings of the 52nd International Conference on Parallel Processing, 306–316. https://doi.org/10.1145/3605573.3605618. Online publication date: 7-Aug-2023.
  • (2023) A Dynamic Network-Native MPI Partitioned Aggregation Over InfiniBand Verbs. 2023 IEEE International Conference on Cluster Computing (CLUSTER), 259–270. https://doi.org/10.1109/CLUSTER52292.2023.00029. Online publication date: 31-Oct-2023.


Published In

ICPP Workshops '23: Proceedings of the 52nd International Conference on Parallel Processing Workshops
August 2023
217 pages
ISBN: 9798400708428
DOI: 10.1145/3605731
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  • high-performance computing
  • computer networks
  • fine-grained communication
  • benchmarks

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Advanced Simulation and Computing Program

Conference

ICPP-W 2023

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%
