DOI: 10.5555/3433701.3433746

A hierarchical and load-aware design for large message neighborhood collectives

Published: 09 November 2020

Abstract

The MPI-3.0 standard introduced neighborhood collectives to support the sparse communication patterns used in many applications. In this paper, we propose a hierarchical and distributed graph topology that considers both the physical topology of the system and the virtual communication pattern of processes to improve the performance of large-message neighborhood collectives. Moreover, we propose two design alternatives on top of the hierarchical design: (1) LAG-H, which assumes the same communication load for all processes, and (2) LAW-H, which considers the communication load of each process to distribute the load fairly among them. We propose a mathematical model to determine the communication capacity of each process and then use the derived capacity to distribute the load fairly among processes. Our experimental results on up to 28,672 processes show up to 9x speedup for various process topologies. We also observe up to an 8.2% performance gain for NAS-DT and up to 34x speedup for SpMM.
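
For context, the listing below is a minimal sketch of the MPI-3.0 neighborhood-collective interface that this work builds on; it is not the LAG-H or LAW-H design itself. Each process declares its sparse communication pattern as a distributed graph topology with MPI_Dist_graph_create_adjacent and then exchanges a large message with each of its neighbors via MPI_Neighbor_alltoall. The 1-D ring topology and the per-neighbor message size are arbitrary illustrative choices, not taken from the paper.

/* Minimal MPI-3.0 neighborhood-collective sketch (illustrative only):
 * a 1-D ring virtual topology with large per-neighbor messages. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank receives from and sends to its left and right neighbors. */
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;
    int sources[2]      = { left, right };
    int destinations[2] = { left, right };

    /* Build the distributed graph topology describing the sparse pattern. */
    MPI_Comm graph_comm;
    MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                   2, sources, MPI_UNWEIGHTED,
                                   2, destinations, MPI_UNWEIGHTED,
                                   MPI_INFO_NULL, 0 /* no reorder */,
                                   &graph_comm);

    /* Large-message neighborhood all-to-all: one block per neighbor. */
    const int count = 1 << 20;                      /* doubles per neighbor */
    double *sendbuf = malloc(2 * (size_t)count * sizeof(double));
    double *recvbuf = malloc(2 * (size_t)count * sizeof(double));
    for (int i = 0; i < 2 * count; i++) sendbuf[i] = (double)rank;

    MPI_Neighbor_alltoall(sendbuf, count, MPI_DOUBLE,
                          recvbuf, count, MPI_DOUBLE, graph_comm);

    if (rank == 0)
        printf("rank 0: first element received from left neighbor = %g\n",
               recvbuf[0]);

    free(sendbuf);
    free(recvbuf);
    MPI_Comm_free(&graph_comm);
    MPI_Finalize();
    return 0;
}

Presumably, the hierarchical and load-aware designs proposed in the paper change how such a call is carried out inside the MPI library, while the user-facing neighborhood-collective API shown here stays the same.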

Published In

SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2020
1454 pages
ISBN: 9781728199986

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Publication History

Published: 09 November 2020

Author Tags

  1. MPI neighborhood collective
  2. communication pattern
  3. virtual topology

Qualifiers

  • Research-article

Conference

SC '20
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
