Automatically tuned collective communications

Authors:

Sathish S. Vadhiyar,

Graham E. Fagg,

Jack Dongarra

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing

Pages 3 - es

Published: 01 November 2000

Abstract

The performance of MPI collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not perform well on all systems because of differences in architectures, network parameters and the storage capacity of the underlying MPI implementation. In this paper we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on that system. We also discuss a dynamic topology method that uses the tuned static topology shape but re-orders the logical addresses to compensate for run-time variations. A series of experiments was conducted comparing our tuned collective communication operations to various native vendor MPI implementations. The tuned collective communications improved performance by about 30 percent to 650 percent over the native MPI implementations.
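The idea of tuning by experiment can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes three candidate broadcast shapes (flat tree, binomial tree, pipelined chain), and it replaces real timed MPI runs with a hypothetical latency/bandwidth cost model. The parameters `P`, `ALPHA`, `BETA` and `SEGMENT` and the helper `tune` are illustrative names, not taken from the paper. On a real system each candidate function would run and time the actual collective.

```python
import math
import statistics
from typing import Callable, Dict, Iterable

# Hypothetical machine parameters (illustrative only, not from the paper).
P = 8            # number of processes
ALPHA = 50e-6    # per-message latency in seconds
BETA = 1e-8      # per-byte transfer time in seconds
SEGMENT = 8192   # segment size in bytes for the pipelined chain


def flat_tree(nbytes: int) -> float:
    """Root sends the whole message to each of the other P-1 ranks in turn."""
    return (P - 1) * (ALPHA + BETA * nbytes)


def binomial_tree(nbytes: int) -> float:
    """The number of informed ranks doubles each round: ceil(log2 P) rounds."""
    return math.ceil(math.log2(P)) * (ALPHA + BETA * nbytes)


def pipelined_chain(nbytes: int) -> float:
    """The message is cut into segments that flow down a chain of P ranks."""
    seg = min(nbytes, SEGMENT)
    nseg = math.ceil(nbytes / seg)
    return (P - 2 + nseg) * (ALPHA + BETA * seg)


def tune(candidates: Dict[str, Callable[[int], float]],
         message_sizes: Iterable[int],
         repeats: int = 5) -> Dict[int, str]:
    """For each message size, keep the candidate with the lowest median time."""
    table: Dict[int, str] = {}
    for size in message_sizes:
        best_name, best_time = "", float("inf")
        for name, measure in candidates.items():
            # Stand-in for repeated timed runs of the real collective.
            t = statistics.median(measure(size) for _ in range(repeats))
            if t < best_time:
                best_name, best_time = name, t
        table[size] = best_name
    return table


CANDIDATES = {
    "flat_tree": flat_tree,
    "binomial_tree": binomial_tree,
    "pipelined_chain": pipelined_chain,
}

# Decision table: message size in bytes -> name of the fastest candidate.
table = tune(CANDIDATES, [1024, 1 << 20])
print(table)
```

Under these assumed parameters the table selects the binomial tree for the 1 KiB message and the pipelined chain for the 1 MiB message. This shows why a single general algorithm may not suit every message size or system.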



Published In

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing

November 2000

889 pages

ISBN: 0780398025

Conference Chair:
Louis Turcotte
Rose-Hulman Institute of Technology

In-Cooperation

SIAM: Society for Industrial and Applied Mathematics

Publisher

IEEE Computer Society

United States


Qualifiers

Article

Conference

SC '00

Sponsor:

SIGARCH
IEEE-CS

SC '00: International Conference for High Performance Computing, Networking, Storage and Analysis

November 4 - 10, 2000

Dallas, Texas, USA

Acceptance Rates

SC '00 Paper Acceptance Rate 62 of 179 submissions, 35%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

