Automatically tuned collective communications

Published: 01 November 2000

Abstract

The performance of MPI's collective communication operations is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not perform well on all systems because of differences in architecture, network parameters, and the storage capacity of the underlying MPI implementation. In this paper we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on that system. We also discuss a dynamic topology method that keeps the tuned static topology shape but reorders the logical addresses to compensate for run-time variations. A series of experiments was conducted comparing our tuned collective communication operations to various native vendor MPI implementations. The tuned collective communications improved performance by about 30 percent to 650 percent over the native MPI implementations.
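The abstract describes choosing a collective algorithm by experiment on the target system instead of hard-wiring one. The sketch below illustrates that idea under stated assumptions and is not the paper's procedure: a hypothetical set of broadcast variants (flat tree, binomial tree, segmented pipelined chain) is "measured" at several message sizes, the fastest variant per size goes into a decision table, and later calls look that table up. To stay self-contained, the timings come from an assumed latency/bandwidth (Hockney) cost model with made-up parameters rather than from real MPI runs; `tune`, `select`, `ALPHA`, `BETA` and the candidate set are illustrative names.

```python
import bisect
import math
from typing import Callable, Dict, List

# Hypothetical Hockney-model parameters (assumed, illustrative only):
# time to send m bytes point-to-point = ALPHA + BETA * m.
ALPHA = 50e-6   # start-up latency in seconds
BETA = 1e-8     # seconds per byte (about 100 MB/s)


def flat_tree(p: int, m: int) -> float:
    """Root sends the full message to each of the other p - 1 ranks in turn."""
    return (p - 1) * (ALPHA + BETA * m)


def binomial_tree(p: int, m: int) -> float:
    """Full message forwarded along a binomial tree in ceil(log2 p) rounds."""
    return math.ceil(math.log2(p)) * (ALPHA + BETA * m)


def chain(p: int, m: int, seg: int = 8192) -> float:
    """Pipelined chain: the message is cut into seg-byte pieces that flow rank to rank."""
    nseg = max(1, math.ceil(m / seg))
    seg_bytes = min(seg, m)
    return (p - 1 + nseg - 1) * (ALPHA + BETA * seg_bytes)


def tune(algorithms: Dict[str, Callable[[int, int], float]], p: int,
         sizes: List[int], trials: int = 3) -> Dict[int, str]:
    """Time every candidate at every message size and record the fastest.

    Each call algorithms[name](p, m) stands in for one timed experiment; on a
    real system it would time the collective (e.g. with MPI_Wtime). Taking the
    minimum over several trials filters out noise; the model here is
    deterministic, so every trial returns the same value.
    """
    decision = {}
    for m in sizes:
        decision[m] = min(
            algorithms,
            key=lambda name: min(algorithms[name](p, m) for _ in range(trials)),
        )
    return decision


def select(decision: Dict[int, str], m: int) -> str:
    """Pick the algorithm tuned for the largest measured size not exceeding m."""
    sizes = sorted(decision)
    i = bisect.bisect_right(sizes, m) - 1
    return decision[sizes[max(i, 0)]]


candidates = {"flat": flat_tree, "binomial": binomial_tree, "chain": chain}
decision = tune(candidates, p=16, sizes=[1, 1024, 65536, 1048576])
print(decision)
# prints {1: 'binomial', 1024: 'binomial', 65536: 'binomial', 1048576: 'chain'}
```

With these assumed parameters the binomial tree wins for small and medium messages and the pipelined chain wins at 1 MiB, which shows why a single fixed algorithm can be suboptimal; on real hardware the crossover points would come from measurement, not from a model.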

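For the dynamic topology method, the abstract says only that the tuned static tree shape is kept while logical addresses are reordered at run time. One plausible reading, sketched here as a hypothetical heuristic rather than the paper's algorithm: keep a binomial tree over logical positions, keep the collective's root fixed, and give the positions with the largest subtrees to the processes that reached the collective earliest, so late arrivals become leaves and hold up no one downstream. Arrival times are assumed to be known in advance (for example, recorded from earlier calls), and `binomial_children`, `subtree_sizes` and `reorder` are illustrative names.

```python
from typing import Dict, List


def binomial_children(pos: int, p: int) -> List[int]:
    """Children of logical position pos in a binomial tree rooted at position 0.

    Position r forwards to r + 2**k for every power of two below the lowest set
    bit of r (no limit for the root), as long as the target is < p.
    """
    low = pos & -pos if pos else p
    children = []
    mask = 1
    while mask < low and pos + mask < p:
        children.append(pos + mask)
        mask <<= 1
    return children


def subtree_sizes(p: int) -> List[int]:
    """Number of positions in each position's subtree (itself included)."""
    sizes = [1] * p
    # Children always have larger positions than their parent, so a
    # high-to-low sweep sees every child's final size before its parent.
    for pos in range(p - 1, -1, -1):
        for child in binomial_children(pos, p):
            sizes[pos] += sizes[child]
    return sizes


def reorder(root: int, arrival: Dict[int, float]) -> List[int]:
    """Map logical tree positions to MPI ranks; result[pos] is a rank.

    Assumes arrival has one entry per rank 0..p-1. The root keeps position 0;
    the other ranks are sorted by arrival time and the earliest ones fill the
    positions with the largest subtrees, so the latest arrivals become leaves.
    """
    p = len(arrival)
    sizes = subtree_sizes(p)
    positions = sorted(range(1, p), key=lambda q: (-sizes[q], q))
    ranks = sorted((r for r in arrival if r != root), key=lambda r: (arrival[r], r))
    mapping = [root] * p
    for pos, rank in zip(positions, ranks):
        mapping[pos] = rank
    return mapping


# Hypothetical arrival times (seconds) for 8 ranks; rank 1 is the straggler.
arrival = {0: 0.0, 1: 9.0, 2: 3.0, 3: 8.0, 4: 5.0, 5: 1.0, 6: 7.0, 7: 2.0}
print(subtree_sizes(8))      # prints [8, 1, 2, 1, 4, 1, 2, 1]
print(reorder(0, arrival))   # prints [0, 4, 7, 6, 5, 3, 2, 1]
```

Rank 1, the latest arrival, lands on position 7, which is a leaf, while rank 5, the earliest non-root arrival, takes position 4, the root of the largest subtree. The tree shape itself is unchanged; only the rank-to-position mapping moves.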
Published In

SC '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing
November 2000
889 pages
ISBN: 0780398025

In-Cooperation

  • SIAM: Society for Industrial and Applied Mathematics

Publisher

IEEE Computer Society

United States

Qualifiers

  • Article

Conference

SC '00

Acceptance Rates

SC '00 Paper Acceptance Rate 62 of 179 submissions, 35%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

  • Downloads (last 12 months): 95
  • Downloads (last 6 weeks): 8
Reflects downloads up to 03 Feb 2025

Cited By

  • (2022) FasterMoE. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 120-134. DOI: 10.1145/3503221.3508418. Online publication date: 2-Apr-2022.
  • (2019) Corrected trees for reliable group communication. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pp. 287-299. DOI: 10.1145/3293883.3295721. Online publication date: 16-Feb-2019.
  • (2018) Protein Secondary Structure Analysis in the Cloud. Proceedings of the 6th International Workshop on Parallelism in Bioinformatics, pp. 63-70. DOI: 10.1145/3235830.3235837. Online publication date: 23-Sep-2018.
  • (2017) Hierarchical redesign of classic MPI reduction algorithms. The Journal of Supercomputing 73:2, pp. 713-725. DOI: 10.1007/s11227-016-1779-7. Online publication date: 1-Feb-2017.
  • (2016) Scalable hierarchical aggregation protocol (SHArP). Proceedings of the First Workshop on Optimization of Communication in HPC, pp. 1-10. DOI: 10.5555/3018058.3018059. Online publication date: 13-Nov-2016.
  • (2015) Hierarchical Optimization of MPI Reduce Algorithms. Proceedings of the 13th International Conference on Parallel Computing Technologies, Volume 9251, pp. 21-34. DOI: 10.1007/978-3-319-21909-7_3. Online publication date: 31-Aug-2015.
  • (2014) Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations. Supercomputing Frontiers and Innovations: an International Journal 1:2, pp. 58-75. DOI: 10.14529/jsfi140204. Online publication date: 9-Jul-2014.
  • (2011) High-performance modeling acoustic and elastic waves using the parallel Dichotomy Algorithm. Journal of Computational Physics 230:5, pp. 1992-2003. DOI: 10.1016/j.jcp.2010.11.046. Online publication date: 1-Mar-2011.
  • (2010) Hiding latency in Coarray Fortran 2.0. Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, pp. 1-9. DOI: 10.1145/2020373.2020387. Online publication date: 12-Oct-2010.
  • (2008) Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems. Proceedings of the 22nd annual international conference on Supercomputing, pp. 195-204. DOI: 10.1145/1375527.1375558. Online publication date: 7-Jun-2008.
