Article

Optimizing MPI Alltoall Communication of Large Messages in Multicore Clusters

Authors:

PDCAT '11: Proceedings of the 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies

Pages 257 - 262

https://doi.org/10.1109/PDCAT.2011.60

Published: 20 October 2011

Abstract

MPI all-to-all communication is widely used in high performance computing (HPC) applications. In all-to-all communication, each process sends a distinct message to every other participating process. In multicore clusters, the processes within a node simultaneously contend for that node's network resources during all-to-all communication. All-to-all communication of large messages also requires many small synchronization messages, and under contention their latency is orders of magnitude larger than without contention. As a result, the synchronization overhead increases significantly and accounts for a large proportion of the total latency of all-to-all communication. In this paper, we analyse the considerable overhead of synchronization messages. Based on the analysis, we present an optimization that reduces the number of synchronization messages from 3N to 2N. Evaluations on a 240-core cluster show that performance improves by an almost constant ratio, which is mainly determined by message size and is independent of system scale. The performance of all-to-all communication is improved by 25% for 32 KB and 64 KB messages. For an FFT application, performance is improved by 20%.
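The exchange pattern and the message-count claim in the abstract can be illustrated with a small sketch. It is a plain Python simulation, not MPI code and not the authors' implementation. The function names `alltoall`, `sync_message_count`, and `sync_reduction` are hypothetical. The abstract does not define N, so the sketch treats it as an opaque count and only compares the stated totals 3N and 2N.

```python
def alltoall(send_buffers):
    """Simulate an all-to-all personalized exchange.

    send_buffers[src][dst] is the distinct block that process `src`
    sends to process `dst`. The result is indexed the other way round:
    recv_buffers[dst][src] is the block `dst` received from `src`.
    """
    n = len(send_buffers)
    if any(len(row) != n for row in send_buffers):
        raise ValueError("each process needs one block per process")
    return [[send_buffers[src][dst] for src in range(n)] for dst in range(n)]


def sync_message_count(n, factor):
    """Total synchronization messages when the count is factor * N.

    Assumption: N is whatever unit the paper counts by; the abstract
    only states the totals 3N (before) and 2N (after).
    """
    return factor * n


def sync_reduction(n):
    """Fraction of synchronization messages removed by going 3N -> 2N."""
    before = sync_message_count(n, 3)
    after = sync_message_count(n, 2)
    return (before - after) / before


if __name__ == "__main__":
    blocks = [["a0", "a1"], ["b0", "b1"]]
    print(alltoall(blocks))
    print(sync_reduction(240))
```

In this sketch the exchange is a transpose of the block matrix, and the fraction of synchronization messages removed is one third for any N. That fraction describes message counts only; it is not the latency improvement reported in the abstract.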

Cited By

Chochia G, Solt D, Hursey J (2022). Applying on Node Aggregation Methods to MPI Alltoall Collectives: Matrix Block Aggregation Algorithm. Proceedings of the 29th European MPI Users' Group Meeting, pp. 11-17. doi:10.1145/3555819.3555821. Online publication date: 14-Sep-2022.
https://dl.acm.org/doi/10.1145/3555819.3555821

Index Terms

Optimizing MPI Alltoall Communication of Large Messages in Multicore Clusters
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Index terms have been assigned to the content through auto-classification.

Recommendations

An MPI prototype for compiled communication on Ethernet switched clusters
Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I

Compiled communication has recently been proposed to improve communication performance for clusters of workstations. The idea of compiled communication is to apply more aggressive optimizations to communications whose information is known at compile ...

MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit
CLUSTER '11: Proceedings of the 2011 IEEE International Conference on Cluster Computing

General Purpose Graphics Processing Units (GPGPUs) are rapidly becoming an integral part of high performance system architectures. The Tianhe-1A and Tsubame systems received significant attention for their architectures that leverage GPGPUs. ...

Optimizing MPI point-to-point communication performance on RDMA-enabled SMP-CMP clusters

Published In

PDCAT '11: Proceedings of the 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies

October 2011

442 pages

ISBN:9780769545646

Publisher

IEEE Computer Society

United States
