DOI: 10.1145/1851476.1851507
Research article

Scalability of communicators and groups in MPI

Published: 21 June 2010

Abstract

As the number of cores inside compute clusters continues to grow, the scalability of MPI (Message Passing Interface) is important to ensure that programs can continue to execute on an ever-increasing number of cores. One important scalability issue for MPI is the implementation of communicators and groups. Communicators and groups are an integral part of MPI and play an essential role in the design and use of libraries. It is challenging to create an MPI implementation whose communicators and groups scale to the hundreds of thousands of processes that are possible in today's clusters. In this paper we present the design and evaluation of techniques to support the scalability of communicators and groups in MPI.
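
For readers less familiar with these objects, the following minimal example (standard MPI calls only, not code from the paper) shows how an application derives a new communicator from a group:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal example: derive a communicator containing only the
     * even-ranked processes of MPI_COMM_WORLD, using the standard
     * group-manipulation calls (MPI_Comm_group, MPI_Group_incl,
     * MPI_Comm_create). */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Group world_group, even_group;
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);

        /* List the even world ranks. */
        int n_even = (size + 1) / 2;
        int *even_ranks = malloc(n_even * sizeof(int));
        for (int i = 0; i < n_even; i++)
            even_ranks[i] = 2 * i;
        MPI_Group_incl(world_group, n_even, even_ranks, &even_group);

        /* Collective over MPI_COMM_WORLD; processes outside the group
         * receive MPI_COMM_NULL. */
        MPI_Comm even_comm;
        MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);

        if (even_comm != MPI_COMM_NULL) {
            int even_rank;
            MPI_Comm_rank(even_comm, &even_rank);
            printf("world rank %d is rank %d in the even communicator\n",
                   rank, even_rank);
            MPI_Comm_free(&even_comm);
        }

        MPI_Group_free(&even_group);
        MPI_Group_free(&world_group);
        free(even_ranks);
        MPI_Finalize();
        return 0;
    }

Every communicator carries such a group internally, which is why the per-process storage needed to represent group membership becomes a scalability concern at very large process counts.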
We have designed and implemented a fine-grain version of MPI (FG-MPI), based on MPICH2, that allows thousands of full-fledged MPI processes inside a single operating-system process. Using FG-MPI we can create hundreds of thousands of MPI processes, which allowed us to implement and evaluate solutions to the scalability issues associated with communicators. We describe techniques that allow group information to be shared among the MPI processes inside an OS process, and the design of scalable operations to create the communicators. A set-plus-permutation framework is introduced for storing group information for communicators, and a set, rather than map, representation is proposed for MPI group objects. Performance results are given for the execution of an MPI benchmark program with upwards of 100,000 processes, with communicators created for groups of various sizes and types.
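
As a rough illustration of the set-plus-permutation idea (a sketch under our own assumptions; the names and layout below are hypothetical and not taken from the paper), a group can be stored as a sorted set of member ranks, shareable among co-located MPI processes, plus a small permutation giving the group's rank ordering, instead of a per-process map from group rank to world rank:

    #include <stddef.h>

    /* Hypothetical sketch of a set-plus-permutation group representation.
     * The sorted set of member world ranks can be shared among the
     * co-located MPI processes inside one OS process; only the
     * permutation is specific to a particular ordering of the group. */
    typedef struct {
        int  nmembers;    /* number of processes in the group */
        int *sorted_set;  /* member world ranks in ascending order (shareable) */
        int *perm;        /* perm[i] = position in sorted_set of the process
                             with group rank i; NULL means the group order is
                             simply the sorted order (identity permutation) */
    } group_rep;

    /* Group rank -> world rank. */
    static int group_to_world(const group_rep *g, int group_rank)
    {
        int idx = g->perm ? g->perm[group_rank] : group_rank;
        return g->sorted_set[idx];
    }

    /* World rank -> group rank, or -1 if the process is not a member.
     * Binary search over the sorted set; a real implementation would
     * keep the inverse permutation rather than scanning for it. */
    static int world_to_group(const group_rep *g, int world_rank)
    {
        int lo = 0, hi = g->nmembers - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (g->sorted_set[mid] == world_rank) {
                if (!g->perm)
                    return mid;
                for (int i = 0; i < g->nmembers; i++)
                    if (g->perm[i] == mid)
                        return i;
                return -1;
            }
            if (g->sorted_set[mid] < world_rank)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1;
    }

Compared with an explicit group-rank-to-world-rank array stored separately in every process, a shared set plus a (often identity) permutation is far cheaper when many co-located processes belong to communicators with the same membership.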




Published In

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN:9781605589428
DOI:10.1145/1851476
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2010


Author Tags

  1. message passing interface
  2. multicore
  3. parallel programming

Qualifiers

  • Research-article

Conference

HPDC '10

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%


Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 08 Feb 2025


Cited By

  • (2021) Reconfigurable switches for high performance and flexible MPI collectives. Concurrency and Computation: Practice and Experience, 34(6). DOI: 10.1002/cpe.6769. Online publication date: 12-Dec-2021.
  • (2020) FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives. 2020 IEEE High Performance Extreme Computing Conference (HPEC), pages 1-10. DOI: 10.1109/HPEC43674.2020.9286200. Online publication date: 22-Sep-2020.
  • (2019) MPI Sessions: Evaluation of an Implementation in Open MPI. 2019 IEEE International Conference on Cluster Computing (CLUSTER), pages 1-11. DOI: 10.1109/CLUSTER.2019.8891002. Online publication date: Sep-2019.
  • (2018) Reference broadcast frame synchronization for distributed high-speed camera network. 2018 IEEE Sensors Applications Symposium (SAS), pages 1-5. DOI: 10.1109/SAS.2018.8336781. Online publication date: Mar-2018.
  • (2018) Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 254-265. DOI: 10.1109/IPDPS.2018.00035. Online publication date: May-2018.
  • (2017) Memory Compression Techniques for Network Address Management in MPI. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 1008-1017. DOI: 10.1109/IPDPS.2017.18. Online publication date: May-2017.
  • (2016) DISP. Proceedings of the First Workshop on Optimization of Communication in HPC, pages 53-62. DOI: 10.5555/3018058.3018064. Online publication date: 13-Nov-2016.
  • (2016) MPI Sessions. Proceedings of the 23rd European MPI Users' Group Meeting, pages 121-129. DOI: 10.1145/2966884.2966915. Online publication date: 25-Sep-2016.
  • (2016) DISP: Optimizations towards Scalable MPI Startup. 2016 First International Workshop on Communication Optimizations in HPC (COMHPC), pages 53-62. DOI: 10.1109/COMHPC.2016.011. Online publication date: Nov-2016.
  • (2015) A Team-Based Methodology of Memory Hierarchy-Aware Runtime Support in Coarray Fortran. Proceedings of the 2015 IEEE International Conference on Cluster Computing, pages 448-451. DOI: 10.1109/CLUSTER.2015.67. Online publication date: 8-Sep-2015.
