Article, DOI: 10.1007/978-3-642-03869-3_11

Process Mapping for MPI Collective Communications

Published: 23 August 2009

Abstract

Because communication cost is non-uniform in modern parallel computers, mapping virtual parallel processes to physical processors (or cores) in an optimized way is important for achieving scalable performance. Existing work uses profile-guided approaches to automatically optimize mapping schemes so that the cost of point-to-point communications is minimized. However, these approaches cannot handle collective communications and may therefore produce sub-optimal mappings for applications that use them.
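As a rough illustration of the mapping problem, the sketch below scores a mapping as the sum, over all process pairs, of the traffic between them multiplied by the distance between the cores they are placed on, and then brute-forces the cheapest mapping for four processes. The traffic matrix, the two-node distance matrix, and the function names are hypothetical; this abstract does not state the cost model or search method actually used.

```python
from itertools import permutations

def mapping_cost(traffic, distance, mapping):
    """Sum of traffic[i][j] * distance between the cores hosting i and j."""
    n = len(traffic)
    return sum(traffic[i][j] * distance[mapping[i]][mapping[j]]
               for i in range(n) for j in range(n))

def best_mapping(traffic, distance):
    """Exhaustive search over all placements; only feasible for tiny inputs."""
    n = len(traffic)
    return min(permutations(range(n)),
               key=lambda m: mapping_cost(traffic, distance, m))

# Hypothetical machine: cores 0 and 1 share a node, cores 2 and 3 share another.
# Intra-node distance is 1, inter-node distance is 10 (made-up units).
DISTANCE = [[0, 1, 10, 10],
            [1, 0, 10, 10],
            [10, 10, 0, 1],
            [10, 10, 1, 0]]

# Hypothetical profile: processes 0<->2 and 1<->3 exchange heavy traffic.
TRAFFIC = [[0, 0, 5, 0],
           [0, 0, 0, 5],
           [5, 0, 0, 0],
           [0, 5, 0, 0]]

identity = (0, 1, 2, 3)          # process i runs on core i
best = best_mapping(TRAFFIC, DISTANCE)

print("identity cost:", mapping_cost(TRAFFIC, DISTANCE, identity))
print("best mapping:", best, "cost:", mapping_cost(TRAFFIC, DISTANCE, best))
```

Here the identity placement puts each heavily communicating pair on different nodes, while the best placement co-locates them. Real tools replace the exhaustive search with a heuristic, since the number of placements grows factorially.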
In this paper, we propose an approach called OPP (Optimized Process Placement) to handle collective communications. OPP transforms each collective communication into a series of point-to-point communication operations, according to how the collective is implemented in the communication library. Existing approaches can then be used to find mapping schemes that are optimized for both point-to-point and collective communications.
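The transformation step can be pictured with a minimal sketch that expands one broadcast into point-to-point messages and adds them to a traffic matrix. It assumes the library implements the broadcast as a binomial tree, which is a common choice for short messages but is only an assumption here; the function names and the 1024-byte message size are hypothetical, and this is not presented as OPP's actual implementation.

```python
def binomial_bcast_edges(n, root=0):
    """Return the (sender, receiver) pairs of a binomial-tree broadcast.

    Ranks are taken relative to the root. In each round the ranks that
    already hold the data forward it `mask` ranks ahead, and `mask` doubles.
    """
    edges = []
    mask = 1
    while mask < n:
        for rel in range(mask):
            if rel + mask < n:
                src = (rel + root) % n
                dst = (rel + mask + root) % n
                edges.append((src, dst))
        mask <<= 1
    return edges

def add_collective(traffic, edges, nbytes):
    """Accumulate the decomposed messages into a point-to-point traffic matrix."""
    for src, dst in edges:
        traffic[src][dst] += nbytes
    return traffic

N = 8
traffic = [[0] * N for _ in range(N)]
edges = binomial_bcast_edges(N, root=0)
add_collective(traffic, edges, nbytes=1024)

for src, dst in edges:
    print(f"{src} -> {dst}")
```

The resulting matrix has the same shape as one built from profiled point-to-point messages, so it can be summed with that profile and passed to an existing mapping optimizer.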
We evaluated our approach with micro-benchmarks that cover all MPI collective communications, the NAS Parallel Benchmark suite, and three other applications. Experimental results show that the optimized process placement generated by our approach can achieve significant speedup.



Published In

Euro-Par '09: Proceedings of the 15th International Euro-Par Conference on Parallel Processing
August 2009, 1082 pages
ISBN: 9783642038686
Editors: Henk Sips, Dick Epema, Hai-Xiang Lin
Publisher: Springer-Verlag, Berlin, Heidelberg


Cited By

• (2023) Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 405-415. DOI: 10.1145/3624062.3624109. Online publication date: 12-Nov-2023
• (2019) LPMS. Workshop Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3339186.3339208. Online publication date: 5-Aug-2019
• (2017) Enabling hierarchy-aware MPI collectives in dynamically changing topologies. Proceedings of the 24th European MPI Users' Group Meeting, pp. 1-11. DOI: 10.1145/3127024.3127031. Online publication date: 25-Sep-2017
• (2017) A hierarchical model to manage hardware topology in MPI applications. Proceedings of the 24th European MPI Users' Group Meeting, pp. 1-11. DOI: 10.1145/3127024.3127030. Online publication date: 25-Sep-2017
• (2015) Optimising MPI tree-based communication for NUMA architectures. International Journal of Autonomous and Adaptive Communications Systems 8:4, pp. 407-423. DOI: 10.1504/IJAACS.2015.073190. Online publication date: 1-Nov-2015
• (2014) Cypress. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 143-153. DOI: 10.1109/SC.2014.17. Online publication date: 16-Nov-2014
• (2012) Automatic mapping of parallel applications on multicore architectures using the Servet benchmark suite. Computers and Electrical Engineering 38:2, pp. 258-269. DOI: 10.1016/j.compeleceng.2011.12.007. Online publication date: 1-Mar-2012
• (2012) Dynamic thread mapping based on machine learning for transactional memory applications. Proceedings of the 18th international conference on Parallel Processing, pp. 465-476. DOI: 10.1007/978-3-642-32820-6_47. Online publication date: 27-Aug-2012
• (2009) FACT. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1-12. DOI: 10.1145/1654059.1654087. Online publication date: 14-Nov-2009
