Abstract
We report on our work on improving the performance of collective operations in MPICH on clusters connected by switched networks. For each collective operation, we use multiple algorithms depending on the message size, with the goal of minimizing latency for short messages and minimizing bandwidth usage for long messages. Although we have implemented new algorithms for all MPI collective operations, because of limited space we describe only the algorithms for allgather, broadcast, reduce-scatter, and reduce. We present performance results using the SKaMPI benchmark on a Myrinet-connected Linux cluster and an IBM SP. In all cases, the new algorithms significantly outperform the old algorithms used in MPICH on the Myrinet cluster, and, in many cases, they outperform the algorithms used in IBM’s MPI on the SP.
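The trade-off between latency for short messages and bandwidth for long messages can be illustrated with a small cost model. The sketch below is not taken from the abstract. It assumes the common latency-bandwidth model, in which sending n bytes costs alpha + n * beta, and it compares two textbook broadcast algorithms: a binomial tree, and a scatter followed by a ring allgather. The function names, the choice of these two algorithms, and the parameter values are illustrative assumptions, not the selection logic or thresholds used in MPICH.

```python
import math


def binomial_bcast_cost(p, n, alpha, beta):
    """Estimated time for a binomial-tree broadcast of n bytes to p processes.

    The tree has ceil(log2 p) steps and every step sends the full message.
    """
    return math.ceil(math.log2(p)) * (alpha + n * beta)


def scatter_allgather_bcast_cost(p, n, alpha, beta):
    """Estimated time for a broadcast done as scatter + ring allgather.

    Binomial scatter: ceil(log2 p) steps, (p-1)/p * n bytes in total.
    Ring allgather:   p-1 steps,          (p-1)/p * n bytes in total.
    """
    steps = math.ceil(math.log2(p)) + (p - 1)
    return steps * alpha + 2.0 * (p - 1) / p * n * beta


def choose_bcast(p, n, alpha, beta):
    """Return the name of the cheaper algorithm under the assumed model."""
    tree = binomial_bcast_cost(p, n, alpha, beta)
    split = scatter_allgather_bcast_cost(p, n, alpha, beta)
    return "binomial" if tree <= split else "scatter_allgather"


if __name__ == "__main__":
    # Hypothetical machine parameters: 10 us latency, 1e-8 s per byte.
    ALPHA, BETA = 1e-5, 1e-8
    for size in (8, 1024, 65536, 1 << 20):
        print(size, choose_bcast(16, size, ALPHA, BETA))
```

Under these assumed parameters the tree wins for small messages because it needs fewer steps, while the scatter-based variant wins for large messages because each process sends roughly 2n bytes in total rather than n bytes per tree level. A real implementation would determine the switch-over points by measurement on the target machine.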
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thakur, R., Gropp, W.D. (2003). Improving the Performance of Collective Operations in MPICH. In: Dongarra, J., Laforenza, D., Orlando, S. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2003. Lecture Notes in Computer Science, vol 2840. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39924-7_38
DOI: https://doi.org/10.1007/978-3-540-39924-7_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20149-6
Online ISBN: 978-3-540-39924-7
eBook Packages: Springer Book Archive