Optimization of MPI collective communication on BlueGene/L systems

G Almási, P Heidelberger, CJ Archer, X Martorell, CC Erway, JE Moreira
Proceedings of the 19th annual international conference on Supercomputing, 2005 - dl.acm.org
BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low-power, dual-processor compute nodes interconnected by high-speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the natural programming model for this machine. The BlueGene/L MPI library is a port of MPICH2. In this paper we discuss the implementation of MPI collectives on BlueGene/L. The MPICH2 implementation of MPI collectives is based on point-to-point communication primitives. This turns out to be suboptimal for a number of reasons. Machine-optimized MPI collectives are necessary to harness the performance of BlueGene/L. We discuss these optimized MPI collectives, describing the algorithms and presenting performance results measured with targeted micro-benchmarks on real BlueGene/L hardware with up to 4096 compute nodes.
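To illustrate what "collectives based on point-to-point communication primitives" means, here is a minimal sketch of one common way to build a broadcast from individual sends: a binomial tree. In each round, every rank that already holds the data sends it to one rank that does not. The holder count therefore doubles each round, so p ranks finish in ceil(log2 p) rounds using p - 1 messages. This is a generic textbook schedule. It is not the BlueGene/L-optimized algorithm described in the paper, and it is not necessarily MPICH2's exact send ordering. The function name `binomial_bcast_schedule` is hypothetical.

```python
from typing import List, Tuple


def binomial_bcast_schedule(p: int, root: int = 0) -> List[List[Tuple[int, int]]]:
    """Return, per round, the (sender, receiver) point-to-point messages of a
    binomial-tree broadcast among ranks 0..p-1 rooted at `root`.

    Hypothetical helper for illustration only; it models the communication
    schedule and does not perform any real MPI calls.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    if not 0 <= root < p:
        raise ValueError("root must be in range(p)")

    rounds: List[List[Tuple[int, int]]] = []
    have = 1  # ranks (in root-relative numbering) that already hold the data
    while have < p:
        msgs = []
        for rel_src in range(have):
            rel_dst = rel_src + have
            if rel_dst < p:
                # Translate root-relative ranks back to absolute ranks.
                msgs.append(((rel_src + root) % p, (rel_dst + root) % p))
        rounds.append(msgs)
        have *= 2  # every holder sent once, so the holder count doubles
    return rounds


if __name__ == "__main__":
    for r, msgs in enumerate(binomial_bcast_schedule(8)):
        print(f"round {r}: {msgs}")
```

For 8 ranks this yields 3 rounds: (0, 1); then (0, 2) and (1, 3); then (0, 4), (1, 5), (2, 6) and (3, 7). The schedule orders sends purely by rank number and ignores the physical network topology (torus or collective network) entirely.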