Distributed memory multicomputers are logical target architectures for medium grain size applications. Many of these parallel machines, however, produce poor speedups on problems that should execute efficiently. A dense linear system solver is analyzed as a sample medium grain size application. Computational imbalance is proposed as an important performance metric in predicting speedup for medium grain size problems. An implementation of a dense linear system solver using NetLib, a low latency communication library, demonstrates that computational imbalance is an important predictor of parallel performance on the iWarp processor. Computational imbalance is also significant in predicting the speedup attained by several other multicomputers running an implementation of a dense linear system solver and a shallow water model written in a data-parallel language. Parallel languages must have access to routing primitives which minimize computational imbalance if they are to achieve good performance in this medium grain size problem space.
Recommendations
Optimizations for Efficient Array Redistribution on Distributed Memory Multicomputers
Special issue on compilation techniques for distributed memory systems
Appropriate data distribution has been found to be critical for obtaining good performance on distributed memory multicomputers such as the Thinking Machines CM-5, Intel Paragon, and IBM SP-1/SP-2. It has also been found that some programs need to ...
Dynamic Data Partitioning for Distributed-Memory Multicomputers
Special issue on compilation techniques for distributed memory systems
For distributed-memory multicomputers such as the Intel Paragon, the IBM SP-1/SP-2, the NCUBE/2, and the Thinking Machines CM-5, the quality of the data partitioning for a given application is crucial to obtaining high performance. This task has ...