The effects of problem partitioning, allocation, and granularity on the performance of multiple-processor systems

Cvetanovic - IEEE Transactions on Computers, 1987 - ieeexplore.ieee.org
In this paper we analyze the effects of the problem decomposition, the allocation of subproblems to processors, and the grain size of subproblems on the performance of a multiple-processor shared-memory architecture. Our results indicate that for algorithms where both the computation and the communication overhead can be fully decomposed among N processors, the speedup is a nondecreasing function of the level of granularity for arbitrary interconnection structure and allocation of subproblems to processors. For these algorithms, the speedup is an increasing function of the level of granularity provided that the interconnection bandwidth is greater than unity. If the bandwidth is equal to unity, then the speedup converges to the value equal to the ratio of processing time to communication time. For algorithms where the computation is decomposable but the communication overhead cannot be decomposed, the speedup is a nondecreasing function of the level of granularity for the best case bandwidth only. If the bandwidth is less than N, the speedup reaches its maximum and then decreases approaching zero as the level of granularity grows. For algorithms where the computation consists of parallel and serial sections of code and the communication overhead is fully decomposable, the speedup converges to a value inversely proportional to the fraction of time spent in the serial code even for the best case interconnection bandwidth.
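
The last result in the abstract, a speedup ceiling set by the serial fraction, has the same shape as the classic Amdahl's-law bound. The sketch below is not the paper's model: it ignores communication overhead entirely and uses the processor count as a stand-in for the level of granularity. Both simplifications are assumptions made for illustration, and the names `amdahl_speedup` and `serial_fraction` are hypothetical.

```python
def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Amdahl's-law speedup for a program with a fixed serial fraction.

    Assumption: communication overhead is zero, so this is only an upper
    bound in the spirit of the abstract's "best case bandwidth" claim.
    """
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # Serial part runs on one processor; parallel part is split evenly.
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    f = 0.1  # illustrative value: 10% of the run time is serial code
    for n in (1, 4, 16, 64, 1024):
        print(f"N={n:5d}  speedup={amdahl_speedup(f, n):.3f}")
    print(f"limit as N grows: {1.0 / f:.3f}")
```

Under these assumptions the speedup never decreases as N grows, but it approaches 1/f rather than growing without bound, which is the qualitative behavior the abstract reports for mixed serial and parallel code.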