The use of multiprocessor architectures, and of compilation for such architectures, to speed up the execution of numerical programs is investigated. Three types of parallelism in programs are exploited: (1) fine-grain: parallelism at the level of individual machine operations; (2) loop: parallelism between different iterations of the same loop; (3) coarse-grain: parallelism between different parts of a program.
Several multiprocessor architectures that exploit these types of parallelism are defined; each belongs to either the shared-memory multiprocessor class or the multiple-array-processor class. Parallel programs for each architecture are generated automatically from serial code by a Fortran compiler. Simulation is used to study the performance of each architecture under different compilation methods for each type of parallelism, and under the best combined use of all types together.
In addition, serial loops in parallel programs are studied to determine why they remain serial, how they affect performance, and whether the compiler or a programmer can make them parallel.
Cited By
- Wang C and Wang S (1992). Efficient Processor Assignment Algorithms and Loop Transformations for Executing Nested Parallel Loops on Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 3:1, (71-82), Online publication date: 1-Jan-1992.
- Chen D, Su H and Yew P (1990). The impact of synchronization and granularity on parallel systems, ACM SIGARCH Computer Architecture News, 18:2SI, (239-248), Online publication date: 1-Jun-1990.
- Chen D, Su H and Yew P The impact of synchronization and granularity on parallel systems Proceedings of the 17th annual international symposium on Computer Architecture, (239-248)
- Polychronopoulos C, Kuck D and Padua D (1989). Utilizing Multidimensional Loop Parallelism on Large Scale Parallel Processor Systems, IEEE Transactions on Computers, 38:9, (1285-1296), Online publication date: 1-Sep-1989.
- Yamana H, Marushima T, Hagiwara T and Muraoka Y System architecture of parallel processing system -Harry- Proceedings of the 2nd international conference on Supercomputing, (76-89)
- Arafeh B Vectorization and parallelization interactive assistant Proceedings of the 1988 ACM sixteenth annual conference on Computer science, (573-577)
- Padua D and Wolfe M (1986). Advanced compiler optimizations for supercomputers, Communications of the ACM, 29:12, (1184-1201), Online publication date: 1-Dec-1986.