Basic compiler algorithms for parallel programs
Traditional compiler techniques developed for sequential programs do not guarantee the correctness (sequential consistency) of compiler transformations when applied to parallel programs. This is because traditional compilers for sequential programs do ...
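A minimal sketch (not from the paper) of the kind of failure the abstract alludes to, in C with POSIX threads: reordering the two statements in t1 is legal when t1 is analyzed as a sequential program in isolation, yet it admits the outcome r1 == 0 && r2 == 0, which no sequentially consistent execution of the original program allows.

#include <pthread.h>
#include <stdio.h>

int x = 0, y = 0;
int r1, r2;

void *t1(void *arg) {
    x = 1;      /* (a) */
    r1 = y;     /* (b) a sequential compiler may hoist (b) above (a),   */
    return 0;   /*     since x and y are independent within this thread */
}

void *t2(void *arg) {
    y = 1;
    r2 = x;
    return 0;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Under sequential consistency, r1 == 0 && r2 == 0 is impossible;
     * after the hoist (or equivalent hardware reordering) it can occur. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}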
Code motion for explicitly parallel programs
In comparison to automatic parallelization, which is thoroughly studied in the literature [31, 33], classical analyses and optimizations of explicitly parallel programs have been more or less neglected. This may be due to the fact that naive adaptations of ...
An evaluation of computing paradigms for N-body simulations on distributed memory architectures
The efficiency of HPF with respect to irregular applications is still largely unproven. While recent work has shown that a highly irregular hierarchical n-body force calculation method can be implemented in HPF, we have found that the implementation ...
SUIF Explorer: an interactive and interprocedural parallelizer
The SUIF Explorer is an interactive parallelization tool that is more effective than previous systems in minimizing the number of lines of code that require programmer assistance. First, the interprocedural analyses in the SUIF system are successful in ...
Dynamic instrumentation of threaded applications
The use of threads is becoming commonplace in both sequential and parallel programs. This paper describes our design and initial experience with non-trace based performance instrumentation techniques for threaded programs. Our goal is to provide ...
StackThreads/MP: integrating futures into calling standards
An implementation scheme of fine-grain multithreading that needs no changes to current calling standards for sequential languages and modest extensions to sequential compilers is described. Like previous similar systems, it performs an asynchronous call ...
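A hedged sketch of the asynchronous-call/future semantics the abstract describes, written in C with one POSIX thread per call purely for clarity; StackThreads/MP achieves the same semantics with fine-grain threads and unchanged calling standards, which this deliberately heavyweight version does not attempt. All names here are illustrative.

#include <pthread.h>
#include <stdio.h>

typedef struct { pthread_t tid; long value; } future_t;

static long fib(long n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

static void *fib_worker(void *arg) {
    future_t *f = arg;
    f->value = fib(f->value);   /* input was stashed in the value slot */
    return NULL;
}

/* Asynchronous call: the caller continues while fib(n) runs. */
static void fib_async(future_t *f, long n) {
    f->value = n;
    pthread_create(&f->tid, NULL, fib_worker, f);
}

/* Touching the future blocks until the callee has returned. */
static long future_get(future_t *f) {
    pthread_join(f->tid, NULL);
    return f->value;
}

int main(void) {
    future_t f;
    fib_async(&f, 30);                        /* runs concurrently with caller */
    long here = fib(29);
    printf("%ld\n", here + future_get(&f));   /* fib(31) = 1346269 */
    return 0;
}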
Automatic parallelization of divide and conquer algorithms
Divide and conquer algorithms are a good match for modern parallel machines: they tend to have large amounts of inherent parallelism and they work well with caches and deep memory hierarchies. But these algorithms pose challenging problems for ...
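A sketch of the pattern in question, not of the paper's compiler: a divide-and-conquer reduction whose two halves are independent, expressed here with OpenMP tasks (the paper itself targets Cilk output; OpenMP tasks are a later mechanism for the same structure).

#include <stdio.h>
#include <omp.h>

/* Parallel recursive sum of a[lo..hi): the two halves are independent. */
long sum(const long *a, int lo, int hi) {
    if (hi - lo < 1024) {               /* serial base case */
        long s = 0;
        for (int i = lo; i < hi; i++) s += a[i];
        return s;
    }
    int mid = lo + (hi - lo) / 2;
    long left, right;
    #pragma omp task shared(left)       /* left half runs as a child task */
    left = sum(a, lo, mid);
    right = sum(a, mid, hi);            /* right half runs in the caller  */
    #pragma omp taskwait                /* join the child task            */
    return left + right;
}

int main(void) {
    enum { N = 1 << 20 };
    static long a[N];
    for (int i = 0; i < N; i++) a[i] = 1;
    long s;
    #pragma omp parallel
    #pragma omp single
    s = sum(a, 0, N);
    printf("%ld\n", s);                 /* prints 1048576 */
    return 0;
}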
Evaluation of predicated array data-flow analysis for automatic parallelization
This paper presents an evaluation of a new analysis for parallelizing compilers called predicated array data-flow analysis. This analysis extends array data-flow analysis for parallelization and privatization to associate predicates with data-flow ...
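An illustrative loop (not the paper's notation) showing why predicates matter: t is defined only when c[i] != 0, so unpredicated array data-flow analysis must assume a loop-carried dependence through t. An analysis that attaches the condition to the definition can prove that whenever c[i] != 0 holds for every i, t is privatizable and the loop is parallel, and can guard a parallel version with that runtime test.

#include <stdio.h>

void kernel(int n, const int *b, const int *c, int *a) {
    int t = 0;
    for (int i = 0; i < n; i++) {
        if (c[i]) t = b[i];   /* conditional definition of t            */
        a[i] = t;             /* use of t: same iteration, or carried?  */
    }
}

int main(void) {
    int b[4] = {1, 2, 3, 4}, c[4] = {1, 1, 1, 1}, a[4];
    kernel(4, b, c, a);
    printf("%d %d %d %d\n", a[0], a[1], a[2], a[3]);   /* 1 2 3 4 */
    return 0;
}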
Transparent adaptive parallelism on NOWs using OpenMP
We present a system that allows OpenMP programs to execute on a network of workstations with a variable number of nodes. The ability to adapt to a variable number of nodes allows a program to take advantage of additional nodes that become available ...
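For concreteness, a minimal OpenMP program of the kind such a system would host (illustrative code, not the paper's); the adaptive runtime described in the abstract would redistribute the iterations of the parallel loop as workstation nodes join or leave.

#include <stdio.h>
#include <omp.h>

int main(void) {
    enum { N = 1000000 };
    static double a[N];
    #pragma omp parallel for          /* iterations divided among threads */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;
    printf("threads available: %d, a[10] = %g\n",
           omp_get_max_threads(), a[10]);
    return 0;
}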
Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines
MPI is a message-passing standard widely used for developing high-performance parallel applications. Because of restrictions in the MPI computation model, conventional implementations on shared memory machines map each MPI node to an OS process, ...
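A minimal MPI exchange for reference (illustrative, not from the paper); under the threaded design the abstract describes, each rank below would be a thread within one address space rather than an OS process. Run with, e.g., mpirun -np 2.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}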
Design challenges of virtual networks: fast, general-purpose communication
Virtual networks provide applications with the illusion of having their own dedicated, high-performance networks, although network interfaces possess limited, shared resources. We present the design of a large-scale virtual network system and examine the ...
MagPIe: MPI's collective communication operations for clustered wide area systems
Writing parallel applications for computational grids is a challenging task. To achieve good performance, algorithms designed for local area networks must be adapted to the differences in link speeds. An important class of algorithms is collective ...
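For reference, the interface MagPIe sits behind is the standard MPI collective API; the call below is illustrative, not MagPIe-specific code. In a cluster of clusters, a wide-area-aware implementation performs the reduction within each LAN first and crosses the slow WAN links only once.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, local, global;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    local = rank + 1;
    /* The application sees only the standard collective; the library
     * chooses the communication schedule underneath. */
    MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %d\n", global);   /* P*(P+1)/2 for P ranks */
    MPI_Finalize();
    return 0;
}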
Predictive analysis of a wavefront application using LogGP
This paper develops a highly accurate LogGP model of a complex wavefront application that uses MPI communication on the IBM SP/2. Key features of the model include: (1) elucidation of the principal wavefront synchronization structure, and (2) explicit ...
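As background (a property of the LogGP model itself, not this paper's derived equations): LogGP characterizes a machine by the latency L, the per-message processor overhead o, the gap g between consecutive short messages, the gap per byte G for long messages, and the processor count P. The end-to-end time to deliver a k-byte message is then modeled as

    T(k) = o + (k - 1)G + L + o

so long-message cost is dominated by the (k - 1)G bandwidth term, while short-message cost is dominated by L and o.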
Performance prediction of large parallel applications using parallel simulations
Accurate simulation of large parallel applications can be facilitated with the use of direct execution and parallel discrete event simulation. This paper describes the use of COMPASS, a direct execution-driven, parallel simulator for performance ...
Automatic node selection for high performance applications on networks
A central problem in executing performance critical parallel and distributed applications on shared networks is the selection of computation nodes and communication paths for execution. Automatic selection of nodes is complex as the best choice depends ...
An efficient implementation of Java's remote method invocation
Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation provides an unusually flexible kind of Remote Procedure Call. Unlike RPC, RMI supports polymorphism, which requires the system to be able to ...
Space-time memory: a parallel programming abstraction for interactive multimedia applications
Realistic interactive multimedia involving vision, animation, and multimedia collaboration is likely to become an important aspect of future computer applications. The scalable parallelism inherent in such applications coupled with their computational ...