Volume 36, Issue 7, July 2001
article
Reference idempotency analysis: a framework for optimizing speculative execution

Recent proposals for multithreaded architectures allow threads with unknown dependences to execute speculatively in parallel. These architectures use hardware speculative storage to buffer uncertain data, track data dependences and roll back incorrect ...

article
Pointer and escape analysis for multithreaded programs

This paper presents a new combined pointer and escape analysis for multithreaded programs. The algorithm uses a new abstraction called parallel interaction graphs to analyze the interactions between threads and extract precise points-to, escape, and ...

article
Language support for Morton-order matrices

The uniform representation of 2-dimensional arrays serially in Morton order (or Z order) supports both their iterative scan with Cartesian indices and their divide-and-conquer manipulation as quaternary trees. This data structure is important ...
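
A Morton index interleaves the bits of the row and column indices, so each 2^k-by-2^k quadrant of the matrix occupies a contiguous range of the serial layout, which is what makes the quaternary-tree view possible. A minimal C sketch of the index computation, as an illustration of the general technique rather than the paper's own language support (the function name morton_index is ours):

    #include <stdint.h>
    #include <stdio.h>

    /* Interleave the bits of row i and column j: bit k of j goes to bit 2k
       of the result, bit k of i goes to bit 2k+1 (a standard Morton encoding). */
    static uint64_t morton_index(uint32_t i, uint32_t j)
    {
        uint64_t z = 0;
        for (int k = 0; k < 32; k++) {
            z |= (uint64_t)((j >> k) & 1u) << (2 * k);
            z |= (uint64_t)((i >> k) & 1u) << (2 * k + 1);
        }
        return z;
    }

    int main(void)
    {
        /* Elements of a 4x4 block map onto offsets 0..15, quadrant by quadrant. */
        for (uint32_t i = 0; i < 4; i++)
            for (uint32_t j = 0; j < 4; j++)
                printf("a[%u][%u] -> offset %llu\n", i, j,
                       (unsigned long long)morton_index(i, j));
        return 0;
    }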

article
Efficient load balancing for wide-area divide-and-conquer applications

Divide-and-conquer programs are easily parallelized by letting the programmer annotate potential parallelism in the form of spawn and sync constructs. To achieve efficient program execution, the generated work load has to be balanced evenly among the ...
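
As an illustration of spawn/sync-style annotation of divide-and-conquer parallelism, here is a hedged sketch using OpenMP tasks in C; this is not the system described in the paper, but the task and taskwait pragmas play the roles of spawn and sync:

    #include <stdio.h>

    /* Divide-and-conquer with spawn/sync-style annotations: each recursive
       call is "spawned" as a task; "sync" waits for the spawned children. */
    static long fib(int n)
    {
        if (n < 2)
            return n;
        long a, b;
        #pragma omp task shared(a)      /* spawn */
        a = fib(n - 1);
        #pragma omp task shared(b)      /* spawn */
        b = fib(n - 2);
        #pragma omp taskwait            /* sync */
        return a + b;
    }

    int main(void)
    {
        long r;
        #pragma omp parallel
        #pragma omp single
        r = fib(30);
        printf("fib(30) = %ld\n", r);   /* prints 832040 */
        return 0;
    }

Compiled with -fopenmp the spawned calls may run in parallel; without it the pragmas are ignored and the program runs serially, which mirrors how such annotations leave the sequential semantics intact.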

article
Scalable queue-based spin locks with timeout

Queue-based spin locks allow programs with busy-wait synchronization to scale to very large multiprocessors, without fear of starvation or performance-destroying contention. So-called try locks, traditionally based on non-scalable test-and-set locks, ...
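
A try lock augments acquisition with a patience bound: the caller spins only until a deadline and then returns failure rather than waiting indefinitely. Below is a minimal C11 sketch of such a (non-scalable) test-and-set try lock with timeout, for illustration only; the names ts_try_acquire and TS_LOCK_INIT are ours, and the scalable queue-based locks studied in the paper are considerably more involved:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <time.h>

    typedef struct { atomic_flag held; } ts_lock;
    #define TS_LOCK_INIT { ATOMIC_FLAG_INIT }

    static double seconds_since(const struct timespec *start)
    {
        struct timespec now;
        timespec_get(&now, TIME_UTC);
        return (now.tv_sec - start->tv_sec)
             + (now.tv_nsec - start->tv_nsec) * 1e-9;
    }

    /* Spin on test-and-set until the lock is acquired or `patience` seconds
       have elapsed; return true on success, false on timeout. */
    static bool ts_try_acquire(ts_lock *l, double patience)
    {
        struct timespec start;
        timespec_get(&start, TIME_UTC);
        while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            if (seconds_since(&start) > patience)
                return false;           /* give up; caller may retry or back off */
        return true;
    }

    static void ts_release(ts_lock *l)
    {
        atomic_flag_clear_explicit(&l->held, memory_order_release);
    }

    int main(void)
    {
        static ts_lock l = TS_LOCK_INIT;
        if (ts_try_acquire(&l, 0.001)) { /* 1 ms of patience */
            /* ... critical section ... */
            ts_release(&l);
        }
        return 0;
    }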

article
Contention elimination by replication of sequential sections in distributed shared memory programs

In shared memory programs contention often occurs at the transition between a sequential and a parallel section of the code. As all threads start executing the parallel section, they often access data just modified by the thread that executed the ...

article
Accurate data redistribution cost estimation in software distributed shared memory systems

Distributing data is one of the key problems in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in programs where data redistribution between computational phases is considered. The global data ...

article
Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations

Networks of workstations (NOWs), which are generally composed of autonomous compute elements networked together, are an attractive parallel computing platform since they offer high performance at low cost. The autonomous nature of the environment, ...

article
Source-level global optimizations for fine-grain distributed shared memory systems

This paper describes and evaluates the use of aggressive static analysis in Jackal, a fine-grain Distributed Shared Memory (DSM) system for Java. Jackal uses an optimizing, source-level compiler rather than the binary rewriting techniques employed by ...

article
High-level adaptive program optimization with ADAPT

Compile-time optimization is often limited by a lack of target machine and input data set knowledge. Without this information, compilers may be forced to make conservative assumptions to preserve correctness and to avoid performance degradation. In ...

article
Blocking and array contraction across arbitrarily nested loops using affine partitioning

Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and ...
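
In the standard formulation of affine partitioning, each statement s is assigned an affine mapping of its loop iteration vector (a sketch of the usual notation, not necessarily the paper's exact formulation):

    \[ \phi_s(\vec{\imath}) \;=\; C_s\,\vec{\imath} + \vec{c}_s \]

where \vec{\imath} is the iteration vector of statement s, C_s is an integer matrix and \vec{c}_s an offset vector; instances mapped to the same value of \phi_s fall in the same partition (for example, the same processor or the same block). Unimodular transforms, fusion, fission, reindexing, and scaling correspond to particular choices of C_s and \vec{c}_s.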

article
Efficiency vs. portability in cluster-based network servers

Efficiency and portability are conflicting objectives for cluster-based network servers that distribute the clients' requests across the cluster based on the actual content requested. Our work is based on the observation that this efficiency vs. ...

article
Statistical scalability analysis of communication operations in distributed applications

Current trends in high performance computing suggest that users will soon have widespread access to clusters of multiprocessors with hundreds, if not thousands, of processors. This unprecedented degree of parallelism will undoubtedly expose scalability ...

article
LogGPS: a parallel computational model for synchronization analysis

We present a new parallel computational model, named LogGPS, which captures synchronization.

The LogGPS model is an extension of the LogGP model, which abstracts communication on parallel platforms. Although the LogGP model captures long messages with ...
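
For reference, LogGP characterizes a machine by latency L, per-message overhead o, gap g, Gap per byte G, and processor count P, and a commonly quoted form of its cost for transmitting a k-byte message is (a sketch of the standard formula, not this paper's notation):

    \[ T_{\mathrm{LogGP}}(k) \;=\; o + (k-1)\,G + L + o \]

with one overhead term on the sending side and one on the receiving side. The LogGPS extension layers the cost of synchronization on top of such transfers, which is what allows it to capture synchronization behavior that LogGP alone does not.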

article
Peer to peer and distributed computing
