SIGARCH: Vol 17, No 2

Volume 17, Issue 2, April 1989
Special issue: Proceedings of ASPLOS-III: the third international conference on architectural support for programming languages and operating systems

Editor: Joel Emer

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 0163-5964

Architecture and compiler tradeoffs for a long instruction word microprocessor

Pages 2–14, https://doi.org/10.1145/68182.68183

A very long instruction word (VLIW) processor exploits parallelism by controlling multiple operations in a single instruction word. This paper describes the architecture and compiler tradeoffs in the design of iWarp, a VLIW single-chip microprocessor ...

Tradeoffs in instruction format design for horizontal architectures

Pages 15–25, https://doi.org/10.1145/68182.68184

With recent improvements in software techniques and the enhanced level of fine grain parallelism made available by such techniques, there has been an increased interest in horizontal architectures and large instruction words that are capable of issuing ...

Overlapped loop support in the Cydra 5

Pages 26–38, https://doi.org/10.1145/68182.68185

The Cydra™ 5 architecture adds unique support for overlapping successive iterations of a loop to a very long instruction word (VLIW) base. This architecture allows highly parallel loop execution for a much larger class of loops than can be vectorized, ...

Architectural support for synchronous task communication

Pages 40–53, https://doi.org/10.1145/68182.68186

This paper describes the motivation for a set of intertask communication primitives, the hardware support of these primitives, the architecture used in the Sylvan project which studies these issues, and the experience gained from various experiments ...

The fuzzy barrier: a mechanism for high speed synchronization of processors

Rajiv Gupta

Pages 54–63, https://doi.org/10.1145/68182.68187

Parallel programs are commonly written using barriers to synchronize parallel processes. Upon reaching a barrier, a processor must stall until all participating processors reach the barrier. A software implementation of the barrier mechanism using ...

Efficient synchronization primitives for large-scale cache-coherent multiprocessors

Pages 64–75, https://doi.org/10.1145/68182.68188

This paper proposes a set of efficient primitives for process synchronization in multiprocessors. The only assumptions made in developing the set of primitives are that hardware combining is not implemented in the inter-connect, and (in one case) that ...

A software instruction counter

Pages 78–86, https://doi.org/10.1145/68182.68189

Although several recent papers have proposed architectural support for program debugging and profiling, most processors do not yet provide even basic facilities, such as an instruction counter. As a result, system developers have been forced to invent ...

Efficient debugging primitives for multiprocessors

Pages 87–95, https://doi.org/10.1145/68182.68190

Existing kernel-level debugging primitives are inappropriate for instrumenting complex sequential or parallel programs. These functions incur a heavy overhead in their use of system calls and process switches. Context switches are used to alternately ...

Sheaved memory: architectural support for state saving and restoration in paged systems

M. E. Staknis

Pages 96–102, https://doi.org/10.1145/68182.68191

The concept of read-one/write-many paged memory is introduced and given the name sheaved memory. It is shown that sheaved memory is useful for efficiently maintaining checkpoints in main memory and for providing state saving and state restoration for ...

Reference history, page size, and migration daemons in local/remote architectures

M. A. Holliday

Pages 104–112, https://doi.org/10.1145/68182.68192

We address the problem of paged main memory management in the local/remote architecture subclass of shared memory multiprocessors. We consider the case where the operating system has primary responsibility and uses page migration as its main tool. We ...

Translation lookaside buffer consistency: a software approach

Pages 113–122, https://doi.org/10.1145/68182.68193

We discuss the translation lookaside buffer (TLB) consistency problem for multiprocessors, and introduce the Mach shootdown algorithm for maintaining TLB consistency in software. This algorithm has been implemented on several multiprocessors, and is in ...

Failure correction techniques for large disk arrays

Pages 123–132, https://doi.org/10.1145/68182.68194

The ever increasing need for I/O bandwidth will be met with ever larger arrays of disks. These arrays require redundancy to protect against data loss. This paper examines alternative choices for encodings, or codes, that reliably store information in ...

A unified vector/scalar floating-point architecture

Pages 134–143, https://doi.org/10.1145/68182.68195

In this paper we present a unified approach to vector and scalar computation, using a single register file for both scalar operands and vector elements. The goal of this architecture is to yield improved scalar performance while broadening the range of ...

Data buffering: run-time versus compile-time support

H. Mulder

Pages 144–151, https://doi.org/10.1145/68182.68196

Data-dependency, branch, and memory-access penalties are the main constraints on the performance of high-speed microprocessors. The memory-access penalties are imposed either by external memory (e.g. cache) or by underutilization of the local ...

An analysis of 8086 instruction set usage in MS DOS programs

Pages 152–160, https://doi.org/10.1145/68182.68197

A real-time support processor for Ada tasking

J. Roos

Pages 162–171, https://doi.org/10.1145/68182.68198

Task synchronization in Ada causes excessive run-time overhead due to the complex semantics of the rendezvous. To demonstrate that the speed can be increased by two orders of magnitude by using special purpose hardware, a single chip VLSI support ...

The runtime environment for Scheme, a Scheme implementation on the 88000

Pages 172–182, https://doi.org/10.1145/68182.68199

We are implementing a Scheme development system for the Motorola 88000. The core of the implementation is an optimizing native code compiler, together with a carefully designed runtime system. This paper describes our experiences with the 88000 as a ...

Program optimization for instruction caches

S. McFarling

Pages 183–191, https://doi.org/10.1145/68182.68200

This paper presents an optimization algorithm for reducing instruction cache misses. The algorithm uses profile information to reposition programs in memory so that a direct-mapped cache behaves much like an optimal cache with full associativity and ...

Using registers to optimize cross-domain call performance

Paul A. Karger

Pages 194–204, https://doi.org/10.1145/68182.68201

This paper describes a new technique to improve the performance of cross-domain calls and returns in a capability-based computer system. Using register optimization information obtained from the compiler, a trusted linker can minimize the number of ...

The design of Nectar: a network backplane for heterogeneous multicomputers

Pages 205–216, https://doi.org/10.1145/68182.68202

Nectar is a “network backplane” for use in heterogeneous multicomputers. The initial system consists of a star-shaped fiber-optic network with an aggregate bandwidth of 1.6 gigabits/second and a switching latency of 700 nanoseconds. The system can be ...

A message driven OR-parallel machine

Pages 217–228, https://doi.org/10.1145/68182.68203

A message driven architecture for the execution of OR-parallel logic languages is proposed. The computational model is based on well known compilation techniques for Logic Languages. We present first the multiple binding mechanism for the OR-parallel ...

Evaluating the performance of software cache coherence

Pages 230–242, https://doi.org/10.1145/68182.68204

In a shared-memory multiprocessor with private caches, cached copies of a data item must be kept consistent. This is called cache coherence. Both hardware and software coherence schemes have been proposed. Software techniques are attractive because they ...

Analysis of cache invalidation patterns in multiprocessors

Pages 243–256, https://doi.org/10.1145/68182.68205

To make shared-memory multiprocessors scalable, researchers are now exploring cache coherence protocols that do not rely on broadcast, but instead send invalidation messages to individual caches that contain stale data. The feasibility of such directory-...

The effect of sharing on the cache and bus performance of parallel programs

Pages 257–270, https://doi.org/10.1145/68182.68206

Bus bandwidth ultimately limits the performance, and therefore the scale, of bus-based, shared memory multiprocessors. Previous studies have extrapolated from uniprocessor measurements and simulations to estimate the performance of these machines. In ...

Available instruction-level parallelism for superscalar and superpipelined machines

Pages 272–282, https://doi.org/10.1145/68182.68207

Superscalar machines can issue several instructions per cycle. Superpipelined machines can issue only one instruction per cycle, but they have cycle times shorter than the latency of any functional unit. In this paper these two techniques are shown to ...

Micro-optimization of floating-point operations

W. J. Dally

Pages 283–289, https://doi.org/10.1145/68182.68208

This paper describes micro-optimization, a technique for reducing the operation count and time required to perform floating-point calculations. Micro-optimization involves breaking floating-point operations into their constituent micro-operations and ...

Limits on multiple instruction issue

Pages 290–302, https://doi.org/10.1145/68182.68209

This paper investigates the limitations on designing a processor which can sustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications. We have used trace-driven simulations to determine that ...
