DOI: 10.1145/30350
ISCA '87: Proceedings of the 14th annual international symposium on Computer architecture
ACM 1987 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
ISCA87: The 14th Annual International Symposium on Computer Architecture, Pittsburgh, Pennsylvania, USA, June 2-5, 1987
ISBN:
978-0-8186-0776-9
Published:
01 June 1987

Article
Free
Branch folding in the CRISP microprocessor: reducing branch delay to zero

A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP Microprocessor. With a combination of hardware and software techniques the execution time cost for many branches can be effectively reduced ...

Article
Free
An evaluation of branch architectures

Branch instructions form a significant fraction of executed instructions, and their design is thus a crucial component of any architecture. This paper examines three alternatives in the design of branch instructions: delayed vs. non-delayed branches, ...

Article
Free
Checkpoint repair for out-of-order execution machines

Out-of-order execution and branch prediction are two mechanisms that can be used profitably in the design of Supercomputers to increase performance. Unfortunately this means there must be some kind of repair mechanism, since situations do occur that ...

Article
Free
Instruction issue logic for high-performance, interruptable pipelined processors

The performance of pipelined processors is severely limited by data dependencies. In order to achieve high performance, a mechanism to alleviate the effects of data dependencies must exist. If a pipelined CPU with multiple functional units is to be used ...

Article
Free
Fast temporary storage for serial and parallel execution

There is an apparent conflict between the hardware requirements for fast parallel execution and the hardware requirements for fast serial execution. For example, fast vector execution is achieved by maintaining high execution concurrency over extended ...

Article
Free
Performance analysis and design of a logic simulation machine

The high costs associated with logic simulation of large VLSI circuits have led to the need for new computer architectures tailored to the simulation task. Such architectures have the potential for significant speed-ups over software-based logic ...

Article
Free
A modular systolic architecture for image convolutions

This paper describes a modular, systolic design for two-dimensional convolution which is a frequent and computationally intensive operation in low-level image processing. The design consists of a one-dimensional array of homogeneous cells, each with a ...

Article
Free
A template matching algorithm using optically-connected 3-D VLSI architecture

Three-dimensional VLSI (in short, 3-D VLSI) is a new device technology that is expected to realize high performance systems. In this paper, we propose an image processing architecture based on 3-D VLSI consisting of optically-connected layers. Since the ...

Article
Free
Mapping data flow programs on a VLSI array of processors

With the advent of VLSI, relatively large processing arrays may be realized in a single VLSI chip. Such regularly structured arrays take considerably less time to design and test, and fault-tolerance can easily be introduced into them. However, only a ...

Article
Free
Analytical modeling and architectural modifications of a dataflow computer

Dataflow computers are an alternative to the von Neumann architectures and are capable of exploiting the large amount of parallelism inherent in many computer applications. This paper deals with the performance analysis of the Manchester dataflow computer ...

Article
Free
A unified resource management and execution control mechanism for data flow machines

This paper presents a unified resource management and execution control mechanism for data flow machines. The mechanism integrates load control, depth-first execution control, cache memory control and a load balancing mechanism. All of these mechanisms ...

Article
Free
High performance integrated Prolog processor IPP

To realize the highest performance possible for a sequential processor, and to realize utilization of a large amount of existing software, an integrated Prolog processor (IPP) and its optimized compiler are now being developed.

A tagged architecture ...

Article
Free
Performance studies of a parallel Prolog architecture

This paper presents a new multiprocessor architecture for the parallel execution of logic programs, developed as part of the Aquarius Project. This architecture is designed to support AND-parallelism, OR-parallelism, and intelligent backtracking. We ...

Article
Free
An experimental VLSI Prolog interpreter: preliminary measurements and results

This work presents the preliminary results of a project oriented to the design and VLSI implementation of a Prolog interpreter. Even though the interpretative approach is considered an inefficient way to execute high-level languages when compared to ...

Article
Free
Deterministic and stochastic modeling of parallel garbage collection: towards real-time criteria

The study of garbage collection for a logic programming language machine has exhibited fundamental differences with the more popular functional programming garbage collection. These differences yield behaviours that cannot be observed with classical ...

Article
Free
Architectural issues in designing symbolic processors in optics

This paper analyzes potential optical architectures for AI applications (such as knowledge-based systems). Our goal was to investigate architectures most suitable for implementation completely in optics. While optical computing appears to hold much ...

Article
Free
Rearrangeability of multistage shuffle/exchange networks

In this paper we study the rearrangeability of multistage shuffle/exchange networks. Although a theoretical lower bound of ($2\log_2 N - 1$) stages for rearrangeability of a network with $N = 2^n$ inputs and outputs has been known, the sufficiency of ($2\log_2 N$ -...

Article
Free
Optimized mesh-connected networks for SIMD and MIMD architectures

A class of mesh networks with wrap-around links is obtained from a class of circulant graphs by means of a graph isomorphism. We demonstrate how to obtain, from the adjacency pattern of the graph, simple parameters that serve to construct a planar ...

Article
Free
Performance evaluation of reduced bandwidth multistage interconnection networks

This paper presents and evaluates a class of buffered interconnection networks which provide performance and cost levels intermediate to a bus and a delta network. These networks, referred to as hybrid networks, are formed by beginning with a delta ...

Article
Free
Hardware support for interprocess communication

In recent years there has been increasing interest in message-based operating systems, particularly in distributed environments. Such systems consist of a small message-passing kernel supporting a collection of system server processes that provide such ...

Article
Free
Architecture of a message-driven processor

We propose a machine architecture for a high-performance processing node for a message-passing, MIMD concurrent computer. The principal mechanisms for attaining this goal are the direct execution and buffering of messages and a memory-based architecture ...

Article
Free
Effect of storage allocation/reclamation methods on parallelism and storage requirements

The write after read/write synchronizations (the anti- and output-dependence constraints) inhibit the parallelism exhibited by Fortran programs. These constraints can be avoided by allocating storage for the values generated in a program dynamically, so ...

Article
Free
Cache design of a sub-micron CMOS system/370

An innovative cache accessing scheme based on high MRU (most recently used) hit ratio [1] is proposed for the design of a one-cycle cache in a CMOS implementation of System/370. It is shown that with this scheme the cache access time is reduced by 30 ~ ...

Article
Free
An architectural perspective on a memory access controller

In this paper a CMOS memory access controller chip is described that provides the basis for achieving high-performance 68020-based (68030-based) systems. This controller matches the speed of the memory system to that of the microprocessor by providing a ...

Article
Free
Organization and analysis of a gracefully-degrading interleaved memory system

A hardware mechanism has been proposed to reconfigure an interleaved memory system. The reconfiguration scheme is such that, at any instant, all fault-free memory banks in the memory system are utilized in an interleaved manner. A performance metric is ...

Article
Free
Correct memory operation of cache-based multiprocessors

This paper shows that cache coherence protocols can implement indivisible synchronization primitives reliably and can also enforce sequential consistency. Sequential consistency provides a commonly accepted model of behavior of multiprocessors. We ...

Article
Free
Hierarchical cache/bus architecture for shared memory multiprocessors

A new, large scale multiprocessor architecture is presented in this paper. The architecture consists of hierarchies of shared buses and caches. Extended versions of shared bus multicache coherency protocols are used to maintain coherency among all ...

Article
Free
Multiprocessor cache design considerations

In this paper, cache design is explored for large high-performance multiprocessors with hundreds or thousands of processors and memory modules interconnected by a pipe-lined multi-stage network. The majority of the multiprocessor cache studies in the ...

Article
Free
Performance evaluation of multiple register sets

In this paper a DEC VAX with multiple register sets is evaluated under many differently sized register sets. Both the number of register sets and the number of registers per set were varied. Performance, measured in terms of memory traffic, is compared ...

Acceptance Rates

Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Year        Submitted   Accepted   Rate
ISCA '22          400         67    17%
ISCA '19          365         62    17%
ISCA '17          322         54    17%
ISCA '13          288         56    19%
ISCA '12          262         47    18%
ISCA '08          259         37    14%
ISCA '06          234         31    13%
ISCA '05          194         45    23%
ISCA '04          217         31    14%
ISCA '03          184         36    20%
ISCA '02          180         27    15%
ISCA '01          163         24    15%
ISCA '99          135         26    19%
Overall         3,203        543    17%