DOI: 10.1145/74925
ISCA '89: Proceedings of the 16th annual international symposium on Computer architecture
ACM 1989 Proceeding
Publisher: Association for Computing Machinery, New York, NY, United States
Conference: Jerusalem, Israel
ISBN: 978-0-89791-319-5
Published: 01 April 1989
Sponsors: SIGARCH, IEEE-CS

Article
Free
Evaluating the performance of four snooping cache coherency protocols

Write-invalidate and write-broadcast coherency protocols have been criticized for being unable to achieve good bus performance across all cache configurations. In particular, write-invalidate performance can suffer as block size increases; and large ...

Article
Free
Multi-level shared caching techniques for scalability in VMP-M/C

The problem of building a scalable shared memory multiprocessor can be reduced to that of building a scalable memory hierarchy, assuming interprocessor communication is handled by the memory system. In this paper, we describe the VMP-MC design, a ...

Article
Free
Design and performance of a coherent cache for parallel logic programming architectures

This paper describes the design and performance of a tightly-coupled shared-memory coherent cache optimized for the execution of parallel logic programming architectures. The cache utilizes a copy-back write-allocation protocol having five states and a ...

Article
Free
The Epsilon dataflow processor

The εpsilon dataflow architecture is designed for high speed uniprocessor execution as well as for parallel operation in a multiprocessor system. The εpsilon architecture directly matches ready operands, thus eliminating the need for associative ...

Article
Free
An architecture of a dataflow single chip processor

A highly parallel (more than a thousand) dataflow machine EM-4 is now under development. The EM-4 design principle is to construct a high performance computer using a compact architecture by overcoming several defects of dataflow machines. Constructing ...

Article
Free
Exploiting data parallelism in signal processing on a dataflow machine

This paper will show that the massive data parallelism inherent to most signal processing tasks may be easily mapped onto the parallel structure of a data flow machine. A special system called STRUCTFLOW has been designed to optimize the static data ...

Article
Free
Architectural mechanisms to support sparse vector processing

We discuss the algorithmic steps involved in common sparse matrix problems, with particular emphasis on linear programming by the revised simplex method. We then propose new architectural mechanisms which are being built into an experimental machine, ...

Article
Free
A dynamic storage scheme for conflict-free vector access

Previous investigations into data storage schemes have focused on finding a storage scheme that permits conflict-free access for a set of frequently encountered access patterns. This paper considers an alternative approach. Rather than forcing a single ...

Article
Free
SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture

SIMP is a novel multiple instruction-pipeline parallel architecture. It is targeted for enhancing the performance of SISD processors drastically by exploiting both temporal and spatial parallelisms, and for keeping program compatibility as well. Degree ...

Article
Free
2-D SIMD algorithms in the perfect shuffle networks

This paper studies a set of basic algorithms for SIMD Perfect Shuffle networks. These algorithms were studied in several papers, but for the 1-D case, where the size of the problem N is the same as the number of processors P. For the 2-D case of N = L *...

Article
Free
Systematic hardware adaptation of systolic algorithms

In this paper we propose a methodology to adapt Systolic Algorithms to the hardware selected for their implementation. Systolic Algorithms obtained can be efficiently implemented using Pipelined Functional Units. The methodology is based on two ...

Article
Free
Task migration in hypercube multiprocessors

Allocation and deallocation of subcubes usually result in a fragmented hypercube where even if a sufficient number of hypercube nodes are available, they do not form a subcube large enough to execute an incoming task. As the fragmentation in ...

Article
Free
Characteristics of performance-optimal multi-level cache hierarchies

The increasing speed of new generation processors will exacerbate the already large difference between CPU cycle times and main memory access times. As this difference grows, it will be increasingly difficult to build single-level caches that are both ...

Article
Free
Supporting reference and dirty bits in SPUR's virtual address cache

Virtual address caches can provide faster access times than physical address caches, because translation is only required on cache misses. However, because we don't check the translation information on each cache access, maintaining reference and dirty ...

Article
Free
Inexpensive implementations of set-associativity

The traditional approach to implementing wide set-associativity is expensive, requiring a wide tag memory (directory) and many comparators. Here we examine alternative implementations of associativity that use hardware similar to that used to implement ...

Article
Free
Organization and performance of a two-level virtual-real cache hierarchy

We propose and analyze a two-level cache organization that provides high memory bandwidth. The first-level cache is accessed directly by virtual addresses. It is small, fast, and, without the burden of address translation, can easily be optimized to ...

Article
Free
High performance communications in processor networks

In order to provide an arbitrary and fully dynamic connectivity in a network of processors, transport mechanisms must be implemented, which provide the propagation of data from processor to processor, based on addresses contained within a packet of ...

Article
Free
Introducing memory into the switch elements of multiprocessor interconnection networks

As VLSI technology continues to improve, circuit area is gradually being replaced by pin restrictions as the limiting factor in design. Thus, it is reasonable to anticipate that on-chip memory will become increasingly inexpensive since it is a simple, ...

Article
Free
Using feedback to control tree saturation in multistage interconnection networks

In this paper, we propose the use of feedback schemes in multiprocessors which use an interconnection network with distributed routing control. We show that by altering system behavior so as to minimize the occurrence of a performance-degrading ...

Article
Free
Constructing replicated systems using processors with point-to-point communication links

Replicated processing with majority voting is a well known method of achieving fault tolerance. We consider the problem of constructing a distributed system composed of an arbitrarily large number of N-modular redundant (NMR) nodes, where each node ...

Article
Free
KCM: a knowledge crunching machine

KCM (Knowledge Crunching Machine) is a high-performance back-end processor which, coupled to a UNIX desk-top workstation, provides a powerful and user-friendly Prolog environment catering for both development and execution of significant Prolog ...

Article
Free
A high performance Prolog processor with multiple function units

We describe the Parallel Unification Machine (PLUM), a Prolog processor that exploits fine grain parallelism using multiple function units executing in parallel. In most cases the execution of bookkeeping instructions is almost completely overlapped by ...

Article
Free
Evaluation of memory system for integrated Prolog processor IPP

This paper discusses an optimal memory system to realize a high performance integrated Prolog processor, the IPP. First, the memory access characteristics of Prolog are analyzed by a simulator, which simulates the execution of a Prolog program at a ...

Article
Free
A type driven hardware engine for Prolog clause retrieval over a large knowledge base

Whereas existing Prolog systems are very effective at handling small knowledge bases, they are not very efficient at and often incapable of handling large sets of clauses. Large knowledge bases which may comprise millions of clauses and are shared by a ...

Article
Free
Comparing software and hardware schemes for reducing the cost of branches

Pipelining has become a common technique to increase throughput of the instruction fetch, instruction decode, and instruction execution portions of modern computers. Branch instructions disrupt the flow of instructions through the pipeline, increasing ...

Article
Free
Improving performance of small on-chip instruction caches

Most current single-chip processors employ an on-chip instruction cache to improve performance. A miss in this instruction cache will cause an external memory reference which must compete with data references for access to the external memory, thus ...

Article
Free
Achieving high instruction cache performance with an optimizing compiler

Increasing the execution power requires a high instruction issue bandwidth, and decreasing instruction encoding and applying some code improving techniques cause code expansion. Therefore, the instruction memory hierarchy performance has become an ...

Article
Free
The impact of code density on instruction cache performance

The widespread use of reduced-instruction-set computers has generated a lot of interest in the tradeoff between the density of an instruction set and the size of the instruction cache. In this paper we present and justify a method that predicts the ...

Article
Free
Can dataflow subsume von Neumann computing?

We explore the question: “What can a von Neumann processor borrow from dataflow to make it more suitable for a multiprocessor?” Starting with a simple, “RISC-like” instruction set, we show how to change the underlying processor organization to make it ...

Article
Free
Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results

A fundamental problem that any scalable multiprocessor must address is the ability to tolerate high latency memory operations. This paper explores the extent to which multiple hardware contexts per processor can help to mitigate the negative effects of ...

Acceptance Rates

Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Year       Submitted  Accepted  Rate
ISCA '22   400        67        17%
ISCA '19   365        62        17%
ISCA '17   322        54        17%
ISCA '13   288        56        19%
ISCA '12   262        47        18%
ISCA '08   259        37        14%
ISCA '06   234        31        13%
ISCA '05   194        45        23%
ISCA '04   217        31        14%
ISCA '03   184        36        20%
ISCA '02   180        27        15%
ISCA '01   163        24        15%
ISCA '99   135        26        19%
Overall    3,203      543       17%