SIGARCH: Vol 18, No 3b

Volume 18, Issue 3b, Sept. 1990. Special Issue: Proceedings of the 4th International Conference on Supercomputing

Publisher: Association for Computing Machinery, New York, NY, United States

ISSN: 0163-5964

article

Free

The Tera computer system

Pages 1–6 https://doi.org/10.1145/255129.255132

article

Free

OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple spanning buses

Pages 7–22 https://doi.org/10.1145/255129.255133

This paper presents the architectural design and RISC-based implementation of a prototype supercomputer, namely the Orthogonal MultiProcessor (OMP). The OMP system is constructed with 16 Intel i860 RISC microprocessors and 256 parallel memory modules, ...

article

Free

A basic architecture supporting LGDG computation

Pages 23–33 https://doi.org/10.1145/255129.255135

In order to combine the benefits of dataflow and control-flow computation while avoiding the pitfalls of both, the authors propose a two-level model of large-grain dataflow computation, called LGDG computation. A formalism has been provided in a ...

article

Free

An efficient caching support for critical sections in large-scale shared-memory multiprocessors

Pages 34–47 https://doi.org/10.1145/255129.255137

Directory-based and software-assisted schemes are the two main approaches to solving the cache coherence problem in large scale shared-memory multiprocessors. Until now, the emphasis in software-assisted schemes has been on ascertaining consistency ...

article

Free

An improvement of I/O function for auxiliary storage: parallel I/O for a large scale supercomputing

Pages 48–59 https://doi.org/10.1145/255129.255138

A new I/O technique for external auxiliary storage (magnetic disk units) has been developed to improve I/O performance on HITAC VOS3/ES1 with the usual hardware architecture. Since the I/O technique is based on the idea that the sequence of I/O processes ...

article

Free

Analysis of a variant hypercube topology

Nian-Feng Tzeng

Pages 60–70 https://doi.org/10.1145/255129.255140

Each node of a hypercube system, when fabricated, comes with a fixed number of links designed for a maximum sized construction. Very often, there are links left unused at each node in a real system. In this article, we study the hypercube in which extra ...

article

Free

Parallel ODE solvers

Pages 71–81 https://doi.org/10.1145/255129.255141

We are interested in the efficient solution of linear second order Partial Differential Equation (PDE) problems on rectangular domains. The PDE discretisation scheme used is of Finite Element type and is based on quadratic splines and the collocation ...

article

Free

Use of parallel level 3 BLAS in LU factorization on three vector multiprocessors: the ALLIANT FX/80, the CRAY-2, and the IBM 3090 VF

Pages 82–95 https://doi.org/10.1145/255129.255142

We show how to transform the B-spline curve and surface fitting problems into suffix computations of continued fractions. Then a parallel substitution scheme is introduced to compute the suffix values on a newly proposed mesh-of-unshuffle network. The ...

article

Free

//ELLPACK: a numerical simulation programming environment for parallel MIMD machines

Pages 96–107 https://doi.org/10.1145/255129.255144

article

Free

Schur complement preconditioned conjugate gradient methods for spline collocation equations

Christina C. Christara

Pages 108–120 https://doi.org/10.1145/255129.255146

article

Free

Cost-optimal parallel B-spline interpolations

Pages 121–131 https://doi.org/10.1145/255129.255147

article

Free

Solving general sparse linear systems using conjugate gradient-type methods

Pages 132–139 https://doi.org/10.1145/255129.255149

The problem of finding an approximation of x = A^†b (where A^† is the pseudo-inverse of A ∈ ℝ^{m×n} with m ≥ n and rank(A) = n) is discussed. It is assumed that A is sparse but has neither a special pattern (as bandedness) nor a special property (as ...

article

Free

Dataflow computer development in Japan

Pages 140–147 https://doi.org/10.1145/255129.255151

This paper describes the research activity on dataflow computing in Japan focusing on dataflow computer development at the Electrotechnical Laboratory (ETL). First, the history of dataflow computer development in Japan is outlined. Some distinguished ...

article

Free

POSC—a partitioning and optimizing SISAL compiler

Pages 148–164 https://doi.org/10.1145/255129.255152

Single-assignment languages like SISAL offer parallelism at all levels—among arbitrary operations, conditionals, loop iterations, and function calls. All control and data dependencies are local, and can be easily determined from the program. Various ...

article

Free

Loop optimization for horizontal microcoded machines

Pages 164–176 https://doi.org/10.1145/255129.255153

Long Instruction Word (LIW) architectures exploit parallelism between various functional units. In order to produce efficient code for such an architecture, the microcode compiler will have to expose a relatively large degree of fine grain parallelism ...

article

Free

Compiler techniques for data synchronization in nested parallel loops

Pages 177–186 https://doi.org/10.1145/255129.255155

The major source of parallelism in ordinary programs is do loops. When loop iterations of parallelized loops are executed on multiprocessors, the cross-iteration data dependencies need to be enforced by synchronization between processors. Existing data ...

article

Free

Compiler techniques for data partitioning of sequentially iterated parallel loops

Pages 187–200 https://doi.org/10.1145/255129.255156

This paper uses bottom-up, static program partitioning to minimize the execution time of parallel programs by reducing interprocessor communication. Program partitioning is applied to a parallel programming construct known as a sequentially iterated ...

article

Free

On the perfect accuracy of an approximate subscript analysis test

Pages 201–212 https://doi.org/10.1145/255129.255158

The Banerjee test is commonly considered to be the more accurate of the two major approximate data dependence tests used in automatic vectorization/parallelization of loops, the other being the GCD test. From its derivation, however, there is no simple ...

article

Free

A hardware-based performance monitor for the Intel iPSC/2 hypercube

Pages 213–226 https://doi.org/10.1145/255129.255159

The complexity of parallel computer systems makes a priori performance prediction difficult and experimental performance analysis crucial. A complete characterization of software and hardware dynamics, needed to understand the performance of high-...

article

Free

Performance degradation due to multiprogramming and system overheads in real workloads: case study on a shared memory multiprocessor

Pages 227–238 https://doi.org/10.1145/255129.255160

In this paper, performance degradation specifically due to the multiprogramming (MP) overhead in a parallel execution environment is quantified. In addition, total system overhead is also measured. A methodology, which estimates the MP overhead present ...

article

Free

SPARK: a benchmark package for sparse computations

Pages 239–253 https://doi.org/10.1145/255129.255162

As the diversity of novel architectures expands rapidly, there is a growing interest in studying the behavior of these architectures for computations arising in different applications. There have been significant efforts in evaluating the performance of ...

article

Free

Supercomputer performance evaluation and the Perfect Benchmarks

Pages 254–266 https://doi.org/10.1145/255129.255163

In the past three years, the Perfect Benchmark™ Suite has evolved from a supercomputer performance evaluation plan, presented by Kuck and Sameh at the 1987 International Conference on Supercomputing, to a vigorous international activity. This paper ...

article

Free

Strategies for large-scale structural problems on high-performance computers

Pages 267–280 https://doi.org/10.1145/255129.255164

Novel computational strategies are presented for the analysis of large and complex structures. The strategies are based on generating the response of the complex structure using large perturbations from that of a simpler model, associated with a simpler ...

article

Free

Elastodynamics on clustered vector multiprocessors

Pages 281–290 https://doi.org/10.1145/255129.255166

We present the parallelization of an elastodynamic code on a firmly coupled configuration consisting of two IBM 3090-600 VF, a total of 12 processors, joined with a connection facility. The programming environment used is Clustered FORTRAN which is a ...

article

Free

Implementation of 5-point/9-point multi-level methods on hypercube architectures

Victor Eijkhout

Pages 291–295 https://doi.org/10.1145/255129.255167

Computational complexity of implementing 5/9-point multi-level methods on hypercube architectures is considered. The embedding of the nested red/black structures of these methods is described, and an analysis is made of data distances involved.

article

Free

Supercomputer-based visualization systems used for analyzing output data of a numerical weather prediction model

Philip C. Chen

Pages 296–309 https://doi.org/10.1145/255129.255168

Comparison of two supercomputer-based visualization systems developed over a half-year period shows that the visualization/animation efficiency is largely dependent upon the efficiencies of individual computers, networking, and memory management.

Using a ...

article

Free

Parallel automated wire-routing with a number of competing processors

Pages 310–317 https://doi.org/10.1145/255129.255170

The purpose of automated wire routing for VLSI and printed circuit board design is to connect a number of terminal pairs distributed throughout the wiring plane with net paths which do not intersect each other. Although maze running and line search are ...

article

Hierarchical algorithms and architectures for parallel scientific computing

Tony F. Chan

Pages 318–329 https://doi.org/10.1145/255129.255171

There has been a recent emergence of many interesting and highly efficient hierarchical (multilevel) algorithms (e.g. multigrid, domain decomposition, wavelets, multilevel preconditioning, the fast multipole algorithms, etc.) for solving numerical ...

article

Free

Incremental dependence analysis for interactive parallelization

Pages 330–341 https://doi.org/10.1145/255129.255173

Incrementally updating dependence information during interactive parallelization is a difficult proposition. We have developed a tool (PAT) that maintains dependence information during incremental transformations to a Fortran program, including loop ...

article

Free

Parallelization of FORTRAN code on distributed-memory parallel processors

Pages 342–353 https://doi.org/10.1145/255129.255174

This paper presents some preliminary results toward the automatic parallelization of uniprocessor FORTRAN code on distributed-memory parallel processors (DMPPs). The paper introduces Oxygen, a compiler for a DMPP under development at the Laboratory. The ...
