No abstract available.
A formal framework handling the description and implementation of multigrid algorithms
A formal approach to deal with several types of grids in the context of multigrid algorithms is presented. The approach serves as a framework for describing the definition and manipulation of grids as well as the specification of typical grid ...
On the parallel solution of parabolic equations
We propose new parallel algorithms for the solution of linear parabolic problems. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The ...
Implementing Linda for distributed and parallel processing
In a recent paper [17], we described experiments using the VAX LINDA system. VAX LINDA allows a single application program to utilize many machines on a network simultaneously. Applications implemented on the network at Sandia National Laboratories have ...
An efficient message-passing scheduler based on guided self scheduling
While much work has been done to date on the study of task-scheduling schemes for shared memory machines, little of the knowledge gained has been transferred to distributed memory systems. In this paper we discuss the implementation and performance ...
The composite binary cube — a family of interconnection networks for multiprocessors
A class of interconnection networks called composite binary n-cube networks is presented in this paper for multiprocessors. These networks provide a spectrum of performance levels (measured in terms of message delays or network bandwidth) at different ...
I/O issues for hypercubes
In this paper, we look at several issues concerning the design of a disk system for a multiprocessor such as a hypercube. We propose a methodology for connecting the I/O processors to such a system for efficient I/O access. An analysis is presented to ...
Semi-iterative methods on distributed memory multiprocessor architectures
In the parallel ELLPACK (//ELLPACK) project we are developing a library of parallel iterative methods for distributed memory multiprocessor systems and software tools for partitioning and allocation of the underlying computations. In this paper we ...
One-to-one mapping of process graphs onto a hypercube
In this paper, the problem of assigning N parallel tasks onto a hypercube with N processors is considered. Three heuristics are developed to map arbitrary process graphs, in a one-to-one fashion, onto a hypercube architecture, in order to minimize the ...
Automatic load balanced partitioning strategies for PDE computations
In this paper we study the partitioning and allocation of computations associated with the numerical solution of partial differential equations (PDEs). Strategies for the mapping of such computations to parallel MIMD architectures can be applied to ...
Parallel calculations on the wind-driven oceanic circulation using Fourier pseudospectral methods
The shallow-water equations for the wind-driven oceanic circulation were solved in parallel, using Fourier pseudospectral methods. A domain decomposition approach was used to perform two-dimensional Fast Fourier Transforms (2-D FFTs), as part of a ...
Simulations of three-dimensional flows with the lattice Boltzmann equation on the IBM 3090/VF
We illustrate the basic features of the Lattice Boltzmann Equation (LBE), a new finite-difference scheme that arises from the microdynamics of the Frisch-Hasslacher-Pomeau cellular automaton once, instead of tracking the individual history of each ...
Vectorized molecular dynamics algorithms for very large number of particles
- Jacek Mościnski,
- Monika Bargieł,
- Jacek Kitowski,
- Zbigniew Skotniczny,
- Zbigniew A. Rycerz,
- Patrick W. M. Jacobs
In this paper three algorithms for Molecular Dynamics simulation suitable for vector computers are reviewed and timings presented. The algorithms were tested on the ETA 10-P and IBM 3090/150E/VF supercomputers and compared against the INTEL 80386/80387 ...
Control flow optimization for supercomputer scalar processing
Control intensive scalar programs pose a very different challenge to highly pipelined supercomputers than vectorizable numeric applications. Function call/return and branch instructions disrupt the flow of instructions through the pipeline, degrading ...
A global resource-constrained parallelization technique
This paper presents a new approach to resource-constrained compiler extraction of fine-grain parallelism, targeted towards VLIW supercomputers, and in particular, the IBM VLIW (Very Large Instruction Word) processor. The algorithms described integrate ...
Array distribution in SUPERB
This paper describes MIMD parallelization in SUPERB. SUPERB is an interactive system for semi-automatic transformation of Fortran 77 programs into parallel programs for the SUPRENUM machine. The main topic of this paper is array distribution as a basis ...
A unified semantic approach for the vectorization and parallelization of generalized reductions
Generalized reductions include some of the most well known programming idioms, for instance loop invariant variables, induction variables and reduction operations. We propose a unified framework that allows the detection of these paradigms and thus the ...
Constraint based vectorization
The constraint tree provides a uniform framework for representing many loop transformations. It allows us to estimate the performance of several alternative execution methods before committing to any of the transformations. We introduce the constraint ...
Interference analysis tools for parallelizing programs with recursive data structures
Interference estimation is a useful tool in developing parallel programs and is a key aspect of automatically parallelizing sequential programs. Interference analysis and disambiguation mechanisms for programs with simple data types and arrays have ...
Data dependence analysis on multi-dimensional array references
An efficient and precise data dependence analysis is the key to the success of a parallelizing compiler because it is required in almost all phases of the parallelism detection and enhancement in such compilers. However, existing test algorithms are ...
Interactive conversion of sequential to multitasking FORTRAN
Fully automated compilation of sequential Fortran to efficient multitasking code is impractical; tools need to be developed to aid users in interactively converting sequential to multitasking Fortran. This paper reports on experience using an ...
Alpha du centaur: a prototype environment for the design of parallel regular algorithms
We describe Alpha du Centaur (ADC), a prototype environment for the design of parallel regular algorithms. In ADC, a program is specified in the Alpha language as a system of parameterized linear recurrence equations. The goal of ADC is to make it ...
Parallelizing algorithms for MIMD architectures with shared memory
The solution of a system of linear equations Ax = b is an important application in scientific computation. It arises in the numerical solution of self-adjoint problems using finite difference or finite element methods for discretization. For realistic ...
Performance comparisons of Cholesky factorization algorithms using level-2 & 3 BLAS on the National Advanced Systems AS/XL vector computer
This paper contains three blocked Cholesky algorithms with calls to standard (level-1, level-2 or level-3) BLAS, one blocked Cholesky algorithm with calls to nonstandard level-2 BLAS and one unblocked Cholesky algorithm. Performance comparisons for ...
Portable and efficient factorization algorithms on the IBM 3090/VF
This paper describes a series of experiments performed with block versions of the LU, Cholesky and QR factorizations using Level 3 BLAS on one processor of the IBM 3090/VF. We show that the LAPACK approach to designing linear algebra software that is ...
A comparison of parallel processing on CRAY X-MP and IBM 3090 VF multiprocessors
Modern supercomputers like CRAY X-MP and IBM 3090 VF achieve their high computing speed by using both vector and parallel hardware. The available multitasking concepts supporting concurrent execution of tasks within a single application have been ...
Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems
For multiprocessor systems with a two-level memory hierarchy, the communication requirements of parallel Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The data traffic associated with computing the Cholesky ...
Index Terms
- Proceedings of the 3rd international conference on Supercomputing