
Applications considerations in the system design of highly concurrent multiprocessors

Author:

S. F. Lundstrom

IEEE Transactions on Computers, Volume 36, Issue 11

Pages 1292 - 1309

https://doi.org/10.1109/TC.1987.5009469

Published: 01 November 1987

Abstract

A five-year series of studies, which ended in 1982 and was supported in part by NASA and in part by Burroughs Corporation, led to the system design of a very large, very high-speed multiprocessor. The system was intended to solve large scientific problems, especially modeling problems such as those in computational aerodynamics. The performance objective was to sustain execution rates of up to one billion floating-point operations per second on problems requiring 40 million words of main memory. The viability of the design depended on an in-depth understanding of the system's projected applications. This paper presents an overview of the project objectives and the resulting 128-processor design, showing the local private memories available to each processor, the 64-million-word shared memory, the dual-omega interconnection network, and the important programming concepts. Studies conducted during the design determined the number of processors (a tradeoff with individual processor speed), the memory organization (program and data, private and shared), and the structure of the networks used to interconnect the processor and memory resources. These studies and the important application-related considerations are presented. Although the system was never constructed and tested, it was extensively simulated, and the design was completed in sufficient detail to develop a reasonably accurate parts list and implementation plan.
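
The abstract names an omega interconnection network but does not describe its routing. As a rough illustration only, the sketch below shows textbook destination-tag routing on a single omega network built from 2x2 switches. It is not the paper's dual-omega design, and the function name `omega_route`, the constant `N_PORTS`, and the choice of 128 ports (one per processor) are assumptions made for this example.

```python
def omega_route(src, dst, n_bits):
    """Trace a message through a textbook omega network.

    Returns the line position occupied after each of the n_bits stages.
    Illustrative only: this is the generic omega network, not the
    dual-omega network of the design described in the abstract.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = []
    for stage in range(n_bits):
        # Perfect shuffle between stages: rotate the n-bit position left by one.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The 2x2 switch picks its upper (0) or lower (1) output according to
        # the next destination bit, most significant bit first.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


# Assumption: 128 network ports, matching the 128 processors in the abstract.
N_PORTS = 128
N_BITS = N_PORTS.bit_length() - 1  # log2(128) = 7 stages

if __name__ == "__main__":
    print(omega_route(5, 100, N_BITS))
```

Under these assumptions a message crosses seven switch stages regardless of source and destination, and the last position in the returned path always equals the destination port.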



Published In

IEEE Transactions on Computers, Volume 36, Issue 11, Nov. 1987, 136 pages

ISSN: 0018-9340

Editor: Bruce D. Shriver

Copyright © 1987.

Publisher: IEEE Computer Society, United States
