Integration of message passing and shared memory in the Stanford FLASH multiprocessor

Authors:

Kourosh Gharachorloo,

Anoop Gupta

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

Pages 38 - 50

https://doi.org/10.1145/195473.195494

Published: 01 November 1994

Abstract

The advantages of using message passing over shared memory for certain types of communication and synchronization have provided an incentive to integrate both models within a single architecture. A key goal of the FLASH (FLexible Architecture for SHared memory) project at Stanford is to achieve this integration while maintaining a simple and efficient design. This paper presents the hardware and software mechanisms in FLASH that support various message-passing protocols. We achieve low-overhead message passing by delegating protocol functionality to the programmable node controllers in FLASH and by providing direct user-level access to this messaging subsystem. In contrast to most earlier work, we provide an integrated solution that handles the interaction of the messaging protocols with virtual memory, protected multiprogramming, and cache coherence. Detailed simulation studies indicate that this system can sustain message-transfer rates of several hundred megabytes per second, effectively utilizing projected network bandwidths for next-generation multiprocessors.

Published In

ASPLOS VI: Proceedings of the sixth international conference on Architectural support for programming languages and operating systems

November 1994

341 pages

ISBN:0897916603

DOI:10.1145/195473

Chairmen: Forest Baskett (Silicon Graphics), Douglas Clark (Princeton Univ.)

ACM SIGOPS Operating Systems Review, Volume 28, Issue 5
Dec. 1994
323 pages
ISSN:0163-5980
DOI:10.1145/381792
Chairman: Henry M. Levy (Univ. of Washington, Seattle)

ACM SIGPLAN Notices, Volume 29, Issue 11
Nov. 1994
323 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/195470
Editor: Richard L. Wexelblat (Washington D.C.)

Copyright © 1994 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ASPLOS94

Sponsor:

ASPLOS94: 6th Conference on Architectural Support of Programming Languages & Operating Systems

October 5 - 7, 1994

San Jose, California, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%
