A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D

Published: 01 May 1995

Abstract

Messaging-based programming models continue to be important for parallel machines. Messaging costs are strongly influenced by a machine's network interface architecture. We examine the impact of architectural support for messaging in two machines, the TMC CM-5 and the Cray T3D, by exploring the design and performance of several messaging implementations. The additional features in the T3D support remote operations: memory access, fetch-and-increment, atomic swaps, and prefetch.

Experiments on the CM-5 show that requiring processor involvement for message reception can increase the communication overheads from 60% to 300% for moderate variations in computation grain size at the destination. In contrast, the T3D hardware for remote operations decouples message reception from processor activity, producing high-performance messaging independent of computation grain size or variability.

In addition, hardware support for a shared address space in the T3D can be used to solve the output contention problem (output hot spots), producing messaging implementations that are robust over a wide variety of traffic patterns. Atomic swap hardware can be used to build a distributed message queue, enabling a "pull" messaging scheme where the destination requests data transfer upon receive. This scheme uses prefetches to mask receive latency. While this yields performance that is robust under output contention, its base cost is competitive only for small messages (up to 64 bytes) because of the high cost of issuing and resolving prefetches in the T3D. Emulation shows that if the interaction costs can be reduced by a factor of eight (250 ns to 31 ns), perhaps by moving the prefetch queue on chip, and there is a corresponding increase in the prefetch queue size, the pull scheme can give superior performance in all cases.
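The "pull" scheme in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's T3D implementation: the abstract does not describe the queue layout, so the linked-descriptor queue below is an assumed design, and the names `AtomicCell`, `Descriptor`, and `Node` are hypothetical. A lock stands in for the atomic swap hardware, and a dictionary lookup stands in for a remote read or prefetch.

```python
import threading


class AtomicCell:
    """Stand-in for a memory word with hardware atomic swap (assumption)."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def swap(self, new):
        # Atomically store `new` and return the previous contents.
        with self._lock:
            old, self._value = self._value, new
            return old


class Descriptor:
    """Small record a sender links into the destination's queue."""

    def __init__(self, src, addr, length):
        self.src, self.addr, self.length = src, addr, length
        self.next = None


class Node:
    """One processing node; `memory` models globally readable local memory."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}                       # addr -> bytes
        stub = Descriptor(None, None, 0)       # dummy entry, never delivered
        self.head = stub                       # last descriptor consumed
        self.tail = AtomicCell(stub)           # word that senders swap on

    def send(self, dest, addr, payload):
        # The payload stays in the sender's memory; only a descriptor is queued.
        self.memory[addr] = payload
        desc = Descriptor(self, addr, len(payload))
        prev = dest.tail.swap(desc)            # one atomic swap claims the slot
        prev.next = desc                       # remote write links it in

    def receive(self):
        # Pull: the destination fetches the data only when it chooses to receive.
        desc = self.head.next
        if desc is None:
            return None                        # nothing queued (or not yet linked)
        self.head = desc
        return desc.src.memory[desc.addr][:desc.length]  # models a remote read


if __name__ == "__main__":
    a, b, c = Node(0), Node(1), Node(2)
    a.send(c, 0x10, b"hello")
    b.send(c, 0x20, b"world")
    assert c.receive() == b"hello"
    assert c.receive() == b"world"
    assert c.receive() is None
```

Because the bulk data moves only when the receiver asks for it, many senders targeting one destination contend on a single swapped word rather than on the destination's input buffers, which is the intuition behind the robustness to output hot spots claimed in the abstract.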

Published In

ACM SIGARCH Computer Architecture News, Volume 23, Issue 2
Special Issue: Proceedings of the 22nd annual international symposium on Computer architecture (ISCA '95)
May 1995
412 pages
ISSN:0163-5964
DOI:10.1145/225830
  • ISCA '95: Proceedings of the 22nd annual international symposium on Computer architecture
    July 1995
    426 pages
    ISBN:0897916980
    DOI:10.1145/223982
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Cited By

  • (2018) xBGAS. Proceedings of the Workshop on Memory Centric High Performance Computing, pages 22-26. DOI: 10.1145/3286475.3286478. Online publication date: 11-Nov-2018
  • (2006) Distributed computing using Java. Journal of Systems Architecture: the EUROMICRO Journal, 52:7, pages 432-440. DOI: 10.1016/j.sysarc.2006.02.001. Online publication date: 1-Jul-2006
  • (1999) Architectural Support and Mechanisms for Object Caching in Dynamic Multithreaded Computations. Journal of Parallel and Distributed Computing, 58:2, pages 260-300. DOI: 10.1006/jpdc.1999.1555. Online publication date: 1-Aug-1999
  • (2021) xBGAS: A Global Address Space Extension on RISC-V for High Performance Computing. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 454-463. DOI: 10.1109/IPDPS49936.2021.00054. Online publication date: May-2021
  • (2011) The future of microprocessors. Communications of the ACM, 54:5, pages 67-77. DOI: 10.1145/1941487.1941507. Online publication date: 1-May-2011
  • (2008) Receiver-initiated message passing over RDMA Networks. 2008 IEEE International Symposium on Parallel and Distributed Processing, pages 1-12. DOI: 10.1109/IPDPS.2008.4536262. Online publication date: Apr-2008
  • (2002) A lightweight idempotent messaging protocol for faulty networks. Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, pages 248-257. DOI: 10.1145/564870.564912. Online publication date: 10-Aug-2002
  • (2002) An Advanced Compiler Framework for Non-Cache-Coherent Multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 13:3, pages 241-259. DOI: 10.1109/71.993205. Online publication date: 1-Mar-2002
  • (2001) LoGPC. IEEE Transactions on Parallel and Distributed Systems, 12:4, pages 404-415. DOI: 10.1109/71.920589. Online publication date: 1-Apr-2001
  • (1999) An evaluation of message passing implementations on Beowulf workstations. 1999 IEEE Aerospace Conference. Proceedings (Cat. No.99TH8403), pages 41-54, vol. 5. DOI: 10.1109/AERO.1999.790188. Online publication date: 1999
