Multi-protocol active messages on a cluster of SMP's

Authors:

Steven S. Lumetta,

Alan M. Mainwaring,

David E. Culler

SC '97: Proceedings of the 1997 ACM/IEEE conference on Supercomputing

Pages 1 - 22

https://doi.org/10.1145/509593.509596

Published: 15 November 1997

Abstract

Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of the interactions between the multiple levels of interconnection. In this paper, we present the first multi-protocol implementation of a lightweight message layer, a version of Active Messages-II running on a cluster of Sun Enterprise 5000 servers connected with Myrinet. This research brings together several pieces of high-performance interconnection technology: bus backplanes for symmetric multiprocessors, low-latency networks for connections between machines, and simple, user-level primitives for communication. The paper describes the shared-memory message-passing protocol and analyzes the multi-protocol implementation with both microbenchmarks and Split-C applications. Three aspects of the communication layer are critical to performance: the overhead of cache-coherence mechanisms, the method of managing concurrent access, and the cost of accessing state with the slower protocol. Using an adaptive polling strategy, the multi-protocol implementation limits performance interactions between the protocols, delivering up to 160 MB/s of bandwidth with 3.6-microsecond end-to-end latency. Applications within an SMP benefit from this fast communication, running up to 75% faster than on a network of uniprocessor workstations (NOW). Applications running on the entire Clump are limited by the balance of NICs to processors in our system and are typically slower than on the NOW. These results illustrate several potential pitfalls for the Clumps architecture.
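The abstract attributes the low interference between the fast shared-memory protocol and the slower network protocol to an adaptive polling strategy, but it does not describe the policy itself. As a rough illustration only, the Python sketch below shows one plausible policy of that kind, assuming exponential backoff on the slow protocol: the shared-memory queue is drained on every poll, while the network queue is checked on an interval that grows while it stays idle and resets as soon as traffic appears. The class `AdaptivePoller`, its `max_interval` parameter, the doubling rule, and the `(handler, payload)` message tuple are hypothetical, and in-process deques stand in for the real shared-memory and NIC receive queues.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# Hypothetical message shape: (handler name, payload). Not the AM-II format.
Message = Tuple[str, bytes]


class AdaptivePoller:
    """Illustrative two-protocol poller (an assumption, not the paper's code).

    The cheap shared-memory queue is drained on every poll; the expensive
    network queue is checked only when a countdown expires. Each empty
    network check doubles the interval (capped at max_interval), and any
    network traffic resets it to 1.
    """

    def __init__(self, shm_queue: Deque[Message], net_queue: Deque[Message],
                 max_interval: int = 64) -> None:
        if max_interval < 1:
            raise ValueError("max_interval must be >= 1")
        self.shm = shm_queue
        self.net = net_queue
        self.max_interval = max_interval
        self.interval = 1    # polls between network checks
        self.countdown = 1   # polls left until the next network check
        self.net_polls = 0   # number of (costly) network checks performed

    def poll(self, deliver: Callable[[Message], None]) -> int:
        """Deliver every available message and return how many were delivered."""
        delivered = 0
        # Fast protocol: always drained.
        while self.shm:
            deliver(self.shm.popleft())
            delivered += 1
        # Slow protocol: checked only when the countdown reaches zero.
        self.countdown -= 1
        if self.countdown == 0:
            self.net_polls += 1
            found = 0
            while self.net:
                deliver(self.net.popleft())
                found += 1
            delivered += found
            if found:
                self.interval = 1  # traffic seen: check the network eagerly again
            else:
                self.interval = min(self.interval * 2, self.max_interval)
            self.countdown = self.interval
        return delivered


if __name__ == "__main__":
    shm, net = deque(), deque()
    poller = AdaptivePoller(shm, net, max_interval=8)
    for _ in range(100):
        poller.poll(print)
    # With both queues idle, only 14 of 100 polls touch the network.
    print("network checks:", poller.net_polls)
```

The trade-off in this sketch is explicit: a larger `max_interval` reduces the cost paid to the idle slow protocol on each poll, at the price of up to `max_interval` polls of extra delay before a newly arrived network message is noticed.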

Published In

SC '97: Proceedings of the 1997 ACM/IEEE conference on Supercomputing

November 1997

921 pages

ISBN:0897919858

DOI:10.1145/509593

General Chair:
Dona Crawford

Copyright © 1997 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC '97

SC '97: International Conference for High Performance Computing, Networking, Storage and Analysis

November 15 - 21, 1997

San Jose, CA

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Article Metrics

Total Citations: 39

Total Downloads: 397

Downloads (last 12 months): 40

Downloads (last 6 weeks): 7

Reflects downloads up to 26 Sep 2024

Cited By

Pauloski J, Hayot-Sasson V, Ward L, Hudson N, Sabino C, Baughman M, Chard K, Foster I, Mohror K, Arnold D, Badia R (2023). Accelerating Communications in Federated Applications with Transparent Object Proxies. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. Online publication date: 12-Nov-2023. https://dl.acm.org/doi/10.1145/3581784.3607047

Gu J, Kumar R, Lumetta S, Sun Y, El-Shishiny H, Rohou E (2010). Accelerating data movement on future chip multi-processors. Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, pp. 1-12. Online publication date: 19-Jun-2010. https://dl.acm.org/doi/10.1145/1882453.1882457

Buntinas D, Mercier G, Gropp W (2007). Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem. Parallel Computing 33(9), pp. 634-644. Online publication date: 1-Sep-2007. https://dl.acm.org/doi/10.1016/j.parco.2007.06.003

Weng T, Wang H, Wu T, Hsu C, Li K (2007). Design and Implementation of a Performance Analysis and Visualization Toolkit for Cluster Environments. Advances in Hybrid Information Technology, pp. 469-479. Online publication date: 2007. https://doi.org/10.1007/978-3-540-77368-9_46

Weng T, Wang H, Wu T, Hsu C, Li K (2006). Design and implementation of a performance analysis and visualization toolkit for cluster environments. Proceedings of the 1st international conference on Advances in hybrid information technology, pp. 469-479. Online publication date: 9-Nov-2006. https://dl.acm.org/doi/10.5555/1782654.1782705

Chavarría-Miranda D, Nieplocha J, Tipparaju V, Alderighi M, Salapura V, McKee S (2006). Topology-aware tile mapping for clusters of SMPs. Proceedings of the 3rd conference on Computing frontiers, pp. 383-392. Online publication date: 3-May-2006. https://dl.acm.org/doi/10.1145/1128022.1128073

Hatazaki T (2006). Rank reordering strategy for MPI topology creation functions. Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 188-195. Online publication date: 2-Jun-2006. https://doi.org/10.1007/BFb0056575

Chai L, Sur S, Jin H, Panda D (2005). Analysis of Design Considerations for Optimizing Multi-Channel MPI over InfiniBand. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10. Online publication date: 4-Apr-2005. https://dl.acm.org/doi/10.1109/IPDPS.2005.106

Aversa R, Di Martino B, Rak M, Venticinque S, Villano U (2005). Performance prediction through simulation of a hybrid MPI/OpenMP application. Parallel Computing 31(10-12), pp. 1013-1033. Online publication date: 1-Oct-2005. https://dl.acm.org/doi/10.1016/j.parco.2005.03.009

Blelloch G, Cheng P (2004). On bounding time and space for multiprocessor garbage collection. ACM SIGPLAN Notices 39(4), pp. 626-641. Online publication date: 1-Apr-2004. https://dl.acm.org/doi/10.1145/989393.989456
