DOI: 10.1145/509593.509596

Multi-protocol active messages on a cluster of SMP's

Published: 15 November 1997

Abstract

Clusters of multiprocessors, or Clumps, promise to be the supercomputers of the future, but obtaining high performance on these architectures requires an understanding of interactions between the multiple levels of interconnection. In this paper, we present the first multi-protocol implementation of a lightweight message layer: a version of Active Messages-II running on a cluster of Sun Enterprise 5000 servers connected with Myrinet. This research brings together several pieces of high-performance interconnection technology: bus backplanes for symmetric multiprocessors, low-latency networks for connections between machines, and simple, user-level primitives for communication. The paper describes the shared-memory message-passing protocol and analyzes the multi-protocol implementation with both microbenchmarks and Split-C applications. Three aspects of the communication layer are critical to performance: the overhead of cache-coherence mechanisms, the method of managing concurrent access, and the cost of accessing state with the slower protocol. Through the use of an adaptive polling strategy, the multi-protocol implementation limits performance interactions between the protocols, delivering up to 160 MB/s of bandwidth with 3.6 microsecond end-to-end latency. Applications within an SMP benefit from this fast communication, running up to 75% faster than on a network of uniprocessor workstations (NOW). Applications running on the entire Clump are limited by the balance of NICs to processors in our system, and are typically slower than on the NOW. These results illustrate several potential pitfalls for the Clumps architecture.
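
The abstract names an adaptive polling strategy but does not spell out its policy. The following is only a minimal sketch of one plausible reading, not the paper's implementation: the cheap (shared-memory) queue is checked on every poll, while the expensive (network) queue is checked only every `interval` polls, with the interval doubling when the network is idle and resetting when traffic arrives. The class name `AdaptivePoller`, its parameters, and the doubling/reset rule are all assumptions made for illustration.

```python
from collections import deque


class AdaptivePoller:
    """Hypothetical sketch: poll a cheap queue always, an expensive one adaptively."""

    def __init__(self, fast_queue, slow_queue, min_interval=1, max_interval=64):
        self.fast_queue = fast_queue      # stands in for the shared-memory protocol
        self.slow_queue = slow_queue      # stands in for the network protocol
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = min_interval      # polls between checks of the slow queue
        self.calls_since_slow_poll = 0
        self.slow_polls = 0               # counts how often the slow queue was checked

    def poll(self):
        """Return the messages delivered by this poll, fast protocol first."""
        delivered = []

        # The cheap protocol is drained on every call.
        while self.fast_queue:
            delivered.append(self.fast_queue.popleft())

        # The expensive protocol is checked only once per `interval` calls.
        self.calls_since_slow_poll += 1
        if self.calls_since_slow_poll >= self.interval:
            self.calls_since_slow_poll = 0
            self.slow_polls += 1
            if self.slow_queue:
                while self.slow_queue:
                    delivered.append(self.slow_queue.popleft())
                # Assumed policy: traffic seen, so poll eagerly again.
                self.interval = self.min_interval
            else:
                # Assumed policy: idle, so back off exponentially up to a cap.
                self.interval = min(self.interval * 2, self.max_interval)
        return delivered


if __name__ == "__main__":
    poller = AdaptivePoller(deque(), deque())
    for _ in range(15):
        poller.poll()
    print(poller.slow_polls, poller.interval)
```

Under this assumed policy, an idle network costs a logarithmic rather than linear number of expensive checks, at the price of added latency for the first network message after an idle period. The cap `max_interval` bounds that delay.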

Published In

SC '97: Proceedings of the 1997 ACM/IEEE conference on Supercomputing
November 1997
921 pages
ISBN:0897919858
DOI:10.1145/509593
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
