Advanced collective communication in Aspen

Authors:

Samuel P. Midkiff

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing

Pages 83 - 93

https://doi.org/10.1145/1375527.1375543

Published: 07 June 2008

Abstract

Aspen is a programming language that relies on high-level messaging to support communication among program tasks executing in parallel. Unlike in MPI programs, the computational logic of Aspen tasks is specified and developed independently of the program's global communication structure, which is specified in a root module. The semantics and generality of these specifications enable novel forms of collective communication, including asynchronous and concurrent collective operations, and reduction-type operations in which only a subset of the participants receives the reduced data and in which receivers need not contribute data to the reduction. This paper describes efficient implementations of these and other collective communication operations in Aspen. We demonstrate the ease of use of these features with several code examples and quantify their performance impact through both microbenchmarks and a quantum chemistry code used in rubber chemistry. Aspen's performance is competitive with, or slightly better than, that of MPI implementations for both the chemistry application and the microbenchmarks.
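The reduction variant described in the abstract, where the set of receivers is chosen independently of the set of contributors, can be illustrated with a small sequential model. This is only a sketch in Python, not Aspen syntax and not the paper's implementation; the function `subset_reduce` and the task names (`worker0`, `logger`, and so on) are hypothetical and introduced purely for illustration.

```python
from functools import reduce
from operator import add


def subset_reduce(values, receivers, op=add):
    """Combine every contributor's value with `op`, then deliver the
    result only to the tasks named in `receivers`.

    values:    dict mapping contributor task name -> contributed value
    receivers: iterable of task names that get the result; a receiver
               need not appear in `values`
    """
    if not values:
        raise ValueError("a reduction needs at least one contributor")
    # Fold in sorted name order so the result is deterministic even
    # when `op` is not commutative.
    ordered = [values[name] for name in sorted(values)]
    reduced = reduce(op, ordered)
    # Only the listed receivers get a copy of the reduced value.
    return {name: reduced for name in receivers}


# Hypothetical task names: three workers contribute; one worker and a
# non-contributing logger receive the result.
contributions = {"worker0": 3, "worker1": 4, "worker2": 5}
delivered = subset_reduce(contributions, receivers=["worker0", "logger"])
print(delivered)  # {'worker0': 12, 'logger': 12}
```

In this model `worker1` and `worker2` contribute without receiving, while `logger` receives without contributing. A conventional MPI reduce or allreduce over a single communicator does not express that separation directly, which is presumably why the abstract singles it out.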

Cited By

Benoit A, Canon L, Marchal L (2015). Non-clairvoyant reduction algorithms for heterogeneous platforms. Concurrency and Computation: Practice & Experience, 27:6 (1612-1624). DOI: 10.1002/cpe.3347. Online publication date: 25-Apr-2015.
https://dl.acm.org/doi/10.1002/cpe.3347
Canon L (2013). Scheduling associative reductions with homogeneous costs when overlapping communications and computations. 20th Annual International Conference on High Performance Computing (119-128). DOI: 10.1109/HiPC.2013.6799124. Online publication date: Dec-2013.
https://doi.org/10.1109/HiPC.2013.6799124
Scherer W, Adhianto L, Jin G, Mellor-Crummey J, Yang C, Moreira J, Iancu C, Saraswat V (2010). Hiding latency in Coarray Fortran 2.0. Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model (1-9). DOI: 10.1145/2020373.2020387. Online publication date: 12-Oct-2010.
https://dl.acm.org/doi/10.1145/2020373.2020387

Index Terms

Advanced collective communication in Aspen
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Published In

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing

June 2008

390 pages

ISBN:9781605581583

DOI:10.1145/1375527

General Chairs:
Theo Papatheodorou, University of Patras, Greece
Utpal Banerjee, Intel (retired), USA

Program Chairs:
Avi Mendelson, Intel, Israel
Kyle Gallivan, Florida State University, USA

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICS08

ICS08: International Conference on Supercomputing

June 7 - 12, 2008

Island of Kos, Greece

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
