DOI: 10.1145/1375527.1375543
research-article

Advanced collective communication in Aspen

Published: 07 June 2008

Abstract

Aspen is a programming language that relies on high-level messaging to support communication among different program tasks executing in parallel. Unlike MPI, the computational logic of Aspen tasks is specified and developed independently of the global communication structure of the program. A root module specifies the communication structure of the program. The semantics and generality of these specifications enable novel forms of collective communication, including asynchronous and concurrent collective operations, and reduction-type operations in which only a subset of the participants receive the reduced data or in which some receivers do not contribute data to the reduction. This paper describes efficient implementations of these and other collective communication operations in Aspen. We demonstrate the ease of use of these features using several code examples and quantify their performance impact through both microbenchmarks and a quantum chemistry code used in rubber chemistry. Aspen's performance is competitive with, or slightly better than, the performance of MPI implementations for both the chemistry application and the microbenchmarks.
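
As a rough illustration of the generalized reduction described in the abstract, the following sketch models a reduction in which the set of contributing tasks and the set of receiving tasks are chosen independently. It is plain Python rather than Aspen syntax, it runs sequentially in one process, and the names `generalized_reduce`, `contributors`, and `receivers` are hypothetical, introduced only for this example.

```python
from functools import reduce
from operator import add


def generalized_reduce(values, contributors, receivers, op=add):
    """Combine the values of `contributors` with `op` and hand the result
    to every task in `receivers`.

    Hypothetical helper, not part of Aspen: it only models who contributes
    and who receives, not how messages are actually routed.
    """
    if not contributors:
        raise ValueError("at least one contributor is required")
    result = reduce(op, (values[task] for task in contributors))
    return {task: result for task in receivers}


# Tasks 0, 1, and 2 contribute data; only tasks 2 and 3 receive the result.
# Task 3 receives without contributing, and tasks 0 and 1 contribute
# without receiving.
values = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
delivered = generalized_reduce(values, contributors=[0, 1, 2], receivers=[2, 3])
print(delivered)
```

An MPI-style `reduce` or `allreduce` corresponds to the special cases where the receivers are a single root or all contributors, respectively; the sketch simply lets the two sets differ.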

Cited By

  • (2015) Non-clairvoyant reduction algorithms for heterogeneous platforms. Concurrency and Computation: Practice & Experience, 27:6 (1612-1624). DOI: 10.1002/cpe.3347. Online publication date: 25-Apr-2015
  • (2013) Scheduling associative reductions with homogeneous costs when overlapping communications and computations. 20th Annual International Conference on High Performance Computing (119-128). DOI: 10.1109/HiPC.2013.6799124. Online publication date: Dec-2013
  • (2010) Hiding latency in Coarray Fortran 2.0. Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model (1-9). DOI: 10.1145/2020373.2020387. Online publication date: 12-Oct-2010

Published In

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
June 2008
390 pages
ISBN: 9781605581583
DOI: 10.1145/1375527

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. algorithms
  2. parallel programming
  3. programming languages
  4. reductions

Qualifiers

  • Research-article

Conference

ICS08
ICS08: International Conference on Supercomputing
June 7 - 12, 2008
Island of Kos, Greece

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
