Article

The design and implementation of a parallel array operator for the arbitrary remapping of data

Authors: Steven J. Deitz, Bradford L. Chamberlain, Lawrence Snyder

PPoPP '03: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 155 - 166

https://doi.org/10.1145/781498.781526

Published: 11 June 2003

Abstract

Gather and scatter are data redistribution functions of long-standing importance to high-performance computing. In this paper, we present a highly general array operator with powerful gather and scatter capabilities unmatched by other array languages. We discuss an efficient parallel implementation, introducing three new optimizations (schedule compression, dead array reuse, and direct communication) that reduce the costs associated with the operator's wide applicability. In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks.
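
As a rough illustration of the two operations the abstract names, the sketch below shows sequential gather and scatter on plain Python lists. It is not ZPL and not the paper's operator. The function names `gather` and `scatter`, the `fill` parameter, and the last-write-wins handling of duplicate indices are assumptions made for this example only.

```python
def gather(src, idx):
    """Gather: dst[i] = src[idx[i]] for every position i of idx."""
    return [src[j] for j in idx]


def scatter(src, idx, size, fill=0):
    """Scatter: dst[idx[i]] = src[i].

    Assumption for this sketch: if idx contains duplicates, the
    later write wins; untouched positions keep the `fill` value.
    """
    dst = [fill] * size
    for i, j in enumerate(idx):
        dst[j] = src[i]
    return dst


data = [10, 20, 30, 40]
perm = [2, 0, 3, 1]  # hypothetical index array (a permutation)

gathered = gather(data, perm)                   # reads through the index array
restored = scatter(gathered, perm, len(data))   # writes through the same index array
```

With a permutation as the index array, scattering the gathered result through the same indices restores the original order, which is why the two operations are often described as inverses of each other in that case.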

Index Terms

Software and its engineering
  Software notations and tools
    General programming languages
      Language types
        Concurrent programming languages
        Distributed programming languages
        Parallel programming languages

Published In

PPoPP '03: Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming

June 2003

250 pages

ISBN:1581135882

DOI:10.1145/781498

General Chair: Rudolf Eigenmann (Purdue University)
Program Chair: Martin Rinard (MIT Laboratory for Computer Science)

ACM SIGPLAN Notices Volume 38, Issue 10
Proceedings of the ACM SIGPLAN symposium on principles and practice of parallel programming (PPoPP 2003) and workshop on partial evaluation and semantics-based program manipulation (PEPM 2003)
October 2003
331 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/966049

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

PPoPP03: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

June 11-13, 2003

San Diego, California, USA

Acceptance Rates

PPoPP '03 Paper Acceptance Rate: 20 of 45 submissions, 44%

Overall Acceptance Rate: 230 of 1,014 submissions, 23%
