DOI: 10.1145/2627373.2627378
tutorial

A Local-View Array Library for Partitioned Global Address Space C++ Programs

Published: 09 June 2014

Abstract

Multidimensional arrays are an important data structure in many scientific applications. Unfortunately, built-in support for such arrays is inadequate in C++, particularly in the distributed setting where bulk communication operations are required for good performance. In this paper, we present a multidimensional array library for partitioned global address space (PGAS) programs, supporting the one-sided remote access and bulk operations of the PGAS model. The library is based on Titanium arrays, which have proven to provide good productivity and performance. These arrays provide a local view of data, where each rank constructs its own portion of a global data structure, matching the local view of execution common to PGAS programs and providing maximum flexibility in structuring global data. Unlike Titanium, which has its own compiler with array-specific analyses, optimizations, and code generation, we implement multidimensional arrays solely through a C++ library. The main goal of this effort is to provide a library-based implementation that can match the productivity and performance of a compiler-based approach. We implement the array library as an extension to UPC++, a C++ library for PGAS programs, and we extend Titanium arrays with specializations to improve performance. We evaluate the array library by porting four Titanium benchmarks to UPC++, demonstrating that it can achieve up to 25% better performance than Titanium without a significant increase in programmer effort.
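
To make the local-view idea in the abstract concrete, the following is a minimal single-process sketch, not the library's actual interface. Every name in it (`Domain2D`, `LocalArray2D`, `copy_from`, `demo_ghost_exchange`) is hypothetical, the two "ranks" are simply two objects in one address space, and the choice to restrict a bulk copy to the intersection of the two arrays' domains is an assumption made for illustration. Each array is built over its own rectangle of global indices, and a ghost row is filled with a single bulk operation rather than element-by-element communication.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is NOT the UPC++ or Titanium API.

// Half-open rectangle [lo0, hi0) x [lo1, hi1) expressed in GLOBAL indices.
struct Domain2D {
    int lo0, lo1, hi0, hi1;
    bool empty() const { return lo0 >= hi0 || lo1 >= hi1; }
    bool contains(int i, int j) const {
        return i >= lo0 && i < hi0 && j >= lo1 && j < hi1;
    }
    // Intersection of two rectangles (possibly empty).
    Domain2D intersect(const Domain2D& o) const {
        return Domain2D{std::max(lo0, o.lo0), std::max(lo1, o.lo1),
                        std::min(hi0, o.hi0), std::min(hi1, o.hi1)};
    }
};

// One rank's portion of a conceptually global grid, indexed by global coordinates.
class LocalArray2D {
public:
    explicit LocalArray2D(Domain2D d)
        : dom_(d),
          data_(d.empty() ? 0
                          : static_cast<std::size_t>(d.hi0 - d.lo0) *
                                static_cast<std::size_t>(d.hi1 - d.lo1),
                0.0) {}

    const Domain2D& domain() const { return dom_; }

    double& operator()(int i, int j) {
        assert(dom_.contains(i, j));
        return data_[index(i, j)];
    }
    double operator()(int i, int j) const {
        assert(dom_.contains(i, j));
        return data_[index(i, j)];
    }

    // Bulk copy: overwrite the elements lying in this domain, the source
    // domain, and the requested region all at once.
    void copy_from(const LocalArray2D& src, const Domain2D& region) {
        const Domain2D both = dom_.intersect(src.dom_).intersect(region);
        for (int i = both.lo0; i < both.hi0; ++i)
            for (int j = both.lo1; j < both.hi1; ++j)
                (*this)(i, j) = src(i, j);
    }

private:
    // Row-major offset of global index (i, j) within the local storage.
    std::size_t index(int i, int j) const {
        return static_cast<std::size_t>(i - dom_.lo0) *
                   static_cast<std::size_t>(dom_.hi1 - dom_.lo1) +
               static_cast<std::size_t>(j - dom_.lo1);
    }
    Domain2D dom_;
    std::vector<double> data_;
};

// Simulates two ranks in one process and returns rank 0's ghost value
// after the exchange.
double demo_ghost_exchange() {
    const int n = 4;  // number of columns, and interior rows per rank
    LocalArray2D a0(Domain2D{0, 0, n + 1, n});      // rows [0,4) plus ghost row 4
    LocalArray2D a1(Domain2D{n - 1, 0, 2 * n, n});  // ghost row 3 plus rows [4,8)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            a0(i, j) = 1.0;      // rank 0 interior
            a1(i + n, j) = 2.0;  // rank 1 interior
        }
    // Rank 0 fills its ghost row from rank 1's interior in one bulk operation.
    a0.copy_from(a1, Domain2D{n, 0, n + 1, n});
    return a0(n, 0);
}
```

Calling `demo_ghost_exchange()` from a `main` function exercises the sketch. Because both arrays are addressed with the same global indices, the ghost update needs no index translation, and the caller only names the region it wants refreshed. In a real PGAS setting the source array would live on another rank and the copy would become a one-sided bulk transfer.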

      Published In

      ARRAY'14: Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
      June 2014
      112 pages
      ISBN:9781450329378
      DOI:10.1145/2627373
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Tutorial
      • Research
      • Refereed limited

      Conference

      PLDI '14
      Acceptance Rates

      ARRAY'14 Paper Acceptance Rate 17 of 25 submissions, 68%;
      Overall Acceptance Rate 17 of 25 submissions, 68%
