DOI: 10.1145/1122971.1122981

Programming for parallelism and locality with hierarchically tiled arrays

Published: 29 March 2006

Abstract

Tiling has proven to be an effective mechanism for developing high-performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or sequential components of parallel programs is enhanced. In this paper, a data type, Hierarchically Tiled Arrays or HTAs, that facilitates the direct manipulation of tiles is introduced. HTA operations are overloaded array operations. We argue that the implementation of HTAs in sequential OO languages transforms these languages into powerful tools for the development of high-performance parallel codes and codes with a high degree of locality. To support this claim, we discuss our experiences with the implementation of HTAs for MATLAB and C++ and the rewriting of the NAS benchmarks and a few other programs into HTA-based parallel form.
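
As a rough illustration of the locality argument in the abstract, the following Python sketch multiplies two matrices tile by tile, so that each small block of the operands is reused while it is still "hot". It is not the HTA data type or its MATLAB/C++ interface: the function names `matmul_naive` and `matmul_tiled` and the `tile` parameter are hypothetical, chosen only for this example, and plain nested lists stand in for arrays.

```python
def matmul_naive(a, b):
    """Reference product of two matrices stored as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same product, computed one tile at a time.

    `tile` is the (assumed) edge length of a square tile. The three outer
    loops walk over tiles; the three inner loops stay inside one tile of
    each operand, which is what improves locality on a real machine.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of the result
        for j0 in range(0, p, tile):      # tile column of the result
            for k0 in range(0, m, tile):  # tile along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = a[i][k]
                        for j in range(j0, min(j0 + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8], [7, 6], [5, 4]]
    # The tiled loop nest must agree with the straightforward product.
    assert matmul_tiled(a, b, tile=2) == matmul_naive(a, b)
```

In a parallel setting the same decomposition could assign each result tile to a different processor, which is the communication-reducing use of tiling the abstract mentions; this sketch stays sequential and only shows the loop structure.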

Published In

PPoPP '06: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
March 2006
258 pages
ISBN:1595931899
DOI:10.1145/1122971

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data-parallel
  2. locality enhancement
  3. parallel programming
  4. tiling

Qualifiers

  • Article

Conference

PPoPP06
Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
