research-article

Towards a streaming model for nested data parallelism

Authors:

Frederik M. Madsen,

Andrzej Filinski

FHPC '13: Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing

Pages 13 - 24

https://doi.org/10.1145/2502323.2502330

Published: 23 September 2013

Abstract

The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening execution strategy, comes at the price of potentially prohibitive space usage in the common case of computations with an excess of available parallelism, such as dense-matrix multiplication.

We present a simple nested data-parallel functional language and associated cost semantics that retains NESL's intuitive work-depth model for time complexity, but also allows highly parallel computations to be expressed in a space-efficient way: memory usage on one or a few processors is of the same order as for a sequential formulation of the algorithm, and in general scales smoothly with the degree of parallelism actually realized, not with the potential parallelism.

The refined semantics is based on distinguishing formally between fully materialized (i.e., explicitly allocated in memory all at once) "vectors" and potentially ephemeral "sequences" of values, with the latter being bulk-processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level.

The language definition and implementation are still very much work in progress, but we do present some preliminary examples and timings, suggesting that the streaming model has practical potential.
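To make the vector/sequence distinction concrete, the following is a minimal Python sketch, not the paper's language or implementation. It computes a dot product twice: once by materializing every elementwise product as a "vector", and once by consuming the products as a "sequence" in fixed-size blocks. The names `dot_materialized`, `dot_streamed`, `chunks`, and the `chunk_size` parameter are hypothetical and introduced only for illustration; `chunk_size` stands in for the realized degree of parallelism.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def dot_materialized(xs: Iterable[float], ys: Iterable[float]) -> float:
    """Vector style: every product is allocated before the reduction runs."""
    products = [x * y for x, y in zip(xs, ys)]  # O(n) intermediate storage
    return sum(products)


def chunks(seq: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive blocks of at most `size` elements from `seq`."""
    it = iter(seq)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def dot_streamed(
    xs: Iterable[float], ys: Iterable[float], chunk_size: int = 4
) -> Tuple[float, int]:
    """Sequence style: products are produced lazily and reduced block by block.

    Returns the result together with the largest block held in memory,
    which is bounded by `chunk_size` rather than by the input length.
    """
    products = (x * y for x, y in zip(xs, ys))  # lazy, nothing allocated yet
    total = 0.0
    peak = 0
    for block in chunks(products, chunk_size):
        peak = max(peak, len(block))  # intermediate storage for this step
        total += sum(block)           # each block could be processed in bulk
    return total, peak
```

In this sketch the intermediate storage of the streamed version depends only on the block size, which mirrors the abstract's claim that space should scale with realized rather than potential parallelism. The work performed is the same in both versions.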

Index Terms

Towards a streaming model for nested data parallelism
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Functional languages
        Parallel programming languages

Published In

FHPC '13: Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing

September 2013

104 pages

ISBN:9781450323819

DOI:10.1145/2502323

General Chairs:
Clemens Grelck, University of Amsterdam, The Netherlands
Fritz Henglein, University of Copenhagen, Denmark

Program Chairs:
Umut Acar, Carnegie Mellon University, USA
Jost Berthold, University of Copenhagen, Denmark

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICFP'13

Sponsor:

SIGPLAN

ICFP'13: ACM SIGPLAN International Conference on Functional Programming

September 23, 2013

Boston, Massachusetts, USA

Acceptance Rates

FHPC '13 Paper Acceptance Rate 8 of 14 submissions, 57%;

Overall Acceptance Rate 18 of 25 submissions, 72%
