DOI: 10.1145/2502323.2502330
research-article

Towards a streaming model for nested data parallelism

Published: 23 September 2013

Abstract

The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening execution strategy, comes at the price of potentially prohibitive space usage in the common case of computations with an excess of available parallelism, such as dense-matrix multiplication.
We present a simple nested data-parallel functional language and associated cost semantics that retains NESL's intuitive work-depth model for time complexity, but also allows highly parallel computations to be expressed in a space-efficient way, in the sense that memory usage on one (or a few) processors is of the same order as for a sequential formulation of the algorithm, and in general scales smoothly with the actually realized degree of parallelism, not the potential parallelism.
The refined semantics is based on distinguishing formally between fully materialized (i.e., explicitly allocated in memory all at once) "vectors" and potentially ephemeral "sequences" of values, with the latter being bulk-processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level.
The language definition and implementation are still very much work in progress, but we do present some preliminary examples and timings, suggesting that the streaming model has practical potential.
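
As an illustration of the vector/sequence distinction described in the abstract, here is a minimal Python sketch, not the paper's language or implementation. It computes a dense matrix product two ways: once by materializing all n^3 elementwise products (in the spirit of full flattening), and once by consuming the same products as a stream in bounded chunks. The function names and the `chunk` parameter are hypothetical; `chunk` stands in for the realized degree of parallelism.

```python
from itertools import islice


def products_stream(a, b):
    """Yield the n*n*n products a[i][k] * b[k][j] in row-major (i, j, k) order."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield a[i][k] * b[k][j]


def matmul_materialized(a, b):
    """'Vector' style: all n^3 products are held in memory at once."""
    n = len(a)
    prods = list(products_stream(a, b))  # O(n^3) intermediate values live
    return [
        [sum(prods[(i * n + j) * n:(i * n + j + 1) * n]) for j in range(n)]
        for i in range(n)
    ]


def matmul_streamed(a, b, chunk=4):
    """'Sequence' style: at most `chunk` products are held at any time.

    `chunk` is a hypothetical knob modelling how many elements would be
    processed in one bulk (parallel) step.
    """
    n = len(a)
    stream = products_stream(a, b)
    flat = []            # the n^2 result entries (needed in either style)
    acc, count = 0, 0    # running segmented sum over groups of n products
    while True:
        block = list(islice(stream, chunk))  # O(chunk) intermediate values
        if not block:
            break
        for p in block:
            acc += p
            count += 1
            if count == n:
                flat.append(acc)
                acc, count = 0, 0
    return [flat[i * n:(i + 1) * n] for i in range(n)]
```

Both functions perform the same n^3 multiplications; they differ only in how many intermediate products are live at once, which is the quantity the abstract says the refined cost semantics is meant to expose at the source level.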


Published In

FHPC '13: Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
September 2013
104 pages
ISBN: 9781450323819
DOI: 10.1145/2502323
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cost semantics
  2. dataflow networks
  3. space efficiency


Conference

ICFP'13
Acceptance Rates

FHPC '13 Paper Acceptance Rate 8 of 14 submissions, 57%;
Overall Acceptance Rate 18 of 25 submissions, 72%
