research-article

Efficient parallel stencil convolution in Haskell

Authors:

Gabriele Keller

ACM SIGPLAN Notices, Volume 46, Issue 12

Pages 59 - 70

https://doi.org/10.1145/2096148.2034684

Published: 22 September 2011

Abstract

Stencil convolution is a fundamental building block of many scientific and image processing algorithms. We present a declarative approach to writing such convolutions in Haskell that is both efficient at runtime and implicitly parallel. To achieve this we extend our prior work on the Repa array library with two new features: partitioned and cursored arrays. Combined with careful management of the interaction between GHC and its back-end code generator LLVM, we achieve performance comparable to the standard OpenCV library.
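
As a rough illustration of what a stencil convolution computes, the sketch below applies a 3x3 discrete Laplace stencil to a small image held as nested lists. It is a sequential toy built on the Prelude only. It does not use Repa, and it does not model the partitioned or cursored arrays mentioned above. The names `Stencil`, `Image`, `laplace`, and `convolve`, and the clamp-to-edge border rule, are assumptions made for this example and are not taken from the paper.

```haskell
module Main where

-- Hypothetical types for this sketch (not Repa's API).
-- A stencil is a list of ((rowOffset, colOffset), coefficient) pairs.
type Stencil = [((Int, Int), Double)]
type Image   = [[Double]]

-- Discrete Laplace stencil: the four direct neighbours minus 4x the centre.
laplace :: Stencil
laplace =
  [ ((-1, 0), 1)
  , ((0, -1), 1), ((0, 0), -4), ((0, 1), 1)
  , ((1, 0), 1)
  ]

-- Apply a stencil at every pixel. Reads that fall outside the image are
-- clamped to the nearest edge pixel (an assumed border policy).
convolve :: Stencil -> Image -> Image
convolve stencil img =
  [ [ sum [ k * at (r + dr) (c + dc) | ((dr, dc), k) <- stencil ]
    | c <- [0 .. cols - 1] ]
  | r <- [0 .. rows - 1] ]
  where
    rows = length img
    cols = case img of
             []        -> 0
             (row : _) -> length row
    clamp hi = max 0 . min hi
    -- List indexing is linear-time; acceptable for a toy, not for real images.
    at r c = (img !! clamp (rows - 1) r) !! clamp (cols - 1) c

main :: IO ()
main = do
  let flat  = replicate 3 (replicate 3 1.0)
      spike = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
  print (convolve laplace flat)
  print (convolve laplace spike)
```

Because the Laplace coefficients sum to zero, a constant image maps to all zeros under the clamped border rule, which makes a convenient sanity check. Each output pixel depends only on the input image, so the per-pixel computations are independent of one another; that independence is what makes stencil convolution a natural candidate for parallel evaluation.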

Index Terms

Efficient parallel stencil convolution in Haskell
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Abstract data types
        Concurrent programming structures
        Polymorphism

Published In

ACM SIGPLAN Notices Volume 46, Issue 12

Haskell '11

December 2011

129 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2096148

Haskell '11: Proceedings of the 4th ACM symposium on Haskell
September 2011
136 pages
ISBN:9781450308601
DOI:10.1145/2034675
General Chair:
Koen Claessen
Chalmers University of Technology

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

Ankner J, Svenningsson J (2013). An EDSL approach to high performance Haskell programming. ACM SIGPLAN Notices 48:12, pp. 1-12. Online publication date: 23-Sep-2013.
https://dl.acm.org/doi/10.1145/2578854.2503789
Ankner J, Svenningsson J, Shan C (2013). An EDSL approach to high performance Haskell programming. Proceedings of the 2013 ACM SIGPLAN symposium on Haskell, pp. 1-12. Online publication date: 23-Sep-2013.
https://dl.acm.org/doi/10.1145/2503778.2503789
Hélène C, Minh-Hoang L, Sébastien L (2013). Parallelization of Shallow-water Equations with the Algorithmic Skeleton Library SkelGIS. Procedia Computer Science 18, pp. 591-600. Online publication date: 2013.
https://doi.org/10.1016/j.procs.2013.05.223
Mainland G, Leshchinskiy R, Jones S (2017). Exploiting vector instructions with generalized stream fusion. Communications of the ACM 60:5, pp. 83-91. Online publication date: 24-Apr-2017.
https://dl.acm.org/doi/10.1145/3060597
Maier P, Stewart R, Michaelson G (2016). Why So Many? Proceedings of the 1st International Workshop on Real World Domain Specific Languages, pp. 1-2. Online publication date: 12-Mar-2016.
https://dl.acm.org/doi/10.1145/2889420.2893172
Liu H, Day L, Glew N, Anderson T, Barik R, Berthold J, Sheeran M, Newton R (2014). Native offload of Haskell repa programs to integrated GPUs. Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing, pp. 87-97. Online publication date: 3-Sep-2014.
https://dl.acm.org/doi/10.1145/2636228.2636236
Svensson B, Svenningsson J, Berthold J, Sheeran M, Newton R (2014). Defunctionalizing push arrays. Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing, pp. 43-52. Online publication date: 3-Sep-2014.
https://dl.acm.org/doi/10.1145/2636228.2636231
Bezanson J, Chen J, Karpinski S, Shah V, Edelman A (2014). Array Operators Using Multiple Dispatch. Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, pp. 56-61. Online publication date: 9-Jun-2014.
https://dl.acm.org/doi/10.1145/2627373.2627383
Petersen L, Anderson T, Liu H, Glew N, Plasmeijer R (2013). Measuring the Haskell Gap. Proceedings of the 25th symposium on Implementation and Application of Functional Languages, pp. 61-72. Online publication date: 28-Aug-2013.
https://dl.acm.org/doi/10.1145/2620678.2620685
Petersen L, Orchard D, Glew N (2013). Automatic SIMD vectorization for Haskell. ACM SIGPLAN Notices 48:9, pp. 25-36. Online publication date: 25-Sep-2013.
https://dl.acm.org/doi/10.1145/2544174.2500605
