research-article

Size slicing: a hybrid approach to size inference in Futhark

Authors:

Troels Henriksen,

Cosmin E. Oancea

FHPC '14: Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing

Pages 31 - 42

https://doi.org/10.1145/2636228.2636238

Published: 03 September 2014

Abstract

We present a shape inference analysis for a purely functional language, named Futhark, that supports nested parallelism via array combinators such as map, reduce, filter, and scan. Our approach is to infer code for computing precise shape information at run-time, which in the most common cases can be effectively optimized by standard compiler optimizations. Instead of restricting the language or sacrificing ease of use, the language allows the occasional shape-dynamic, and even shape-misbehaving, construct. Inherently shape-dynamic code is treated with a fall-back technique that preserves, asymptotically, the number of operations of the program and that computes and returns the array's shape alongside its value. This approach leads to a shape-dependent system with existentially quantified types, where static shape inference corresponds to eliminating existential quantifications from the types of program expressions.
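
The fall-back technique described above can be illustrated with a minimal Python sketch. This is not Futhark code and not the authors' implementation; the function names `filter_with_size`, `map_with_size`, and `pipeline` are hypothetical and chosen only for illustration. The sketch assumes that a shape-dynamic operation such as filter returns its result size together with its value, while a shape-static operation such as map simply propagates a size that is already known.

```python
from typing import Callable, List, Tuple


def filter_with_size(pred: Callable[[int], bool], xs: List[int]) -> Tuple[int, List[int]]:
    """Shape-dynamic operation: the result length depends on the data,
    so the size is returned alongside the value (the "existential" case)."""
    kept = [x for x in xs if pred(x)]
    return len(kept), kept


def map_with_size(f: Callable[[int], int], n: int, xs: List[int]) -> Tuple[int, List[int]]:
    """Shape-static operation: the result length equals the input length n,
    so the size is known without inspecting the result."""
    return n, [f(x) for x in xs]


def pipeline(xs: List[int]) -> Tuple[int, List[int]]:
    # k is only known after the filter has run.
    k, evens = filter_with_size(lambda x: x % 2 == 0, xs)
    # From here on the size is propagated rather than rediscovered.
    return map_with_size(lambda x: 2 * x, k, evens)
```

In this sketch only the filter step needs the fall-back; once its size `k` is available, every later size follows from it without extra work.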

We optimize the common case to negligible overhead via size slicing: a technique that separates the computation of the array's shape from its values. This allows the shape to be calculated in advance and to be used to instantiate the previously existentially quantified shapes of the value slice. We report negligible overhead on several mini-benchmarks and three real-world applications.
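
As a rough illustration of the idea of size slicing, the following Python sketch splits one function into a shape slice and a value slice. It is a hand-written analogy under stated assumptions, not the compiler transformation from the paper; all names (`concat_replicate`, `concat_replicate_shape`, `concat_replicate_value`, `call_site`) are hypothetical.

```python
from typing import List, Tuple


def concat_replicate(n: int, m: int, x: int) -> List[int]:
    """Original function: the result size is not stated in the signature."""
    return [x] * n + list(range(m))


def concat_replicate_shape(n: int, m: int, x: int) -> int:
    """Shape slice: keeps only the code needed to compute the result size.
    Here it reduces to simple arithmetic and touches no array elements."""
    return n + m


def concat_replicate_value(size: int, n: int, m: int, x: int) -> List[int]:
    """Value slice: receives the size as an argument, so the output
    can be allocated before any element is computed."""
    out = [0] * size
    for i in range(n):
        out[i] = x
    for j in range(m):
        out[n + j] = j
    return out


def call_site(n: int, m: int, x: int) -> Tuple[int, List[int]]:
    # The shape is calculated in advance and passed to the value slice.
    size = concat_replicate_shape(n, m, x)
    return size, concat_replicate_value(size, n, m, x)
```

In this sketch the shape slice is cheap, which corresponds to the common case the abstract refers to; when the shape slice would be as expensive as the original computation, a fall-back that returns the shape together with the value would be used instead.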

Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

FHPC '14: Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing

September 2014

116 pages

ISBN: 9781450330404

DOI: 10.1145/2636228

General Chair: Jost Berthold (University of Copenhagen)

Program Chairs: Mary Sheeran (Chalmers University of Technology), Ryan Newton (Indiana University)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ICFP'14

Sponsor:

SIGPLAN

ICFP'14: ACM SIGPLAN International Conference on Functional Programming

September 4, 2014

Gothenburg, Sweden

Acceptance Rates

FHPC '14 Paper Acceptance Rate 10 of 11 submissions, 91%;

Overall Acceptance Rate 18 of 25 submissions, 72%
