DOI: 10.1145/3503221.3508434
research-article
Open access

Parallel block-delayed sequences

Published: 28 March 2022

Abstract

Programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are naturally parallel. However, if implemented naively they lead to the generation of temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion" to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces.
In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique which parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: Parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it is running on.
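
The idea of parallelizing over blocks while streaming within each block can be illustrated with a small C++ sketch. It is not the paper's library or its API: the function names, the choice of pipeline (map with a square, inclusive scan, filter on even values, sum), and the block size parameter are all assumptions made for illustration. The block loops are written sequentially here; each iteration is independent, so a parallel-for from a multicore runtime could run them concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using i64 = std::int64_t;

// Mapped function shared by both versions (an arbitrary illustrative choice).
inline i64 square(i64 v) { return v * v; }

// Unfused baseline: every stage materializes a full-size temporary.
i64 unfused_pipeline(const std::vector<i64>& xs) {
    std::vector<i64> mapped(xs.size());
    std::transform(xs.begin(), xs.end(), mapped.begin(), square);  // map
    std::vector<i64> scanned(xs.size());
    i64 acc = 0;
    for (std::size_t i = 0; i < xs.size(); ++i) {  // inclusive scan
        acc += mapped[i];
        scanned[i] = acc;
    }
    std::vector<i64> kept;  // filter
    for (i64 v : scanned) {
        if (v % 2 == 0) kept.push_back(v);
    }
    i64 total = 0;  // reduce
    for (i64 v : kept) total += v;
    return total;
}

// Fused, block-structured version (hypothetical sketch, not the paper's API).
// Extra memory is proportional to the number of blocks, not to the input.
i64 fused_pipeline(const std::vector<i64>& xs, std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t n = xs.size();
    const std::size_t num_blocks = (n + block_size - 1) / block_size;

    // Pass 1: one sum of mapped values per block. Iterations are independent,
    // so a parallel-for could execute them concurrently.
    std::vector<i64> offsets(num_blocks, 0);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t lo = b * block_size;
        const std::size_t hi = std::min(n, lo + block_size);
        i64 sum = 0;
        for (std::size_t i = lo; i < hi; ++i) sum += square(xs[i]);
        offsets[b] = sum;
    }

    // Short sequential exclusive scan over the block sums.
    i64 running = 0;
    for (i64& o : offsets) {
        const i64 s = o;
        o = running;
        running += s;
    }

    // Pass 2: each block streams map -> scan -> filter -> reduce, starting
    // from its offset. Again, the blocks do not depend on each other.
    std::vector<i64> partial(num_blocks, 0);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t lo = b * block_size;
        const std::size_t hi = std::min(n, lo + block_size);
        i64 acc = offsets[b];
        i64 local = 0;
        for (std::size_t i = lo; i < hi; ++i) {
            acc += square(xs[i]);            // map fused into the scan
            if (acc % 2 == 0) local += acc;  // filter fused into the reduce
        }
        partial[b] = local;
    }

    i64 total = 0;
    for (i64 p : partial) total += p;
    return total;
}
```

For the input 1, 2, 3, 4 the prefix sums of the squares are 1, 5, 14, 30, so both functions return 44. The unfused version allocates three input-sized temporaries, while the fused sketch allocates only two arrays with one slot per block.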


Cited By

  • (2024) Representing Data Collections in an SSA Form. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, 308-321. DOI: 10.1109/CGO57630.2024.10444817. Online publication date: 2-Mar-2024
  • (2023) Evaluating Functional Memory-Managed Parallel Languages for HPC using the NAS Parallel Benchmarks. 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 413-422. DOI: 10.1109/IPDPSW59300.2023.00072. Online publication date: May-2023
  • (2022) Entanglement detection with near-zero cost. Proceedings of the ACM on Programming Languages 6, ICFP, 679-710. DOI: 10.1145/3547646. Online publication date: 31-Aug-2022

Published In

PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
April 2022
495 pages
ISBN:9781450392044
DOI:10.1145/3503221
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. collections
  2. functional programming
  3. fusion
  4. parallel programming

Conference

PPoPP '22

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
