research-article
Open access

Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies

Published: 03 August 2020
  • Abstract

    Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages, such as C or OpenCL, force the programmer to intertwine the code describing functionality with the code describing optimizations. This results in a portability nightmare that is particularly problematic given the accelerating trend towards specialized hardware devices to further increase efficiency.
    Many emerging DSLs used in performance-demanding domains such as deep learning or high-performance image processing attempt to simplify or even fully automate the optimization process. Using a high-level, often functional, language, programmers focus on describing functionality in a declarative way. In some systems such as Halide or TVM, a separate schedule specifies how the program should be optimized. Unfortunately, these schedules are not written in well-defined programming languages. Instead, they are implemented as a set of ad-hoc predefined APIs that the compiler writers have exposed.
    In this functional pearl, we show how to employ functional programming techniques to solve this challenge with elegance. We present two functional languages that work together, each addressing a separate concern. RISE is a functional language for expressing computations using well-known functional data-parallel patterns. ELEVATE is a functional language for describing optimization strategies. A high-level RISE program is transformed into a low-level form using optimization strategies written in ELEVATE. From the rewritten low-level program, high-performance parallel code is automatically generated. In contrast to existing high-performance domain-specific systems with scheduling APIs, in our approach programmers are not restricted to a set of built-in operations and optimizations but freely define their own computational patterns in RISE and optimization strategies in ELEVATE in a composable and reusable way. We show how our holistic functional approach achieves competitive performance with the state-of-the-art imperative systems Halide and TVM.
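
The idea of expressing optimizations as composable rewrite strategies can be illustrated with a minimal sketch. It is written in Python for brevity and is not the RISE or ELEVATE implementation: the expression classes (`Var`, `Compose`, `Map`), the rule `map_fusion`, and the combinators `seq`, `left_choice`, `attempt`, and `repeat` are hypothetical names chosen for this illustration. The sketch assumes a strategy is a function that either returns a rewritten expression or returns `None` to signal failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


# A toy expression language standing in for a data-parallel functional
# language (hypothetical; far smaller than a real language such as RISE).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Compose:  # function composition: f . g
    f: "Expr"
    g: "Expr"


@dataclass(frozen=True)
class Map:  # map(f, xs): apply f to every element of xs
    f: "Expr"
    xs: "Expr"


Expr = Union[Var, Compose, Map]

# Assumption of this sketch: a strategy maps an expression to a rewritten
# expression, or to None when it does not apply.
Strategy = Callable[[Expr], Optional[Expr]]


def map_fusion(e: Expr) -> Optional[Expr]:
    """Rewrite rule: map(f, map(g, xs)) -> map(f . g, xs)."""
    if isinstance(e, Map) and isinstance(e.xs, Map):
        return Map(Compose(e.f, e.xs.f), e.xs.xs)
    return None


def identity(e: Expr) -> Optional[Expr]:
    """Strategy that always succeeds and changes nothing."""
    return e


def seq(first: Strategy, second: Strategy) -> Strategy:
    """Apply `first`, then `second`; fail if either one fails."""
    def run(e: Expr) -> Optional[Expr]:
        r = first(e)
        return None if r is None else second(r)
    return run


def left_choice(first: Strategy, second: Strategy) -> Strategy:
    """Apply `first`; only if it fails, apply `second` to the input."""
    def run(e: Expr) -> Optional[Expr]:
        r = first(e)
        return r if r is not None else second(e)
    return run


def attempt(s: Strategy) -> Strategy:
    """Apply `s` if possible, otherwise leave the expression unchanged."""
    return left_choice(s, identity)


def repeat(s: Strategy) -> Strategy:
    """Apply `s` until it fails; terminates only if `s` eventually fails."""
    def run(e: Expr) -> Optional[Expr]:
        while True:
            r = s(e)
            if r is None:
                return e
            e = r
    return run


if __name__ == "__main__":
    # map(f, map(g, map(h, xs))) is fused into a single map.
    expr = Map(Var("f"), Map(Var("g"), Map(Var("h"), Var("xs"))))
    print(repeat(map_fusion)(expr))
```

Because rules and combinators are ordinary functions, a new optimization is defined by composing existing ones rather than by extending a fixed scheduling API. This sketch applies rules only at the root of an expression; a fuller strategy language would also provide traversals that apply a strategy inside subexpressions.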

    Supplementary Material

    Presentation at ICFP '20 (a92-hagedorn-presentation.mp4)

          Published In

          Proceedings of the ACM on Programming Languages  Volume 4, Issue ICFP
          August 2020
          1070 pages
          EISSN:2475-1421
          DOI:10.1145/3415018
          This work is licensed under a Creative Commons Attribution International 4.0 License.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. ELEVATE
          2. Optimization Strategies
          3. Rewrite Rules
          4. Strategy Languages

          Qualifiers

          • Research-article

          Article Metrics

          • Downloads (last 12 months): 822
          • Downloads (last 6 weeks): 67
          Reflects downloads up to 09 Aug 2024

          Cited By

          • (2024) (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms. ACM Transactions on Programming Languages and Systems. DOI: 10.1145/3665643. Online publication date: 22-May-2024
          • (2024) SpEQ: Translation of Sparse Codes using Equivalences. Proceedings of the ACM on Programming Languages 8:PLDI (1680-1703). DOI: 10.1145/3656445. Online publication date: 20-Jun-2024
          • (2024) Descend: A Safe GPU Systems Programming Language. Proceedings of the ACM on Programming Languages 8:PLDI (841-864). DOI: 10.1145/3656411. Online publication date: 20-Jun-2024
          • (2024) Interactive Source-to-Source Optimizations Validated using Static Resource Analysis. Proceedings of the 13th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (26-34). DOI: 10.1145/3652588.3663320. Online publication date: 20-Jun-2024
          • (2024) Shoggoth: A Formal Foundation for Strategic Rewriting. Proceedings of the ACM on Programming Languages 8:POPL (61-89). DOI: 10.1145/3633211. Online publication date: 5-Jan-2024
          • (2024) Guided Equality Saturation. Proceedings of the ACM on Programming Languages 8:POPL (1727-1758). DOI: 10.1145/3632900. Online publication date: 5-Jan-2024
          • (2023) Graph IRs for Impure Higher-Order Languages: Making Aggressive Optimizations Affordable with Precise Effect Dependencies. Proceedings of the ACM on Programming Languages 7:OOPSLA2 (400-430). DOI: 10.1145/3622813. Online publication date: 16-Oct-2023
          • (2023) Using Rewrite Strategies for Efficient Functional Automatic Differentiation. Proceedings of the 25th ACM International Workshop on Formal Techniques for Java-like Programs (51-57). DOI: 10.1145/3605156.3606456. Online publication date: 18-Jul-2023
          • (2023) (De/Re)-Compositions Expressed Systematically via MDH-Based Schedules. Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction (61-72). DOI: 10.1145/3578360.3580269. Online publication date: 17-Feb-2023
          • (2023) Structured Operations: Modular Design of Code Generators for Tensor Compilers. Languages and Compilers for Parallel Computing (141-156). DOI: 10.1007/978-3-031-31445-2_10. Online publication date: 10-May-2023
