research-article
Open access

An Effective Fusion and Tile Size Model for PolyMage

Published: 08 November 2020

    Abstract

    Effective models for fusing loop nests remain a challenge in both general-purpose and domain-specific language (DSL) compilers. The difficulty often arises from the combinatorial explosion of grouping choices and their interaction with parallelism and locality. This article presents a new fusion algorithm for high-performance domain-specific compilers for image processing pipelines. The algorithm is driven by dynamic programming and explores spaces of fusion possibilities not covered by previous approaches. It also relies on a cost function that captures the optimization criteria more concretely and precisely than prior approaches. The fusion model is tailored to the transformation and optimization sequence applied by PolyMage and Halide, two recent DSLs for image processing pipelines. When implemented in PolyMage, our model-driven technique provides significant improvements on two state-of-the-art shared-memory multicore architectures: up to 4.32× over PolyMage’s own approach (which uses auto-tuning to aid its model) and up to 2.46× over Halide’s automatic approach.
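To make the idea of dynamic-programming-driven fusion concrete, here is a minimal sketch in Python. It is not the article's algorithm or cost model: it assumes a pipeline that is a simple linear chain of stages (real image processing pipelines are DAGs), it only considers contiguous groups, and the names `best_grouping` and `toy_cost`, the stage names, and the cost constants are all hypothetical and chosen purely for illustration.

```python
from functools import lru_cache


def best_grouping(stages, group_cost):
    """Split a linear chain of stages into contiguous fused groups so that
    the sum of per-group costs is minimal (DP over split points)."""
    n = len(stages)

    @lru_cache(maxsize=None)
    def solve(i):
        # Best (total cost, grouping) for the suffix stages[i:].
        if i == n:
            return 0.0, ()
        best = None
        for j in range(i + 1, n + 1):
            head = tuple(stages[i:j])  # candidate fused group
            tail_cost, tail_groups = solve(j)
            candidate = (group_cost(head) + tail_cost, (head,) + tail_groups)
            if best is None or candidate[0] < best[0]:
                best = candidate
        return best

    cost, groups = solve(0)
    return cost, [list(g) for g in groups]


def toy_cost(group):
    # Hypothetical cost, NOT the article's model: every group pays a fixed
    # charge of 10 for writing its output to memory, plus a redundant
    # computation penalty that grows quadratically with the group size.
    return 10.0 + 2.0 * (len(group) - 1) ** 2


if __name__ == "__main__":
    pipeline = ["blur_x", "blur_y", "sharpen", "mask"]  # hypothetical stages
    print(best_grouping(pipeline, toy_cost))
```

With this toy cost, fusing everything (cost 28) and fusing nothing (cost 40) both lose to two groups of two stages (cost 24), which illustrates the trade-off between locality and redundant computation that a fusion cost function has to capture. Memoizing on the suffix index keeps the search quadratic in the number of stages, even though the number of possible groupings grows exponentially.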


    Cited By

    • (2024) Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations. ACM Transactions on Computer Systems 41, 1-4 (1-45). DOI: 10.1145/3635305. Online publication date: 15-Jan-2024

      Information & Contributors

      Information

      Published In

      ACM Transactions on Programming Languages and Systems  Volume 42, Issue 3
      September 2020
      230 pages
      ISSN:0164-0925
      EISSN:1558-4593
      DOI:10.1145/3430314
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 November 2020
      Accepted: 01 June 2020
      Revised: 01 March 2020
      Received: 01 December 2018
      Published in TOPLAS Volume 42, Issue 3

      Author Tags

      1. Fusion
      2. image processing pipelines
      3. locality
      4. parallelism
      5. tiling

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Science and Engineering Research Board (SERB)
