Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3173162.3173182acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections
research-article

Automatic Matching of Legacy Code to Heterogeneous APIs: An Idiomatic Approach

Published: 19 March 2018 Publication History

Abstract

Heterogeneous accelerators often disappoint. They provide the prospect of great performance, but only deliver it when using vendor specific optimized libraries or domain specific languages. This requires considerable legacy code modifications, hindering the adoption of heterogeneous computing. This paper develops a novel approach to automatically detect opportunities for accelerator exploitation. We focus on calculations that are well supported by established APIs: sparse and dense linear algebra, stencil codes and generalized reductions and histograms. We call them idioms and use a custom constraint-based Idiom Description Language (IDL) to discover them within user code. Detected idioms are then mapped to BLAS libraries, cuSPARSE and clSPARSE and two DSLs: Halide and Lift. We implemented the approach in LLVM and evaluated it on the NAS and Parboil sequential C/C++ benchmarks, where we detect 60 idiom instances. In those cases where idioms are a significant part of the sequential execution time, we generate code that achieves 1.26x to over 20x speedup on integrated and external GPUs.

References

[1]
Martin S. Alnæs, Anders Logg, Kristian B. Olgaard, Marie E. Rognes, and Garth N. Wells. Unified form language: A domain-specific language for weak formulations of partial differential equations. ACM Trans. Math. Softw., 40(2):9:1--9:37, March 2014.
[2]
AMD. clBLAS. https://github.com/clMathLibraries/clBLAS, 2013.
[3]
José M. Andión. Compilation Techniques for Automatic Extraction of Parallelism and Locality in Heterogeneous Architectures. PhD thesis, University of A Coru na, 2015.
[4]
Riyadh Baghdadi, Albert Cohen, Tobias Grosser, Sven Verdoolaege, Anton Lokhmotov, Javed Absar, Sven Van Haastregt, Alexey Kravets, and Alastair Donaldson. PENCIL Language Specification. Research Report RR-8706, INRIA, May 2015.
[5]
Muthu Manikandan Baskaran, J. Ramanujam, and P. Sadayappan. Automatic C-to-CUDA code generation for affine programs. In Proceedings of the 19th Joint European Conference on Theory and Practice of Software, International Conference on Compiler Construction, CC'10/ETAPS'10, pages 244--263, Berlin, Heidelberg, 2010. Springer-Verlag.
[6]
Mohamed-Walid Benabderrahmane, Louis-Noël Pouchet, Albert Cohen, and Cédric Bastoul. The polyhedral model is more widely applicable than you think. In International Conference on Compiler Construction, pages 283--303. Springer, 2010.
[7]
Kevin J. Brown, Arvind K. Sujeeth, Hyouk Joong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. A heterogeneous parallel framework for domain-specific languages. In Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT '11, pages 89--100, Washington, DC, USA, 2011. IEEE Computer Society.
[8]
Bryan Catanzaro, Michael Garland, and Kurt Keutzer. Copperhead: Compiling an embedded data parallel language. Technical Report UCB/EECS-2010--124, EECS Department, University of California, Berkeley, Sep 2010.
[9]
Hassan Chafi, Arvind K. Sujeeth, Kevin J. Brown, HyoukJoong Lee, Anand R. Atreya, and Kunle Olukotun. A domain-specific approach to heterogeneous parallelism. SIGPLAN Not., 46(8):35--46, February 2011.
[10]
Manuel M.T. Chakravarty, Gabriele Keller, Sean Lee, Trevor L. McDonell, and Vinod Grover. Accelerating haskell array codes with multicore GPUs. In Proceedings of the Sixth Workshop on Declarative Aspects of Multicore Programming, DAMP '11, pages 3--14, New York, NY, USA, 2011. ACM.
[11]
Lam Chi-Chung, P Sadayappan, and Rephael Wenger. On optimizing a class of multi-dimensional loops with reduction for parallel execution. Parallel Processing Letters, 7(02):157--168, 1997.
[12]
Alexander Collins, Dominik Grewe, Vinod Grover, Sean Lee, and Adriana Susnea. Nova: A functional language for data parallelism. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY'14, pages 8:8--8:13, New York, NY, USA, 2014. ACM.
[13]
Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. Tools and Algorithms for the Construction and Analysis of Systems, pages 337--340, 2008.
[14]
Johannes Doerfert, Kevin Streit, Sebastian Hack, and Zino Benaissa. Polly's polyhedral scheduling in the presence of reductions. CoRR, abs/1505.07716, 2015.
[15]
Allan L. Fisher and Anwar M. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, PLDI '94, pages 135--146, New York, NY, USA, 1994. ACM.
[16]
Franz Franchetti, Frédéric de Mesmay, Daniel McFarlin, and Markus Püschel. Operator language: A program generation framework for fast kernels. In IFIP TC 2 Working Conference on Domain-Specific Languages. Springer, 2009.
[17]
Philip Ginsbach and Michael F. P. O'Boyle. Discovery and exploitation of general reductions: a constraint based approach. In CGO, pages 269--280. ACM, 2017.
[18]
Gautam Gupta and Sanjay V Rajopadhye. Simplifying reductions. In POPL, volume 6, pages 30--41, 2006.
[19]
E. Gutiérrez, O. Plata, and E. L. Zapata. A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors. In Proceedings of the 14th International Conference on Supercomputing, ICS '00, pages 78--87, New York, NY, USA, 2000. ACM.
[20]
Eladio Gutiérrez, O Plata, and Emilio L Zapata. Optimization techniques for parallel irregular reductions. Journal of systems architecture, 49(3):63--74, 2003.
[21]
Eladio Gutiérrez, Oscar Plata, and Emilio L Zapata. An analytical model of locality-based parallel irregular reductions. Parallel Computing, 34(3):133--157, 2008.
[22]
Bastian Hagedorn, Larisa Stoltzfus, Michel Steuwer, Sergei Gorlatch, and Christophe Dubach. High performance stencil code generation with Lift. In CGO. ACM, 2018.
[23]
Intel. Math Kernel Library.
[24]
Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P. Johnson, Stephen R. Beard, and David I. August. Automatic CPU-GPU communication management and optimization. In PLDI, 2011.
[25]
Richard Johnson, David Pearson, and Keshav Pingali. The program structure tree: Computing control regions in linear time. In Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, PLDI '94, pages 171--185, New York, NY, USA, 1994. ACM.
[26]
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA 2017, Toronto, ON, Canada, June 24--28, 2017, pages 1--12, 2017.
[27]
Pierre Jouvelot and Babak Dehbonei. A unified semantic approach for the vectorization and parallelization of generalized reductions. In Proceedings of the 3rd international conference on Supercomputing, pages 186--194. ACM, 1989.
[28]
Shoaib Kamil, Alvin Cheung, Shachar Itzhaky, and Armando Solar-Lezama. Verified lifting of stencil computations. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '16, pages 711--726, New York, NY, USA, 2016. ACM.
[29]
Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Code Generation and Optimization, 2004. CGO 2004. International Symposium on, pages 75--86. IEEE, 2004.
[30]
Seyong Lee, Seung-Jai Min, and Rudolf Eigenmann. OpenMP to GPGPU: A compiler framework for automatic translation and optimization. SIGPLAN Not., 44(4):101--110, February 2009.
[31]
Trevor L. McDonell, Manuel M.T. Chakravarty, Gabriele Keller, and Ben Lippmeier. Optimising purely functional GPU programs. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, ICFP '13, pages 49--60, New York, NY, USA, 2013. ACM.
[32]
Charith Mendis, Jeffrey Bosboom, Kevin Wu, Shoaib Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, and Saman Amarasinghe. Helium: Lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '15, pages 391--402, New York, NY, USA, 2015. ACM.
[33]
Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula. Polymage: Automatic optimization for image processing pipelines. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, pages 429--443, New York, NY, USA, 2015. ACM.
[34]
Saurav Muralidharan, Amit Roy, Mary W. Hall, Michael Garland, and Piyush Rai. Architecture-adaptive code variant tuning. In ASPLOS, pages 325--338. ACM, 2016.
[35]
Flemming Nielson, Hanne R Nielson, and Chris Hankin. Principles of program analysis. Springer, 2015.
[36]
Nvidia. cuBLAS.
[37]
Nvidia. Nvidia OpenCL Best Practices Guide, 2011.
[38]
Georg Ofenbeck, Tiark Rompf, Alen Stojanov, Martin Odersky, and Markus Püschel. SPIRAL in Scala: Towards the systematic construction of generators for performance libraries. In International Conference on Generative Programming: Concepts & Experiences, GPCE, 2013.
[39]
Phitchaya Mangpo Phothilimthana, Jason Ansel, Jonathan Ragan-Kelley, and Saman P. Amarasinghe. Portable performance on heterogeneous architectures. In ASPLOS, pages 431--444. ACM, 2013.
[40]
Shlomit S. Pinter and Ron Y. Pinter. Program optimization and parallelization using idioms. ACM Trans. Program. Lang. Syst., 16(3):305--327, May 1994.
[41]
Bill Pottenger and Rudolf Eigenmann. Idiom recognition in the polaris parallelizing compiler. In Proceedings of the 9th international conference on Supercomputing, pages 444--448. ACM, 1995.
[42]
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '13, pages 519--530, New York, NY, USA, 2013. ACM.
[43]
Vignesh T Ravi, Wenjing Ma, David Chiu, and Gagan Agrawal. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. In Proceedings of the 24th ACM international conference on supercomputing, pages 137--146. ACM, 2010.
[44]
Chandan Reddy, Michael Kruse, and Albert Cohen. Reduction drawing: Language constructs and polyhedral compilation for reductions on GPU. In PACT, pages 87--97. ACM, 2016.
[45]
Xavier Redon and Paul Feautrier. Scheduling reductions. In Proceedings of the 8th international conference on Supercomputing, pages 117--125. ACM, 1994.
[46]
Tiark Rompf and Martin Odersky. Lightweight modular staging: A pragmatic approach to runtime code generation and compiled DSLs. Commun. ACM, 55(6), 2012.
[47]
Michel Steuwer, Christian Fensch, Sam Lindley, and Christophe Dubach. Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code. In ICFP, pages 205--217. ACM, 2015.
[48]
Michel Steuwer, Toomas Remmelg, and Christophe Dubach. Lift: a functional data-parallel IR for high-performance GPU code generation. In CGO, pages 74--85. ACM, 2017.
[49]
Kevin Stock, Martin Kong, Tobias Grosser, Louis-Noël Pouchet, Fabrice Rastello, J. Ramanujam, and P. Sadayappan. A framework for enhancing data reuse via associative reordering. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 65--76, New York, NY, USA, 2014. ACM.
[50]
Toshio Suganuma, Hideaki Komatsu, and Toshio Nakatani. Detection and global optimization of reduction operations for distributed parallel machines. In Proceedings of the 10th international conference on Supercomputing, pages 18--25. ACM, 1996.
[51]
Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, and Francky Catthoor. Polyhedral parallel code generation for CUDA. ACM TACO, 9(4), 2013.
[52]
Huo X., Ravi V., and Agrawal G. Porting irregular reductions on heterogeneous CPU-GPU configurations. In Proceedings of the 18th IEEE International Conference on High Performance Computing, 2011.
[53]
Hao Yu and Lawrence Rauchwerger. An adaptive algorithm selection framework for reduction parallelization. IEEE Transactions on Parallel and Distributed Systems, 17(10):1084--1096, 2006.

Cited By

View all
  • (2024)SpEQ: Translation of Sparse Codes using EquivalencesProceedings of the ACM on Programming Languages10.1145/36564458:PLDI(1680-1703)Online publication date: 20-Jun-2024
  • (2024)Latent Idiom Recognition for a Minimalist Functional Array Language Using Equality SaturationProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444879(270-282)Online publication date: 2-Mar-2024
  • (2023)C2TACO: Lifting Tensor Code to TACOProceedings of the 22nd ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences10.1145/3624007.3624053(42-56)Online publication date: 22-Oct-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN:9781450349116
DOI:10.1145/3173162
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 53, Issue 2
    ASPLOS '18
    February 2018
    809 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3296957
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 March 2018

Permissions

Request permissions for this article.

Check for updates

Author Tag

  1. computer systems organization

Qualifiers

  • Research-article

Funding Sources

  • EPSRC

Conference

ASPLOS '18

Acceptance Rates

ASPLOS '18 Paper Acceptance Rate 56 of 319 submissions, 18%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)42
  • Downloads (Last 6 weeks)7
Reflects downloads up to 12 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)SpEQ: Translation of Sparse Codes using EquivalencesProceedings of the ACM on Programming Languages10.1145/36564458:PLDI(1680-1703)Online publication date: 20-Jun-2024
  • (2024)Latent Idiom Recognition for a Minimalist Functional Array Language Using Equality SaturationProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444879(270-282)Online publication date: 2-Mar-2024
  • (2023)C2TACO: Lifting Tensor Code to TACOProceedings of the 22nd ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences10.1145/3624007.3624053(42-56)Online publication date: 22-Oct-2023
  • (2023)A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX CodeProceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction10.1145/3578360.3580253(110-121)Online publication date: 17-Feb-2023
  • (2023)Source Matching and Rewriting for MLIR Using String-Based AutomataACM Transactions on Architecture and Code Optimization10.1145/357128320:2(1-26)Online publication date: 1-Mar-2023
  • (2023)mlirSynth: Automatic, Retargetable Program Raising in Multi-Level IR using Program SynthesisProceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT58117.2023.00012(39-50)Online publication date: 21-Oct-2023
  • (2022)Parallelizing Neural Network Models Effectively on GPU by Implementing Reductions AtomicallyProceedings of the International Conference on Parallel Architectures and Compilation Techniques10.1145/3559009.3569656(451-466)Online publication date: 8-Oct-2022
  • (2022)Bind the gap: compiling real software to hardware FFT acceleratorsProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523439(687-702)Online publication date: 9-Jun-2022
  • (2022)Simplified High Level Parallelism Expression on Heterogeneous Systems through Data Partition Pattern DescriptionThe Computer Journal10.1093/comjnl/bxac01766:6(1400-1418)Online publication date: 14-Mar-2022
  • (2022)Managing the Root Causes of “Internal API Hell”: An Experience ReportProduct-Focused Software Process Improvement10.1007/978-3-031-21388-5_2(21-36)Online publication date: 14-Nov-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media