research-article
Open access

SpEQ: Translation of Sparse Codes using Equivalences

Published: 20 June 2024

Abstract

We present SpEQ, a quick and correct strategy for detecting semantics in sparse codes and enabling automatic translation to high-performance library calls or domain-specific languages (DSLs). When sparse linear algebra codes contain implicit preconditions about how data is stored that hamper direct translation, SpEQ identifies the high-level computation along with storage details and related preconditions. A run-time check guards the translation and ensures that the required preconditions are met. We implement SpEQ using the LLVM framework, the Z3 solver, and the egglog library, and correctly translate sparse linear algebra codes into two high-performance libraries, NVIDIA cuSPARSE and Intel MKL, and into OpenMP (OMP). We evaluate SpEQ on ten diverse benchmarks against two state-of-the-art translation tools. SpEQ achieves geometric mean speedups of 3.25×, 5.09×, and 8.04× on the OpenMP, MKL, and cuSPARSE backends, respectively. SpEQ is the only tool that can guarantee the correct translation of sparse computations.
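The guarded translation described in the abstract can be illustrated with a minimal Python sketch. It is not SpEQ's implementation (SpEQ operates on LLVM IR and relies on Z3 and egglog to establish equivalence); it only shows the run-time shape of a guarded translation for a CSR sparse matrix-vector product. All names (`spmv_loop`, `csr_preconditions_hold`, `library_spmv`, `guarded_spmv`) and the particular set of storage checks are hypothetical, and `library_spmv` is a plain-Python stand-in for an optimized call such as a cuSPARSE or MKL SpMV routine, not a binding to either library.

```python
from typing import List, Sequence, Tuple


def spmv_loop(n_rows: int, row_ptr: Sequence[int], col_idx: Sequence[int],
              vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Hand-written CSR product y = A @ x, the kind of source loop a lifter scans.

    Nothing in the loop states that row_ptr starts at 0, is non-decreasing,
    or that column indices are in bounds: those are implicit preconditions.
    """
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def csr_preconditions_hold(n_rows: int, n_cols: int, row_ptr: Sequence[int],
                           col_idx: Sequence[int], vals: Sequence[float]) -> bool:
    """Hypothetical run-time guard for zero-based CSR storage."""
    if len(row_ptr) != n_rows + 1 or row_ptr[0] != 0:
        return False
    # Row pointers must be non-decreasing.
    if any(row_ptr[i] > row_ptr[i + 1] for i in range(n_rows)):
        return False
    nnz = row_ptr[n_rows]
    if len(col_idx) != nnz or len(vals) != nnz:
        return False
    # Every stored column index must address a valid entry of x.
    return all(0 <= c < n_cols for c in col_idx)


def library_spmv(n_rows: int, row_ptr: Sequence[int], col_idx: Sequence[int],
                 vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Stand-in for an optimized library kernel (NOT a real cuSPARSE/MKL binding)."""
    return [
        sum((vals[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1])), 0.0)
        for i in range(n_rows)
    ]


def guarded_spmv(n_rows: int, n_cols: int, row_ptr: Sequence[int],
                 col_idx: Sequence[int], vals: Sequence[float],
                 x: Sequence[float]) -> Tuple[List[float], bool]:
    """Use the library path only if the guard passes; else run the original loop.

    Returns the result and whether the library path was taken.
    """
    if csr_preconditions_hold(n_rows, n_cols, row_ptr, col_idx, vals):
        return library_spmv(n_rows, row_ptr, col_idx, vals, x), True
    return spmv_loop(n_rows, row_ptr, col_idx, vals, x), False


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 0, 0], [3, 4, 0]] in zero-based CSR form.
    y, used_lib = guarded_spmv(3, 3, [0, 2, 2, 4], [0, 2, 0, 1],
                               [1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
    print(y, used_lib)  # [3.0, 0.0, 7.0] True

    # Same matrix stored after a one-slot padding entry: the guard rejects it.
    y2, used_lib2 = guarded_spmv(3, 3, [1, 3, 3, 5], [0, 0, 2, 0, 1],
                                 [99.0, 1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
    print(y2, used_lib2)  # [3.0, 0.0, 7.0] False
```

The hand-written loop tolerates row pointers that start at a nonzero offset, so when the guard rejects such storage the original loop still produces the correct answer, whereas a library routine configured for zero-based CSR could not be assumed to. Falling back rather than failing keeps the program correct on every input, at the cost of one check that is linear in the size of the storage arrays.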


Information

Published In

Proceedings of the ACM on Programming Languages, Volume 8, Issue PLDI
June 2024
2198 pages
EISSN: 2475-1421
DOI: 10.1145/3554317
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2024
Published in PACMPL Volume 8, Issue PLDI


Author Tags

  1. Equality Saturation
  2. Equivalence Checking
  3. Program Analysis
  4. Verification

Qualifiers

  • Research-article

Funding Sources

  • NSERC

Bibliometrics & Citations

Article Metrics

  • Total Citations: 0
  • Total Downloads: 217
  • Downloads (last 12 months): 217
  • Downloads (last 6 weeks): 82

Reflects downloads up to 04 Oct 2024.
