research-article
Open access

Semiring optimizations: dynamic elision of expressions with identity and absorbing elements

Published: 13 November 2020

Abstract

This paper describes a compiler optimization that eliminates dynamic occurrences of expressions of the form a = a ⊕ b ⊗ c. The operation ⊕ must admit an identity element z, such that a ⊕ z = a. Also, z must be the absorbing element of ⊗, such that b ⊗ z = z ⊗ c = z. Semirings where ⊕ is the additive operator and ⊗ is the multiplicative operator meet this contract. This pattern is common in high-performance benchmarks, its canonical representative being the multiply-add operation a = a + b × c. However, several other expressions involving arithmetic and logic operations satisfy the required algebra. We show that the runtime elimination of such assignments can be implemented in a performance-safe way via online profiling. The elimination of dynamic redundancies involving identity and absorbing elements in 35 programs of the LLVM test suite that present semiring patterns brings an average speedup of 1.19x (total optimized time over total unoptimized time) on top of clang -O3. When projected onto the entire test suite (259 programs), the optimization leads to a speedup of 1.025x. Once added to clang, semiring optimizations bring its performance closer to that of TACO, a specialized tensor compiler.
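The idea in the abstract can be illustrated with a small hand-written sketch in C. It is not the paper's implementation, which is a compiler transformation. The function names, the sampling size of 64 elements, and the 0.5 threshold are all hypothetical choices made for illustration. The sketch assumes finite floating-point inputs, since with infinities or NaNs the product 0 × c is not 0 and the guard would change the result.

```c
#include <stddef.h>

/* Plain kernel: c[i][j] = c[i][j] + a[i][k] * b[k][j] on every iteration. */
void mm_plain(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Guarded kernel: 0 is absorbing for * and the identity of +, so when
   a[i][k] == 0 the update adds 0 to c[i][j] and can be skipped.
   Assumption: all inputs are finite. */
void mm_guarded(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            if (aik == 0.0)
                continue; /* whole inner loop elided */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
}

/* Hypothetical profiling step: inspect a prefix of `a` and report whether
   the fraction of zeros reaches `threshold`. */
int zeros_are_frequent(const double *a, size_t len, size_t sample,
                       double threshold) {
    size_t m = sample < len ? sample : len;
    size_t zeros = 0;
    for (size_t i = 0; i < m; i++)
        if (a[i] == 0.0)
            zeros++;
    return m > 0 && (double)zeros / (double)m >= threshold;
}

/* Pick the guarded kernel only when the sample suggests the extra branch
   will pay off; otherwise run the plain kernel unchanged. */
void mm_adaptive(size_t n, const double *a, const double *b, double *c) {
    if (zeros_are_frequent(a, n * n, 64, 0.5))
        mm_guarded(n, a, b, c);
    else
        mm_plain(n, a, b, c);
}
```

The check before dispatch is what keeps the guard from slowing down dense inputs: if zeros are rare, the plain loop runs without the extra comparison. A one-off sample taken before the loop is only a stand-in for the online profiling the abstract mentions.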

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p22-p-video.mp4)

Cited By

  • (2021) Reverse engineering for reduction parallelization via semiring polynomials. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 820-834. DOI: 10.1145/3453483.3454079. Online publication date: 19-Jun-2021
  • (2021) Cooperative Profile Guided Optimizations. Computer Graphics Forum 40:8, 71-83. DOI: 10.1111/cgf.14382. Online publication date: 28-Nov-2021

Published In

Proceedings of the ACM on Programming Languages  Volume 4, Issue OOPSLA
November 2020
3108 pages
EISSN:2475-1421
DOI:10.1145/3436718
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Compiler
  2. Optimization
  3. Profiling
  4. Semiring

Qualifiers

  • Research-article

Funding Sources

  • CNPq
