
Loop-Oriented Pointer Analysis for Automatic SIMD Vectorization

Published: 30 January 2018

Abstract

Compiler-based vectorization is a promising way to automatically generate code that makes efficient use of modern CPUs with SIMD extensions. The two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), both require precise dependence analysis on arrays and structs: SLP needs it to vectorize isomorphic scalar instructions, and LLV needs it to reduce dynamic dependence checks at runtime.
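To make the SLP dependence requirement concrete, here is a minimal C sketch. It compares four isomorphic scalar statements with the load-all-then-store-all order that a packed SIMD instruction imposes. The function names `scale4_scalar` and `scale4_packed` are hypothetical, and the packed version only emulates the vector lanes with a temporary array rather than using real SIMD intrinsics.

```c
#include <assert.h>
#include <stddef.h>

/* Four isomorphic statements in one basic block: an SLP candidate. */
void scale4_scalar(float *dst, const float *src) {
    dst[0] = src[0] * 2.0f;
    dst[1] = src[1] * 2.0f;
    dst[2] = src[2] * 2.0f;
    dst[3] = src[3] * 2.0f;
}

/* Emulation of the packed form: all four lanes are loaded and multiplied
   before any lane is stored. This matches scale4_scalar only when the
   stores to dst cannot affect the loads from src. */
void scale4_packed(float *dst, const float *src) {
    assert(dst != NULL && src != NULL);
    float lanes[4];
    for (size_t k = 0; k < 4; k++) lanes[k] = src[k] * 2.0f; /* "vector" load + multiply */
    for (size_t k = 0; k < 4; k++) dst[k] = lanes[k];        /* "vector" store */
}
```

With disjoint buffers both functions produce the same result. With `float buf[5] = {1, 2, 3, 4, 5}`, `dst = buf + 1` and `src = buf`, the scalar version leaves `{1, 2, 4, 8, 16}` while the packed version leaves `{1, 2, 4, 6, 8}`, so packing is legal only when alias analysis shows the accesses are independent.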
The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (using field-insensitive models, which are too imprecise in handling arrays and structs). This article proposes an inter-procedural Loop-oriented Pointer Analysis for C, called Lpa, that analyzes arrays and structs precisely enough to support aggressive SLP and LLV optimizations. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a lazy memory model that generates access-based location sets from how structs and arrays are actually accessed. Lpa can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. Because location set generation is separated from the rest of the pointer analysis as an independent concern, existing points-to resolution algorithms (e.g., flow-insensitive and flow-sensitive pointer analysis) can easily be reused.
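Access-based location sets can be pictured roughly as follows. This minimal sketch assumes that a location set is an arithmetic progression of offsets `{offset + i*stride | i >= 0}` within one abstract object, in the spirit of Wilson and Lam's offset/stride location sets. Lpa's actual representation may differ, and the names `LocSet`, `Object`, `loc_for_access` and `may_overlap` are hypothetical. A set is created only when an access with that offset and stride is first seen (the lazy part), and two sets of the same object may alias only if their progressions intersect. For simplicity, access widths are ignored: every access is assumed to touch exactly one abstract cell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical location set: offsets {offset + i*stride | i >= 0};
   stride == 0 denotes a single scalar field. */
typedef struct { long offset; long stride; } LocSet;

#define MAX_SETS 64
/* One abstract memory object with its lazily created location sets. */
typedef struct { LocSet sets[MAX_SETS]; int n; } Object;

/* Return the index of the location set for this access pattern,
   creating it on first use instead of pre-allocating per field. */
int loc_for_access(Object *obj, long offset, long stride) {
    for (int k = 0; k < obj->n; k++)
        if (obj->sets[k].offset == offset && obj->sets[k].stride == stride)
            return k;
    assert(obj->n < MAX_SETS);
    obj->sets[obj->n] = (LocSet){offset, stride};
    return obj->n++;
}

static long gcd_long(long a, long b) {
    a = labs(a);
    b = labs(b);
    while (b != 0) { long t = a % b; a = b; b = t; }
    return a;
}

/* Is the single offset x a member of the progression s? */
static bool contains(LocSet s, long x) {
    if (s.stride == 0) return x == s.offset;
    return x >= s.offset && (x - s.offset) % s.stride == 0;
}

/* May two location sets of the same object share an offset? */
bool may_overlap(LocSet a, LocSet b) {
    assert(a.stride >= 0 && b.stride >= 0);
    if (a.stride == 0) return contains(b, a.offset);
    if (b.stride == 0) return contains(a, b.offset);
    /* Two unbounded progressions with positive strides meet iff
       gcd(a.stride, b.stride) divides the offset difference. */
    return (b.offset - a.offset) % gcd_long(a.stride, b.stride) == 0;
}
```

For example, an array of 8-byte structs accessed at field offsets 0 and 4 yields the sets `{0, 8, 16, ...}` and `{4, 12, 20, ...}`, which never meet, so `may_overlap` reports the two accesses as independent.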
We have implemented Lpa fully in the LLVM compiler infrastructure (version 3.8.0). We evaluate Lpa with the two classic vectorization techniques, SLP and LLV, on a set of 20 C and Fortran SPEC CPU2000/2006 benchmarks. For SLP, Lpa outperforms LLVM’s BasicAA and ScevAA by discovering 139 and 273 more vectorizable basic blocks, respectively, resulting in the best speedup of 2.95% for 173.applu. For LLV, LLVM introduces a total of 551 and 652 static bound checks under BasicAA and ScevAA, respectively. In contrast, Lpa reduces these static checks to 220, an average of 15.7 checks per benchmark, resulting in the best speedup of 7.23% for 177.mesa.
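The bound checks counted for LLV come from loop versioning: when alias analysis cannot prove that a loop's source and destination ranges are disjoint, the vectorizer guards the vector loop with a runtime overlap test and keeps a scalar fallback. The following is a hand-written C approximation of that pattern, not LLVM's generated code. The names `ranges_disjoint` and `add_one` are hypothetical, the 4-wide body stands in for real SIMD instructions, and the `took_vector_path` out-parameter exists only to show which path ran. When the pointer analysis proves the ranges disjoint at compile time, the check and the fallback become unnecessary, which is, as we read it, what the reduced check counts above reflect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Runtime overlap test of [dst, dst+n) and [src, src+n). Pointers are
   converted to uintptr_t because relational comparison of pointers into
   different objects is undefined behavior in C. */
static bool ranges_disjoint(const float *dst, const float *src, size_t n) {
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    size_t bytes = n * sizeof(float);
    return d + bytes <= s || s + bytes <= d;
}

/* dst[i] = src[i] + 1.0f, versioned on the runtime check. */
void add_one(float *dst, const float *src, size_t n, int *took_vector_path) {
    assert(dst != NULL && src != NULL && took_vector_path != NULL);
    size_t i = 0;
    if (ranges_disjoint(dst, src, n)) {
        *took_vector_path = 1;
        /* Stand-in for a 4-wide SIMD body: load four lanes, then store four. */
        for (; i + 4 <= n; i += 4) {
            float t0 = src[i], t1 = src[i + 1], t2 = src[i + 2], t3 = src[i + 3];
            dst[i] = t0 + 1.0f;
            dst[i + 1] = t1 + 1.0f;
            dst[i + 2] = t2 + 1.0f;
            dst[i + 3] = t3 + 1.0f;
        }
    } else {
        *took_vector_path = 0;
    }
    /* Scalar remainder, or the whole loop when the ranges may overlap. */
    for (; i < n; i++) dst[i] = src[i] + 1.0f;
}
```

The check is deliberately conservative: an exact in-place call (`dst == src`) is safe to vectorize but still takes the scalar path here.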

Cited By

  • (2022) Path-sensitive and alias-aware typestate analysis for detecting OS bugs. Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 859-872. DOI: 10.1145/3503222.3507770. Online publication date: 28-Feb-2022.
  • (2020) Duplo: a framework for OCaml post-link optimisation. Proceedings of the ACM on Programming Languages 4:ICFP, 1-29. DOI: 10.1145/3408980. Online publication date: 3-Aug-2020.
  • (2018) The computer for the 21st century: present security & privacy challenges. Journal of Internet Services and Applications 9:1. DOI: 10.1186/s13174-018-0095-2. Online publication date: 4-Dec-2018.

Information

Published In

ACM Transactions on Embedded Computing Systems, Volume 17, Issue 2
Special Issue on MEMCODE 2015 and Regular Papers (Diamonds)
March 2018
640 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3160927
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 January 2018
Accepted: 01 November 2017
Revised: 01 August 2017
Received: 01 December 2016
Published in TECS Volume 17, Issue 2

Author Tags

  1. Pointer analysis
  2. SIMD
  3. compiler optimisation
  4. loop-oriented

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • ARC

Article Metrics

  • Downloads (last 12 months): 26
  • Downloads (last 6 weeks): 1
Reflects downloads up to 30 Aug 2024