
Loop-Oriented Pointer Analysis for Automatic SIMD Vectorization

Authors: Yulei Sui, Xiaokang Fan, Hao Zhou, Jingling Xue

ACM Transactions on Embedded Computing Systems (TECS), Volume 17, Issue 2

Article No.: 56, Pages 1 - 31

https://doi.org/10.1145/3168364

Published: 30 January 2018


Abstract

Compiler-based vectorization represents a promising solution to automatically generate code that makes efficient use of modern CPUs with SIMD extensions. Two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs to vectorize isomorphic scalar instructions (in the case of SLP) and reduce dynamic dependence checks at runtime (in the case of LLV).
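To make the two cases concrete, here is a minimal C sketch (not taken from the article; all function and type names are hypothetical). `add4` contains four isomorphic scalar statements of the kind SLP tries to pack into one vector operation, which is only safe if `out` is known not to overlap `a` or `b`. `scale_add_guarded` shows, at source level, the kind of runtime overlap check a loop vectorizer typically places in front of a vectorized loop when static analysis cannot prove that `dst` and `src` are disjoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SLP candidate: four isomorphic scalar statements on adjacent fields.
 * Packing them into one vector add is only safe if `out` is known not
 * to overlap `a` or `b`. */
struct vec4 { float x, y, z, w; };

void add4(struct vec4 *out, const struct vec4 *a, const struct vec4 *b) {
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
    out->w = a->w + b->w;
}

/* Source-level picture of a dynamic check: returns nonzero when the
 * byte ranges [a, a+n) and [b, b+n) of floats do not overlap. */
int ranges_disjoint(const float *a, const float *b, size_t n) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    size_t bytes = n * sizeof(float);
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Plain loop: an LLV candidate. */
void scale_add(float *dst, const float *src, float k, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] + k * src[i];
}

/* What the loop conceptually looks like after versioning: the guarded
 * branch is the one a compiler may vectorize; the fallback stays scalar.
 * A precise static alias analysis can make the guard unnecessary. */
void scale_add_guarded(float *dst, const float *src, float k, size_t n) {
    if (ranges_disjoint(dst, src, n)) {
        for (size_t i = 0; i < n; i++)   /* vectorizable version */
            dst[i] = dst[i] + k * src[i];
    } else {
        for (size_t i = 0; i < n; i++)   /* scalar fallback */
            dst[i] = dst[i] + k * src[i];
    }
}
```

If a static analysis can prove that the two pointers never overlap at a given call site, the guard and the scalar fallback are dead weight, which is the cost the abstract refers to as dynamic dependence checks.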

The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (by using field-insensitive models, which are too imprecise in handling arrays and structs). This article proposes an inter-procedural Loop-oriented Pointer Analysis for C, called Lpa, for analyzing arrays and structs to support aggressive SLP and LLV optimizations effectively. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a lazy memory model to generate access-based location sets based on how structs and arrays are accessed. Lpa can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating the location set generation as an independent concern from the rest of the pointer analysis, Lpa is designed so that existing points-to resolution algorithms (e.g., flow-insensitive and flow-sensitive pointer analysis) can be reused easily.
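The difference between a field-insensitive model and access-based location sets can be illustrated with a small sketch. The abstract does not define Lpa's location-set representation, so the `struct locset` below (object id, byte offset, stride, access count, access size) is an assumed, simplified stand-in, and the brute-force overlap test is chosen for clarity rather than efficiency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified location set: the bytes touched by a strided
 * access pattern inside one abstract object. Not Lpa's actual encoding. */
struct locset {
    int  object;  /* id of the abstract object (allocation site) */
    long offset;  /* byte offset of the first access */
    long stride;  /* bytes between consecutive accesses */
    long count;   /* number of accesses, e.g. a loop trip count */
    long size;    /* bytes touched per access */
};

/* Field-insensitive view: any two accesses to the same object may alias. */
bool field_insensitive_alias(const struct locset *a, const struct locset *b) {
    assert(a != NULL && b != NULL);
    return a->object == b->object;
}

/* Access-based view: enumerate the byte ranges of both patterns and
 * report an overlap only if two ranges actually intersect. */
bool locset_may_overlap(const struct locset *a, const struct locset *b) {
    assert(a != NULL && b != NULL);
    if (a->object != b->object)
        return false;
    for (long i = 0; i < a->count; i++) {
        long a_lo = a->offset + i * a->stride;
        for (long j = 0; j < b->count; j++) {
            long b_lo = b->offset + j * b->stride;
            if (a_lo < b_lo + b->size && b_lo < a_lo + a->size)
                return true;
        }
    }
    return false;
}
```

For an array of four `struct { float x; float y; }` elements, a loop writing `p[i].x` would correspond to `{1, 0, 8, 4, 4}` and one reading `p[i].y` to `{1, 4, 8, 4, 4}` in this sketch. The field-insensitive query reports a possible alias, while the access-based query reports that the two interleaved patterns never touch the same bytes.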

We have fully implemented Lpa in the LLVM compiler infrastructure (version 3.8.0). We evaluate Lpa by considering SLP and LLV, the two classic vectorization techniques, on a set of 20 C and Fortran SPEC CPU2000/2006 benchmarks. For SLP, Lpa outperforms LLVM’s BasicAA and ScevAA by discovering 139 and 273 more vectorizable basic blocks, respectively, resulting in the best speedup of 2.95% for 173.applu. For LLV, LLVM introduces a total of 551 and 652 static bound checks under BasicAA and ScevAA, respectively. In contrast, Lpa reduces these static checks to 220, with an average of 15.7 checks per benchmark, resulting in the best speedup of 7.23% for 177.mesa.

