CVR: Efficient Vectorization of SpMV on x86 Processors

B. Xie, J. Zhan, X. Liu, W. Gao, Z. Jia, X. He, et al. - Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO 2018) - dl.acm.org
Sparse Matrix-Vector Multiplication (SpMV) is an important computation kernel widely used in HPC and data centers. The irregularity of SpMV is a well-known challenge that limits SpMV's parallelism under vectorization. Existing work achieves limited locality and vectorization efficiency while incurring large preprocessing overheads. To address this issue, we present the Compressed Vectorization-oriented sparse Row (CVR), a novel SpMV representation targeting efficient vectorization. CVR simultaneously processes multiple rows of the input matrix to increase cache efficiency and distributes them across multiple SIMD lanes so as to take advantage of the vector processing units in modern processors. Our method is insensitive to the sparsity and irregularity of SpMV, and is thus able to handle a variety of scale-free and HPC matrices. We implement and evaluate CVR on an Intel Knights Landing processor and compare it with five state-of-the-art approaches on 58 scale-free and HPC sparse matrices. Experimental results show that CVR achieves speedups of up to 1.70× (1.33× on average) and up to 1.57× (1.10× on average) over the best existing approaches for scale-free and HPC sparse matrices, respectively. Moreover, CVR typically incurs the lowest preprocessing overhead among the state-of-the-art approaches.
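The core idea the abstract describes, processing several rows in lockstep with one row per SIMD lane, can be illustrated with a minimal sketch. The following Python code simulates the lane-oriented scheduling over a standard CSR matrix, with each "lane" stealing the next unprocessed row once its current row is exhausted. This is illustrative only: the actual CVR format packs values and column indices into a vectorization-friendly layout, which this plain-Python simulation does not attempt to reproduce, and all names here are assumptions of the sketch rather than the paper's API.

```python
# Sketch of lane-oriented SpMV scheduling (illustrative, not the real CVR
# format): NUM_LANES rows advance in lockstep, one nonzero per lane per
# step; a lane that finishes its row steals the next unprocessed row.

NUM_LANES = 4  # e.g. one AVX-512 vector of doubles would give 8 lanes

def spmv_lanes(row_ptr, col_idx, vals, x, n_rows):
    """Compute y = A @ x for a CSR matrix (row_ptr, col_idx, vals)."""
    y = [0.0] * n_rows
    next_row = 0
    # Per-lane state: [current row, next nonzero index, end index].
    lanes = []
    for _ in range(NUM_LANES):
        if next_row < n_rows:
            lanes.append([next_row, row_ptr[next_row], row_ptr[next_row + 1]])
            next_row += 1
        else:
            lanes.append(None)
    active = sum(lane is not None for lane in lanes)
    while active:
        for i, lane in enumerate(lanes):
            if lane is None:
                continue
            row, k, end = lane
            if k < end:
                # One multiply-add per lane per step; on real hardware
                # all lanes would execute this as a single vector FMA.
                y[row] += vals[k] * x[col_idx[k]]
                lane[1] += 1
            elif next_row < n_rows:
                # Row finished: this lane steals the next row.
                lanes[i] = [next_row, row_ptr[next_row], row_ptr[next_row + 1]]
                next_row += 1
            else:
                lanes[i] = None
                active -= 1
    return y
```

Because lanes pick up new rows independently, short and long rows can coexist without idling the whole vector unit, which is one way to see why such a scheme is insensitive to the irregularity of the sparsity pattern.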