CVR: Efficient Vectorization of SpMV on x86 Processors

B. Xie, J. Zhan, X. Liu, W. Gao, Z. Jia, X. He, et al. - Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO 2018) - dl.acm.org
Sparse Matrix-Vector Multiplication (SpMV) is an important computation kernel widely used in HPC and data centers. The irregularity of SpMV is a well-known challenge that limits SpMV's parallelism under vectorization. Existing work achieves limited locality and vectorization efficiency while incurring large preprocessing overheads. To address this issue, we present the Compressed Vectorization-oriented sparse Row (CVR), a novel SpMV representation targeting efficient vectorization. CVR simultaneously processes multiple rows of the input matrix to increase cache efficiency and distributes them across multiple SIMD lanes so as to take advantage of the vector processing units in modern processors. Our method is insensitive to the sparsity and irregularity of SpMV, and is thus able to handle a variety of scale-free and HPC matrices. We implement and evaluate CVR on an Intel Knights Landing processor and compare it with five state-of-the-art approaches on 58 scale-free and HPC sparse matrices. Experimental results show that CVR achieves speedups of up to 1.70× (1.33× on average) and up to 1.57× (1.10× on average) over the best existing approaches for scale-free and HPC sparse matrices, respectively. Moreover, CVR typically incurs the lowest preprocessing overhead among the state-of-the-art approaches.
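The core idea the abstract describes, processing several rows in lockstep with one row per SIMD lane, can be illustrated with a minimal sketch. The following Python code simulates the lane-oriented scheduling over a standard CSR matrix, with each "lane" stealing the next unprocessed row once its current row is exhausted. This is illustrative only: the actual CVR format packs values and column indices into a vectorization-friendly layout, which this plain-Python simulation does not attempt to reproduce, and all names here are assumptions of the sketch rather than the paper's API.

```python
# Sketch of lane-oriented SpMV scheduling (illustrative, not the real CVR
# format): NUM_LANES rows advance in lockstep, one nonzero per lane per
# step; a lane that finishes its row steals the next unprocessed row.

NUM_LANES = 4  # e.g. one AVX-512 vector of doubles would give 8 lanes

def spmv_lanes(row_ptr, col_idx, vals, x, n_rows):
    """Compute y = A @ x for a CSR matrix (row_ptr, col_idx, vals)."""
    y = [0.0] * n_rows
    next_row = 0
    # Per-lane state: [current row, next nonzero index, end index].
    lanes = []
    for _ in range(NUM_LANES):
        if next_row < n_rows:
            lanes.append([next_row, row_ptr[next_row], row_ptr[next_row + 1]])
            next_row += 1
        else:
            lanes.append(None)
    active = sum(lane is not None for lane in lanes)
    while active:
        for i, lane in enumerate(lanes):
            if lane is None:
                continue
            row, k, end = lane
            if k < end:
                # One multiply-add per lane per step; on real hardware
                # all lanes would execute this as a single vector FMA.
                y[row] += vals[k] * x[col_idx[k]]
                lane[1] += 1
            elif next_row < n_rows:
                # Row finished: this lane steals the next row.
                lanes[i] = [next_row, row_ptr[next_row], row_ptr[next_row + 1]]
                next_row += 1
            else:
                lanes[i] = None
                active -= 1
    return y
```

Because lanes pick up new rows independently, short and long rows can coexist without idling the whole vector unit, which is one way to see why such a scheme is insensitive to the irregularity of the sparsity pattern.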