Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs

Chen, Xuhao

arXiv:1802.10280 (cs)

[Submitted on 28 Feb 2018 (v1), last revised 3 Apr 2019 (this version, v2)]

Abstract: Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters, but it introduces sparsity into the weight matrices and therefore makes the computation inefficient on GPUs. Although pruning can remove more than 80% of the weights, it actually hurts inference speed when the pruned models run on GPUs.
Two major problems cause this unsatisfactory performance on GPUs. First, lowering convolution onto matrix multiplication reduces data reuse opportunities and wastes memory bandwidth. Second, the sparsity brought by pruning makes the computation irregular, which leads to inefficiency on massively parallel GPUs. To overcome these two limitations, we propose Escoin, an efficient approach to sparse convolutional neural network inference on GPUs. Instead of using the lowering method, we compute the sparse convolutions directly. We then orchestrate the parallelism and locality for the direct sparse convolution kernel, and apply customized optimization techniques to further improve performance. Evaluation on NVIDIA GPUs shows that it improves sparse convolution speed by 2.63x and 3.07x, and inference speed by 1.43x and 1.69x, compared to cuBLAS and cuSPARSE respectively.
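
To make the idea of computing a sparse convolution directly (rather than lowering it to a matrix multiplication) concrete, here is a minimal CPU sketch in plain Python. It is not the paper's GPU kernel. It assumes pruned filters are stored in CSR format, a stride of 1, no padding, and a channel-major flat input layout. The function names `to_csr`, `direct_sparse_conv`, and `dense_conv` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a sequential CPU version, not the paper's GPU kernel.
# Assumptions: CSR weight storage, stride 1, no padding, channel-major layout.

def to_csr(dense_rows):
    """Convert a list of dense rows into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense_rows:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def direct_sparse_conv(inp, C, H, W, csr, M, R, S):
    """Convolve directly from CSR weights, without building a lowered matrix.

    inp is a flat list of length C*H*W. Each CSR row holds one output
    channel's filter, flattened to length C*R*S.
    """
    values, col_idx, row_ptr = csr
    E, F = H - R + 1, W - S + 1  # output height and width
    out = [0.0] * (M * E * F)
    for m in range(M):
        # Only the nonzero weights of output channel m are visited.
        for k in range(row_ptr[m], row_ptr[m + 1]):
            # Decode the flattened filter index into (channel, row, col).
            c, rem = divmod(col_idx[k], R * S)
            r, s = divmod(rem, S)
            w = values[k]
            for y in range(E):
                base_in = (c * H + y + r) * W + s
                base_out = (m * E + y) * F
                for x in range(F):
                    out[base_out + x] += w * inp[base_in + x]
    return out


def dense_conv(inp, C, H, W, filters, M, R, S):
    """Reference dense convolution, used only to check the sparse version."""
    E, F = H - R + 1, W - S + 1
    out = [0.0] * (M * E * F)
    for m in range(M):
        for y in range(E):
            for x in range(F):
                acc = 0.0
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += (filters[m][(c * R + r) * S + s]
                                    * inp[(c * H + y + r) * W + x + s])
                out[(m * E + y) * F + x] = acc
    return out
```

In this sketch each nonzero weight is applied to a contiguous window of the original input, so the input is read in place instead of being replicated into a lowered matrix. How that work is mapped onto GPU threads and memory is what the paper addresses and is not shown here.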

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1802.10280 [cs.DC]
	(or arXiv:1802.10280v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1802.10280

Submission history

From: Xuhao Chen
[v1] Wed, 28 Feb 2018 06:31:45 UTC (632 KB)
[v2] Wed, 3 Apr 2019 21:11:27 UTC (566 KB)