Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

D'Alberto, Paolo; Jeong, Taehee; Jain, Akshai; Manjunath, Shreyas; Sarmah, Mrinal; Hsu, Samuel; Raparti, Yaswanth; Pipralia, Nitesh

arXiv:2407.09453 (cs)

[Submitted on 12 Jul 2024]

Abstract: Increasingly large Deep Neural Networks (DNNs) are being developed, trained, and deployed. These networks require significant computational resources, straining both high-end and resource-limited devices. Our solution is weight block sparsity, a structured sparsity that is hardware friendly. By zeroing selected blocks of the convolution and fully connected layer parameters of pre-trained DNN models, we can speed up DNN inference. The result is a smaller memory footprint, faster communication, and fewer operations.
Our work presents a vertical system that trains convolution and matrix multiplication weights to exploit 8x8 block sparsity on a single GPU within a reasonable amount of time. Compilers recognize this sparsity and use it for both data compaction and splitting the computation into threads. Such blocks take full advantage of both spatial and temporal locality, paving the way for fast vector operations and memory reuse. Using this system on a ResNet50 model, we were able to reduce the weights by half with minimal accuracy loss, resulting in two-times faster inference. We present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with ResNet50, Inception V3, and VGG16 to demonstrate the necessary synergy between hardware overlay designs and software stacks for compiling and executing machine learning applications.
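
To make the idea of 8x8 weight block sparsity and data compaction concrete, here is a minimal Python sketch. It is not the authors' training or compilation method, which the abstract does not detail. It assumes a simple magnitude criterion (drop the tiles with the smallest sum of absolute values), a matrix whose dimensions are multiples of the block size, and hypothetical helper names (`prune_blocks`, `compact`, `sparse_matvec`) chosen only for illustration.

```python
import random

BLOCK = 8  # tile edge length; the abstract mentions 8x8 blocks


def block_norms(w, block=BLOCK):
    """Map each tile index (bi, bj) to the sum of |w| over that tile."""
    rows, cols = len(w), len(w[0])
    norms = {}
    for bi in range(rows // block):
        for bj in range(cols // block):
            norms[(bi, bj)] = sum(
                abs(w[bi * block + i][bj * block + j])
                for i in range(block)
                for j in range(block)
            )
    return norms


def prune_blocks(w, sparsity=0.5, block=BLOCK):
    """Zero the lowest-magnitude tiles (assumed criterion, not the paper's).

    Returns a pruned copy of w and the set of tile indices that were kept.
    """
    norms = block_norms(w, block)
    order = sorted(norms, key=norms.get)  # smallest magnitude first
    n_drop = int(len(order) * sparsity)
    dropped, kept = set(order[:n_drop]), set(order[n_drop:])
    pruned = [row[:] for row in w]
    for bi, bj in dropped:
        for i in range(block):
            for j in range(block):
                pruned[bi * block + i][bj * block + j] = 0.0
    return pruned, kept


def compact(w, kept, block=BLOCK):
    """Keep only the non-zero tiles as (bi, bj, tile) records."""
    return [
        (bi, bj, [row[bj * block:(bj + 1) * block]
                  for row in w[bi * block:(bi + 1) * block]])
        for bi, bj in sorted(kept)
    ]


def sparse_matvec(tiles, x, rows, block=BLOCK):
    """Compute y = W x while touching only the stored tiles."""
    y = [0.0] * rows
    for bi, bj, tile in tiles:
        for i in range(block):
            acc = 0.0
            for j in range(block):
                acc += tile[i][j] * x[bj * block + j]
            y[bi * block + i] += acc
    return y


def dense_matvec(w, x):
    """Reference dense product used to check the compacted version."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


if __name__ == "__main__":
    random.seed(0)
    w = [[random.uniform(-1, 1) for _ in range(32)] for _ in range(16)]
    x = [random.uniform(-1, 1) for _ in range(32)]
    pruned, kept = prune_blocks(w, sparsity=0.5)
    tiles = compact(pruned, kept)
    print("tiles kept:", len(tiles), "of", (16 // BLOCK) * (32 // BLOCK))
    diff = max(abs(a - b) for a, b in
               zip(dense_matvec(pruned, x), sparse_matvec(tiles, x, 16)))
    print("max |dense - sparse|:", diff)
```

Because every stored tile is a dense 8x8 unit, the inner loops run over contiguous data, which is the kind of regularity that vector hardware can exploit. Each tile record could also be assigned to a different thread or core, though how the compiler described in the abstract actually partitions work is not specified here.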

Comments:	12 pages, 10 figures, 1 table
Subjects:	Machine Learning (cs.LG); Hardware Architecture (cs.AR); Computation and Language (cs.CL)
ACM classes:	C.5; D.3.4
Cite as:	arXiv:2407.09453 [cs.LG]
	(or arXiv:2407.09453v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.09453

Submission history

From: Paolo D'Alberto
[v1] Fri, 12 Jul 2024 17:37:49 UTC (504 KB)
