
DREW: Efficient Winograd CNN Inference with Deep Reuse


R Wu, F Zhang, J Guan, Z Zheng, X Du, X Shen

Proceedings of the ACM Web Conference 2022, 2022•dl.acm.org

Deep learning is used in many domains, including Web services. Convolutional neural networks (CNNs), a representative class of deep learning models, are among the most popular neural networks in Web systems. However, CNNs are computationally intensive. Compared with training, inference is more often performed on low-power computing equipment. Limited computing resources and heavy computational load restrict the effective use of CNN algorithms in industry. Fortunately, Winograd's minimal filtering algorithm can reduce the cost of convolution by minimizing the number of multiplications. We find that Winograd convolution can be sped up further by the deep reuse technique, which reuses similar data and computation. In this paper, we propose a new inference method, called DREW, which combines deep reuse with Winograd to further accelerate CNNs. DREW addresses three difficulties. First, it detects similarities in the complex minimal filtering patterns by clustering. Second, it keeps the online clustering cost within a reasonable range. Third, it provides an adjustable clustering granularity that balances performance and accuracy. Experiments show that 1) DREW accelerates Winograd convolution by 2.06× on average; 2) when applied to end-to-end Winograd CNN inference, DREW achieves an average speedup of 1.71× with negligible (<0.4%) accuracy loss; 3) DREW reduces the number of convolution operations to 11% of the original on average.
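To make the two ideas in the abstract concrete, here is a minimal Python sketch. The first part is the standard one-dimensional Winograd F(2,3) minimal filtering step, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The second part is only a toy illustration of reuse: it groups transformed input tiles by rounding them to a grid and computes each group once. The rounding scheme, the `step` parameter, and all function names are assumptions made for illustration; the abstract does not specify DREW's clustering method, and this is not the authors' implementation.

```python
def direct_f23(d, g):
    """Direct computation: 2 outputs from 4 inputs and a 3-tap filter (6 multiplications)."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def transform_filter(g):
    """Winograd filter transform; depends only on the filter, so it can be done once."""
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def transform_input(d):
    """Winograd input transform for a tile of 4 values (additions only)."""
    return (d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3])


def winograd_f23(d, u):
    """F(2,3): element-wise product of transformed input and filter (4 multiplications)."""
    v = transform_input(d)
    m = [v[i] * u[i] for i in range(4)]
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def reuse_by_rounding(tiles, u, step=0.5):
    """Toy reuse (assumption, not DREW's method): tiles whose transformed inputs
    round to the same grid cell share one computed result.
    Returns the per-tile outputs and the number of distinct groups computed."""
    cache = {}
    outputs = []
    for d in tiles:
        v = transform_input(d)
        key = tuple(round(x / step) for x in v)
        if key not in cache:
            # Only the first tile in each group pays for the multiplications.
            cache[key] = winograd_f23(d, u)
        outputs.append(cache[key])
    return outputs, len(cache)


if __name__ == "__main__":
    g = [1.0, 2.0, 3.0]
    u = transform_filter(g)
    tile = [1.0, 2.0, 3.0, 4.0]
    print(direct_f23(tile, g), winograd_f23(tile, u))

    tiles = [tile, tile, [1.1, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    outs, groups = reuse_by_rounding(tiles, u)
    print(groups, "groups computed for", len(tiles), "tiles")
```

A larger `step` merges more tiles into each group, which saves more computation but makes the reused outputs less exact. That is the kind of granularity versus accuracy trade-off the abstract refers to, shown here only in miniature.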

