DOI: 10.1145/3485447.3511985

DREW: Efficient Winograd CNN Inference with Deep Reuse

Published: 25 April 2022

Abstract

Deep learning is used across many domains, including Web services. Convolutional neural networks (CNNs), a representative class of deep learning models, are among the most popular neural networks in Web systems. However, CNNs are computationally intensive, and unlike training, inference is frequently performed on low-power devices. Limited computing resources and heavy computational demands hinder the effective industrial deployment of CNNs. Fortunately, a minimal filtering algorithm called Winograd reduces the cost of convolution by minimizing the number of multiplication operations. We find that Winograd convolution can be accelerated further by the deep reuse technique, which reuses similar data and the computations performed on them. In this paper, we propose a new inference method, called DREW, which combines deep reuse with Winograd convolution to further accelerate CNNs. DREW addresses three challenges. First, it detects similarities among the complex minimal filtering patterns via clustering. Second, it keeps the online clustering cost within a reasonable range. Third, it offers adjustable clustering granularity to balance performance and accuracy. Experiments show that 1) DREW accelerates Winograd convolution by 2.06× on average; 2) applied to end-to-end Winograd CNN inference, DREW achieves an average speedup of 1.71× with negligible (<0.4%) accuracy loss; and 3) DREW reduces the number of convolution operations to 11% of the original on average.
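For orientation, the "minimal filtering algorithm" the abstract refers to is Winograd convolution in the form popularized by Lavin and Gray. In the common F(2×2, 3×3) configuration (stated here as standard background, not quoted from this paper), each 2×2 output tile Y is computed from a 4×4 input tile d and a 3×3 filter g as

    Y = A^{\top} \left[ (G g G^{\top}) \odot (B^{\top} d B) \right] A

where B, G, and A are fixed input, filter, and output transform matrices and \odot is the element-wise product. The element-wise stage uses 4×4 = 16 multiplications per tile instead of the 2×2×3×3 = 36 required by direct convolution, a 2.25× reduction.

Deep reuse adds a second, orthogonal saving: many input tiles are nearly identical, so an expensive product can be computed once per cluster of similar tiles and shared among all members. The following is a minimal NumPy sketch of that clustering-and-share pattern on a plain matrix product, with an LSH-style random-projection hash standing in for DREW's online clustering; all names here are illustrative, not DREW's actual interface.

    import numpy as np

    def reuse_matmul(x, w, n_bits=8, seed=0):
        """Approximate x @ w with one product per cluster of similar rows.

        x: (n_rows, k) unfolded input tiles; w: (k, m) weights.
        n_bits sets the clustering granularity: more bits give more
        clusters, hence higher accuracy but less reuse (the
        performance/accuracy knob the abstract mentions).
        """
        rng = np.random.default_rng(seed)
        proj = rng.standard_normal((x.shape[1], n_bits))
        # Hash each row to a bucket id via the sign pattern of its projections.
        bits = (x @ proj > 0).astype(np.int64)
        ids = bits @ (1 << np.arange(n_bits, dtype=np.int64))
        out = np.empty((x.shape[0], w.shape[1]))
        for b in np.unique(ids):
            members = np.where(ids == b)[0]
            centroid = x[members].mean(axis=0)  # one representative row
            out[members] = centroid @ w         # computed once, reused by all
        return out

    # Rows drawn from a few prototypes cluster tightly, so the result stays
    # close to x @ w while the number of row-by-matrix products drops from
    # n_rows to the number of occupied clusters.
    protos = np.random.randn(4, 9)
    x = protos[np.random.randint(0, 4, size=1024)] + 0.01 * np.random.randn(1024, 9)
    w = np.random.randn(9, 16)
    err = np.linalg.norm(reuse_matmul(x, w) - x @ w) / np.linalg.norm(x @ w)
    print(f"relative error: {err:.3f}")

DREW applies this kind of reuse inside the Winograd pipeline, clustering the "minimal filtering patterns" the abstract mentions; the sketch above conveys only the reuse pattern, not the paper's exact pipeline.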




        Published In

        WWW '22: Proceedings of the ACM Web Conference 2022
April 2022, 3764 pages
ISBN: 9781450390965
DOI: 10.1145/3485447

        Publisher

Association for Computing Machinery, New York, NY, United States

        Author Tags

        1. Web systems
        2. Winograd
        3. data reuse
        4. deep reuse

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


        Cited By

• (2024) Im2col-Winograd: An Efficient and Flexible Fused-Winograd Convolution for NHWC Format on GPUs. In Proceedings of the 53rd International Conference on Parallel Processing, 1072-1081. DOI: 10.1145/3673038.3673039. Online publication date: 12-Aug-2024.
• (2024) YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs. In Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 212-226. DOI: 10.1145/3640537.3641566. Online publication date: 17-Feb-2024.
• (2023) RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 268-286. DOI: 10.1145/3623278.3624761. Online publication date: 25-Mar-2023.
• (2023) GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data. Proceedings of the ACM on Management of Data 1(2), 1-27. DOI: 10.1145/3589302. Online publication date: 20-Jun-2023.
• (2023) Learned Data-aware Image Representations of Line Charts for Similarity Search. Proceedings of the ACM on Management of Data 1(1), 1-29. DOI: 10.1145/3588942. Online publication date: 30-May-2023.
• (2023) Space-Efficient TREC for Enabling Deep Learning on Microcontrollers. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 644-659. DOI: 10.1145/3582016.3582062. Online publication date: 25-Mar-2023.
• (2023) Expanding the Edge: Enabling Efficient Winograd CNN Inference With Deep Reuse on Edge Device. IEEE Transactions on Knowledge and Data Engineering 35(10), 10181-10196. DOI: 10.1109/TKDE.2023.3269017. Online publication date: 1-Oct-2023.
• (2022) Cost-Based or Learning-Based? Proceedings of the VLDB Endowment 15(13), 3924-3936. DOI: 10.14778/3565838.3565846. Online publication date: 1-Sep-2022.
