DOI: 10.1145/3649153.3649212
Research article · Open access

TabConv: Low-Computation CNN Inference via Table Lookups

Published: 02 July 2024

Abstract

Convolutional Neural Networks (CNNs) have demonstrated remarkable ability throughout the field of computer vision. However, CNN inference requires a large number of arithmetic operations, making CNNs expensive to deploy in hardware. Current approaches alleviate this issue by developing hardware-supported algorithmic processes that simplify spatial convolution functions. However, these methods still rely heavily on matrix multiplication, leading to significant computational overhead. To bridge the gap between hardware, algorithmic acceleration, and approximate matrix multiplication, we propose TabConv, a novel table-based approximation for convolution that significantly reduces arithmetic operations during inference. Additionally, we introduce a priority masking technique based on cosine similarity to select layers for table-based approximation, thereby maintaining model performance. We evaluate our approach on popular CNNs: ResNet-18, ResNet-34, and Network-in-Network (NIN). TabConv preserves over 93% of the original model's performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively; by 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST; and by 98.9% for NIN on MNIST, achieving low-computation inference.
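To make the two mechanisms in the abstract concrete, below is a minimal NumPy sketch of the general idea, not the paper's exact algorithm: a convolution is first lowered to a matrix multiplication (e.g., via im2col), the multiplication is approximated with product-quantization-style table lookups, and a cosine-similarity score measures how safely a layer can be replaced. All function names, the plain k-means encoder, and the subspace/codebook sizes here are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain k-means used to learn per-subspace prototypes."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        ids = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[ids == j]
            if len(pts):
                C[j] = pts.mean(axis=0)
    return C

def build_lut(A_train, W, n_sub=8, n_codes=16):
    """Offline: learn codebooks on sample activations A_train (N x D) and
    precompute lookup tables of prototype-times-weight dot products.
    W (D x M) is the weight matrix of an im2col-lowered convolution."""
    sub = A_train.shape[1] // n_sub
    codebooks, tables = [], []
    for s in range(n_sub):
        cols = slice(s * sub, (s + 1) * sub)
        C = kmeans(A_train[:, cols], n_codes)
        codebooks.append(C)
        tables.append(C @ W[cols, :])  # (n_codes x M) partial products
    return codebooks, tables

def lut_matmul(A, codebooks, tables):
    """Online: encode each subvector to its nearest prototype, then
    replace the matmul with table lookups and additions."""
    sub = A.shape[1] // len(codebooks)
    out = np.zeros((A.shape[0], tables[0].shape[1]))
    for s, (C, T) in enumerate(zip(codebooks, tables)):
        X = A[:, s * sub:(s + 1) * sub]
        ids = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        out += T[ids]  # lookup + accumulate instead of multiply
    return out

def cosine_priority(exact, approx):
    """Cosine similarity between a layer's exact and table-approximated
    outputs; low-similarity layers would be kept exact by a priority mask."""
    a, b = exact.ravel(), approx.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy usage: approximate a 64x128 "conv-as-matmul" layer.
rng = np.random.default_rng(1)
A, W = rng.standard_normal((256, 64)), rng.standard_normal((64, 128))
cbs, luts = build_lut(A, W)
print(cosine_priority(A @ W, lut_matmul(A, cbs, luts)))
```

At inference the per-subspace encoding still costs a small distance computation, but the N·D·M multiply-accumulates of the full matrix product shrink to table reads plus additions, which is the kind of arithmetic-operation reduction the abstract reports.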

Supplemental Material

External - TabConv: v0.0.1-alpha
This archive contains the code and data samples used for the publication "TabConv: Low-Computation CNN Inference via Table Lookups" (ACM Computing Frontiers 2024).
Software dependencies: Conda, Python, PyTorch, RAPIDS, cuDNN.
Hardware dependencies: a platform consisting of an Nvidia GPU with large VRAM connected to a CPU via PCIe, or a multi-core CPU with large DRAM.
Creative Commons Zero v1.0 Universal


Published In

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
May 2024
345 pages
ISBN: 9798400705977
DOI: 10.1145/3649153
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network
  2. product quantization
  3. table lookup

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CF '24
Acceptance Rates

CF '24 Paper Acceptance Rate: 33 of 105 submissions (31%)
Overall Acceptance Rate: 273 of 785 submissions (35%)

Article Metrics

  • Total Citations: 0
  • Total Downloads: 67
  • Downloads (last 12 months): 67
  • Downloads (last 6 weeks): 40

Reflects downloads up to 22 Sep 2024.
