DOI: 10.1145/3649153.3649212
Research article · Open access

TabConv: Low-Computation CNN Inference via Table Lookups

Published: 02 July 2024

Abstract

Convolutional Neural Networks (CNNs) have demonstrated remarkable ability throughout the field of computer vision. However, CNN inference requires a large number of arithmetic operations, making CNNs expensive to deploy in hardware. Current approaches alleviate this issue by developing hardware-supported algorithmic processes that simplify spatial convolution functions. However, these methods still rely heavily on matrix multiplication, leading to significant computational overhead. To bridge the gap between hardware, algorithmic acceleration, and approximate matrix multiplication, we propose TabConv, a novel table-based approximation for convolution that significantly reduces arithmetic operations during inference. Additionally, we introduce a priority masking technique based on cosine similarity to select layers for table-based approximation, thereby maintaining model performance. We evaluate our approach on popular CNNs: ResNet-18, ResNet-34, and Network-in-Network (NIN). TabConv preserves over 93% of the original model's performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively; by 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST; and by 98.9% for NIN on MNIST, achieving low-computation inference.
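To make the two mechanisms in the abstract concrete, below is a minimal NumPy sketch of the general idea, not the paper's exact algorithm: a convolution is first lowered to a matrix multiplication (e.g., via im2col), the multiplication is approximated with product-quantization-style table lookups, and a cosine-similarity score measures how safely a layer can be replaced. All function names, the plain k-means encoder, and the subspace/codebook sizes here are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain k-means used to learn per-subspace prototypes."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        ids = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = X[ids == j]
            if len(pts):
                C[j] = pts.mean(axis=0)
    return C

def build_lut(A_train, W, n_sub=8, n_codes=16):
    """Offline: learn codebooks on sample activations A_train (N x D) and
    precompute lookup tables of prototype-times-weight dot products.
    W (D x M) is the weight matrix of an im2col-lowered convolution."""
    sub = A_train.shape[1] // n_sub
    codebooks, tables = [], []
    for s in range(n_sub):
        cols = slice(s * sub, (s + 1) * sub)
        C = kmeans(A_train[:, cols], n_codes)
        codebooks.append(C)
        tables.append(C @ W[cols, :])  # (n_codes x M) partial products
    return codebooks, tables

def lut_matmul(A, codebooks, tables):
    """Online: encode each subvector to its nearest prototype, then
    replace the matmul with table lookups and additions."""
    sub = A.shape[1] // len(codebooks)
    out = np.zeros((A.shape[0], tables[0].shape[1]))
    for s, (C, T) in enumerate(zip(codebooks, tables)):
        X = A[:, s * sub:(s + 1) * sub]
        ids = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        out += T[ids]  # lookup + accumulate instead of multiply
    return out

def cosine_priority(exact, approx):
    """Cosine similarity between a layer's exact and table-approximated
    outputs; low-similarity layers would be kept exact by a priority mask."""
    a, b = exact.ravel(), approx.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy usage: approximate a 64x128 "conv-as-matmul" layer.
rng = np.random.default_rng(1)
A, W = rng.standard_normal((256, 64)), rng.standard_normal((64, 128))
cbs, luts = build_lut(A, W)
print(cosine_priority(A @ W, lut_matmul(A, cbs, luts)))
```

At inference the per-subspace encoding still costs a small distance computation, but the N·D·M multiply-accumulates of the full matrix product shrink to table reads plus additions, which is the kind of arithmetic-operation reduction the abstract reports.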

Supplemental Material

External - TabConv: v0.0.1-alpha
This archive contains the code and data samples used for the publication "TabConv: Low-Computation CNN Inference via Table Lookups" (ACM Computing Frontiers 2024).
Software dependencies: Conda, Python, PyTorch, RAPIDS, cuDNN.
Hardware dependencies: a platform consisting of an Nvidia GPU with large VRAM connected to a CPU via PCIe, or a multi-core CPU with large DRAM.
Creative Commons Zero v1.0 Universal


Published In

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
May 2024
345 pages
ISBN: 9798400705977
DOI: 10.1145/3649153
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network
  2. product quantization
  3. table lookup

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CF '24
Acceptance Rates

CF '24 Paper Acceptance Rate: 33 of 105 submissions (31%)
Overall Acceptance Rate: 273 of 785 submissions (35%)

Article Metrics

  • Total Citations: 0
  • Total Downloads: 67
  • Downloads (last 12 months): 67
  • Downloads (last 6 weeks): 40

Reflects downloads up to 22 Sep 2024.
