Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Published: 02 March 2023

Abstract

Deep learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, as deep learning models have progressively improved, their number of parameters, latency, and the resources required to train them, among other footprint metrics, have all increased significantly. Consequently, it has become important to pay attention to a model's footprint metrics, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work in each. We also present an experiment-based guide along with code for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. It is our hope that this survey will provide readers with the mental model and the understanding of the field needed to apply generic efficiency techniques for immediate, significant improvements, and will also equip them with ideas for further research and experimentation to achieve additional gains.
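
As a concrete taste of the generic efficiency techniques that the practitioner guide covers, the sketch below applies post-training dynamic-range quantization with the TensorFlow Lite converter. It is a minimal, illustrative example: the toy Keras model and the output filename are placeholders, not drawn from the paper's experiments.

    import tensorflow as tf

    # A small placeholder Keras model; any trained model can be converted the same way.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(224, 224, 3)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])

    # Post-training dynamic-range quantization: weights are stored as 8-bit integers,
    # typically shrinking the model by roughly 4x with little loss in quality.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)

The same converter can also perform full integer quantization when given a small representative dataset for calibration.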

Supplementary Material

3578938-supp.pdf (supplementary material)

        Published In

        ACM Computing Surveys, Volume 55, Issue 12 (December 2023), 825 pages
        ISSN: 0360-0300
        EISSN: 1557-7341
        DOI: 10.1145/3582891

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 02 March 2023
        Online AM: 20 January 2023
        Accepted: 22 November 2022
        Revised: 30 June 2022
        Received: 13 July 2021
        Published in CSUR Volume 55, Issue 12

        Author Tags

        1. Efficient deep learning
        2. efficient machine learning
        3. efficient artificial intelligence
        4. quantization
        5. pruning
        6. sparsity
        7. distillation
        8. model compression
        9. model optimization
