Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Published: 02 March 2023

Abstract

Deep learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, as deep learning models have progressively improved, their number of parameters, latency, and the resources required to train them, among other footprint metrics, have all increased significantly. Consequently, it has become important to pay attention to a model's footprint metrics, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work in each. We also present an experiment-based guide along with code for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. It is our hope that this survey will provide readers with the mental model and the understanding of the field needed to apply generic efficiency techniques for immediate, significant improvements, and will also equip them with ideas for further research and experimentation to achieve additional gains.
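
As a concrete taste of the generic efficiency techniques that the practitioner guide covers, the sketch below applies post-training dynamic-range quantization with the TensorFlow Lite converter. It is a minimal, illustrative example: the toy Keras model and the output filename are placeholders, not drawn from the paper's experiments.

    import tensorflow as tf

    # A small placeholder Keras model; any trained model can be converted the same way.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(224, 224, 3)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])

    # Post-training dynamic-range quantization: weights are stored as 8-bit integers,
    # typically shrinking the model by roughly 4x with little loss in quality.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)

The same converter can also perform full integer quantization when given a small representative dataset for calibration.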

Supplementary Material

3578938-supp.pdf (supplementary material)

        Published In

        ACM Computing Surveys, Volume 55, Issue 12 (December 2023), 825 pages
        ISSN: 0360-0300
        EISSN: 1557-7341
        DOI: 10.1145/3582891

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 02 March 2023
        Online AM: 20 January 2023
        Accepted: 22 November 2022
        Revised: 30 June 2022
        Received: 13 July 2021
        Published in CSUR Volume 55, Issue 12

        Author Tags

        1. Efficient deep learning
        2. efficient machine learning
        3. efficient artificial intelligence
        4. quantization
        5. pruning
        6. sparsity
        7. distillation
        8. model compression
        9. model optimization
