Lightweight Deep Learning for Resource-Constrained Environments: A Survey

Published: 24 June 2024
    Abstract

    Over the past decade, deep learning has come to dominate many domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While model accuracy has improved remarkably, deploying these models on lightweight devices such as mobile phones and microcontrollers is constrained by limited resources. In this survey, we provide comprehensive design guidance tailored to these devices, covering the design of lightweight models, compression methods, and hardware acceleration strategies. The principal goal of this work is to explore methods and concepts for circumventing hardware constraints without compromising model accuracy. In addition, we examine two notable future directions for lightweight deep learning: deployment techniques for TinyML and for Large Language Models. While both directions undoubtedly hold potential, they also present significant challenges that invite research into as-yet unexplored areas.
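    As a concrete illustration of one compression method in the family the survey covers, the sketch below applies post-training dynamic quantization to a small PyTorch model. This is a minimal, hypothetical example for orientation only; the network and its sizes are assumptions and are not taken from the article.

        import torch
        import torch.nn as nn

        # A small stand-in network; any module with Linear (or LSTM) layers
        # can be quantized the same way. Purely illustrative, not from the survey.
        model = nn.Sequential(
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, 10),
        )
        model.eval()

        # Post-training dynamic quantization: weights are stored as int8,
        # activations are quantized on the fly at inference time.
        quantized = torch.quantization.quantize_dynamic(
            model, {nn.Linear}, dtype=torch.qint8
        )

        # The quantized model is a drop-in replacement for inference.
        x = torch.randn(1, 128)
        with torch.no_grad():
            print(quantized(x).shape)  # torch.Size([1, 10])

    Dynamic quantization of this kind typically shrinks the weight footprint by roughly 4x (float32 to int8) without retraining, which makes it one of the simplest entry points to the compression techniques discussed in the survey.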


    Published In

    ACM Computing Surveys, Volume 56, Issue 10
    October 2024, 954 pages
    ISSN: 0360-0300
    EISSN: 1557-7341
    DOI: 10.1145/3613652
    Editors: David Atienza, Michela Milano

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 June 2024
    Online AM: 11 May 2024
    Accepted: 02 April 2024
    Revised: 02 March 2024
    Received: 15 December 2022
    Published in CSUR Volume 56, Issue 10

    Author Tags

    1. Lightweight model
    2. efficient transformer
    3. model compression
    4. quantization
    5. tinyML
    6. large language models

    Qualifiers

    • Survey

    Funding Sources

    • National Science and Technology Council, Taiwan
    • National Key Fields Industry-University Cooperation and Skilled Personnel Training Act
    • Ministry of Education (MOE) and industry partners in Taiwan
