Abstract
Machine learning (ML) plays a major role in a wide variety of artificial intelligence applications. This article provides a comprehensive survey of recent trends and advances in hardware accelerator design for machine learning on hardware platforms such as ASICs, FPGAs, and GPUs. We examine architectures for neural network (NN) execution in terms of computational units, network topologies, dataflow optimization, and accelerators based on emerging technologies. The key features of the various strategies for improving acceleration performance are highlighted. Current difficulties, such as making fair comparisons between accelerators, are examined, along with potential research topics and open obstacles in the field. This study aims to give readers a quick overview of neural network compression and acceleration, a clear evaluation of the different methods, and the confidence to start out in the right direction.
Data Availability
No data or materials were used in this review article.
Code Availability
No software applications or custom code were used in this review article.
Funding
This review article received no funding.
Author information
Contributions
The first author conceived the idea, wrote the manuscript, and carried out the grammar correction. The second and third authors guided the first author in preparing the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Samanta, A., Hatai, I. & Mal, A.K. A Survey on Hardware Accelerator Design of Deep Learning for Edge Devices. Wireless Pers Commun 137, 1715–1760 (2024). https://doi.org/10.1007/s11277-024-11443-2
DOI: https://doi.org/10.1007/s11277-024-11443-2