
Performance Modeling of Computer Vision-based CNN on Edge GPUs

Published: 08 October 2022

Abstract

Convolutional Neural Networks (CNNs) are currently widely used in various fields, particularly for computer vision applications. Edge platforms have drawn tremendous attention from academia and industry due to their ability to improve execution time and preserve privacy. However, edge platforms struggle to satisfy CNNs' needs due to their computation and energy constraints. Thus, it is challenging to find the most efficient CNN that respects accuracy, time, energy, and memory footprint constraints for a target edge platform. Furthermore, given the size of the design space of CNNs and hardware platforms, performance evaluation of CNNs requires considerable effort. Consequently, designers need tools to quickly explore this large design space and select the CNN that offers the best performance trade-off for a set of hardware platforms. This article proposes a Machine Learning (ML)-based approach for modeling CNN performance on edge GPU-based platforms for vision applications. We implement and compare five of the most successful ML algorithms for accurate and rapid CNN performance prediction on three different edge GPUs for image classification. Experimental results demonstrate the robustness and usefulness of the proposed methodology. Three of the five ML algorithms, namely XGBoost, Random Forest, and Ridge Polynomial regression, achieve average errors of 11%, 6%, and 8% for CNN inference execution time, power consumption, and memory usage, respectively.

Cited By

  • (2024) Performance Prediction for Deep Learning Models With Pipeline Inference Strategy. IEEE Internet of Things Journal 11, 2, 2964–2978. https://doi.org/10.1109/JIOT.2023.3294253. Online publication date: 15-Jan-2024.
  • (2023) Accurate Deep Learning Inference Latency Prediction over Dynamic Running Mobile Devices. In 2023 19th International Conference on Mobility, Sensing and Networking (MSN), 293–300. https://doi.org/10.1109/MSN60784.2023.00052. Online publication date: 14-Dec-2023.

Published In

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5
September 2022
526 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3561947
  • Editor: Tulika Mitra

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 08 October 2022
Online AM (Accepted Manuscript): 26 March 2022
Accepted: 14 March 2022
Revised: 13 February 2022
Received: 16 July 2021
Published in TECS Volume 21, Issue 5

Author Tags

  1. Performance modeling
  2. CNN
  3. edge GPU
  4. execution time
  5. power consumption
  6. memory usage
  7. machine learning
  8. regression analysis

Qualifiers

  • Research-article
  • Refereed
