DOI: 10.1145/3583740.3628437
Research article · Open access

Bang for the Buck: Evaluating the cost-effectiveness of Heterogeneous Edge Platforms for Neural Network Workloads

Published: 07 August 2024

Abstract

Machine learning (ML) applications have experienced remarkable growth and integration into various domains. However, challenges with cloud-based deployments, such as latency, privacy, reliability, bandwidth, and connectivity, have driven the popularity of deploying ML on edge devices. The ML application deployment stack consists of several components: neural network models, input frameworks, software runtime libraries, and the hardware architecture. Understanding how each component of the ML stack affects deployment effectiveness, particularly cost-effectiveness, remains a challenge. In this work, we systematically analyze the choices available for each component of the ML stack and their influence on deployment performance. We empirically evaluate eight heterogeneous edge platforms and eight software runtime libraries, considering hardware components such as CPUs, GPUs, NPUs, and VPUs for ML inference. Our findings contribute to a better understanding of how to optimize cost-effectiveness in ML deployments on edge platforms, aiding decision-making for application developers and stakeholders.
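The paper's evaluation harness is not reproduced on this page, but the core cost-effectiveness metric the abstract describes (inference throughput normalized by platform price) can be sketched in plain Python. All names here, the dummy workload, and the $99 price are illustrative placeholders, not values or code from the paper:

```python
import time

def throughput_per_dollar(run_inference, n_warmup=5, n_runs=50, price_usd=99.0):
    """Estimate inferences/sec per dollar for one (model, device) pair.

    run_inference: zero-argument callable performing one forward pass.
    price_usd: purchase price of the platform (hypothetical value here).
    """
    for _ in range(n_warmup):           # discard cold-start effects (caches, JIT, clocks)
        run_inference()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_inference()
    elapsed = time.perf_counter() - start
    throughput = n_runs / elapsed       # inferences per second
    return throughput / price_usd       # inferences/sec per USD spent

# Stand-in for a real forward pass (e.g. a TFLite, TensorRT, or OpenVINO invocation).
def dummy_model():
    sum(i * i for i in range(10_000))

score = throughput_per_dollar(dummy_model, price_usd=99.0)
print(f"{score:.4f} inferences/sec per USD")
```

Repeating this measurement per model, runtime library, and hardware platform yields the kind of throughput-per-dollar comparison the abstract refers to; real deployments would also need to account for quantization effects on accuracy and for power draw.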


Cited By

  • (2024) "Stress-Testing USB Accelerators for Efficient Edge Inference." In 2024 IEEE/ACM Symposium on Edge Computing (SEC), 1-14. DOI: 10.1109/SEC62691.2024.00015. Online publication date: 4 Dec 2024.
  • (2024) "Energy Modeling of Inference Workloads with AI Accelerators at the Edge: A Benchmarking Study." In 2024 IEEE International Conference on Cloud Engineering (IC2E), 189-196. DOI: 10.1109/IC2E61754.2024.00028. Online publication date: 24 Sep 2024.
  • (2024) "Flow Control Solution to Avoid Bottlenecks in Edge Computing for Video Analytics." In 2024 9th International Conference on Fog and Mobile Edge Computing (FMEC), 74-81. DOI: 10.1109/FMEC62297.2024.10710217. Online publication date: 2 Sep 2024.

Published In

SEC '23: Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing
December 2023, 405 pages
ISBN: 9798400701238
DOI: 10.1145/3583740
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. neural networks
        2. edge computing
        3. accelerators


Conference

SEC '23: Eighth ACM/IEEE Symposium on Edge Computing
December 6-9, 2023, Wilmington, DE, USA

Acceptance Rates

Overall acceptance rate: 40 of 100 submissions (40%)
