Ace-Sniper: Cloud–Edge Collaborative Scheduling Framework With DNN Inference Latency Modeling on Heterogeneous Devices

Published: 01 February 2024

Abstract

Cloud–edge collaborative inference requires efficient scheduling of artificial intelligence (AI) tasks onto the appropriate edge intelligence devices. Deep neural network (DNN) inference latency has become a vital basis for improving scheduling efficiency. However, edge devices are highly heterogeneous due to differences in hardware architectures, computing power, and so on. Meanwhile, diverse DNNs continue to iterate over time. This diversity of devices and DNNs imposes high computational costs on measurement-based methods, while invasive prediction methods face significant development effort and application limitations. In this article, we propose and develop Ace-Sniper, a scheduling framework with DNN inference latency modeling on heterogeneous devices. First, to address device heterogeneity, a unified hardware resource modeling (HRM) is designed that treats the platforms as black-box functions outputting feature vectors. Second, neural network similarity (NNS) is introduced for feature extraction of diverse and frequently iterated DNNs. Finally, taking the results of HRM and NNS as input, a performance characterization network is designed to predict the latencies of given unseen DNNs on heterogeneous devices; these predictions can be combined into most time-based scheduling algorithms, as sketched below. Experimental results show that the average relative error of DNN inference latency prediction is 11.11%, and the prediction accuracy reaches 93.2%. Compared with non-time-aware scheduling methods, the average task waiting time is reduced by 82.95%, and platform throughput is improved by 63% on average.
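As a rough illustration of how such a latency predictor can plug into a time-based scheduler, the sketch below feeds a device feature vector (standing in for HRM) and a DNN feature vector (standing in for NNS) into a placeholder predictor, then dispatches each task to the device with the earliest predicted completion time. Every name here (Device, predict_latency, dispatch) and the dot-product predictor are assumptions made for illustration, not the paper's actual interfaces.

    from dataclasses import dataclass

    @dataclass
    class Device:
        name: str
        hrm: list                # HRM feature vector: device treated as a black box
        busy_until: float = 0.0  # time at which the device's task queue drains

    def predict_latency(hrm, nns):
        """Placeholder for the trained performance characterization network:
        maps (device features, DNN features) to a predicted inference latency.
        A dot product stands in for the learned model."""
        return sum(h * n for h, n in zip(hrm, nns))

    def dispatch(task_nns, devices, now=0.0):
        """Time-based scheduling: send the task to the device minimizing
        predicted completion time = queue drain time + predicted latency."""
        def finish_time(dev):
            return max(dev.busy_until, now) + predict_latency(dev.hrm, task_nns)
        best = min(devices, key=finish_time)
        best.busy_until = finish_time(best)
        return best

    # Example: one DNN task, two heterogeneous devices (feature values invented).
    edge = Device("jetson-edge", hrm=[0.8, 0.3])
    cloud = Device("gpu-server", hrm=[0.2, 0.1])
    print(dispatch(task_nns=[1.0, 2.0], devices=[edge, cloud]).name)  # -> gpu-server

In Ace-Sniper itself, the predictor is a trained network over HRM and NNS features rather than a dot product, and per the abstract its outputs can be combined into most time-based scheduling algorithms.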


Cited By

  • (2024) "Power-efficient full-duplex near user with power allocation and antenna selection in NOMA-based systems," EURASIP Journal on Wireless Communications and Networking, vol. 2024, no. 1. DOI: 10.1186/s13638-024-02391-3. Online publication date: 23-Aug-2024.
  • (2024) "Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 4154–4165. DOI: 10.1109/TCAD.2024.3443706. Online publication date: 1-Nov-2024.

Published In

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 43, Issue 2, Feb. 2024, 300 pages

Publisher

IEEE Press

Qualifiers

• Research-article
