Ace-Sniper: Cloud–Edge Collaborative Scheduling Framework With DNN Inference Latency Modeling on Heterogeneous Devices

Published: 01 February 2024

Abstract

Cloud–edge collaborative inference requires efficient scheduling of artificial intelligence (AI) tasks onto the appropriate edge intelligence devices. Deep neural network (DNN) inference latency has become a vital basis for improving scheduling efficiency. However, edge devices are highly heterogeneous due to differences in hardware architectures, computing power, and so on. Meanwhile, diverse DNNs continue to iterate over time. This diversity of devices and DNNs imposes high computational costs on measurement-based methods, while invasive prediction methods face significant development effort and application limitations. In this article, we propose and develop Ace-Sniper, a scheduling framework with DNN inference latency modeling on heterogeneous devices. First, to address device heterogeneity, a unified hardware resource modeling (HRM) is designed that treats the platforms as black-box functions outputting feature vectors. Second, neural network similarity (NNS) is introduced for feature extraction of diverse and frequently iterated DNNs. Finally, taking the results of HRM and NNS as input, a performance characterization network is designed to predict the latencies of given unseen DNNs on heterogeneous devices; these predictions can be combined into most time-based scheduling algorithms, as sketched below. Experimental results show that the average relative error of DNN inference latency prediction is 11.11%, and the prediction accuracy reaches 93.2%. Compared with non-time-aware scheduling methods, the average task waiting time is reduced by 82.95%, and platform throughput is improved by 63% on average.
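As a rough illustration of how such a latency predictor can plug into a time-based scheduler, the sketch below feeds a device feature vector (standing in for HRM) and a DNN feature vector (standing in for NNS) into a placeholder predictor, then dispatches each task to the device with the earliest predicted completion time. Every name here (Device, predict_latency, dispatch) and the dot-product predictor are assumptions made for illustration, not the paper's actual interfaces.

    from dataclasses import dataclass

    @dataclass
    class Device:
        name: str
        hrm: list                # HRM feature vector: device treated as a black box
        busy_until: float = 0.0  # time at which the device's task queue drains

    def predict_latency(hrm, nns):
        """Placeholder for the trained performance characterization network:
        maps (device features, DNN features) to a predicted inference latency.
        A dot product stands in for the learned model."""
        return sum(h * n for h, n in zip(hrm, nns))

    def dispatch(task_nns, devices, now=0.0):
        """Time-based scheduling: send the task to the device minimizing
        predicted completion time = queue drain time + predicted latency."""
        def finish_time(dev):
            return max(dev.busy_until, now) + predict_latency(dev.hrm, task_nns)
        best = min(devices, key=finish_time)
        best.busy_until = finish_time(best)
        return best

    # Example: one DNN task, two heterogeneous devices (feature values invented).
    edge = Device("jetson-edge", hrm=[0.8, 0.3])
    cloud = Device("gpu-server", hrm=[0.2, 0.1])
    print(dispatch(task_nns=[1.0, 2.0], devices=[edge, cloud]).name)  # -> gpu-server

In Ace-Sniper itself, the predictor is a trained network over HRM and NNS features rather than a dot product, and per the abstract its outputs can be combined into most time-based scheduling algorithms.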


Cited By

  • (2024) "Power-efficient full-duplex near user with power allocation and antenna selection in NOMA-based systems," EURASIP Journal on Wireless Communications and Networking, vol. 2024, no. 1. DOI: 10.1186/s13638-024-02391-3. Online publication date: 23-Aug-2024.
  • (2024) "Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 4154–4165. DOI: 10.1109/TCAD.2024.3443706. Online publication date: 1-Nov-2024.

Published In

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 43, Issue 2, Feb. 2024, 300 pages

Publisher

IEEE Press

Qualifiers

• Research-article
