EvoLP: Self-Evolving Latency Predictor for Model Compression in Real-Time Edge Systems

Published: 02 October 2023

Abstract

Edge devices are increasingly used to deploy deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices call for latency-targeted neural network compression. However, measuring latency on real devices is slow and expensive. This letter therefore presents EvoLP, a novel and efficient framework that accurately predicts the inference latency of models on edge devices. The predictor evolves during the network compression process to achieve higher latency prediction precision. Experiments on three edge devices and four model variants show that EvoLP outperforms previous state-of-the-art approaches. Moreover, when incorporated into a model compression framework, it effectively guides compression toward higher model accuracy while satisfying strict latency constraints. We open-source EvoLP at https://github.com/ntuliuteam/EvoLP.
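The core idea of a predictor that "evolves" with new on-device measurements can be illustrated with a minimal sketch. This is a hypothetical simplification, not EvoLP's actual architecture: a linear model over layer features (e.g., normalized channel count and input size) that takes one SGD step each time a real latency measurement arrives during compression.

```python
class OnlineLatencyPredictor:
    """Toy self-updating latency predictor (illustrative only).

    Maps a feature vector describing a candidate compressed model to a
    predicted latency in milliseconds, and refines its weights online
    from measured latencies.
    """

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, feats):
        # Linear estimate: w . feats + b
        return sum(w * x for w, x in zip(self.w, feats)) + self.b

    def update(self, feats, measured_ms):
        # One SGD step on squared error against the real measurement,
        # so the predictor tracks the device as compression proceeds.
        err = self.predict(feats) - measured_ms
        self.w = [w - self.lr * err * x for w, x in zip(self.w, feats)]
        self.b -= self.lr * err


# Toy usage: latency grows with channel fraction and input-size fraction.
pred = OnlineLatencyPredictor(n_features=2)
measurements = [((0.25, 0.5), 1.3), ((0.5, 0.5), 2.1), ((1.0, 1.0), 4.0)]
for _ in range(500):
    for feats, ms in measurements:
        pred.update(feats, ms)
```

A compression loop would call `predict` to screen candidate architectures cheaply and `update` whenever a candidate is actually profiled on the device; the feature set and model class here are placeholder assumptions.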


Published In

IEEE Embedded Systems Letters, Volume 16, Issue 2
June 2024, 162 pages

Publisher

IEEE Press


Qualifiers

  • Research-article
