EvoLP: Self-Evolving Latency Predictor for Model Compression in Real-Time Edge Systems

Published: 02 October 2023

Abstract

Edge devices are increasingly used to deploy deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices call for latency-targeted neural network compression. However, measuring latency on real devices is slow and expensive. This letter therefore presents EvoLP, a novel and efficient framework that accurately predicts the inference latency of models on edge devices. The predictor evolves during the network compression process to achieve higher latency prediction precision. Experiments on three edge devices and four model variants show that EvoLP outperforms previous state-of-the-art approaches. Moreover, when incorporated into a model compression framework, it effectively guides compression toward higher model accuracy while satisfying strict latency constraints. We open-source EvoLP at https://github.com/ntuliuteam/EvoLP.
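The core idea of a predictor that "evolves" with new on-device measurements can be illustrated with a minimal sketch. This is a hypothetical simplification, not EvoLP's actual architecture: a linear model over layer features (e.g., normalized channel count and input size) that takes one SGD step each time a real latency measurement arrives during compression.

```python
class OnlineLatencyPredictor:
    """Toy self-updating latency predictor (illustrative only).

    Maps a feature vector describing a candidate compressed model to a
    predicted latency in milliseconds, and refines its weights online
    from measured latencies.
    """

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, feats):
        # Linear estimate: w . feats + b
        return sum(w * x for w, x in zip(self.w, feats)) + self.b

    def update(self, feats, measured_ms):
        # One SGD step on squared error against the real measurement,
        # so the predictor tracks the device as compression proceeds.
        err = self.predict(feats) - measured_ms
        self.w = [w - self.lr * err * x for w, x in zip(self.w, feats)]
        self.b -= self.lr * err


# Toy usage: latency grows with channel fraction and input-size fraction.
pred = OnlineLatencyPredictor(n_features=2)
measurements = [((0.25, 0.5), 1.3), ((0.5, 0.5), 2.1), ((1.0, 1.0), 4.0)]
for _ in range(500):
    for feats, ms in measurements:
        pred.update(feats, ms)
```

A compression loop would call `predict` to screen candidate architectures cheaply and `update` whenever a candidate is actually profiled on the device; the feature set and model class here are placeholder assumptions.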


Published In

IEEE Embedded Systems Letters, Volume 16, Issue 2
June 2024, 162 pages

Publisher

IEEE Press


Qualifiers

  • Research-article
