short-paper

Open access

Benchmarking Video Object Detection Systems on Embedded Devices under Resource Contention

Authors:

Ran Xu,

Yin Li,

Somali ChaterjiAuthors Info & Claims

EMDL'21: Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning

Pages 19 - 24

https://doi.org/10.1145/3469116.3470010

Published: 24 June 2021 Publication History

PDF eReader

Abstract

Adaptive and efficient computer vision systems have been proposed to make computer vision tasks, e.g., object classification and object detection, optimized for embedded boards or mobile devices. These studies focus on optimizing the model (deep network) or system itself, by designing an efficient network architecture or adapting the network architecture at runtime using approximation knobs, such as image size, type of object tracker, head of the object detector (e.g., lighter-weight heads such as one-shot object detectors like YOLO over two-shot object detectors like FRCNN). In this work, we benchmark different video object detection protocols, including FastAdapt, with respect to accuracy, latency, and energy consumption on three different embedded boards that represent the leading edge mobile GPUs. Our set of protocols consists of Faster R-CNN, YOLOv3, SELSA, MEGA, and REPP. Further, we characterize their performance under different levels of resource contention, specifically GPU contention, as would arise due to co-located applications on these boards, contending with the video object detection task. Our key insights are that object detectors have to be coupled with trackers to keep up with the latency requirements (e.g., 30 fps). With this, FastAdapt achieves up to 76 fps on the most well-resourced NVIDIA Jetson-class board---the NVIDIA AGX Xavier. Second, adaptive protocols like FastAdapt, FRCNN, and YOLO (specifically our adaptive variants, FRCNN+ and YOLO+) work well under resource constraints. Among the latest video object detection heads, SELSA achieves the highest accuracy but at a latency of over 2 sec per frame. Our energy consumption experiments bring out that FastAdapt, adaptive FRCNN, and adaptive YOLO are best-in-class, relative to the non-adaptive protocols SELSA, MEGA, and REPP.

References

[1]

Yihong Chen, Yue Cao, Han Hu, and Liwei Wang. 2020. Memory enhanced global-local aggregation for video object detection. In CVPR. 10337--10346.

Abstract

References

Cited By

Index Terms

Recommendations

Virtuoso: Energy- and Latency-aware Streamlining of Streaming Videos on Systems-on-Chips

Machine Intelligence on Resource-Constrained IoT Devices: The Case of Thread Granularity Optimization for CNN Inference

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: Programming Productivity, Performance, and Energy Consumption

Comments

Information

Published In

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Funding Sources

Conference

Contributors

Other Metrics

Bibliometrics

Article Metrics

Other Metrics

Citations

Cited By

View options

PDF

eReader

Get Access

Login options

Full Access

Figures

Other

Share

Share this Publication link

Share on social media

Affiliations