Open access

Eagle: End-to-end Deep Reinforcement Learning based Autonomous Control of PTZ Cameras

Authors:

Sandeep Singh Sandha,

Bharathan Balaji,

Mani Srivastava

IoTDI '23: Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation

Pages 144 - 157

https://doi.org/10.1145/3576842.3582366

Published: 09 May 2023

Abstract

Existing approaches for autonomous control of pan-tilt-zoom (PTZ) cameras use multiple stages, in which object detection and localization are performed separately from control of the PTZ mechanisms. These approaches require manual labels and suffer from performance bottlenecks caused by error propagation across the multi-stage flow of information. The large size of object detection neural networks also makes prior solutions infeasible for real-time deployment on resource-constrained devices. We present Eagle, an end-to-end deep reinforcement learning (RL) solution that trains a neural network policy to control the PTZ camera directly from input images. Training reinforcement learning policies in the real world is cumbersome because of labeling effort, runtime environment stochasticity, and fragile experimental setups. We therefore introduce a photo-realistic simulation framework for training and evaluating PTZ camera control policies. Eagle achieves superior camera control performance by keeping the object of interest close to the center of captured images at high resolution, and it tracks for up to 17% longer than the state of the art. Eagle policies are lightweight (90x fewer parameters than YOLOv5s) and can run on embedded camera platforms such as Raspberry Pi (33 FPS) and Jetson Nano (38 FPS), enabling real-time PTZ tracking in resource-constrained environments. With domain randomization, Eagle policies trained in our simulator can be transferred directly to real-world scenarios.
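The abstract describes the control objective only in words: keep the object of interest close to the image center and at high resolution. The sketch below is one hypothetical way to turn that description into a per-frame score for an RL agent. It is not the reward used by Eagle, which the abstract does not specify. The names `BoundingBox`, `centering_score`, `size_score`, `frame_reward`, and the `size_weight` parameter are all assumptions made for illustration, as is the use of a ground-truth box, which a simulator could plausibly provide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def centering_score(box: BoundingBox, width: float, height: float) -> float:
    """Return 1.0 when the box center sits at the image center,
    falling linearly to 0.0 at an image corner."""
    cx = (box.x_min + box.x_max) / 2.0
    cy = (box.y_min + box.y_max) / 2.0
    # Normalize offsets so each axis spans [-1, 1] across the frame.
    dx = (cx - width / 2.0) / (width / 2.0)
    dy = (cy - height / 2.0) / (height / 2.0)
    # Divide by sqrt(2) so the distance is 0 at the center and 1 at a corner.
    distance = (dx * dx + dy * dy) ** 0.5 / 2 ** 0.5
    return max(0.0, 1.0 - distance)


def size_score(box: BoundingBox, width: float, height: float) -> float:
    """Fraction of the image area covered by the box, clipped to [0, 1]."""
    area = max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)
    return min(1.0, area / (width * height))


def frame_reward(box: Optional[BoundingBox], width: float, height: float,
                 size_weight: float = 0.5) -> float:
    """Hypothetical per-frame reward: a weighted mix of centering and size.
    Returns 0.0 when the object is not visible (box is None)."""
    if box is None:
        return 0.0
    return ((1.0 - size_weight) * centering_score(box, width, height)
            + size_weight * size_score(box, width, height))
```

Under this sketch, pan and tilt actions that move the object toward the center raise the first term, and zooming in raises the second, so a single scalar captures both goals named in the abstract. The equal default weighting is arbitrary and would need tuning in practice.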

Cited By

Z. Yang, H. Fang, H. Liu, J. Li, Y. Jiang, and M. Zhu. 2024. Active Visual Perception Enhancement Method Based on Deep Reinforcement Learning. Electronics 13, 9 (2024), 1654. Online publication date: 25-Apr-2024. https://doi.org/10.3390/electronics13091654

R. Ravier, D. Garagić, T. Galoppo, B. Rhodes, and P. Zulch. 2024. Multiagent Reinforcement Learning and Game-Theoretic Optimization for Autonomous Sensor Control. In 2024 IEEE Aerospace Conference. 1-12. Online publication date: 2-Mar-2024. https://doi.org/10.1109/AERO58975.2024.10521284

P. Sharma and M. Srivastava. 2023. Impact of Delays and Computation Placement on Sense-Act Application Performance in IoT. In MILCOM 2023 - 2023 IEEE Military Communications Conference (MILCOM). 133-138. Online publication date: 30-Oct-2023. https://doi.org/10.1109/MILCOM58377.2023.10356219

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Sensors and actuators

Published In

IoTDI '23: Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation

May 2023

514 pages

ISBN:9798400700378

DOI:10.1145/3576842

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Army Research Laboratory (ARL)
Air Force Office of Scientific Research (AFOSR)
Semiconductor Research Corporation (SRC) and DARPA
National Science Foundation (NSF)

Conference

IoTDI '23

Sponsor:

SIGBED

IoTDI '23: International Conference on Internet-of-Things Design and Implementation

May 9 - 12, 2023

San Antonio, TX, USA

Bibliometrics

Total citations: 3
Total downloads: 306
Downloads (last 12 months): 275
Downloads (last 6 weeks): 42

Reflects downloads up to 30 Aug 2024
