Continuous Viewpoint Planning in Conjunction with Dynamic Exploration for Active Object Recognition
Abstract
1. Introduction
- A novel continuous viewpoint planning method for active object recognition, based on proximal policy optimization, is proposed to address the quantization error of discrete viewpoint planning methods;
- A dynamic exploration scheme based on adaptive entropy regularization is presented to automatically adjust viewpoint exploration during learning;
- Experiments are carried out on the public GERMS dataset, and the proposed method obtains promising results.
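The quantization error named in the first contribution can be illustrated with a small sketch. It assumes a one-dimensional viewpoint rotation measured in degrees and a hypothetical discrete action set; neither the action values nor the function names are taken from the paper. A discrete planner can only choose among the listed rotations, so any desired rotation that falls between them is reached with a residual error, whereas a continuous policy can output the desired value directly.

```python
def nearest_discrete_action(desired_deg, actions_deg):
    """Return the allowed rotation closest to the desired rotation."""
    return min(actions_deg, key=lambda a: abs(a - desired_deg))


def quantization_error(desired_deg, actions_deg):
    """Absolute gap between the desired rotation and the closest allowed one."""
    return abs(desired_deg - nearest_discrete_action(desired_deg, actions_deg))


# Hypothetical discrete action set in degrees (an assumption for illustration).
ACTIONS_DEG = [-45.0, -22.5, 0.0, 22.5, 45.0]

if __name__ == "__main__":
    for desired in (10.0, 22.5, 30.0):
        nearest = nearest_discrete_action(desired, ACTIONS_DEG)
        error = quantization_error(desired, ACTIONS_DEG)
        print(f"desired={desired:5.1f}  nearest={nearest:6.1f}  error={error:4.1f}")
```

In this toy setting the error vanishes only when the desired rotation happens to coincide with a grid value; refining the grid shrinks the error but enlarges the action set.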
2. Related Work
3. Problem Statement
4. Proposed Method
4.1. Belief Fusion for State Representation
4.2. Continuous VP Policy Network Combined with Dynamic Exploration
4.3. Reward Setting
4.4. Training the Policy Network
Algorithm 1: Training the continuous VP policy network
5. Experiments
5.1. Experimental Setup
5.2. Ablation Study
5.3. Dynamic Exploration Study
5.4. Comparison with the State-of-the-Art Methods
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: State of the art. Appl. Intell. 2021, 51, 6400–6429.
- Jayaraman, D.; Grauman, K. End-to-End Policy Learning for Active Visual Categorization. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1601–1614.
- Patten, T.; Zillich, M.; Fitch, R.; Vincze, M.; Sukkarieh, S. Viewpoint evaluation for online 3-D active object classification. IEEE Robot. Autom. Lett. 2015, 1, 73–81.
- Potthast, C.; Breitenmoser, A.; Sha, F.; Sukhatme, G.S. Active multi-view object recognition: A unifying view on online feature selection and view planning. Robot. Auton. Syst. 2016, 84, 31–47.
- Wu, K.; Ranasinghe, R.; Dissanayake, G. Active recognition and pose estimation of household objects in clutter. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 4230–4237.
- Andreopoulos, A.; Tsotsos, J.K. 50 Years of object recognition: Directions forward. Comput. Vis. Image Underst. 2013, 117, 827–891.
- Zeng, R.; Wen, Y.; Zhao, W.; Liu, Y.J. View planning in robot active vision: A survey of systems, algorithms, and applications. Comput. Vis. Media 2020, 6, 225–245.
- Becerra, I.; Valentin-Coronado, L.M.; Murrieta-Cid, R.; Latombe, J.C. Appearance-based motion strategies for object detection. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 6455–6461.
- Deinzer, F.; Denzler, J.; Derichs, C.; Niemann, H. Aspects of optimal viewpoint selection and viewpoint fusion. In Proceedings of the Asian Conference on Computer Vision, Hyderabad, India, 13–16 January 2006; pp. 902–912.
- Liu, H.; Li, F.; Xu, X.; Sun, F. Active object recognition using hierarchical local-receptive-field-based extreme learning machine. Memetic Comput. 2018, 10, 233–241.
- Malmir, M.; Cottrell, G.W. Belief tree search for active object recognition. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 4276–4283.
- Malmir, M.; Sikka, K.; Forster, D.; Movellan, J.R.; Cottrell, G. Deep Q-Learning for Active Recognition of GERMS: Baseline Performance on a Standardized Dataset for Active Learning. In Proceedings of the British Machine Vision Conference (BMVC), Swansea, UK, 7–10 September 2015; pp. 161.1–161.11.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Hämäläinen, P.; Babadi, A.; Ma, X.; Lehtinen, J. PPO-CMA: Proximal policy optimization with covariance matrix adaptation. In Proceedings of the 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), Espoo, Finland, 21–24 September 2020; pp. 1–6.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 1928–1937.
- Liu, H.; Wu, Y.; Sun, F. Extreme trust region policy optimization for active object recognition. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2253–2258.
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1889–1897.
- Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
- Gangapurwala, S.; Mitchell, A.; Havoutis, I. Guided constrained policy optimization for dynamic quadrupedal robot locomotion. IEEE Robot. Autom. Lett. 2020, 5, 3642–3649.
- Guan, Y.; Ren, Y.; Li, S.E.; Sun, Q.; Luo, L.; Li, K. Centralized cooperation for connected and automated vehicles at intersections by proximal policy optimization. IEEE Trans. Veh. Technol. 2020, 69, 12597–12608.
- Zhang, L.; Zhang, Y.; Zhao, X.; Zou, Z. Image captioning via proximal policy optimization. Image Vis. Comput. 2021, 108, 104126.
- Ying, C.S.; Chow, A.H.; Wang, Y.H.; Chin, K.S. Adaptive Metro Service Schedule and Train Composition with a Proximal Policy Optimization Approach Based on Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2021, 6, 1–12.
- August, M.; Hernández-Lobato, J.M. Taking gradients through experiments: LSTMs and memory proximal policy optimization for black-box quantum control. In Proceedings of the International Conference on High Performance Computing, Frankfurt, Germany, 24–28 June 2018; pp. 591–613.
- Vanvuchelen, N.; Gijsbrechts, J.; Boute, R. Use of Proximal Policy Optimization for the Joint Replenishment Problem. Comput. Ind. 2020, 119, 103239.
- Paletta, L.; Pinz, A. Active object recognition by view integration and reinforcement learning. Robot. Auton. Syst. 2000, 31, 71–86.
- Zhao, D.; Chen, Y.; Lv, L. Deep reinforcement learning with visual attention for vehicle classification. IEEE Trans. Cogn. Dev. Syst. 2016, 9, 356–367.
- Liu, H.; Sun, F.; Zhang, X. Robotic material perception using active multimodal fusion. IEEE Trans. Ind. Electron. 2018, 66, 9878–9886.
- Hammersley, J. Monte Carlo Methods; Springer Science and Business Media: Berlin/Heidelberg, Germany, 2013.
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Split | Number of Tracks | Images/Track | Total Number of Images |
---|---|---|---|
Train | 816 | 157 ± 12 | 76,722 |
Test | 549 | 145 ± 19 | 51,561 |
Abbreviation | Interpretation |
---|---|
BL | Baseline PPO framework [13] with a fixed exploration scheme (i.e., the standard deviation is a constant) |
SSDN | Separate standard deviation network |
ER | Entropy regularization (with a fixed coefficient) |
AERC | Adaptive entropy regularization coefficient |
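The variants in the table above differ in how the Gaussian policy's standard deviation and the entropy term are handled. A minimal sketch of a per-sample objective, assuming a one-dimensional Gaussian policy and the standard clipped surrogate of PPO [13] with an added entropy bonus, is given below. All function and parameter names are hypothetical. Under this reading, BL would correspond to a constant `new_std` with `beta = 0`, and ER to a fixed `beta > 0`. How SSDN produces the standard deviation and how AERC updates the coefficient are not specified in this excerpt, so both are simply left as inputs here.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at the given action."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (action - mean) ** 2 / (2.0 * var)


def gaussian_entropy(std):
    """Differential entropy of a 1-D Gaussian: 0.5 * ln(2 * pi * e * std^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)


def ppo_objective(action, advantage, old_mean, old_std, new_mean, new_std,
                  clip_eps=0.2, beta=0.0):
    """Clipped PPO surrogate for one sample plus an entropy bonus weighted by beta.

    beta is the entropy regularization coefficient; how it is chosen or
    adapted is left to the caller (an assumption of this sketch).
    """
    # Probability ratio between the new and the old policy for this action.
    ratio = math.exp(gaussian_log_prob(action, new_mean, new_std)
                     - gaussian_log_prob(action, old_mean, old_std))
    # Clip the ratio to [1 - clip_eps, 1 + clip_eps] and take the pessimistic bound.
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * advantage, clipped * advantage)
    return surrogate + beta * gaussian_entropy(new_std)
```

With a constant standard deviation the entropy term is itself constant, so it contributes no gradient; the bonus only influences exploration once the standard deviation is a learned quantity.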
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, H.; Zhu, F.; Kong, Y.; Wang, J.; Zhao, P. Continuous Viewpoint Planning in Conjunction with Dynamic Exploration for Active Object Recognition. Entropy 2021, 23, 1702. https://doi.org/10.3390/e23121702