Abstract
Humans and many animals can selectively sample the parts of the visual scene needed to carry out daily activities such as foraging and finding prey or mates. Selective attention allows them to use the brain's limited resources efficiently by deploying the sensory apparatus to collect data believed to be pertinent to the organism's current situation. Robots operating in dynamic environments are similarly exposed to a wide variety of stimuli, which they must process with limited sensory and computational resources. Computational saliency models inspired by biological studies have previously been used in robotic applications, but these have had limited capacity to deal with dynamic environments and no capacity to reason about uncertainty when planning their sensor placement strategy. This paper generalises the traditional saliency model by using a Kalman filter estimator to describe an agent's understanding of the world. The resulting model of uncertainty allows agents to adopt a richer set of strategies for deploying sensory apparatus than is possible with the winner-take-all mechanism of the traditional saliency model. The paper demonstrates three utility functions that encapsulate the perceptual state valued by the agent, each producing a distinct sensory deployment behaviour.
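As a rough illustration of the idea (not the paper's implementation), the sketch below maintains a Kalman-style belief (mean and covariance) per target and directs the next saccade by maximising a utility over those beliefs, rather than taking the winner-take-all maximum of a raw saliency map. All names (Belief, utility_uncertainty, choose_saccade) and the specific uncertainty-seeking utility are hypothetical placeholders for the utility functions described in the paper.

```python
import numpy as np

# Hypothetical per-target belief: a Kalman-style mean and covariance over
# the quantity the agent cares about (e.g. a target's position).
class Belief:
    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=float)
        self.cov = np.asarray(cov, dtype=float)

def predict(belief, F, Q):
    """Kalman prediction step: propagate the belief through dynamics F,
    inflating the covariance by the process noise Q."""
    return Belief(F @ belief.mean, F @ belief.cov @ F.T + Q)

def utility_uncertainty(belief):
    """One example utility: prefer the target the agent is least certain
    about, using the covariance trace as a scalar uncertainty summary."""
    return np.trace(belief.cov)

def choose_saccade(beliefs, utility):
    """Point the sensor at the target whose belief maximises the utility,
    instead of the winner-take-all maximum of a saliency map."""
    return max(beliefs, key=lambda name: utility(beliefs[name]))

# Toy usage: two targets with identical dynamics but different uncertainty.
F = np.eye(2)                      # static dynamics for illustration
Q = 0.01 * np.eye(2)               # process noise
beliefs = {
    "target_a": predict(Belief([0.0, 0.0], 0.1 * np.eye(2)), F, Q),
    "target_b": predict(Belief([3.0, 1.0], 1.0 * np.eye(2)), F, Q),
}
print(choose_saccade(beliefs, utility_uncertainty))  # -> "target_b"
```

Swapping in a different utility function changes which perceptual state is valued, and hence the resulting sensor deployment behaviour.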
Notes
An alternative term for the internal belief is the state estimate.
Here \(\varvec{w} \sim \mathcal {N}\left( \varvec{\mu },\varvec{\varSigma }\right) \) means that the random variable \(\varvec{w}\) follows the normal distribution that is completely defined by its mean (\(\varvec{\mu }\)) and covariance (\(\varvec{\varSigma }\)) and, with \(n\) the dimension of \(\varvec{w}\), is analytically described as
$$\begin{aligned} p\left( \varvec{w}\right) = \frac{1}{\left( 2\pi \right) ^{n/2}{|\varvec{\varSigma }|}^{1/2}} e^{-\frac{1}{2}\left( \varvec{w}-\varvec{\mu }\right) ^{\mathsf {T}}\varvec{\varSigma }^{-1}\left( \varvec{w}-\varvec{\mu }\right) }. \end{aligned}$$
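For concreteness, a minimal numerical sketch of this density, written directly from the expression above (the function name and test values are illustrative only):

```python
import numpy as np

def gaussian_density(w, mu, sigma):
    """Evaluate the multivariate normal density N(mu, sigma) at w,
    following the footnote's expression for an n-dimensional w."""
    w, mu = np.asarray(w, float), np.asarray(mu, float)
    sigma = np.asarray(sigma, float)
    n = w.size
    diff = w - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm

# Sanity check against the 1-D standard normal at its mean: 1/sqrt(2*pi).
print(gaussian_density([0.0], [0.0], [[1.0]]))  # ~0.39894
```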
Cite this article
Bhakta, A., Hollitt, C., Browne, W.N. et al. Utility function generated saccade strategies for robot active vision: a probabilistic approach. Auton Robot 43, 947–966 (2019). https://doi.org/10.1007/s10514-018-9752-3