DOI: 10.1145/3080538.3080541

Semi-Automatic Annotation with Predicted Visual Saliency Maps for Object Recognition in Wearable Video

Published: 06 June 2017

Abstract

Recognition of objects of a given category in visual content is one of the key problems in computer vision and multimedia. It is strongly needed in wearable video recording for a wide range of socially important applications. Supervised learning approaches have proved to be the most efficient for this task, but they require ground truth for training. This is especially true for deep convolutional networks, but it also holds for other popular models such as SVMs on visual signatures. Annotating ground truth by drawing bounding boxes (BBs) is a very tedious task that requires substantial human effort. Research on predicting visual attention in images and videos has reached maturity, particularly for bottom-up visual attention modeling. Hence, instead of annotating the ground truth manually with BBs, we propose to use automatically predicted salient areas as object locators for annotation. Such saliency prediction is nevertheless imperfect, so active contour models are applied to the saliency maps in order to isolate the most prominent areas covering the objects. The approach is tested within the framework of a well-studied supervised learning model: an SVM with a psycho-visually weighted Bag-of-Words representation. The egocentric GTEA dataset was used in the experiments. The difference in mAP (mean average precision) is less than 10 percent, while the mean annotation time is 36% lower.
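To make the pipeline sketched in the abstract more concrete (a predicted saliency map, an active contour to isolate the prominent region, then a box proposal for the annotator), the following minimal Python sketch derives a bounding box from a saliency map with a morphological Chan-Vese segmentation from scikit-image. The function name, parameter values and the choice of scikit-image are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): turn a predicted saliency map into a
# bounding-box proposal via an edge-free (Chan--Vese style) active contour.
# `saliency_map` is assumed to be a 2-D float array in [0, 1] from any saliency predictor.
import numpy as np
from skimage.segmentation import morphological_chan_vese

def saliency_to_bbox(saliency_map, iterations=100, smoothing=2):
    """Return an (x_min, y_min, x_max, y_max) box around the most salient region."""
    # Two-phase segmentation of the saliency map; no image edges are needed.
    level_set = morphological_chan_vese(saliency_map, iterations,
                                        init_level_set='checkerboard',
                                        smoothing=smoothing)
    mask = level_set.astype(bool)
    # The two phases are labeled arbitrarily; keep the phase with the
    # higher mean saliency as the "object" region.
    if saliency_map[mask].mean() < saliency_map[~mask].mean():
        mask = ~mask
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        # Degenerate case: fall back to the whole frame.
        h, w = saliency_map.shape
        return 0, 0, w - 1, h - 1
    return xs.min(), ys.min(), xs.max(), ys.max()
```

In a semi-automatic workflow, a box like this could be offered to the annotator for acceptance or quick correction instead of being drawn from scratch.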

References

[1]
Babaee, M., Tsoukalas, S., Rigoll, G. and Datcu, M. 2015. Visualization-Based Active Learning for the Annotation of SAR Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 8, 10 (Oct. 2015), 4687--4698.
[2]
Borji, A. and Itti, L. 2013. State-of-the-Art in Visual Attention Modeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1 (Jan. 2013), 185--207.
[3]
Boujut, H., Benois-Pineau, J. and Mégret, R. 2012. Fusion of Multiple Visual Cues for Visual Saliency Extraction from Wearable Camera Settings with Strong Motion. 12th European Conference on Computer Vision (ECCV 2012) (Firenze, Italy, Oct. 2012), 436--445.
[4]
Buso, V., González-Díaz, I. and Benois-Pineau, J. 2015. Goal-oriented top-down probabilistic visual attention model for recognition of manipulated objects in egocentric videos. Signal Processing: Image Communication. 39, Part B, (Nov. 2015), 418--431.
[5]
Chan, T.F., Sandberg, B.Y. and Vese, L.A. 2000. Active Contours without Edges for Vector-Valued Images. Journal of Visual Communication and Image Representation. 11, 2 (Jun. 2000), 130--141.
[6]
Chan, T.F. and Vese, L.A. 2001. Active contours without edges. IEEE Transactions on Image Processing. 10, 2 (Feb. 2001), 266--277.
[7]
Dammak, S.M., Jedidi, A. and Bouaziz, R. 2013. Automation and evaluation of the semantic annotation of Web resources. Internet Technology and Secured Transactions (ICITST), 2013 8th International Conference for (Dec. 2013), 443--448.
[8]
Fathi, A., Ren, X. and Rehg, J.M. 2011. Learning to recognize objects in egocentric activities. 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Jun. 2011), 3281--3288.
[9]
Felzenszwalb, P., McAllester, D. and Ramanan, D. 2008. A discriminatively trained, multiscale, deformable part model. IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008 (Jun. 2008), 1--8.
[10]
Ferracani, A., Pezzatini, D., Bertini, M., Meucci, S. and Del Bimbo, A. 2015. A System for Video Recommendation Using Visual Saliency, Crowdsourced and Automatic Annotations. Proceedings of the 23rd ACM International Conference on Multimedia (New York, NY, USA, 2015), 757--758.
[11]
González Díaz, I., Buso, V., Benois-Pineau, J., Bourmaud, G. and Megret, R. 2013. Modeling Instrumental Activities of Daily Living in Egocentric Vision As Sequences of Active Objects and Context for Alzheimer Disease Research. Proceedings of the 1st ACM International Workshop on Multimedia Indexing and Information Retrieval for Healthcare (New York, NY, USA, 2013), 11--14.
[12]
Goyal, S. and Benjamin, P. 2014. Object Recognition Using Deep Neural Networks: A Survey. arXiv:1412.3684 [cs]. (Dec. 2014).
[13]
Ishihara, T., Kitani, K.M., Ma, W., Takagi, H. and Asakawa, C. 2015. Recognizing Hand-Object Interactions in Wearable Camera Videos. 2015 IEEE International Conference on Image Processing (ICIP) (Quebec City, Canada, Sep. 2015), 1349--1353.
[14]
Kass, M., Witkin, A. and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision. 1, 4 (Jan. 1988), 321--331.
[15]
Li, J. and Wang, J.Z. 2008. Real-Time Computerized Annotation of Pictures. IEEE Transactions on Pattern Analysis and Machine Intelligence. 30, 6 (Jun. 2008), 985--1002.
[16]
Miu, T., Missier, P. and Plötz, T. 2015. Bootstrapping Personalised Human Activity Recognition Models Using Online Active Learning. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM) (Oct. 2015), 1138--1147.
[17]
Mumford, D. and Shah, J. 1989. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics. 42, 5 (Jul. 1989), 577--685.
[18]
Viola, P. and Jones, M. 2001. Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001 (2001), I-511-I-518 vol.1.
[19]
Pérez de San Roman, Ph., Benois-Pineau, J., Domenger, J.-Ph., de Rugy, A., Paclet, F. and Cataert, D. 2017. Saliency Driven Object Recognition in Egocentric Videos with Deep CNN: Toward Application in Assistance to Neuroprostheses. Computer Vision and Image Understanding. (2017). Published online. http://www.sciencedirect.com/science/article/pii/S1077314217300462

Cited By

  • (2018) Perceptually-guided Understanding of Egocentric Video Content. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 434--441. DOI: 10.1145/3206025.3206073. Online publication date: 5 June 2018.

      Published In

      WearMMe '17: Proceedings of the 2017 Workshop on Wearable MultiMedia
      June 2017
      22 pages
      ISBN:9781450350334
      DOI:10.1145/3080538

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. active contour
      2. object recognition
      3. saliency maps
      4. visual object annotation

      Qualifiers

      • Research-article

      Funding Sources

      • CONACYT

      Conference

      ICMR '17
