DOI: 10.1145/3080538.3080541

Semi-Automatic Annotation with Predicted Visual Saliency Maps for Object Recognition in Wearable Video

Published: 06 June 2017

Abstract

Recognition of objects of a given category in visual content is one of the key problems in computer vision and multimedia. It is strongly needed in wearable video recording for a wide range of socially important applications. Supervised learning approaches have proved to be the most efficient for this task, but they require ground truth for training. This is especially true for deep convolutional networks, but it also holds for other popular models such as SVMs on visual signatures. Annotating ground truth by drawing bounding boxes (BBs) is a very tedious task that requires substantial human effort. Research on predicting visual attention in images and videos has reached maturity, particularly for bottom-up visual attention modeling. Hence, instead of annotating the ground truth manually with BBs, we propose to use automatically predicted salient areas as object locators for annotation. Such saliency prediction is nevertheless imperfect, so active contour models are applied to the saliency maps in order to isolate the most prominent areas covering the objects. The approach is tested within the framework of a well-studied supervised learning model: an SVM with a psycho-visually weighted Bag-of-Words representation. The egocentric GTEA dataset was used in the experiments. The difference in mAP (mean average precision) is less than 10 percent, while the mean annotation time is 36% lower.
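To make the pipeline sketched in the abstract more concrete (a predicted saliency map, an active contour to isolate the prominent region, then a box proposal for the annotator), the following minimal Python sketch derives a bounding box from a saliency map with a morphological Chan-Vese segmentation from scikit-image. The function name, parameter values and the choice of scikit-image are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): turn a predicted saliency map into a
# bounding-box proposal via an edge-free (Chan--Vese style) active contour.
# `saliency_map` is assumed to be a 2-D float array in [0, 1] from any saliency predictor.
import numpy as np
from skimage.segmentation import morphological_chan_vese

def saliency_to_bbox(saliency_map, iterations=100, smoothing=2):
    """Return an (x_min, y_min, x_max, y_max) box around the most salient region."""
    # Two-phase segmentation of the saliency map; no image edges are needed.
    level_set = morphological_chan_vese(saliency_map, iterations,
                                        init_level_set='checkerboard',
                                        smoothing=smoothing)
    mask = level_set.astype(bool)
    # The two phases are labeled arbitrarily; keep the phase with the
    # higher mean saliency as the "object" region.
    if saliency_map[mask].mean() < saliency_map[~mask].mean():
        mask = ~mask
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        # Degenerate case: fall back to the whole frame.
        h, w = saliency_map.shape
        return 0, 0, w - 1, h - 1
    return xs.min(), ys.min(), xs.max(), ys.max()
```

In a semi-automatic workflow, a box like this could be offered to the annotator for acceptance or quick correction instead of being drawn from scratch.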

References

[1]
Babaee, M., Tsoukalas, S., Rigoll, G. and Datcu, M. 2015. Visualization-Based Active Learning for the Annotation of SAR Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 8, 10 (Oct. 2015), 4687--4698.
[2]
Borji, A. and Itti, L. 2013. State-of-the-Art in Visual Attention Modeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1 (Jan. 2013), 185--207.
[3]
Boujut, H., Benois-Pineau, J. and Mégret, R. 2012. Fusion of Multiple Visual Cues for Visual Saliency Extraction from Wearable Camera Settings with Strong Motion. 12th European Conference on Computer Vision (ECCV 2012) (Firenze, Italy, Oct. 2012), 436--445.
[4]
Buso, V., González-Díaz, I. and Benois-Pineau, J. 2015. Goal-oriented top-down probabilistic visual attention model for recognition of manipulated objects in egocentric videos. Signal Processing: Image Communication. 39, Part B, (Nov. 2015), 418--431.
[5]
Chan, T.F., Sandberg, B.Y. and Vese, L.A. 2000. Active Contours without Edges for Vector-Valued Images. Journal of Visual Communication and Image Representation. 11, 2 (Jun. 2000), 130--141.
[6]
Chan, T.F. and Vese, L.A. 2001. Active contours without edges. IEEE Transactions on Image Processing. 10, 2 (Feb. 2001), 266--277.
[7]
Dammak, S.M., Jedidi, A. and Bouaziz, R. 2013. Automation and evaluation of the semantic annotation of Web resources. Internet Technology and Secured Transactions (ICITST), 2013 8th International Conference for (Dec. 2013), 443--448.
[8]
Fathi, A., Ren, X. and Rehg, J.M. 2011. Learning to recognize objects in egocentric activities. 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Jun. 2011), 3281--3288.
[9]
Felzenszwalb, P., McAllester, D. and Ramanan, D. 2008. A discriminatively trained, multiscale, deformable part model. IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008 (Jun. 2008), 1--8.
[10]
Ferracani, A., Pezzatini, D., Bertini, M., Meucci, S. and Del Bimbo, A. 2015. A System for Video Recommendation Using Visual Saliency, Crowdsourced and Automatic Annotations. Proceedings of the 23rd ACM International Conference on Multimedia (New York, NY, USA, 2015), 757--758.
[11]
González Díaz, I., Buso, V., Benois-Pineau, J., Bourmaud, G. and Megret, R. 2013. Modeling Instrumental Activities of Daily Living in Egocentric Vision As Sequences of Active Objects and Context for Alzheimer Disease Research. Proceedings of the 1st ACM International Workshop on Multimedia Indexing and Information Retrieval for Healthcare (New York, NY, USA, 2013), 11--14.
[12]
Goyal, S. and Benjamin, P. 2014. Object Recognition Using Deep Neural Networks: A Survey. arXiv:1412.3684 [cs]. (Dec. 2014).
[13]
Ishihara, T., Kitani, K.M., Ma, W., Takagi, H. and Asakawa, C. 2015. Recognizing Hand-Object Interactions in Wearable Camera Videos. 2015 IEEE International Conference on Image Processing (ICIP) (Quebec City, Canada, Sep. 2015), 1349--1353.
[14]
Kass, M., Witkin, A. and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision. 1, 4 (Jan. 1988), 321--331.
[15]
Li, J. and Wang, J.Z. 2008. Real-Time Computerized Annotation of Pictures. IEEE Transactions on Pattern Analysis and Machine Intelligence. 30, 6 (Jun. 2008), 985--1002.
[16]
Miu, T., Missier, P. and Plötz, T. 2015. Bootstrapping Personalised Human Activity Recognition Models Using Online Active Learning. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM) (Oct. 2015), 1138--1147.
[17]
Mumford, D. and Shah, J. 1989. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics. 42, 5 (Jul. 1989), 577--685.
[18]
Viola, P. and Jones, M. 2001. Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001 (2001), I-511-I-518 vol.1.
[19]
Pérez de San Roman, Ph., Benois-Pineau, J., Domenger, J.-Ph., de Rugy, A., Paclet, F. and Cataert, D. 2017. Saliency Driven Object Recognition in Egocentric Videos with Deep CNN: Toward Application in Assistance to Neuroprostheses. Computer Vision and Image Understanding. (2017). Published online. http://www.sciencedirect.com/science/article/pii/S1077314217300462

Cited By

  • (2018) Perceptually-guided Understanding of Egocentric Video Content. Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 434--441. DOI: 10.1145/3206025.3206073. Online publication date: 5 June 2018.

      Published In

      WearMMe '17: Proceedings of the 2017 Workshop on Wearable MultiMedia
      June 2017
      22 pages
      ISBN:9781450350334
      DOI:10.1145/3080538

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. active contour
      2. object recognition
      3. saliency maps
      4. visual object annotation

      Qualifiers

      • Research-article

      Funding Sources

      • CONACYT

      Conference

      ICMR '17
