research-article

Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment

Published: 08 August 2019

Abstract

The collection of internet images has been growing at an astonishing speed. These images undoubtedly contain rich visual information that can be useful in many applications, such as visual media creation and data-driven image synthesis. In this article, we focus on methodologies for building a visual object database from a collection of internet images. Such a database is built to contain a large number of high-quality visual objects that can support various data-driven image applications. Our method is based on dense proposal generation and objectness-based re-ranking. A novel deep convolutional neural network is designed to infer proposal objectness, the probability that a proposal contains an optimally located foreground object. In our work, objectness is quantitatively measured in terms of completeness and fullness, reflecting two complementary features of an optimal proposal: a complete foreground and a relatively small background. Our experiments indicate that object proposals re-ranked according to the output of our network generally achieve higher performance than those produced by other state-of-the-art methods. As a concrete example, a database of over 1.2 million visual objects has been built using the proposed method and has been successfully used in various data-driven image applications.
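The abstract names completeness and fullness but does not give their formulas here. The sketch below is a minimal illustration under stated assumptions, not the article's network or definitions: proposals and the foreground are axis-aligned boxes (x0, y0, x1, y1); completeness is assumed to be the share of the foreground box covered by a proposal; fullness is assumed to be the share of the proposal occupied by the foreground; and their product is a hypothetical stand-in for objectness. In the described method the score comes from a CNN, so `rerank` simply sorts by whatever scores are supplied. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with x1 > x0 and y1 > y0.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Box area; degenerate or inverted boxes get area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    """Area of overlap between two boxes (0 if they do not overlap)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def completeness(proposal: Box, foreground: Box) -> float:
    """ASSUMED definition: fraction of the foreground covered by the proposal."""
    return intersection(proposal, foreground) / area(foreground)


def fullness(proposal: Box, foreground: Box) -> float:
    """ASSUMED definition: fraction of the proposal occupied by the foreground."""
    return intersection(proposal, foreground) / area(proposal)


def objectness(proposal: Box, foreground: Box) -> float:
    """HYPOTHETICAL combination: high only when both terms are high."""
    return completeness(proposal, foreground) * fullness(proposal, foreground)


def rerank(proposals: List[Box], scores: List[float]) -> List[Box]:
    """Return proposals sorted by descending score (e.g. predicted objectness)."""
    order = sorted(range(len(proposals)), key=lambda i: scores[i], reverse=True)
    return [proposals[i] for i in order]


if __name__ == "__main__":
    fg = (10, 10, 50, 50)        # foreground box, area 1600
    loose = (0, 0, 80, 80)       # complete but mostly background
    partial = (10, 10, 30, 50)   # no background but cuts the object in half
    tight = (10, 10, 50, 50)     # complete and full
    boxes = [loose, partial, tight]
    scores = [objectness(b, fg) for b in boxes]
    print(scores)                # → [0.25, 0.5, 1.0]
    print(rerank(boxes, scores))
    # → [(10, 10, 50, 50), (10, 10, 30, 50), (0, 0, 80, 80)]
```

The example shows why two complementary terms are useful: the loose box scores perfectly on completeness alone and the partial box scores perfectly on fullness alone, yet only the tight box is rated highly on both.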

Supplementary Material

wu.zip: Supplemental movie and image files for "Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment"

Cited By

  • (2023) Image Captioning With Novel Topics Guidance and Retrieval-Based Topics Re-Weighting. IEEE Transactions on Multimedia 25, 5984-5999. DOI: 10.1109/TMM.2022.3202690. Online publication date: 1-Jan-2023.
  • (2021) A Multi-instance Multi-label Dual Learning Approach for Video Captioning. ACM Transactions on Multimedia Computing, Communications, and Applications 17, 2s, 1-18. DOI: 10.1145/3446792. Online publication date: 14-Jun-2021.
  • (2020) Indoor positioning and wayfinding systems: a survey. Human-centric Computing and Information Sciences 10, 1. DOI: 10.1186/s13673-020-00222-0. Online publication date: 2-May-2020.

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 3
    August 2019
    331 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3352586
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 August 2019
    Accepted: 01 March 2019
    Revised: 01 February 2019
    Received: 01 August 2018
    Published in TOMM Volume 15, Issue 3

    Author Tags

    1. Object detection
    2. convolutional neural networks
    3. internet images
    4. object proposals
    5. objectness

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • EU H2020 project AniAge
    • National Natural Science Foundation of China

    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 15 Oct 2024
