research-article

Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment

Authors:

Yizhou Yu

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 3

Article No.: 72, Pages 1 - 23

https://doi.org/10.1145/3318463

Published: 08 August 2019

Abstract

The collection of internet images has been growing at an astonishing speed. These images undoubtedly contain rich visual information that can be useful in many applications, such as visual media creation and data-driven image synthesis. In this article, we focus on methodologies for building a visual object database from a collection of internet images. Such a database is built to contain a large number of high-quality visual objects that can help with various data-driven image applications. Our method is based on dense proposal generation and objectness-based re-ranking. A novel deep convolutional neural network is designed to infer proposal objectness, the probability that a proposal contains an optimally located foreground object. In our work, objectness is quantitatively measured in terms of completeness and fullness, reflecting two complementary features of an optimal proposal: a complete foreground and a relatively small background. Our experiments indicate that object proposals re-ranked according to the output of our network generally achieve higher performance than those produced by other state-of-the-art methods. As a concrete example, a database of over 1.2 million visual objects has been built using the proposed method and has been successfully used in various data-driven image applications.
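To make the two notions in the abstract concrete, the following is a minimal sketch of how completeness and fullness could be computed for a box proposal against a known binary foreground mask, and how proposals could then be re-ranked. The definitions used here are assumptions for illustration and are not taken from the article: completeness is assumed to be the share of foreground pixels that fall inside the box, fullness the share of the box that is foreground, and the ranking score their product. The names `completeness_and_fullness` and `rerank` are hypothetical. In the article itself, objectness is inferred by a convolutional network rather than computed from a ground-truth mask.

```python
from typing import List, Tuple

# A box is (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
# Boxes are assumed to lie inside the image.
Box = Tuple[int, int, int, int]


def completeness_and_fullness(mask: List[List[int]], box: Box) -> Tuple[float, float]:
    """Return (completeness, fullness) of a box against a 0/1 foreground mask.

    Assumed definitions (illustrative, not the article's formulas):
      completeness = foreground pixels inside the box / all foreground pixels
      fullness     = foreground pixels inside the box / box area
    """
    x0, y0, x1, y1 = box
    total_fg = sum(sum(row) for row in mask)
    inside_fg = sum(sum(row[x0:x1]) for row in mask[y0:y1])
    area = max(0, x1 - x0) * max(0, y1 - y0)
    completeness = inside_fg / total_fg if total_fg else 0.0
    fullness = inside_fg / area if area else 0.0
    return completeness, fullness


def rerank(mask: List[List[int]], boxes: List[Box]) -> List[Box]:
    """Sort proposals by an assumed score, completeness * fullness, best first."""
    def score(box: Box) -> float:
        c, f = completeness_and_fullness(mask, box)
        return c * f
    return sorted(boxes, key=score, reverse=True)


if __name__ == "__main__":
    # 6x6 image with a 2x2 foreground object at rows 2-3, columns 2-3.
    mask = [[1 if 2 <= x < 4 and 2 <= y < 4 else 0 for x in range(6)] for y in range(6)]
    tight = (2, 2, 4, 4)    # complete foreground, no background
    partial = (2, 2, 3, 4)  # half of the object, no background
    loose = (0, 0, 6, 6)    # complete foreground, mostly background
    for box in rerank(mask, [loose, partial, tight]):
        print(box, completeness_and_fullness(mask, box))
```

Under these assumed definitions, a box that crops the object scores low on completeness, a box that covers the whole image scores low on fullness, and only a tight box around the whole object scores high on both.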

Supplementary Material

wu.zip (18.03 MB)

Supplemental movie and image files for "Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment".

Cited By

Al-Qatf M, Wang X, Hawbani A, Abdussalam A, Alsamhi S. (2023). Image Captioning With Novel Topics Guidance and Retrieval-Based Topics Re-Weighting. IEEE Transactions on Multimedia 25 (5984-5999). DOI: 10.1109/TMM.2022.3202690. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.1109/TMM.2022.3202690
Ji W, Wang R. (2021). A Multi-instance Multi-label Dual Learning Approach for Video Captioning. ACM Transactions on Multimedia Computing, Communications, and Applications 17:2s (1-18). DOI: 10.1145/3446792. Online publication date: 14-Jun-2021.
https://dl.acm.org/doi/10.1145/3446792
Kunhoth J, Karkar A, Al-Maadeed S, Al-Ali A. (2020). Indoor positioning and wayfinding systems: a survey. Human-centric Computing and Information Sciences 10:1. DOI: 10.1186/s13673-020-00222-0. Online publication date: 2-May-2020.
https://dl.acm.org/doi/10.1186/s13673-020-00222-0

Index Terms

Harvesting Visual Objects from Internet Images via Deep-Learning-Based Objectness Assessment
1. Computing methodologies
  1. Computer graphics
    1. Image manipulation
      1. Image processing

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 15, Issue 3

August 2019

331 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3352586

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 August 2019

Accepted: 01 March 2019

Revised: 01 February 2019

Received: 01 August 2018

Published in TOMM Volume 15, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

EU H2020 project-AniAge
National Natural Science Foundation of China
