research-article

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study

Authors: Steven Chu Hong Hoi, Yongdong Zhang, Jintao Li

MM '14: Proceedings of the 22nd ACM international conference on Multimedia

Pages 157 - 166

https://doi.org/10.1145/2647868.2654948

Published: 03 November 2014

Abstract

Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite decades of extensive research, it remains one of the most challenging open problems and considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various techniques, machine learning has been actively investigated as a possible direction for bridging the semantic gap in the long term. Inspired by recent successes of deep learning techniques in computer vision and other applications, in this paper we attempt to address an open problem: whether deep learning offers hope for bridging the semantic gap in CBIR, and how much improvement in CBIR tasks can be achieved by exploring state-of-the-art deep learning techniques for learning feature representations and similarity measures. Specifically, we investigate a framework of deep learning applied to CBIR tasks through an extensive set of empirical studies, examining a state-of-the-art deep learning method (Convolutional Neural Networks) under varied settings. From our empirical studies, we find some encouraging results and summarize some important insights for future research.
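
To make the retrieval setting in the abstract concrete, the following is a minimal sketch of ranking database images by a similarity measure over feature vectors. It is an illustration only, not the paper's method: the 4-dimensional vectors are hypothetical stand-ins for CNN activations, cosine similarity is an assumed choice of similarity measure, and the names `cosine_similarity`, `retrieve`, `database`, and `query` are invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=3):
    """Rank database images by similarity to the query, most similar first."""
    scored = [
        (image_id, cosine_similarity(query, features))
        for image_id, features in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical features; in a real system these would come from a trained
# network (for example, activations of a late CNN layer).
database = {
    "cat_1": [0.9, 0.1, 0.0, 0.2],
    "cat_2": [0.8, 0.2, 0.1, 0.1],
    "car_1": [0.0, 0.9, 0.8, 0.1],
}
query = [1.0, 0.0, 0.0, 0.1]

results = retrieve(query, database, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

In this framing, a better feature representation changes the vectors and a learned similarity measure replaces `cosine_similarity`; the ranking step itself stays the same.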

Index Terms

Deep Learning for Content-Based Image Retrieval: A Comprehensive Study
1. Computing methodologies
2. Information systems
  1. Information retrieval
    1. Document representation

Published In

MM '14: Proceedings of the 22nd ACM international conference on Multimedia

November 2014, 1310 pages
ISBN: 9781450330633
DOI: 10.1145/2647868

General Chairs: Kien A. Hua (University of Central Florida, USA), Yong Rui (Microsoft Research, China), Ralf Steinmetz (Technische Universität Darmstadt, Germany)

Program Chairs: Alan Hanjalic (Delft University of Technology, Netherlands), Apostol (Paul) Natsev (Google, USA), Wenwu Zhu (Tsinghua University, China)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '14: 2014 ACM Multimedia Conference
Sponsor: SIGMM
November 3 - 7, 2014
Orlando, Florida, USA

Acceptance Rates

MM '14 Paper Acceptance Rate 55 of 286 submissions, 19%;

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics

Total citations: 607
Total downloads: 6,712
Downloads (last 12 months): 239
Downloads (last 6 weeks): 20

Reflects downloads up to 09 Aug 2024
