Deep Neighborhood Component Analysis for Visual Similarity Modeling

Authors:

Richang Hong

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 11, Issue 3

Article No.: 29, Pages 1 - 15

https://doi.org/10.1145/3375787

Published: 18 April 2020

Abstract

Learning effective visual similarity is an essential problem in multimedia research. Despite the promising progress made in recent years, most existing approaches learn visual features and similarities in two separate stages, which limits their performance: once useful information has been lost in the feature extraction stage, it can hardly be recovered later. This article proposes a novel end-to-end approach for visual similarity modeling, called deep neighborhood component analysis, which discriminatively trains deep neural networks to jointly learn visual features and similarities. Specifically, we first formulate a metric learning objective that maximizes the intra-class correlations and minimizes the inter-class correlations under the neighborhood component analysis criterion. We then train deep convolutional neural networks to learn a nonlinear mapping that projects visual instances from the original feature space to a discriminative, neighborhood-structure-preserving embedding space. We conducted extensive evaluations on several widely used and challenging datasets, and the results demonstrate the effectiveness of the proposed approach.
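To make the neighborhood component analysis criterion concrete, the following is a minimal sketch of the classic NCA objective (Goldberger et al., 2005) evaluated on already-computed embeddings. It is not the article's exact formulation: the squared Euclidean distance, the function name `nca_objective`, and the plain-list inputs are assumptions chosen for illustration, and the deep network that would produce the embeddings is omitted.

```python
import math


def nca_objective(embeddings, labels):
    """Mean leave-one-out probability that a point picks a same-class neighbor.

    Assumption: classic NCA with squared Euclidean distance in the embedding
    space. Requires at least two points.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Squared Euclidean distance from point i to every point.
        dists = [
            sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            for j in range(n)
        ]
        # Softmax over negative distances, excluding i itself (p_ii = 0).
        # Subtracting the smallest distance keeps the exponentials stable.
        d_min = min(d for j, d in enumerate(dists) if j != i)
        weights = [
            0.0 if j == i else math.exp(-(dists[j] - d_min)) for j in range(n)
        ]
        norm = sum(weights)
        # p_i: probability that point i selects a neighbor with the same label.
        p_i = sum(w for j, w in enumerate(weights) if labels[j] == labels[i]) / norm
        total += p_i
    return total / n
```

Under these assumptions, the value approaches 1 when same-class points sit close together and other classes sit far away, and it approaches 0 when each point's nearest neighbors belong to other classes. An end-to-end variant would maximize this quantity (or minimize its negative) with respect to the network parameters that produce the embeddings.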

Index Terms

Deep Neighborhood Component Analysis for Visual Similarity Modeling
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection

Published In

ACM Transactions on Intelligent Systems and Technology Volume 11, Issue 3

Survey Paper and Regular Papers

June 2020

286 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3392081

Editor:
Yu Zheng
JD Finance, China

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 April 2020

Accepted: 01 December 2019

Revised: 01 November 2019

Received: 01 June 2019

Published in TIST Volume 11, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China (NSFC)
National Major Research Program of China
