research-article

Deep Neighborhood Component Analysis for Visual Similarity Modeling

Published: 18 April 2020

Abstract

Learning effective visual similarity is an essential problem in multimedia research. Despite the promising progress made in recent years, most existing approaches learn visual features and similarities in two separate stages, which inevitably limits their performance. Once useful information has been lost in the feature extraction stage, it can hardly be recovered later. This article proposes a novel end-to-end approach for visual similarity modeling, called deep neighborhood component analysis, which discriminatively trains deep neural networks to jointly learn visual features and similarities. Specifically, we first formulate a metric learning objective that maximizes the intra-class correlations and minimizes the inter-class correlations under the neighborhood component analysis criterion, and then train deep convolutional neural networks to learn a nonlinear mapping that projects visual instances from original feature space to a discriminative and neighborhood-structure-preserving embedding space, thus resulting in better performance. We conducted extensive evaluations on several widely used and challenging datasets, and the impressive results demonstrate the effectiveness of our proposed approach.
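The neighborhood component analysis criterion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classic softmax-over-distances form of NCA applied to embeddings that have already been computed. The article's exact objective and network are not given on this page, so the function name `nca_objective` and the toy data are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np

def nca_objective(embeddings, labels):
    """Mean probability that each point selects a same-class neighbor.

    Illustrative form of the classic NCA criterion; needs at least two points.
    In an end-to-end setup the embeddings would come from a trainable network.
    """
    z = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    # Pairwise squared Euclidean distances in the embedding space.
    diff = z[:, None, :] - z[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # A point never selects itself as its neighbor.
    np.fill_diagonal(d2, np.inf)
    # Row-wise softmax over negative distances, shifted for numerical stability.
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # Sum the neighbor probabilities that fall on same-class points.
    same_class = y[:, None] == y[None, :]
    return float(np.mean(np.sum(p * same_class, axis=1)))

# Toy data (hypothetical): two tight clusters, labeled two different ways.
points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
aligned = np.array([0, 0, 1, 1])   # labels agree with the clusters
crossed = np.array([0, 1, 0, 1])   # labels cut across the clusters

print(nca_objective(points, aligned))
print(nca_objective(points, crossed))
```

The score is close to 1 when same-class points are each other's nearest neighbors and close to 0 when they are not. Training a network to maximize a score of this kind is one way to obtain an embedding space that preserves class neighborhood structure.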

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 11, Issue 3
Survey Paper and Regular Papers
June 2020
286 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3392081
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 April 2020
Accepted: 01 December 2019
Revised: 01 November 2019
Received: 01 June 2019
Published in TIST Volume 11, Issue 3

Author Tags

  1. Metric learning
  2. neighborhood component analysis
  3. visual similarity modeling

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China (NSFC)
  • National Major Research Program of China

Article Metrics

  • Downloads (Last 12 months): 34
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 15 Oct 2024

Cited By

  • (2024) An Efficient and Robust Approach Using Inductive Transfer-Based Ensemble Deep Neural Networks for Kidney Stone Detection. IEEE Access 12, 32894-32910. DOI: 10.1109/ACCESS.2024.3370672. Online publication date: 2024
  • (2023) Causal Interventional Training for Image Recognition. IEEE Transactions on Multimedia 25, 1033-1044. DOI: 10.1109/TMM.2021.3136717. Online publication date: 2023
  • (2023) Application of machine learning in MOFs for gas adsorption and separation. Materials Research Express 10:12, 122001. DOI: 10.1088/2053-1591/ad0c07. Online publication date: 12-Dec-2023
  • (2023) Stochastic Neighbour Embedding. Elements of Dimensionality Reduction and Manifold Learning, 455-477. DOI: 10.1007/978-3-031-10602-6_16. Online publication date: 3-Feb-2023
  • (2022) A survey on social image semantic analysis. Chinese Science Bulletin 68:25, 3368-3384. DOI: 10.1360/TB-2022-0938. Online publication date: 11-Nov-2022
  • (2022) Revisiting Local Descriptor for Improved Few-Shot Classification. ACM Transactions on Multimedia Computing, Communications, and Applications 18:2s, 1. DOI: 10.1145/3511917. Online publication date: 2022
  • (2022) MC‐Net: Learning mutually‐complementary features for image manipulation localization. International Journal of Intelligent Systems 37:5, 3072-3089. DOI: 10.1002/int.22826. Online publication date: 17-Jan-2022
  • (2022) PR‐NET: Progressively‐refined neural network for image manipulation localization. International Journal of Intelligent Systems 37:5, 3166-3188. DOI: 10.1002/int.22822. Online publication date: 14-Jan-2022
  • (2022) Visual feature synthesis with semantic reconstructor for traditional and generalized zero‐shot object classification. International Journal of Intelligent Systems 37:5, 2934-2951. DOI: 10.1002/int.22811. Online publication date: 14-Jan-2022
  • (2021) Route Optimization via Environment-Aware Deep Network and Reinforcement Learning. ACM Transactions on Intelligent Systems and Technology 12:6, 1-21. DOI: 10.1145/3461645. Online publication date: 31-Dec-2021
