research-article

Progressive Unsupervised Learning of Local Descriptors

Authors:

Hua Huang

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 2371 - 2379

https://doi.org/10.1145/3503161.3547792

Published: 10 October 2022

Abstract

Training tuple construction is a crucial step in unsupervised local descriptor learning. Existing approaches rely on heuristics for this step, so they suffer from inaccurate supervision signals and struggle to reach the desired performance. To address this problem, this work presents DescPro, an unsupervised approach that progressively explores training tuples that are both accurate and informative for model optimization, without using heuristics. Specifically, DescPro consists of a Robust Cluster Assignment (RCA) method, which infers pairwise relationships by clustering reliable samples with the increasingly powerful CNN model, and a Similarity-weighted Positive Sampling (SPS) strategy, which selects informative positive pairs for training tuple construction. Extensive experimental results show that, with these two modules working together, DescPro can outperform state-of-the-art unsupervised local descriptors and even rival competitive supervised ones on standard benchmarks.
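As a rough illustration of the two ideas named in the abstract, the sketch below first keeps only confidently clustered samples and then draws a positive partner with similarity-dependent weights. It is not the authors' implementation. The abstract does not say how reliability is judged or how similarity determines the sampling weights, so the similarity threshold, the softmax over negative cosine similarity (which assumes that less similar positives are the more informative ones), the `temperature` parameter, and all function names are assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reliable_members(descriptors, centroids, threshold):
    """Assign each descriptor to its most similar centroid and keep only
    assignments whose similarity reaches `threshold`.

    ASSUMPTION: this thresholding rule is a hypothetical stand-in for the
    reliability criterion, which the abstract does not describe.
    """
    clusters = {}
    for idx, d in enumerate(descriptors):
        sims = [cosine(d, c) for c in centroids]
        best = max(range(len(centroids)), key=sims.__getitem__)
        if sims[best] >= threshold:
            clusters.setdefault(best, []).append(idx)
    return clusters


def positive_weights(anchor, candidates, temperature=0.5):
    """Softmax weights over candidate positives.

    ASSUMPTION: lower similarity to the anchor gets a higher weight; the
    actual weighting used by SPS is not given in the abstract.
    """
    logits = [-cosine(anchor, c) / temperature for c in candidates]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_positive(anchor, candidates, rng, temperature=0.5):
    """Draw the index of one candidate positive according to the weights."""
    weights = positive_weights(anchor, candidates, temperature)
    return rng.choices(range(len(candidates)), weights=weights, k=1)[0]


# Toy 2-D "descriptors"; the last one sits between the two centroids.
descriptors = [[1.0, 0.0], [0.9, 0.1], [0.6, 0.4], [0.0, 1.0], [0.5, 0.5]]
centroids = [[1.0, 0.0], [0.0, 1.0]]

# The ambiguous descriptor (index 4) falls below the threshold and is dropped.
clusters = reliable_members(descriptors, centroids, threshold=0.8)

# Pick a positive for descriptor 0 among the other members of its cluster.
anchor = descriptors[0]
candidate_ids = [i for i in clusters[0] if i != 0]
candidates = [descriptors[i] for i in candidate_ids]
weights = positive_weights(anchor, candidates)
chosen = candidate_ids[sample_positive(anchor, candidates, random.Random(0))]
```

In a progressive scheme, the centroids and descriptors would be recomputed as the network improves, so the set of reliable samples can grow over training rounds. The toy example fixes them only to keep the sketch self-contained.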

Index Terms

Progressive Unsupervised Learning of Local Descriptors
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
  2. Machine learning
    1. Machine learning approaches

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

October 2022

7537 pages

ISBN: 9781450392037

DOI: 10.1145/3503161

General Chairs:
João Magalhães (NOVA University of Lisbon, Portugal)
Alberto del Bimbo (University of Florence, Italy)
Shin'ichi Satoh (National Institute of Informatics, Japan)
Nicu Sebe (University of Trento, Italy)

Program Chairs:
Xavier Alameda-Pineda (Inria, Grenoble, France)
Qin Jin (Renmin University of China, China)
Vincent Oria (New Jersey Institute of Technology, USA)
Laura Toni (University College London, UK)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
National Key R&D Program of China

Conference

MM '22: The 30th ACM International Conference on Multimedia

October 10 - 14, 2022

Lisboa, Portugal

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Article Metrics (reflects downloads up to 10 Nov 2024)

Total Citations: 0
Total Downloads: 142
Downloads (Last 12 months): 51
Downloads (Last 6 weeks): 8
