DOI: 10.1145/3503161.3547792
research-article

Progressive Unsupervised Learning of Local Descriptors

Published: 10 October 2022

Abstract

Training tuple construction is a crucial step in unsupervised local descriptor learning. Existing approaches perform this step relying on heuristics, which suffer from inaccurate supervision signals and struggle to achieve the desired performance. To address the problem, this work presents DescPro, an unsupervised approach that progressively explores both accurate and informative training tuples for model optimization without using heuristics. Specifically, DescPro consists of a Robust Cluster Assignment (RCA) method to infer pairwise relationships by clustering reliable samples with the increasingly powerful CNN model, and a Similarity-weighted Positive Sampling (SPS) strategy to select informative positive pairs for training tuple construction. Extensive experimental results show that, with the collaboration of the above two modules, DescPro can outperform state-of-the-art unsupervised local descriptors and even rival competitive supervised ones on standard benchmarks.
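The two modules named in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation: the abstract does not define "reliable" or give the sampling weights, so the sketch assumes that a sample is reliable when its cosine similarity to its assigned cluster centroid reaches a threshold (`min_sim`), and that less similar same-cluster samples are the more informative positives, weighted by `exp(-sim / temperature)`. The function names, `min_sim`, and `temperature` are all hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reliable_indices(descs, labels, centroids, min_sim=0.8):
    """Return indices of samples that lie close to their assigned centroid.

    Assumption (not from the paper): "reliable" means that the cosine
    similarity to the assigned centroid is at least `min_sim`.
    """
    return [
        i
        for i, (desc, label) in enumerate(zip(descs, labels))
        if cosine(desc, centroids[label]) >= min_sim
    ]


def sample_positive(anchor, descs, labels, candidates, temperature=0.1, rng=random):
    """Draw one positive index for `anchor` from same-cluster candidates.

    Assumption (not from the paper): less similar positives are treated as
    more informative, so each candidate gets weight exp(-sim / temperature).
    Returns None when the anchor has no same-cluster candidate.
    """
    pool = [j for j in candidates if j != anchor and labels[j] == labels[anchor]]
    if not pool:
        return None
    weights = [
        math.exp(-cosine(descs[anchor], descs[j]) / temperature) for j in pool
    ]
    return rng.choices(pool, weights=weights, k=1)[0]


# Toy data: five 2-D descriptors assigned to two clusters.
descs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
labels = [0, 0, 1, 1, 0]
centroids = [[1.0, 0.0], [0.0, 1.0]]

# Sample 4 sits between the clusters and is filtered out as unreliable.
reliable = reliable_indices(descs, labels, centroids)
positive_for_0 = sample_positive(0, descs, labels, reliable)
```

In a progressive scheme such as the one the abstract describes, cluster assignments would be recomputed as the descriptor network improves, so the reliable set and the positives drawn from it would change between training rounds.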

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural networks
  2. local descriptor learning
  3. unsupervised learning

Qualifiers

  • Research-article

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 142
  • Downloads (Last 12 months): 51
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 10 Nov 2024
