Object and spatial discrimination makes weakly supervised local feature better

Authors:

Feng Yang

Volume 180, Issue C

https://doi.org/10.1016/j.neunet.2024.106697

Published: 01 December 2024

Abstract

Local feature extraction plays a crucial role in numerous critical visual tasks. However, there remains room for improvement in both descriptors and keypoints, particularly regarding the discriminative power of descriptors and the localization precision of keypoints. To address these challenges, this study introduces a novel local feature extraction pipeline named OSDFeat (Object and Spatial Discrimination Feature). OSDFeat employs a decoupling strategy, training the descriptor and detection networks independently. Inspired by semantic correspondence, we propose an Object and Spatial Discrimination ResUNet (OSD-ResUNet). OSD-ResUNet captures features from the feature map that differentiate object appearance and spatial context, thus enhancing descriptor performance. To further improve the discriminative capability of descriptors, we propose a Discrimination Information Retained Normalization module (DIRN). DIRN complementarily integrates spatial-wise normalization and channel-wise normalization, yielding descriptors that are more distinguishable and informative. In the detection network, we propose a Cross Saliency Pooling module (CSP). CSP employs a cross-shaped kernel to aggregate long-range context in both vertical and horizontal dimensions. By enhancing the saliency of keypoints, CSP enables the detection network to effectively utilize descriptor information and achieve more precise localization of keypoints. OSDFeat achieves a Mean Matching Accuracy of 79.4% on the local feature matching task, an improvement of 1.9% over the previous best local feature extraction methods and a new state-of-the-art result. Additionally, OSDFeat achieves competitive results in visual localization and 3D reconstruction. The results of this study indicate that object and spatial discrimination can improve the accuracy and robustness of local features, even in challenging environments. The code is available at https://github.com/pandaandyy/OSDFeat.
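For readers unfamiliar with the Mean Matching Accuracy (MMA) metric reported above, the following is a minimal sketch of how it is commonly computed for image pairs related by a known homography. It is not taken from the OSDFeat code base: the function names, the use of `<=` at the threshold, and the 1 to 10 pixel threshold range are assumptions made for illustration, and the abstract does not state which thresholds the 79.4% figure uses.

```python
import numpy as np


def matching_accuracy(kpts_a, kpts_b, H, threshold=3.0):
    """Fraction of matches whose reprojection error is at most `threshold` pixels.

    kpts_a, kpts_b: (N, 2) arrays of matched (x, y) keypoints in images A and B.
    H: (3, 3) ground-truth homography mapping image A onto image B.
    """
    if len(kpts_a) == 0:
        return 0.0
    ones = np.ones((len(kpts_a), 1))
    projected = np.hstack([kpts_a, ones]) @ H.T       # homogeneous projection
    projected = projected[:, :2] / projected[:, 2:3]  # back to pixel coordinates
    errors = np.linalg.norm(projected - kpts_b, axis=1)
    return float(np.mean(errors <= threshold))


def mean_matching_accuracy(pairs, thresholds=range(1, 11)):
    """Average matching accuracy over image pairs and pixel thresholds.

    pairs: iterable of (kpts_a, kpts_b, H) tuples, one per image pair.
    thresholds: assumed evaluation range in pixels (hypothetical default).
    """
    pairs = list(pairs)
    per_threshold = [
        np.mean([matching_accuracy(a, b, H, t) for a, b, H in pairs])
        for t in thresholds
    ]
    return float(np.mean(per_threshold))
```

A match counts as correct only when the keypoint from image A, warped by the ground-truth homography, lands within the threshold distance of its matched keypoint in image B.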

Highlights

• We propose OSD-ResUNet, which enhances descriptor learning by incorporating object appearance and spatial context.

• We propose DIRN, which combines spatial-wise and channel-wise normalization to preserve discriminative information.

• We propose CSP, which enhances keypoint saliency by aggregating global and local information with long-range dependencies.

• We propose OSDFeat, a local feature extraction pipeline, achieving state-of-the-art results on HPatches and competitive results on the Aachen Day-Night and ETH benchmarks.
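The two ideas behind CSP and DIRN can be illustrated with a rough NumPy sketch. This is not the authors' implementation, which is learned and lives in the linked repository. Everything here is an assumption made for illustration: the function names, the use of plain averaging over a full row and column as the "cross-shaped kernel", the reading of "channel-wise" as normalizing across channels at each pixel and "spatial-wise" as normalizing across pixels within each channel, and the fusion by summation.

```python
import numpy as np


def cross_pool(score_map):
    """Cross-shaped average pooling over an (H, W) score map.

    Each output pixel is the mean of its full-row average and its
    full-column average, so it sees context along both axes.
    """
    row_mean = score_map.mean(axis=1, keepdims=True)  # (H, 1): horizontal context
    col_mean = score_map.mean(axis=0, keepdims=True)  # (1, W): vertical context
    return 0.5 * (row_mean + col_mean)                # broadcasts to (H, W)


def l2_normalize(x, axis, eps=1e-8):
    """Scale x to (approximately) unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def combined_normalization(features):
    """Hypothetical fusion of two normalizations of a (C, H, W) feature map."""
    c, h, w = features.shape
    # Assumption: "channel-wise" = unit norm across channels at each pixel.
    channel_wise = l2_normalize(features, axis=0)
    # Assumption: "spatial-wise" = unit norm across all pixels within each channel.
    spatial_wise = l2_normalize(features.reshape(c, -1), axis=1).reshape(c, h, w)
    # Assumption: fuse by summation, then renormalize each per-pixel descriptor.
    return l2_normalize(channel_wise + spatial_wise, axis=0)
```

The point of the sketch is only that the two normalizations discard different information (per-pixel magnitude versus per-channel magnitude), which is why combining them may retain more discriminative detail than either alone.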
