
Deep Unsupervised Binary Descriptor Learning Through Locality Consistency and Self Distinctiveness

Published: 01 January 2021

Abstract

Deep learning has been successfully applied to learning local feature descriptors in recent years. However, most existing methods are supervised, relying on large numbers of labeled training patches, and are designed to learn real-valued descriptors. In this paper, we propose a novel unsupervised deep learning method for binary descriptor learning. Binary descriptors are much more compact and efficient than real-valued ones, and unsupervised learning is highly desirable in many applications because it is label-free, as annotations are often expensive to obtain. The core idea of our method is to exploit locality consistency in the descriptor space and to distinguish different patches while maintaining the ability to match a patch with its geometrically transformed versions. We also give a theoretical analysis of the role of batch normalization in learning effective binary descriptors. Benefiting from this analysis, there is no need to append two additional losses, one minimizing the quantization error and one maximizing the entropy, to the final learning objective as previous works did, which simplifies network training. Experiments on four benchmarks demonstrate that the proposed method learns binary descriptors that significantly outperform previous unsupervised binary descriptors and are even superior to most supervised ones. In particular, it obtains a 21.2% improvement on the UBC Phototour dataset, and improvements of 19.8%, 26.7%, and 26.0% on the patch verification, matching, and retrieval tasks of the HPatches dataset, respectively, compared to the previous best unsupervised method.
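The batch-normalization observation in the abstract can be illustrated with a plain-Python toy (this is a minimal sketch of the general intuition, not the authors' network or analysis): batch normalization makes each descriptor dimension zero-mean over the batch, so thresholding at zero with a sign function tends to produce roughly balanced bits per dimension, which is exactly what an explicit entropy-maximization loss would otherwise encourage.

```python
import math
import random

def batch_normalize(features):
    """Normalize each dimension to zero mean and unit variance over the batch."""
    n, d = len(features), len(features[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [f[j] for f in features]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var + 1e-5)
        for i in range(n):
            out[i][j] = (features[i][j] - mean) / std
    return out

def binarize(features):
    """Sign-threshold at zero; zero-mean dimensions yield near-balanced bits."""
    return [[1 if x >= 0 else 0 for x in f] for f in features]

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return sum(x != y for x, y in zip(a, b))

random.seed(0)
# 16 toy 8-D "descriptors" standing in for network outputs
feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
codes = binarize(batch_normalize(feats))
# Count ones per bit position: each column is neither all-0 nor all-1
ones_per_bit = [sum(c[j] for c in codes) for j in range(8)]
print(ones_per_bit)
```

Because each normalized column has zero mean, every bit position contains both zeros and ones across the batch, so no bit collapses to a constant; this is the intuition behind dropping the separate entropy and quantization losses.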




Published In

IEEE Transactions on Multimedia, Volume 23, 2021, 1967 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


Cited By

  • (2024) "Geometry-Enhanced Attentive Multi-View Stereo for Challenging Matching Scenarios," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 8, pp. 7401–7416, Aug. 2024. DOI: 10.1109/TCSVT.2024.3376692
  • (2024) "AFSRNet: Learning local descriptors with adaptive multi-scale feature fusion and symmetric regularization," Applied Intelligence, vol. 54, no. 7, pp. 5406–5416, Apr. 2024. DOI: 10.1007/s10489-024-05418-w
  • (2023) "Revisiting unsupervised local descriptor learning," in Proc. Thirty-Seventh AAAI Conference on Artificial Intelligence, pp. 2680–2688, Feb. 2023. DOI: 10.1609/aaai.v37i3.25367
  • (2023) "Attention Weighted Local Descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10632–10649, Sep. 2023. DOI: 10.1109/TPAMI.2023.3266728
  • (2023) "Covariant Peak Constraint for Accurate Keypoint Detection and Keypoint-Specific Descriptor Learning," IEEE Transactions on Multimedia, vol. 26, pp. 5383–5397, Nov. 2023. DOI: 10.1109/TMM.2023.3333211
  • (2023) "CNDesc: Cross Normalization for Local Descriptors Learning," IEEE Transactions on Multimedia, vol. 25, pp. 3989–4001, Jan. 2023. DOI: 10.1109/TMM.2022.3169331
  • (2023) "Seeing Through Darkness: Visual Localization at Night via Weakly Supervised Learning of Domain Invariant Features," IEEE Transactions on Multimedia, vol. 25, pp. 1713–1726, Jan. 2023. DOI: 10.1109/TMM.2022.3154165
  • (2022) "Progressive Unsupervised Learning of Local Descriptors," in Proc. 30th ACM International Conference on Multimedia, pp. 2371–2379, Oct. 2022. DOI: 10.1145/3503161.3547792
