
Deep Unsupervised Binary Descriptor Learning Through Locality Consistency and Self Distinctiveness

Published: 01 January 2021

Abstract

Deep learning has been successfully applied to learning local feature descriptors in recent years. However, most existing methods are supervised, relying on large numbers of labeled training patches, and are designed to learn real-valued descriptors. In this paper, we propose a novel unsupervised deep learning method for binary descriptor learning. Binary descriptors are much more compact and efficient than real-valued ones, and unsupervised learning is highly desirable in many applications because it is label-free, as annotations are often expensive to obtain. The core idea of our method is to exploit locality consistency in the descriptor space and to distinguish different patches while maintaining the ability to match a patch with its geometrically transformed versions. We also give a theoretical analysis of the role of batch normalization in learning effective binary descriptors. Benefiting from this analysis, there is no need to append two additional losses, one minimizing the quantization error and one maximizing the entropy, to the final learning objective as previous works did, which simplifies network training. Experiments on four benchmarks demonstrate that the proposed method learns binary descriptors that significantly outperform previous unsupervised binary descriptors and are even superior to most supervised ones. In particular, it obtains a 21.2% improvement on the UBC Phototour dataset, and improvements of 19.8%, 26.7%, and 26.0% on the patch verification, matching, and retrieval tasks of the HPatches dataset, respectively, compared to the previous best unsupervised method.
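The batch-normalization observation in the abstract can be illustrated with a plain-Python toy (this is a minimal sketch of the general intuition, not the authors' network or analysis): batch normalization makes each descriptor dimension zero-mean over the batch, so thresholding at zero with a sign function tends to produce roughly balanced bits per dimension, which is exactly what an explicit entropy-maximization loss would otherwise encourage.

```python
import math
import random

def batch_normalize(features):
    """Normalize each dimension to zero mean and unit variance over the batch."""
    n, d = len(features), len(features[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [f[j] for f in features]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        std = math.sqrt(var + 1e-5)
        for i in range(n):
            out[i][j] = (features[i][j] - mean) / std
    return out

def binarize(features):
    """Sign-threshold at zero; zero-mean dimensions yield near-balanced bits."""
    return [[1 if x >= 0 else 0 for x in f] for f in features]

def hamming(a, b):
    """Hamming distance between two binary codes."""
    return sum(x != y for x, y in zip(a, b))

random.seed(0)
# 16 toy 8-D "descriptors" standing in for network outputs
feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
codes = binarize(batch_normalize(feats))
# Count ones per bit position: each column is neither all-0 nor all-1
ones_per_bit = [sum(c[j] for c in codes) for j in range(8)]
print(ones_per_bit)
```

Because each normalized column has zero mean, every bit position contains both zeros and ones across the batch, so no bit collapses to a constant; this is the intuition behind dropping the separate entropy and quantization losses.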




Published In

IEEE Transactions on Multimedia, Volume 23, 2021, 1967 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


Cited By

  • (2024) "Geometry-Enhanced Attentive Multi-View Stereo for Challenging Matching Scenarios," IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 8, pp. 7401–7416, Aug. 2024. DOI: 10.1109/TCSVT.2024.3376692
  • (2024) "AFSRNet: Learning local descriptors with adaptive multi-scale feature fusion and symmetric regularization," Applied Intelligence, vol. 54, no. 7, pp. 5406–5416, Apr. 2024. DOI: 10.1007/s10489-024-05418-w
  • (2023) "Revisiting unsupervised local descriptor learning," in Proc. Thirty-Seventh AAAI Conference on Artificial Intelligence, pp. 2680–2688, Feb. 2023. DOI: 10.1609/aaai.v37i3.25367
  • (2023) "Attention Weighted Local Descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10632–10649, Sep. 2023. DOI: 10.1109/TPAMI.2023.3266728
  • (2023) "Covariant Peak Constraint for Accurate Keypoint Detection and Keypoint-Specific Descriptor Learning," IEEE Transactions on Multimedia, vol. 26, pp. 5383–5397, Nov. 2023. DOI: 10.1109/TMM.2023.3333211
  • (2023) "CNDesc: Cross Normalization for Local Descriptors Learning," IEEE Transactions on Multimedia, vol. 25, pp. 3989–4001, Jan. 2023. DOI: 10.1109/TMM.2022.3169331
  • (2023) "Seeing Through Darkness: Visual Localization at Night via Weakly Supervised Learning of Domain Invariant Features," IEEE Transactions on Multimedia, vol. 25, pp. 1713–1726, Jan. 2023. DOI: 10.1109/TMM.2022.3154165
  • (2022) "Progressive Unsupervised Learning of Local Descriptors," in Proc. 30th ACM International Conference on Multimedia, pp. 2371–2379, Oct. 2022. DOI: 10.1145/3503161.3547792
