Deep Ordinal Hashing With Spatial Attention

Published: 01 May 2019

Abstract

Hashing has attracted increasing research attention in recent years due to its computational and storage efficiency in image retrieval. Recent work has demonstrated the advantage of learning feature representations and hash functions simultaneously with deep neural networks. However, most existing deep hashing methods learn the hash functions directly by encoding global semantic information, while ignoring the local spatial information of images. The loss of local spatial structure creates a performance bottleneck for the hash functions, limiting their applicability to accurate similarity retrieval. In this paper, we propose a novel deep ordinal hashing (DOH) method, which learns ordinal representations to generate ranking-based hash codes by leveraging the ranking structure of the feature space from both local and global views. In particular, to build the ranking structure effectively, we propose to learn the rank correlation space by simultaneously exploiting the local spatial information from a fully convolutional network and the global semantic information from a convolutional neural network. More specifically, an effective spatial attention model is designed to capture the local spatial information by selectively learning well-specified locations closely related to the target objects. In this hashing framework, the local spatial and global semantic nature of images is captured in an end-to-end ranking-to-hashing manner. Experimental results on three widely used datasets demonstrate that the proposed DOH method significantly outperforms state-of-the-art hashing methods.
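
To make the two key ingredients above concrete, the following minimal PyTorch sketch pairs (a) a spatial attention module that scores and pools feature-map locations with (b) a winner-take-all style ordinal coding step that turns the attended features into rank-based codes. Everything here is an illustrative assumption: the 1x1-convolution scorer, the layer shapes, and the helper names SpatialAttention and ordinal_code are hypothetical, not the authors' architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Hypothetical spatial-attention module: scores every location of a
    convolutional feature map with a 1x1 conv, normalizes the scores with
    a softmax over all H*W positions, and pools the map with those weights."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # fmap: (B, C, H, W) feature map from a convolutional backbone
        b, c, h, w = fmap.shape
        attn = self.score(fmap).view(b, 1, h * w)        # per-location scores
        attn = F.softmax(attn, dim=-1).view(b, 1, h, w)  # spatial softmax
        return (fmap * attn).sum(dim=(2, 3))             # attended vector (B, C)

def ordinal_code(features: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Winner-take-all style ranking code: split the feature vector into
    groups and keep only the index of the largest activation per group,
    so the code reflects rank order rather than absolute magnitudes."""
    b, d = features.shape
    assert d % num_groups == 0, "feature dim must divide evenly into groups"
    groups = features.view(b, num_groups, d // num_groups)
    return groups.argmax(dim=-1)                         # (B, num_groups) indices

# Toy usage: attend over a random 512-channel map, then derive a 64-group code.
fmap = torch.randn(2, 512, 7, 7)
attended = SpatialAttention(512)(fmap)
codes = ordinal_code(attended, num_groups=64)            # each entry in [0, 8)

In a full ranking-to-hashing pipeline such as DOH, both branches would be trained end-to-end and the per-group winner indices binarized into hash codes; this sketch only shows how ordinal codes depend on the rank order of activations rather than their magnitudes.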



Published In

IEEE Transactions on Image Processing, Volume 28, Issue 5
May 2019
214 pages

Publisher

IEEE Press

