Deep Ordinal Hashing With Spatial Attention

Published: 01 May 2019

Abstract

Hashing has attracted increasing research attention in recent years due to its computational and storage efficiency in image retrieval. Recent work has demonstrated the advantage of learning feature representations and hash functions simultaneously with deep neural networks. However, most existing deep hashing methods learn the hash functions directly by encoding global semantic information, while ignoring the local spatial information of images. The loss of local spatial structure creates a performance bottleneck for the hash functions, limiting their applicability to accurate similarity retrieval. In this paper, we propose a novel deep ordinal hashing (DOH) method, which learns ordinal representations to generate ranking-based hash codes by leveraging the ranking structure of the feature space from both local and global views. In particular, to build the ranking structure effectively, we propose to learn the rank correlation space by simultaneously exploiting the local spatial information from a fully convolutional network and the global semantic information from a convolutional neural network. More specifically, an effective spatial attention model is designed to capture the local spatial information by selectively learning well-specified locations closely related to the target objects. In this hashing framework, the local spatial and global semantic nature of images is captured in an end-to-end ranking-to-hashing manner. Experimental results on three widely used datasets demonstrate that the proposed DOH method significantly outperforms state-of-the-art hashing methods.
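
To make the two key ingredients above concrete, the following minimal PyTorch sketch pairs (a) a spatial attention module that scores and pools feature-map locations with (b) a winner-take-all style ordinal coding step that turns the attended features into rank-based codes. Everything here is an illustrative assumption: the 1x1-convolution scorer, the layer shapes, and the helper names SpatialAttention and ordinal_code are hypothetical, not the authors' architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Hypothetical spatial-attention module: scores every location of a
    convolutional feature map with a 1x1 conv, normalizes the scores with
    a softmax over all H*W positions, and pools the map with those weights."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # fmap: (B, C, H, W) feature map from a convolutional backbone
        b, c, h, w = fmap.shape
        attn = self.score(fmap).view(b, 1, h * w)        # per-location scores
        attn = F.softmax(attn, dim=-1).view(b, 1, h, w)  # spatial softmax
        return (fmap * attn).sum(dim=(2, 3))             # attended vector (B, C)

def ordinal_code(features: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Winner-take-all style ranking code: split the feature vector into
    groups and keep only the index of the largest activation per group,
    so the code reflects rank order rather than absolute magnitudes."""
    b, d = features.shape
    assert d % num_groups == 0, "feature dim must divide evenly into groups"
    groups = features.view(b, num_groups, d // num_groups)
    return groups.argmax(dim=-1)                         # (B, num_groups) indices

# Toy usage: attend over a random 512-channel map, then derive a 64-group code.
fmap = torch.randn(2, 512, 7, 7)
attended = SpatialAttention(512)(fmap)
codes = ordinal_code(attended, num_groups=64)            # each entry in [0, 8)

In a full ranking-to-hashing pipeline such as DOH, both branches would be trained end-to-end and the per-group winner indices binarized into hash codes; this sketch only shows how ordinal codes depend on the rank order of activations rather than their magnitudes.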



Published In

IEEE Transactions on Image Processing, Volume 28, Issue 5
May 2019
214 pages

Publisher

IEEE Press

