research-article

Efficient Multi-modal Hashing with Online Query Adaption for Multimedia Retrieval

Authors:

Huaxiang Zhang

ACM Transactions on Information Systems (TOIS), Volume 40, Issue 2

Article No.: 41, Pages 1 - 36

https://doi.org/10.1145/3477180

Published: 27 September 2021

Abstract

Multi-modal hashing supports efficient multimedia retrieval. However, existing methods still suffer from two problems: (1) Fixed multi-modal fusion. They combine multi-modal features with fixed weights for hash learning, and so cannot adapt to variations in online streaming multimedia content. (2) Binary optimization challenge. To generate binary hash codes, existing methods adopt either two-step relaxed optimization, which causes significant quantization errors, or direct discrete optimization, which incurs considerable computation and storage costs. To address these problems, we first propose a Supervised Multi-modal Hashing with Online Query-adaption method. A self-weighted fusion strategy is designed to adaptively preserve the multi-modal features in hash codes by exploiting their complementarity. In addition, the hash codes are learned efficiently under the supervision of pair-wise semantic labels, which enhances their discriminative capability while avoiding the challenging symmetric similarity matrix factorization. Further, we propose an efficient Unsupervised Multi-modal Hashing with Online Query-adaption method with an adaptive multi-modal quantization strategy. The hash codes are learned directly, without relying on specific objective formulations. Finally, in both methods, we design a parameter-free online hashing module to adaptively capture query variations at the online retrieval stage. Experiments validate the superiority of our proposed methods.
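
To make the ideas of weighted fusion and query adaption more concrete, here is a minimal sketch. It is not the method proposed in this article. It assumes random linear projections per modality, sign binarization, and an illustrative reweighting loop that gives more weight to modalities whose projections agree with the current code. All function names (`project`, `adaptive_hash`, `rank`), the feature sizes, and the reweighting rule are hypothetical choices made for illustration only.

```python
import math
import random

def project(matrix, vector):
    """Multiply an (n_bits x dim) projection matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def to_code(values):
    """Binarize real values into a +1/-1 hash code by their sign."""
    return [1 if v >= 0 else -1 for v in values]

def fuse(projected, weights):
    """Weighted sum of the per-modality projections, one value per bit."""
    n_bits = len(projected[0])
    return [sum(w * p[k] for w, p in zip(weights, projected)) for k in range(n_bits)]

def adaptive_hash(features, projections, n_iter=3):
    """Hash one item while adapting the modality weights to that item.

    Assumed rule (illustrative, not from the article): each weight is
    inversely proportional to the distance between that modality's
    projection and the current code, then normalized to sum to 1.
    """
    projected = [project(P, x) for P, x in zip(projections, features)]
    n_modalities = len(projected)
    weights = [1.0 / n_modalities] * n_modalities  # start from uniform fusion
    code = to_code(fuse(projected, weights))
    for _ in range(n_iter):
        residuals = [
            math.sqrt(sum((c - v) ** 2 for c, v in zip(code, p)))
            for p in projected
        ]
        inverse = [1.0 / (r + 1e-12) for r in residuals]
        total = sum(inverse)
        weights = [v / total for v in inverse]
        code = to_code(fuse(projected, weights))
    return code, weights

def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))

def rank(query_code, database_codes):
    """Database indices sorted by Hamming distance to the query code."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))

# Toy data: two modalities (hypothetical sizes 4 and 3), 8-bit codes.
random.seed(0)
N_BITS = 8
DIMS = [4, 3]
projections = [
    [[random.gauss(0, 1) for _ in range(d)] for _ in range(N_BITS)]
    for d in DIMS
]
database = [
    [[random.gauss(0, 1) for _ in range(d)] for d in DIMS]
    for _ in range(5)
]
db_codes = [adaptive_hash(item, projections)[0] for item in database]

# A query identical to database item 2 receives the same code.
query_code, query_weights = adaptive_hash(database[2], projections)
ranking = rank(query_code, db_codes)
print("query weights:", query_weights)
print("ranking:", ranking)
```

In this sketch the weights are recomputed for every query without any tunable hyper-parameter, which mirrors the general idea of adapting fusion at query time. The actual objective functions and update rules are given in the full article, not in the abstract.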

Index Terms

Efficient Multi-modal Hashing with Online Query Adaption for Multimedia Retrieval
1. Information systems
  1. Information retrieval

Published In

ACM Transactions on Information Systems Volume 40, Issue 2

April 2022

587 pages

ISSN:1046-8188

EISSN:1558-2868

DOI:10.1145/3484931

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 September 2021

Accepted: 01 July 2021

Revised: 01 February 2021

Received: 01 October 2020

Published in TOIS Volume 40, Issue 2

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China
Natural Science Foundation of Shandong, China
Major Fundamental Research Project of Shandong, China
Youth Innovation Project of Shandong Universities, China
Taishan Scholar Project of Shandong, China

Article Metrics

Total Citations: 9
Total Downloads: 433
Downloads (Last 12 months): 74
Downloads (Last 6 weeks): 4

Reflects downloads up to 01 Sep 2024

Cited By

Lu X, Liu L, Ning L, Zhang L, Mu S, Zhang H (2024). Multi-Facet Weighted Asymmetric Multi-Modal Hashing Based on Latent Semantic Distribution. IEEE Transactions on Multimedia 26, 7307-7320. Online publication date: 2024.
https://doi.org/10.1109/TMM.2024.3363664
Zhang T, Lu X, Zhang H, Nie X, Yin Y, Shen J (2024). Relational Network via Cascade CRF for Video Language Grounding. IEEE Transactions on Multimedia 26, 8297-8311. Online publication date: 2024.
https://doi.org/10.1109/TMM.2023.3303712
Sun Y, Dai J, Ren Z, Li Q, Peng D (2024). Relaxed Energy Preserving Hashing for Image Retrieval. IEEE Transactions on Intelligent Transportation Systems 25:7, 7388-7400. Online publication date: Jul-2024.
https://doi.org/10.1109/TITS.2024.3351841
Cai K (2023). Construction of law and economics litigation service platform based on multimedia retrieval. Applied Mathematics and Nonlinear Sciences 8:2, 2913-2926. Online publication date: 5-Jul-2023.
https://doi.org/10.2478/amns.2023.2.00008
Nie X, Shi Y, Meng Z, Huang J, Guan W, Yin Y (2023). Complex Scenario Image Retrieval via Deep Similarity-aware Hashing. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4, 1-24. Online publication date: 11-Dec-2023.
https://dl.acm.org/doi/10.1145/3624016
Gu X, Sun Y, Ni F, Chen S, Wang X, Song R, Li B, Cao X, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M (2023). TeViS: Translating Text Synopses to Video Storyboards. Proceedings of the 31st ACM International Conference on Multimedia, 4968-4979. Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3612417
Liu X, Yi J, Cheung Y, Xu X, Cui Z (2023). OMGH: Online Manifold-Guided Hashing for Flexible Cross-Modal Retrieval. IEEE Transactions on Multimedia 25, 3811-3824. Online publication date: 1-Jan-2023.
https://dl.acm.org/doi/10.1109/TMM.2022.3166668
Zhu L, Zheng C, Guan W, Li J, Yang Y, Shen H (2023). Multi-Modal Hashing for Efficient Multimedia Retrieval: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:1, 239-260. Online publication date: 5-Jun-2023.
https://dl.acm.org/doi/10.1109/TKDE.2023.3282921
Yan M, Wen Y, Shi Q, Tian X (2022). A Multimodal Retrieval and Ranking Method for Scientific Documents Based on HFS and XLNet. Scientific Programming 2022. Online publication date: 4-Jan-2022.
https://dl.acm.org/doi/10.1155/2022/5373531
Li Y, Yang Y, Wang J, Peng S, Yao T (2022). Coupled Local and Global Semantic Alignment for Image-Text Matching. 2022 12th International Conference on Information Technology in Medicine and Education (ITME), 497-501. Online publication date: Nov-2022.
https://doi.org/10.1109/ITME56794.2022.00109
