research-article

Efficient Multi-modal Hashing with Online Query Adaption for Multimedia Retrieval

Published: 27 September 2021

Abstract

Multi-modal hashing supports efficient multimedia retrieval. However, existing methods still suffer from two problems: (1) Fixed multi-modal fusion. They combine the multi-modal features with fixed weights for hash learning and therefore cannot adapt to variations in online streaming multimedia content. (2) Binary optimization challenge. To generate binary hash codes, existing methods adopt either two-step relaxed optimization, which causes significant quantization errors, or direct discrete optimization, which incurs considerable computation and storage cost. To address these problems, we first propose a Supervised Multi-modal Hashing with Online Query-adaption method. A self-weighted fusion strategy adaptively preserves the multi-modal features in the hash codes by exploiting their complementarity. In addition, the hash codes are learned efficiently under the supervision of pair-wise semantic labels, which enhances their discriminative capability while avoiding the challenging symmetric similarity matrix factorization. Further, we propose an efficient Unsupervised Multi-modal Hashing with Online Query-adaption method with an adaptive multi-modal quantization strategy. The hash codes are learned directly, without relying on a specific objective formulation. Finally, in both methods, we design a parameter-free online hashing module that adaptively captures query variations at the online retrieval stage. Experiments validate the superiority of our proposed methods.
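
The two problems named in the abstract can be made concrete with a toy example. The sketch below is not the method proposed in the paper: the feature values, the projection matrix `W`, the modality weights, and all function names are assumptions invented for illustration. It shows a generic weighted-fusion hashing pipeline, the quantization error introduced when a relaxed real-valued code is rounded with the sign function, and how changing the modality weights for the same query changes the resulting hash code.

```python
# Illustrative sketch only: names, weights, and the projection matrix are
# assumptions made up for this example, not values from the paper.

def fuse(features, weights):
    """Concatenate per-modality feature vectors, each scaled by its weight."""
    fused = []
    for vec, w in zip(features, weights):
        fused.extend(w * x for x in vec)
    return fused

def project(vec, matrix):
    """Relaxed (real-valued) code: one dot product per row of `matrix`."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def binarize(real_code):
    """Rounding step of two-step relaxed optimization: keep only the sign."""
    return [1 if v >= 0 else -1 for v in real_code]

def quantization_error(real_code):
    """Squared distance between the relaxed code and its binarized version."""
    return sum((v - b) ** 2 for v, b in zip(real_code, binarize(real_code)))

def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical toy query with an image feature and a text feature.
image = [0.9, -0.2]
text = [0.1, 0.4, -0.7]

# Hypothetical 4-bit projection over the 5-dimensional fused feature.
W = [
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, -1, 0, 1, 0],
    [0, 0, 1, 1, 1],
]

# Fixed fusion: both modalities always get the same weight.
fixed_relaxed = project(fuse([image, text], [0.5, 0.5]), W)
fixed_code = binarize(fixed_relaxed)

# Same query, but with the text modality weighted more heavily.
text_heavy_code = binarize(project(fuse([image, text], [0.1, 0.9]), W))

print("fixed-weight code:", fixed_code)
print("text-heavy code:  ", text_heavy_code)
print("bits that differ: ", hamming(fixed_code, text_heavy_code))
print("quantization error:", round(quantization_error(fixed_relaxed), 4))
```

The weights are passed to `fuse` as an argument so that they could, in principle, be chosen per query rather than fixed in advance. How the paper's parameter-free online module actually derives them is not stated in the abstract and is not reproduced here.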

    Information & Contributors

    Information

    Published In

    ACM Transactions on Information Systems, Volume 40, Issue 2
    April 2022
    587 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/3484931
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 September 2021
    Accepted: 01 July 2021
    Revised: 01 February 2021
    Received: 01 October 2020
    Published in TOIS Volume 40, Issue 2

    Author Tags

    1. Multi-modal hashing
    2. online query adaption
    3. asymmetric semantic supervision
    4. adaptive multi-modal quantization
    5. complementary
    6. prototypes

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Natural Science Foundation of Shandong, China
    • Major Fundamental Research Project of Shandong, China
    • Youth Innovation Project of Shandong Universities, China
    • Taishan Scholar Project of Shandong, China

    Article Metrics

    • Downloads (last 12 months): 74
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 01 Sep 2024

    Cited By

    • (2024) Multi-Facet Weighted Asymmetric Multi-Modal Hashing Based on Latent Semantic Distribution. IEEE Transactions on Multimedia 26 (7307-7320). DOI: 10.1109/TMM.2024.3363664. Online publication date: 2024
    • (2024) Relational Network via Cascade CRF for Video Language Grounding. IEEE Transactions on Multimedia 26 (8297-8311). DOI: 10.1109/TMM.2023.3303712. Online publication date: 2024
    • (2024) Relaxed Energy Preserving Hashing for Image Retrieval. IEEE Transactions on Intelligent Transportation Systems 25:7 (7388-7400). DOI: 10.1109/TITS.2024.3351841. Online publication date: Jul-2024
    • (2023) Construction of law and economics litigation service platform based on multimedia retrieval. Applied Mathematics and Nonlinear Sciences 8:2 (2913-2926). DOI: 10.2478/amns.2023.2.00008. Online publication date: 5-Jul-2023
    • (2023) Complex Scenario Image Retrieval via Deep Similarity-aware Hashing. ACM Transactions on Multimedia Computing, Communications, and Applications 20:4 (1-24). DOI: 10.1145/3624016. Online publication date: 11-Dec-2023
    • (2023) TeViS: Translating Text Synopses to Video Storyboards. Proceedings of the 31st ACM International Conference on Multimedia (4968-4979). DOI: 10.1145/3581783.3612417. Online publication date: 26-Oct-2023
    • (2023) OMGH: Online Manifold-Guided Hashing for Flexible Cross-Modal Retrieval. IEEE Transactions on Multimedia 25 (3811-3824). DOI: 10.1109/TMM.2022.3166668. Online publication date: 1-Jan-2023
    • (2023) Multi-Modal Hashing for Efficient Multimedia Retrieval: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:1 (239-260). DOI: 10.1109/TKDE.2023.3282921. Online publication date: 5-Jun-2023
    • (2022) A Multimodal Retrieval and Ranking Method for Scientific Documents Based on HFS and XLNet. Scientific Programming 2022. DOI: 10.1155/2022/5373531. Online publication date: 4-Jan-2022
    • (2022) Coupled Local and Global Semantic Alignment for Image-Text Matching. 2022 12th International Conference on Information Technology in Medicine and Education (ITME) (497-501). DOI: 10.1109/ITME56794.2022.00109. Online publication date: Nov-2022
