DOI: 10.1145/3595916.3626424
Research article

Multi-view-enhanced Modal Fusion Hashing for Unsupervised Cross-modal Retrieval

Published: 01 January 2024

Abstract

Cross-modal hashing is an important direction for multimodal data management and applications, and it has recently attracted increasing attention. Unsupervised cross-modal retrieval does not rely on label information and is therefore more applicable to real-world settings, but it still faces several problems. Existing methods mainly encode either local features or global features, and the influence of negative samples makes them prone to noise interference. To address these problems, we propose Multi-view-enhanced modal fusion hashing for Unsupervised Cross-modal retrieval (MUCH). First, we propose a multi-view network: images inherently contain rich semantics, and the network observes an image from different perspectives to obtain both its global and local features. Second, we introduce a noise-cancellation module that aligns cross-modal data features from both intra-modal and cross-modal perspectives before the hash codes are generated. Finally, we construct a distribution-based similarity weighting matrix to replace the graph-based similarity matrix. In multi-view enhancement experiments built on JDSH and CIRH, our method outperforms DAEH by 1% to 2% on all three datasets.
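The abstract does not give the exact formulation of the distribution-based similarity weighting matrix, but the general idea can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical illustration (the function name, the fusion weight `alpha`, and the Gaussian re-weighting scheme are all assumptions, not the authors' method): intra-modal cosine-similarity matrices are fused across modalities, and each entry is then re-weighted by where it falls in the batch-wide similarity distribution, so that ambiguous, noise-prone pairs contribute less.

```python
import torch
import torch.nn.functional as F

def weighted_similarity(img_feat: torch.Tensor,
                        txt_feat: torch.Tensor,
                        alpha: float = 0.5) -> torch.Tensor:
    """Hypothetical sketch of a distribution-based similarity
    weighting matrix; this is an illustration, not the paper's
    actual formulation."""
    # L2-normalize features so dot products become cosine similarities.
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)

    # Intra-modal similarity matrices (batch x batch).
    s_img = img @ img.t()
    s_txt = txt @ txt.t()

    # Simple cross-modal fusion of the two similarity views.
    s = alpha * s_img + (1.0 - alpha) * s_txt

    # Distribution-based weights (assumed scheme): entries far from
    # the batch mean (confidently similar or dissimilar) get weight
    # near 1; entries near the mean (ambiguous, noise-prone) get
    # weight near 0.
    mu, sigma = s.mean(), s.std()
    w = 1.0 - torch.exp(-((s - mu) ** 2) / (2 * sigma ** 2 + 1e-8))

    return w * s

# Usage with random stand-in features:
img_feat = torch.randn(8, 512)   # e.g. image-encoder outputs
txt_feat = torch.randn(8, 512)   # e.g. text-encoder outputs
S = weighted_similarity(img_feat, txt_feat)
print(S.shape)  # torch.Size([8, 8])
```

The weighted matrix would then supervise hash-code learning in place of a binary graph similarity matrix; the specific fusion and weighting functions used by MUCH are described in the full paper.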

References

[1] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yantao Zheng. 2009. NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval. 1–9.
[2] Cheng Deng, Zhaojia Chen, Xianglong Liu, Xinbo Gao, and Dacheng Tao. 2018. Triplet-based deep hashing network for cross-modal retrieval. IEEE Transactions on Image Processing 27, 8 (2018), 3893–3903.
[3] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
[4] Mark J. Huiskes and Michael S. Lew. 2008. The MIR Flickr retrieval evaluation. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. 39–43.
[5] Qing-Yuan Jiang and Wu-Jun Li. 2019. Discrete latent factor model for cross-modal hashing. IEEE Transactions on Image Processing 28, 7 (2019), 3490–3501.
[6] Chao Li, Cheng Deng, Ning Li, Wei Liu, Xinbo Gao, and Dacheng Tao. 2018. Self-supervised adversarial hashing networks for cross-modal retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4242–4251.
[7] Liang Li, Baihua Zheng, and Weiwei Sun. 2022. Adaptive structural similarity preserving for unsupervised cross modal hashing. In Proceedings of the 30th ACM International Conference on Multimedia. 3712–3721.
[8] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V. Springer, 740–755.
[9] Zijia Lin, Guiguang Ding, Mingqing Hu, and Jianmin Wang. 2015. Semantics-preserving hashing for cross-view retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3864–3872.
[10] Song Liu, Shengsheng Qian, Yang Guan, Jiawei Zhan, and Long Ying. 2020. Joint-modal distribution-based similarity hashing for large-scale unsupervised deep cross-modal retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1379–1388.
[11] Xiaoqing Liu, Huanqiang Zeng, Yifan Shi, Jianqing Zhu, and Kai-Kuang Ma. 2022. Deep rank cross-modal hashing with semantic consistent for image-text retrieval. In ICASSP 2022 – 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4828–4832.
[12] Georgii Mikriukov, Mahdyar Ravanbakhsh, and Begüm Demir. 2022. Deep unsupervised contrastive hashing for large-scale cross-modal text-image retrieval in remote sensing. arXiv preprint arXiv:2201.08125 (2022).
[13] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.
[14] Bin Shan, Weichong Yin, Yu Sun, Hao Tian, Hua Wu, and Haifeng Wang. 2022. ERNIE-ViL 2.0: Multi-view contrastive learning for image-text pre-training. arXiv preprint arXiv:2209.15270 (2022).
[15] Yufeng Shi, Xinge You, Feng Zheng, Shuo Wang, and Qinmu Peng. 2019. Equally-guided discriminative hashing for cross-modal retrieval. In IJCAI. 4767–4773.
[16] Yufeng Shi, Yue Zhao, Xin Liu, Feng Zheng, Weihua Ou, Xinge You, and Qinmu Peng. 2022. Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval. IEEE Transactions on Circuits and Systems for Video Technology 32, 10 (2022), 7255–7268.
[17] Shupeng Su, Zhisheng Zhong, and Chao Zhang. 2019. Deep joint-semantics reconstructing hashing for large-scale unsupervised cross-modal retrieval. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3027–3035.
[18] Wentao Tan, Lei Zhu, Jingjing Li, Huaxiang Zhang, and Junwei Han. 2022. Teacher-student learning: Efficient hierarchical message aggregation hashing for cross-modal retrieval. IEEE Transactions on Multimedia (2022).
[19] Rong-Cheng Tu, Jie Jiang, Qinghong Lin, Chengfei Cai, Shangxuan Tian, Hongfa Wang, and Wei Liu. 2023. Unsupervised cross-modal hashing with modality-interaction. IEEE Transactions on Circuits and Systems for Video Technology (2023).
[20] Rong-Cheng Tu, Xian-Ling Mao, Qinghong Lin, Wenjin Ji, Weize Qin, Wei Wei, and Heyan Huang. 2023. Unsupervised cross-modal hashing via semantic text mining. IEEE Transactions on Multimedia (2023).
[21] Rong-Cheng Tu, Xian-Ling Mao, Rong-Xin Tu, Binbin Bian, Chengfei Cai, Wei Wei, Heyan Huang, et al. 2022. Deep cross-modal proxy hashing. IEEE Transactions on Knowledge and Data Engineering (2022).
[22] Botong Wu, Qiang Yang, Wei-Shi Zheng, Yizhou Wang, and Jingdong Wang. 2015. Quantized correlation hashing for fast cross-modal search. In IJCAI, Vol. 1. 2.
[23] Gengshen Wu, Zijia Lin, Jungong Han, Li Liu, Guiguang Ding, Baochang Zhang, and Jialie Shen. 2018. Unsupervised deep hashing via binary latent factor models for large-scale cross-modal retrieval. In IJCAI, Vol. 1. 5.
[24] Dejie Yang, Dayan Wu, Wanqian Zhang, Haisu Zhang, Bo Li, and Weiping Wang. 2020. Deep semantic-alignment hashing for unsupervised cross-modal retrieval. In Proceedings of the 2020 International Conference on Multimedia Retrieval. 44–52.
[25] Zhaoda Ye and Yuxin Peng. 2018. Multi-scale correlation for sequential cross-modal hashing learning. In Proceedings of the 26th ACM International Conference on Multimedia. 852–860.
[26] Jun Yu, Hao Zhou, Yibing Zhan, and Dacheng Tao. 2021. Deep graph-neighbor coherence preserving network for unsupervised cross-modal hashing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4626–4634.
[27] Dongqing Zhang and Wu-Jun Li. 2014. Large-scale supervised multimodal hashing with semantic correlation maximization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 28.
[28] Peng-Fei Zhang, Jiasheng Duan, Zi Huang, and Hongzhi Yin. 2021. Joint-teaching: Learning to refine knowledge for resource-constrained unsupervised cross-modal retrieval. In Proceedings of the 29th ACM International Conference on Multimedia. 1517–1525.
[29] Peng-Fei Zhang, Yang Li, Zi Huang, and Xin-Shun Xu. 2021. Aggregation-based graph convolutional hashing for unsupervised cross-modal retrieval. IEEE Transactions on Multimedia 24 (2021), 466–479.
[30] Peng-Fei Zhang, Yadan Luo, Zi Huang, Xin-Shun Xu, and Jingkuan Song. 2021. High-order nonlocal hashing for unsupervised cross-modal retrieval. World Wide Web 24 (2021), 563–583.
[31] Yuanchao Zheng, Yan Dong, and Xiaowei Zhang. 2022. Relation-guided dual hash network for unsupervised cross-modal retrieval. In International Conference on Neural Information Processing. Springer, 497–508.
[32] Lei Zhu, Xize Wu, Jingjing Li, Zheng Zhang, Weili Guan, and Heng Tao Shen. 2022. Work together: Correlation-identity reconstruction hashing for unsupervised cross-modal retrieval. IEEE Transactions on Knowledge and Data Engineering (2022).
[33] Xitao Zou, Xinzhi Wang, Erwin M. Bakker, and Song Wu. 2021. Multi-label semantics preserving based deep cross-modal hashing. Signal Processing: Image Communication 93 (2021), 116131.

    Published In

    MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia
    December 2023
    745 pages
    ISBN:9798400702051
    DOI:10.1145/3595916
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 January 2024


    Author Tags

1. Dual hash network
    2. Unsupervised learning
    3. Cross-modal retrieval

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

• Science and Technology Research Program of Chongqing Municipal Education Commission
    • Chongqing Natural Science Foundation
    • Humanities and Social Science Research Project of Chongqing Education Commission

    Conference

    MMAsia '23
    Sponsor:
    MMAsia '23: ACM Multimedia Asia
December 6–8, 2023
    Tainan, Taiwan

    Acceptance Rates

    Overall Acceptance Rate 59 of 204 submissions, 29%

Article Metrics

    • Total Citations: 0
    • Total Downloads: 95
    • Downloads (last 12 months): 95
    • Downloads (last 6 weeks): 9

    Reflects downloads up to 10 Nov 2024.
