DOI: 10.1145/3474085.3475598
research-article

Graph Convolutional Multi-modal Hashing for Flexible Multimedia Retrieval

Published: 17 October 2021

Abstract

Multi-modal hashing plays an important role in multimedia retrieval, where a key challenge is to encode heterogeneous modalities into compact hash codes. To address this challenge, graph-based multi-modal hashing methods generally define an individual affinity matrix for each modality and apply linear algorithms for heterogeneous modality fusion and compact hash learning. Several other methods construct a graph Laplacian matrix from semantic information to help learn discriminative hash codes. However, these conventional methods largely ignore the structural similarity of the training set and the complex relations among multi-modal samples, which leads to unsatisfactory complementarity in the fused hash codes. More notably, they face two other important problems: the huge computing and storage costs of graph construction, and the loss of partial modality features when an incomplete query sample arrives. In this paper, we propose a Flexible Graph Convolutional Multi-modal Hashing (FGCMH) method that adopts GCNs with linear complexity to preserve both the modality-individual and modality-fused structural similarity for discriminative hash learning. As a result, accurate multimedia retrieval can be performed on both complete and incomplete datasets with our method. Specifically, multiple modality-individual GCNs under semantic guidance act on each modality independently to preserve intra-modality similarity, and their output representations are then fused into a fusion graph with an adaptive weighting scheme. A hash GCN and a semantic GCN, which share parameters in the first two layers, propagate the fused information and generate hash codes under high-level label space supervision. In the query stage, our method adaptively captures various multi-modal contents in a flexible and robust way, even if partial modality features are lost. Experimental results on three public datasets show the flexibility and effectiveness of our proposed method.
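
The pipeline described in the abstract (per-modality graph convolution, weighted fusion that tolerates a missing modality, and binarization into hash codes) can be illustrated with a small NumPy sketch. This is not the FGCMH implementation. The function names, the dense toy graph, the fixed fusion weights, and the final linear projection are all assumptions made for illustration. In particular, the dense adjacency matrix used here does not reproduce the linear complexity claimed in the abstract, and the weights are hand-picked rather than learned adaptively.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, common in GCNs."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)

def fuse(representations, weights):
    """Weighted average over the modalities that are present (None = missing).

    Weights are renormalized over the available modalities, so a query with a
    lost modality still yields a fused representation (illustrative only).
    """
    present = [(r, w) for r, w in zip(representations, weights) if r is not None]
    if not present:
        raise ValueError("at least one modality must be present")
    total = sum(w for _, w in present)
    return sum((w / total) * r for r, w in present)

def to_hash_codes(h):
    """Binarize real-valued outputs into {-1, +1} hash codes."""
    return np.where(h >= 0, 1, -1)

def hamming_distance(a, b):
    """Number of positions where two hash codes differ."""
    return int(np.sum(a != b))

# Toy data: 4 samples on a chain graph, two hypothetical modalities.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
a_norm = normalize_adjacency(adj)

image_repr = gcn_layer(a_norm, rng.normal(size=(4, 5)), rng.normal(size=(5, 8)))
text_repr = gcn_layer(a_norm, rng.normal(size=(4, 3)), rng.normal(size=(3, 8)))

w_hash = rng.normal(size=(8, 16))           # hypothetical 16-bit projection
full_codes = to_hash_codes(fuse([image_repr, text_repr], [0.6, 0.4]) @ w_hash)
partial_codes = to_hash_codes(fuse([image_repr, None], [0.6, 0.4]) @ w_hash)
```

Comparing `full_codes` and `partial_codes` row by row with `hamming_distance` gives a rough feel for how much a code changes when one modality is dropped. In the method itself the networks and fusion weights are trained under label supervision, which this sketch omits.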

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph convolutional network
  2. hashing
  3. multi-modal
  4. multimedia retrieval

Qualifiers

  • Research-article

Conference

MM '21
Sponsor:
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Article Metrics

  • Downloads (Last 12 months): 102
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 15 Oct 2024

Cited By

  • (2024) Similarity Transitivity Broken-Aware Multi-Modal Hashing. IEEE Transactions on Knowledge and Data Engineering 36:11 (7003-7014). DOI: 10.1109/TKDE.2024.3396492. Online publication date: Nov-2024
  • (2024) Asymmetric Supervised Fusion-Oriented Hashing for Cross-Modal Retrieval. IEEE Transactions on Cybernetics 54:2 (851-864). DOI: 10.1109/TCYB.2023.3241018. Online publication date: Feb-2024
  • (2024) Boosted Curriculum Multi-View Hashing for Multimedia Retrieval. IEEE Signal Processing Letters 31 (2065-2069). DOI: 10.1109/LSP.2024.3440968. Online publication date: 2024
  • (2024) A Multi-View Double Alignment Hashing Network with Weighted Contrastive Learning. 2024 IEEE International Conference on Multimedia and Expo (ICME) (1-6). DOI: 10.1109/ICME57554.2024.10687739. Online publication date: 15-Jul-2024
  • (2024) Adaptive Confidence Multi-View Hashing for Multimedia Retrieval. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (7900-7904). DOI: 10.1109/ICASSP48485.2024.10447517. Online publication date: 14-Apr-2024
  • (2024) Joint-Modal Graph Convolutional Hashing for unsupervised cross-modal retrieval. Neurocomputing 595 (127911). DOI: 10.1016/j.neucom.2024.127911. Online publication date: Aug-2024
  • (2024) Fast metric multi-view hashing for multimedia retrieval. Information Fusion 103:C. DOI: 10.1016/j.inffus.2023.102130. Online publication date: 4-Mar-2024
  • (2024) Unsupervised graph reasoning distillation hashing for multimodal hamming space search with vision-language model. International Journal of Multimedia Information Retrieval 13:2. DOI: 10.1007/s13735-024-00326-8. Online publication date: 30-Mar-2024
  • (2024) Hierarchical modal interaction balance cross-modal hashing for unsupervised image-text retrieval. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19371-w. Online publication date: 18-May-2024
  • (2023) CLIP-Based Adaptive Graph Attention Network for Large-Scale Unsupervised Multi-Modal Hashing Retrieval. Sensors 23:7 (3439). DOI: 10.3390/s23073439. Online publication date: 24-Mar-2023
