Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Yu, Jing; Lu, Yuhang; Qin, Zengchang; Liu, Yanbing; Tan, Jianlong; Guo, Li; Zhang, Weifeng

arXiv:1802.00985 (cs)

[Submitted on 3 Feb 2018 (v1), last revised 13 Feb 2018 (this version, v2)]

Abstract: Cross-modal information retrieval aims to find heterogeneous data of various modalities given a query in one modality. The main challenge is to map different modalities into a common semantic space in which the distance between concepts from different modalities can be modeled well. For cross-modal information retrieval between images and texts, existing work mostly uses an off-the-shelf Convolutional Neural Network (CNN) for image feature extraction. For texts, word-level features such as bag-of-words or word2vec are used to build deep learning models that represent the texts. Besides word-level semantics, the semantic relations between words are also informative but less explored. In this paper, we model texts as graphs using a similarity measure based on word2vec. A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval. One path uses a Graph Convolutional Network (GCN) for text modeling based on graph representations. The other path uses a neural network with layers of nonlinearities for image modeling based on off-the-shelf features. The model is trained with a pairwise similarity loss function to maximize the similarity of relevant text-image pairs and minimize the similarity of irrelevant pairs. Experimental results show that the proposed model outperforms the state-of-the-art methods significantly, with a 17% improvement in accuracy in the best case.
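
The abstract names three ingredients without giving their details: a word graph built from word2vec similarity, a graph convolution over that graph, and a pairwise similarity loss. The NumPy sketch below is only an illustration of how such pieces might fit together. Everything specific in it is an assumption rather than the authors' method: the k-nearest-neighbour graph rule, the symmetric normalization (borrowed from the commonly used GCN formulation), the hinge form of the loss, the margin value, the mean pooling, and all function names. Random vectors stand in for real word2vec embeddings and image features.

```python
import numpy as np


def knn_word_graph(emb, k=2):
    """Link each word to its k most cosine-similar words (assumed graph rule)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a word is never its own neighbour
    n = emb.shape[0]
    nearest = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)


def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def pairwise_similarity_loss(text_vec, img_pos, img_neg, margin=0.5):
    """Hinge-style loss (assumed form): the relevant image should be more
    similar to the text than the irrelevant image by at least `margin`."""
    return max(0.0, margin - cosine(text_vec, img_pos) + cosine(text_vec, img_neg))


# Toy usage with random stand-ins for embeddings and image features.
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(6, 8))           # 6 words, 8-dim embeddings
adj = knn_word_graph(word_emb, k=2)
node_feats = rng.random((6, 1))              # e.g. one frequency value per word
hidden = gcn_layer(adj, node_feats, rng.normal(size=(1, 4)))
text_vec = hidden.mean(axis=0)               # mean pooling (assumption)
loss = pairwise_similarity_loss(text_vec, rng.normal(size=4), rng.normal(size=4))
```

In a trainable version the weight matrix and the image path would be learned by minimizing this loss over many matched and mismatched text-image pairs; the sketch only evaluates a single forward pass.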

Comments:	7 pages, 11 figures
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:1802.00985 [cs.IR]
	(or arXiv:1802.00985v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1802.00985

Submission history

From: Jing Yu
[v1] Sat, 3 Feb 2018 15:05:01 UTC (1,372 KB)
[v2] Tue, 13 Feb 2018 04:19:45 UTC (1,372 KB)
