Improving cross-modal and multi-modal retrieval combining content and semantics similarities with probabilistic model

Wang, Shixun; Pan, Peng; Lu, Yansheng; Xie, Liang

doi:10.1007/s11042-013-1737-9


Published: 25 October 2013

Multimedia Tools and Applications, Volume 74, pages 2009–2032 (2015)

Shixun Wang¹,
Peng Pan¹,
Yansheng Lu¹ &
…
Liang Xie¹


Abstract

With the ongoing development of the internet, large numbers of multimedia documents containing both images and text have become part of daily life. Conducting cross-modal and multi-modal retrieval effectively and efficiently has therefore become an important problem. Although several methods have been proposed to address it, their retrieval processes are confined to a single information source of multimedia documents, such as the representations of images and texts at a semantic level. In this paper, we propose a novel probabilistic model, namely CCSS, which not only combines low-level content and high-level semantics similarities through a first-order Markov chain, but also provides heterogeneous similarity measures for different unimedia types. The ranked list for a query is obtained by highlighting an optimal path across the chain. Content similarity focuses on the internal structure of each modality, while semantics similarity focuses on the semantic correlation between different modalities. Both are significant, and they complement each other when combined. Multi-class logistic regression and random forests are used to map the original features of each unimedia type into a semantic space. In the query-by-example scenario, experiments on the Wikipedia dataset show that our model significantly outperforms state-of-the-art approaches for cross-modal retrieval. Additionally, the proposed multi-modal method is also shown to outperform previous systems on the image retrieval task.
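
The idea of ranking by a path through a first-order chain can be illustrated with a small sketch. This is not the CCSS model itself, whose details are not given in the abstract. It is a greedy simplification under stated assumptions: content similarities between database items act as transition scores, semantic similarities between the query and each item act as emission scores, and each item may appear in the path only once. The function name `rank_by_chain` and all parameter names are hypothetical.

```python
import numpy as np


def rank_by_chain(content_sim, semantic_sim, top_k):
    """Greedy walk over a first-order chain of database items.

    content_sim:  (n, n) positive similarities between items of the same
                  modality, used here as transition scores (assumption).
    semantic_sim: (n,) positive similarities between the query and each
                  item in a shared semantic space, used here as emission
                  scores (assumption).
    Returns a list of item indices in ranked order.
    """
    content_sim = np.asarray(content_sim, dtype=float)
    semantic_sim = np.asarray(semantic_sim, dtype=float)
    n = semantic_sim.shape[0]

    # Normalise each row into a transition distribution and the query
    # similarities into an emission distribution.
    trans = content_sim / content_sim.sum(axis=1, keepdims=True)
    emit = semantic_sim / semantic_sim.sum()

    visited = np.zeros(n, dtype=bool)
    # Start at the item that is semantically closest to the query.
    path = [int(np.argmax(emit))]
    visited[path[0]] = True

    while len(path) < min(top_k, n):
        # Score each candidate by transition from the last item times emission.
        scores = trans[path[-1]] * emit
        scores[visited] = -np.inf  # an item may be ranked only once
        nxt = int(np.argmax(scores))
        path.append(nxt)
        visited[nxt] = True
    return path


if __name__ == "__main__":
    content = [[1.0, 0.9, 0.1],
               [0.9, 1.0, 0.1],
               [0.1, 0.1, 1.0]]
    semantic = [0.5, 0.2, 0.3]
    print(rank_by_chain(content, semantic, top_k=3))
```

In this toy input, ranking by semantic similarity alone would order the items 0, 2, 1. The chain places item 1 ahead of item 2 because it is strongly linked to item 0 by content similarity, which is the kind of complementary effect the abstract describes. An exact optimal path would require a Viterbi-style dynamic program rather than this greedy step.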


Notes

Semantics similarity means metric similarity rather than the similarity among concept labels.
http://scgroup20.ceid.upatras.gr:8000/tmg/


Author information

Authors and Affiliations

School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan, 430074, People's Republic of China
Shixun Wang, Peng Pan, Yansheng Lu & Liang Xie


Corresponding author

Correspondence to Peng Pan.


About this article

Cite this article

Wang, S., Pan, P., Lu, Y. et al. Improving cross-modal and multi-modal retrieval combining content and semantics similarities with probabilistic model. Multimed Tools Appl 74, 2009–2032 (2015). https://doi.org/10.1007/s11042-013-1737-9


Published: 25 October 2013
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11042-013-1737-9
