Improving cross-modal and multi-modal retrieval combining content and semantics similarities with probabilistic model

Multimedia Tools and Applications

Abstract

With the ongoing development of the internet, a large number of multimedia documents containing images and texts have become part of people's daily lives. How to conduct cross-modal and multi-modal retrieval effectively and efficiently has therefore become an important issue. Although several methods have been proposed to address it, their retrieval processes rely on a single information source of the multimedia documents, such as the representations of images and texts at the semantic level. In this paper, we propose a novel probabilistic model, namely CCSS, which not only combines low-level content similarity and high-level semantics similarity through a first-order Markov chain, but also provides heterogeneous similarity measures for different unimedia types. The ranked list for a query is obtained by finding the optimal path through the chain. Content similarity focuses on the internal structure of each modality, while semantics similarity focuses on the semantic correlation between different modalities. Both are significant, and when combined they complement each other. Multi-class logistic regression and random forests are used to map the original features of each unimedia type into a semantic space. Under the query-by-example scenario, experiments on the Wikipedia dataset show that our model significantly outperforms state-of-the-art approaches for cross-modal retrieval. Additionally, the proposed multi-modal method is shown to outperform previous systems on the image retrieval task.
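The abstract only outlines how the chain produces a ranking. The sketch below is one plausible reading, not the paper's stated formulation. It assumes the following:

* Each candidate document is a state.
* Transitions between states follow content similarity within the target modality.
* Each state's emission score is the semantics similarity between the query and that document.

For brevity, it decodes the path greedily, one step at a time, rather than running a full optimal-path (Viterbi-style) search. The function names, the cosine measure, and the `smoothing` parameter are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two semantic vectors (assumed measure).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rank_by_chain(query_sem, doc_sems, content_sim, smoothing=1e-6):
    """Rank documents by walking a first-order Markov chain over them.

    query_sem   -- semantic vector of the query (e.g. an image)
    doc_sems    -- semantic vectors of the candidates (e.g. texts)
    content_sim -- content_sim[i][j]: within-modality similarity of docs i, j
    smoothing   -- small constant so a zero-similarity row cannot stall the walk
    """
    n = len(doc_sems)
    # Emission score of each state: query-to-document semantics similarity.
    emit = [cosine(query_sem, d) for d in doc_sems]
    # Start in the state that best matches the query semantically.
    current = max(range(n), key=lambda j: emit[j])
    path, visited = [current], {current}
    while len(path) < n:
        unvisited = [j for j in range(n) if j not in visited]
        # Transition probabilities from the current state, restricted to
        # unvisited documents and normalised to sum to 1.
        weights = [content_sim[current][j] + smoothing for j in unvisited]
        total = sum(weights)
        # Greedy step: choose the unvisited state maximising
        # P(transition) * emission score.
        best_k = max(range(len(unvisited)),
                     key=lambda k: (weights[k] / total) * emit[unvisited[k]])
        current = unvisited[best_k]
        path.append(current)
        visited.add(current)
    return path  # visiting order = ranked list


query = [0.8, 0.1, 0.1]
docs = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.5, 0.05, 0.45]]
content = [[1.0, 0.2, 0.9], [0.2, 1.0, 0.3], [0.9, 0.3, 1.0]]
print(rank_by_chain(query, docs, content))  # [0, 2, 1]
```

Sorting by emission score alone would give `[0, 1, 2]`. Here, the high content similarity between documents 0 and 2 pulls document 2 ahead, which illustrates how the two similarity types can complement each other.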

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. Here, semantics similarity denotes a metric similarity rather than a similarity among concept labels.

  2. http://scgroup20.ceid.upatras.gr:8000/tmg/
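Note 1 distinguishes two ways of comparing documents in the semantic space. The sketch below illustrates that distinction. It first maps raw features into a semantic space with a multi-class logistic regression (a softmax over per-class linear scores), as the abstract describes. It then compares two documents in two ways: by a metric on their class-probability vectors, and by their single most likely label. The weights and feature values are illustrative assumptions, as is the choice of an L1-based similarity; none of them are taken from the paper.

```python
import math


def to_semantic_space(features, weights, biases):
    # Multinomial logistic regression: one linear score per semantic class,
    # turned into posterior class probabilities with a stable softmax.
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def metric_similarity(p, q):
    # Similarity from the L1 distance between probability vectors:
    # 1.0 for identical vectors, 0.0 for vectors with disjoint support.
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def label_similarity(p, q):
    # Label-level comparison: 1 if the most likely classes agree, else 0.
    return 1.0 if p.index(max(p)) == q.index(max(q)) else 0.0


# Hypothetical 3-class model over 2-dimensional features.
W = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]

doc_a = to_semantic_space([1.0, 0.0], W, b)
doc_b = to_semantic_space([1.0, 0.8], W, b)
print(label_similarity(doc_a, doc_b))             # 1.0
print(round(metric_similarity(doc_a, doc_b), 2))  # 0.74
```

Both documents share the same top label, so the label-level comparison treats them as identical. The metric comparison, by contrast, still reflects how differently their probability mass is spread.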

Author information

Correspondence to Peng Pan.

About this article

Cite this article

Wang, S., Pan, P., Lu, Y. et al. Improving cross-modal and multi-modal retrieval combining content and semantics similarities with probabilistic model. Multimed Tools Appl 74, 2009–2032 (2015). https://doi.org/10.1007/s11042-013-1737-9

  • DOI: https://doi.org/10.1007/s11042-013-1737-9

Keywords