Abstract
This paper deals with multimedia information access. We propose two new approaches for hybrid text-image information processing that can be straightforwardly generalized to the broader multimodal scenario. Both approaches fall into the trans-media pseudo-relevance feedback category. Our first method uses a mixture model of the aggregate components, treating them as a single relevance concept. In our second approach, we define trans-media similarities as an aggregation of monomodal similarities between the elements of the aggregate and the new multimodal object. We also introduce the monomodal similarity measures for text and images that serve as basic components for both proposed trans-media similarities. We show how a large variety of problems can be framed so that they can be addressed with the proposed techniques: image annotation or captioning, text illustration, and multimedia retrieval and clustering. Finally, we present how these methods can be integrated into two applications: a travel blog assistant system and a tool for browsing Wikipedia that takes into account the multimedia nature of its content.
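To make the second approach more concrete, here is a minimal, hypothetical sketch of an image-to-text trans-media similarity built from monomodal similarities. It assumes that both modalities are already represented as precomputed feature vectors, that cosine similarity stands in for the paper's monomodal text and image measures, that the aggregate is the top-k visually nearest corpus items, and that aggregation is a visual-similarity-weighted sum. The names `cosine`, `transmedia_scores` and `k` are illustrative only; the paper's actual similarity measures and aggregation function may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def transmedia_scores(
    query_image: Vector,
    corpus: List[Tuple[Vector, Vector]],  # (image_vector, text_vector) pairs
    candidate_texts: Dict[str, Vector],
    k: int = 3,
) -> Dict[str, float]:
    """Hypothetical image-to-text trans-media similarity.

    Assumption: cosine similarity and a weighted-sum aggregate are
    illustrative choices, not the paper's exact measures.
    1. Rank corpus items by visual similarity to the query image.
    2. Keep the top-k items as the pseudo-relevant aggregate.
    3. Score each candidate text by summing its textual similarity to each
       aggregate text, weighted by that item's visual similarity to the query.
    """
    ranked = sorted(corpus, key=lambda doc: cosine(query_image, doc[0]), reverse=True)
    aggregate = ranked[:k]
    scores: Dict[str, float] = {}
    for name, text_vec in candidate_texts.items():
        scores[name] = sum(
            cosine(query_image, img_vec) * cosine(txt_vec, text_vec)
            for img_vec, txt_vec in aggregate
        )
    return scores


if __name__ == "__main__":
    # Toy corpus: two "beach" photos with beach-like text, one "mountain" photo.
    toy_corpus = [
        ([1.0, 0.0], [1.0, 0.0]),
        ([0.9, 0.1], [1.0, 0.0]),
        ([0.0, 1.0], [0.0, 1.0]),
    ]
    candidates = {"beach": [1.0, 0.0], "mountain": [0.0, 1.0]}
    print(transmedia_scores([1.0, 0.0], toy_corpus, candidates, k=2))
```

In this toy setting, the query image's two visual neighbours both carry beach-like text, so the "beach" candidate outscores the "mountain" one even though no text was compared with the query image directly. Swapping the roles of the two modalities would give the text-to-image direction, as needed for text illustration.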
Notes
For example, a graph embedded in a 2D representation.
Acknowledgements
The authors particularly thank INA for its contributions to our work and Florent Perronnin for his greatly appreciated help in applying some of the Generic Visual Categorizer (GVC) components. We would also like to acknowledge the following Flickr users, whose photographs we reproduced here under Creative Commons licences:
Tatiana Sapateiro http://www.flickr.com/photos/tatianasapateiro
Pedro Paulo Silva de Souza http://www.flickr.com/photos/pedrop
Leonardo Pallotta http://www.flickr.com/photos/groundzero
Laszlo Ilyes http://www.flickr.com/photos/laszlo-photo
Jorge Wagner http://www.flickr.com/photos/jorgewagner
UminDaGuma http://www.flickr.com/photos/umindaguma
Scott Robinson http://www.flickr.com/photos/clearlyambiguous
Gabriel Flores Romero http://www.flickr.com/photos/gabofr
Jenny Mealing http://www.flickr.com/photos/jennifrog
Roney http://www.flickr.com/photos/roney
David Katarina http://www.flickr.com/photos/davidkatarina
T. Chu http://www.flickr.com/photos/spyderball
Bill Wilcox http://www.flickr.com/photos/billwilcox
S2RD2 http://www.flickr.com/photos/stuardo
Fred Hsu http://www.flickr.com/photos/fhsu
Abel Pardo López http://www.flickr.com/photos/sancho_panza
Cat http://www.flickr.com/photos/clspeace
Thowra_uk http://www.flickr.com/photos/thowra
Elena Heredero http://www.flickr.com/photos/elenaheredero
Rick McCharles http://www.flickr.com/photos/rickmccharles
Marília Almeida http://www.flickr.com/photos/68306118@N00
Gustavo Madico http://www.flickr.com/photos/desdegus
Douglas Fernandes http://www.flickr.com/photos/thejourney1972
James Preston http://www.flickr.com/photos/jamespreston
Rodrigo Della Fávera http://www.flickr.com/photos/rodrigofavera
Dinesh Rao http://www.flickr.com/photos/dinrao
Marina Campos Vinhal http://www.flickr.com/photos/marinacvinhal
Jorge Gobbi http://www.flickr.com/photos/morrissey
Steve Taylor http://www.flickr.com/photos/theboywiththethorninhisside/
Finally, we would also like to acknowledge the users who wrote the blog paragraphs used and reproduced here. These texts can be found at the following addresses:
http://realtravel.com/cuzco-journals-j1879736.html
http://realtravel.com/machu_picchu-journals-j5181463.html
http://realtravel.com/rio-journals-j4669810.html
http://www.travelpod.com/travel-blog-entries/sarah_s_america/south_america/1140114720/tpod.html
http://www.travelpod.com/travel-blog-entries/rachel_john/roundtheworld/1146006300/tpod.html
http://www.travelpod.com/travel-blog-entries/eatdessertfirst/world_tour_05/1160411340/tpod.html
http://www.travelpod.com/travel-blog-entries/idarich/rtw_2005/1140476400/tpod.html
http://www.travelpod.com/travel-blog-entries/twittg/rtw/1132765860/tpod.html
http://www.travelpod.com/travel-blog-entries/emanddave/worldtrip2006/1155492420/tpod.html
Cite this article
Ah-Pine, J., Bressan, M., Clinchant, S. et al. Crossing textual and visual content in different application scenarios. Multimed Tools Appl 42, 31–56 (2009). https://doi.org/10.1007/s11042-008-0246-8