DOI: 10.1145/3126686.3126754
research-article
Public Access

Detecting Culture-specific Tags for News Videos through Multimodal Embedding

Published: 23 October 2017

Abstract

Many videos on the Web about international events are maintained in different countries, and some come with text descriptions written from different cultural points of view. We introduce a new task, detecting culture-specific tags for news videos: given video keyframes and culture information, assign the most relevant tags that reflect cultural preferences. We approach this problem by mapping visual and multilingual textual features into a joint latent space anchored on reliable visual cues, using our proposed two-view pair-pair embedding and three-view embedding, each built on several canonical correlation analysis variants (Canonical Correlation Analysis, Deep Canonical Correlation Analysis, Generalized Canonical Correlation Analysis). For human-interest international events such as epidemics and transportation disasters, we show that, for the same news event, video tags differ significantly across cultures.
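
To make the two-view idea concrete, the following is a minimal sketch of plain linear CCA used to embed keyframe features and tag-text features in a shared space and then rank candidate tags by cosine similarity. It is not the authors' implementation: the function names (`fit_cca`, `rank_tags`), the regularization constant, the feature dimensions, and the synthetic data are all assumptions made for illustration, and the deep and generalized (three-view) variants named in the abstract are not shown.

```python
import numpy as np

def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def fit_cca(X, Y, k, reg=1e-3):
    """Fit linear CCA on paired views X (n x dx) and Y (n x dy).

    Returns projections into a k-dimensional shared space, the view means,
    and the top-k canonical correlations. `reg` is a small ridge term
    (an assumed value) that keeps the covariance matrices invertible.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return {"Wx": Kx @ U[:, :k], "Wy": Ky @ Vt[:k].T,
            "mx": mx, "my": my, "corr": s[:k]}

def rank_tags(frame_feat, tag_feats, tag_names, model):
    """Rank candidate tags for one keyframe by cosine similarity in the shared space."""
    f = (frame_feat - model["mx"]) @ model["Wx"]
    T = (tag_feats - model["my"]) @ model["Wy"]
    sims = (T @ f) / (np.linalg.norm(T, axis=1) * np.linalg.norm(f) + 1e-12)
    return [tag_names[i] for i in np.argsort(-sims)]

# Synthetic stand-ins for real features: both views share a 3-D latent signal.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
X = Z @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(500, 8))  # "visual" view
Y = Z @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(500, 6))  # "text" view

model = fit_cca(X, Y, k=3)
names = ["tag_a", "tag_b", "tag_c", "tag_d", "tag_e"]  # hypothetical tag labels
ranked = rank_tags(X[0], Y[:5], names, model)
```

In a culture-aware setting, one such embedding could be fitted per language or culture, so that the same keyframe yields a different tag ranking for each; that pairing scheme is an assumption here, since this page does not give the details.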

Published In

Thematic Workshops '17: Proceedings of the Thematic Workshops of ACM Multimedia 2017
October 2017
558 pages
ISBN: 9781450354165
DOI: 10.1145/3126686

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. canonical correlation analysis
  2. deep canonical correlation analysis
  3. generalized canonical correlation analysis
  4. multimodal embedding
  5. news video analysis
  6. news video tagging

Qualifiers

  • Research-article

Conference

MM '17: ACM Multimedia Conference
October 23 - 27, 2017
Mountain View, California, USA
