
Structured exploration of who, what, when, and where in heterogeneous multimedia news sources

Authors: Joseph G. Ellis, Daniel Morozoff-Abegauz, Shih-Fu Chang

MM '13: Proceedings of the 21st ACM international conference on Multimedia

Pages 357 - 360

https://doi.org/10.1145/2502081.2508118

Published: 21 October 2013

Abstract

We present a fully automatic system, spanning everything from raw data gathering to navigation, over heterogeneous news sources that include more than 18k hours of broadcast video news, 3.58M online articles, and 430M public Twitter messages. Our system addresses the challenge of extracting "who," "what," "when," and "where" from a truly multimodal perspective, leveraging the audiovisual information in broadcast news and that embedded in articles, as well as textual cues from closed captions and from the raw text of articles and social media posts. By performing this analysis over time, we extract and study topic trends in the news and detect notable peaks in coverage over each topic's lifetime. We visualize these peaks in trending topics using automatically extracted keywords and iconic images, and we introduce a novel multimodal algorithm for naming speakers in the news. We also present several intuitive navigation interfaces for interacting with these complex topic structures across different news sources.
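The abstract does not describe how coverage peaks are detected. As a purely illustrative sketch, and not the authors' algorithm, one simple approach treats a topic's daily mention counts as a time series and flags any day whose count exceeds the mean of the preceding `window` days by more than `k` standard deviations. The function name `detect_peaks` and the defaults `window=7` and `k=2.0` are hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def detect_peaks(counts, window=7, k=2.0):
    """Return indices of days whose count is a peak relative to recent history.

    Hypothetical heuristic (not from the paper): day i is a peak if
    counts[i] > mean(previous `window` days) + k * population std of those days.
    """
    peaks = []
    # Start at `window` so every day has a full trailing history to compare against.
    for i in range(window, len(counts)):
        history = counts[i - window:i]  # the `window` days strictly before day i
        mu = mean(history)
        sigma = pstdev(history)
        # A perfectly flat history has sigma == 0; treat any rise above it as a peak.
        threshold = mu + k * sigma if sigma > 0 else mu
        if counts[i] > threshold:
            peaks.append(i)
    return peaks


# Example: a sudden burst of coverage on day 7.
print(detect_peaks([5, 6, 5, 7, 6, 5, 6, 30, 8, 6, 5]))  # [7]
```

Because the threshold uses only past days, this kind of detector can run incrementally as new articles, captions, or tweets arrive. Note that the burst itself inflates the trailing statistics for the following `window` days, which suppresses repeated flags immediately after a large peak; a real system would likely need tuned parameters or a more robust baseline.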



Published In

MM '13: Proceedings of the 21st ACM international conference on Multimedia

October 2013

1166 pages

ISBN: 9781450324045

DOI: 10.1145/2502081

General Chairs: Alejandro (Alex) Jaimes (Yahoo!, Spain), Nicu Sebe (University of Trento, Italy), Nozha Boujemaa (INRIA, France)

Program Chairs: Daniel Gatica-Perez (IDIAP & EPFL, Switzerland), David A. Shamma (Yahoo!, USA), Marcel Worring (University of Amsterdam, The Netherlands), Roger Zimmermann (National University of Singapore, Singapore)

Copyright © 2013 ACM.


Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

MM '13

Sponsor:

SIGMM

MM '13: ACM Multimedia Conference

October 21 - 25, 2013

Barcelona, Spain

Acceptance Rates

MM '13 paper acceptance rate: 47 of 235 submissions (20%)

Overall acceptance rate: 995 of 4,171 submissions (24%)
