research-article

Video summarization via transferrable structured learning

Authors:

Yong Yu

WWW '11: Proceedings of the 20th international conference on World wide web

Pages 287 - 296

https://doi.org/10.1145/1963405.1963448

Published: 28 March 2011

Abstract

It is well known that textual information such as video transcripts and video reviews can significantly enhance the performance of video summarization algorithms. Unfortunately, many videos on the Web, such as those on the popular video-sharing site YouTube, lack useful textual information. This paper proposes a transfer learning framework for video summarization: during training, both video features and textual features are exploited to train a summarization algorithm, whereas a new video is summarized using only its video features. The basic idea is to explore the transferability between videos and their corresponding textual information. Based on the assumption that video features and textual features are highly correlated, textual information can be transferred into summarization knowledge that is then applied using video information alone. In particular, we formulate video summarization as learning a mapping from the set of shots of a video to a subset of those shots, within the general framework of SVM-based structured learning. Textual information is transferred by encoding it into a set of constraints used in the structured learning process; these constraints tend to provide a more detailed and accurate characterization of the different subsets of shots. Experimental results show that our approach significantly improves performance and demonstrate the utility of textual information for enhancing video summarization.
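The structured-prediction formulation described in the abstract (learning a mapping from a video's shots to a subset of them) can be illustrated with a small sketch. It assumes, purely for illustration, that each shot is described by a set of discrete "visual words", that the learned weight vector assigns one weight per visual word, and that a summary's score w·Ψ(x, y) is the total weight of visual words covered by at least one selected shot. Prediction then picks at most `budget` shots greedily by marginal gain, in the spirit of the greedy coverage-based inference Yue and Joachims used for structural SVMs. `margin_violation` shows the slack incurred by one margin constraint of the form w·Ψ(x, y_true) ≥ w·Ψ(x, y) + Δ(y_true, y). The feature map, the loss Δ, and every name here (`coverage_score`, `greedy_summary`, `recall_loss`, `margin_violation`) are hypothetical; this is not the authors' feature design, and it does not model how their textual constraints are built.

```python
from typing import Callable, Dict, List, Set

# Hypothetical representation: each shot is a set of discrete "visual words",
# and the (already learned) model assigns one weight per visual word.
Shot = Set[str]
Weights = Dict[str, float]


def coverage_score(summary: List[int], shots: List[Shot], weights: Weights) -> float:
    """Illustrative w . Psi(x, y): total weight of visual words covered by at
    least one selected shot (each covered word counts once)."""
    covered: Set[str] = set()
    for i in summary:
        covered |= shots[i]
    return sum(weights.get(word, 0.0) for word in covered)


def greedy_summary(shots: List[Shot], weights: Weights, budget: int) -> List[int]:
    """Approximate the argmax over subsets of at most `budget` shots by repeatedly
    adding the shot with the largest marginal gain (ties go to the lowest index)."""
    selected: List[int] = []
    current = 0.0
    while len(selected) < budget:
        candidates = [i for i in range(len(shots)) if i not in selected]
        if not candidates:
            break
        gains = {i: coverage_score(selected + [i], shots, weights) - current
                 for i in candidates}
        best = max(candidates, key=lambda i: (gains[i], -i))
        if gains[best] <= 0:
            break  # no remaining shot adds uncovered, positively weighted content
        selected.append(best)
        current += gains[best]
    return selected


def recall_loss(true_summary: List[int], predicted: List[int]) -> float:
    """Hypothetical loss Delta: fraction of reference shots missing from the prediction."""
    reference = set(true_summary)
    if not reference:
        return 0.0
    return 1.0 - len(reference & set(predicted)) / len(reference)


def margin_violation(true_summary: List[int], candidate: List[int],
                     shots: List[Shot], weights: Weights,
                     loss: Callable[[List[int], List[int]], float] = recall_loss) -> float:
    """Slack needed to satisfy the structural-SVM margin constraint
    score(true) >= score(candidate) + loss(true, candidate)."""
    return max(0.0, loss(true_summary, candidate)
               + coverage_score(candidate, shots, weights)
               - coverage_score(true_summary, shots, weights))


if __name__ == "__main__":
    shots = [{"sky", "car"}, {"car", "road"}, {"face", "car"}, {"sky"}]
    weights = {"sky": 1.0, "car": 2.0, "road": 0.5, "face": 1.5}
    pred = greedy_summary(shots, weights, budget=2)
    print(pred, coverage_score(pred, shots, weights))  # [2, 0] 4.5
    print(margin_violation([0, 1], pred, shots, weights))  # 1.5
```

Greedy selection stands in for exact maximization because enumerating all subsets is combinatorial. For weighted coverage objectives with nonnegative weights, which are monotone and submodular, greedy selection under a cardinality budget is known to achieve a (1 - 1/e) approximation.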

Index Terms

Video summarization via transferrable structured learning
1. Information systems
  1. Information retrieval
    1. Document representation
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction

Recommendations

Rushes video summarization using audio-visual information and sequence alignment
TVS '08: Proceedings of the 2nd ACM TRECVid Video Summarization Workshop

This paper describes our system and methodologies for the BBC rushes video summarization task of TRECVID 2008. The procedure of the system is composed of three major steps: shot detection, irrelevant and repetitive subshot removal, and final summary ...
Multi-document video summarization
ICME'09: Proceedings of the 2009 IEEE international conference on Multimedia and Expo

Most previous works on video summarization target on a single video document. With the popularity of video corpus (e.g. news video archives) and web videos, video article that consists of a set of relevant videos are frequently confronted by users. By ...
Video Summarization using Text Subjectivity Classification
WebMedia '22: Proceedings of the Brazilian Symposium on Multimedia and the Web

Video summarization has attracted researchers’ attention because it provides a compact and informative video version, supporting users and systems to save efforts in searching and understanding content of interest. Current techniques employ different ...

Information

Published In

ACM Other conferences

WWW '11: Proceedings of the 20th international conference on World wide web

March 2011

840 pages

ISBN:9781450306324

DOI:10.1145/1963405

General Chairs:
S. Sadagopan, IIIT-Bangalore, India
Krithi Ramamritham, IIT-Bombay, India
Arun Kumar, IBM Research, India
M. P. Ravindra, Infosys E & R, India

Program Chairs:
Elisa Bertino, Purdue University, USA
Ravi Kumar, Yahoo! Research, USA

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
The International Institute of Information Technology Bangalore

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WWW '11

WWW '11: 20th International World Wide Web Conference

March 28 - April 1, 2011

Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Article Metrics

Total Citations: 29
Total Downloads: 533
Downloads (last 12 months): 5
Downloads (last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

Cited By

Khilji A, Sinha U, Singh P, Ali A, Laskar S, Dadure P, Manna R, Pakray P, Favre B, Bandyopadhyay S (2023). Multimodal text summarization with evaluation approaches. Sādhanā 48:4. Online publication date: 24-Oct-2023.
https://doi.org/10.1007/s12046-023-02284-z
Singh A, Kumar M (2023). Bayesian fuzzy clustering and deep CNN-based automatic video summarization. Multimedia Tools and Applications 83:1 (963-1000). Online publication date: 30-May-2023.
https://doi.org/10.1007/s11042-023-15431-9
Hang W, Liang S, Choi K, Chung F, Wang S (2021). Selective Transfer Classification Learning With Classification-Error-Based Consensus Regularization. IEEE Transactions on Emerging Topics in Computational Intelligence 5:2 (178-190). Online publication date: Apr-2021.
https://doi.org/10.1109/TETCI.2019.2892762
Abdi L, Hashemi S (2021). Unsupervised Domain Adaptation Based on Correlation Maximization. IEEE Access 9 (127054-127067). Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3111586
Das S, Kolya A, Kundu A (2021). Video Summarization Using a Dense Captioning (DenseCap) Model. Intelligent Multi‐modal Data Processing (97-129). Online publication date: 30-Apr-2021.
https://doi.org/10.1002/9781119571452.ch5
Mujtaba G, Ryu E (2020). Client-Driven Personalized Trailer Framework Using Thumbnail Containers. IEEE Access 8 (60417-60427). Online publication date: 2020.
https://doi.org/10.1109/ACCESS.2020.2982992
Fei M, Jiang W, Mao W (2018). A novel compact yet rich key frame creation method for compressed video summarization. Multimedia Tools and Applications 77:10 (11957-11977). Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1007/s11042-017-4843-2
Chen X, Zhang Y, Ai Q, Xu H, Yan J, Qin Z, Kando N, Sakai T, Joho H, Li H, de Vries A, White R (2017). Personalized Key Frame Recommendation. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (315-324). Online publication date: 7-Aug-2017.
https://dl.acm.org/doi/10.1145/3077136.3080776
Hussein F, Piccardi M (2017). V-JAUNE. ACM Transactions on Multimedia Computing, Communications, and Applications 13:2 (1-19). Online publication date: 26-Apr-2017.
https://dl.acm.org/doi/10.1145/3063532
Plummer B, Brown M, Lazebnik S (2017). Enhancing Video Summarization via Vision-Language Embedding. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (1052-1060). Online publication date: Jul-2017.
https://doi.org/10.1109/CVPR.2017.118
