Automatic summarization of music videos

Published: 01 May 2006

Abstract

In this article, we propose a novel approach to automatic music video summarization that differs from current video summarization methods. The music video is first separated into its music track and its video track. For the music track, a music summary is created by analyzing the music content with music features, an adaptive clustering algorithm, and music domain knowledge. Shots in the video track are then detected and clustered. Finally, the music video summary is created by aligning the music summary with the clustered video shots. Experienced users took part in subjective studies that evaluated the quality of the music summaries and the effectiveness of the proposed summarization approach. Experiments are performed on music videos of different genres, and the results are compared with summaries generated from the music track alone, from the video track alone, and manually. The evaluation results indicate that summaries generated by the proposed method are effective in meeting users' expectations.
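The abstract does not specify how shots are detected or how the music summary and the clustered shots are aligned, so the Python below is only a minimal sketch of the general shape of those last two steps, under stated assumptions. Shot cuts are found with a plain color-histogram difference test (a common baseline, not necessarily the authors' detector); cluster labels are taken as given from some separate shot-clustering step; and each music-summary segment is filled by visiting shot clusters round-robin so that visually distinct material appears. The names `Shot`, `detect_shot_boundaries`, and `align_summary` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Shot:
    start: float  # shot start time in the video track, seconds
    end: float    # shot end time, seconds
    cluster: int  # label from a separate (assumed) shot-clustering step


def detect_shot_boundaries(histograms: Sequence[Sequence[float]],
                           threshold: float) -> List[int]:
    """Return indices of frames that start a new shot (frame 0 always does).

    A cut is declared when the L1 distance between consecutive normalized
    color histograms exceeds `threshold`; this generic baseline is an
    assumption for illustration only.
    """
    if not histograms:
        return []
    boundaries = [0]
    for i in range(1, len(histograms)):
        prev, cur = histograms[i - 1], histograms[i]
        prev_total = sum(prev) or 1.0  # avoid division by zero on empty histograms
        cur_total = sum(cur) or 1.0
        dist = sum(abs(p / prev_total - c / cur_total) for p, c in zip(prev, cur))
        if dist > threshold:
            boundaries.append(i)
    return boundaries


def align_summary(music_segments: Sequence[Tuple[float, float]],
                  shots: Sequence[Shot]) -> List[Tuple[float, float, float, float]]:
    """Fill each music-summary segment with video shots (hypothetical rule).

    Clusters are visited round-robin so consecutive pieces come from
    different visual clusters; within a cluster, shots are used in temporal
    order. The last shot placed in a segment is trimmed to fit. Returns
    (music_start, music_end, video_start, video_end) tuples of equal duration.
    """
    by_cluster: Dict[int, List[Shot]] = {}
    for shot in sorted(shots, key=lambda s: s.start):
        if shot.end > shot.start:  # skip empty shots so the loop always advances
            by_cluster.setdefault(shot.cluster, []).append(shot)
    clusters = sorted(by_cluster)
    used = {c: 0 for c in clusters}  # number of shots consumed per cluster
    pieces = []
    turn = 0  # round-robin position over clusters
    for seg_start, seg_end in music_segments:
        t = seg_start
        while seg_end - t > 1e-9 and clusters:
            c = clusters[turn % len(clusters)]
            turn += 1
            members = by_cluster[c]
            shot = members[used[c] % len(members)]  # wrap around if a cluster runs out
            used[c] += 1
            dur = min(shot.end - shot.start, seg_end - t)
            pieces.append((t, t + dur, shot.start, shot.start + dur))
            t += dur
    return pieces


if __name__ == "__main__":
    shots = [Shot(0.0, 4.0, 0), Shot(4.0, 7.0, 1), Shot(7.0, 9.0, 0)]
    print(align_summary([(10.0, 16.0)], shots))
    # prints [(10.0, 14.0, 0.0, 4.0), (14.0, 16.0, 4.0, 6.0)]
```

Round-robin over clusters is just one simple way to keep the visual side of the summary varied; a fuller system would likely also snap cuts to beat or phrase boundaries in the music summary, which this sketch ignores.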

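Cited By

  • (2023) Summarizing Web Archive Corpora via Social Media Storytelling by Automatically Selecting and Visualizing Exemplars. ACM Transactions on the Web 18:1 (1-48). DOI: 10.1145/3606030. Online publication date: 11-Oct-2023.
  • (2018) A Study on Application Scenario of Video Summarization. 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA) (936-943). DOI: 10.1109/ICECA.2018.8474699. Online publication date: Mar-2018.
  • (2017) Modeling the timing of cuts in automatic editing of concert videos. Multimedia Tools and Applications 76:5 (6683-6707). DOI: 10.1007/s11042-016-3304-7. Online publication date: 1-Mar-2017.
  • (2016) Harnessing Music-Related Visual Stereotypes for Music Information Retrieval. ACM Transactions on Intelligent Systems and Technology 8:2 (1-21). DOI: 10.1145/2926719. Online publication date: 25-Oct-2016.
  • (2015) Aesthetics-Guided Summarization from Multiple User Generated Videos. ACM Transactions on Multimedia Computing, Communications, and Applications 11:2 (1-23). DOI: 10.1145/2659520. Online publication date: 7-Jan-2015.
  • (2015) Semantic movie summarization based on string of IE-RoleNets. Computational Visual Media 1:2 (129-141). DOI: 10.1007/s41095-015-0015-3. Online publication date: 16-Aug-2015.
  • (2013) Near-lossless semantic video summarization and its applications to video analysis. ACM Transactions on Multimedia Computing, Communications, and Applications 9:3 (1-23). DOI: 10.1145/2487268.2487269. Online publication date: 3-Jul-2013.
  • (2012) Video Summarization. Proceedings of the International Conference on Computer Vision and Graphics, Volume 7594 (1-13). DOI: 10.5555/2942031.2942033. Online publication date: 24-Sep-2012.
  • (2012) Video Summarization: Techniques and Classification. Computer Vision and Graphics (1-13). DOI: 10.1007/978-3-642-33564-8_1. Online publication date: 2012.
  • (2010) Multi-View Video Summarization. IEEE Transactions on Multimedia 12:7 (717-729). DOI: 10.1109/TMM.2010.2052025. Online publication date: 1-Nov-2010.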

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 2, Issue 2
May 2006
82 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/1142020

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2006
Published in TOMM Volume 2, Issue 2

Author Tags

  1. Music summarization
  2. music video
  3. video summarization

Qualifiers

  • Article

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0

Reflects downloads up to 11 Feb 2025.
