tutorial

Video Skimming: Taxonomy and Comprehensive Survey

Authors:

Vivekraj V. K.,

Balasubramanian Raman

ACM Computing Surveys (CSUR), Volume 52, Issue 5

Article No.: 106, Pages 1 - 38

https://doi.org/10.1145/3347712

Published: 13 September 2019

Abstract

Video skimming, also known as dynamic video summarization, generates a temporally abridged version of a given video. Skimming can be achieved by identifying significant components in either uni-modal or multi-modal features extracted from the video. Because it is dynamic, a video skim preserves temporal connectivity, which allows a better understanding of the video from its summary. Owing to this advantage, and to the easy availability of the required computing resources, video skimming has recently drawn the attention of many researchers. In this article, we provide a comprehensive survey of video skimming, focusing on the substantial body of literature from the past decade. We present a taxonomy of video skimming approaches and discuss their evolution, highlighting key advances. We also study the components required to evaluate video skimming performance.

Supplementary Material

a106-k-suppl.pdf (k.zip)

Supplemental movie, appendix, image, and software files for Video Skimming: Taxonomy and Comprehensive Survey


References

[1]

{n.d.}. MuVee autoproducer. Retrieved from http://www.muvee.com.

[2]

{n.d.}. Power director software. Retrieved from http://www.cyberlink.com.

[3]

Jurandy Almeida, Neucimar J. Leite, and Ricardo da S. Torres. 2013. Online video summarization on compressed domain. Journal of Visual Communication and Image Representation 24, 6 (2013), 729--738.

[4]

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In Proceedings of the 6th International The Semantic Web and 2nd Asian Conference on Asian Semantic Web Conference (ISWC’07/ASWC’07). Springer-Verlag, Berlin, 722--735.

Digital Library

[5]

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993--1022.

Digital Library

[6]

Ali Borji and Laurent Itti. 2013. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1 (2013), 185--207.

Digital Library

[7]

H. Boukadida, S. A. Berrani, and P. Gros. 2016. Automatically creating adaptive video summaries using constraint satisfaction programming: Application to sport content. IEEE Transactions on Circuits and Systems for Video Technology 27, 4 (April 2017), 920--934.

Digital Library

[8]

P. Bouthemy, M. Gelgon, and F. Ganansia. 1999. A unified approach to shot change detection and camera motion characterization. IEEE Transactions on Circuits and Systems for Video Technology 9, 7 (Oct 1999), 1030--1044.

Digital Library

[9]

Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 335--336.

Digital Library

[10]

Bo-Wei Chen, Jia-Ching Wang, and Jhing-Fa Wang. 2009. A novel video summarization based on mining the story-structure and semantic relations among concept entities. IEEE Transactions on Multimedia 11, 2 (2009), 295--312.

Digital Library

[11]

F. Chen, C. De Vleeschouwer, H. D. Barrobés, J. G. Escalada, and D. Conejero. 2010. Automatic summarization of audio-visual soccer feeds. In Proceedings of the IEEE International Conference on Multimedia and Expo. 837--842.

[12]

F. Chen, C. De Vleeschouwer, and A. Cavallaro. 2014. Resource allocation for personalized video summarization. IEEE Transactions on Multimedia 16, 2 (Feb 2014), 455--469.

Digital Library

[13]

Liang-Hua Chen, Chih-Wen Su, Hong-Yuan Mark Liao, and Chun-Chieh Shih. 2003. On the preview of digital movies. Journal of Visual Communication and Image Representation 14, 3 (2003), 358--368.

[14]

Kai-Yin Cheng, Sheng-Jie Luo, Bing-Yu Chen, and Hao-Hua Chu. 2009. SmartPlayer: User-centric video fast-forwarding. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 789--798.

Digital Library

[15]

W. S. Chu, Yale Song, and A. Jaimes. 2015. Video co-summarization: Video summarization by visual co-occurrence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3584--3592.

[16]

Cheng-Tao Chung, Hsin-Kuan Hsiung, Cheng-Kuang Wei, and Lin-shan Lee. 2014. Personalized video summarization based on multi-layered probabilistic latent semantic analysis with shared topics. In Proceedings of the 9th International Symposium on Chinese Spoken Language Processing. 173--177.

[17]

Yang Cong, Junsong Yuan, and Jiebo Luo. 2012. Towards scalable summarization of consumer videos via sparse dictionary selection. IEEE Transactions on Multimedia 14, 1 (2012), 66--75.

Digital Library

[18]

Adele Cutler and Leo Breiman. 1994. Archetypal analysis. Technometrics 36, 4 (1994), 338--347.

[19]

Chinh T. Dang and Hayder Radha. 2014. Heterogeneity image patch index and its application to consumer video summarization. IEEE Transactions on Image Processing 23, 6 (2014), 2704--2718.

Digital Library

[20]

F. Daniyal and A. Cavallaro. 2011. Multi-camera scheduling for video production. In Proceedings of the 2011 Conference for Visual Media Production. 11--20.

Digital Library

[21]

Kaveh Darabi and Gheorghita Ghinea. 2015. Personalized video summarization using sift. In Proceedings of the 30th Annual ACM Symposium on Applied Computing. 1252--1256.

Digital Library

[22]

A. G. del Molino, C. Tan, J. H. Lim, and A. H. Tan. 2017. Summarization of egocentric videos: A comprehensive survey. IEEE Transactions on Human-Machine Systems 47, 1 (Feb 2017), 65--76.

[23]

Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. Decaf: A deep convolutional activation feature for generic visual recognition. In Proceedings of the International Conference on Machine Learning. 647--655.

Digital Library

[24]

Pei Dong, Zhiyong Wang, Li Zhuo, and Dagan Feng. 2010. Video summarization with visual and semantic features. In Proceedings of the Advances in Multimedia Information Processing. Springer, 203--214.

Digital Library

[25]

Pei Dong, Yong Xia, Shanshan Wang, Li Zhuo, and David Dagan Feng. 2014. An iteratively reweighting algorithm for dynamic video summarization. Multimedia Tools and Applications 74, 21 (2014), 9449--9473.

Digital Library

[26]

H. Duxans, X. Anguera, and D. Conejero. 2009. Audio based soccer game summarization. In Proceedings of the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. 1--6.

[27]

E. Elhamifar and M. C. D. P. Kaluza. 2017. Online summarization via submodular and convex optimization. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1818--1826.

[28]

E. Elhamifar, G. Sapiro, and R. Vidal. 2012. See all by looking at a few: Sparse modeling for finding representative objects. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. 1600--1607.

Digital Library

[29]

Georgios Evangelopoulos, Konstantinos Rapantzikos, Alexandros Potamianos, Petros Maragos, A. Zlatintsi, and Yair Avrithis. 2008. Movie summarization based on audiovisual saliency detection. In Proceedings of the 15th IEEE International Conference on Image Processing. 2528--2531.

[30]

Georgios Evangelopoulos, Athanasia Zlatintsi, Alexandros Potamianos, Petros Maragos, Konstantinos Rapantzikos, Georgios Skoumas, and Yannis Avrithis. 2013. Multimodal saliency and fusion for movie summarization based on aural, visual, and textual attention. IEEE Transactions on Multimedia 15, 7 (2013), 1553--1568.

Digital Library

[31]

Georgios Evangelopoulos, Athanasia Zlatintsi, Georgios Skoumas, Konstantinos Rapantzikos, Alexandros Potamianos, Petros Maragos, and Y. Avrithis. 2009. Video event detection and summarization using audio, visual and text saliency. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 3553--3556.

Digital Library

[32]

Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2010. A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736 (2010).

[33]

Simone Frintrop, Erich Rome, and Henrik I. Christensen. 2010. Computational visual attention systems and their cognitive foundations: A survey. ACM Transactions on Applied Perception 7, 1 (2010), 1--39.

Digital Library

[34]

Yanwei Fu, Yanwen Guo, Yanshu Zhu, Feng Liu, Chuanming Song, and Zhi-Hua Zhou. 2010. Multi-view video summarization. IEEE Transactions on Multimedia 12, 7 (2010), 717--729.

Digital Library

[35]

Lianli Gao, Peng Wang, Jingkuan Song, Zi Huang, Jie Shao, and Heng Tao Shen. 2017. Event video mashup: From hundreds of videos to minutes of skeleton. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.

Digital Library

[36]

Yue Gao, Wei-Bo Wang, Jun-Hai Yong, and He-Jin Gu. 2009. Dynamic video summarization using two-level redundancy detection. Multimedia Tools and Applications 42, 2 (2009), 233--250.

Digital Library

[37]

Ana Garcia del Molino and Michael Gygli. 2018. PHD-GIFs: Personalized highlight detection for automatic GIF creation. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 600--608.

Digital Library

[38]

Boqing Gong, Wei-Lun Chao, Kristen Grauman, and Fei Sha. 2014. Diverse sequential subset selection for supervised video summarization. In Advances in Neural Information Processing Systems, Vol. 2. 2069--2077.

Digital Library

[39]

Stephen R. Gulliver and Gheorghita Ghinea. 2006. Defining user perception of distributed multimedia quality. ACM Transactions on Multimedia Computing, Communications, and Applications 2, 4 (2006), 241--257.

Digital Library

[40]

Michael Gygli, Helmut Grabner, Hayko Riemenschneider, and Luc Van Gool. 2014. Creating summaries from user videos. In Proceedings of the European Conference on Computer Vision. Springer, 505--520.

[41]

Michael Gygli, Helmut Grabner, and Luc Van Gool. 2015. Video summarization by learning submodular mixtures of objectives. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3090--3098.

[42]

Michael Gygli, Yale Song, and Liangliang Cao. 2016. Video2gif: Automatic generation of animated gifs from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1001--1009.

[43]

Bohyung Han, Jihun Hamm, and Jack Sim. 2011. Personalized video summarization with human in the loop. In IEEE Workshop on Applications of Computer Vision. 51--57.

Digital Library

[44]

A. Hanjalic. 2002. Shot-boundary detection: Unraveled and resolved? IEEE Transactions on Circuits and Systems for Video Technology 12, 2 (Feb 2002), 90--105.

Digital Library

[45]

Hsuan-I Ho, Wei-Chen Chiu, and Yu-Chiang Frank Wang. 2018. Summarizing first-person videos from third persons’ points of view. In Proceedings of the European Conference on Computer Vision. 70--85.

[46]

Richang Hong, Jinhui Tang, Hung-Khoon Tan, Chong-Wah Ngo, Shuicheng Yan, and Tat-Seng Chua. 2011. Beyond search: Event-driven summarization for web videos. ACM Transactions on Multimedia Computing, Communications, and Applications 7, 4 (2011), 35:1--35:18.

Digital Library

[47]

Richang Hong, Jinhui Tang, Hung-Khoon Tan, Shuicheng Yan, Chongwah Ngo, and Tat-Seng Chua. 2009. Event driven summarization for web videos. In Proceedings of the 1st SIGMM Workshop on Social Media. 43--48.

Digital Library

[48]

Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng, and Stephen Maybank. 2011. A survey on visual content-based video indexing and retrieval. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 41, 6 (2011), 797--819.

Digital Library

[49]

Qian Huang, Zhu Liu, A. Rosenberg, D. Gibbon, and B. Shahraray. 1999. Automated generation of news content hierarchy by integrating audio, video, and text information. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing., Vol. 6. 3025--3028.

Digital Library

[50]

Zi Huang, Bo Hu, Hong Cheng, Heng Tao Shen, Hongyan Liu, and Xiaofang Zhou. 2010. Mining near-duplicate graph for cluster-based reranking of web video search results. ACM Transactions on Information Systems (TOIS) 28, 4 (2010), 22:1--22:27.

Digital Library

[51]

Peter J. Huber et al. 1964. Robust estimation of a location parameter. The Annals of Mathematical Statistics 35, 1 (1964), 73--101.

[52]

Zhong Ji, Yaru Ma, Yanwei Pang, and Xuelong Li. 2019. Query-aware sparse coding for web multi-video summarization. Information Sciences 478 (2019), 152--166.

[53]

Zhong Ji, Kailin Xiong, Yanwei Pang, and Xuelong Li. 2017. Video summarization with attention-based encoder-decoder networks. arXiv preprint arXiv:1708.09545 (2017).

[54]

Richard M. Jiang, Abdul H. Sadka, and Danny Crookes. 2009. Advances in Video Summarization and Skimming, Vol. 231. Springer, 27--50.

[55]

Yu-Gang Jiang, Chong-Wah Ngo, and Jun Yang. 2007. Towards optimal bag-of-features for object categorization and semantic video retrieval. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval. 494--501.

Digital Library

[56]

Hideo Joho, Joemon M. Jose, Roberto Valenti, and Nicu Sebe. 2009. Exploiting facial expressions for affective video summarisation. In Proceedings of the ACM International Conference on Image and Video Retrieval. 31:1--31:8.

Digital Library

[57]

Narendra Jussien, Guillaume Rochart, and Xavier Lorca. 2008. Choco: An open source Java constraint programming library. In CPAIOR’08 Workshop on Open-Source Software for Integer and Constraint Programming. 1--10.

[58]

Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub. 2003. Extrapolation methods for accelerating PageRank computations. In Proceedings of the 12th International Conference on World Wide Web. ACM, 261--270.

Digital Library

[59]

Rajkumar Kannan, Gheorghita Ghinea, and Sridhar Swaminathan. 2015. What do you wish to see? A summarization system for movies based on user preferences. Information Processing 8 Management 51, 3 (2015), 286--305.

Digital Library

[60]

E. Kasutani and A. Yamada. 2001. The MPEG-7 color layout descriptor: A compact image feature description for high-speed image/video segment retrieval. In Proceedings 2001 International Conference on Image Processing, Vol. 1. 674--677 vol.1.

[61]

Harish Katti, Karthik Yadati, Mohan Kankanhalli, and Chua Tat-Seng. 2011. Affective video summarization and story board generation using pupillary dilation and eye gaze. In Proceedings of the IEEE International Symposium on Multimedia. 319--326.

Digital Library

[62]

Steven M. Kay. 1998. Fundamentals of statistical signal processing: Detection theory. Prentice Hall PTR.

[63]

A. Khosla, R. Hamid, C. J. Lin, and N. Sundaresan. 2013. Large-scale video summarization using web-image priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2698--2705.

Digital Library

[64]

G. Kim, L. Sigal, and E. P. Xing. 2014. Joint summarization of large-scale collections of web images and videos for storyline reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4225--4232.

Digital Library

[65]

Irena Koprinska and Sergio Carrato. 2001. Temporal video segmentation: A survey. Signal Processing: Image Communication 16, 5 (2001), 477--500.

[66]

P. Koutras, A. Zlatintsi, E. Iosif, A. Katsamanis, P. Maragos, and A. Potamianos. 2015. Predicting audio-visual salient events based on visual, audio and text modalities for movie summarization. In Proceedings of the 2015 IEEE International Conference on Image Processing. 4361--4365.

[67]

S. K. Kuanar, K. B. Ranga, and A. S. Chowdhury. 2015. Multi-view video summarization using bipartite matching constrained optimum-path forest clustering. IEEE Transactions on Multimedia 17, 8 (Aug 2015), 1166--1173.

Digital Library

[68]

Alex Kulesza and Ben Taskar. 2011. Learning determinantal point processes. arXiv preprint arXiv:1202.3738 (2011).

Digital Library

[69]

Alex Kulesza, Ben Taskar, et al. 2012. Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning 5, 2--3 (2012), 123--286.

Digital Library

[70]

Robert Laganière, Raphael Bacco, Arnaud Hocevar, Patrick Lambert, Grégory Païs, and Bogdan E Ionescu. 2008. Video summarization from spatio-temporal features. In Proceedings of the 2nd ACM TRECVid Video Summarization Workshop. 144--148.

Digital Library

[71]

X. Li, B. Zhao, and X. Lu. 2017. A general framework for edited video and raw video summarization. IEEE Transactions on Image Processing 26, 8 (Aug 2017), 3652--3664.

Digital Library

[72]

Ying Li, Shih-Hung Lee, Chia-Hung Yeh, and C. C. Jay Kuo. 2006. Techniques for movie content analysis and skimming: Tutorial and overview on video abstraction techniques. IEEE Signal Processing Magazine 23, 2 (2006), 79--89.

[73]

Yingbo Li, Bernard Merialdo, Mickael Rouvier, and Georges Linares. 2011. Static and dynamic video summaries. In Proceedings of the 19th ACM International Conference on Multimedia. 1573--1576.

Digital Library

[74]

Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. 74--81.

[75]

Yen-Liang Lin, Vlad I. Morariu, and Winston Hsu. 2015. Summarizing while recording: Context-based highlight detection for egocentric videos. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 51--59.

Digital Library

[76]

C. Liu, J. Yuen, and A. Torralba. 2011. SIFT flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 5 (May 2011), 978--994.

Digital Library

[77]

David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 2 (2004), 91--110.

Digital Library

[78]

Lie Lu, Hong-Jiang Zhang, and Stan Z. Li. 2003. Content-based audio classification and segmentation by using support vector machines. Multimedia Systems 8, 6 (2003), 482--492.

[79]

Z. Lu and K. Grauman. 2013. Story-driven summarization for egocentric video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2714--2721.

Digital Library

[80]

Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).

[81]

Yu-Fei Ma, Lie Lu, Hong-Jiang Zhang, and Mingjing Li. 2002. A user attention model for video summarization. In Proceedings of the 10th ACM International Conference on Multimedia. 533--542.

Digital Library

[82]

Yu-Fei Ma and Hong-Jiang Zhang. 2002. A model of motion attention for video skimming. In Proceedings of International Conference on Image Processing, Vol. 1. I--129--I--132.

[83]

B. Mahasseni, M. Lam, and S. Todorovic. 2017. Unsupervised video summarization with adversarial LSTM networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2982--2991.

[84]

S. Marvaniya, M. Damoder, V. Gopalakrishnan, K. N. Iyer, and K. Soni. 2016. Real-time video summarization on mobile. In Proceedings of the IEEE International Conference on Image Processing. 176--180.

[85]

Brian W. Matthews. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure 405, 2 (1975), 442--451.

[86]

Irfan Mehmood, Muhammad Sajjad, Seungmin Rho, and Sung Wook Baik. 2016. Divide-and-conquer based summarization framework for extracting affective video content. Neurocomputing 174 (2016), 393--403.

Digital Library

[87]

Tao Mei, Lin-Xie Tang, Jinhui Tang, and Xian-Sheng Hua. 2013. Near-lossless semantic video summarization and its applications to video analysis. ACM Transactions on Multimedia Computing Communications and Applications 9, 3 (July 2013), 16:1--16:23.

Digital Library

[88]

J. Meng, H. Wang, J. Yuan, and Y. P. Tan. 2016. From keyframes to key objects: Video summarization by representative object proposal selection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 1039--1048.

[89]

A. Mitra, S. Biswas, and C. Bhattacharyya. 2017. Bayesian modeling of temporal coherence in videos for entity discovery and summarization. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 3 (March 2017), 430--443.

Digital Library

[90]

Arthur G. Money and Harry Agius. 2008. Video summarisation: A conceptual framework and survey of the state of the art. Journal of Visual Communication and Image Representation 19, 2 (2008), 121--143.

Digital Library

[91]

Arthur G. Money and Harry Agius. 2010. Elvis: Entertainment-led video summaries. ACM Transactions on Multimedia Computing, Communications, and Applications 6, 3 (2010), 17:1--17:30.

Digital Library

[92]

Jeho Nam and Ahmed H. Tewfik. 1999. Dynamic video summarization and visualization. In Proceedings of the 7th ACM International Conference on Multimedia (Part 2). 53--56.

Digital Library

[93]

Apostol Natsev, John R. Smith, Jelena Tešié, Lexing Xie, and Rong Yan. 2008. IBM multimedia analysis and retrieval system. In Proceedings of the 2008 ACM International Conference on Content-based Image and Video Retrieval. 553--554.

Digital Library

[94]

Chong-Wah Ngo, Yu-Fei Ma, and Hong-Jiang Zhang. 2005. Video summarization and scene detection by graph modeling. IEEE Transactions on Circuits and Systems for Video Technology 15, 2 (2005), 296--305.

Digital Library

[95]

Chong-Wah Ngo, Ting-Chuen Pong, and Hong-Jiang Zhang. 2002. Motion-based video representation for scene change detection. Proceedings of the International Journal of Computer Vision 50, 2 (2002), 127--142.

Digital Library

[96]

Chong-Wah Ngo, Wan-Lei Zhao, and Yu-Gang Jiang. 2006. Fast tracking of near-duplicate keyframes in broadcast domain with transitivity propagation. In Proceedings of the 14th ACM International Conference on Multimedia. 845--854.

Digital Library

[97]

Payam Oskouie, Sara Alipour, and Amir-Masoud Eftekhari-Moghadam. 2014. Multimodal feature extraction and fusion for semantic mining of soccer video: A survey. Artificial Intelligence Review 42, 2 (2014), 173--210.

Digital Library

[98]

S. H. Ou, C. H. Lee, V. S. Somayazulu, Y. K. Chen, and S. Y. Chien. 2015. On-line multi-view video summarization for wireless video sensor network. IEEE Journal of Selected Topics in Signal Processing 9, 1 (Feb 2015), 165--179.

[99]

Paul Over, Alan F. Smeaton, and George Awad. 2008. The TRECVid 2008 BBC rushes summarization evaluation. In Proceedings of the 2nd ACM TRECVid Video Summarization Workshop. 1--20.

Digital Library

[100]

Paul Over, Alan F. Smeaton, and Philip Kelly. 2007. The TRECVID 2007 BBC rushes summarization evaluation pilot. In Proceedings of the International Workshop on TRECVID Video Summarization. 1--15.

Digital Library

[101]

Jim Owens. 2015. Television Sports Production. CRC Press.

[102]

R. Panda, A. Das, and A. K. Roy-Chowdhury. 2016. Embedded sparse coding for summarizing multi-view videos. In Proceedings of the IEEE International Conference on Image Processing. 191--195.

[103]

Rameswar Panda, Abir Das, Ziyan Wu, Jan Ernst, and Amit K. Roy-Chowdhury. 2017. Weakly supervised summarization of web videos. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 3677--3686.

[104]

R. Panda and A. K. Roy-Chowdhury. 2017. Collaborative summarization of topic-related videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4274--4283.

[105]

R. Panda and A. K. Roy-Chowdhury. 2017. Sparse modeling for topic-oriented video summarization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 1388--1392.

[106]

Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004. WordNet:: Similarity: measuring the relatedness of concepts. In Demonstration Papers at HLT-NAACL 2004. 38--41.

Digital Library

[107]

Wei-Ting Peng, Chia-Han Chang, Wei-Ta Chu, Wei-Jia Huang, Chien-Nan Chou, Wen-Yan Chang, and Yi-Ping Hung. 2010. A real-time user interest meter and its applications in home video summarizing. In Proceedings of the IEEE International Conference on Multimedia and Expo. 849--854.

[108]

Wei-Ting Peng, Yueh-Hsuan Chiang, Wei-Ta Chu, Wei-Jia Huang, Wei-Lun Chang, Po-Chung Huang, and Yi-Ping Hung. 2008. Aesthetics-based automatic home video skimming system. In Proceedings of the Advances in Multimedia Modeling. Springer, 186--197.

Digital Library

[109]

Wei-Ting Peng, Wei-Ta Chu, Chia-Han Chang, Chien-Nan Chou, Wei-Jia Huang, Wen-Yan Chang, and Yi-Ping Hung. 2011. Editing by viewing: Automatic home video summarization by viewing behavior analysis. IEEE Transactions on Multimedia 13, 3 (2011), 539--550.

Digital Library

[110]

Wei-Ting Peng, Wei-Jia Huang, Wei-Ta Chu, Chien-Nan Chou, Wen-Yan Chang, Chia-Han Chang, and Yi-Ping Hung. 2009. A user experience model for home video summarization. In Proceedings of the Advances in Multimedia Modeling. Springer, 484--495.

Digital Library

[111]

Silvia Pfeiffer, Rainer Lienhart, Stephan Fischer, and Wolfgang Effelsberg. 1996. Abstracting digital movies automatically. Journal of Visual Communication and Image Representation 7, 4 (1996), 345--353.

[112]

B. A. Plummer, M. Brown, and S. Lazebnik. 2017. Enhancing video summarization via vision-language embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1052--1060.

[113]

Dulce Ponceleon, Arnon Amir, Savitha Srinivasan, Tanveer Syeda-Mahmood, and Dragutin Petkovic. 1999. CueVideo: Automated multimedia indexing and retrieval. In Proceedings of the 7th ACM International Conference on Multimedia (Part 2). 199.

Digital Library

[114]

Danila Potapov, Matthijs Douze, Zaid Harchaoui, and Cordelia Schmid. 2014. Category-specific video summarization. In Proceedings of the European Conference on Computer Vision. 540--555.

[115]

Z. Rasheed and M. Shah. 2003. Scene detection in Hollywood movies and TV shows. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2. II--343--II--348.

[116]

Z. Rasheed and M. Shah. 2005. Detection and representation of scenes in videos. IEEE Transactions on Multimedia 7, 6 (Dec 2005), 1097--1105.

Digital Library

[117]

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 779--788.

[118]

J. Ren and J. Jiang. 2009. Hierarchical modeling and adaptive clustering for real-time summarization of rush videos. IEEE Transactions on Multimedia 11, 5 (Aug 2009), 906--917.

Digital Library

[119]

Reede Ren, Hemant Misra, and Joemon M. Jose. 2010. Semantic based adaptive movie summarisation. In Proceedings of the Advances in Multimedia Modeling. Springer, 389--399.

Digital Library

[120]

Tongwei Ren, Yan Liu, and Gangshan Wu. 2010. Video summary quality evaluation based on 4C assessment and user interaction. In Proceedings of the Multimedia Interaction and Intelligent User Interfaces. Springer, 243--269.

[121]

Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. 2010. Adapting visual category models to new domains. In Proceedings of the 11th European Conference on Computer Vision: Part IV. Springer-Verlag, Berlin, 213--226.

Digital Library

[122]

Helmut Schmid. 2013. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the New Methods in Language Processing. Routledge, 154.

[123]

Guy L. Scott and H. Christopher Longuet-Higgins. 1991. An algorithm for associating the features of two images. Proceedings of the Royal Society of London B: Biological Sciences 244, 1309 (1991), 21--26.

[124]

Dafna Shahaf and Carlos Guestrin. 2010. Connecting the dots between news articles. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 623--632.

Digital Library

[125]

Aidean Sharghi, Boqing Gong, and Mubarak Shah. 2016. Query-focused extractive video summarization. In Proceedings of European Conference on Computer Vision. 3--19.

[126]

Aidean Sharghi, Jacob S. Laurel, and Boqing Gong. 2017. Query-focused video summarization: Dataset, evaluation, and a memory network based approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2127--2136.

[127]

Ali Shokoufandeh and Sven Dickinson. 1999. Applications of bipartite matching to problems in object recognition. In Proceedings of the ICCV Workshop on Graph Algorithms and Computer Vision, Vol. 2. 1--18.

[128]

J. Sivic and A. Zisserman. 2009. Efficient visual search of videos cast as text retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 4 (April 2009), 591--606.

Digital Library

[129]

Alan F. Smeaton, Paul Over, and Aiden R. Doherty. 2010. Video shot boundary detection: Seven years of TRECVid activity. Computer Vision and Image Understanding 114, 4 (2010), 411--418.

Digital Library

[130]

Michael A. Smith and Takeo Kanade. 1998. Video skimming and characterization through the combination of image and language understanding. In Proceedings of IEEE International Workshop on Content-Based Access of Image and Video Database. 61--70.

Digital Library

[131]

Temple F. Smith and Michael S. Waterman. 1981. Identification of common molecular subsequences. Journal of Molecular Biology 147, 1 (1981), 195--197.

[132]

Yale Song, Jordi Vallmitjana, Amanda Stent, and Alejandro Jaimes. 2015. Tvsum: Summarizing web videos using titles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5179--5187.

[133]

Baochen Sun, Jiashi Feng, and Kate Saenko. 2016. Return of frustratingly easy domain adaptation. In Thirtieth AAAI Conference on Artificial Intelligence, Vol. 6. 2058--2065.

Digital Library

[134]

Liang Sun, Shuiwang Ji, and Jieping Ye. 2008. Hypergraph spectral learning for multi-label classification. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 668--676.

Digital Library

[135]

Min Sun, Ali Farhadi, and Steve Seitz. 2014. Ranking domain-specific highlights by analyzing edited videos. In Proceedings of the European Conference on Computer Vision. Springer, 787--802.

[136]

Anthony Tang and Sebastian Boring. 2012. #EpicPlay: Crowd-sourcing sports video highlights. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1569--1572.

Digital Library

[137]

Cuneyt M. Taskiran. 2006. Evaluation of automatic video summarization systems. In Proceedings of SPIE, Vol. 6073. 178--187.

[138]

Cüneyt M. Taskiran, Zygmunt Pizlo, Arnon Amir, Dulce Ponceleon, and Edward J. Delp. 2006. Automated video program summarization using speech transcripts. IEEE Transactions on Multimedia 8, 4 (2006), 775--791.

Digital Library

[139]

M. Tavassolipour, M. Karimian, and S. Kasaei. 2014. Event detection and summarization in soccer videos using Bayesian network and copula. IEEE Transactions on Circuits and Systems for Video Technology 24, 2 (2014), 291--304.

Digital Library

[140]

Ba Tu Truong and Svetha Venkatesh. 2007. Video abstraction: A systematic review and classification. ACM Transactions on Multimedia Computing, Communications, and Applications 3, 1 (2007), 3:1--3:37.

Digital Library

[141]

Chia-Ming Tsai, Li-Wei Kang, Chia-Wen Lin, and Weisi Lin. 2013. Scene-based movie summarization via role-community networks. IEEE Transactions on Circuits and Systems for Video Technology 23, 11 (2013), 1927--1940.

[142]

Víctor Valdés and José M. Martínez. 2008. Binary tree based on-line video summarization. In Proceedings of the 2nd ACM TRECVid Video Summarization Workshop. 134--138.

[143]

Patrizia Varini, Giuseppe Serra, and Rita Cucchiara. 2015. Egocentric video summarization of cultural tour based on user preferences. In Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 931--934.

[144]

P. Varini, G. Serra, and R. Cucchiara. 2017. Personalized egocentric video summarization of cultural tour on user preferences input. IEEE Transactions on Multimedia 19, 12 (Dec 2017), 2832--2845.

[145]

Nuno Vasconcelos and Andrew Lippman. 1998. Bayesian modeling of video editing and structure: Semantic features for video summarization and browsing. In Proceedings of International Conference on Image Processing. 153--157.

[146]

F. Wang and C. W. Ngo. 2012. Summarizing rushes videos by motion, object, and event understanding. IEEE Transactions on Multimedia 14, 1 (Feb 2012), 76--87.

[147]

L. Wang, Y. Li, and S. Lazebnik. 2016. Learning deep structure-preserving image-text embeddings. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5005--5013.

[148]

Meng Wang, Richang Hong, Guangda Li, Zheng-Jun Zha, Shuicheng Yan, and Tat-Seng Chua. 2012. Event driven web video summarization by tag localization and key-shot identification. IEEE Transactions on Multimedia 14, 4 (2012), 975--985.

[149]

S. Wang and Q. Ji. 2015. Video affective content analysis: A survey of state-of-the-art methods. IEEE Transactions on Affective Computing 6, 4 (Oct 2015), 410--430.

[150]

Xi Wang, Yu-Gang Jiang, Zhenhua Chai, Zichen Gu, Xinyu Du, and Dong Wang. 2014. Real-time summarization of user-generated videos based on semantic recognition. In Proceedings of the 22nd ACM International Conference on Multimedia. 849--852.

[151]

Huawei Wei, Bingbing Ni, Yichao Yan, Huanyu Yu, Xiaokang Yang, and Chen Yao. 2018. Video summarization via semantic attended networks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 216--223.

[152]

Tao Xiang and Shaogang Gong. 2004. Activity based video content trajectory representation and segmentation. In Proceedings of the BMVC. 1--10.

[153]

Xiaohong Xiang and Mohan S. Kankanhalli. 2011. Affect-based adaptive presentation of home videos. In Proceedings of the 19th ACM International Conference on Multimedia. 553--562.

[154]

Baohan Xu, Xi Wang, and Yu-Gang Jiang. 2016. Fast summarization of user-generated videos: Exploiting semantic, emotional, and quality clues. IEEE MultiMedia 23, 3 (2016), 23--33.

[155]

C. Xu, J. Wang, H. Lu, and Y. Zhang. 2008. A novel framework for semantic annotation and personalized retrieval of sports video. IEEE Transactions on Multimedia 10, 3 (April 2008), 421--436.

[156]

Jia Xu, Lopamudra Mukherjee, Yin Li, Jamieson Warner, James M. Rehg, and Vikas Singh. 2015. Gaze-enabled egocentric video summarization via constrained submodular maximization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2235--2244.

[157]

X. Xu, T. M. Hospedales, and S. Gong. 2017. Discovery of shared semantic spaces for multiscene video query and summarization. IEEE Transactions on Circuits and Systems for Video Technology 27, 6 (June 2017), 1353--1367.

[158]

Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, and Baining Guo. 2015. Unsupervised extraction of video highlights via robust recurrent auto-encoders. In Proceedings of the IEEE International Conference on Computer Vision. 4633--4641.

[159]

T. Yao, T. Mei, and Y. Rui. 2016. Highlight detection with pairwise deep ranking for first-person video summarization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 982--990.

[160]

Minerva M. Yeung and Boon-Lock Yeo. 1997. Video visualization for compact presentation and fast browsing of pictorial content. IEEE Transactions on Circuits and Systems for Video Technology 7, 5 (1997), 771--785.

[161]

Serena Yeung, Alireza Fathi, and Li Fei-Fei. 2014. VideoSET: Video summary evaluation through text. arXiv preprint arXiv:1406.5824 (2014).

[162]

Atsuo Yoshitaka and Kazuaki Sawada. 2012. Personalized video summarization based on behavior of viewer. In Proceedings of the 8th International Conference on Signal Image Technology and Internet Based Systems. 661--667.

[163]

Junyong You, Guizhong Liu, Li Sun, and Hongliang Li. 2007. A multiple visual models based perceptive analysis framework for multilevel video summarization. IEEE Transactions on Circuits and Systems for Video Technology 17, 3 (2007), 273--285.

[164]

J. Yuan, H. Wang, L. Xiao, W. Zheng, J. Li, F. Lin, and B. Zhang. 2007. A formal study of shot boundary detection. IEEE Transactions on Circuits and Systems for Video Technology 17, 2 (Feb 2007), 168--186.

[165]

Ming Yuan and Yi Lin. 2006. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 1 (2006), 49--67.

[166]

Zheng Yuan, Taoran Lu, Dapeng Wu, Yu Huang, and Heather Yu. 2011. Video summarization with semantic concept preservation. In Proceedings of the 10th ACM International Conference on Mobile and Ubiquitous Multimedia. 109--112.

[167]

Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman. 2016. Summary transfer: Exemplar-based subset selection for video summarization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1059--1067.

[168]

Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman. 2016. Video summarization with long short-term memory. In European Conference on Computer Vision. Springer, 766--782.

[169]

S. Zhang, Y. Zhu, and A. K. Roy-Chowdhury. 2016. Context-aware surveillance video summarization. IEEE Transactions on Image Processing 25, 11 (Nov 2016), 5469--5478.

[170]

Ying Zhang, Guanfeng Wang, Beomjoo Seo, and Roger Zimmermann. 2012. Multi-video summary and skim generation of sensor-rich videos in geo-space. In Proceedings of the 3rd Multimedia Systems Conference. 53--64.

[171]

Bin Zhao, Xuelong Li, and Xiaoqiang Lu. 2017. Hierarchical recurrent neural network for video summarization. In Proceedings of the 2017 ACM on Multimedia Conference. ACM, 863--871.

[172]

Bin Zhao and Eric P. Xing. 2014. Quasi real-time summarization for consumer videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2513--2520.

[173]

W. L. Zhao, C. W. Ngo, H. K. Tan, and X. Wu. 2007. Near-duplicate keyframe identification with interest point matching and pattern learning. IEEE Transactions on Multimedia 9, 5 (Aug 2007), 1037--1048.

[174]

A. Zlatintsi, E. Iosif, P. Maragos, and A. Potamianos. 2015. Audio salient event detection and summarization using audio and text modalities. In Proceedings of the 23rd European Signal Processing Conference. 2311--2315.

Published In

ACM Computing Surveys Volume 52, Issue 5

September 2020

791 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3362097

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 September 2019

Accepted: 01 July 2019

Revised: 01 March 2019

Received: 01 April 2018

Published in CSUR Volume 52, Issue 5

Qualifiers

Tutorial
Research
Refereed

Funding Sources

Ministry of Human Resource Development (MHRD), Government of India
