Modality correlation-based video summarization

Wang, Xingrun; Nie, Xiushan; Liu, Xingbo; Wang, Binze; Yin, Yilong

doi:10.1007/s11042-020-08690-3


Published: 03 March 2020

Volume 79, pages 33875–33890, (2020)

Multimedia Tools and Applications

Xingrun Wang¹,
Xiushan Nie ORCID: orcid.org/0000-0001-9644-9723²,
Xingbo Liu¹,
Binze Wang³ &
…
Yilong Yin⁴

389 Accesses
9 Citations

Abstract

Video summarization, which extracts frames or shots from an original video, is an important technique for browsing, storing, and retrieving a rapidly increasing amount of video data. Text information covers important content of a video, so a summary can be generated by exploring the correlation between frames and text. In this study, we propose a video summarization method based on modality correlation. The method first learns the correlation between text and frames in each respective space, and then fuses the two correlations to obtain an importance score for each shot. Finally, the video shots with high importance scores are chosen as the video summary. Compared with previous methods, which seldom use text to generate the summary or use only the latent common information between text and frames, the proposed method exploits both the latent common information and the modality-specific information. Experiments were conducted on the TVSum50 dataset, and the results verify the effectiveness of the proposed approach.
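The fuse-then-select step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the two per-frame text-frame correlations have already been learned and are given as plain lists, it fuses them with a simple weighted sum, and it selects shots greedily under a frame budget. The function names, the weight `alpha`, the `budget` parameter, and the toy numbers are all hypothetical choices made for illustration.

```python
# Illustrative sketch only. The abstract does not specify the fusion rule or
# the selection rule; a weighted sum and a greedy budgeted pick are assumed.

def shot_scores(corr_common, corr_specific, shots, alpha=0.5):
    """Average the fused per-frame correlation over each shot.

    corr_common / corr_specific: per-frame text-frame correlations from the
    two spaces (assumed precomputed). shots: (start, end) frame ranges with
    end exclusive. alpha: hypothetical fusion weight in [0, 1].
    """
    scores = []
    for start, end in shots:
        fused = [
            alpha * corr_common[i] + (1.0 - alpha) * corr_specific[i]
            for i in range(start, end)
        ]
        scores.append(sum(fused) / len(fused))
    return scores


def select_shots(scores, shots, budget):
    """Pick the highest-scoring shots whose total length fits the budget."""
    order = sorted(range(len(shots)), key=lambda k: scores[k], reverse=True)
    chosen, used = [], 0
    for k in order:
        length = shots[k][1] - shots[k][0]
        if used + length <= budget:
            chosen.append(k)
            used += length
    return sorted(chosen)  # return shot indices in temporal order


# Toy data: six frames grouped into three two-frame shots.
corr_common = [0.9, 0.8, 0.1, 0.2, 0.6, 0.7]
corr_specific = [0.7, 0.6, 0.3, 0.2, 0.8, 0.5]
shots = [(0, 2), (2, 4), (4, 6)]

scores = shot_scores(corr_common, corr_specific, shots, alpha=0.5)
print(select_shots(scores, shots, budget=4))  # prints [0, 2]
```

With these toy values the first and third shots score highest and together fill the four-frame budget, so the low-scoring middle shot is left out of the summary.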



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61671274, 61876098, and 61573219; in part by the Postdoctoral Science Foundation under Grant 2016M592190; and in part by the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, Shandong, 250101, China
Xingrun Wang & Xingbo Liu
School of Computer Science and Technology, Shandong Jianzhu University, Shandong, 250101, China
Xiushan Nie
College of Geology Engineering and Geomatics, Chang’an University, Xi’an, 710054, China
Binze Wang
School of Software Engineering, Shandong University, Shandong, 250101, China
Yilong Yin


Corresponding author

Correspondence to Xiushan Nie.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Wang, X., Nie, X., Liu, X. et al. Modality correlation-based video summarization. Multimed Tools Appl 79, 33875–33890 (2020). https://doi.org/10.1007/s11042-020-08690-3


Received: 16 February 2019
Revised: 08 December 2019
Accepted: 28 January 2020
Published: 03 March 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11042-020-08690-3
