VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model

Published: 08 June 2021

Abstract

With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Using machine learning methods combined with well-designed features, we can automatically obtain video summaries to alleviate video resource consumption and retrieval issues. However, there is often a gap between the summaries produced by a model and those annotated by users. Helping users understand this difference, providing insights into improving the model, and enhancing trust in the model remain challenging in current research. To address these challenges, we propose VSumVis, a visual analysis system developed under a user-centered design methodology that supports multi-feature examination and multi-level exploration, helping users explore and analyze video content as well as the intrinsic relationships within our video summarization model. The system contains multiple coordinated views, i.e., a video view, a projection view, a detail view, and a sequential frames view. A multi-level analysis process that integrates video events and frames is presented with cluster and node visualizations in our system. Temporal patterns concerning the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential frames view. Moreover, we propose a set of rich user interactions that enable an in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence of the effectiveness of our approach. Quantitative feedback from a user study confirms the usefulness of our visual system for exploring the video summarization model.
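
To make the score comparison concrete, the following minimal Python sketch (our illustration, not the authors' implementation) assumes two equal-length, per-frame score sequences, one from manual annotation and one from the model's saliency output, both already normalized to [0, 1], and extracts the temporal segments where they disagree by more than a chosen threshold. The function name, threshold, and segmentation rule are assumptions; these are the kinds of temporal patterns the sequential frames view is designed to surface.

    import numpy as np

    def disagreement_segments(annotation, saliency, threshold=0.3):
        """Return (start, end) frame-index ranges where the manual annotation
        score and the model's saliency score differ by more than `threshold`.
        Illustrative only: both score sequences are assumed to be per-frame
        values already normalized to [0, 1]; the function name, threshold,
        and segmentation rule are assumptions, not the paper's method."""
        diff = np.abs(np.asarray(annotation, dtype=float) -
                      np.asarray(saliency, dtype=float))
        mask = diff > threshold

        # Collect contiguous runs of frames whose disagreement exceeds the threshold.
        segments, start = [], None
        for i, flagged in enumerate(mask):
            if flagged and start is None:
                start = i
            elif not flagged and start is not None:
                segments.append((start, i - 1))
                start = None
        if start is not None:
            segments.append((start, len(mask) - 1))
        return segments

    # Frames 2-4 diverge strongly, so one disagreement segment is reported.
    print(disagreement_segments([0.1, 0.2, 0.9, 0.8, 0.9, 0.2],
                                [0.1, 0.3, 0.2, 0.1, 0.2, 0.3]))  # [(2, 4)]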


Cited By

  • (2024) BNoteHelper: A Note-based Outline Generation Tool for Structured Learning on Video-sharing Platforms. ACM Transactions on the Web 18, 2 (2024), 1–30. https://doi.org/10.1145/3638775
  • (2022) SurVizor: Visualizing and understanding the key content of surveillance videos. Journal of Visualization 25, 3 (2022), 635–651. https://doi.org/10.1007/s12650-021-00803-w


    Published In

    ACM Transactions on Intelligent Systems and Technology, Volume 12, Issue 4
    August 2021
    368 pages
    ISSN: 2157-6904
    EISSN: 2157-6912
    DOI: 10.1145/3468075
    Editor: Huan Liu
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 08 June 2021
    Accepted: 01 March 2021
    Revised: 01 March 2021
    Received: 01 October 2020
    Published in TIST Volume 12, Issue 4


    Author Tags

    1. Visual analytics
    2. video summarization
    3. video visualization
    4. machine learning
    5. multimedia visual analysis

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Key Research and Development Program of China
    • National Natural Science Foundation of China
    • Fundamental Research Funds for the Provincial Universities of Zhejiang
    • Zhejiang Provincial Natural Science Foundation of China
    • Zhejiang Provincial Key Research and Development Program of China

