DOI: 10.1145/3321408.3322622
research-article

DTR-GAN: dilated temporal relational adversarial network for video summarization

Published: 17 May 2019

Abstract

Video summarization aims to find the smallest subset of frames that still conveys the whole story of a given video. It is therefore of great significance for large-scale video understanding, since it allows efficient processing of the large number of videos uploaded every day. In this paper, we introduce a Dilated Temporal Relational Adversarial Network (DTR-GAN) for frame-level video summarization. The dilated temporal relational units in the generator exploit multi-scale temporal context in order to select key frames. To ensure that the model predicts high-quality summaries, we present a discriminator that learns to enhance both information completeness and compactness via a three-player loss. Experiments on the public TVSum dataset demonstrate the effectiveness of the proposed approach.
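
As a rough illustration of what "multi-scale temporal context" can mean for key-frame selection, the NumPy sketch below runs a 1-D dilated convolution over per-frame features at several dilation rates, averages the responses into one score per frame, and keeps the highest-scoring frames. It is not the DTR-GAN generator: the function names, the hand-set kernels, the averaging step, and the top-k selection are all assumptions made for this example, and the abstract does not specify any of them.

```python
import numpy as np


def dilated_temporal_conv(features, kernel, dilation):
    """1-D dilated convolution along time with zero ("same") padding.

    features: (T, D) array, one D-dimensional feature per frame.
    kernel:   (K, D) array, one weight vector per tap (K odd).
    Returns a (T,) array with one response per frame.
    """
    T, _ = features.shape
    K = kernel.shape[0]
    half = (K // 2) * dilation
    padded = np.pad(features, ((half, half), (0, 0)))
    out = np.zeros(T)
    for k in range(K):
        # Tap k reads the frame (k - K // 2) * dilation steps away from t.
        offset = k * dilation
        out += padded[offset:offset + T] @ kernel[k]
    return out


def multi_scale_scores(features, kernels, dilations):
    """Average the responses obtained at each dilation rate (illustrative)."""
    responses = [
        dilated_temporal_conv(features, k, d)
        for k, d in zip(kernels, dilations)
    ]
    return np.mean(responses, axis=0)


def select_key_frames(scores, budget):
    """Indices of the `budget` highest-scoring frames, in temporal order."""
    top = np.argsort(-scores, kind="stable")[:budget]
    return np.sort(top)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(20, 4))            # 20 frames, 4-D features
    kernels = [rng.normal(size=(3, 4)) for _ in range(3)]
    scores = multi_scale_scores(frames, kernels, dilations=[1, 2, 4])
    print(select_key_frames(scores, budget=5))
```

A larger dilation lets the same three-tap kernel compare frames that are further apart in time, which is how a small stack of such filters can cover short- and long-range context at once. In a trained model the kernels would be learned rather than sampled at random.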



    Published In

    ACM TURC '19: Proceedings of the ACM Turing Celebration Conference - China
    May 2019
    963 pages
    ISBN: 9781450371582
    DOI: 10.1145/3321408

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. dilated temporal relation
    2. generative adversarial network
    3. three-player loss
    4. video summarization

    Qualifiers

    • Research-article

    Funding Sources

    • Norwegian Research Council FRIPRO
    • National Natural Science Foundation of China

    Conference

    ACM TURC 2019
