research-article

DTR-GAN: dilated temporal relational adversarial network for video summarization

Authors: Michael Kampffmeyer, Xiaoguang Zhao, Min Tan

ACM TURC '19: Proceedings of the ACM Turing Celebration Conference - China

Article No.: 89, Pages 1 - 6

https://doi.org/10.1145/3321408.3322622

Published: 17 May 2019

Abstract

Video summarization aims to find the smallest subset of frames that still conveys the whole story of a given video. It is therefore of great significance for large-scale video understanding, since it allows efficient processing of the large number of videos uploaded every day. In this paper, we introduce a Dilated Temporal Relational Adversarial Network (DTR-GAN) for frame-level video summarization. The dilated temporal relational units in the generator exploit multi-scale temporal context in order to select key frames. To ensure that the model predicts high-quality summaries, we present a discriminator that learns to enhance both the information completeness and the compactness of the summary via a three-player loss. Experiments on the public TVSum dataset demonstrate the effectiveness of the proposed approach.
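The abstract does not spell out how the dilated temporal relational units are built, so the following is only a minimal sketch of the general idea of gathering multi-scale temporal context with dilated convolutions over per-frame features. It is a generic dilated 1-D convolution, not the authors' DTR unit; the function names, the 3-tap kernel, the ReLU, and the dilation rates (1, 2, 4) are all assumptions made for illustration.

```python
import numpy as np


def dilated_temporal_conv(features, kernel, dilation):
    """3-tap temporal convolution with a given dilation (hypothetical helper).

    features: (T, D) array, one D-dimensional feature per frame.
    kernel:   (3, D, D_out) weights for the taps at t - d, t, and t + d.
    Zero padding keeps the output length equal to T.
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((dilation, dilation), (0, 0)))
    out = np.zeros((num_frames, kernel.shape[2]))
    for k in range(3):
        # Tap k reads the frame at offset (k - 1) * dilation from frame t,
        # which sits at index t + k * dilation in the padded array.
        start = k * dilation
        out += padded[start:start + num_frames] @ kernel[k]
    return out


def multi_scale_context(features, kernels, dilations=(1, 2, 4)):
    """Concatenate ReLU responses computed at several dilation rates.

    The rates are illustrative assumptions, not values from the paper.
    """
    outputs = [
        np.maximum(dilated_temporal_conv(features, w, d), 0.0)
        for w, d in zip(kernels, dilations)
    ]
    return np.concatenate(outputs, axis=1)
```

As a quick check of the behaviour, a single active frame at index 4 with an all-ones kernel and dilation 2 produces responses at frames 2, 4, and 6, which shows how a larger dilation widens the temporal window without adding taps.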

Index Terms

DTR-GAN: dilated temporal relational adversarial network for video summarization
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization

Information & Contributors

Information

Published In

ACM TURC '19: Proceedings of the ACM Turing Celebration Conference - China

May 2019

963 pages

ISBN:9781450371582

DOI:10.1145/3321408

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Norwegian Research Council FRIPRO
National Natural Science Foundation of China

Conference

ACM TURC 2019

ACM TURC 2019: ACM Turing Celebration Conference - China

May 17 - 19, 2019

Chengdu, China

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 28
Total Downloads: 268
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 2

Reflects downloads up to 04 Oct 2024

Citations

Cited By

Huang J, Gurrin C, Kongkachandra R, Schoeffmann K, Dang-Nguyen D, Rossetto L, Satoh S, Zhou L (2024). Multi-modal Video Summarization. Proceedings of the 2024 International Conference on Multimedia Retrieval, 1214-1218. Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3652583.3657582

Lin J, Hua H, Chen M, Li Y, Hsiao J, Ho C, Luo J (2024). VideoXum: Cross-Modal Visual and Textural Summarization of Videos. IEEE Transactions on Multimedia 26, 5548-5560. Online publication date: 2024.
https://doi.org/10.1109/TMM.2023.3335875

Singh A, Srivastava D, Tapaswi M (2024). “Previously on…” from Recaps to Story Summarization. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13635-13646. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.01294

Krishnan H, Gladwin J, Nambiar N, Samanuai A, Vyshakh Madhu T (2024). Video summarization using transformer model assisted AC-SUM-GAN. 2023 4th International Conference on Intelligent Technologies (CONIT), 1-8. Online publication date: 21-Jun-2024.
https://doi.org/10.1109/CONIT61985.2024.10627023

El-Nagar G, El-Sawy A, Rashad M (2024). A deep audio-visual model for efficient dynamic video summarization. Journal of Visual Communication and Image Representation 100, 104130. Online publication date: Apr-2024.
https://doi.org/10.1016/j.jvcir.2024.104130

Yoon U, Hong M, Jo G (2023). Unsupervised Video Summarization Based on Deep Reinforcement Learning with Interpolation. Sensors 23:7, 3384. Online publication date: 23-Mar-2023.
https://doi.org/10.3390/s23073384

Khurana K, Deshpande U (2023). Two stream multi-layer convolutional network for keyframe-based video summarization. Multimedia Tools and Applications 82:25, 38467-38508. Online publication date: 16-Mar-2023.
https://doi.org/10.1007/s11042-023-14665-x

Saini P, Kumar K, Kashid S, Saini A, Negi A (2023). Video summarization using deep learning techniques: a detailed analysis and investigation. Artificial Intelligence Review 56:11, 12347-12385. Online publication date: 15-Mar-2023.
https://doi.org/10.1007/s10462-023-10444-0

Gupta D, Sharma A (2023). A comprehensive study of automatic video summarization techniques. Artificial Intelligence Review 56:10, 11473-11633. Online publication date: 13-Mar-2023.
https://doi.org/10.1007/s10462-023-10429-z

Gupta D, Sharma A (2023). A two-stage attention augmented fully convolutional network-based dynamic video summarization. Multimedia Systems 29:6, 3685-3701. Online publication date: 1-Dec-2023.
https://dl.acm.org/doi/10.1007/s00530-023-01154-2
