DTR-GAN: Dilated Temporal Relational Adversarial Network for Video Summarization

Zhang, Yujia; Kampffmeyer, Michael; Liang, Xiaodan; Zhang, Dingwen; Tan, Min; Xing, Eric P.

Abstract:The large amount of videos popping up every day, make it is more and more critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames, which still conveys the whole story of a given video, is thus of great significance to improve efficiency of video understanding. In this paper, we propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it can select a set of key frames, which contains the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced for enhancing temporal representation capturing. The generator aims to select key frames by using DTR units to effectively exploit global multi-scale temporal context and to complement the commonly used Bi-LSTM. To ensure that the summaries capture enough key video representation from a global perspective rather than a trivial randomly shorten sequence, we present a discriminator that learns to enforce both the information completeness and compactness of summaries via a three-player loss. The three-player loss includes the generated summary loss, the random summary loss, and the real summary (ground-truth) loss, which play important roles for better regularizing the learned model to obtain useful summaries. Comprehensive experiments on two public datasets SumMe and TVSum show the superiority of our DTR-GAN over the state-of-the-art approaches.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1804.11228 [cs.CV]
	(or arXiv:1804.11228v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1804.11228

Computer Science > Computer Vision and Pattern Recognition

Title:DTR-GAN: Dilated Temporal Relational Adversarial Network for Video Summarization

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators