ASTS: attention based spatio-temporal sequential framework for movie trailer genre classification

Yu, Yitong; Lu, Ziyu; Li, Yang; Liu, Delong

doi:10.1007/s11042-020-10125-y

Published: 13 November 2020

Volume 80, pages 9749–9764 (2021)

Multimedia Tools and Applications

Yitong Yu¹, Ziyu Lu² (ORCID: orcid.org/0000-0002-3049-0085), Yang Li¹ & Delong Liu³

Abstract

Automatic movie trailer genre classification can support multimedia search and personalized movie recommendation, but it is a challenging task because trailers contain diverse content and high-level sequential semantic concepts that follow the movie storyline. Traditional methods generally extract low-level features or consider only the local sequential dependencies among trailer frames, ignoring the global high-level sequential semantic concepts. In this manuscript, we propose a novel and effective Attention based Spatio-temporal Sequential Framework (ASTS) for movie trailer genre classification. The proposed framework consists of two main modules: a spatio-temporal descriptive module and an attention-based sequential module. The spatio-temporal descriptive module adopts advanced convolutional neural networks to extract spatio-temporal features of key trailer frames, capturing local spatio-temporal semantic features. The attention-based sequential module processes the resulting sequence of spatio-temporal feature representations to capture the global high-level sequential semantic concepts within the movie storyline. We crawl 14,415 labeled movie trailers from YouTube and integrate them into the public MovieLens dataset. Experimental results show that our proposed framework outperforms state-of-the-art methods.
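The abstract describes the attention-based sequential module only at a high level, so its exact formulation is not given on this page. The following is a minimal Python sketch of one common choice, additive (tanh-based) attention pooling, which turns a sequence of k per-clip representations into a single weighted summary vector. It assumes that D_h (mentioned in the notes) is the size of each per-clip representation and D_a is the attention projection size; the parameter names W, b and w, the random initialisation and the sizes in the usage example are hypothetical. No recurrent encoder or classifier head is shown, and this is not presented as the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, W, b, w):
    """Additive attention pooling (an assumed formulation, not the paper's).

    hidden: k vectors of length D_h (one per selected clip, in temporal order)
    W:      D_a x D_h projection matrix (hypothetical parameter)
    b:      length-D_a bias (hypothetical parameter)
    w:      length-D_a scoring vector (hypothetical parameter)
    Returns the weighted summary vector (length D_h) and the k attention weights.
    """
    scores = []
    for h in hidden:
        # u_t = tanh(W h_t + b): a D_a-dimensional projection of step t
        u = [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
        # e_t = w . u_t: a scalar relevance score for step t
        scores.append(sum(wi * ui for wi, ui in zip(w, u)))
    alpha = softmax(scores)  # weights over the k steps, summing to 1
    d_h = len(hidden[0])
    # context = sum_t alpha_t * h_t
    context = [sum(a * h[j] for a, h in zip(alpha, hidden)) for j in range(d_h)]
    return context, alpha


if __name__ == "__main__":
    # Hypothetical sizes: k = 30 clips, D_h = 8, D_a = 4; random parameters.
    rng = random.Random(0)
    k, d_h, d_a = 30, 8, 4
    hidden = [[rng.gauss(0, 1) for _ in range(d_h)] for _ in range(k)]
    W = [[rng.gauss(0, 0.1) for _ in range(d_h)] for _ in range(d_a)]
    b = [0.0] * d_a
    w = [rng.gauss(0, 0.1) for _ in range(d_a)]
    context, alpha = attention_pool(hidden, W, b, w)
    print(len(context), round(sum(alpha), 6))
```

Because the weights come from a softmax, they sum to one, so clips that the scorer rates as more informative contribute more to the summary. In a trained model, W, b and w would be learned jointly with the rest of the network rather than sampled at random.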

Similar content being viewed by others

Multi-modal sequence model with gated fully convolutional blocks for micro-video venue classification (Article, 17 December 2019)

SSR-MGTI: Self-attention Sequential Recommendation Algorithm Based on Movie Genre Time Interval

Multimodal movie genre classification using recurrent neural network (Article, 30 July 2022)

Notes

https://www.netflix.com/cn/
https://www.imdb.com/
https://github.com/Marinyyt/MovieTrailer-14k
k is set to 30, which is larger than the number of clips in most movie trailers. For trailers with more than 30 clips, we randomly select 30 of them.
Many strategies can be used to extract representative frames. In this paper, we use an interval sampling strategy and leave the exploration of other sampling methods to future work.
The crawled dataset is publicly available: https://github.com/Marinyyt/MovieTrailer-14k
We performed experiments with different parameter settings. The results show that the choice of D_h and D_a has little effect on the performance of our method.
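The notes on clip selection (k = 30, with a random choice of 30 clips when a trailer has more) and on interval sampling of representative frames can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: keeping the randomly chosen clips in temporal order, sampling frames within each selected clip, taking the centre of each interval, the clip lengths and the 16 frames per clip in the usage example are all hypothetical choices.

```python
import random


def select_clip_indices(num_clips, k=30, rng=None):
    """Keep at most k clips; if there are more, randomly choose k of them.

    Assumption: the sampled indices are sorted so the clips stay in
    temporal order for the downstream sequential module.
    """
    if num_clips <= k:
        return list(range(num_clips))
    rng = rng or random.Random()
    return sorted(rng.sample(range(num_clips), k))


def interval_sample(num_frames, num_samples):
    """Pick num_samples frame indices at evenly spaced intervals.

    Assumption: each index is the (floored) centre of one of num_samples
    equal-length segments; the source does not specify the offset.
    """
    if num_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    rng = random.Random(42)
    # A hypothetical trailer with 45 clips of varying length (in frames).
    clip_lengths = [rng.randint(40, 200) for _ in range(45)]
    for idx in select_clip_indices(len(clip_lengths), k=30, rng=rng)[:3]:
        # Hypothetical choice: 16 representative frames per selected clip.
        print(idx, interval_sample(clip_lengths[idx], num_samples=16))
```

Sorting after random selection is a design choice made here so that an order-sensitive sequence model still sees clips as they appear in the trailer; the notes do not say whether the original pipeline does this.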
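The abstract states that the crawled trailers are integrated into MovieLens. Assuming the genre labels come from the pipe-separated genres field of the movies.csv file in recent MovieLens releases (columns movieId, title, genres, with "(no genres listed)" as a placeholder), a minimal sketch of converting those strings into multi-hot label vectors follows. Treating trailer genre classification as multi-label is an assumption here, the function name is hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io

# Placeholder value used in recent MovieLens movies.csv files.
NO_GENRE = "(no genres listed)"


def multi_hot_labels(movies_csv_text):
    """Parse MovieLens-style movies.csv text into multi-hot genre vectors.

    Returns (genre_vocabulary, {movieId: [0/1, ...]}), with the vocabulary
    built from the genres that actually occur, sorted alphabetically.
    """
    rows = list(csv.DictReader(io.StringIO(movies_csv_text)))
    genres = sorted({g for r in rows for g in r["genres"].split("|") if g != NO_GENRE})
    position = {g: i for i, g in enumerate(genres)}
    labels = {}
    for r in rows:
        vec = [0] * len(genres)
        for g in r["genres"].split("|"):
            if g in position:  # skips the "(no genres listed)" placeholder
                vec[position[g]] = 1
        labels[int(r["movieId"])] = vec
    return genres, labels


if __name__ == "__main__":
    # Hypothetical rows in the movies.csv layout (movieId,title,genres).
    sample = (
        "movieId,title,genres\n"
        "1,Example Movie A (2001),Action|Comedy\n"
        "2,Example Movie B (2005),Comedy|Drama\n"
        "3,Example Movie C (2010),(no genres listed)\n"
    )
    vocab, labels = multi_hot_labels(sample)
    print(vocab)
    print(labels)
```

Older MovieLens releases ship a differently formatted movies.dat file, so the parsing step would need adjusting for those.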

Author information

Authors and Affiliations

School of Information, Central University of Finance and Economics, Beijing, China
Yitong Yu & Yang Li
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Ziyu Lu
China Institute of Water Resources and Hydropower Research, Beijing, China
Delong Liu

Corresponding author

Correspondence to Ziyu Lu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

(ZIP 2.32 KB)

About this article

Cite this article

Yu, Y., Lu, Z., Li, Y. et al. ASTS: attention based spatio-temporal sequential framework for movie trailer genre classification. Multimed Tools Appl 80, 9749–9764 (2021). https://doi.org/10.1007/s11042-020-10125-y

Received: 15 August 2019
Revised: 31 August 2020
Accepted: 19 October 2020
Published: 13 November 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11042-020-10125-y
