MOPO-HBT: A movie poster dataset for title extraction and recognition

Ghosh, Mridul; Roy, Sayan Saha; Banik, Bivan; Mukherjee, Himadri; Obaidullah, Sk Md; Roy, Kaushik

doi:10.1007/s11042-023-17539-4


Published: 06 December 2023

Volume 83, pages 54545–54568 (2024)

Multimedia Tools and Applications

Mridul Ghosh^1,2,
Sayan Saha Roy^3,
Bivan Banik^4,
Himadri Mukherjee^4,
Sk Md Obaidullah^2 &
Kaushik Roy^4 (ORCID: orcid.org/0000-0002-3360-7576)


Abstract

Real-world images often contain embedded text from diverse domains such as business, education, and entertainment. Such images are graphically rich in terms of font attributes, color distribution, foreground-background similarity, and component layout, which makes recognizing their text difficult. These characteristics are especially prominent in movie posters. The title is one of the first pieces of information a viewer reads on a movie poster, and recognizing it automatically can aid both efficient indexing and the conveyance of information. However, the title appears alongside other text such as the names of actors and producers, taglines, and dates. Although the layout of components is broadly similar across film industries such as Tollywood (West Bengal), Bollywood (Mumbai), and Hollywood (Los Angeles), their graphic patterns differ in many respects. To address the problem of movie title understanding, we propose a dataset named MOvie POsters-Hollywood Bollywood Tollywood (MOPO-HBT) that contains movie posters from these three industries. The entire dataset is publicly available (http://ieee-dataport.org/11564) for research purposes. Baseline results for title identification and recognition were obtained with a CNN-based (Convolutional Neural Network) approach, in which the titles were extracted using the M-EAST (Modified-Efficient and Accurate Scene Text) detector model.
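The abstract names M-EAST, a modified EAST detector, as the title-extraction stage, but this preview does not describe the modifications. As a hedged illustration of the unmodified EAST output format only: EAST predicts, for each cell of a downsampled feature map, a text score plus a rotated box (RBOX) encoded as four distances from that location to the top, right, bottom, and left box edges and a rotation angle. The sketch below decodes that geometry into corner points. The function names, the 0.8 threshold, and the stride of 4 are illustrative assumptions, and the rotation sign follows the convention used in OpenCV's EAST text-detection sample; other implementations may differ.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rbox_to_corners(px: float, py: float,
                    d_top: float, d_right: float,
                    d_bottom: float, d_left: float,
                    angle: float) -> List[Point]:
    """Convert one EAST-style RBOX prediction into four corner points.

    (px, py) is the location in input-image coordinates (y points down);
    the distances run from it to the top, right, bottom and left edges of
    the text box, and `angle` (radians) is the box rotation.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Corner offsets relative to (px, py) before rotation.
    offsets = [(-d_left, -d_top),    # top-left
               (d_right, -d_top),    # top-right
               (d_right, d_bottom),  # bottom-right
               (-d_left, d_bottom)]  # bottom-left
    corners = []
    for dx, dy in offsets:
        # Rotate each offset about (px, py), then translate back.
        rx = cos_a * dx + sin_a * dy
        ry = -sin_a * dx + cos_a * dy
        corners.append((px + rx, py + ry))
    return corners


def decode_predictions(scores: Sequence[Sequence[float]],
                       geometry: Sequence[Sequence[Sequence[float]]],
                       threshold: float = 0.8,
                       stride: float = 4.0):
    """Collect (score, corners) for every cell whose text score reaches
    `threshold`. `scores` is an H x W grid; `geometry` holds five H x W
    grids: top, right, bottom and left distances, then the angle."""
    boxes = []
    for y, row in enumerate(scores):
        for x, score in enumerate(row):
            if score < threshold:
                continue
            d_t, d_r, d_b, d_l, ang = (g[y][x] for g in geometry)
            # Feature-map cells are `stride` pixels apart in the input image.
            corners = rbox_to_corners(x * stride, y * stride,
                                      d_t, d_r, d_b, d_l, ang)
            boxes.append((score, corners))
    return boxes
```

In a full EAST pipeline these per-cell candidates are then merged with (locality-aware) non-maximum suppression before any title crop is passed to a recognizer; that step is omitted here for brevity.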


Algorithm 1

Algorithm 2

Algorithm 3


Data Availability Statement

The MOPO-HBT dataset that supports the findings of this study is openly available in the public repository, IEEE Dataport, at https://www.ieee-dataport.org/documents/mopo-hbt

Notes

The dataset is available at http://ieee-dataport.org/11564
https://www.teaser-trailer.com, visited on 14.01.2022
https://www.yts.mx, visited on 12.01.2022
https://www.imdb.com/, visited on 12.01.2022
http://www.impawards.com/, visited on 15.01.2022
https://www.incinemas.sg, visited on 22.02.2022
https://www.movieinsider.com/, visited on 12.12.2021
https://www.koimoi.com/, visited on 16.01.2021
https://www.in.ign.com, visited on 13.01.2022
https://www.github.com/mridulxyz/MOPO/blob/main/Angle%20extraction
https://www.github.com/mridulxyz/MOPO/blob/main/Coordinate%20extraction
Obtained from Python library
https://www.github.com/mridulxyz/MOPO/commit/229561ffc37355fffb7775e8a67583ab30a78c83
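The notes above link to the authors' angle-extraction and coordinate-extraction scripts, whose contents are not reproduced here. As a hedged illustration of what such a step typically computes, the sketch below assumes each title is annotated as a four-point quadrilateral ordered top-left, top-right, bottom-right, bottom-left in image coordinates (y pointing down). The helper name `quad_angle_and_bounds` and the output fields are hypothetical, and the linked scripts may use a different representation.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def quad_angle_and_bounds(quad: List[Point]) -> Dict[str, float]:
    """Hypothetical helper: return the inclination of the top edge (degrees)
    and the axis-aligned bounding coordinates of a title quadrilateral.

    `quad` must list four corners ordered top-left, top-right,
    bottom-right, bottom-left, with image rows growing downward.
    """
    if len(quad) != 4:
        raise ValueError("expected exactly four corner points")
    (x0, y0), (x1, y1) = quad[0], quad[1]
    # Negate the y difference so that a top edge rising to the right
    # yields a positive angle despite the downward-growing y axis.
    angle = math.degrees(math.atan2(-(y1 - y0), x1 - x0))
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return {"angle": angle,
            "x_min": min(xs), "y_min": min(ys),
            "x_max": max(xs), "y_max": max(ys)}
```

The axis-aligned bounds are what a simple crop of the poster would use, while the angle indicates how much the crop would need to be rotated to make the title horizontal before recognition.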

Abbreviations

MOPO-HBT :: MOvie POsters-Hollywood Bollywood Tollywood
CNN :: Convolutional Neural Network
RRC :: Robust Reading Competitions
M-EAST :: Modified-Efficient and Accurate Scene Text
ICDAR :: International Conference on Document Analysis and Recognition
SVT :: Street View Text
CUTE80 :: Curve Text 80
COCO-Text :: Common Objects in Context-Text
CVSI-15 :: Competition on Video Script Identification-15
SVM :: Support Vector Machine
RFN :: Refined Feature Attentive Network
ViTSTR :: Vision Transformer for Scene Text Recognition
BiFPN :: Bi-directional Feature Pyramid Network
EAST :: Efficient and Accurate Scene Text
TexRNet :: Text Refinement Network
VGG-19 :: Visual Geometry Group-19
ROI :: Region of Interest
IMDB :: Internet Movie Database
YTS :: YIFY Torrent Site
IMP :: Internet Movie Poster (Awards)
IGN :: Imagine Games Network
CUDA :: Compute Unified Device Architecture
ROC :: Receiver Operating Characteristic
AUC :: Area Under the Curve
IoU :: Intersection over Union
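The abbreviation list names the evaluation measures used for the baselines without defining them. The sketch below shows their standard definitions as commonly computed: the overlap ratio between a predicted and a ground-truth box (intersection area divided by union area), and the area under an ROC curve via the trapezoidal rule. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and ROC points already sorted by false-positive rate; the paper's exact evaluation protocol, including any overlap threshold for counting a detection as correct, is not stated in this preview.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    # Width and height of the overlap; zero when the boxes do not overlap.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def roc_auc(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under an ROC curve by the trapezoidal rule.

    `fpr` and `tpr` are matching false- and true-positive rates sorted by
    increasing false-positive rate.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
    return area
```

For example, two 2 x 2 boxes offset by one unit in each direction overlap in a 1 x 1 square, giving 1 / (4 + 4 - 1) = 1/7; a random classifier's diagonal ROC curve gives an AUC of 0.5.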

References

Ghosh M, Mukherjee H, Obaidullah SM, Gao X-Z, Roy K (2023) Scene text understanding: recapitulating the past decade. Artificial Intelligence Review, pp 1–73
Luo C, Lin Q, Liu Y, Jin L, Shen C (2021) Separating content from style using adversarial learning for recognizing text in the wild. Int J Comput Vis 129(4):960–976
Bai X, Shi B, Zhang C, Cai X, Qi L (2017) Text/non-text image classification in the wild with convolutional neural networks. Pattern Recognit 66:437–446
Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2019) Identifying the presence of graphical texts in scene images using CNN. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol 1, pp 86–91. IEEE
Ghosh M, Mukherjee H, Obaidullah SM, Roy K (2021) STDNet: A CNN-based approach to single-/mixed-script detection. Innov Syst Softw Eng 17(3):277–288
Ghosh M, Baidya G, Mukherjee H, Obaidullah SM, Roy K (2022) A deep learning-based approach to single/mixed script-type identification. In: Advanced computing and systems for security: vol 13, pp 121–132. Springer
Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) COCO-Text: Dataset and benchmark for text detection and recognition in natural images. arXiv:1601.07140
Gomez L, Nicolaou A, Karatzas D (2017) Improving patch-based scene text script identification with ensembles of conjoined networks. Pattern Recognit 67:85–96
Liao M, Zou Z, Wan Z, Yao C, Bai X (2022) Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Trans Pattern Anal Mach Intell 45(1):919–931
Ma J, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122
Saha S, Chakraborty N, Kundu S, Paul S, Mollah AF, Basu S, Sarkar R (2020) Multi-lingual scene text detection and language identification. Pattern Recognit Lett 138:16–22
Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp 1083–1090. IEEE
Wang K, Babenko B, Belongie S (2011) End-to-end scene text recognition. In: 2011 International Conference on Computer Vision, pp 1457–1464. IEEE
Risnumawan A, Shivakumara P, Chan CS, Tan CL (2014) A robust arbitrary text detection system for natural scene images. Exp Syst Appl 41(18):8027–8048
Shi B, Bai X, Yao C (2016) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304
Liu X, Meng G, Pan C (2019) Scene text detection and recognition with advances in deep learning: a survey. International Journal on Document Analysis and Recognition (IJDAR) 22(2):143–162
Huang Y-F, Hsieh M-C (2015) Text extraction and recognition from posters for movie title retrieval. In: Proceedings of the 19th International Database Engineering & Applications Symposium, pp 180–185
Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Santosh K, Roy K (2021) Understanding movie poster: Transfer-deep learning approach for graphic-rich text recognition. The Visual Computer, pp 1–20
Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R, Ashida K, Nagai H, Okamoto M, Yamamoto H et al (2005) ICDAR 2003 robust reading competitions: entries, results, and future directions. International Journal on Document Analysis and Recognition (IJDAR) 7(2):105–122
Shi B, Yao C, Liao M, Yang M, Xu P, Cui L, Belongie S, Lu S, Bai X (2017) ICDAR2017 competition on reading Chinese text in the wild (RCTW-17). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol 1, pp 1429–1434. IEEE
Yuliang L, Lianwen J, Shuaitao Z, Sheng Z (2017) Detecting curve text in the wild: New dataset and new solution. arXiv:1712.02170
Sharma N, Mandal R, Sharma R, Pal U, Blumenstein M (2015) ICDAR2015 competition on video script identification (CVSI 2015). In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp 1196–1200. IEEE
Demarty C-H, Penet C, Soleymani M, Gravier G (2015) VSD, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation. Multimed Tools Appl 74:7379–7404
Wehrmann J, Barros RC (2017) Movie genre classification: A multi-label approach based on convolutions through time. Appl Soft Comput 61:973–982
Bougiatiotis K, Giannakopoulos T (2018) Enhanced movie content similarity based on textual, auditory and visual information. Exp Syst Appl 96:86–102
Korai MA, Bouk AH, Sindhi AH (2021) Movie genre classification from RGB movie poster image using deep feed-forward network. Yanbu J Eng Sci 18(1):73–80
Chu W-T, Guo H-J (2017) Movie genre classification based on poster images with deep neural networks. In: Proceedings of the Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes, pp 39–45
Barney G, Kaya K (2019) Predicting genre from movie posters. Stanford CS 229: Machine Learning
Gozuacik N, Sakar CO (2019) Turkish movie genre classification from poster images using convolutional neural networks. In: 2019 11th International Conference on Electrical and Electronics Engineering (ELECO), pp 930–934. IEEE
Dewidar M (2019) Inferring movie genres from their poster. Learning 1
Bhunia AK, Kumar G, Roy PP, Balasubramanian R, Pal U (2018) Text recognition in scene image and video frame using color channel selection. Multimed Tools Appl 77(7):8551–8578
Tulsyan K, Srivastava N, Mondal A, Jawahar C (2020) A benchmark system for Indian language text recognition. In: International Workshop on Document Analysis Systems, pp 74–88. Springer
Li H, Zhang Y, Bayramli B, Lu H (2023) Arbitrary shape scene text detector with accurate text instance generation based on instance-relevant contexts. Multimed Tools Appl 82(12):17827–17852
Guan T, Gu C, Lu C, Tu J, Feng Q, Wu K, Guan X (2022) Industrial scene text detection with refined feature-attentive network. IEEE Trans Circ Syst Vid Technol 32(9):6073–6085
Cai Y, Liu C, Cheng P, Du D, Zhang L, Wang W, Ye Q (2020) Scale-residual learning network for scene text detection. IEEE Trans Circ Syst Vid Technol 31(7):2725–2738
Singh GV, Firdaus M, Ekbal A, Bhattacharyya P (2022) EmoInt-Trans: A multimodal transformer for identifying emotions and intents in social conversations. IEEE/ACM Trans Aud Speech Lang Process 31:290–300
Firdaus M, Thakur N, Ekbal A (2022) Sentiment guided aspect conditioned dialogue generation in a multimodal system. In: European Conference on Information Retrieval, pp 199–214. Springer
Mishra K, Firdaus M, Ekbal A (2022) Predicting politeness variations in goal-oriented conversations. IEEE Transactions on Computational Social Systems
Long S, He X, Yao C (2021) Scene text detection and recognition: The deep learning era. Int J Comput Vis 129(1):161–184
Kagan D, Levy M, Fire M, Alpert GF (2022) Ethnic representation analysis of commercial movie posters. arXiv:2207.08169
Rahane AA, Subramanian A (2020) Measures of complexity for large scale image datasets. In: 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp 282–287. IEEE
Peters RA, Strickland RN (1990) Image complexity metrics for automatic target recognizers. In: Automatic Target Recognizer System and Technology Conference, pp 1–17. Citeseer
Ghosh M, Obaidullah SM, Gherardini F, Zdimalova M (2021) Classification of geometric forms in mosaics using deep neural network. J Imaging 7(8):149
Ghosh M, Mukherjee H, Obaidullah SM, Santosh K, Das N, Roy K (2021) LWSINet: A deep learning-based approach towards video script identification. Multimed Tools Appl 80(19):29095–29128
Ghosh M, Roy SS, Mukherjee H, Obaidullah SM, Gao X-Z, Roy K (2021) Movie title extraction and script separation using shallow convolution neural network. IEEE Access 9:125184–125201
Wang M, Zheng S, Li X, Qin X (2014) A new image denoising method based on Gaussian filter. In: 2014 International Conference on Information Science, Electronics and Electrical Engineering, vol 1, pp 163–167. IEEE
BJ, BN, VA NA, Akhil A, et al (2021) A novel binarization method to remove verdigris from ancient metal image. In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), pp 884–888. IEEE
Suzuki S et al (1985) Topological structural analysis of digitized binary images by border following. Comput Vis Graph Image Process 30(1):32–46
Huadong D, Yang W (2015) A new method for detecting rectangles and triangles. In: 2015 IEEE Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp 321–327. IEEE
Firdaus M, Madasu A, Ekbal A (2023) A unified framework for slot based response generation in a multimodal dialogue system. arXiv:2305.17433
Xu X, Zhang Z, Wang Z, Price B, Wang Z, Shi H (2021) Rethinking text segmentation: A novel dataset and a text-specific refinement approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12045–12055
Cao D, Dang J, Zhong Y (2021) Towards accurate scene text detection with bidirectional feature pyramid network. Symmetry 13(3):486
Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) EAST: An efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5551–5560
Huang J, Pang G, Kovvuri R, Toh M, Liang KJ, Krishnan P, Yin X, Hassner T (2021) A multiplexed network for end-to-end, multilingual OCR. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 4547–4557
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 4510–4520
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the Inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Zacharias E, Teuchler M, Bernier B (2020) Image processing based scene-text detection and recognition with Tesseract. arXiv:2004.08079


Acknowledgements

The authors acknowledge the sources from which the images were collected: IMDb, Teaser-Trailer, YTS, IMP Awards, inCinemas, Movie Insider, Koimoi, and IGN.

Author information

Authors and Affiliations

Dept. of Computer Science, Shyampur Siddheswari Mahavidyalaya, Howrah, India
Mridul Ghosh
Dept. of Computer Science & Engineering, Aliah University, Kolkata, India
Mridul Ghosh & Sk Md Obaidullah
Dept. of Electronics & Electrical Communication Engineering, IIT Kharagpur, Kharagpur, India
Sayan Saha Roy
TISA Lab, Dept. of Computer Science, West Bengal State University, Barasat, India
Bivan Banik, Himadri Mukherjee & Kaushik Roy


Corresponding author

Correspondence to Kaushik Roy.

Ethics declarations

Conflicts of interest

The authors declare that they received no funding for the present work and have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Ghosh, M., Roy, S.S., Banik, B. et al. MOPO-HBT: A movie poster dataset for title extraction and recognition. Multimed Tools Appl 83, 54545–54568 (2024). https://doi.org/10.1007/s11042-023-17539-4


Received: 17 May 2022
Revised: 20 September 2023
Accepted: 16 October 2023
Published: 06 December 2023
Issue Date: May 2024
DOI: https://doi.org/10.1007/s11042-023-17539-4
