Abstract
Real-world images often contain embedded text from diverse domains such as business, education, and entertainment. Such images are graphically rich in font attributes, color distribution, foreground-background similarity, and component organization, which makes recognizing the text in them harder. These characteristics are especially prominent in movie posters. The title is one of the first pieces of information on a movie poster, and recognizing it automatically can aid efficient indexing and information conveyance. However, the title is accompanied by other text such as taglines, dates, and the names of actors and producers. Although the organization of components is broadly similar across film industries such as Tollywood (West Bengal), Bollywood (Mumbai), and Hollywood (Los Angeles), the graffiti patterns differ in many instances. To address the problem of movie title understanding, we propose a dataset named MOvie POsters-Hollywood Bollywood Tollywood (MOPO-HBT) that comprises movie posters from these three industries. The entire dataset is publicly available (http://ieee-dataport.org/11564) for research purposes. Baseline results for title identification and recognition were obtained with a CNN-based (Convolutional Neural Network) approach, in which the titles were extracted using the M-EAST (Modified-Efficient and Accurate Scene Text) detector model.
Data Availability Statement
The MOPO-HBT dataset that supports the findings of this study is openly available in the public repository, IEEE Dataport, at https://www.ieee-dataport.org/documents/mopo-hbt
Notes
The dataset is available on http://ieee-dataport.org/11564
https://www.teaser-trailer.com, visited on 14.01.2022
https://www.yts.mx, visited on 12.01.2022
https://www.imdb.com/, visited on 12.01.2022
http://www.impawards.com/, visited on 15.01.2022
https://www.incinemas.sg, visited on 22.02.2022
https://www.movieinsider.com/, visited on 12.12.2021
https://www.koimoi.com/, visited on 16.01.2021
https://www.in.ign.com, visited on 13.01.2022
Obtained from Python library
Abbreviations
- MOPO-HBT: MOvie POsters-Hollywood Bollywood Tollywood
- CNN: Convolutional Neural Network
- RRC: Robust Reading Competitions
- M-EAST: Modified-Efficient and Accurate Scene Text
- ICDAR: International Conference on Document Analysis and Recognition
- SVT: Street View Text
- CUTE80: Curve Text 80
- COCO-Text: Common Objects in Context-Text
- CVSI-15: Competition on Video Script Identification-15
- SVM: Support Vector Machine
- RFN: Refined Feature Attentive Network
- ViTSTR: Vision Transformer for Scene Text Recognition
- BiFPN: Bi-directional Feature Pyramid Network
- EAST: Efficient and Accurate Scene Text
- TexRNet: Text Refinement Network
- VGG-19: Visual Geometry Group-19
- ROI: Region of Interest
- IMDB: Internet Movie Database
- YTS: YIFI Torrent Site
- IMP: Independent Moving Pictures Company
- IGN: Imagine Games Network
- CUDA: Compute Unified Device Architecture
- ROC: Receiver Operating Characteristic
- AUC: Area Under the Curve
- UOI: Union Over Intersection
Acknowledgements
The authors acknowledge that they have collected the images from various sources: IMDB, Teaser-Trailer, YTS, IMP awards, Incinemas, Movie Insider, koimoi, and IGN.
Ethics declarations
Conflicts of interest
The authors declare that they have neither received any funding for the present work nor have any conflict of interest.
Appendix
About this article
Cite this article
Ghosh, M., Roy, S.S., Banik, B. et al. MOPO-HBT: A movie poster dataset for title extraction and recognition. Multimed Tools Appl 83, 54545–54568 (2024). https://doi.org/10.1007/s11042-023-17539-4
DOI: https://doi.org/10.1007/s11042-023-17539-4