
MOPO-HBT: A movie poster dataset for title extraction and recognition

Published in Multimedia Tools and Applications

Abstract

Real-world images often contain embedded text from disparate domains such as business, education, and entertainment. Such images are graphically rich in terms of font attributes, color distribution, foreground-background similarity, and component organization, which aggravates the difficulty of recognizing the text they contain. These characteristics are especially prominent in movie posters. One of the first pieces of information on a movie poster is the title. Automatic recognition of movie titles from images can aid efficient indexing as well as information conveyance. However, the title is accompanied by other text such as the names of actors and producers, taglines, and dates. Although the organization of components is broadly similar across film industries like Tollywood (West Bengal), Bollywood (Mumbai), and Hollywood (Los Angeles), the graphic patterns differ in many instances. To address the problem of movie title understanding, we propose a dataset named MOvie POsters-Hollywood Bollywood Tollywood (MOPO-HBT) that comprises movie posters from these industries. The entire dataset is publicly available (http://ieee-dataport.org/11564) for research purposes. Baseline results for title identification and recognition were obtained with a CNN-based (Convolutional Neural Network) approach, wherein the titles were extracted using the M-EAST (Modified-Efficient and Accurate Scene Text) detector model.
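As a rough illustration of the detection-plus-recognition pipeline outlined above (a minimal sketch, not the authors' M-EAST detector or CNN recognizer), the snippet below locates text regions on a poster with OpenCV's stock EAST detector and passes each region to Tesseract OCR. The checkpoint name frozen_east_text_detection.pb, the file poster.jpg, and the thresholds are placeholders and illustrative defaults.

# Sketch: EAST-based text detection plus Tesseract recognition on a poster image.
# Not the paper's M-EAST model; uses OpenCV's publicly available EAST checkpoint.
import cv2
import numpy as np
import pytesseract

image = cv2.imread("poster.jpg")  # placeholder path

detector = cv2.dnn_TextDetectionModel_EAST("frozen_east_text_detection.pb")
detector.setConfidenceThreshold(0.5)
detector.setNMSThreshold(0.4)
# EAST expects input dimensions that are multiples of 32; the mean values match
# the preprocessing commonly used with the public EAST checkpoint.
detector.setInputParams(1.0, (320, 320), (123.68, 116.78, 103.94), True)

quads, scores = detector.detect(image)  # rotated quadrilaterals and confidences

for quad in quads:
    # Crop the axis-aligned bounding box of each detected quadrilateral and
    # let Tesseract read it as a single text line (--psm 7).
    x, y, w, h = cv2.boundingRect(np.asarray(quad, dtype=np.int32))
    crop = image[y:y + h, x:x + w]
    text = pytesseract.image_to_string(crop, config="--psm 7").strip()
    if text:
        print(text)

On a real poster this returns the title mixed with actor names, taglines, and dates; separating the title from the rest is the part addressed by the paper's CNN-based baseline.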


Data Availability Statement

The MOPO-HBT dataset that supports the findings of this study is openly available in the public repository, IEEE Dataport, at https://www.ieee-dataport.org/documents/mopo-hbt

Notes

  1. The dataset is available on http://ieee-dataport.org/11564

  2. https://www.teaser-trailer.com, visited on 14.01.2022

  3. https://www.yts.mx, visited on 12.01.2022

  4. https://www.imdb.com/, visited on 12.01.2022

  5. http://www.impawards.com/, visited on 15.01.2022

  6. https://www.incinemas.sg, visited on 22.02.2022

  7. https://www.movieinsider.com/, visited on 12.12.2021

  8. https://www.koimoi.com/, visited on 16.01.2021

  9. https://www.in.ign.com, visited on 13.01.2022

  10. https://www.github.com/mridulxyz/MOPO/blob/main/Angle%20extraction

  11. https://www.github.com/mridulxyz/MOPO/blob/main/Coordinate%20extraction

  12. Obtained from a Python library

  13. https://www.github.com/mridulxyz/MOPO/commit/229561ffc37355fffb7775e8a67583ab30a78c83

Abbreviations

MOPO-HBT: MOvie POsters-Hollywood Bollywood Tollywood
CNN: Convolutional Neural Network
RRC: Robust Reading Competitions
M-EAST: Modified-Efficient and Accurate Scene Text
ICDAR: International Conference on Document Analysis and Recognition
SVT: Street View Text
CUTE80: Curve Text 80
COCO-Text: Common Objects in Context-Text
CVSI-15: Competition on Video Script Identification-15
SVM: Support Vector Machine
RFN: Refined Feature Attentive Network
ViTSTR: Vision Transformer for Scene Text Recognition
BiFPN: Bi-directional Feature Pyramid Network
EAST: Efficient and Accurate Scene Text
TexRNet: Text Refinement Network
VGG-19: Visual Geometry Group-19
ROI: Region of Interest
IMDB: Internet Movie Database
YTS: YIFY Torrent Site
IMP: Independent Moving Pictures Company
IGN: Imagine Games Network
CUDA: Compute Unified Device Architecture
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
UOI: Union Over Intersection


Acknowledgements

The authors acknowledge that they have collected the images from various sources: IMDB, Teaser-Trailer, YTS, IMP awards, Incinemas, Movie Insider, koimoi, and IGN.

Author information


Corresponding author

Correspondence to Kaushik Roy.

Ethics declarations

Conflicts of interest

The authors declare that they have received no funding for the present work and have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Fig. 7: Navigating Complexity: a glimpse of challenging text extraction scenarios in the MOPO dataset

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ghosh, M., Roy, S.S., Banik, B. et al. MOPO-HBT: A movie poster dataset for title extraction and recognition. Multimed Tools Appl 83, 54545–54568 (2024). https://doi.org/10.1007/s11042-023-17539-4
