Abstract
Videos broadcast via different media, such as television and the internet, carry a high volume of text. Since Optical Character Recognition (OCR) engines are script-dependent, script identification is a necessary precursor to recognition. Video script identification is also non-trivial because of low resolution, complex backgrounds, noise, and blur. In this work, we propose a deep learning-based system, LWSINet (LightWeight Script Identification Network), a 6-layered CNN, to identify video scripts. For validation, we used the publicly available CVSI-15 dataset. We also studied the effect of three common noise types, namely salt-and-pepper, Gaussian, and Poisson, on the scripts, along with their hybrid combinations. Our test results show that the proposed CNN is consistent and robust enough to identify scripts in both scenarios, with and without noise. For comparison, we also evaluated other well-known handcrafted feature-based and deep learning approaches.
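The noise experiments can be illustrated with a minimal NumPy sketch of the three noise models and one possible hybrid. The abstract does not give noise densities, the Gaussian standard deviation, or the order in which noises are combined in the hybrid case, so the parameter defaults, the composition order, and the function names (`add_salt_and_pepper`, `add_gaussian`, `add_poisson`, `add_hybrid`) are all hypothetical, not the authors' settings.

```python
import numpy as np

# Hypothetical helpers; parameters and hybrid order are assumptions, not from the paper.

def add_salt_and_pepper(img, amount=0.05, rng=None):
    """Set a fraction `amount` of pixels to 0 (pepper) or 255 (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    mask = rng.random(img.shape) < amount   # pixels to corrupt
    salt = rng.random(img.shape) < 0.5      # roughly half salt, half pepper
    out[mask & salt] = 255
    out[mask & ~salt] = 0
    return out

def add_gaussian(img, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_poisson(img, rng=None):
    """Signal-dependent noise: each pixel is resampled from Poisson(pixel value)."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = rng.poisson(img.astype(np.float64))
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_hybrid(img, rng=None):
    """Assumed hybrid: Poisson, then Gaussian, then salt-and-pepper."""
    rng = np.random.default_rng() if rng is None else rng
    return add_salt_and_pepper(add_gaussian(add_poisson(img, rng=rng), rng=rng), rng=rng)

# Usage on a hypothetical flat grey word image.
rng = np.random.default_rng(0)
frame = np.full((32, 100), 128, dtype=np.uint8)
noisy = add_hybrid(frame, rng=rng)
print(noisy.shape, noisy.dtype)  # (32, 100) uint8
```

Passing a seeded `np.random.Generator` through every function keeps a noisy test set reproducible across runs.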
Cite this article
Ghosh, M., Mukherjee, H., Obaidullah, S.M. et al. LWSINet: A deep learning-based approach towards video script identification. Multimed Tools Appl 80, 29095–29128 (2021). https://doi.org/10.1007/s11042-021-11103-8