LWSINet: A deep learning-based approach towards video script identification

Ghosh, Mridul; Mukherjee, Himadri; Obaidullah, Sk Md; Santosh, K. C.; Das, Nibaran; Roy, Kaushik

doi:10.1007/s11042-021-11103-8

Published: 17 June 2021

Volume 80, pages 29095–29128, (2021)

Multimedia Tools and Applications

Abstract

Videos broadcast via different media, such as television and the internet, carry a high volume of text. Since Optical Character Recognition (OCR) engines are script-dependent, script identification is a necessary precursor to recognition. Video script identification is not trivial, however, because of difficult issues such as low resolution, complex backgrounds, noise, and blur effects. In this work, we propose a deep learning-based system, LWSINet (LightWeight Script Identification Network), a 6-layered CNN, to identify video scripts. For validation, we used a publicly available dataset named CVSI-15. In addition, we considered the effects of three common types of noise, namely salt-and-pepper, Gaussian, and Poisson, on the scripts, along with their hybrid combinations. In our test results, we observed that the proposed CNN is coherent and robust enough to identify scripts in both scenarios, with and without noise. We also employed other well-known handcrafted feature-based and deep learning approaches for comparison.
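The noise scenarios named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the noise parameters (`amount`, `sigma`), the 8-bit grayscale input, and the order in which noises are chained for the hybrid case are all assumptions made for illustration only.

```python
import numpy as np


def add_salt_pepper(img, amount=0.05, rng=None):
    """Set a random fraction `amount` of pixels to 0 (pepper) or 255 (salt)."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    mask = rng.random(img.shape)
    out[mask < amount / 2] = 0          # pepper
    out[mask > 1 - amount / 2] = 255    # salt
    return out


def add_gaussian(img, sigma=15.0, rng=None):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def add_poisson(img, rng=None):
    """Resample each pixel from a Poisson distribution whose mean is its intensity."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = rng.poisson(img.astype(np.float64))
    return np.clip(noisy, 0, 255).astype(np.uint8)


def add_hybrid(img, steps, rng=None):
    """Chain several noise functions in the given order (the order is an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    out = img
    for step in steps:
        out = step(out, rng=rng)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for a cropped word image from a video frame.
    word_img = np.full((32, 64), 128, dtype=np.uint8)
    noisy = add_hybrid(word_img, [add_gaussian, add_poisson, add_salt_pepper], rng)
    print(noisy.shape, noisy.dtype)
```

Each function returns a new array and leaves its input unchanged, so clean and noisy copies of the same image can be evaluated side by side.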

Author information

Authors and Affiliations

Department of Computer Science, Shyampur Siddheswari Mahavidyalaya, Howrah, India
Mridul Ghosh
Department of Computer Science and Engineering, Aliah University, Kolkata, India
Mridul Ghosh & Sk Md Obaidullah
Department of Computer Science, West Bengal State University, Kolkata, India
Himadri Mukherjee & Kaushik Roy
KC’s PAMI Research Lab - Computer Science, University of South Dakota, South Dakota, USA
K. C. Santosh
Department of Computer Science, Jadavpur University, Kolkata, India
Nibaran Das

Corresponding author

Correspondence to Kaushik Roy.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ghosh, M., Mukherjee, H., Obaidullah, S.M. et al. LWSINet: A deep learning-based approach towards video script identification. Multimed Tools Appl 80, 29095–29128 (2021). https://doi.org/10.1007/s11042-021-11103-8

Received: 28 August 2020
Revised: 07 January 2021
Accepted: 21 May 2021
Published: 17 June 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s11042-021-11103-8
