DOI: 10.1145/3589883.3589914

Are Attention blocks better than BiLSTM for text recognition? 

Published: 27 June 2023

Abstract

This paper studies the impact of using sequential Attention blocks versus Bidirectional Long Short-Term Memory (BiLSTM) layers for Optical Character Recognition (OCR). The main goal is to improve the inference time of state-of-the-art OCR models, specifically on CPU, under the additional constraint that the models remain trainable with only a restricted amount of data. While OCR research often focuses on improving recognition accuracy, little emphasis has been placed on optimizing processing speed and model size. In this context, the experimental results presented in this paper show the superiority of Attention blocks over BiLSTM layers: Attention blocks appear to be up to 5x faster on CPU, while achieving better decoding rates on a typical industrial dataset of identity-document text fields and comparable decoding rates on publicly available Scene Text Recognition (STR) datasets. Beyond being faster while remaining accurate, which was the primary goal, Attention blocks also lead to lighter models.
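
To make the comparison concrete, the minimal PyTorch sketch below contrasts a BiLSTM sequence-modeling head with a self-attention head applied to a CRNN-style feature sequence. This is an illustrative sketch under assumptions of ours: the module names (BiLSTMHead, AttentionHead), feature dimensions, and the use of a standard Transformer encoder layer are placeholder choices, not the exact Attention-block design evaluated in the paper, which the abstract does not specify.

```python
# Illustrative sketch only: contrasts a generic BiLSTM head with a generic
# self-attention head over a CRNN-style feature sequence. Names and sizes
# are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class BiLSTMHead(nn.Module):
    """Classic CRNN-style sequence modeling: BiLSTM over CNN feature columns."""
    def __init__(self, feat_dim=256, hidden=256, num_classes=96):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):          # x: (batch, seq_len, feat_dim)
        out, _ = self.rnn(x)       # (batch, seq_len, 2*hidden)
        return self.fc(out)        # per-timestep logits for CTC decoding

class AttentionHead(nn.Module):
    """Self-attention replacement: no recurrence, timesteps processed in parallel."""
    def __init__(self, feat_dim=256, num_heads=4, num_classes=96, max_len=128):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_len, feat_dim))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=num_heads,
                                           dim_feedforward=4 * feat_dim,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):          # x: (batch, seq_len, feat_dim)
        x = x + self.pos[:, : x.size(1)]
        return self.fc(self.encoder(x))  # per-timestep logits for CTC decoding

# Both heads consume the same (batch, seq_len, feat_dim) feature sequence from
# the CNN backbone and emit per-timestep class logits, so they are drop-in
# alternatives in a CTC-trained recognizer.
feats = torch.randn(2, 32, 256)
print(BiLSTMHead()(feats).shape, AttentionHead()(feats).shape)
```

The design point being illustrated is that the attention head avoids the sequential state updates of the BiLSTM, which is what makes a CPU speed-up plausible at comparable accuracy.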


    Published In

    ICMLT '23: Proceedings of the 2023 8th International Conference on Machine Learning Technologies
    March 2023
    293 pages
    ISBN: 9781450398329
    DOI: 10.1145/3589883

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Attention blocks
    2. LSTM
    3. Optical character recognition
    4. sequence modeling

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMLT 2023
