DOI: 10.1145/3589883.3589914

Are Attention blocks better than BiLSTM for text recognition? 

Published: 27 June 2023

Abstract

This paper studies the impact of using Sequential Attention blocks instead of Bidirectional Long Short-Term Memory (BiLSTM) layers for Optical Character Recognition (OCR). The main goal is to reduce the inference time of state-of-the-art OCR models, specifically on CPU, under the additional constraint that the models remain trainable with only a restricted amount of data. While OCR research often focuses on improving recognition accuracy, little emphasis has been placed on optimizing processing speed and model size. In this context, the experimental results presented in this paper show the superiority of Attention blocks over BiLSTM layers. Attention blocks appear to be up to 5x faster on CPU, while achieving better decoding rates on a typical industrial dataset of identity document text fields and similar decoding rates on publicly available Scene Text Recognition (STR) datasets. Beyond being faster and accurate, which was the primary goal, Attention blocks also appear to lead to lighter models.

Published In

ICMLT '23: Proceedings of the 2023 8th International Conference on Machine Learning Technologies
March 2023
293 pages
ISBN: 9781450398329
DOI: 10.1145/3589883

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. Attention blocks
2. LSTM
3. Optical character recognition
4. Sequence modeling

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ICMLT 2023
