Abstract
Over the past several years, archivists and historians have expressed a growing need for robust and efficient offline handwritten text recognition systems that can assist them in transcribing handwritten archival documents. Such systems output an editable text file containing the transcription of the digitized documents. Recently, considerable attention has been paid to deep learning for solving various document image analysis problems. In particular, deep learning architectures have been used extensively for handwritten text recognition to overcome limitations of older approaches such as hidden Markov models. To contribute to this trend, we propose in this article a deep learning architecture that recognizes text lines in handwritten archival document images using octave-based convolutional and attention-based recurrent neural networks. The proposed architecture is composed of an encoder block and a decoder block. The encoder block uses octave convolutional layers. The decoder block uses an attention-based bidirectional long short-term memory network followed by a connectionist temporal classification layer. A set of experiments was carried out on different benchmark datasets of historical handwritten document images to show the effectiveness of the proposed architecture. Qualitative and quantitative results are reported and compared with those of recent state-of-the-art methods and of the methods participating in the ICDAR and ICFHR contests under the same conditions (i.e. without a language model or lexicon constraint). The proposed architecture achieves low line-level character error rates on three datasets: 6.02%, 4.30% and 6.9% for the IAM, Rimes and Bentham datasets, respectively.
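As an illustration of the two notions the abstract relies on at the output stage, the following minimal Python sketch shows how a best-path output of a connectionist temporal classification layer can be collapsed into a label sequence, and how a character error rate can be computed from the Levenshtein edit distance. It is not the implementation used in this article: the function names, the choice of index 0 for the blank label, greedy best-path decoding, and the pooling of edits over all lines (rather than averaging per line) are assumptions made for the example.

```python
from typing import List, Sequence


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse per-frame best-path labels: merge consecutive repeats, drop blanks.

    The blank index (0) is an assumption of this sketch.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous_row[j - 1] + (ref_char != hyp_char)
            insertion = current_row[j - 1] + 1
            deletion = previous_row[j] + 1
            current_row.append(min(substitution, insertion, deletion))
        previous_row = current_row
    return previous_row[-1]


def character_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Character error rate in percent, pooled over all text lines (assumption)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("reference transcriptions contain no characters")
    return 100.0 * total_edits / total_chars
```

For example, `character_error_rate(["abcd"], ["abxd"])` counts one substitution over four reference characters. Pooling edits over the whole test set weights long lines more heavily than per-line averaging would, so the two conventions can give different figures on the same predictions.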
Data Availability
The IAM, Rimes and Bentham datasets used in this work are publicly available at https://fki.tic.heia-fr.ch/databases/iam-handwriting-database, http://www.a2ialab.com/doku.php?id=rimes_database:data:icdar2011:line:icdar2011competitionline and http://transcriptorium.eu/datasets/bentham-collection/, respectively, accessed on 28 April 2024.
The ANT-L dataset may be made available upon request, subject to internal review of the request.
Acknowledgements
This study was funded by the Tunisian Ministry of Higher Education and Scientific Research under grant agreement number “19PEJC-08-02”, which is gratefully acknowledged.
The authors would also like to thank the Tunisian national archives for providing access to their digital collections.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Mechi, O., Mehri, M., Ingold, R. et al. Recognizing text lines in handwritten archival document images using octave convolutional and attention recurrent neural networks. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19717-4