Recognizing text lines in handwritten archival document images using octave convolutional and attention recurrent neural networks

Multimedia Tools and Applications

Abstract

Over the past several years, many archivists and historians have pointed out a growing need for robust and efficient offline handwritten text recognition systems to assist them in transcribing handwritten archival documents. These systems output an editable text file corresponding to the transcription of the digitized documents. Recently, great attention has been paid to the use of deep learning for solving various issues in document image analysis. In particular, deep learning architectures have been used extensively for handwritten text recognition to overcome some limitations of older approaches, such as hidden Markov models. To contribute to this trend, we propose in this article a deep learning architecture that recognizes text lines in handwritten archival document images using octave-based convolutional and attention-based recurrent neural networks. The proposed architecture is composed of an encoder block and a decoder block. The encoder block uses octave convolutional layers. The decoder block uses an attention-based bidirectional long short-term memory network followed by a connectionist temporal classification layer. A set of experiments was carried out on different benchmark datasets of historical handwritten document images to show the effectiveness of the proposed architecture. Qualitative and quantitative results are reported and compared with those of recent state-of-the-art methods and of the methods participating in the ICDAR and ICFHR contests under the same conditions (i.e. without using a language model or a lexicon constraint). Using the proposed architecture, low character error rates at line level are achieved on three different datasets: 6.02%, 4.30% and 6.9% for the IAM, Rimes and Bentham datasets, respectively.
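
For readers unfamiliar with the metric, the character error rate quoted above is commonly computed as the character-level edit distance between the recognized line and its ground-truth transcription, divided by the number of ground-truth characters. The following is a minimal sketch of that common definition, not the authors' evaluation code. The function names `levenshtein` and `cer` are hypothetical, and pooling edits and characters over all lines (rather than averaging per-line rates) is an assumption, since the abstract does not state which normalization was used.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference character
                            curr[j - 1] + 1,      # drop a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Character error rate in percent, pooled over all text lines.

    Assumption: edits and reference characters are summed over the whole
    set of lines before dividing (corpus-level normalization).
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * total_edits / total_chars
```

As a usage example, `cer(["abcd", "ab"], ["abxd", "ab"])` counts one substitution over six reference characters, which gives roughly 16.67%.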

Data Availability

The IAM, Rimes and Bentham datasets used in this work are publicly available at https://fki.tic.heia-fr.ch/databases/iam-handwriting-database, http://www.a2ialab.com/doku.php?id=rimes_database:data:icdar2011:line:icdar2011competitionline and http://transcriptorium.eu/datasets/bentham-collection/, respectively, accessed on 28 April 2024.

The ANT-L dataset may be made available upon request, pending internal review of the request.

Notes

  1. https://fki.tic.heia-fr.ch/databases/iam-handwriting-database

  2. http://www.a2ialab.com/doku.php?id=rimes_database:data:icdar2011:line:icdar2011competitionline

  3. http://transcriptorium.eu/datasets/bentham-collection/

  4. http://www.archives.nat.tn/

  5. https://guides.nyu.edu/tesseract

Acknowledgements

This study was funded by the Tunisian Ministry of Higher Education and Scientific Research under grant agreement number “19PEJC-08-02”, which is gratefully acknowledged.

The authors would also like to thank the Tunisian national archives for providing access to their digital collections.

Author information

Corresponding author

Correspondence to Olfa Mechi.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mechi, O., Mehri, M., Ingold, R. et al. Recognizing text lines in handwritten archival document images using octave convolutional and attention recurrent neural networks. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19717-4
