Article

Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition

Authors:

Solène Tarride,

Christopher Kermorvant

Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30–September 4, 2024, Proceedings, Part VI

Pages 167 - 182

https://doi.org/10.1007/978-3-031-70552-6_10

Published: 11 September 2024

Abstract

In recent advances in automatic text recognition (ATR), deep neural networks have demonstrated the ability to implicitly capture language statistics, potentially reducing the need for traditional language models. This study directly addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the field of handwriting recognition. We evaluate two prominent neural network architectures, PyLaia [23] and DAN [8], with and without the integration of explicit n-gram language models. Our experiments on three datasets - IAM [19], RIMES [11], and NorHand v2 [2] - at both line and page level, investigate optimal parameters for n-gram models, including their order, weight, smoothing methods and tokenization level. The results show that incorporating character or subword n-gram models significantly improves the performance of ATR models on all datasets, challenging the notion that deep learning models alone are sufficient for optimal performance. In particular, the combination of DAN with a character language model outperforms current benchmarks, confirming the value of hybrid approaches in modern document analysis systems.
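As a rough illustration of the parameters named in the abstract (n-gram order, language model weight, smoothing), here is a minimal sketch of a character n-gram model used to rescore recognizer hypotheses. It is not the authors' implementation: the names `CharNGramLM`, `rescore`, and `lm_weight` are hypothetical, add-k smoothing is chosen only for brevity, and rescoring a short hypothesis list stands in for whatever decoding procedure the paper actually uses.

```python
import math
from collections import Counter


class CharNGramLM:
    """Character n-gram model with add-k smoothing (illustrative assumption)."""

    def __init__(self, order=3, k=0.1):
        self.order = order          # n-gram order
        self.k = k                  # add-k smoothing constant
        self.ngrams = Counter()     # counts of (context, next_char)
        self.contexts = Counter()   # counts of context
        self.vocab = set()

    def _pad(self, text):
        # "^" marks the start of a line, "$" marks its end.
        return "^" * (self.order - 1) + text + "$"

    def fit(self, lines):
        for line in lines:
            padded = self._pad(line)
            self.vocab.update(padded)
            for i in range(self.order - 1, len(padded)):
                context = padded[i - self.order + 1:i]
                self.ngrams[(context, padded[i])] += 1
                self.contexts[context] += 1
        return self

    def log_prob(self, text):
        padded = self._pad(text)
        v = len(self.vocab) + 1  # one extra slot for unseen characters
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            context = padded[i - self.order + 1:i]
            num = self.ngrams[(context, padded[i])] + self.k
            den = self.contexts[context] + self.k * v
            total += math.log(num / den)
        return total


def rescore(hypotheses, lm, lm_weight=0.5):
    """Return the text maximizing optical log-prob + lm_weight * LM log-prob."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm.log_prob(h[0]))[0]


# Toy usage: (text, optical log-probability) pairs from a hypothetical recognizer.
lm = CharNGramLM(order=3).fit(["the cat sat", "the hat", "that is the one"])
hyps = [("tke cat", -1.0), ("the cat", -1.2)]
best_with_lm = rescore(hyps, lm, lm_weight=1.0)
best_without_lm = rescore(hyps, lm, lm_weight=0.0)
```

With `lm_weight=0.0` the selection falls back to the optical score alone, which makes the effect of the language model weight easy to isolate in a small experiment.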

References

[1]

Arora, A., et al.: Using ASR methods for OCR. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 663–668 (2019).

[2]

Beyer, Y., Solberg, P.E.: Norhand v2/dataset for handwritten text recognition in Norwegian [data set] (2024).

[3]

Blecher, L., Cucurull, G., Scialom, T., Stojnic, R.: Nougat: Neural Optical Understanding for Academic Documents (2023). https://arxiv.org/abs/2308.13418

[4]

Bluche, T.: Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS 2016, pp. 838–846. Curran Associates Inc., Red Hook (2016).

[5]

Bluche, T., Louradour, J., Knibbe, M., Moysset, B., Benzeghiba, M.F., Kermorvant, C.: The A2iA handwritten arabic text recognition system at the OpenHaRT2013 evaluation campaign. In: Document Analysis Systems (2014).

[6]

Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13(4), 359–394 (1999).

[7]

Choi, H., Lee, J., Yang, J.: N-gram in swin transformers for efficient lightweight image super-resolution. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2071–2081. IEEE Computer Society, Los Alamitos (2023).

[8]

Coquenet, D., Chatelain, C., Paquet, T.: DAN: a segmentation-free document attention network for handwritten document recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1–17 (2023).

[9]

Coquenet, D., Chatelain, C., Paquet, T.: End-to-end handwritten paragraph text recognition using a vertical attention network. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 508–524 (2023).

[10]

Diaz, D.H., Qin, S., Ingle, R.R., Fujii, Y., Bissacco, A.: Rethinking text line recognition models. CoRR abs/2104.07787 (2021)

[11]

Grosicki, E., El-Abed, H.: ICDAR 2011 - French handwriting recognition competition. In: 2011 International Conference on Document Analysis and Recognition, pp. 1459–1463 (2011).

[12]

Jurafsky, D., Martin, J.: Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition (2020). https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf

[13]

Kozielski, M., Doetsch, P., Ney, H.: Improvements in RWTH’s system for off-line handwriting recognition. In: International Conference on Document Analysis and Recognition (2013).

[14]

Kudo, T.: Subword regularization: Improving neural network translation models with multiple subword candidates. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 66–75. Association for Computational Linguistics (2018).

[15]

Kudo, T., Richardson, J.: Sentencepiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. CoRR abs/1808.06226 (2018).

[16]

Kumar, S., Nirschl, M., Holtmann-Rice, D., Liao, H., Suresh, A.T., Yu, F.: Lattice rescoring strategies for long short term memory language models in speech recognition (2017).

[17]

Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. In: AAAI Conference on Artificial Intelligence (2021).

[18]

Maarand, M., Beyer, Y., Kåsen, A., Fosseide, K.T., Kermorvant, C.: A comprehensive comparison of open-source libraries for handwritten text recognition in Norwegian. In: Uchida, S., Barney, E., Eglin, V. (eds.) DAS 2022, pp. 399–413. Springer, Heidelberg (2022).

[19]

Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recognit. 5, 39–46 (2002).

[20]

Min, Z., Wang, J.: Exploring the integration of large language models into automatic speech recognition systems: an empirical study. In: Luo, B., Cheng, L., Wu, Z.G., Li, H., Li, C. (eds.) Neural Information Processing, pp. 69–84. Springer, Singapore (2024).

[21]

Neto, A.F.S., Bezerra, B.L.D., Toselli, A.H., Lima, E.B.: A robust handwritten recognition system for learning on different data restriction scenarios. Pattern Recognit. Lett. 1, 1–7 (2022).

[22]

Nguyen, T.T.H., Jatowt, A., Coustaty, M., Doucet, A.: Survey of post-OCR processing approaches. ACM Comput. Surv. 54(6) (2021).

[23]

Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 67–72 (2017).

[24]

Roy, A., et al.: N-grammer: augmenting transformers with latent n-grams (2022).

[25]

Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Erk, K., Smith, N.A. (eds.) Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1715–1725. Association for Computational Linguistics (2016).

[26]

Soper, E., Fujimoto, S., Yu, Y.Y.: BART for post-correction of OCR newspaper text. In: Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pp. 284–290. Association for Computational Linguistics, Online (2021).

[27]

Tarride, S., Boillet, M., Kermorvant, C.: Key-value information extraction from full handwritten pages. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds.) Document Analysis and Recognition - ICDAR 2023, pp. 185–204. Springer, Cham (2023).

[28]

Tarride, S., Schneider, Y., Generali, M., Boillet, M., Abadie, B., Kermorvant, C.: Improving automatic text recognition with language models in the pylaia open-source library. In: Submitted at ICDAR (2024)

[29]

Tassopoulou, V., Retsinas, G., Maragos, P.: Enhancing handwritten text recognition with n-gram sequence decomposition and multitask learning. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10555–10560. IEEE Computer Society, Los Alamitos (2021).

[30]

Voigtlaender, P., Doetsch, P., Ney, H.: Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 228–233 (2016).

[31]

Wang, D., et al.: DocLLM: a layout-aware generative language model for multimodal document understanding (2023)

[32]

Wigington, C., Tensmeyer, C., Davis, B., Barrett, W., Price, B., Cohen, S.: Start, follow, read: end-to-end full-page handwriting recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 372–388. Springer, Cham (2018).

[33]

Xu, H., et al.: A pruned RNNLM lattice-rescoring algorithm for automatic speech recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5929–5933 (2018).

[34]

Yousef, M., Bishop, T.E.: Origaminet: weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020).

[35]

Zhang, H., Liang, L., Jin, L.: SCUT-HCCDoc: a new benchmark dataset of handwritten chinese text in unconstrained camera-captured documents. Pattern Recognit. 107559 (2020).

Published In

Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30–September 4, 2024, Proceedings, Part VI

Aug 2024

455 pages

ISBN: 978-3-031-70551-9

DOI: 10.1007/978-3-031-70552-6

Editors:
Elisa H. Barney Smith (Luleå Tekniska Universitet, Luleå, Sweden, https://ror.org/016st3p78),
Marcus Liwicki (Luleå Tekniska Universitet, Luleå, Sweden, https://ror.org/016st3p78),
Liangrui Peng (Tsinghua University, Beijing, China)

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Publisher

Springer-Verlag

Berlin, Heidelberg
