DOI: 10.1007/978-3-031-70552-6_10
Article

Revisiting N-Gram Models: Their Impact in Modern Neural Networks for Handwritten Text Recognition

Published: 11 September 2024

Abstract

In recent advances in automatic text recognition (ATR), deep neural networks have demonstrated the ability to implicitly capture language statistics, potentially reducing the need for traditional language models. This study directly addresses whether explicit language models, specifically n-gram models, still contribute to the performance of state-of-the-art deep learning architectures in the field of handwriting recognition. We evaluate two prominent neural network architectures, PyLaia [23] and DAN [8], with and without the integration of explicit n-gram language models. Our experiments on three datasets - IAM [19], RIMES [11], and NorHand v2 [2] - at both line and page level, investigate optimal parameters for n-gram models, including their order, weight, smoothing methods and tokenization level. The results show that incorporating character or subword n-gram models significantly improves the performance of ATR models on all datasets, challenging the notion that deep learning models alone are sufficient for optimal performance. In particular, the combination of DAN with a character language model outperforms current benchmarks, confirming the value of hybrid approaches in modern document analysis systems.
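The n-gram parameters named in the abstract (order, weight, smoothing) can be made concrete with a small sketch. The code below is an illustration only and is not the setup used in the paper: it assumes add-k smoothing and simple rescoring of an n-best list, and the names `CharNGramLM`, `rescore`, and `lm_weight`, as well as the toy training lines and scores, are hypothetical.

```python
import math
from collections import Counter


class CharNGramLM:
    """Character n-gram model with add-k smoothing (illustrative assumption)."""

    def __init__(self, order=3, k=0.1):
        self.order = order          # n-gram order
        self.k = k                  # add-k smoothing constant
        self.ngrams = Counter()     # counts of (context, next character)
        self.contexts = Counter()   # counts of each context
        self.vocab = set()

    def _pad(self, text):
        # "^" pads the start so early characters have a full context; "$" marks the end.
        return "^" * (self.order - 1) + text + "$"

    def fit(self, lines):
        for line in lines:
            padded = self._pad(line)
            self.vocab.update(padded)
            for i in range(self.order - 1, len(padded)):
                context = padded[i - self.order + 1 : i]
                self.ngrams[(context, padded[i])] += 1
                self.contexts[context] += 1
        return self

    def log_prob(self, text):
        # Sum of smoothed log-probabilities of each character given its context.
        padded = self._pad(text)
        vocab_size = len(self.vocab)
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            context = padded[i - self.order + 1 : i]
            num = self.ngrams[(context, padded[i])] + self.k
            den = self.contexts[context] + self.k * vocab_size
            total += math.log(num / den)
        return total


def rescore(hypotheses, lm, lm_weight):
    """Return the text maximizing optical log-score + lm_weight * LM log-probability."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm.log_prob(h[0]))[0]


# Toy usage: the recognizer slightly prefers a misspelling; the language model corrects it.
lm = CharNGramLM(order=3, k=0.1).fit(["the cat sat on the mat", "the hat is on the cat"])
hyps = [("the cat", -2.3), ("tbe cat", -2.1)]  # (text, hypothetical optical log-score)
best = rescore(hyps, lm, lm_weight=0.5)
```

With `lm_weight=0.0` the optical score alone decides, so the weight controls how much the language model can override the recognizer.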

References

[1]
Arora, A., et al.: Using ASR methods for OCR. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 663–668 (2019).
[2]
Beyer, Y., Solberg, P.E.: Norhand v2/dataset for handwritten text recognition in Norwegian [data set] (2024).
[3]
Blecher, L., Cucurull, G., Scialom, T., Stojnic, R.: Nougat: Neural Optical Understanding for Academic Documents (2023). https://arxiv.org/abs/2308.13418
[4]
Bluche, T.: Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS 2016, pp. 838–846. Curran Associates Inc., Red Hook (2016).
[5]
Bluche, T., Louradour, J., Knibbe, M., Moysset, B., Benzeghiba, M.F., Kermorvant, C.: The A2iA handwritten Arabic text recognition system at the OpenHaRT2013 evaluation campaign. In: Document Analysis Systems (2014).
[6]
Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 13(4), 359–394 (1999).
[7]
Choi, H., Lee, J., Yang, J.: N-gram in swin transformers for efficient lightweight image super-resolution. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2071–2081. IEEE Computer Society, Los Alamitos (2023).
[8]
Coquenet, D., Chatelain, C., Paquet, T.: DAN: a segmentation-free document attention network for handwritten document recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1–17 (2023).
[9]
Coquenet, D., Chatelain, C., Paquet, T.: End-to-end handwritten paragraph text recognition using a vertical attention network. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 508–524 (2023).
[10]
Diaz, D.H., Qin, S., Ingle, R.R., Fujii, Y., Bissacco, A.: Rethinking text line recognition models. CoRR abs/2104.07787 (2021).
[11]
Grosicki, E., El-Abed, H.: ICDAR 2011 - French handwriting recognition competition. In: 2011 International Conference on Document Analysis and Recognition, pp. 1459–1463 (2011).
[12]
Jurafsky, D., Martin, J.: Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition (2020). https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
[13]
Kozielski, M., Doetsch, P., Ney, H.: Improvements in RWTH’s system for off-line handwriting recognition. In: International Conference on Document Analysis and Recognition (2013).
[14]
Kudo, T.: Subword regularization: Improving neural network translation models with multiple subword candidates. In: Gurevych, I., Miyao, Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, pp. 66–75. Association for Computational Linguistics (2018).
[15]
Kudo, T., Richardson, J.: Sentencepiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. CoRR abs/1808.06226 (2018).
[16]
Kumar, S., Nirschl, M., Holtmann-Rice, D., Liao, H., Suresh, A.T., Yu, F.: Lattice rescoring strategies for long short term memory language models in speech recognition (2017).
[17]
Li, M., et al.: TrOCR: transformer-based optical character recognition with pre-trained models. In: AAAI Conference on Artificial Intelligence (2021).
[18]
Maarand, M., Beyer, Y., Kåsen, A., Fosseide, K.T., Kermorvant, C.: A comprehensive comparison of open-source libraries for handwritten text recognition in Norwegian. In: Uchida, S., Barney, E., Eglin, V. (eds.) DAS 2022, pp. 399–413. Springer, Heidelberg (2022).
[19]
Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recognit. 5, 39–46 (2002).
[20]
Min, Z., Wang, J.: Exploring the integration of large language models into automatic speech recognition systems: an empirical study. In: Luo, B., Cheng, L., Wu, Z.G., Li, H., Li, C. (eds.) Neural Information Processing, pp. 69–84. Springer, Singapore (2024).
[21]
Neto, A.F.S., Bezerra, B.L.D., Toselli, A.H., Lima, E.B.: A robust handwritten recognition system for learning on different data restriction scenarios. Pattern Recognit. Lett. 1, 1–7 (2022).
[22]
Nguyen, T.T.H., Jatowt, A., Coustaty, M., Doucet, A.: Survey of post-OCR processing approaches. ACM Comput. Surv. 54(6) (2021).
[23]
Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 67–72 (2017).
[24]
Roy, A., et al.: N-grammer: augmenting transformers with latent n-grams (2022).
[25]
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Erk, K., Smith, N.A. (eds.) Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, pp. 1715–1725. Association for Computational Linguistics (2016).
[26]
Soper, E., Fujimoto, S., Yu, Y.Y.: BART for post-correction of OCR newspaper text. In: Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pp. 284–290. Association for Computational Linguistics, Online (2021).
[27]
Tarride, S., Boillet, M., Kermorvant, C.: Key-value information extraction from full handwritten pages. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds.) Document Analysis and Recognition - ICDAR 2023, pp. 185–204. Springer, Cham (2023).
[28]
Tarride, S., Schneider, Y., Generali, M., Boillet, M., Abadie, B., Kermorvant, C.: Improving automatic text recognition with language models in the PyLaia open-source library. In: Submitted at ICDAR (2024).
[29]
Tassopoulou, V., Retsinas, G., Maragos, P.: Enhancing handwritten text recognition with n-gram sequence decomposition and multitask learning. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10555–10560. IEEE Computer Society, Los Alamitos (2021).
[30]
Voigtlaender, P., Doetsch, P., Ney, H.: Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 228–233 (2016).
[31]
Wang, D., et al.: DocLLM: a layout-aware generative language model for multimodal document understanding (2023).
[32]
Wigington, C., Tensmeyer, C., Davis, B., Barrett, W., Price, B., Cohen, S.: Start, follow, read: end-to-end full-page handwriting recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 372–388. Springer, Cham (2018).
[33]
Xu, H., et al.: A pruned RNNLM lattice-rescoring algorithm for automatic speech recognition. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5929–5933 (2018).
[34]
Yousef, M., Bishop, T.E.: Origaminet: weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
[35]
Zhang, H., Liang, L., Jin, L.: SCUT-HCCDoc: a new benchmark dataset of handwritten Chinese text in unconstrained camera-captured documents. Pattern Recognit. 107559 (2020).

Published In

Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30–September 4, 2024, Proceedings, Part VI
Aug 2024
455 pages
ISBN:978-3-031-70551-9
DOI:10.1007/978-3-031-70552-6

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Handwritten Text Recognition
  2. Neural Networks
  3. Statistical Language Modeling
  4. Tokenization
