Improving Efficiency and Performance Through CTC-Based Transformers for Mathematical Expression Recognition

  • Conference paper
Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

Mathematical expression recognition (MER) is an active field with important implications for educational technology, document analysis, and scientific research automation. The bi-dimensionality and complexity of mathematical expressions make them hard to interpret accurately: a system must distinguish subscripts from superscripts and comprehend structures such as fractions and matrices, which requires robust recognition that integrates spatial understanding and structural awareness. In this paper, we address the problem of typeset mathematical expression recognition using a convolutional neural network backbone and a transformer encoder trained with connectionist temporal classification (CTC) loss. Compared to state-of-the-art systems, this encoder-only model proves highly efficient, achieving speed-up factors of 3.5 and 4.2 on the IBEM and Im2Latex-100k datasets, respectively. Moreover, this approach outperforms most state-of-the-art systems on mark-up-level and image-level metrics across both datasets. A comprehensive study demonstrates the model’s capability to interpret complex reading orders of mathematical expressions, showing that the monotonicity of CTC alignments is not a limitation of CTC-based models for MER. These findings underscore the effectiveness of this approach when compared to auto-regressive methods.
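To make the CTC output step concrete, the following is a minimal sketch of greedy (best-path) CTC decoding: take the highest-scoring label in each frame, collapse consecutive repeats, then drop blanks. It is an illustration only and not necessarily the decoding procedure used in the paper. The blank index, the toy vocabulary, and the helper names `ctc_greedy_decode` and `one_hot` are assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Best-path CTC decoding over per-frame label scores.

    frame_scores: list of per-frame score lists (one score per label).
    vocab: mapping from label index to output token (blank excluded).
    """
    # 1. Pick the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks; what remains is the output token sequence.
    return [vocab[label] for label in collapsed if label != blank]


def one_hot(label, size):
    """Toy stand-in for encoder outputs: all mass on a single label."""
    return [1.0 if i == label else 0.0 for i in range(size)]


# Hypothetical LaTeX token vocabulary for the expression x^{2}.
VOCAB = {1: "x", 2: "^", 3: "{", 4: "2", 5: "}"}

# A frame-level path with repeated labels and blanks.
path = [1, 1, 0, 2, 3, 0, 4, 4, 5]
frames = [one_hot(label, 6) for label in path]

tokens = ctc_greedy_decode(frames, VOCAB)
print(" ".join(tokens))  # x ^ { 2 }
```

Because all frames are labeled in a single pass with no token-by-token generation loop, this kind of decoding is cheap compared with auto-regressive decoding. Note that a blank between two identical labels is what allows a repeated token (for example `2 2`) to survive the collapse step.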

Notes

  1. KDD cup: https://kdd.org/kdd-cup.

  2. Im2Latex-100k data: https://im2markup.yuntiandeng.com/data/.

  3. IBEM data: https://zenodo.org/record/7963703.

Acknowledgment

Work was partially supported by the Generalitat Valenciana under the predoctoral grant CIACIF/2021/313, by grant PID2020-116813RB-I00 funded by MCIN/AEI/ 10.13039/501100011033, with the support of valgrAI—Valencian Graduate School and Research Network of Artificial Intelligence and the Generalitat Valenciana, and cofunded by the European Union.

Author information

Corresponding author

Correspondence to Dan Anitei.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Anitei, D., Parres, D., Sánchez, J.A., Benedí, J.M. (2024). Improving Efficiency and Performance Through CTC-Based Transformers for Mathematical Expression Recognition. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14808. Springer, Cham. https://doi.org/10.1007/978-3-031-70549-6_1

  • DOI: https://doi.org/10.1007/978-3-031-70549-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70548-9

  • Online ISBN: 978-3-031-70549-6

  • eBook Packages: Computer Science, Computer Science (R0)
