Abstract
Mathematical expression recognition (MER) is an active field with important implications for educational technology, document analysis, and scientific research automation. The two-dimensional layout and complexity of mathematical expressions make them difficult to interpret accurately: a system must distinguish subscripts from superscripts and comprehend structures such as fractions and matrices, which requires robust recognition that integrates spatial understanding and structural awareness. In this paper, we address the problem of typeset mathematical expression recognition using a convolutional neural network backbone and a transformer encoder trained with connectionist temporal classification (CTC) loss. Compared to state-of-the-art systems, this encoder-only model proves highly efficient, achieving speed-up factors of 3.5 and 4.2 on the IBEM and Im2Latex-100k datasets, respectively. Moreover, this approach outperforms most state-of-the-art systems on mark-up-level and image-level metrics across both datasets. A comprehensive study demonstrates the model’s capability to interpret complex reading orders of mathematical expressions, showing that the monotonicity of the CTC alignments is not a limitation of CTC-based models for the problem of MER. These findings underscore the effectiveness of this approach when compared to auto-regressive methods.
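To make the role of CTC concrete, the following is a minimal sketch of the standard CTC collapsing rule that maps a per-frame label path to an output mark-up sequence: consecutive repeats are merged, then blank symbols are dropped. It is an illustration only, not the implementation used in this paper; the function name `ctc_collapse`, the `BLANK` symbol, and the toy frame path are assumptions introduced for the example.

```python
from itertools import groupby

# Hypothetical name for the CTC blank symbol (an assumption for this sketch).
BLANK = "<blank>"


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapsing rule to a per-frame label path.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: remove every blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Toy path, one label per encoder output frame, for an image of x^{2}.
# The frames are consumed strictly left to right (a monotonic alignment).
path = ["x", "x", BLANK, "^", "{", "{", BLANK, "2", "}", BLANK]
tokens = ctc_collapse(path)
print(" ".join(tokens))
```

Because repeats are merged before blanks are removed, a token that genuinely occurs twice in a row must be separated by a blank in the frame path: `ctc_collapse(["a", BLANK, "a"])` yields two tokens, whereas `ctc_collapse(["a", "a"])` yields one.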
Notes
1. KDD cup: https://kdd.org/kdd-cup.
2. Im2Latex-100k data: https://im2markup.yuntiandeng.com/data/.
3. IBEM data: https://zenodo.org/record/7963703.
Acknowledgment
This work was partially supported by the Generalitat Valenciana under the predoctoral grant CIACIF/2021/313, by grant PID2020-116813RB-I00 funded by MCIN/AEI/10.13039/501100011033, with the support of valgrAI (Valencian Graduate School and Research Network of Artificial Intelligence) and the Generalitat Valenciana, and cofunded by the European Union.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anitei, D., Parres, D., Sánchez, J.A., Benedí, J.M. (2024). Improving Efficiency and Performance Through CTC-Based Transformers for Mathematical Expression Recognition. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14808. Springer, Cham. https://doi.org/10.1007/978-3-031-70549-6_1
DOI: https://doi.org/10.1007/978-3-031-70549-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70548-9
Online ISBN: 978-3-031-70549-6
eBook Packages: Computer Science (R0)