DOI: 10.1145/3624918.3625337
research-article

Recommending Answers to Math Questions Based on KL-Divergence and Approximate XML Tree Matching

Published: 26 November 2023

Abstract

Math is the science and study of quantity, structure, space, and change. It seeks out patterns, formulates new conjectures, and establishes truth by rigorous deduction from appropriately chosen axioms and definitions. The study of math makes a person better at solving problems and provides skills that can be used across other subjects and applied in different job roles. In the modern world, for example, construction workers use math every day, since they add, subtract, multiply, divide, and work with fractions. Math is clearly a major contributor to many areas of study. For this reason, math information retrieval (Math IR) deserves attention and recognition: a reliable Math IR system helps users find relevant answers to math questions and benefits all math learners whenever they need help solving a math problem, regardless of time and place. Moreover, Math IR systems enhance the learning experience of their users. In this paper, we present MaRec, a recommender system that retrieves and ranks math answers to a math question based on their textual content and embedded formulas. MaRec ranks a potential answer A to a math question Q by computing (i) the KL-divergence score on A and Q using their textual contents, and (ii) the subtree matching score of the math formulas in Q and A, represented as XML trees. The design of MaRec is simple and easy to understand, since it relies solely on a probability model and an elegant tree-matching approach in ranking math answers. Empirical studies show that MaRec significantly outperforms (i) three existing state-of-the-art Math IR systems in an offline evaluation, and (ii) two top-of-the-line machine learning systems in an online analysis.
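
As a rough illustration of the two signals named in the abstract, the sketch below scores an answer A against a question Q with a KL-divergence over their text and a subtree overlap over their formulas parsed as XML. The abstract does not give MaRec's actual formulas, so everything here is an assumption made for illustration: the whitespace tokenizer, the add-one smoothed unigram models, the exact-match subtree overlap (a much simpler stand-in for approximate tree matching), the exp(-KL) conversion to a similarity, the equal weighting, and all function names.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def unigram_model(text, vocab, mu=1.0):
    """Add-mu smoothed unigram distribution over a shared vocabulary (assumed model)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl_divergence(question, answer):
    """KL(P_Q || P_A) over whitespace tokens; 0 for identical text, larger when farther apart."""
    vocab = set(question.lower().split()) | set(answer.lower().split())
    p = unigram_model(question, vocab)
    q = unigram_model(answer, vocab)
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab)


def signature(node):
    """Canonical string for a subtree: tag, stripped text, then child signatures in order."""
    text = (node.text or "").strip()
    children = "".join(signature(child) for child in node)
    return f"({node.tag}:{text}{children})"


def subtree_signatures(xml_string):
    """Multiset of signatures, one per node (that is, per rooted subtree) of the formula."""
    root = ET.fromstring(xml_string)
    return Counter(signature(node) for node in root.iter())


def subtree_match(q_xml, a_xml):
    """Fraction of the question's subtrees that also occur, exactly, in the answer."""
    q_sigs = subtree_signatures(q_xml)
    a_sigs = subtree_signatures(a_xml)
    shared = sum((q_sigs & a_sigs).values())  # multiset intersection
    return shared / sum(q_sigs.values())


def score(question_text, answer_text, q_xml, a_xml, weight=0.5):
    """Hypothetical combination: exp(-KL) as text similarity, mixed linearly with formula overlap."""
    text_similarity = math.exp(-kl_divergence(question_text, answer_text))
    return weight * text_similarity + (1 - weight) * subtree_match(q_xml, a_xml)


if __name__ == "__main__":
    q_formula = "<math><mi>x</mi><mo>+</mo><mn>1</mn></math>"
    a_formula = "<math><mi>x</mi><mo>+</mo><mn>2</mn></math>"
    print(kl_divergence("solve x", "solve y"))
    print(subtree_match(q_formula, a_formula))
    print(score("solve x", "solve y", q_formula, a_formula))
```

In this toy example the two formulas share the `mi` and `mo` leaves but differ in the `mn` leaf and therefore in the root subtree, so half of the question's subtrees are matched. An approximate matcher, as the title suggests MaRec uses, would presumably also give partial credit to the near-identical roots.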

Published In

SIGIR-AP '23: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
November 2023
324 pages
ISBN: 9798400704086
DOI: 10.1145/3624918

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. KL-divergence
2. content similarity
3. math questions and answers
4. subtree matching

Qualifiers

• Research-article
• Research
• Refereed limited
