DOI: 10.1145/2600428.2609611
research-article

A mathematics retrieval system for formulae in layout presentations

Published: 03 July 2014

Abstract

The semantics of mathematical formulae depend on their spatial structure, and formulae usually exist in layout presentations such as PDF, LaTeX, and Presentation MathML, which challenges previous text indexing and retrieval methods. This paper proposes a mathematics retrieval system, together with novel algorithms, that enables efficient formula indexing and retrieval from both webpages and PDF documents. Unlike prior studies, which require users to enter queries manually in a formula markup language, the new system lets users "copy" formula queries directly from PDF documents. Furthermore, using a novel indexing and matching model, the system searches for similar mathematical formulae based on both textual and spatial similarities. A hierarchical generalization technique is proposed to generate sub-trees from the semi-operator tree of a formula and to support substructure matching and fuzzy matching. Experiments on massive Wikipedia and CiteSeer repositories show that the new system and its algorithms, compared with two representative mathematics retrieval systems, provide more efficient mathematical formula indexing and retrieval while simplifying user query input for PDF documents.
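
The abstract does not give the details of the hierarchical generalization technique, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a plain operator tree (the hypothetical `Node` class stands in for the paper's semi-operator tree), generalizes each sub-tree by replacing everything below a depth cutoff with a wildcard `*`, and scores two formulae with Jaccard overlap of their index terms, which is a swapped-in similarity measure chosen for illustration. The names `serialize`, `index_terms`, and `similarity` are likewise assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical operator-tree node: an operator or operand label plus children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def height(node: Node) -> int:
    """Number of levels in the sub-tree rooted at node (a leaf has height 1)."""
    return 1 + max((height(c) for c in node.children), default=0)


def serialize(node: Node, depth_limit: int) -> str:
    """Serialize a sub-tree, replacing everything below depth_limit with '*'."""
    if depth_limit == 0:
        return "*"
    if not node.children:
        return node.label
    inner = ",".join(serialize(c, depth_limit - 1) for c in node.children)
    return f"{node.label}({inner})"


def index_terms(root: Node) -> Set[str]:
    """Collect every sub-tree at every generalization level as an index term."""
    terms: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        # Level 1 keeps only the operator; the top level is the exact sub-tree.
        for level in range(1, height(node) + 1):
            terms.add(serialize(node, level))
        stack.extend(node.children)
    return terms


def similarity(query: Node, formula: Node) -> float:
    """Jaccard overlap of index terms (an illustrative stand-in for ranking)."""
    a, b = index_terms(query), index_terms(formula)
    return len(a & b) / len(a | b)


# x^2 + y as an indexed formula, a^2 + y as a query.
indexed = Node("+", [Node("^", [Node("x"), Node("2")]), Node("y")])
query = Node("+", [Node("^", [Node("a"), Node("2")]), Node("y")])
```

In this sketch the exact sub-tree `^(x,2)` is an index term of `x^2 + y`, which is what allows a substructure match, while the generalized term `+(^(*,*),y)` is shared by both formulae, which is what allows a fuzzy match between formulae that differ only in a variable name.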

    Published In

    SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
    July 2014
    1330 pages
    ISBN:9781450322577
    DOI:10.1145/2600428
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. layout presentation
    2. mathematical information retrieval
    3. scientific information extraction
    4. structure matching

    Qualifiers

    • Research-article

    Conference

    SIGIR '14
    Acceptance Rates

    SIGIR '14 Paper Acceptance Rate 82 of 387 submissions, 21%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (Last 12 months)9
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 10 Feb 2025

    Cited By

    • (2024) Mathematical Information Retrieval: A Review. ACM Computing Surveys 57:3 (1-34). DOI: 10.1145/3699953. Online publication date: 9-Oct-2024
    • (2021) Learning to Rank for Mathematical Formula Retrieval. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (952-961). DOI: 10.1145/3404835.3462956. Online publication date: 11-Jul-2021
    • (2021) Formula Citation Graph Based Mathematical Information Retrieval. Document Analysis and Recognition – ICDAR 2021 (631-647). DOI: 10.1007/978-3-030-86549-8_40. Online publication date: 2-Sep-2021
    • (2020) Accelerating Substructure Similarity Search for Formula Retrieval. Advances in Information Retrieval (714-727). DOI: 10.1007/978-3-030-45439-5_47. Online publication date: 8-Apr-2020
    • (2019) Characterizing searches for mathematical concepts. Proceedings of the 18th Joint Conference on Digital Libraries (57-66). DOI: 10.1109/JCDL.2019.00019. Online publication date: 2-Jun-2019
    • (2019) Math Expression Image Retrieval via Attention-Based Framework. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI) (259-264). DOI: 10.1109/ICTAI.2019.00044. Online publication date: Nov-2019
    • (2019) Structural Similarity Search for Formulas Using Leaf-Root Paths in Operator Subtrees. Advances in Information Retrieval (116-129). DOI: 10.1007/978-3-030-15712-8_8. Online publication date: 7-Apr-2019
    • (2018) An Assistive Technology for Braille Users to Support Mathematical Learning: A Semantic Retrieval System. Symmetry 10:11 (547). DOI: 10.3390/sym10110547. Online publication date: 26-Oct-2018
    • (2018) Mathematics Content Understanding for Cyberlearning via Formula Evolution Map. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (37-46). DOI: 10.1145/3269206.3271694. Online publication date: 17-Oct-2018
    • (2017) Mathematical document categorization with structure of mathematical expressions. Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries (119-128). DOI: 10.5555/3200334.3200348. Online publication date: 19-Jun-2017
