DOI: 10.1145/3558100.3563851
short-paper

A cascaded approach for page-object detection in scientific papers

Published: 18 November 2022

Abstract

In recent years, Page Object Detection (POD) has become a popular document understanding task, and one that is far from trivial given how complex documents can be. The rise of neural networks has enabled more general, learning-based approaches to the task. In the literature, however, the different object types, such as formulae and figures, are generally treated individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers: isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these classes, we observe a hierarchy among them and propose a new localisation approach that uses two cascaded You Only Look Once (YOLO) networks. We also present a new data set of labelled bounding boxes for all six object classes. It combines two data sets commonly used in the literature for formula localisation, adding labels for figures, tables, variables and references to their document images. On this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state of the art for each class when it is localised individually.
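The abstract does not say how the two YOLO networks are chained or which classes sit where in the hierarchy, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a first detector runs on the whole page and a second detector runs only inside the crops of selected "parent" boxes, with its results shifted back into page coordinates. The class names, the choice of `isolated_formula` as a parent of `variable`, the nested-list `Image` type and the stub detectors are all hypothetical stand-ins for the trained networks. It also computes an unweighted (macro) mean of per-class F1 scores, which is our assumed reading of "average F1-score across all classes".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Image = List[List[int]]  # hypothetical: a page as rows of grayscale pixel values


@dataclass(frozen=True)
class Box:
    """Labelled axis-aligned box: (x1, y1) top-left, (x2, y2) bottom-right, in pixels."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


# Assumption: a detector maps an image to boxes in that image's own coordinate frame.
# In practice this would wrap a trained YOLO model; here it is any callable.
Detector = Callable[[Image], List[Box]]


def crop(image: Image, box: Box) -> Image:
    """Return the pixels inside `box`, with coordinates truncated to ints."""
    x1, y1, x2, y2 = int(box.x1), int(box.y1), int(box.x2), int(box.y2)
    return [row[x1:x2] for row in image[y1:y2]]


def to_page(child: Box, parent: Box) -> Box:
    """Translate a box detected inside `parent`'s crop back to page coordinates."""
    dx, dy = int(parent.x1), int(parent.y1)
    return Box(child.label, child.x1 + dx, child.y1 + dy, child.x2 + dx, child.y2 + dy)


def cascade(image: Image, stage1: Detector, stage2: Detector,
            parent_labels: Iterable[str]) -> List[Box]:
    """Run stage1 on the whole page, then stage2 inside every stage-1 box whose
    label is in `parent_labels`; return all boxes in page coordinates."""
    parents = set(parent_labels)
    first = stage1(image)
    results = list(first)
    for parent in first:
        if parent.label in parents:
            results.extend(to_page(child, parent) for child in stage2(crop(image, parent)))
    return results


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 scores; counts maps label -> (tp, fp, fn)."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


if __name__ == "__main__":
    page: Image = [[0] * 60 for _ in range(50)]  # blank page, 60 px wide, 50 px tall

    # Hypothetical stub detectors standing in for the two trained networks.
    def stage1(img: Image) -> List[Box]:
        return [Box("figure", 0, 0, 5, 5), Box("isolated_formula", 10, 20, 40, 30)]

    def stage2(img: Image) -> List[Box]:
        return [Box("variable", 2, 1, 5, 4)] if img else []

    for box in cascade(page, stage1, stage2, parent_labels={"isolated_formula"}):
        print(box)
```

Running the script prints the two stage-1 boxes plus a `variable` box at (12, 21, 15, 24): the crop-relative box (2, 1, 5, 4) offset by the parent's top-left corner (10, 20). Mapping back to page coordinates is what would let a second network work on smaller regions while still reporting page-level locations.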

    Information

    Published In

    DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering
    September 2022
    118 pages
    ISBN:9781450395441
    DOI:10.1145/3558100
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • SIGDOC: ACM Special Interest Group on Systems Documentation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 November 2022

    Author Tags

    1. deep learning
    2. page object detection

    Qualifiers

    • Short-paper

    Conference

    DocEng '22
    DocEng '22: ACM Symposium on Document Engineering 2022
    September 20-23, 2022
    San Jose, California

    Acceptance Rates

    Overall Acceptance Rate 194 of 564 submissions, 34%

    Bibliometrics

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 59
    • Downloads (last 12 months): 8
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 09 Jan 2025