short-paper

A cascaded approach for page-object detection in scientific papers

Authors: Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina

DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering

Article No.: 4, Pages 1 - 4

https://doi.org/10.1145/3558100.3563851

Published: 18 November 2022


Abstract

In recent years, Page Object Detection (POD) has become a popular document understanding task, and one that has proven non-trivial given the potential complexity of documents. The rise of neural networks has enabled a more general learning approach to this task. However, in the literature, objects such as formulae and figures are generally considered individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers, namely isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these object classes, we note a hierarchy among the classes and propose a new localisation approach using two cascaded You Only Look Once (YOLO) networks. We also present a new data set consisting of labelled bounding boxes for all six object classes. This data set combines two data sets commonly used in the literature for formulae localisation, adding labels for figures, tables, variables and references to their document images. Using this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state of the art for each object class when localised individually.
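The abstract reports an F1-score averaged across the six classes but does not state how detections are matched to ground truth. As a minimal sketch, assuming a common detection-evaluation convention (a predicted box is a true positive if it overlaps a not-yet-matched ground-truth box of the same class with an intersection-over-union of at least 0.5, matched greedily), the per-class F1-score and its unweighted mean across classes could be computed as below. The 0.5 threshold, the greedy matching, the box format and the function names `iou`, `f1_for_class` and `mean_f1` are illustrative assumptions, not the paper's evaluation protocol.

```python
"""Per-class and mean F1 for bounding-box detections (illustrative sketch)."""

# A box is (x1, y1, x2, y2) in page pixel coordinates (assumed convention).
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_for_class(preds: list[Box], gts: list[Box], thr: float = 0.5) -> float:
    """F1 for one class; greedy one-to-one matching at IoU >= thr (assumed)."""
    matched: set[int] = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j in matched:
                continue  # each ground-truth box may be matched only once
            overlap = iou(p, g)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # missed ground-truth boxes
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(preds_by_class: dict[str, list[Box]],
            gts_by_class: dict[str, list[Box]],
            thr: float = 0.5) -> float:
    """Unweighted mean of per-class F1 over every class seen in either dict."""
    classes = preds_by_class.keys() | gts_by_class.keys()
    if not classes:
        return 0.0
    scores = [f1_for_class(preds_by_class.get(c, []), gts_by_class.get(c, []), thr)
              for c in classes]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    preds = {"table": [(0, 0, 10, 10)], "figure": [(0, 0, 5, 5)]}
    gts = {"table": [(0, 0, 10, 10)], "figure": [(20, 20, 30, 30)]}
    print(mean_f1(preds, gts))  # 0.5
```

With one correctly localised table and one missed figure, the table scores 1.0, the figure scores 0.0, and the mean is 0.5. A full evaluation would usually sort predictions by confidence before greedy matching; this sketch matches them in the order given.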


Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems → Object detection


Information

Published In

DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering

September 2022

118 pages

ISBN:9781450395441

DOI:10.1145/3558100

General Chairs: Curtis Wigington (Adobe Systems Incorporated), Matthew Hardy (Adobe Systems Incorporated)

Program Chairs: Steven R. Bagley (University of Nottingham, United Kingdom), Steven Simske (Colorado State University, Fort Collins, CO)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGDOC: ACM Special Interest Group on Systems Documentation

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

DocEng '22

Sponsor:

SIGWEB

DocEng '22: ACM Symposium on Document Engineering 2022, September 20-23, 2022, San Jose, California

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%


Bibliometrics

Total citations: 0
Total downloads: 59
Downloads (last 12 months): 8
Downloads (last 6 weeks): 0

Reflects downloads up to 09 Jan 2025
