DOI:10.1145/2361354.2361365
research-article

A methodology for evaluating algorithms for table understanding in PDF documents

Published: 04 September 2012

Abstract

This paper presents a methodology for the evaluation of table understanding algorithms for PDF documents. The evaluation takes into account three major tasks: table detection, table structure recognition and functional analysis. We provide a general and flexible output model for each task along with corresponding evaluation metrics and methods. We also present a methodology for collecting and ground-truthing PDF documents based on consensus-reaching principles and provide a publicly available ground-truthed dataset.

Published In

DocEng '12: Proceedings of the 2012 ACM symposium on Document engineering
September 2012
256 pages
ISBN:9781450311168
DOI:10.1145/2361354
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. document analysis
  2. document understanding
  3. ground-truth dataset
  4. metrics
  5. performance evaluation
  6. table processing

Qualifiers

  • Research-article

Conference

DocEng '12: ACM Symposium on Document Engineering
September 4 - 7, 2012
Paris, France

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%
