Table-processing paradigms: a research survey

DW Embley, M Hurst, D Lopresti, G Nagy - International Journal of Document Analysis and Recognition (IJDAR), 2006 - Springer
Abstract
Tables are a ubiquitous form of communication. While everyone seems to know what a table is, a precise, analytical definition of “tabularity” remains elusive because some bureaucratic forms, multicolumn text layouts, and schematic drawings share many characteristics of tables. There are significant differences between typeset tables, electronic files designed for display of tables, and tables in symbolic form intended for information retrieval. Most past research has addressed the extraction of low-level geometric information from raster images of tables scanned from printed documents, although there is growing interest in the processing of tables in electronic form as well. Recent research on table composition and table analysis has improved our understanding of the distinction between the logical and physical structures of tables, and has led to improved formalisms for modeling tables. This review, which is structured in terms of generalized paradigms for table processing, indicates that progress on half-a-dozen specific research issues would open the door to using existing paper and electronic tables for database update, tabular browsing, structured information retrieval through graphical and audio interfaces, multimedia table editing, and platform-independent display.