Abstract
Tables are the only acceptable means of communicating certain types of structured data. A precise definition of “tabularity” remains elusive because some bureaucratic forms, multicolumn text layouts, and schematic drawings share many characteristics of tables. There are significant differences between typeset tables, electronic files designed for display of tables, and tables in symbolic form intended for information retrieval. Although most research to date has addressed the extraction of low-level geometric information from scanned raster images of paper tables, the recent trend toward the analysis of tables in electronic form may pave the way to a higher level of table understanding. Recent research on table composition and table analysis has improved our understanding of the distinction between the logical and physical structures of tables, and has led to improved formalisms for modeling tables. The present study indicates that progress on half-a-dozen specific research issues would open the door to using existing paper and electronic tables for database update, tabular browsing, structured information retrieval through graphical and audio interfaces, multimedia table editing, and platform-independent display. Although tables are not a conventional format for conveying the primary content of technical papers, here we attempt to subdue our natural garrulity by adopting this genre to communicate what we have to say about tables entirely in tabular form.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lopresti, D., Nagy, G. (2000). A Tabular Survey of Automated Table Processing. In: Chhabra, A.K., Dori, D. (eds) Graphics Recognition Recent Advances. GREC 1999. Lecture Notes in Computer Science, vol 1941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40953-X_9
DOI: https://doi.org/10.1007/3-540-40953-X_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41222-9
Online ISBN: 978-3-540-40953-3
eBook Packages: Springer Book Archive