Abstract
Document understanding, the interpretation of a document from its image form, is a technology area that benefits greatly from the integration of natural language processing with image processing. We have developed a prototype Intelligent Document Understanding System (IDUS) in which several technologies work cooperatively: image processing, optical character recognition (OCR), document structure analysis, and text understanding. This paper discusses the areas of research in the development of IDUS where we have found the most benefit from integrating natural language processing with image processing: document structure analysis, OCR correction, and text analysis. We also discuss two applications supported by IDUS: text retrieval and automatic generation of hypertext links.
Taylor, S.L., Dahl, D.A., Lipshutz, M. et al. Integrating natural language understanding with document structure analysis. Artif Intell Rev 8, 255–276 (1994). https://doi.org/10.1007/BF00849077
DOI: https://doi.org/10.1007/BF00849077