
Integrating natural language understanding with document structure analysis

Artificial Intelligence Review

Abstract

Document understanding, the interpretation of a document from its image form, is a technology area that benefits greatly from the integration of natural language processing with image processing. We have developed a prototype Intelligent Document Understanding System (IDUS) that employs several technologies cooperatively: image processing, optical character recognition (OCR), document structure analysis, and text understanding. This paper discusses the areas of research in the development of IDUS where we have found the most benefit from integrating natural language processing and image processing: document structure analysis, OCR correction, and text analysis. We also discuss two applications supported by IDUS: text retrieval and automatic generation of hypertext links.




About this article

Cite this article

Taylor, S.L., Dahl, D.A., Lipshutz, M. et al. Integrating natural language understanding with document structure analysis. Artif Intell Rev 8, 255–276 (1994). https://doi.org/10.1007/BF00849077

  • DOI: https://doi.org/10.1007/BF00849077
