DOI: 10.3115/1075096.1075124
Article

Closing the gap: learning-based information extraction rivaling knowledge-engineering methods

Published: 07 July 2003

Abstract

In this paper, we present a learning approach to the scenario template task of information extraction, in which the information filling a single template may come from multiple sentences. When tested on the MUC-4 task, our learning approach achieves accuracy competitive with the best of the MUC-4 systems, all of which were built with manually engineered rules. Our analysis shows that our use of full parsing and state-of-the-art learning algorithms has contributed to the good performance. To our knowledge, this is the first research to demonstrate that a learning approach to the full-scale information extraction task can achieve performance rivaling that of the knowledge-engineering approach.
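The abstract notes that the fills for one scenario template may be spread across several sentences. The Python sketch below illustrates only that pooling step, under stated assumptions: each sentence yields candidate slot fills with a classifier score, and the candidates are merged into a single template. The slot names, the `Candidate` and `Template` types, the `fill_template` function and the 0.5 threshold are all hypothetical; this is not the paper's system, which the abstract describes as relying on full parsing and learned classifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical slot inventory, loosely modelled on MUC-4 string slots;
# not necessarily the exact set evaluated in the paper.
SLOTS = ("perp_individual", "perp_organization", "physical_target", "victim", "instrument")


@dataclass
class Candidate:
    """A candidate slot fill found in one sentence (hypothetical structure)."""
    sentence_id: int
    slot: str
    text: str
    score: float  # assumed classifier confidence in [0, 1]


@dataclass
class Template:
    """One scenario template; each slot may hold several fills."""
    fills: Dict[str, List[str]] = field(default_factory=lambda: {s: [] for s in SLOTS})
    sources: Dict[str, List[int]] = field(default_factory=lambda: {s: [] for s in SLOTS})


def fill_template(candidates: List[Candidate], threshold: float = 0.5) -> Template:
    """Merge candidates from any number of sentences into a single template.

    Candidates are visited from highest to lowest score. A candidate is kept
    if its slot is known, its score meets the (assumed) threshold, and the same
    string has not already been added to that slot. The sentence each fill
    came from is recorded in `sources`.
    """
    template = Template()
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        if cand.slot not in SLOTS or cand.score < threshold:
            continue
        if cand.text not in template.fills[cand.slot]:
            template.fills[cand.slot].append(cand.text)
            template.sources[cand.slot].append(cand.sentence_id)
    return template


if __name__ == "__main__":
    # Fills for one incident drawn from three different sentences.
    demo = [
        Candidate(0, "victim", "three villagers", 0.9),
        Candidate(2, "perp_organization", "the guerrillas", 0.8),
        Candidate(3, "instrument", "a bomb", 0.3),
    ]
    print(fill_template(demo).fills)
```

Keeping every above-threshold candidate, rather than one fill per slot, reflects that a slot such as the victim slot can legitimately list several entities; a real system would also need to decide how many templates a document contains, which this sketch does not attempt.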




      Published In

      ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
      July 2003
      571 pages

      Publisher

      Association for Computational Linguistics

      United States


      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate 85 of 443 submissions, 19%


      Article Metrics

      • Downloads (last 12 months): 62
      • Downloads (last 6 weeks): 4
      Reflects downloads up to 25 Dec 2024.


      Cited By

      • (2018) Open-Schema Event Profiling for Massive News Corpora. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 587-596. DOI: 10.1145/3269206.3271674. Online publication date: 17-Oct-2018.
      • (2012) An ontology-based information extraction approach for résumés. Proceedings of the 2012 international conference on Pervasive Computing and the Networked World, pp. 165-179. DOI: 10.1007/978-3-642-37015-1_14. Online publication date: 28-Nov-2012.
      • (2011) Template-based information extraction without the templates. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pp. 976-986. DOI: 10.5555/2002472.2002595. Online publication date: 19-Jun-2011.
      • (2010) Extracting sequences from the web. Proceedings of the ACL 2010 Conference Short Papers, pp. 286-290. DOI: 10.5555/1858842.1858895. Online publication date: 11-Jul-2010.
      • (2009) A unified model of phrasal and sentential evidence for information extraction. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1, pp. 151-160. DOI: 10.5555/1699510.1699530. Online publication date: 6-Aug-2009.
      • (2007) Fuzzy pattern rule induction for information extraction. Proceedings of the 2nd international conference on Advances in computation and intelligence, pp. 641-651. DOI: 10.5555/1777606.1777686. Online publication date: 21-Sep-2007.
      • (2006) Learning domain-specific information extraction patterns from the Web. Proceedings of the Workshop on Information Extraction Beyond The Document, pp. 66-73. DOI: 10.5555/1641408.1641416. Online publication date: 22-Jul-2006.
      • (2006) Adaptive information extraction. ACM Computing Surveys (CSUR) 38:2, article 4-es. DOI: 10.1145/1132956.1132957. Online publication date: 25-Jul-2006.
      • (2005) Mining information extraction rules from datasheets without linguistic parsing. Proceedings of the 18th international conference on Innovations in Applied Artificial Intelligence, pp. 510-520. DOI: 10.1007/11504894_69. Online publication date: 22-Jun-2005.
      • (2004) Discriminative slot detection using kernel methods. Proceedings of the 20th international conference on Computational Linguistics, article 757-es. DOI: 10.3115/1220355.1220464. Online publication date: 23-Aug-2004.