Closing the gap: learning-based information extraction rivaling knowledge-engineering methods

Authors:

Hai Leong Chieu,

Yoong Keok Lee

ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Pages 216 - 223

https://doi.org/10.3115/1075096.1075124

Published: 07 July 2003

Abstract

In this paper, we present a learning approach to the scenario template task of information extraction, where the information filling one template could come from multiple sentences. When tested on the MUC-4 task, our learning approach achieves accuracy competitive with the best of the MUC-4 systems, which were all built with manually engineered rules. Our analysis reveals that our use of full parsing and state-of-the-art learning algorithms has contributed to the good performance. To our knowledge, this is the first research to demonstrate that a learning approach to the full-scale information extraction task can achieve performance rivaling that of the knowledge engineering approach.

Published In

ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

July 2003

571 pages

Program Chairs:
Erhard W. Hinrichs,
Dan Roth

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
