Multi-level Boundary Classification for Information Extraction

Finn, Aidan; Kushmerick, Nicholas

doi:10.1007/978-3-540-30115-8_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3201)

Included in the following conference series:

European Conference on Machine Learning

Abstract

We investigate the application of classification techniques to the problem of information extraction (IE). In particular, we use support vector machines and several different feature sets to build a set of classifiers for IE. We show that this approach is competitive with current state-of-the-art IE algorithms that rely on specialized learning algorithms. We also introduce a new technique for improving the recall of our IE algorithm: a two-level ensemble of classifiers that increases the recall of the extracted fragments while maintaining high precision. We show that this approach outperforms current state-of-the-art IE algorithms on several benchmark IE tasks.
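The abstract frames IE as a set of classification problems, with a second level of classifiers that recovers fragments the first level misses. The sketch below is one possible reading of that idea, not the authors' system, and every specific in it is an assumption. Each token is classified as the first (begin) or last (end) token of a field. A plain perceptron stands in for the paper's support vector machines so that the code needs only the Python standard library. The feature set built by the hypothetical `features` function (lower-cased token, a capitalization flag, and the neighbouring tokens) is illustrative. Level-one begins and ends are paired when they lie within `max_len`, the longest field length seen in training. At level two, hypothetical classifiers trained only on tokens near a true boundary search for the missing end of an unmatched begin (and the missing begin of an unmatched end), which is where the extra recall would come from.

```python
from collections import defaultdict


def features(tokens, i):
    """Hypothetical window features for token i: the word, its shape, and its neighbours."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return {
        "bias",
        "w=" + tokens[i].lower(),
        "cap=" + str(tokens[i][:1].isupper()),
        "prev=" + prev,
        "next=" + nxt,
    }


class Perceptron:
    """Binary perceptron over sparse string features (a stand-in for an SVM)."""

    def __init__(self, epochs=50):
        self.w = defaultdict(float)
        self.epochs = epochs

    def predict(self, feats):
        return sum(self.w[f] for f in feats) > 0

    def fit(self, examples):
        for _ in range(self.epochs):
            for feats, label in examples:
                if self.predict(feats) != label:
                    # Move the weights towards positives and away from negatives.
                    step = 1.0 if label else -1.0
                    for f in feats:
                        self.w[f] += step
        return self


def train(docs):
    """docs: list of (tokens, spans); each span is an inclusive (begin, end) pair."""
    max_len = max(e - s + 1 for _, spans in docs for s, e in spans)
    b1_ex, e1_ex, b2_ex, e2_ex = [], [], [], []
    for tokens, spans in docs:
        begins = {s for s, _ in spans}
        ends = {e for _, e in spans}
        feats = [features(tokens, i) for i in range(len(tokens))]
        # Level 1: every token is a begin/end candidate.
        for i, f in enumerate(feats):
            b1_ex.append((f, i in begins))
            e1_ex.append((f, i in ends))
        # Level 2: only tokens within max_len of a true boundary.
        for s, e in spans:
            for j in range(s, min(s + max_len, len(tokens))):
                e2_ex.append((feats[j], j == e))
            for j in range(max(0, e - max_len + 1), e + 1):
                b2_ex.append((feats[j], j == s))
    return (Perceptron().fit(b1_ex), Perceptron().fit(e1_ex),
            Perceptron().fit(b2_ex), Perceptron().fit(e2_ex), max_len)


def extract(model, tokens, use_level2=True):
    """Return the sorted inclusive (begin, end) spans predicted for tokens."""
    b1, e1, b2, e2, max_len = model
    feats = [features(tokens, i) for i in range(len(tokens))]
    begins = [i for i, f in enumerate(feats) if b1.predict(f)]
    ends = [i for i, f in enumerate(feats) if e1.predict(f)]
    spans, matched_b, matched_e = set(), set(), set()
    # Level 1: pair each begin with the nearest end at most max_len - 1 tokens after it.
    for s in begins:
        for e in ends:
            if s <= e < s + max_len:
                spans.add((s, e))
                matched_b.add(s)
                matched_e.add(e)
                break
    if use_level2:
        # Level 2: search forwards for the missing end of an unmatched begin.
        for s in begins:
            if s not in matched_b:
                for j in range(s, min(s + max_len, len(tokens))):
                    if e2.predict(feats[j]):
                        spans.add((s, j))
                        break
        # Level 2: search backwards for the missing begin of an unmatched end.
        for e in ends:
            if e not in matched_e:
                for j in range(e, max(-1, e - max_len), -1):
                    if b2.predict(feats[j]):
                        spans.add((j, e))
                        break
    return sorted(spans)


docs = [
    (["Speaker", ":", "Ann", "Lee", "."], [(2, 3)]),
    (["Speaker", ":", "Bo", "Chan", "."], [(2, 3)]),
]
model = train(docs)
print(extract(model, docs[0][0]))  # [(2, 3)]
```

On these two toy documents the trained model recovers the span `(2, 3)`. If its level-one end classifier is replaced by an untrained `Perceptron()`, level one alone extracts nothing, while the level-two end classifier still recovers `(2, 3)` from the detected begin. Restricting the level-two training data to a window around each true boundary is a design choice of this sketch: it gives those classifiers a smaller, more local problem than the level-one classifiers face over the whole document.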

Similar content being viewed by others

Improving Supervised Classification Using Information Extraction

Data Mining Algorithms for Knowledge Extraction

TEES 2.2: Biomedical Event Extraction for Diverse Corpora

Article Open access 30 October 2015

Author information

Authors and Affiliations

Smart Media Institute, Computer Science Department, University College Dublin, Ireland
Aidan Finn & Nicholas Kushmerick

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Dipartimento di Informatica, Università degli Studi di Bari, Italy
Floriana Esposito
Pisa KDD Laboratory, ISTI - CNR, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Fosca Giannotti
Dipartimento di Informatica, Via F. Buonarroti 2, 56127, Pisa, Italy
Dino Pedreschi

About this paper

Cite this paper

Finn, A., Kushmerick, N. (2004). Multi-level Boundary Classification for Information Extraction. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. Lecture Notes in Computer Science, vol 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_13

DOI: https://doi.org/10.1007/978-3-540-30115-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8
eBook Packages: Springer Book Archive
