Adaptive Information Extraction: Core Technologies for Information Agents

Kushmerick, Nicholas; Thomas, Bernd

doi:10.1007/3-540-36561-3_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2586)

Abstract

For the purposes of this chapter, an information agent can be described as a distributed system that receives a goal through its user interface, gathers information relevant to this goal from a variety of sources, processes this content as appropriate, and delivers the results to the user. We focus on the second stage of this generic architecture, information gathering, and survey a variety of information extraction techniques that enable information agents to automatically gather information from heterogeneous sources.

Author information

Authors and Affiliations

Computer Science Department, University College Dublin, Dublin
Nicholas Kushmerick
Institut für Informatik, Universität Koblenz-Landau, Koblenz-Landau
Bernd Thomas

Editor information

Editors and Affiliations

German Research Center for Artificial Intelligence, Multiagent Systems Group, Stuhlsatzenhausweg 3, 66123, Saarbrücken, Germany
Matthias Klusch
Dipartimento di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia, Via Vignolese 905, 41100, Modena, Italy
Sonia Bergamaschi
Department of Computing Science, University of Aberdeen, King’s College, AB24 5UE, Aberdeen, UK
Pete Edwards
Austrian Research Institute for Artificial Intelligence, Schottengasse 3, 1010, Vienna, Austria
Paolo Petta

About this chapter

Cite this chapter

Kushmerick, N., Thomas, B. (2003). Adaptive Information Extraction: Core Technologies for Information Agents. In: Klusch, M., Bergamaschi, S., Edwards, P., Petta, P. (eds) Intelligent Information Agents. Lecture Notes in Computer Science, vol 2586. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36561-3_4

DOI: https://doi.org/10.1007/3-540-36561-3_4
Published: 14 March 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00759-3
Online ISBN: 978-3-540-36561-7
eBook Packages: Springer Book Archive
