10.5555/295240.295725
Article

Learning to extract symbolic knowledge from the World Wide Web

Published: 01 July 1998

Abstract

The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer-understandable, world-wide knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs: an ontology defining the classes and relations of interest, and a set of training data consisting of labeled regions of hypertext representing instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This paper describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system.
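To make the two inputs concrete, here is a minimal Python sketch, assuming the page-classification part of the task is framed as ordinary text classification. The abstract does not name its learning algorithms, so the choice of a multinomial naive Bayes classifier with add-alpha smoothing is an illustrative assumption, as are all names (`Ontology`, `LabeledPage`, `train`, `classify`) and the example classes. The sketch only learns to assign an ontology class to a whole page; relations are carried in the ontology object but not learned.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical stand-in for the ontology input: classes plus binary relations."""
    classes: list[str]
    # relation name -> (domain class, range class); not used by the learner below
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)


@dataclass
class LabeledPage:
    """Hypothetical training example: a page's words and the class it instantiates."""
    words: list[str]
    label: str


def train(ontology, pages, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing over page words."""
    vocab = {w for p in pages for w in p.words}
    class_counts = Counter(p.label for p in pages)
    word_counts = {c: Counter() for c in ontology.classes}
    for p in pages:
        if p.label not in word_counts:
            raise ValueError(f"label {p.label!r} is not a class in the ontology")
        word_counts[p.label].update(p.words)

    n, k = len(pages), len(ontology.classes)
    model = {}
    for c in ontology.classes:
        # Smoothed log prior, so a class with no training pages keeps nonzero mass
        prior = math.log((class_counts[c] + alpha) / (n + alpha * k))
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihood = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
        model[c] = (prior, likelihood)
    return model


def classify(model, words):
    """Return the class with the highest posterior log-probability for a new page."""
    def score(c):
        prior, likelihood = model[c]
        # Words never seen during training are ignored
        return prior + sum(likelihood[w] for w in words if w in likelihood)
    return max(model, key=score)


if __name__ == "__main__":
    onto = Ontology(classes=["student", "faculty", "course"],
                    relations={"instructor_of": ("faculty", "course")})
    pages = [
        LabeledPage("graduate student advisor thesis".split(), "student"),
        LabeledPage("professor research group publications".split(), "faculty"),
        LabeledPage("syllabus lecture homework exam".split(), "course"),
    ]
    model = train(onto, pages)
    print(classify(model, "homework due lecture".split()))  # course
```

Naive Bayes is used here only because it is a compact, well-known text classifier; extracting relations that span several pages and hyperlinks would call for a different kind of learner than this page-level sketch.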



Information

Published In

Guide Proceedings
AAAI '98/IAAI '98: Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
July 1998
1218 pages
ISBN:0262510987

Publisher

American Association for Artificial Intelligence

United States

Qualifiers

  • Article


Cited By

  • (2019) A Visual Analytics Approach for Interactive Document Clustering. ACM Transactions on Interactive Intelligent Systems 10:1 (1-33), 10.1145/3241380. Online publication date: 9-Aug-2019.
  • (2018) Coupled Clustering Ensemble by Exploring Data Interdependence. ACM Transactions on Knowledge Discovery from Data 12:6 (1-38), 10.1145/3230967. Online publication date: 28-Aug-2018.
  • (2018) Never-ending learning. Communications of the ACM 61:5 (103-115), 10.1145/3191513. Online publication date: 24-Apr-2018.
  • (2018) Technical perspective: Breaking the mold of machine learning. Communications of the ACM 61:5 (102-102), 10.1145/3191511. Online publication date: 24-Apr-2018.
  • (2017) Attentive Graph-based Recursive Neural Network for Collective Vertex Classification. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2403-2406), 10.1145/3132847.3133081. Online publication date: 6-Nov-2017.
  • (2016) Semi-supervised Multinomial Naive Bayes for text classification by leveraging word-level statistical constraint. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2877-2883), 10.5555/3016100.3016304. Online publication date: 12-Feb-2016.
  • (2016) Minerva. ACM SIGARCH Computer Architecture News 44:3 (267-278), 10.1145/3007787.3001165. Online publication date: 18-Jun-2016.
  • (2016) Minerva. Proceedings of the 43rd International Symposium on Computer Architecture (267-278), 10.1109/ISCA.2016.32. Online publication date: 18-Jun-2016.
  • (2015) A link-based approach to semantic relation analysis. Neurocomputing 154:C (127-138), 10.1016/j.neucom.2014.12.011. Online publication date: 22-Apr-2015.
  • (2015) Information-theoretic term weighting schemes for document clustering and classification. International Journal on Digital Libraries 16:2 (145-159), 10.1007/s00799-014-0121-3. Online publication date: 1-Jun-2015.