Article

Learning to extract symbolic knowledge from the World Wide Web

Authors:

Seán Slattery

AAAI '98/IAAI '98: Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence

Pages 509 - 516

Published: 01 July 1998

Abstract

The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer-understandable, world-wide knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs: an ontology defining the classes and relations of interest, and a set of training data consisting of labeled regions of hypertext representing instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This paper describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system.
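The abstract does not say which learning algorithms are used. To illustrate only the class-extraction half of the task (deciding which ontology class a whole page is an instance of), here is a minimal sketch assuming a multinomial naive Bayes bag-of-words classifier with add-one smoothing. This is a swapped-in illustrative technique, not necessarily the paper's method; the `Ontology` and `NaiveBayesPageClassifier` names, the example classes, and the toy training pages are all hypothetical. Learning relations (for example along hyperlinks) is not sketched.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Ontology:
    # Hypothetical representation of the ontology input: class names, plus
    # binary relations mapped to (domain class, range class). Relations are
    # carried along but not used by this page classifier.
    classes: list[str]
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)


def tokenize(text: str) -> list[str]:
    # Bag-of-words: lowercase alphabetic tokens; word order and markup are ignored.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesPageClassifier:
    """Multinomial naive Bayes over page words (an illustrative assumption)."""

    def __init__(self, ontology: Ontology):
        self.classes = list(ontology.classes)
        self.word_counts = {c: Counter() for c in self.classes}
        self.doc_counts = Counter()
        self.vocab: set[str] = set()

    def fit(self, pages):
        # pages: iterable of (page_text, class_label) pairs, i.e. labeled training data.
        for text, label in pages:
            if label not in self.word_counts:
                raise ValueError(f"label {label!r} is not a class in the ontology")
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def log_scores(self, text: str) -> dict[str, float]:
        # Unnormalized log posterior for each class; words unseen in training are dropped.
        words = [w for w in tokenize(text) if w in self.vocab]
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            # Add-one smoothed prior, so a class with no training pages is not -inf.
            score = math.log((self.doc_counts[c] + 1) / (n_docs + len(self.classes)))
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in words:
                # Add-one (Laplace) smoothed word likelihood P(w | c).
                score += math.log((self.word_counts[c][w] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    onto = Ontology(
        classes=["student", "faculty", "course"],
        relations={"instructor_of": ("faculty", "course")},
    )
    train = [
        ("my advisor and my thesis research", "student"),
        ("professor office hours publications", "faculty"),
        ("syllabus homework lecture exam", "course"),
    ]
    clf = NaiveBayesPageClassifier(onto).fit(train)
    print(clf.predict("homework due before the exam"))  # prints: course
```

Scoring in log space avoids floating-point underflow on long pages, and the add-one smoothing keeps a single unseen class/word pair from driving a class's probability to zero.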

References

[1]

Cestnik, B. 1990. Estimating probabilities: A crucial task in machine learning. In Aiello, L., ed., Proc. of the 9th European Conf. on Artificial Intelligence.

[2]

Craven, M.; DiPasquo, D.; Freitag, D.; McCallum, A.; Mitchell, T.; Nigam, K.; and Slattery, S. 1998. Learning to extract symbolic knowledge from the World Wide Web. Technical report, CMU CS Dept.

[3]

Lewis, D.; Schapire, R. E.; Callan, J. P.; and Papka, R. 1996. Training algorithms for linear text classifiers. In Proc. of the 19th Annual Int. ACM SIGIR Conf.

[4]

Quinlan, J. R., and Cameron-Jones, R. M. 1993. FOIL: A midterm report. In Proc. of the 12th European Conf. on Machine Learning.

[5]

Richards, B. L., and Mooney, R. J. 1992. Learning relations by pathfinding. In Proc. of the 10th National Conf. on Artificial Intelligence.

[6]

Shakes, J.; Langheinrich, M.; and Etzioni, O. 1996. Dynamic reference sifting: A case study in the homepage domain. In Proc. of the 6th Int. World Wide Web Conf.

[7]

Soderland, S. 1996. Learning Text Analysis Rules for Domain-specific Natural Language Processing . Ph.D. Dissertation, University of Massachusetts. Department of Computer Science Technical Report 96-087.

[8]

Spertus, E. 1997. ParaSite: Mining structural information on the Web. In Proc. of the 6th Int. World Wide Web Conf.

Cited By

Sherkat E, Milios E, Minghim R (2019). A Visual Analytics Approach for Interactive Document Clustering. ACM Transactions on Interactive Intelligent Systems 10:1 (1-33). DOI: 10.1145/3241380. Online publication date: 9-Aug-2019
https://dl.acm.org/doi/10.1145/3241380
Wang C, Chi C, She Z, Cao L, Stantic B (2018). Coupled Clustering Ensemble by Exploring Data Interdependence. ACM Transactions on Knowledge Discovery from Data 12:6 (1-38). DOI: 10.1145/3230967. Online publication date: 28-Aug-2018
https://dl.acm.org/doi/10.1145/3230967
Mitchell T, Cohen W, Hruschka E, Talukdar P, Yang B, Betteridge J, Carlson A, Dalvi B, Gardner M, Kisiel B, Krishnamurthy J, Lao N, Mazaitis K, Mohamed T, Nakashole N, Platanios E, Ritter A, Samadi M, Settles B, Wang R, Wijaya D, Gupta A, Chen X, Saparov A, Greaves M, Welling J (2018). Never-ending learning. Communications of the ACM 61:5 (103-115). DOI: 10.1145/3191513. Online publication date: 24-Apr-2018
https://dl.acm.org/doi/10.1145/3191513

Recommendations

World wide web site summarization

Summaries of Web sites help Web users get an idea of the site contents without having to spend time browsing the sites. Currently, manually constructed summaries of Web sites by volunteer experts are available, such as the DMOZ Open Directory Project. ...
World Wide Web Bible
The world wide telecom web browser
WWW '08: Proceedings of the 17th international conference on World Wide Web

As the number of telephony voice applications grows, there will be a need for a browser to surf the Web of interconnected voice applications (called VoiceSites). These VoiceSites are accessed through a telephone over an audio channel. We present the ...

Information & Contributors

Information

Published In

AAAI '98/IAAI '98: Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence

July 1998

1218 pages

ISBN:0262510987

Chairmen:
Jack Mostow, Carnegie Mellon Univ., Pittsburgh, PA
Charles Rich, Mitsubishi Electric Research Lab
Bruce Buchanan, Mitsubishi Electric Research Lab

Publisher

American Association for Artificial Intelligence

United States

Publication History

Published: 01 July 1998

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 124
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

Citations

Etzioni O (2018). Technical perspective: Breaking the mold of machine learning. Communications of the ACM 61:5 (102-102). DOI: 10.1145/3191511. Online publication date: 24-Apr-2018
https://dl.acm.org/doi/10.1145/3191511
Xu Q, Wang Q, Xu C, Qu L, Lim E, Winslett M, Sanderson M, Fu A, Sun J, Culpepper S, Lo E, Ho J, Donato D, Agrawal R, Zheng Y, Castillo C, Sun A, Tseng V, Li C (2017). Attentive Graph-based Recursive Neural Network for Collective Vertex Classification. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (2403-2406). DOI: 10.1145/3132847.3133081. Online publication date: 6-Nov-2017
https://dl.acm.org/doi/10.1145/3132847.3133081
Zhao L, Huang M, Yao Z, Su R, Jiang Y, Zhu X (2016). Semi-supervised Multinomial Naive Bayes for text classification by leveraging word-level statistical constraint. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2877-2883). DOI: 10.5555/3016100.3016304. Online publication date: 12-Feb-2016
https://dl.acm.org/doi/10.5555/3016100.3016304
Reagen B, Whatmough P, Adolf R, Rama S, Lee H, Lee S, Hernández-Lobato J, Wei G, Brooks D (2016). Minerva. ACM SIGARCH Computer Architecture News 44:3 (267-278). DOI: 10.1145/3007787.3001165. Online publication date: 18-Jun-2016
https://dl.acm.org/doi/10.1145/3007787.3001165
Reagen B, Whatmough P, Adolf R, Rama S, Lee H, Lee S, Hernández-Lobato J, Wei G, Brooks D, Min S, Loh G (2016). Minerva. Proceedings of the 43rd International Symposium on Computer Architecture (267-278). DOI: 10.1109/ISCA.2016.32. Online publication date: 18-Jun-2016
https://dl.acm.org/doi/10.1109/ISCA.2016.32
Cheng X, Miao D, Wang C (2015). A link-based approach to semantic relation analysis. Neurocomputing 154:C (127-138). DOI: 10.1016/j.neucom.2014.12.011. Online publication date: 22-Apr-2015
https://dl.acm.org/doi/10.1016/j.neucom.2014.12.011
Ke W (2015). Information-theoretic term weighting schemes for document clustering and classification. International Journal on Digital Libraries 16:2 (145-159). DOI: 10.1007/s00799-014-0121-3. Online publication date: 1-Jun-2015
https://dl.acm.org/doi/10.1007/s00799-014-0121-3