DOI: 10.1145/1458082.1458150
research-article

Learning to link with Wikipedia

Published: 26 October 2008

Abstract

This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text, and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or "real world" documents.
This work has implications far beyond enriching documents with explanatory links. It can provide structured knowledge about any unstructured fragment of text. Any task that is currently addressed with bags of words (indexing, clustering, retrieval, and summarization, to name a few) could use the techniques described here to draw on a vast network of concepts and semantics.
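
The precision and recall figures quoted in the abstract can be made concrete with a small sketch. It assumes that links are scored as exact (anchor text, article title) pairs against a gold standard; the abstract does not spell out the paper's evaluation protocol, and the function name and sample data below are hypothetical.

```python
def precision_recall(predicted, gold):
    """Score predicted links against gold-standard links.

    Both arguments are iterables of (anchor text, article title) pairs.
    Exact-pair matching is an assumption made for this illustration.
    """
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold
    # Precision: share of proposed links that are in the gold standard.
    precision = len(correct) / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard links that were proposed.
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: links a human editor made vs. links a detector proposed.
gold_links = {
    ("tree", "Tree (data structure)"),
    ("Java", "Java (programming language)"),
    ("heap", "Heap (data structure)"),
    ("graph", "Graph (abstract data type)"),
}
predicted_links = {
    ("tree", "Tree (data structure)"),
    ("Java", "Java (programming language)"),
    ("heap", "Heap (data structure)"),
    ("Python", "Python (programming language)"),
}

precision, recall = precision_recall(predicted_links, gold_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this matching rule, a term linked to the wrong article counts against both measures: it is a spurious prediction and a missed gold link.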


Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN:9781595939913
DOI:10.1145/1458082

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data mining
  2. semantic annotation
  3. Wikipedia
  4. word sense disambiguation

Qualifiers

  • Research-article

Conference

CIKM '08: Conference on Information and Knowledge Management
October 26-30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
