DOI: 10.1145/2396761.2396832
research-article

KORE: keyphrase overlap relatedness for entity disambiguation

Published: 29 October 2012

Abstract

Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.
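
To make the idea of relatedness over weighted, partially overlapping keyphrases concrete, here is a minimal sketch. It is not the paper's formula: the abstract does not give one, so the scoring rule below (word-level Jaccard overlap between phrases, best-match aggregation weighted by phrase weight, averaged over both directions) is an assumption chosen for illustration. The names `phrase_overlap`, `directed_score`, and `relatedness`, as well as the sample entities and weights, are hypothetical.

```python
from typing import Dict, Tuple

Phrase = Tuple[str, ...]      # a keyphrase as a tuple of lowercase words
Entity = Dict[Phrase, float]  # keyphrase -> non-negative weight


def phrase_overlap(p: Phrase, q: Phrase) -> float:
    """Jaccard overlap of the word sets of two phrases, in [0, 1]."""
    words_p, words_q = set(p), set(q)
    union = words_p | words_q
    if not union:
        return 0.0
    return len(words_p & words_q) / len(union)


def directed_score(source: Entity, target: Entity) -> float:
    """Weighted average, over source phrases, of the best partial match in target."""
    total_weight = sum(source.values())
    if total_weight == 0 or not target:
        return 0.0
    matched = 0.0
    for p, weight in source.items():
        # Partial overlap counts: a phrase need not match exactly to contribute.
        best = max(phrase_overlap(p, q) for q in target)
        matched += weight * best
    return matched / total_weight


def relatedness(e: Entity, f: Entity) -> float:
    """Symmetric score in [0, 1]: mean of the two directed scores (assumed rule)."""
    return 0.5 * (directed_score(e, f) + directed_score(f, e))


# Hypothetical entities described by weighted keyphrases.
guitarist: Entity = {("english", "rock", "guitarist"): 2.0, ("hard", "rock"): 1.0}
singer: Entity = {("rock", "guitarist"): 1.5, ("blues", "singer"): 1.0}

print(round(relatedness(guitarist, singer), 3))
```

Because phrases are compared by their words rather than as atomic strings, "english rock guitarist" and "rock guitarist" still contribute to the score even though they are not identical, and no links between the two entities are needed.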

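The efficiency techniques named in the abstract, min-hash sketches and locality-sensitive hashing, can be illustrated in their generic textbook form. The sketch below estimates the Jaccard similarity of two keyphrase sets from fixed-size min-hash signatures and then uses LSH banding to find candidate pairs without comparing every pair of entities. How the paper applies these techniques to its own measure is not stated in the abstract, so treating each entity as a plain set of keyphrase strings, the seeded SHA-1 hash family, and all function and entity names here are assumptions.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family (SHA-1 truncated to 64 bits)."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_sketch(items: Iterable[str], num_hashes: int = 64) -> List[int]:
    """Signature: for each hash function, the minimum hash value over the set."""
    item_list = list(items)
    if not item_list:
        raise ValueError("cannot sketch an empty set")
    return [min(_hash(seed, x) for x in item_list) for seed in range(num_hashes)]


def estimate_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def candidate_pairs(sketches: Dict[str, List[int]], bands: int) -> Set[Tuple[str, str]]:
    """LSH banding: two entities are candidates if any band of their sketches is equal."""
    buckets: Dict[Tuple[int, Tuple[int, ...]], Set[str]] = defaultdict(set)
    for name, sketch in sketches.items():
        if len(sketch) % bands != 0:
            raise ValueError("sketch length must be divisible by the number of bands")
        rows = len(sketch) // bands
        for b in range(bands):
            buckets[(b, tuple(sketch[b * rows:(b + 1) * rows]))].add(name)
    pairs: Set[Tuple[str, str]] = set()
    for names in buckets.values():
        ordered = sorted(names)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


# Hypothetical entities, each reduced to a set of keyphrase strings.
entities = {
    "entity_a": {"rock guitarist", "hard rock", "english musician"},
    "entity_b": {"rock guitarist", "hard rock", "blues singer"},
    "entity_c": {"prime minister", "general election"},
}
sketches = {name: minhash_sketch(phrases) for name, phrases in entities.items()}
print(estimate_jaccard(sketches["entity_a"], sketches["entity_b"]))
print(candidate_pairs(sketches, bands=16))
```

More bands with fewer rows each make the filter more permissive (more candidate pairs, fewer misses); fewer bands with more rows make it stricter. Only the surviving candidate pairs would then need an exact relatedness computation.
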

    Published In

    CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
    October 2012
    2840 pages
    ISBN: 9781450311564
    DOI: 10.1145/2396761
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity disambiguation
    2. entity relatedness
    3. locality-sensitive hashing
    4. semantic relatedness

    Qualifiers

    • Research-article

    Conference

    CIKM '12

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 20
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 03 Feb 2025

    Cited By

    • (2024) The RDF2vec family of knowledge graph embedding methods. Semantic Web (1-32). DOI: 10.3233/SW-233514. Online publication date: 25-Jan-2024
    • (2024) Doc‐KG: Unstructured documents to knowledge graph construction, identification and validation with Wikidata. Expert Systems. DOI: 10.1111/exsy.13617. Online publication date: 8-May-2024
    • (2024) SRSCL: A strong-relatedness-sequence-based fine-grained collective entity linking method for heterogeneous information networks. Expert Systems with Applications 238 (121759). DOI: 10.1016/j.eswa.2023.121759. Online publication date: Mar-2024
    • (2024) Entity linking for English and other languages: a survey. Knowledge and Information Systems 66:7 (3773-3824). DOI: 10.1007/s10115-023-02059-2. Online publication date: 2-Apr-2024
    • (2024) Knowledge Graphs for Enhancing Large Language Models in Entity Disambiguation. The Semantic Web – ISWC 2024 (162-179). DOI: 10.1007/978-3-031-77844-5_9. Online publication date: 27-Nov-2024
    • (2024) MESS: Coarse-Grained Modular Two-Way Dialogue Entity Linking Framework. Machine Learning and Knowledge Discovery in Databases. Research Track (248-263). DOI: 10.1007/978-3-031-70341-6_15. Online publication date: 22-Aug-2024
    • (2023) Social world knowledge: Modeling and applications. PLOS ONE 18:7 (e0283700). DOI: 10.1371/journal.pone.0283700. Online publication date: 7-Jul-2023
    • (2023) Efficient Approximate Nearest Neighbor Search in Multi-dimensional Databases. Proceedings of the ACM on Management of Data 1:1 (1-27). DOI: 10.1145/3588908. Online publication date: 30-May-2023
    • (2023) A simple semantic ranking approach for entity linking. Third International Seminar on Artificial Intelligence, Networking, and Information Technology (AINIT 2022) (43). DOI: 10.1117/12.2667421. Online publication date: 22-Feb-2023
    • (2023) RDF-star2Vec: RDF-star Graph Embeddings for Data Mining. IEEE Access 11 (142030-142042). DOI: 10.1109/ACCESS.2023.3341029. Online publication date: 2023
