DOI: 10.1145/2661829.2662073
research-article

Canonicalizing Open Knowledge Bases

Published: 03 November 2014

Abstract

Open information extraction approaches have led to the creation of large knowledge bases from the Web. The problem with such methods is that their entities and relations are not canonicalized, leading to redundant and ambiguous facts. For example, they may store {Barack Obama, was born, Honolulu} and {Obama, place of birth, Honolulu}. In this paper, we present an approach based on machine learning methods that can canonicalize such Open IE triples, by clustering synonymous names and phrases.
We also provide a detailed discussion about the different signals, features and design choices that influence the quality of synonym resolution for noun phrases in Open IE KBs, thus shedding light on the middle ground between "open" and "closed" information extraction systems.
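
To make the idea of canonicalizing triples by clustering concrete, here is a minimal sketch in Python. It is not the method of the paper: it assumes a plain token-overlap (Jaccard) similarity, a hypothetical threshold of 0.5, and single-link clustering, and it only canonicalizes subject noun phrases. The function names (token_jaccard, cluster_phrases, canonicalize) are illustrative, not taken from the source.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercased tokens (an assumed, simplistic signal)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_phrases(phrases, threshold=0.5):
    """Single-link clustering: merge any two phrases whose similarity
    reaches the threshold, tracked with a small union-find structure."""
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(sorted(parent), 2):
        if token_jaccard(a, b) >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    clusters = {}
    for p in parent:
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())


def canonicalize(triples, threshold=0.5):
    """Rewrite each subject to a cluster representative.
    The longest mention is used as representative (an assumption)."""
    subjects = {s for s, _, _ in triples}
    canon = {}
    for cluster in cluster_phrases(subjects, threshold):
        rep = max(cluster, key=lambda p: (len(p), p))
        for p in cluster:
            canon[p] = rep
    return [(canon[s], r, o) for s, r, o in triples]


# The first two triples come from the abstract; the third is made up.
triples = [
    ("Barack Obama", "was born", "Honolulu"),
    ("Obama", "place of birth", "Honolulu"),
    ("Angela Merkel", "was born", "Hamburg"),
]
result = canonicalize(triples)
for t in result:
    print(t)
```

In this sketch "Obama" and "Barack Obama" share one of two distinct tokens, so they fall into the same cluster and both triples end up with the same subject. The relation phrases "was born" and "place of birth" share no tokens, so a token-overlap signal alone would not merge them; clustering synonymous relation phrases needs other evidence and is left out here.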

    Published In

    CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
    November 2014
    2152 pages
    ISBN:9781450325981
    DOI:10.1145/2661829
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entities
    2. knowledge bases
    3. open information extraction

    Qualifiers

    • Research-article

    Conference

    CIKM '14
    Acceptance Rates

    CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 55
    • Downloads (Last 6 weeks): 5
    Reflects downloads up to 01 Nov 2024

    Cited By

    • (2024) Large Language Models Enable Few-Shot Clustering. Transactions of the Association for Computational Linguistics 12 (321-333). DOI: 10.1162/tacl_a_00648. Online publication date: 5-Apr-2024
    • (2024) Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning. Proceedings of the ACM Web Conference 2024 (2304-2314). DOI: 10.1145/3589334.3645700. Online publication date: 13-May-2024
    • (2024) Open knowledge base canonicalization with multi-task learning. World Wide Web 27:5. DOI: 10.1007/s11280-024-01288-x. Online publication date: 18-Jul-2024
    • (2024) Incorporating topic and property for knowledge base synchronization. Knowledge and Information Systems 66:10 (6241-6268). DOI: 10.1007/s10115-024-02160-0. Online publication date: 13-Jun-2024
    • (2023) OERL: Enhanced Representation Learning via Open Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering 35:9 (8880-8892). DOI: 10.1109/TKDE.2022.3218850. Online publication date: 1-Sep-2023
    • (2023) Refined Commonsense Knowledge From Large-Scale Web Contents. IEEE Transactions on Knowledge and Data Engineering 35:8 (8431-8447). DOI: 10.1109/TKDE.2022.3206505. Online publication date: 1-Aug-2023
    • (2023) Variational autoencoder densified graph attention for fusing synonymous entities. Knowledge-Based Systems 259:C. DOI: 10.1016/j.knosys.2022.110061. Online publication date: 10-Jan-2023
    • (2023) Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation. The Semantic Web – ISWC 2023 (368-387). DOI: 10.1007/978-3-031-47240-4_20. Online publication date: 6-Nov-2023
    • (2022) An assertion and alignment correction framework for large scale knowledge bases. Semantic Web 14:1 (29-53). DOI: 10.3233/SW-210448. Online publication date: 30-Nov-2022
    • (2022) RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine. BMC Bioinformatics 23:1. DOI: 10.1186/s12859-022-04932-3. Online publication date: 29-Sep-2022
