research-article

Canonicalizing Open Knowledge Bases

Authors:

Luis Galárraga,

Fabian M. Suchanek

CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management

Pages 1679 - 1688

https://doi.org/10.1145/2661829.2662073

Published: 03 November 2014

Abstract

Open information extraction approaches have led to the creation of large knowledge bases from the Web. The problem with such methods is that their entities and relations are not canonicalized, leading to redundant and ambiguous facts. For example, they may store {Barack Obama, was born, Honolulu} and {Obama, place of birth, Honolulu}. In this paper, we present an approach based on machine learning methods that can canonicalize such Open IE triples by clustering synonymous names and phrases.
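The abstract does not spell out the clustering procedure. As a rough illustration only, and not the authors' method, the following Python sketch groups subject names by token-set Jaccard similarity using single-linkage clustering (tracked with union-find) and then rewrites each triple's subject to one representative name. The function names `jaccard`, `cluster_phrases`, and `canonicalize`, the 0.5 threshold, and the choice of the longest name as the representative are all assumptions made for this example; relation phrases are left untouched.

```python
from itertools import combinations


def tokens(phrase):
    """Lowercased whitespace tokens of a noun phrase."""
    return set(phrase.lower().split())


def jaccard(a, b):
    """Token-set Jaccard similarity between two phrases (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_phrases(phrases, threshold=0.5):
    """Single-linkage clustering: any pair at or above the threshold
    ends up in the same cluster. Union-find tracks cluster membership."""
    parent = {p: p for p in phrases}

    def find(p):
        # Walk up to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)
    return list(clusters.values())


def canonicalize(triples, threshold=0.5):
    """Rewrite each subject to its cluster's representative
    (here, arbitrarily, the longest name in the cluster)."""
    subjects = sorted({s for s, _, _ in triples})
    canonical = {}
    for cluster in cluster_phrases(subjects, threshold):
        representative = max(cluster, key=len)
        for name in cluster:
            canonical[name] = representative
    return [(canonical[s], r, o) for s, r, o in triples]


triples = [
    ("Barack Obama", "was born", "Honolulu"),
    ("Obama", "place of birth", "Honolulu"),
    ("Honolulu", "is located in", "Hawaii"),
]
print(canonicalize(triples))
# [('Barack Obama', 'was born', 'Honolulu'), ('Barack Obama', 'place of birth', 'Honolulu'), ('Honolulu', 'is located in', 'Hawaii')]
```

Token overlap alone is a weak signal: `jaccard("Michelle Obama", "Obama")` is also 0.5, so with this threshold a triple about Michelle Obama would be chained into the same cluster as Barack Obama. This kind of ambiguity is why a practical system needs to combine several signals.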

We also provide a detailed discussion about the different signals, features and design choices that influence the quality of synonym resolution for noun phrases in Open IE KBs, thus shedding light on the middle ground between "open" and "closed" information extraction systems.
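To make the idea of combining signals concrete, here is a hedged Python sketch that scores a pair of names with two hypothetical signals: the token overlap of the names themselves, and the overlap of the objects each name occurs with as a subject. A logistic function merges the two into a score between 0 and 1. The signal definitions, the function names, and the hand-picked weights in `WEIGHTS` are illustrative assumptions, not features or parameters reported in the paper; a learned model would instead fit such weights on labeled synonym pairs.

```python
import math


def token_overlap(a, b):
    """Jaccard similarity of the lowercased word tokens of two names."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def object_overlap(a, b, triples):
    """Jaccard similarity of the sets of objects each name has as a subject."""
    oa = {o for s, _, o in triples if s == a}
    ob = {o for s, _, o in triples if s == b}
    union = oa | ob
    return len(oa & ob) / len(union) if union else 0.0


# Hypothetical hand-picked weights for illustration only (not learned, not from the paper).
WEIGHTS = {"bias": -3.0, "token": 3.0, "object": 3.0}


def synonym_score(a, b, triples, w=WEIGHTS):
    """Combine both signals with a logistic function into a score in (0, 1)."""
    z = (w["bias"]
         + w["token"] * token_overlap(a, b)
         + w["object"] * object_overlap(a, b, triples))
    return 1.0 / (1.0 + math.exp(-z))


triples = [
    ("Barack Obama", "was born", "Honolulu"),
    ("Obama", "place of birth", "Honolulu"),
    ("Michelle Obama", "was born", "Chicago"),
]
print(round(synonym_score("Barack Obama", "Obama", triples), 3))    # 0.818
print(round(synonym_score("Michelle Obama", "Obama", triples), 3))  # 0.182
```

Both pairs share the token "obama", but only "Barack Obama" and "Obama" also share an object (Honolulu), so the combined score separates them even though token overlap alone cannot. Note that the object signal ignores the relation phrase on purpose: "was born" and "place of birth" are themselves uncanonicalized.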


Index Terms

Canonicalizing Open Knowledge Bases
1. Information systems
  1. Information systems applications

Published In

ACM Conferences

CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management

November 2014

2152 pages

ISBN:9781450325981

DOI:10.1145/2661829

General Chairs: Jianzhong Li (Harbin Inst. of Technology), X. Sean Wang (Fudan University)

Program Chairs: Minos Garofalakis (Technical University of Crete, Greece), Ian Soboroff (National Institute of Standards, USA), Torsten Suel (New York University, USA), Min Wang (Google Research, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '14

CIKM '14: 2014 ACM Conference on Information and Knowledge Management

November 3 - 7, 2014

Shanghai, China

Acceptance Rates

CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Article Metrics

Total Citations: 63

Total Downloads: 553

Downloads (last 12 months): 55

Downloads (last 6 weeks): 5

Reflects downloads up to 01 Nov 2024

Cited By

Viswanathan V, Gashteovski K, Lawrence C, Wu T, Neubig G (2024). Large Language Models Enable Few-Shot Clustering. Transactions of the Association for Computational Linguistics 12, 321-333. DOI: 10.1162/tacl_a_00648. Online publication date: 5-Apr-2024
https://doi.org/10.1162/tacl_a_00648
Shen W, Yang B, Liu Y, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H (2024). Jointly Canonicalizing and Linking Open Knowledge Base via Unified Embedding Learning. Proceedings of the ACM Web Conference 2024, 2304-2314. DOI: 10.1145/3589334.3645700. Online publication date: 13-May-2024
https://dl.acm.org/doi/10.1145/3589334.3645700
Liu B, Peng H, Zeng W, Zhao X, Liu S, Pan L, Li X (2024). Open knowledge base canonicalization with multi-task learning. World Wide Web 27:5. DOI: 10.1007/s11280-024-01288-x. Online publication date: 18-Jul-2024
https://dl.acm.org/doi/10.1007/s11280-024-01288-x
Tong J, Wang Z, Rui X (2024). Incorporating topic and property for knowledge base synchronization. Knowledge and Information Systems 66:10, 6241-6268. DOI: 10.1007/s10115-024-02160-0. Online publication date: 13-Jun-2024
https://dl.acm.org/doi/10.1007/s10115-024-02160-0
Li Q, Wang D, Feng S, Song K, Zhang Y, Yu G (2023). OERL: Enhanced Representation Learning via Open Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering 35:9, 8880-8892. DOI: 10.1109/TKDE.2022.3218850. Online publication date: 1-Sep-2023
https://dl.acm.org/doi/10.1109/TKDE.2022.3218850
Nguyen T, Razniewski S, Romero J, Weikum G (2023). Refined Commonsense Knowledge From Large-Scale Web Contents. IEEE Transactions on Knowledge and Data Engineering 35:8, 8431-8447. DOI: 10.1109/TKDE.2022.3206505. Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.1109/TKDE.2022.3206505
Li Q, Wang D, Feng S, Song K, Zhang Y, Yu G (2023). Variational autoencoder densified graph attention for fusing synonymous entities. Knowledge-Based Systems 259:C. DOI: 10.1016/j.knosys.2022.110061. Online publication date: 10-Jan-2023
https://dl.acm.org/doi/10.1016/j.knosys.2022.110061
Romero J, Razniewski S (2023). Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation. The Semantic Web – ISWC 2023, 368-387. DOI: 10.1007/978-3-031-47240-4_20. Online publication date: 6-Nov-2023
https://dl.acm.org/doi/10.1007/978-3-031-47240-4_20
Chen J, Jiménez-Ruiz E, Horrocks I, Chen X, Myklebust E (2022). An assertion and alignment correction framework for large scale knowledge bases. Semantic Web 14:1, 29-53. DOI: 10.3233/SW-210448. Online publication date: 30-Nov-2022
https://doi.org/10.3233/SW-210448
Wood E, Glen A, Kvarfordt L, Womack F, Acevedo L, Yoon T, Ma C, Flores V, Sinha M, Chodpathumwan Y, Termehchy A, Roach J, Mendoza L, Hoffman A, Deutsch E, Koslicki D, Ramsey S (2022). RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine. BMC Bioinformatics 23:1. DOI: 10.1186/s12859-022-04932-3. Online publication date: 29-Sep-2022
https://doi.org/10.1186/s12859-022-04932-3