research-article

Factorizing YAGO: scalable machine learning for linked data

Authors:

Maximilian Nickel,

Volker Tresp, and

Hans-Peter Kriegel

WWW '12: Proceedings of the 21st international conference on World Wide Web

April 2012

Pages 271 - 280

https://doi.org/10.1145/2187836.2187874

Published: 16 April 2012

Abstract

Vast amounts of structured information have been published in the Semantic Web's Linked Open Data (LOD) cloud, which is still growing rapidly. Yet access to this information via reasoning and querying is sometimes difficult because of LOD's size, partial data inconsistencies, and inherent noisiness. Machine learning offers an alternative approach to exploiting LOD: machine learning algorithms are typically robust to both noise and data inconsistencies and can efficiently exploit non-deterministic dependencies in the data. From a machine learning point of view, LOD is challenging due to its relational nature and its scale. Here, we present an efficient approach to relational learning on LOD, based on the factorization of a sparse tensor, that scales to data consisting of millions of entities, hundreds of relations, and billions of known facts. Furthermore, we show how ontological knowledge can be incorporated into the factorization to improve learning results and how computation can be distributed across multiple nodes. We demonstrate that our approach is able to factorize the YAGO 2 core ontology and globally predict statements for this large knowledge base using a single dual-core desktop computer. We also show experimentally that our approach achieves good results in several relational learning tasks that are relevant to Linked Data. Once a factorization has been computed, our model can predict efficiently, and without any additional training, the likelihood of any of the 4.3 ⋅ 10¹⁴ possible triples in the YAGO 2 core ontology.
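To make the prediction step concrete, the following is a minimal sketch of how a computed factorization could score triples. It assumes a bilinear model in which each relation slice of the tensor is approximated as A R_p Aᵀ; the abstract does not spell out the model, so this form, the names `A`, `R`, `score_triple`, and `score_all_objects`, and the toy sizes are all assumptions made for illustration. The factors here are random placeholders rather than the result of any training.

```python
import numpy as np


def score_triple(A, R, s, p, o):
    """Score one (subject, predicate, object) triple as a_s^T R_p a_o.

    A: (n_entities, rank) entity factors (hypothetical, assumed given).
    R: (n_relations, rank, rank) per-relation core matrices (assumed given).
    """
    return float(A[s] @ R[p] @ A[o])


def score_all_objects(A, R, s, p):
    """Score every candidate object for a fixed subject and predicate.

    Only rank-sized products are needed, so the dense tensor of all
    possible triples is never materialized.
    """
    return (A[s] @ R[p]) @ A.T


# Toy factors standing in for an already computed factorization.
rng = np.random.default_rng(0)
n_entities, n_relations, rank = 5, 2, 3
A = rng.normal(size=(n_entities, rank))
R = rng.normal(size=(n_relations, rank, rank))

single = score_triple(A, R, s=0, p=1, o=4)
row = score_all_objects(A, R, s=0, p=1)

# Number of candidate triples the factors can score: entities^2 * relations.
n_possible = n_entities * n_entities * n_relations
```

Under these assumptions, scoring a triple costs a few small matrix-vector products that depend on the rank rather than on the number of known facts, which is one way a model can rank any candidate triple without further training.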

Index Terms

Factorizing YAGO: scalable machine learning for linked data
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
    2. Machine learning approaches
      1. Markov decision processes
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Markov decision processes

Published In

WWW '12: Proceedings of the 21st international conference on World Wide Web

April 2012

1078 pages

ISBN:9781450312295

DOI:10.1145/2187836

General Chairs:
Alain Mille (Université de Lyon, France),
Fabien Gandon (INRIA, France),
Jacques Misselis (HP, France)

Program Chairs:
Michael Rabinovich (Case Western Reserve University, USA),
Steffen Staab (University of Koblenz-Landau, Germany)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Univ. de Lyon: Universite de Lyon

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WWW 2012

Sponsor:

Univ. de Lyon

WWW 2012: 21st World Wide Web Conference 2012

April 16 - 20, 2012

Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
