DOI: 10.1007/978-3-319-46523-4_30

RDF2Vec: RDF Graph Embeddings for Data Mining

Published: 17 October 2016

Abstract

Linked Open Data has been recognized as a valuable source of background information in data mining. However, most data mining tools require features in propositional form, i.e., a vector of nominal or numerical features associated with an instance, while Linked Open Data sources are graphs by nature. In this paper, we present RDF2Vec, an approach that adapts language modeling techniques for unsupervised feature extraction from sequences of words to RDF graphs. We generate sequences by leveraging local information from graph sub-structures, harvested by Weisfeiler-Lehman Subtree RDF Graph Kernels and graph walks, and learn latent numerical representations of entities in RDF graphs. Our evaluation shows that such vector representations outperform existing techniques for the propositionalization of RDF graphs on a variety of predictive machine learning tasks, and that feature vector representations of general knowledge graphs such as DBpedia and Wikidata can be easily reused for different tasks.
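The sequence-generation step named in the abstract can be illustrated with a minimal sketch of the graph-walk variant. It assumes a toy graph of hypothetical `ex:` triples and uniform random walks of bounded depth; the function names, walk depth, and number of walks are illustrative choices, not the configuration used in the paper. Each walk alternates entities and predicates, producing a token sequence that a word2vec-style language model could consume in place of a sentence (the embedding training itself is not shown).

```python
import random
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples.
# All identifiers are hypothetical and used for illustration only.
TRIPLES = [
    ("ex:Berlin", "ex:capitalOf", "ex:Germany"),
    ("ex:Germany", "ex:memberOf", "ex:EU"),
    ("ex:Berlin", "ex:locatedIn", "ex:Europe"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
    ("ex:France", "ex:memberOf", "ex:EU"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for s, p, o in triples:
        adjacency[s].append((p, o))
    return adjacency


def random_walks(adjacency, start, depth, walks_per_entity, rng):
    """Generate random walks of at most `depth` hops starting at `start`.

    Each walk is a token sequence: entity, predicate, entity, predicate, ...
    A walk stops early when it reaches a node with no outgoing edges.
    """
    walks = []
    for _ in range(walks_per_entity):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = adjacency.get(node)
            if not edges:
                break
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.append(walk)
    return walks


adjacency = build_adjacency(TRIPLES)
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# One "corpus" of sequences, built from walks rooted at every subject entity.
sequences = []
for entity in sorted({s for s, _, _ in TRIPLES}):
    sequences.extend(
        random_walks(adjacency, entity, depth=2, walks_per_entity=3, rng=rng)
    )

for walk in sequences:
    print(" -> ".join(walk))
```

Treating predicates as tokens alongside entities keeps the edge labels in the sequences, so two entities that share predicates and neighbours appear in similar contexts.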

Published In

The Semantic Web – ISWC 2016: 15th International Semantic Web Conference, Kobe, Japan, October 17–21, 2016, Proceedings, Part I
Oct 2016
665 pages
ISBN: 978-3-319-46522-7
DOI: 10.1007/978-3-319-46523-4
Editors: Paul Groth, Elena Simperl, Alasdair Gray, Marta Sabou, Markus Krötzsch, Freddy Lecue, Fabian Flöck, Yolanda Gil

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Graph embeddings
2. Linked open data
3. Data mining
