Abstract
The traditional means of extracting information from the Web are keyword-based search and browsing. The Semantic Web adds structured information (i.e., semantic annotations and references) that supports both activities. One of the most interesting recent developments is Linked Open Data (LOD), where information is presented in the form of facts, often originating from published domain-specific databases, that can be accessed by both humans and machines via specific query endpoints. In this article, we argue that machine learning provides a new way to query web data, in particular LOD, by analyzing and exploiting statistical regularities. We discuss the challenges of applying machine learning to the Web and the particular learning approaches we have been pursuing in THESEUS. We then present a number of applications in which the Web is queried via machine learning and describe several extensions to our approaches.
Notes
- 1.
Although the world might be governed by scientific laws and logical constraints in general, at the level of abstraction at which we and our applications have to function, the world partially appears to be governed by probabilities and statistical patterns.
- 2.
- 3.
In particular, the probability that a relationship between two entities exists given the knowledge base KB is estimated as
$$\hat{P}((\mathit{Jane},\mathit{likes},\mathit{Jack}) \mid \mathit{KB}) = \sum_{i=1}^{L} f_{i}^{\mathit{Jane}}\, f_{i}^{\mathit{likes},\mathit{Jack}}$$
where \(\{f_{i}^{\mathit{Jane}}\}_{i=1}^{L}\) are the \(L\) factors describing Jane, and \(\{f_{i}^{\mathit{likes},\mathit{Jack}}\}_{i=1}^{L}\) are the \(L\) factors describing Jack in his role as an object of the predicate “likes”. There are a number of approaches for calculating the factors. In our work in the SUNS framework (Tresp et al. 2009; Huang et al. 2010), we have employed regularized factorization of the associated data matrices. In our three-way tensor approach RESCAL (Nickel et al. 2011), we estimate
$$\hat{P}((\mathit{Jane},\mathit{likes},\mathit{Jack}) \mid \mathit{KB}) = \sum_{i=1}^{L}\sum_{j=1}^{L} f_{i}^{\mathit{Jane}}\, R_{ij}^{\mathit{likes}}\, f_{j}^{\mathit{Jack}}\;.$$
Each entity has a unique latent representation, here \(\{f_{i}^{\mathit{Jane}}\}_{i=1}^{L}\) and \(\{f_{i}^{\mathit{Jack}}\}_{i=1}^{L}\), and the relation-type-specific interaction is modeled by the \(L \times L\) matrix \(R^{\mathit{likes}}\).
- 4.
- 5.
- 6.
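As a rough illustration of the two scoring functions in note 3, the following Python sketch evaluates both for the triple (Jane, likes, Jack) with \(L = 3\). All numbers are made-up toy values, and the names `suns_score` and `rescal_score` are hypothetical labels chosen for this example, not functions of either framework. In practice the factors and the matrix would be learned from the knowledge base by the factorization procedures described in the cited papers; the sketch covers only the scoring step.

```python
# Minimal sketch of the two scoring functions from note 3.
# Assumption: the latent factors are already learned; the values
# below are arbitrary toy numbers used only for illustration.


def suns_score(f_subject, f_pred_object):
    """Inner product of the subject factors with the factors of the
    (predicate, object) pair: sum_i f_i^subject * f_i^(predicate,object)."""
    return sum(a * b for a, b in zip(f_subject, f_pred_object))


def rescal_score(f_subject, relation_matrix, f_object):
    """Bilinear form with a relation-specific L x L matrix R:
    sum_i sum_j f_i^subject * R_ij * f_j^object."""
    return sum(
        f_subject[i] * relation_matrix[i][j] * f_object[j]
        for i in range(len(f_subject))
        for j in range(len(f_object))
    )


# Hypothetical latent factors, L = 3.
f_jane = [0.9, 0.1, 0.4]        # factors describing Jane
f_jack = [0.2, 0.7, 0.5]        # factors describing Jack (entity-level)
f_likes_jack = [0.8, 0.0, 0.5]  # factors for Jack as object of "likes"

# Hypothetical interaction matrix for the relation type "likes".
R_likes = [
    [0.5, 0.1, 0.0],
    [0.0, 0.3, 0.2],
    [0.4, 0.0, 0.6],
]

score_pairwise = suns_score(f_jane, f_likes_jack)
score_bilinear = rescal_score(f_jane, R_likes, f_jack)

print("pair-factor score:", round(score_pairwise, 3))
print("bilinear score:   ", round(score_bilinear, 3))
```

The difference between the two forms shows in the arguments. The first needs a separate factor vector for every (predicate, object) combination, whereas the second reuses one vector per entity across all relation types and places the relation-specific information in the matrix. Nothing in this sketch constrains the scores to the interval [0, 1].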
References
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives, DBpedia: a nucleus for a web of open data, in Proceedings of the 6th International Semantic Web Conference (ISWC’07), Busan. Volume 4825 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2007), pp. 722–735
M. Balduini, I. Celino, D. Dell’Aglio, E.D. Valle, Y. Huang, T. Lee, S.H. Kim, V. Tresp, Reality mining on micropost streams: deductive and inductive reasoning for personalized and location-based recommendations. Semant. Web Interoperability Usability Applicability 2, 1–16 (2013)
C. Bizer, T. Heath, T. Berners-Lee, Linked data – the story so far. Int. J. Semant. Web Inf. Syst. (IJSWIS) 5(3), 1–22 (2009)
D. Brickley, L. Miller, The Friend of a Friend (FOAF) project, http://www.foaf-project.org/
D. Fensel, F. van Harmelen, B. Andersson, P. Brennan, H. Cunningham, E.D. Valle, F. Fischer, Z. Huang, A. Kiryakov, T.K. il Lee, L. Schooler, V. Tresp, S. Wesner, M. Witbrock, N. Zhong, Towards LarKC: a platform for web-scale reasoning, in Proceedings of the IEEE International Conference on Semantic Computing, Santa Clara, Aug 2008, pp. 524–529
Y. Huang, V. Tresp, M. Bundschus, A. Rettinger, H.P. Kriegel, Multivariate structured prediction for learning on semantic web, in Proceedings of the 20th International Conference on Inductive Logic Programming (ILP’10), Florence, ed. by P. Frasconi, F.A. Lisi. Volume 6489 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2010), pp. 92–104
Y. Huang, V. Tresp, M. Nickel, A. Rettinger, H.P. Kriegel, A scalable approach for statistical learning in semantic graphs. Semant. Web Interoperability Usability Applicability 1, 1–18 (2013)
X. Jiang, Y. Huang, M. Nickel, V. Tresp, Combining information extraction, deductive reasoning and machine learning for relation prediction, in Proceedings of the 9th Extended Semantic Web Conference (ESWC’12), Heraklion, ed. by E. Simperl, P. Cimiano, A. Polleres, O. Corcho, V. Presutti. Volume 7295 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2012a), pp. 164–178. http://dblp.uni-trier.de/db/conf/esws/eswc2012.html#JiangHNT12
X. Jiang, V. Tresp, Y. Huang, M. Nickel, Link prediction in multi-relational graphs using additive models, in Proceedings of the 11th International Workshop on Semantic Technologies Meet Recommender Systems & Big Data, ed. by M. de Gemmis, T.D. Noia, P. Lops, T. Lukasiewicz, G. Semeraro. Volume 919 of CEUR Workshop Proceedings, 2012b, pp. 1–12, CEUR-WS.org. http://dblp.uni-trier.de/db/conf/semweb/sersy2012.html#JiangTHN12
X. Jiang, V. Tresp, Y. Huang, M. Nickel, H.P. Kriegel, Scalable relation prediction exploiting both intrarelational correlation and contextual information, in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’12), Bristol, ed. by P.A. Flach, T.D. Bie, N. Cristianini. Volume 7523 of Lecture Notes in Computer Science (Springer, Berlin/Heidelberg/New York, 2012c), pp. 601–616. http://dblp.uni-trier.de/db/conf/pkdd/pkdd2012-1.html#JiangTHNK12
M.G. Kann, Advances in translational bioinformatics: computational approaches for the hunting of disease genes. Brief. Bioinform. 11(1), 96–110 (2010). http://dblp.uni-trier.de/db/journals/bib/bib11.html#Kann10
M. Nickel, H.P. Kriegel, V. Tresp, A three-way model for collective learning on multi-relational data, in Proceedings of the 28th International Conference on Machine Learning (ICML’11), Bellevue, 2011
M. Nickel, V. Tresp, H.P. Kriegel, Factorizing YAGO: scalable machine learning for linked data, in Proceedings of the 21st International World Wide Web Conference, Lyon, ed. by A. Mille, F.L. Gandon, J. Misselis, M. Rabinovich, S. Staab (ACM, 2012), pp. 271–280. http://dblp.uni-trier.de/db/conf/www/www2012.html#NickelTK12
V. Tresp, Y. Huang, M. Bundschus, A. Rettinger, Materializing and querying learned knowledge, in Proceedings of the First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS’09), Heraklion, vol. 474 (RWTH Aachen, 2009)
V. Tresp, Y. Huang, X. Jiang, A. Rettinger, Graphical models for relations – modeling relational context, in Proceedings of the International Conference on Knowledge Discovery and Information Retrieval (KDIR’11), Paris, Oct 2011
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Tresp, V., Huang, Y., Nickel, M. (2014). Querying the Web with Statistical Machine Learning. In: Wahlster, W., Grallert, HJ., Wess, S., Friedrich, H., Widenka, T. (eds) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_18
DOI: https://doi.org/10.1007/978-3-319-06755-1_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06754-4
Online ISBN: 978-3-319-06755-1
eBook Packages: Computer Science, Computer Science (R0)