
Word embeddings-based transfer learning for boosted relational dependency networks

Published: 20 September 2023

Abstract

Conventional machine learning methods assume data to be independent and identically distributed (i.i.d.), ignoring the relational structure of the data, which carries crucial information about how objects participate in relationships and events. Statistical Relational Learning (SRL) combines statistical and probabilistic modeling with relational learning to represent and learn in domains with complex relational and rich probabilistic structure. SRL models do not assume data to be i.i.d., but, like conventional machine learning models, they do assume that training and testing data are sampled from the same distribution. Transfer learning has emerged as an essential technique for scenarios where this assumption does not hold: it aims to recognize knowledge previously learned in a source domain and apply it to a new model in a target domain as a starting point for solving a new task. For SRL models, the primary challenge is to transfer the learned structure by mapping the vocabulary across domains. In this work, we propose TransBoostler, an algorithm that uses pre-trained word embeddings to guide this mapping, under the assumption that the name of a predicate has a semantic connotation that can be represented in a vector space model. TransBoostler then employs theory revision to adapt the mapped model to the target data. Experimental results show that TransBoostler successfully transfers trees across different domains: it performs as well as, or better than, previous systems and requires less training time in some of the investigated scenarios.
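To make the mapping idea concrete, below is a minimal, self-contained sketch (not the authors' implementation) of how pre-trained word embeddings could guide a predicate mapping: each predicate name is looked up in a vector space, pairwise cosine similarities are computed, and a one-to-one source-to-target assignment is chosen to maximize total similarity. The toy vectors, the example predicate names, and the use of Hungarian assignment via scipy are assumptions for illustration; TransBoostler's actual similarity measures and mapping procedure are described in the paper, and a real run would load pre-trained embeddings such as word2vec or fastText.

```python
# Illustrative sketch of embedding-guided predicate mapping.
# NOT the TransBoostler implementation: vectors below are made up so the
# example is runnable without downloading pre-trained embeddings.
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical embedding vectors for predicate names. In practice these
# would come from a pre-trained model (e.g. word2vec or fastText).
EMBEDDINGS = {
    "professor":   np.array([0.9, 0.1, 0.0]),
    "student":     np.array([0.8, 0.2, 0.1]),
    "publication": np.array([0.1, 0.9, 0.2]),
    "director":    np.array([0.7, 0.3, 0.0]),
    "actor":       np.array([0.6, 0.4, 0.1]),
    "movie":       np.array([0.2, 0.8, 0.3]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def map_predicates(source, target):
    """Map each source predicate to a distinct target predicate by
    maximizing total embedding similarity (Hungarian assignment is
    one plausible way to enforce a one-to-one mapping)."""
    sim = np.array([[cosine(EMBEDDINGS[s], EMBEDDINGS[t]) for t in target]
                    for s in source])
    # linear_sum_assignment minimizes cost, so negate the similarities.
    rows, cols = linear_sum_assignment(-sim)
    return {source[r]: target[c] for r, c in zip(rows, cols)}

if __name__ == "__main__":
    # Example: map predicates from an academic domain to a movie domain.
    print(map_predicates(["professor", "student", "publication"],
                         ["director", "actor", "movie"]))
```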


    Published In

Machine Learning, Volume 113, Issue 3
March 2024, 460 pages

    Publisher

    Kluwer Academic Publishers

    United States

    Publication History

    Published: 20 September 2023
    Accepted: 16 August 2023
    Revision received: 04 December 2022
    Received: 26 May 2022

    Author Tags

    1. Transfer learning
    2. Statistical relational learning
    3. Word embeddings
    4. Relational domains

    Qualifiers

    • Research-article
