short-paper

PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python

Authors:

Mustafa Chasmai,

Lawrence Schobs, and

Hao Xu

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

October 2022

Pages 4274 - 4278

https://doi.org/10.1145/3511808.3557676

Published: 17 October 2022

Abstract

PyKale is a Python library for knowledge-aware machine learning from multiple sources of data, designed to enable and accelerate interdisciplinary research. It embodies green machine learning principles: reduce repetition and redundancy, reuse existing resources, and recycle learning models across areas. We propose a pipeline-based application programming interface (API) so that all machine learning workflows follow a standardized six-step pipeline. PyKale focuses on leveraging knowledge from multiple sources for accurate and interpretable prediction, particularly through multimodal learning and transfer learning. To be more accessible, it separates code from configuration so that non-programmers can configure systems without coding. PyKale is officially part of the PyTorch ecosystem and includes interdisciplinary examples in bioinformatics, knowledge graphs, image/video recognition, and medical imaging: https://pykale.github.io/.
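As a rough illustration of the two design ideas in the abstract (a fixed six-step pipeline, and settings kept apart from code), the sketch below may help. It does not use PyKale's API: the step names, the `CONFIG_TEXT` settings, and the toy data are all hypothetical placeholders, and JSON stands in for whatever configuration format a real system would use so that only the standard library is needed.

```python
import json

# Hypothetical settings, kept separate from the code. In a real system this
# text would live in its own file that a non-programmer could edit.
CONFIG_TEXT = '{"scale": 2.0, "threshold": 5.0}'


def load_data(cfg):
    # Step 1: toy inputs paired with toy labels.
    return [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]


def preprocess(cfg, xs):
    # Step 2: rescale every input by the configured factor.
    return [x * cfg["scale"] for x in xs]


def embed(cfg, xs):
    # Step 3: identity mapping, a stand-in for a learned representation.
    return xs


def predict(cfg, zs):
    # Step 4: threshold classifier controlled by the configuration.
    return [int(z > cfg["threshold"]) for z in zs]


def evaluate(cfg, preds, labels):
    # Step 5: plain accuracy.
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def interpret(cfg, accuracy):
    # Step 6: report the score together with the settings that produced it.
    return {"accuracy": accuracy, "settings": dict(cfg)}


def run_pipeline(config_text):
    # The same six steps always run in the same order; only the settings vary.
    cfg = json.loads(config_text)
    xs, labels = load_data(cfg)
    preds = predict(cfg, embed(cfg, preprocess(cfg, xs)))
    return interpret(cfg, evaluate(cfg, preds, labels))


report = run_pipeline(CONFIG_TEXT)
```

Changing the behaviour here means editing only the configuration text, for example passing a different `threshold`, while the six functions stay untouched.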


Cited By

Liu X, Zhou S, Lei T, Jiang P, Chen Z, Lu H (2023). First-Person Video Domain Adaptation With Multi-Scene Cross-Site Datasets and Attention-Based Methods. IEEE Transactions on Circuits and Systems for Video Technology 33:12 (7774-7788). Online publication date: 31-May-2023.
https://dl.acm.org/doi/10.1109/TCSVT.2023.3281671
Paul S, Saha A, Arefin M, Bhuiyan T, Biswas A, Reza A, Alotaibi N, Alyami S, Moni M (2023). A Comprehensive Review of Green Computing: Past, Present, and Future Research. IEEE Access 11 (87445-87494). Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3304332

Index Terms

PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Machine learning algorithms
2. Software and its engineering
  1. Software notations and tools
    1. Software libraries and repositories



Published In


CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

October 2022

5274 pages

ISBN:9781450392365

DOI:10.1145/3511808

General Chairs:
Mohammad Al Hasan, Indiana University Purdue University, Indianapolis, USA
Li Xiong, Emory University, Atlanta, USA

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Wellcome Trust

Conference

CIKM '22

CIKM '22: The 31st ACM International Conference on Information and Knowledge Management

October 17 - 21, 2022

Atlanta, GA, USA

Acceptance Rates

CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
