DOI: 10.1145/3511808.3557676
short-paper

PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python

Published: 17 October 2022

    Abstract

    PyKale is a Python library for knowledge-aware machine learning from multiple sources of data, designed to enable and accelerate interdisciplinary research. It embodies green machine learning principles: reduce repetition and redundancy, reuse existing resources, and recycle learning models across areas. We propose a pipeline-based application programming interface (API) so that all machine learning workflows follow a standardized six-step pipeline. PyKale focuses on leveraging knowledge from multiple sources for accurate and interpretable prediction, particularly through multimodal learning and transfer learning. To be more accessible, it separates code from configuration so that non-programmers can configure systems without coding. PyKale is officially part of the PyTorch ecosystem and includes interdisciplinary examples in bioinformatics, knowledge graphs, image/video recognition, and medical imaging: https://pykale.github.io/.
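
The two design ideas in the abstract, a fixed six-step pipeline and configuration kept apart from code, can be illustrated with a minimal sketch. This is not PyKale's API: every function name, the step names (load, prepare, embed, predict, evaluate, interpret), the toy data, and the use of JSON for the configuration are assumptions chosen so that the example runs with only the Python standard library.

```python
import json
from statistics import mean

# Hypothetical configuration, kept apart from the code. A non-programmer
# could edit these values without touching any function below.
CONFIG_TEXT = """
{
  "data": {"values": [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]],
           "labels": [0, 0, 1, 1]},
  "prep": {"scale": 10.0},
  "predict": {"threshold": 0.5}
}
"""


def load_data(cfg):
    """Step 1: read raw samples and labels named in the configuration."""
    return cfg["data"]["values"], cfg["data"]["labels"]


def prep_data(x, cfg):
    """Step 2: rescale every feature by a configured constant."""
    scale = cfg["prep"]["scale"]
    return [[v / scale for v in row] for row in x]


def embed(x):
    """Step 3: map each sample to a one-dimensional representation."""
    return [mean(row) for row in x]


def predict(z, cfg):
    """Step 4: threshold the representation to get a class label."""
    threshold = cfg["predict"]["threshold"]
    return [int(v > threshold) for v in z]


def evaluate(y_pred, y_true):
    """Step 5: compute accuracy."""
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


def interpret(z, y_pred):
    """Step 6: report which representation led to which prediction."""
    return [f"embedding={v:.2f} -> class {p}" for v, p in zip(z, y_pred)]


def run_pipeline(config_text):
    """Run the six steps in a fixed order, driven only by the configuration."""
    cfg = json.loads(config_text)
    x, y = load_data(cfg)
    x = prep_data(x, cfg)
    z = embed(x)
    y_pred = predict(z, cfg)
    accuracy = evaluate(y_pred, y)
    notes = interpret(z, y_pred)
    return accuracy, y_pred, notes


if __name__ == "__main__":
    accuracy, predictions, notes = run_pipeline(CONFIG_TEXT)
    print("accuracy:", accuracy)
    print("predictions:", predictions)
    for line in notes:
        print(line)
```

Because every step reads its parameters from the configuration, changing the decision threshold or the scaling constant means editing the configuration text only; the pipeline code stays the same across experiments.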


    Cited By

    • (2023) First-Person Video Domain Adaptation With Multi-Scene Cross-Site Datasets and Attention-Based Methods. IEEE Transactions on Circuits and Systems for Video Technology, 33:12 (7774-7788). DOI: 10.1109/TCSVT.2023.3281671. Online publication date: 31-May-2023
    • (2023) A Comprehensive Review of Green Computing: Past, Present, and Future Research. IEEE Access, 11 (87445-87494). DOI: 10.1109/ACCESS.2023.3304332. Online publication date: 2023

    Published In

    CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
    October 2022
    5274 pages
    ISBN: 9781450392365
    DOI: 10.1145/3511808
    General Chairs: Mohammad Al Hasan, Li Xiong
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. machine learning
    2. multimodal learning
    3. pytorch
    4. transfer learning

    Acceptance Rates

    CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


