DOI: 10.1145/3627673.3680041
research-article

Revolutionizing Biomarker Discovery: Leveraging Generative AI for Bio-Knowledge-Embedded Continuous Space Exploration

Published: 21 October 2024 Publication History

Abstract

Biomarker discovery is vital to advancing personalized medicine, offering insights into disease diagnosis, prognosis, and therapeutic efficacy. Traditionally, identifying and validating biomarkers has depended heavily on extensive experiments and statistical analyses. These approaches are time-consuming, demand substantial domain expertise, and are constrained by the complexity of biological systems. These limitations motivate us to ask: can we automatically identify an effective biomarker subset without substantial human effort? Inspired by the success of generative AI, we hypothesize that the intricate knowledge of biomarker identification can be compressed into a continuous embedding space, making the search for better biomarkers more effective. We therefore propose a new biomarker identification framework with two modules: 1) training data preparation and 2) embedding-optimization-generation. The first module uses a multi-agent system to automatically collect pairs of biomarker subsets and their corresponding prediction accuracy as training data, which establish a knowledge base for biomarker identification. The second module employs an encoder-evaluator-decoder learning paradigm to compress the knowledge in the collected data into a continuous space. It then uses gradient-based search and autoregressive reconstruction to efficiently identify the optimal subset of biomarkers. Finally, we conduct extensive experiments on three real-world datasets to demonstrate the efficiency, robustness, and effectiveness of our method.
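The embedding-optimization-generation idea can be illustrated with a toy sketch. This is not the authors' implementation: the learned encoder, evaluator, and decoder networks are swapped for hand-written stand-ins, and every name here (`encode`, `evaluator`, `gradient_search`, `decode`, the weights `w`, the penalty `lam`) is hypothetical. The sketch assumes a relaxed selection mask as the continuous embedding, a linear-quadratic surrogate as the accuracy predictor, and an ordered emit-until-stop loop in place of an autoregressive decoder.

```python
def encode(subset, n_features):
    """Stand-in encoder: map a discrete subset to a continuous vector in [0, 1]^n."""
    return [1.0 if i in subset else 0.0 for i in range(n_features)]

def evaluator(z, w, lam):
    """Stand-in evaluator: a differentiable surrogate for predicted accuracy.
    (Assumption: linear reward minus a quadratic size penalty.)"""
    return sum(wi * zi for wi, zi in zip(w, z)) - lam * sum(zi * zi for zi in z)

def evaluator_grad(z, w, lam):
    """Analytic gradient of the surrogate with respect to the embedding z."""
    return [wi - 2.0 * lam * zi for wi, zi in zip(w, z)]

def gradient_search(z0, w, lam, step=0.1, n_steps=50):
    """Gradient ascent in the continuous space, clamped to [0, 1] after each step."""
    z = list(z0)
    for _ in range(n_steps):
        g = evaluator_grad(z, w, lam)
        z = [min(1.0, max(0.0, zi + step * gi)) for zi, gi in zip(z, g)]
    return z

def decode(z, threshold=0.5):
    """Stand-in decoder: emit feature indices one at a time, highest score first.
    Falling below the threshold plays the role of an end-of-sequence token."""
    order = sorted(range(len(z)), key=lambda i: z[i], reverse=True)
    subset = []
    for i in order:
        if z[i] < threshold:
            break
        subset.append(i)
    return subset

# Hypothetical per-biomarker usefulness weights (made up for illustration).
w = [0.9, -0.4, 0.6, 0.05, -0.2]
lam = 0.5

z0 = encode([1, 3], n_features=5)   # start from a poor known subset
z_opt = gradient_search(z0, w, lam)  # move toward higher predicted accuracy
best_subset = decode(z_opt)          # reconstruct a discrete subset

print(best_subset)  # [0, 2]
```

Starting the search from the embedding of a previously collected subset mirrors the role of the training data: the collected pairs supply both the surrogate and reasonable starting points, and the gradient steps then move toward regions the evaluator scores more highly.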

    Published In

    CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
    October 2024
    5705 pages
    ISBN:9798400704369
    DOI:10.1145/3627673

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. biomarker identification
    2. data application
    3. feature selection

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
