Search | arXiv e-print repository

Privacy-Preserving Multi-Center Differential Protein Abundance Analysis with FedProt

Authors: Yuliya Burankova, Miriam Abele, Mohammad Bakhtiari, Christine von Törne, Teresa Barth, Lisa Schweizer, Pieter Giesbertz, Johannes R. Schmidt, Stefan Kalkhof, Janina Müller-Deile, Peter A van Veelen, Yassene Mohammed, Elke Hammer, Lis Arend, Klaudia Adamowicz, Tanja Laske, Anne Hartebrodt, Tobias Frisch, Chen Meng, Julian Matschinske, Julian Späth, Richard Röttger, Veit Schwämmle, Stefanie M. Hauck, Stefan Lichtenthaler, et al. (6 additional authors not shown)

Abstract: Quantitative mass spectrometry has revolutionized proteomics by enabling simultaneous quantification of thousands of proteins. Pooling patient-derived data from multiple institutions enhances statistical power but raises significant privacy concerns. Here we introduce FedProt, the first privacy-preserving tool for collaborative differential protein abundance analysis of distributed data, which utilizes federated learning and additive secret sharing. In the absence of a multicenter patient-derived dataset for evaluation, we created two, one at five centers from LFQ E.coli experiments and one at three centers from TMT human serum. Evaluations using these datasets confirm that FedProt achieves accuracy equivalent to DEqMS applied to pooled data, with completely negligible absolute differences no greater than $4 \times 10^{-12}$. In contrast, -log10(p-values) computed by the most accurate meta-analysis methods diverged from the centralized analysis results by up to 25-27. FedProt is available as a web tool with detailed documentation as a FeatureCloud App.

Submitted 21 July, 2024; originally announced July 2024.

Comments: 52 pages, 16 figures, 12 tables. Last two authors listed are joint last authors

arXiv:2305.15453 [pdf]

Drugst.One -- A plug-and-play solution for online systems medicine and network-based drug repurposing

Authors: Andreas Maier, Michael Hartung, Mark Abovsky, Klaudia Adamowicz, Gary D. Bader, Sylvie Baier, David B. Blumenthal, Jing Chen, Maria L. Elkjaer, Carlos Garcia-Hernandez, Mohamed Helmy, Markus Hoffmann, Igor Jurisica, Max Kotlyar, Olga Lazareva, Hagai Levi, Markus List, Sebastian Lobentanzer, Joseph Loscalzo, Noel Malod-Dognin, Quirin Manz, Julian Matschinske, Miles Mee, Mhaned Oubounyt, Alexander R. Pico, et al. (14 additional authors not shown)

Abstract: In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.

Submitted 4 July, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 45 pages, 6 figures, 7 tables

arXiv:2305.06488 [pdf]

A Platform for the Biomedical Application of Large Language Models

Authors: Sebastian Lobentanzer, Shaohong Feng, The BioChatter Consortium, Andreas Maier, Cankun Wang, Jan Baumbach, Nils Krehl, Qin Ma, Julio Saez-Rodriguez

Abstract: Current-generation Large Language Models (LLMs) have stirred enormous interest in recent months, yielding great potential for accessibility and automation, while simultaneously posing significant challenges and risk of misuse. To facilitate interfacing with LLMs in the biomedical space, while at the same time safeguarding their functionalities through sensible constraints, we propose a dedicated, open-source framework: BioChatter. Based on open-source software packages, we synergise the many functionalities that are currently developing around LLMs, such as knowledge integration / retrieval-augmented generation, model chaining, and benchmarking, resulting in an easy-to-use and inclusive framework for application in many use cases of biomedicine. We focus on robust and user-friendly implementation, including ways to deploy privacy-preserving local open-source LLMs. We demonstrate use cases via two multi-purpose web apps (https://chat.biocypher.org), and provide documentation, support, and an open community.

Submitted 17 February, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 31 pages, 3 figures

arXiv:2212.13543 [pdf]

Democratising Knowledge Representation with BioCypher

Authors: Sebastian Lobentanzer, Patrick Aloy, Jan Baumbach, Balazs Bohar, Pornpimol Charoentong, Katharina Danhauser, Tunca Doğan, Johann Dreo, Ian Dunham, Adrià Fernandez-Torras, Benjamin M. Gyori, Michael Hartung, Charles Tapley Hoyt, Christoph Klein, Tamas Korcsmaros, Andreas Maier, Matthias Mann, David Ochoa, Elena Pareja-Lorente, Ferdinand Popp, Martin Preusse, Niklas Probul, Benno Schwikowski, Bünyamin Sen, Maximilian T. Strauss, et al. (4 additional authors not shown)

Abstract: Standardising the representation of biomedical knowledge among all researchers is an insurmountable task, hindering the effectiveness of many computational methods. To facilitate harmonisation and interoperability despite this fundamental challenge, we propose to standardise the framework of knowledge graph creation instead. We implement this standardisation in BioCypher, a FAIR (findable, accessible, interoperable, reusable) framework to transparently build biomedical knowledge graphs while preserving provenances of the source data. Mapping the knowledge onto biomedical ontologies helps to balance the needs for harmonisation, human and machine readability, and ease of use and accessibility to non-specialist researchers. We demonstrate the usefulness of this framework on a variety of use cases, from maintenance of task-specific knowledge stores, to interoperability between biomedical domains, to on-demand building of task-specific knowledge graphs for federated learning. BioCypher (https://biocypher.org) frees up valuable developer time; we encourage further development and usage by the community.

Submitted 17 January, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: 34 pages, 6 figures; submitted to Nature Biotechnology

arXiv:2011.08902 [pdf]

doi 10.1038/s41540-021-00181-x

Comparative transcriptome analysis reveals key epigenetic targets in SARS-CoV-2 infection

Authors: Marisol Salgado-Albarran, Erick I. Navarro-Delgado, Aylin Del Moral-Morales, Nicolas Alcaraz, Jan Baumbach, Rodrigo Gonzalez-Barrios, Ernesto Soto-Reyes

Abstract: COVID-19 is an infection caused by SARS-CoV-2 (Severe Acute Respiratory Syndrome coronavirus 2), which has caused a global outbreak. Current research efforts are focused on the understanding of the molecular mechanisms involved in SARS-CoV-2 infection in order to propose drug-based therapeutic options. Transcriptional changes due to epigenetic regulation are key host cell responses to viral infection and have been studied in SARS-CoV and MERS-CoV; however, such changes are not fully described for SARS-CoV-2. In this study, we analyzed multiple transcriptomes obtained from cell lines infected with MERS-CoV, SARS-CoV and SARS-CoV-2, and from COVID-19 patient-derived samples. Using integrative analyses of gene co-expression networks and de-novo pathway enrichment, we characterize different gene modules and protein pathways enriched with Transcription Factors or Epifactors relevant for SARS-CoV-2 infection. We identified EP300, MOV10, RELA and TRIM25 as top candidates, and more than 60 additional proteins involved in the epigenetic response during viral infection that have therapeutic potential. Our results show that targeting the epigenetic machinery could be a feasible alternative to treat COVID-19.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 33 pages, 2 tables, 5 figures, 4 supplementary figures

arXiv:2010.16403 [pdf]

Flimma: a federated and privacy-preserving tool for differential gene expression analysis

Authors: Olga Zolotareva, Reza Nasirigerdeh, Julian Matschinske, Reihaneh Torkzadehmahani, Tobias Frisch, Julian Späth, David B. Blumenthal, Amir Abbasinejad, Paolo Tieri, Nina K. Wenke, Markus List, Jan Baumbach

Abstract: Aggregating transcriptomics data across hospitals can increase sensitivity and robustness of differential expression analyses, yielding deeper clinical insights. As data exchange is often restricted by privacy legislation, meta-analyses are frequently employed to pool local results. However, if class labels are inhomogeneously distributed between cohorts, their accuracy may drop. Flimma (https://exbio.wzw.tum.de/flimma/) addresses this issue by implementing the state-of-the-art workflow limma voom in a privacy-preserving manner, i.e. patient data never leaves its source site. Flimma results are identical to those generated by limma voom on combined datasets even in imbalanced scenarios where meta-analysis approaches fail.

Submitted 23 November, 2020; v1 submitted 30 October, 2020; originally announced October 2020.

Comments: 27 pages, 7 figures

arXiv:2004.12420 [pdf]

doi 10.1038/s41467-020-17189-2

Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing

Authors: Sepideh Sadegh, Julian Matschinske, David B. Blumenthal, Gihanna Galindez, Tim Kacprowski, Markus List, Reza Nasirigerdeh, Mhaned Oubounyt, Andreas Pichlmair, Tim Daniel Rose, Marisol Salgado-Albarrán, Julian Späth, Alexey Stukalov, Nina K. Wenke, Kevin Yuan, Josch K. Pauling, Jan Baumbach

Abstract: Coronavirus Disease-2019 (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. It was first identified in Wuhan, China, and has since spread causing a global pandemic. Various studies have been performed to understand the molecular mechanisms of viral infection for predicting drug repurposing candidates. However, such information is spread across many publications and it is very time-consuming to access, integrate, explore, and exploit. We developed CoVex, the first interactive online platform for SARS-CoV-2 and SARS-CoV-1 host interactome exploration and drug (target) identification. CoVex integrates 1) experimentally validated virus-human protein interactions, 2) human protein-protein interactions and 3) drug-target interactions. The web interface allows user-friendly visual exploration of the virus-host interactome and implements systems medicine algorithms for network-based prediction of drugs. Thus, CoVex is an important resource, not only to understand the molecular mechanisms involved in SARS-CoV-2 and SARS-CoV-1 pathogenicity, but also in clinical research for the identification and prioritization of candidate therapeutics. We apply CoVex to investigate recent hypotheses on a systems biology level and to systematically explore the molecular mechanisms driving the virus life cycle. Furthermore, we extract and discuss drug repurposing candidates involved in these mechanisms.
CoVex renders COVID-19 drug research systems-medicine-ready by giving the scientific community direct access to network medicine algorithms integrating virus-host-drug interactions. It is available at https://exbio.wzw.tum.de/covex/.

Submitted 26 April, 2020; originally announced April 2020.

Comments: 15 pages, 4 figures

Journal ref: Nat Commun 11, 3518 (2020)

arXiv:1904.12353 [pdf, other]

TiCoNE 2: A Composite Clustering Model for Robust Cluster Analyses on Noisy Data

Authors: Christian Wiwie, Richard Röttger, Jan Baumbach

Abstract: Identifying groups of similar objects using clustering approaches is one of the most frequently employed first steps in exploratory biomedical data analysis. Many clustering methods have been developed that pursue different strategies to identify the optimal clustering for a data set. We previously published TiCoNE, an interactive clustering approach coupled with de-novo network enrichment of identified clusters. However, in this first version time-series and network analysis remained two separate steps in that only time-series data was clustered, and identified clusters mapped to and enriched within a network in a second separate step. In this work, we present TiCoNE 2: An extension that can now seamlessly incorporate multiple data types within its composite clustering model. Systematic evaluation on 50 random data sets, as well as on 2,400 data sets containing enriched cluster structure and varying levels of noise, shows that our approach is able to successfully recover cluster patterns embedded in random data and that it is more robust towards noise than non-composite models using only one data type, when applied to two data types simultaneously. Herein, each data set was clustered using five different similarity functions into k=10/30 clusters, resulting to ~5,000 clusterings in total. We evaluated the quality of each derived clustering with the Jaccard index and an internal validity score. We used TiCoNE to calculate empirical p-values for all generated clusters with different permutation functions, resulting in ~80,000 cluster p-values.
We show, that derived p-values can be used to reliably distinguish between foreground and background clusters. TiCoNE 2 allows researchers to seamlessly analyze time-series data together with biological interaction networks in an intuitive way and thereby provides more robust results than single data type cluster analyses.

Submitted 28 April, 2019; originally announced April 2019.

arXiv:1710.10262 [pdf]

Elucidation of time-dependent systems biology cell response patterns with time course network enrichment

Authors: Christian Wiwie, Alexander Rauch, Anders Haakonsson, Inigo Barrio-Hernandez, Blagoy Blagoev, Susanne Mandrup, Richard Röttger, Jan Baumbach

Abstract: Advances in OMICS technologies emerged both massive expression data sets and huge networks modelling the molecular interplay of genes, RNAs, proteins and metabolites. Network enrichment methods combine these two data types to extract subnetwork responses from case/control setups. However, no methods exist to integrate time series data with networks, thus preventing the identification of time-dependent systems biology responses. We close this gap with Time Course Network Enrichment (TiCoNE). It combines a new kind of human-augmented clustering with a novel approach to network enrichment. It finds temporal expression prototypes that are mapped to a network and investigated for enriched prototype pairs interacting more often than expected by chance. Such patterns of temporal subnetwork co-enrichment can be compared between different conditions. With TiCoNE, we identified the first distinguishing temporal systems biology profiles in time series gene expression data of human lung cells after infection with Influenza and Rhino virus. TiCoNE is available online (https://ticone.compbio.sdu.dk) and as Cytoscape app in the Cytoscape App Store (http://apps.cytoscape.org/).

Submitted 27 October, 2017; originally announced October 2017.

Showing 1–9 of 9 results for author: Baumbach, J