
Bioinformatics Advances (Perspective)

$\dagger$ Co-first authors.

$\#$ Co-second authors: coordinators for Sections 2-7, listed in their alphabetical order by last name.

$\ast$ Corresponding author (email: tmilenko@nd.edu).

All other authors are listed in their alphabetical order by last name.
Co-authorships on this paper are the result of the co-authors participating in the same workshop and not of scientific collaboration of any sort. As such, the co-authorships do not constitute any conflict of interest.

Current and future directions in network biology

Marinka Zitnik, Michelle M. Li, Aydin Wells, Kimberly Glass, Deisy Morselli Gysi, Arjun Krishnan, T. M. Murali, Predrag Radivojac, Sushmita Roy, Anaïs Baudot, Serdar Bozdag, Danny Z. Chen, Lenore Cowen, Kapil Devkota, Anthony Gitter, Sara Gosline, Pengfei Gu, Pietro H. Guzzi, Heng Huang, Meng Jiang, Ziynet Nesibe Kesimoglu, Mehmet Koyuturk, Jian Ma, Alexander R. Pico, Nataša Pržulj, Teresa M. Przytycka, Benjamin J. Raphael, Anna Ritz, Roded Sharan, Yang Shen, Mona Singh, Donna K. Slonim, Hanghang Tong, Xinan Holly Yang, Byung-Jun Yoon, Haiyuan Yu, Tijana Milenković

Department of Biomedical Informatics, Harvard University
Department of Computer Science and Engineering, University of Notre Dame
Lucy Family Institute for Data and Society, University of Notre Dame
Eck Institute for Global Health, University of Notre Dame
Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School
Department of Statistics, Federal University of Paraná
Department of Physics, Northeastern University
Department of Biomedical Informatics, University of Colorado
Department of Computer Science, Virginia Tech
Khoury College of Computer Sciences, Northeastern University
Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
Wisconsin Institute for Discovery
Aix Marseille Université, INSERM, MMG, Marseille, France
Department of Computer Science and Engineering, University of North Texas
Department of Mathematics, University of North Texas
Department of Computer Science, Tufts University
Morgridge Institute for Research
Pacific Northwest National Laboratory
Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro
Department of Computer Science, University of Maryland College Park
National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health
Department of Computer and Data Sciences, Case Western Reserve University
Computational Biology Department, School of Computer Science, Carnegie Mellon University
Institute of Data Science and Biotechnology, Gladstone Institutes
Department of Computer Science, University College London
ICREA, Catalan Institution for Research and Advanced Studies
Barcelona Supercomputing Center (BSC)
Department of Computer Science, Princeton University
Department of Biology, Reed College
School of Computer Science, Tel Aviv University
Department of Electrical and Computer Engineering, Texas A&M University
Lewis-Sigler Institute for Integrative Genomics, Princeton University
Department of Computer Science, University of Illinois Urbana-Champaign
Department of Pediatrics, University of Chicago
Computational Science Initiative, Brookhaven National Laboratory
Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University

(2024; 2024)

Abstract

Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These challenges stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology and highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on the future directions of network biology. Additionally, we offer insights into scientific communities, educational initiatives, and the importance of fostering diversity within the field. This paper establishes a roadmap for an immediate and long-term vision for network biology.

keywords:

Biological networks, algorithms, machine learning, multi-omics, network analysis

Acronyms used in this paper

3D: 3-dimensional
BKG: biomedical knowledge graph
CAFA: Critical Assessment of protein Function Annotation
CASP: Critical Assessment of protein Structure Prediction
CondBKG: condition-aware biomedical knowledge graph
DREAM: Dialogue on Reverse Engineering Assessment and Methods
EDI: equity, diversity, and inclusion
GDV: graphlet degree vector
GO: Gene Ontology
GNN: graph neural network
ISCB: International Society for Computational Biology
LLM: large language model
PPI: protein-protein interaction
TF: transcription factor
TGF$\beta$: transforming growth factor-beta

1. Introduction

A network (or graph) consists of a set of nodes (or vertices) that are connected by a set of edges (or links); see InfoBox 1. Networks allow us to study the properties of a complex system that emerge from interactions between its individual components. Networks have been a powerful way to represent a variety of real-world phenomena, including technological, information, transportation, social, financial, software, ecological, chemical, and biological systems Barabasi2016 ; newman2018networks . Our focus is on biological networks, which offer an understanding of complex functions at the levels of genes, proteins, cells, tissues, organs, etc., by representing a given biological system as an interconnected entity rather than a collection of individual components. In a biological network, nodes typically represent biomolecules (e.g., amino acid residues within a protein, proteins within a cell, or cells within a tissue), and edges typically indicate interactions between the biomolecules (e.g., physical, functional, or chemical). While the main focus of our paper is on such biological networks that model relationships between biomolecules, i.e., on molecular/cellular networks, our paper touches on other types of biological networks, such as biomedical knowledge graphs, ontologies, patient similarity networks modeling, e.g., electronic health record data, brain networks constructed from medical imaging data, and even social and contact networks relevant to the spread of disease. We acknowledge that other types of biological networks exist, such as ecological ones, that are not the focus of our paper and that we thus do not cover.

Network biology (Fig. 1) is an interdisciplinary field spanning computational (e.g., algorithms, graph theory, network science, data mining, and machine learning) and biological sciences. While the field has existed for nearly two decades, it has undergone numerous rapid changes and new computational challenges have arisen. This is caused by many factors, including increasing data complexity, such as multiple types of data becoming available at different levels (or scales) of biological organization, as well as growing data size. Ironically, despite the massive increase in available data, the data remain incomplete and noisy. This means that the research directions in the field also need to evolve.

This paper discusses the current state as well as the future of the field. Its goal is to identify pressing challenges with well-established as well as emerging topics in network biology, which are shown in Fig. 1: inference and comparison of biological networks (Section 2), multimodal data integration and heterogeneous networks (Section 3), higher-order network analysis (Section 4), machine learning on networks (Section 5), and network-based personalized medicine (Section 6). We comment on why these topics have been strategically chosen for discussion in this paper.

Noting again that a key focus of our paper is on molecular/cellular (i.e., -omics) data, certain types of -omics data are explicitly captured as networks. That is, interactions between biomolecules are measured explicitly by biotechnological data collection platforms. A prominent example is protein-protein interaction (PPI) networks. In these networks, nodes are proteins and edges correspond to physical bindings between the proteins. In human and some model organisms, extensive high-throughput yeast two-hybrid and other experimental efforts have resulted in large sets of “reference” PPIs (such as HURI for human), along with substantial knowledge about protein binding specificities luck2020reference ; stark2006biogrid .

Other types of -omics data are not captured as networks explicitly, but interactions between biomolecules can be inferred computationally, resulting in, e.g., association, correlation, regulatory, or knowledge graphs (InfoBox 1). Section 2 addresses several aspects of the task of inferring a homogeneous network, including a condition-specific network, typically from up to a couple of -omics data types/modes, along with a related topic of differential network analysis, which is one type of network comparison. Section 3 addresses the task of inferring a heterogeneous network, typically from diverse -omics or other multimodal data types (InfoBox 1), along with several other tasks related to multi-omics data integration, including network alignment, which is another type of network comparison. By a homogeneous network, we mean a network with a single node type and a single edge type, while by a heterogeneous network, we mean any non-homogeneous network (i.e., multiple node types or multiple edge types or both); see InfoBox 1 and Section 3 for details.

Given (explicitly captured or inferred) network data, the next step is to analyze the data. While Sections 2 and 3 already address network analysis from the perspective of network comparison and several other tasks, Sections 4 and 5 further discuss prominent tasks related to network analysis. Namely, Section 4 discusses capturing higher-order network structures called graphlets (subgraphs) in traditionally used pairwise graphs, which capture interactions between pairs of nodes, as well as shifting from pairwise graphs to hypergraphs, which are capable of capturing interactions between more than two nodes (InfoBox 1). Section 5 discusses advances in machine learning for network biology, an area that has grown exponentially in the last decade. Key topics discussed include graph representation learning, incorporating knowledge into machine learning models, generative graph modeling, and transfer learning.

Section 6 complements the other, computationally focused sections by discussing an applied aspect of network biology: network-based personalized (or precision) medicine. Precision medicine aims to provide tailored treatment strategies for individuals aronson2015building ; kaiser2015nih . This personalized characterization may include molecular, environmental, lifestyle, and other factors. Integrating such different data types via network approaches can expand the potential for precision therapeutics while providing robustness to various types of data noise wang2014similarity .

The five topics are not mutually exclusive. For example, multimodal (including multi-omics) data integration is a topic relevant to almost all of Sections 2-6. After current research advances in network biology are presented in these five sections, Section 7 discusses future research directions in the field, and Section 8 provides additional discussion on scientific communities, education/training, and diversity in computational (including network) biology.

{InfoBox*}

• A (pairwise, homogeneous) graph (or network) $\mathcal{G}=(\mathcal{V},\mathcal{E})$ is defined by a set of nodes (or vertices) $\mathcal{V}$ and a set of edges (or links) $\mathcal{E}$. All nodes $v\in\mathcal{V}$ are of the same type. An edge $e_{u,v}\in\mathcal{E}$ indicates a relationship between exactly two nodes $u,v\in\mathcal{V}$.

• In a protein-protein interaction (PPI) network, nodes are proteins and edges correspond to physical bindings between proteins. Such a network of physical PPIs is also referred to as interactome.

• A (physical) PPI network is a special type of an association network between proteins. In addition to physical PPIs, an association network may contain links between proteins derived from sequence or 3D structural similarities, genetic interactions, literature-mined edges, or other protein association types.

• Correlation networks are calculated from -omics data collected across multiple samples. A prominent type are gene co-expression networks, where nodes (genes) are linked by undirected edges if the genes’ expression levels are correlated strongly enough across the samples.

• Regulatory networks capture directed relationships between regulators and their targets and describe causal (rather than correlative) relationships between biomolecules. A prominent type are gene regulatory networks where the regulators are transcription factor proteins (or other molecules that impact gene expression, such as microRNAs) and the targets are genes.

• Biomedical knowledge graphs describe semantic relationships between diverse biomedical entities (e.g., genes, diseases, and patients, as well as associated measurements). They represent facts using “subject-predicate-object” triples as the fundamental unit; the subject and object are nodes in the graph and the predicate (or relation) corresponds to a directed edge between the nodes.

• A condition-unspecific (or context-unaware) network spans multiple conditions/contexts such as diseases, ages, cell types, tissues, etc., and ultimately, individuals.

• A condition-specific network is inferred by integrating a context-unaware network with condition-specific node measurement (e.g., gene expression or mutation) data. The outcome of the data integration is identification of network regions that are “active” in the given condition, which can be seen as condition-specific or disease-dysregulated pathways (sparse, tree-like subnetworks) or functional modules (dense, clique-like subnetworks).

• A heterogeneous graph contains multiple types of nodes and/or edges.

• A multiplex graph is a heterogeneous graph with multiple edge types between the same nodes, possibly nodes of a single type, in which case the heterogeneity comes from the different edge types.

• A network-of-networks is a heterogeneous graph in which different node types exist at different scales (or levels) and nodes at a higher level are graphs themselves at the lower level.

• Multimodal data that are represented as a heterogeneous graph in network biology include multi-omic data such as epigenomic, transcriptomic, proteomic, and metabolomic molecular measurements as well as non-molecular data such as text and images from e.g., patients’ electronic health records.

• A hypergraph is a generalization of a (pairwise) graph in which an edge (also called a hyperedge) can connect any number (including more than two) of the nodes.

• A subgraph (or subnetwork) $\mathcal{G}_{S}=(\mathcal{V}_{S},\mathcal{E}_{S})$ of a graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ consists of a set of nodes $\mathcal{V}_{S}\subseteq\mathcal{V}$ and a set of edges $\mathcal{E}_{S}\subseteq\mathcal{E}$ such that for each edge $e\in\mathcal{E}_{S}$, both of its end nodes must be in $\mathcal{V}_{S}$.

• A subgraph is induced if and only if all edges between the nodes in $\mathcal{V}_{S}$ that exist in $\mathcal{E}$ are in $\mathcal{E}_{S}$.

• Graphlets are connected, non-isomorphic, induced subgraphs of a (pairwise) graph.

• Hypergraphlets are graphlet extensions from (pairwise) graphs to hypergraphs.

• A cluster or community in a graph is a set of topologically related nodes, typically densely connected to each other and loosely connected to nodes in other clusters.

Basic terminology used in the paper. Note that distinct scientific communities in network biology, including graph theory, network science, data mining, machine learning, and artificial intelligence, may use varied terminology for the same concepts or identical terms for different concepts.
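The distinction between a subgraph and an induced subgraph in InfoBox 1 can be illustrated with a minimal Python sketch. The function name `induced_subgraph` and the toy graph are hypothetical and serve only as an illustration of the definition.

```python
def induced_subgraph(nodes, edges, subset):
    """Return the subgraph of (nodes, edges) induced by `subset`."""
    subset = set(subset) & set(nodes)
    # Keep every edge of the original graph whose two end nodes are both in the subset.
    kept = {frozenset(e) for e in edges if set(e) <= subset}
    return subset, kept


# Hypothetical toy graph: a triangle A-B-C plus a pendant node D attached to C.
V = {"A", "B", "C", "D"}
E = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
V_S, E_S = induced_subgraph(V, E, {"A", "B", "C"})
```

Dropping any one of the three triangle edges from `E_S` would still give a valid subgraph on the same nodes, but it would no longer be induced.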

Figure 1: Overview of the network biology field and five research topics discussed in this paper. The word cloud in the center, generated using WordClouds.com, contains the top 30 most representative words from this paper. Note that each word’s rank is based on the sum of the weights of the core word (e.g., learn) and its derived words (e.g., learns, learning, learned).

2. Inference and comparison of biological networks

Inference of a network from non-network data. Biological networks that are computationally inferred from non-network -omics data can be categorized into three broad types: association networks, correlation networks, and regulatory networks. The three network types are defined briefly in InfoBox 1, discussed in detail in the following text, and illustrated in Fig. 2A.

Association networks typically capture undirected and unsigned relationships between biological molecules; while they might contain experimentally derived interactions, they may also contain interactions derived computationally from a variety of possible data sources. Among the most common types of association networks are (physical) PPI networks, which are explicitly derived via high-throughput experiments (Section 1) luck2020reference . These experiments, primarily co-immunoprecipitation and yeast two-hybrid, differ in their estimated error rates and can produce both false positives and false negatives bader2004gaining ; sprinzak2003reliable ; von2002comparative . In addition, except for the yeast interactome, for which a substantial fraction of protein pairs has been assayed, the majority of protein pairs have not been tested for interaction, even in most model organisms sledzieski2021d . Thus, even across the myriad sources of PPI networks, much data is missing sledzieski2021d . In addition to physical PPIs, many public resources curate associations between biomolecules from many data sources bajpai2020systematic ; wright2024state . For example, the widely used STRING association network szklarczyk2023string contains interactions between proteins derived from sequence or 3D structural similarities, genetic interactions, literature-mined edges, or other types of pairwise protein associations that are distinct from physical binding between proteins.

In an association network composed of genetic interactions (also known as a genetic interaction network), an edge between nodes (genes/proteins) indicates that mutations or other perturbations to the two nodes produce an unexpected cellular phenotype baryshnikova2013genetic . An example of a genetic interaction is when mutations in both of the genes/proteins result in cell death, i.e., are lethal, while the cell remains viable when there is a mutation in just one of them. A weighted version of a genetic interaction network also exists, in which edge weights indicate how strong or weak the observed double mutant phenotype, such as cell growth rate, is compared to the expected phenotype costanzo2016global .
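As a minimal sketch of how such an edge weight could be computed, the Python snippet below scores a gene pair by the deviation of the observed double-mutant fitness from an expected value. The multiplicative null model (expected fitness equals the product of the two single-mutant fitness values) is an assumption made here for illustration, as are the function name and the fitness values.

```python
def genetic_interaction_score(f_a, f_b, f_ab):
    """Deviation of observed double-mutant fitness from the expected fitness."""
    # Assumed null model: independent single-mutant effects multiply.
    expected = f_a * f_b
    return f_ab - expected


# Hypothetical fitness values relative to wild type (1.0 = wild-type growth).
score_lethal = genetic_interaction_score(0.9, 0.8, 0.0)    # double mutant is inviable
score_neutral = genetic_interaction_score(0.9, 0.8, 0.72)  # double mutant grows as expected
```

Under this assumed model, a strongly negative score corresponds to the lethal double mutant described above, while a score near zero indicates no genetic interaction.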

Challenges with association networks are that they are generally not condition-specific and also contain interactions derived from multiple different types of evidence, with different evidence sources having different quality levels and representing different types of biological relationships. Additional investigation of how different evidence sources influence network analysis results is often required Kim2021HumanNetv3 . Although the biological relationships represented in PPI networks and genetic interaction networks are easier to interpret, these networks tend to be incomplete and noisy and only exist for a limited number of species and biological conditions, limiting their use rolland2014proteome ; Zitnik2019Evolution .

Correlation networks are typically calculated from -omics data collected across multiple samples (time points, tissues, patients, ages, drugs, or other conditions); relationships in correlation networks are typically undirected and signed, depending on how the network is inferred. Among the most prominent types of correlation networks are gene co-expression networks. Namely, given transcriptomics data containing the expression (i.e., mRNA abundance) levels of genes across multiple samples, a gene co-expression network can be constructed by linking nodes (genes) via edges if the genes’ expression levels are correlated strongly enough across the samples. In addition to being used to capture gene co-expression, correlation networks have been applied in biomedicine to study relationships between many other types of elements, such as metabolites perez2020network , disease biomarkers chu2014analyzing ; huang2019network ; nishihara2017biomarker , and even foods kim2015uncovering ; samieri2020using . Correlation networks are widely used in biomedical applications due to their simplicity and the ease with which they can be generated and interpreted huang2019network ; lee2021changes ; pierson2015sharing ; samieri2020using . Pearson correlation is the most common measure for calculating correlation networks, i.e., determining which gene pairs should be linked by edges, although other measures, such as Spearman correlation or mutual information, are also used, depending on the nature of the data and nonlinearity of the relationships being captured reshef2011detecting . 
Multiple algorithms and tools have been developed for inferring correlation networks, including ARACNe margolin2006aracne , which calculates the mutual information between pairs of nodes and then removes indirect relationships; CLR faith2007large , which calculates the mutual information between pairs of nodes and then $z$ -score normalizes; WGCNA zhang2005general , which scales the Pearson correlation to generate a scale-free network topology (or network structure); and wTO gysi2018wto , which normalizes the chosen correlation by all other correlations and calculates a probability for each edge.
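The basic construction of a gene co-expression network described above can be sketched in a few lines of Python. This is a plain Pearson-correlation threshold, not an implementation of ARACNe, CLR, WGCNA, or wTO; the function name, the threshold of 0.8, and the toy expression matrix are assumptions chosen for illustration.

```python
import numpy as np


def coexpression_edges(expr, genes, threshold=0.8):
    """Link genes whose expression profiles have |Pearson r| >= threshold.

    expr is a genes x samples array; returned edges carry the signed correlation.
    """
    corr = np.corrcoef(expr)  # each row is treated as one variable (gene)
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            r = corr[i, j]
            if abs(r) >= threshold:
                edges.append((genes[i], genes[j], float(r)))
    return edges


# Hypothetical expression levels of four genes across five samples.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],  # g1
    [2.1, 3.9, 6.2, 8.0, 9.9],  # g2: rises with g1
    [5.0, 4.1, 2.9, 2.0, 1.1],  # g3: falls as g1 rises
    [3.0, 1.0, 4.0, 1.0, 5.0],  # g4: only weakly related to the others
])
genes = ["g1", "g2", "g3", "g4"]
edges = coexpression_edges(expr, genes)
```

Replacing `np.corrcoef` with a rank-based or mutual-information measure changes which edges pass the threshold, which is one source of the method-dependent differences discussed next.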

One advantage of correlation networks compared to association networks, especially PPI networks resulting from high-throughput experiments, is that correlation networks are explicitly derived from condition-specific -omics data, while association networks generally do not capture condition-specific information sonawane2019network . However, despite their popularity, correlation networks have multiple known limitations. One limitation is difficulty translating to biological mechanisms larsen2019coli . Another limitation is that different network inference methods yield significant dissimilarities in the topology as well as functional content between the resulting correlation networks rider2014 . For example, when multiple methods are applied to infer gene co-expression networks based on the same underlying data, the resulting networks tend to capture different sets of edges between the same nodes; furthermore, when those networks are used to predict genes’ functional annotations such as Gene Ontology (GO) terms, the results often differ li2023enhancing . Sometimes it might be helpful to combine networks inferred using different methods into a consensus network gysi2018wto ; li2023enhancing , where edges are re-weighted so that the more networks support an edge and the more strongly they support it, the higher its consensus weight or probability. A further limitation of gene co-expression networks is that co-expression between two genes occurs when one gene regulates another or when two genes are targeted by the same regulator ku2012interpreting ; yin2021emergence . However, these two distinct biological scenarios are represented the same way in a co-expression network, by linking the two genes with an undirected edge. Instead, regulatory networks can distinguish between the different scenarios, as discussed next.
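One simple way to build such a consensus network is to average each edge's weight over the input networks, counting an edge that is absent from a network as weight zero. The sketch below assumes this averaging scheme, which is not necessarily the one used by the cited methods; the function name and the toy networks are hypothetical.

```python
from collections import defaultdict


def consensus_network(networks):
    """Average edge weights over several networks inferred from the same data.

    Each network is a dict mapping a (u, v) pair to a weight in [0, 1].
    """
    totals = defaultdict(float)
    for net in networks:
        for (u, v), w in net.items():
            totals[frozenset((u, v))] += w  # undirected: (u, v) equals (v, u)
    # An edge missing from a network contributes 0 to its average.
    return {edge: total / len(networks) for edge, total in totals.items()}


# Hypothetical outputs of three inference methods on the same expression data.
net_a = {("g1", "g2"): 0.9, ("g2", "g3"): 0.6}
net_b = {("g2", "g1"): 0.7}
net_c = {("g1", "g2"): 0.8, ("g1", "g3"): 0.3}
consensus = consensus_network([net_a, net_b, net_c])
```

With this scheme, the edge between g1 and g2, which all three methods support strongly, receives a higher consensus weight than edges reported by a single method.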

Regulatory networks capture directed relationships between regulators and their targets and describe causal (rather than just correlative) relationships between biomolecules; although these networks in theory should be signed, in practice deriving the sign of regulatory relationships from high-throughput biological data is challenging. There are many types of regulatory networks in biology. However, for most inferred regulatory networks, the regulators are transcription factor (TF) proteins (or other molecules that impact gene expression such as microRNAs) and the targets are genes; these are commonly referred to as gene regulatory networks. There are many approaches to infer gene regulatory networks. For example, TF-gene relationships can be measured experimentally through ChIP-sequencing. In this case, the presence of a TF binding in the regulatory region(s) of a gene can be used to infer an edge from that TF to the gene. However, the cost and experimental limitations make it impossible to infer a complete gene regulatory network in this way. Therefore, many computational approaches have been developed to infer gene regulatory networks. For example, the DNA sequence of gene regulatory regions can be scanned to identify matching patterns (known as sequence motifs) that indicate a potential TF binding site; however, linking TFs to genes based on DNA sequence alone does not give a condition-specific network. Thus, methods to infer gene regulatory networks typically use gene expression data, either alone or in combination with computational evidence for TF binding in gene promoters, to infer TF-gene relationships marbach2012wisdom . 
Popular algorithms of this type include Inferelator bonneau2006inferelator , which uses linear regression, L1 shrinkage, and LASSO to identify a set of parsimonious models to predict target gene expression levels from TF expression levels (and other factors); GENIE3 huynh2010inferring , which uses tree-based ensemble methods to develop a set of regression problems that predict the expression pattern of each target gene from the expression of a set of input TF genes; and PANDA glass2013passing , which uses message passing to amplify consistent structures across three input data types: TF-TF PPIs, computationally inferred TF-gene relationships, and gene-gene co-expressions. As opposed to Inferelator and GENIE3, PANDA does not consider the expression levels of TFs but instead uses evidence of co-expression in genes as evidence of targeting by the same TF. In contrast, a recent method, NETREX-CF, incorporates, among other techniques, a machine learning approach known as collaborative filtering to deal with missing data wang2022netrex .
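The regression view of gene regulatory network inference can be sketched as follows: the expression of one target gene is regressed on the expression of candidate TFs with an L1 penalty, and TFs with nonzero coefficients are proposed as regulators. This is a generic LASSO solved by proximal gradient descent, not the implementation of Inferelator or any other cited tool; the function name, the penalty value, and the synthetic data (which are roughly zero-mean, so no intercept is fitted) are assumptions.

```python
import numpy as np


def lasso_ista(X, y, lam=0.1, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by proximal gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Soft thresholding sets small coefficients exactly to zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w


# Synthetic data: 200 samples, 5 candidate TFs, target driven by TF0 only.
rng = np.random.default_rng(0)
tf_expr = rng.normal(size=(200, 5))
target = 2.0 * tf_expr[:, 0] + 0.1 * rng.normal(size=200)
weights = lasso_ista(tf_expr, target)
regulators = [f"TF{j}" for j in np.flatnonzero(np.abs(weights) > 1e-6)]
```

Repeating this regression for every target gene and collecting the selected TFs yields a directed TF-gene network; the sign of each coefficient gives a tentative sign for the edge.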

Other methods to infer regulatory networks incorporate epigenetic data. In particular, chromatin state can indicate whether the DNA is “open” and available to be bound by a TF; thus, computational evidence for TF binding in gene regulatory regions that also overlap with open chromatin can be used to estimate cell type-specific networks neph2012circuitry . Specific algorithms to infer gene regulatory networks using epigenetic data include TEPIC schmidt2017combining ; schmidt2019tepic , which combines TF binding affinities, chromatin state data, and gene annotation data to predict TF-gene relationships, and SPIDER sonawane2021constructing , which uses message passing to infer and amplify consistent structure in an epigenetically-pruned gene regulatory network constructed by combining computational evidence for TF binding with open chromatin data. Both TEPIC and SPIDER can also (optionally) incorporate gene expression data. Despite multiple methods in this area (including many beyond those described here), it remains challenging to integrate multiple types of -omics data to effectively infer accurate condition-specific regulatory networks; we elaborate on this challenge in subsection “Inference of a heterogeneous network from multimodal data” of Section 3.

Link prediction: inference of new interactions from existing network data. Link prediction is applicable to any network type, but in network biology, it has prominently been used in association networks containing interactions between proteins. Regardless of the type of data used to construct an association network, the resulting network is often incomplete. For example, many pairs of proteins in an organism may have yet to be assayed for physical interaction. However, the “guilt by association” principles that underlie the topological organization of most of these networks cowen2017network mean that the patterns of connection of existing links can reliably predict some of the missing edges. We refer to this as network-based link prediction. Network-based prediction of new interactions between proteins often uses either a relatively simple rule (e.g., it may be desirable to link nodes that have high degrees, that have many common interacting partners, or neighbors, either direct or extended ones, that share many paths, or that are topologically similar Hulovatyy2014 ) or more sophisticated diffusion-based network embeddings cocskun2021node ; cowen2017network ; devkota2020glide ; hamilton2018embedding ; huang2020skipgnn ; kovacs2019network ; Yuen2020better . A mixture of these strategies, where simple rules are employed in the core of the network and diffusion-based network embeddings are employed outside the core, performs particularly well. However, the set of rules and the embedding used matter devkota2020glide , especially because interaction patterns may be quite different in networks containing physical PPIs vs. those containing inferred, non-physical associations between proteins.
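One of the simple rules mentioned above, scoring unlinked node pairs by their number of common neighbors, can be sketched as follows. The function name and the toy PPI network are hypothetical, and the sketch does not cover the diffusion-based embeddings used by the cited methods.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Rank unlinked node pairs by their number of shared neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:  # consider only pairs without an existing edge
            shared = len(neighbors[u] & neighbors[v])
            if shared > 0:
                scores[(u, v)] = shared
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical PPI network with five proteins.
ppi = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4"), ("P4", "P5")]
ranked = common_neighbor_scores(ppi)
```

The top-ranked pairs are candidate missing interactions; in practice such scores would be evaluated against held-out edges before being trusted.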

Link prediction: other techniques to infer missing interactions. Beyond methods that leverage the topology of the known interactions, other methods to infer missing interactions vary based on the underlying type of protein association data used to construct the network. For example, for physical PPI prediction, classical techniques such as docking can also be used when protein 3D structural models are available. With the rise of deep learning methods such as AlphaFold jumper2021highly , ESMFold lin2022language , and OmegaFold wu2022high , a 3D structural model is now available for most proteins. AlphaFold-Multimer evans2021protein is a recent deep learning-based extension of AlphaFold that allows for predicting protein complexes, i.e., the quaternary structure of multiple proteins; it might then be possible to use the confidence score of the predicted structure to predict whether the proteins interact or not. The predicted quaternary structure also provides the interaction interfaces between the proteins.

When the goal is ultra-fast prediction (for example, in order to perform genome-wide scans), there are alternative deep learning methods chen2019multifaceted ; hashemifar2018predicting ; sledzieski2021d ; zhang2018predicting that have had success in sequence-based prediction of PPIs. These methods focus on computational speed. That is, like the network-based methods, they seek to predict only whether (rather than also how, which is more challenging) two protein sequences interact, so that it is tractable to make predictions for all the protein pairs in the network. However, we note that some of these sequence-based methods manage to implicitly incorporate information about protein 3D structures. For example, D-SCRIPT sledzieski2021d uses a pretrained protein language model bepler2021learning and implicitly learns a fuzzy contact map representation.

How to simultaneously leverage network- and sequence-based link prediction for physical PPI data remains an open problem, with valuable initial work bepler2021learning . Also, evaluating link prediction methods and especially hybrid methods is tricky. This is because existing ground-truth networks (other than HURI luck2020reference ) are biased by the portions of the networks containing well-studied proteins and pathways Schaefer2015correcting . So, it is difficult to come up with fair performance measures that are not biased by node degrees, and that do not advantage network-based methods while disadvantaging sequence-based methods Singh2022TopsyTurvy ; Wang2023assesment . On the other hand, sequence-based approaches do better on close homologs of known interacting protein pairs Sledzieski2023TT3D .

Other researchers have noted that databases that amalgamate physical PPI data have not always kept up with the literature, and have proposed text-mining approaches to predict these “missing” links kim2008pie ; papanikolaou2015protein ; van2009novel .

Inference of a condition-specific network. While existing biological network data resulting from extensive experimental efforts are an incredible resource, they typically do not capture how interactions in biological networks differ across conditions, i.e., they are context-unaware. By conditions, we mean diseases, ages, cell types, tissues, etc., and ultimately, individuals. Indeed, while human genomes in both healthy and disease populations are rapidly being sequenced, the corresponding condition-specific networks remain largely unknown. Moreover, the substantial amount of genetic variation across populations makes it infeasible in the near term to experimentally determine the full impact of this variation on interactions. So, computational methods have played and will continue to play a major role in inferring condition-specific networks.

We divide computational approaches for inferring condition-specific networks into four broad categories. (1) The first category comprises approaches that assess whether mutations observed in disease alter protein interactions. (2) The second category comprises approaches that combine mutation data (e.g., on how many patients with a disease have genes containing significantly associated single nucleotide polymorphisms, indels, etc.) or condition-specific gene expression data (e.g., information on which genes are significantly expressed – or active – in a given condition; here, typically multiple data samples are needed per condition) with a PPI network. The goal is to identify PPIs that are dysregulated in a given disease or active in a given condition, i.e., to infer a condition-specific PPI network (Fig. 2C). (3) The third category comprises approaches that use gene expression data to infer a correlation network specific to the condition or sample of interest. (4) The fourth category comprises analogs of the previous approaches but applied to regulatory networks rather than PPI or correlation networks.

Regarding the first approach category, significant computational efforts have focused on characterizing whether mutations observed in disease and variants across populations alter protein interactions. Early work mapping mutations observed in Mendelian diseases onto protein structures demonstrated that there is a statistically significant enrichment of Mendelian disease mutations in protein interaction interfaces, as compared to neutral polymorphisms observed across populations gao2015insights . Homology modeling and domain-based approaches to identify sites that participate in interactions with DNA, RNA, peptides, ions, and small molecules have revealed that missense mutations observed in Mendelian diseases and somatic missense mutations in cancer are both enriched in these sites, with the strongest enrichments for DNA-binding sites, while common variants are depleted from these sites ghersi2014interaction ; kobren2019systematic . Further, these enrichments can be leveraged to identify cancer-relevant genes by developing statistical approaches to uncover proteins with more somatic missense mutations in their binding sites than expected ghersi2014interaction ; kobren2020pertinint . Protein interaction interfaces, as identified by homology modeling mosca2013interactome3d and machine learning meyer2018interactome , have also been shown to be enriched in somatic missense mutations as compared to non-interface residues, and specific protein interactions relevant for cancer have been identified cheng2021comprehensive . High-throughput experimental screens have led to estimates that two thirds of disease-causing polymorphisms perturb protein interactions, with about half of these interrupting specific protein interactions while leaving other interactions unaffected sahni2015widespread .

Regarding the second approach category, numerous computational efforts have focused on integrating condition-specific molecular measurements, mainly gene mutation or expression data (also referred to as gene activity data), with PPI network data (which is generally not condition-specific, i.e., is context-unaware). They do so by mapping the gene activities onto the corresponding proteins in the PPI network, in order to assign condition-specific weights to the proteins or PPIs (or both) in the network (Fig. 2C). Then, highly weighted PPI network regions are hypothesized to be pathways dysregulated in disease (if using mutation data) or condition-specific subnetworks (if using expression data) Leiserson2015 ; newaz2020improving . The set of all such PPIs/pathways/subnetworks is a condition-specific PPI network. The data integration step is often performed via network propagation cowen2017network , which diffuses the gene activities through the PPI network via random walks. Nonetheless, other approach types exist such as kernel, Bayesian, or non-negative matrix factorization methods newaz2020improving .
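
To make the network propagation step concrete, here is a minimal sketch of a random walk with restart that diffuses gene activity scores over a PPI network. It is not the implementation of any cited method: the function name, the restart probability of 0.5, the fixed number of iterations, and the toy gene names are all assumptions chosen for illustration.

```python
def propagate(edges, seed_scores, restart=0.5, n_iter=100):
    """Diffuse node scores over an undirected network via random walk with restart."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one network node needs a positive seed score")
    # Normalize the input gene activities so that they sum to 1
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {}
        for n in nodes:
            # Each neighbor m passes an equal share of its score to its neighbors
            incoming = sum(p[m] / len(adj[m]) for m in adj[n])
            nxt[n] = (1 - restart) * incoming + restart * p0[n]
        p = nxt
    return p

# Toy path network g1 - g2 - g3 - g4 (hypothetical genes); only g1 is "active"
path_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
scores = propagate(path_edges, {"g1": 1.0})
for gene in sorted(scores):
    print(gene, round(scores[gene], 4))
```

The propagated scores decay with network distance from the active gene, so proteins close to many active genes receive high condition-specific weights. Highly weighted regions can then be extracted as candidate condition-specific subnetworks.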

Prominent applications of approaches from the second category have been studying cancer Leiserson2015 ; silverbush2019simultaneous , tissue-specificity Basha2020 , aging li2022towards , and genome-wide associations carlin2019fast ; vanunu2010associating . As an example, cancer-related gene mutation data was integrated with PPI data using the HotNet2 algorithm to identify the parts of the PPI network that are likely to be active in cancer Leiserson2015 . Such a cancer-specific network is not necessarily connected, i.e., it might consist of multiple connected components, each of which can be thought of as a cancer-specific pathway or subnetwork. As another example, a general framework was proposed for assessing the ability of condition-specific PPI network inference approaches to illuminate tissue-specific processes and disease genes Basha2020 . This framework integrated RNA-sequencing profiles for 34 human tissues with a PPI network to create 34 tissue-specific PPI networks. Here, all tissue-specific PPI networks contained the same nodes and interactions, and they differed “only” in the weights associated with them. Then, given data associating GO biological processes to their relevant human tissues, this framework allows different condition-specific PPI network inference approaches to be benchmarked via enrichment tests in terms of their ability to recover tissue-specific processes. As a final example, unlike in the above applications where the inferred cancer- and tissue-specific networks were static, when studying human aging, which is a dynamic biological process, it is desired to infer a dynamic aging-specific network. Of the pioneering approaches towards this goal li2020supervisedTCBB ; newaz2020improving ; li2021improved ; li2022towards , a recent finding is that inferring an aging-specific PPI network that is both weighted and dynamic (as opposed to unweighted or static) results in the most accurate prediction of aging-related genes li2021improved . 
To infer this network, network propagation was used to map gene expression-based weights at different ages onto nodes in a PPI network. This resulted in a weighted network snapshot for each age, where the different snapshots had the same nodes and PPIs and “only” differed in their age-specific weights. The collection of all age-specific snapshots formed a weighted dynamic aging-specific PPI network. Then, aging-related genes can be predicted from this network, as discussed below li2021improved ; li2022towards .

An important issue in identifying condition-specific networks and especially disease-altered subnetworks via the above approaches is to determine whether the resulting (sub)networks are due to the molecular measurements (i.e., mutation or expression data) alone, the PPI network topology alone (e.g., due to ascertainment bias in PPI network data), or genuinely a combination of both molecular measurement and network data. Recent work has shown that in some applications there may be a narrow regime where both molecular data and network information contribute to the identification of disease-dysregulated subnetworks chitra2022netmix2 ; reyna2021netmix .

Regarding the third approach category, condition-specific correlation networks are most often derived by applying a correlation measure to subsets of related samples pierson2015sharing . However, since correlation measures rely on defining a distribution, this approach is inappropriate when a specific condition is represented by only a few samples or even a single sample. To address this limitation, methods have recently been developed to infer “sample-specific correlations”. That is, given a set of gene expression samples (across which correlation can be measured), these approaches can estimate one network for each individual sample in the input dataset. In particular, both SSN liu2016personalized and LIONESS kuijjer2019lionessr ; kuijjer10estimating work by computing two correlation networks, one with all samples and one with all samples except an individual sample of interest. Then, they use the difference between the two networks to estimate a correlation network specific to the sample of interest.
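
The leave-one-out idea can be sketched as follows. The snippet computes a Pearson correlation network from all samples and from all samples except sample q, and then combines the two with a linear interpolation in the style of LIONESS. The scaling of the difference by the number of samples is our reading of the LIONESS estimate and should be checked against kuijjer10estimating ; SSN derives its estimate from the difference in a different way. All function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def correlation_network(expr):
    """expr: {gene: [value per sample]} -> {(gene_i, gene_j): Pearson r}."""
    genes = sorted(expr)
    return {(g, h): pearson(expr[g], expr[h])
            for i, g in enumerate(genes) for h in genes[i + 1:]}

def sample_specific_network(expr, q):
    """Estimate the network of sample q from all-sample and leave-one-out networks."""
    n = len(next(iter(expr.values())))
    full = correlation_network(expr)
    loo = correlation_network({g: v[:q] + v[q + 1:] for g, v in expr.items()})
    # Assumed LIONESS-style interpolation: n * (all - without q) + without q
    return {e: n * (full[e] - loo[e]) + loo[e] for e in full}

# Toy data: two hypothetical genes that are perfectly correlated except in sample 4
expr = {"g1": [1.0, 2.0, 3.0, 4.0, 5.0], "g2": [1.0, 2.0, 3.0, 4.0, -5.0]}
net = sample_specific_network(expr, 4)
print(net)
```

Because the estimate is an extrapolation rather than a correlation computed on one sample, its values need not lie in the interval from -1 to 1, as the strongly negative weight for the outlying sample in this toy example shows.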

Finally, regarding the fourth approach category, genetic variants can impact gene regulatory networks by, for example, altering TF binding or allele-specific expression przytycki2020differential . Recall that missense mutations are enriched in sites that participate in interactions with DNA, RNA, peptides, ions, and small molecules, with the strongest enrichments for DNA-binding sites ghersi2014interaction ; kobren2019systematic . Also, recall that statistical approaches to identify proteins with more somatic missense mutations in their binding sites than expected by chance have identified cancer-relevant genes kobren2020pertinint ; kobren2019systematic . Deep learning approaches trained on DNA binding data from ENCODE encode2020expanded have also been used to assess whether DNA mutations impact TF binding in a tissue-specific manner Zhou2015 . For some TFs, altered DNA-binding specificities can be predicted de novo using machine learning christensen2012recognition ; persikov2014novo ; sahni2015widespread ; wetzel2022learning . However, if a DNA-binding protein’s specificity is known a priori, then it is more accurate to instead predict how mutations alter that specificity rather than predict specificities de novo. For example, accurate predictions about how mutations alter DNA-binding specificities for homeodomain proteins were made by simultaneously learning interaction interfaces between DNA-binding proteins and their binding sites together with a predictive approach for DNA-binding specificity wetzel2022learning . Extending this approach to all DNA-binding proteins represents an important avenue for future work.

There has also been significant work on inferring condition-specific regulatory networks from various types of -omics data, as has been extensively reviewed in baur2020data . As one example, PANDA was applied to subsets of GTEx gene expression data to infer 38 tissue-specific gene regulatory networks sonawane2017understanding ; then, it was found that changes in TF targeting patterns led to the creation of new regulatory paths, giving the TFs transcriptional control of tissue-specific processes. There also exist approaches that can be used to infer sample-specific networks for different -omics data types. For example, EGRET integrates predicted TF binding sites with genotype and expression quantitative trait loci data to create individual genotype-specific regulatory networks weighill2022predicting . The SPIDER sonawane2021constructing and TEPIC schmidt2017combining ; schmidt2019tepic methods (described above) can be applied to individual epigenetic profiles to generate sample-specific regulatory networks. PSIONIC learns patient-specific TF regression weights by using chromatin-filtered TF-gene relationships to predict gene expression. Finally, the LIONESS method kuijjer10estimating can be used together with existing gene regulatory network reconstruction approaches that leverage gene expression data. Applied in the same way as already described for correlation networks (the third approach category above), the LIONESS framework uses two estimated gene regulatory networks, one inferred with all gene expression samples and one inferred with all samples except one, to estimate a gene regulatory network specific to that sample kuijjer10estimating .

Differential network analysis: comparison of condition-specific networks. Condition-specific networks often have the same set of nodes and differ only in terms of their edges. Many approaches have been developed to identify network regions that differ the most between condition-specific networks; such regions have been shown to be responsible for the underlying biological differences between e.g., healthy and disease conditions, between different tissues, or between young and old ages basha2018differentialnet ; lichtblau2017comparative , as discussed in more detail below. In general, approaches for this task can be characterized in several ways.

One category is based on the stage of network analysis, i.e., when differences between condition-specific networks are measured. Given condition-specific networks, one option is to first compute some topological property of a network region (at the level of a node, edge, network cluster – group of highly interconnected nodes – or entire network; see below) in each condition-specific network and then measure the extent of change in that property across the networks/conditions; the goal is to identify network regions that change the most lichtblau2017comparative ; zhu2016metadcn . By a topological property, we mean a quantifiable measure of network structure such as the degree distribution of a network (the percentage of nodes in the network that have a given number of neighbors, i.e., degree), or centrality measures that rank nodes in a network from most to least central/important (examples are degree centrality according to which nodes with high degrees are central, and betweenness centrality according to which nodes that are on many shortest paths are central) Barabasi2016 ; newaz2019bookchapter ; newman2018networks .

A potential issue is that some topological properties, and especially centrality measures, are meaningful when used within a network but not necessarily when compared across networks newman2018networks . As an alternative, approaches exist that first use the condition-specific networks to infer a single differential network that intuitively captures edges that differ between the conditions (Fig. 2D); only then, a desired topological property (e.g., centrality of each node) in the differential network is computed to identify network regions that are the most relevant (e.g., central/important) for the underlying condition-specific differences ruan2015differential .

The other category is based on the level of topology, i.e., where differences between condition-specific networks are measured: at the node weighill2021gene , edge glass2015network , cluster padi2018detecting , or entire network level newaz2020improving . At the node level, differences in centrality (e.g., degree or betweenness) are often used to identify the biomolecules around which network connectivity varies the most between the compared conditions. For example, “differential targeting,” i.e., the difference in gene targeting – or the sum of the weights for all incoming edges to a gene – between two gene regulatory networks was used in combination with standard gene set enrichment tools to identify over-represented biological processes in pancreatic ductal adenocarcinoma subtypes weighill2021gene . At the edge level, the goal is typically to determine edges specific to a given condition. This can be done in multiple ways, by taking, for example, a certain percentage of the highest-weight edges, all edges above a given threshold, edges that have higher weights in one condition compared to others sonawane2017understanding , or a combination of these glass2015network . For example, the tissue-specific PPI networks discussed above, which were defined by differential edge scores, were correctly enriched in their respective tissue-associated biological processes; also, when the top 1% of the differential edges were considered, the resulting differential network regions were correctly enriched in genes related to diseases associated with their respective tissues Basha2020 . Linking this discussion to the first approach category described above, it is important to note that although node centralities are often determined for each condition-specific network and then compared across the networks, they can also be calculated for a network defined by condition-specific edges. 
For example, degree and betweenness centralities of all genes in 38 tissue-specific gene regulatory networks were used to show that tissue-specific genes tended to assume bottleneck positions in their corresponding networks; in parallel, tissue-specific edges were identified by comparing the weight of each edge in a given tissue to the distribution of that edge’s weight across all tissues, and it was found that the tissue-specific edges were enriched for connections between tissue-specific genes and depleted for canonical interactions sonawane2017understanding . At the cluster level, for example, given two condition-specific networks, ALPACA padi2018detecting identifies clusters that are shared between networks and distinct to each network. Heterogeneous (specifically, multiplex; Section 3) clustering algorithms mucha2010community could also perhaps be useful for identifying such clusters. At the level of entire networks, typically their pairwise edge overlaps, as measured by e.g., the Jaccard index, are used to quantify their pairwise (dis)similarities newaz2020improving .
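
Two of the measures discussed above are simple enough to sketch directly: node-level differential targeting, i.e., the difference in the sum of incoming edge weights of a gene between two regulatory networks, and network-level edge overlap measured by the Jaccard index. The dictionary-based network representation, the function names, the edge-weight threshold of 0.5, and the toy TF and gene names are assumptions for illustration only.

```python
def in_degree_targeting(network):
    """network: {(regulator, gene): weight} -> {gene: sum of incoming edge weights}."""
    targeting = {}
    for (_, gene), weight in network.items():
        targeting[gene] = targeting.get(gene, 0.0) + weight
    return targeting

def differential_targeting(net_a, net_b):
    """Per-gene difference in targeting between condition A and condition B."""
    ta, tb = in_degree_targeting(net_a), in_degree_targeting(net_b)
    return {g: ta.get(g, 0.0) - tb.get(g, 0.0) for g in set(ta) | set(tb)}

def edge_jaccard(net_a, net_b, threshold=0.5):
    """Jaccard index of the edge sets whose weights exceed a threshold."""
    ea = {e for e, w in net_a.items() if w > threshold}
    eb = {e for e, w in net_b.items() if w > threshold}
    union = ea | eb
    return len(ea & eb) / len(union) if union else 1.0

# Toy condition-specific regulatory networks (hypothetical TFs and genes)
net_a = {("TF1", "g1"): 0.9, ("TF2", "g1"): 0.8, ("TF1", "g2"): 0.2}
net_b = {("TF1", "g1"): 0.9, ("TF2", "g1"): 0.1, ("TF1", "g2"): 0.7}

diff = differential_targeting(net_a, net_b)
overlap = edge_jaccard(net_a, net_b)
print(diff, overlap)
```

Genes with large absolute differential targeting can be passed to gene set enrichment tools, while the Jaccard index gives a single (dis)similarity value per pair of networks.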

We comment on two additional aspects of differential network analysis. First, while some condition-specific networks are derived from multiple data samples, sample-specific networks have the additional benefit of being able to be compared while accounting for other potentially relevant biomedical information kuijjer10estimating . For example, the same statistical tools employed for differential gene expression analysis can be used to determine significant changes in the node-, edge-, cluster-, and network-level topological properties between sets of sample-specific networks. Importantly, this allows topological properties to be evaluated in the context of relevant biological and phenotypic variables, as well as potential confounders. For example, limma ritchie2015limma was applied to compare features between male and female sample-specific gene regulatory networks while controlling for relevant confounders such as body mass index and age; node, edge, and TF-targeting patterns specific to males and females were identified across 29 different tissues lopes2020sex , as was sex-specific targeting of the drug metabolism pathway in colon cancer lopes2018gene .

Second, while the above discussion applies to all condition types, including temporal ones, we explicitly wish to comment more on approaches for characterizing how networks change over time teschendorff2021statistical . A prominent application in this context has been studying the change of PPI network topology with age. The process of inferring an aging-specific PPI network has already been discussed above. Here, we comment on how such a network, consisting of network snapshots corresponding to different ages, is analyzed. Original studies asked whether the overall, or global, topology changed with age, by: measuring pairwise edge overlaps between the snapshots; evaluating whether the snapshots’ properties such as the average clustering coefficient, diameter, and graphlet degree distributions changed with age; and evaluating the fit of each snapshot to random (e.g., scale-free or geometric) graphs faisal2014dynamic ; newaz2020improving . Global topologies of the age-specific snapshots did not significantly change with age. It was then analyzed whether local topological positions of nodes, as measured by (normalized) centralities, changed with age. Hundreds of genes whose local topological positions did change with age were identified and predicted as aging-related; the predictions were validated via functional enrichment analyses faisal2014dynamic ; newaz2020improving .

Unlike such unsupervised prediction of aging-related genes, in recent work li2021improved ; li2022towards , supervised prediction was performed: by relying on knowledge about which genes are aging- versus non-aging-related Magalhaes2009a , new aging-related genes were predicted if their evolving topologies in a dynamic aging-specific PPI network matched topologies of the known aging-related genes. Recall that the state-of-the-art aging-specific dynamic PPI network is weighted. So, weighted node topological measures that were simple extensions of unweighted centralities were used as features for supervised prediction. Also, more advanced measures were proposed, which account for how the distribution of edge weights in the given node’s (extended) network neighborhood changes with age, i.e., across the network snapshots li2021improved . A parallel line of work focused on studying how clusters, i.e., community structure, in a dynamic aging-specific human PPI network changed with age, and it was shown that the most prominent changes in the community structure correspond to ages that reflect known shifts from one stage of human lifespan to another ClueNet ; SCOUT .

Another prominent point of discussion in the temporal/dynamic context concerns theoretical studies of molecular networks and observations of cell differentiation (i.e., the transition of a cell from one type to another), which indicate that cellular transitions can be smooth or nonlinear, gradual or abrupt moris2016transition ; nykter2008gene . Computational methods to characterize these transitions using single-cell gene expression data include MuTrans zhou2021dissecting , QuanTC sha2020inference , and BioTIP yang2022detecting . These methods use different statistical approaches (stochastic differential equations, unsupervised learning of cell plasticity, or co-expression) and underlying theories (entropy and energy or tipping-point theory), but converge on the same best-studied bifurcations in six datasets yang2022detecting .

Other types of network comparison. Differential network analysis is one type of network comparison, in which networks being compared have the exact same nodes and differ “only” in their edges (or edge weights). In other words, the mapping between the nodes of the compared networks is known. A complementary category of network comparison includes approaches that compare networks when their node mapping is unknown. Here, there are two distinct types: (1) network alignment or alignment-based network comparison and (2) alignment-free network comparison Yaverouglu2015 .

Alignment-based network comparison aims to find a mapping between the nodes of the compared networks that optimizes some objective function; this typically means conserving many edges and a large subgraph between the networks faisal2015post ; Guzzi2017 ; Yaverouglu2015 . This approach category is useful for comparing biological networks of different species to identify evolutionarily conserved parts of the networks. Consequently, network alignment allows for transferring biological knowledge (e.g., proteins’ functional annotations or PPIs) between aligned network regions across the compared species; also, it can complement sequence alignment by allowing for identification of protein orthology relationships based on the proteins’ PPI network rather than (just) sequence similarities. Note that even when aligning homogeneous networks, the problem of network alignment can be viewed as integrating these networks into a heterogeneous (specifically, multiplex; Section 3) network representation. For this reason, and because methods have recently been proposed that actually align heterogeneous networks, we discuss algorithmic aspects of network alignment in the more appropriate Section 3. Here, we mainly aim to contrast general working principles of the different types of network comparison.

In contrast to alignment-based comparison, alignment-free network comparison simply aims to quantify the overall topological similarity between networks, regardless of a node mapping between the networks, and without intending to identify any conserved network regions; this typically means comparing some topological properties between networks, such as their (graphlet) degree distributions newaz2019bookchapter ; Yaverouglu2015 . Alignment-free network comparison is most often used to evaluate the fit of a random graph (e.g., scale-free or geometric) to a real-world network; also, it can identify groups/families of networks that are topologically similar to each other Yaverouglu2015 . Given that alignment-free network comparison approaches do not aim to produce a node mapping between the compared networks, while alignment-based approaches do, the former are typically computationally more efficient than the latter Yaverouglu2015 .
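
As a minimal sketch of alignment-free comparison, the snippet below summarizes each network by its degree distribution and compares two networks without any node mapping. The use of total variation distance is an assumption made for illustration; published alignment-free measures typically rely on richer statistics such as graphlet degree distributions. Node labels are deliberately different across the toy networks to stress that no node correspondence is used.

```python
from collections import Counter

def degree_distribution(edges):
    """Fraction of nodes having each degree (isolated nodes are not represented)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    return {k: count / n_nodes for k, count in Counter(degree.values()).items()}

def distribution_distance(d1, d2):
    """Total variation distance between two degree distributions (0 = identical)."""
    return 0.5 * sum(abs(d1.get(k, 0.0) - d2.get(k, 0.0)) for k in set(d1) | set(d2))

# Two paths with unrelated node labels, and one star, each on four nodes
path_1 = [("a", "b"), ("b", "c"), ("c", "d")]
path_2 = [("w", "x"), ("x", "y"), ("y", "z")]
star = [("hub", "s1"), ("hub", "s2"), ("hub", "s3")]

same = distribution_distance(degree_distribution(path_1), degree_distribution(path_2))
different = distribution_distance(degree_distribution(path_1), degree_distribution(star))
print(same, different)
```

The two paths are judged identical although their nodes share no labels, whereas the path and the star differ. The computation is linear in the number of edges, which illustrates why alignment-free comparison is typically cheaper than searching for a node mapping.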

3. Multimodal data integration and heterogeneous networks

Overview. Network representations of biological systems, from cells to ecosystems, are naturally heterogeneous, consisting of multiple types of nodes and interactions de2023more . This section focuses on prominent computational challenges related to inference and analysis of heterogeneous networks. Broadly, a heterogeneous network is defined as a representation of multimodal data where each data mode corresponds to a different node or edge type. In the literature, the term “heterogeneous network” has often been used as a synonym for, e.g., a multiplex, interdependent, multiscale, or multilayer network. The challenge is that sometimes different terminologies are used for the same concept, or the same terminology is used for different concepts; the disparate terminology associated with heterogeneous networks can reflect nuances in their frameworks kivela2014multilayer . Below is the terminology from the existing literature (e.g., gu2022modeling ; pio2021multiverse ) that we use in this paper (Fig. 3A).

A heterogeneous network is a network with multiple node types and/or multiple edge types. A multiplex network is a special type of heterogeneous network with multiple edge types between the same nodes, possibly nodes of a single type, in which case the heterogeneity comes from the different edge types. A multiplex network can be viewed as being composed of different network layers sharing the same set (replica) of nodes but each layer having distinct edge types kinsley2020multilayer . An example of this type in biology is a molecular network capturing different types of relationships, such as physical interactions, functional relationships, and sequence similarities between proteins. A typical heterogeneous network, including those discussed in this section, contains both distinct node types and (by definition) distinct edge types. An example of this type is a molecular network representing relationships among heterogeneous node types such as genes, transcripts, proteins, and metabolites. Another example is a knowledge graph representing semantic relationships between node types such as genes, patients, drugs, and diseases. Another level of complexity is handling distinct node types at different scales (or levels) of biological organization, e.g., node types resulting from data modalities that capture molecular measurements in epigenomic, transcriptomic, proteomic, and metabolomic assays and from non-molecular text and imaging data. Here, a network-of-networks is a special case in which a node at a given scale is a network at the lower scale. For example, a node (protein) in a PPI network can be represented as a protein structure network in which nodes are the protein’s amino acids and edges link amino acids that are close enough in the protein’s 3D fold gu2022modeling .
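
The definition of a multiplex network as layers that share one node set but differ in edge type can be sketched as a small data structure. The class and method names, as well as the protein identifiers and layer names in the example, are hypothetical and serve only to illustrate the definition.

```python
class MultiplexNetwork:
    """Layers share one node set; each layer holds the edges of one edge type."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.layers = {}  # edge type -> set of undirected edges (frozensets)

    def add_edge(self, layer, u, v):
        """Add an undirected edge of the given type between two shared nodes."""
        if u not in self.nodes or v not in self.nodes:
            raise ValueError("both endpoints must belong to the shared node set")
        self.layers.setdefault(layer, set()).add(frozenset((u, v)))

    def edge_types(self, u, v):
        """Return the sorted edge types (layers) that link nodes u and v."""
        pair = frozenset((u, v))
        return sorted(name for name, edges in self.layers.items() if pair in edges)

# Proteins linked by different relationship types (toy identifiers)
net = MultiplexNetwork(["P1", "P2", "P3"])
net.add_edge("physical", "P1", "P2")
net.add_edge("sequence_similarity", "P1", "P2")
net.add_edge("functional", "P2", "P3")

print(net.edge_types("P1", "P2"))
print(net.edge_types("P1", "P3"))
```

A general heterogeneous network would additionally attach a type to each node, and a network-of-networks would let a node refer to another network object at the lower scale.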

The broad definition of a heterogeneous network that we use subsumes any network type that is not a homogeneous (single node type and single edge type) network. Note that in some scientific fields, such as physics, while a multiplex network typically has the same meaning as above, heterogeneous network is a rarely used term. Instead, a heterogeneous network is often referred to as a multilayer network, and a network-of-networks is sometimes used as a synonym for a multilayer network de2023more ; DeDomenico2013Mathematical ; kivela2014multilayer .

Heterogeneous networks are a powerful framework for the representation, integration, and analysis of diverse data modalities of a complex system with multiple types of nodes or edges (or both), allowing for reconciling complementary measurements and providing a holistic view of the system. Here, we discuss the following major research directions encompassing heterogeneous networks: inference of a heterogeneous network from multimodal data, pathway reconstruction for interpretation of multi-omic data, network alignment, inference and reasoning with biomedical knowledge graphs, and network-of-networks analysis. This is not an exhaustive list of topics on heterogeneous networks, and other sections touch on additional topics. For example, Section 5 touches on graph representation learning including but not limited to learning in heterogeneous networks, and Section 6 talks about integration of multimodal data for the purpose of patient stratification, identification of disease-dysregulated molecular pathways and functional modules, and other precision medicine applications.

Inference of a heterogeneous network from multimodal data. Heterogeneous network inference is the computational task of inferring the graph connectivity structure from multimodal – to date, typically multi-omic – measurements hawe2019inferring . The vast majority of methods for this task infer connections between nodes corresponding to biomolecules such as genes, proteins, and metabolites (Fig. 3B) using bulk -omic datasets. Single-cell -omic datasets have posed new opportunities for network inference where nodes can represent individual cells. Heterogeneous network inference methods can be grouped into categories based on how much they rely on labeled positive examples of edges.

Probably the simplest category of approaches takes as input labeled examples of edges and non-edges along with pairwise node feature vectors derived from multimodal data and trains binary classifiers to discriminate node pairs with edges from node pairs without edges greene2015understanding ; marbach2012predictive . These binary classification approaches assume that all node pairs are independent of each other and are therefore limited in their ability to exploit the known connectivity structure of the graph. An alternative is embedding methods (discussed in more detail in Section 5) that take as input an incomplete graph and multimodal measurement data as node features and learn an embedding of the nodes based on the (partial) graph structure and measured values; the embeddings are then used to infer edges via link prediction lee2020heterogeneous ; yue2020graph or matrix completion natarajan_inductive_2014 . Graph embedding methods relax the independence assumption of binary classification methods. Because graph embedding methods capture more of the network connectivity, it is conceivable that they need less training data to predict as accurately as simple binary classification. Graph neural networks (GNNs, discussed in more detail in Section 5) offer new ways to incorporate more global information about the network to inform the inference task yue2020graph . The biggest limitations of the above approaches are the need for positive training data (edges) and the fact that negative examples (non-edges) are not truly observed but are assumed to be part of the complement of the positive set.

On the other hand, unsupervised graph structure learning methods take as input node-level measurements and infer the graph structure from these measurements alone, without requiring any labeled examples of edges/non-edges. These approaches can range from correlation-based networks inferring pairwise dependencies between nodes representing different multimodal data vasaikar2018linkedomics ; zhou2021omicsanalyst to more general approaches based on probabilistic graphical models hawe2019inferring ; koller2009probabilistic . We note that several of these methods were originally developed for transcriptomic datasets and are thus discussed in Section 2. In probabilistic graphical models, nodes are modeled as random variables and edges correspond to statistical dependencies koller2009probabilistic , where each data modality is represented as a different node type Chen2014selection ; Sedgewick2018mixed . A key modeling challenge when handling multiple types of measurements is to specify the appropriate probability distributions for each data modality Chen2014selection ; Sedgewick2018mixed . Furthermore, the larger number of variables of multimodal data introduces additional scalability issues for learning the structure of probabilistic graphical models such as general Bayesian networks. Several heuristics such as focusing on promising parents friedman2013learning ; schmidt2007 , exploiting modularity of molecular networks segal2005learning , or approximating joint probability distributions as done in dependency networks greenfield2013robust ; heckerman2000dependency ; roy2013integrated have enabled these models to scale to thousands of variables.
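As a minimal illustration of the correlation-based end of this spectrum, the sketch below connects two nodes of any type when the absolute Pearson correlation of their measurement profiles exceeds a threshold. The node names, the profile values, and the threshold of 0.8 are hypothetical choices made only for this example; the cited tools use more careful statistics (e.g., significance testing and multiple-testing correction).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def correlation_network(profiles, threshold=0.8):
    """Return undirected edges between nodes whose |correlation| >= threshold."""
    edges = set()
    for u, v in combinations(sorted(profiles), 2):
        if abs(pearson(profiles[u], profiles[v])) >= threshold:
            edges.add((u, v))
    return edges

# Hypothetical profiles across four samples; the prefix encodes the node type.
profiles = {
    "gene:A": [1.0, 2.0, 3.0, 4.0],
    "protein:A": [2.1, 3.9, 6.2, 8.0],
    "metabolite:M": [4.0, 1.0, 3.0, 2.0],
}
edges = correlation_network(profiles)
```

In this toy input, only the gene and protein profiles are strongly correlated, so the inferred network contains a single cross-modality edge.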

Once the networks have been defined, they can be further clustered into modules to identify potential functional groupings among the nodes choobdar2019assessment ; mitra2013integrative ; newman2006modularity . Unsupervised learning of graph structure from multi-omic data lends itself naturally to the inference of gene regulatory networks baur2020data , where node types represent target genes and protein regulators. Protein regulators can be further modeled based on their observed mRNA levels or their hidden activity levels miraldi2019leveraging . While such approaches do not need any edge-level information, any such information that is available, even if potentially noisy, can be incorporated as a graph prior to guide the structure learning greenfield2013robust ; miraldi2019leveraging ; siahpirani2017prior .

The availability of single-cell multi-omic datasets has also opened up challenges that can be tackled with heterogeneous network inference demetci2022scot ; heumos2023best . One such problem is to infer cell-cell networks with nodes corresponding to cells, node types corresponding to different modalities (e.g., scRNA-seq, scATAC-seq) or time points (or both), and edges representing different semantics such as similarity or lineage relationships. Due to the size and sparsity in these data, dimensionality reduction is typically performed prior to inference of network structure. Non-negative matrix factorization, independent components analysis, and variational autoencoders are common dimensionality reduction approaches for single-cell multi-omic datasets. After dimensionality reduction, graph learning can be done using the $k$ -nearest neighbor approach butler2018integrating or with optimal transport demetci2022scot ; schiebinger2019optimal . Graphs based on $k$ -nearest neighbors, with different distance measures, are straightforward to implement and frequently used in practice, while optimal transport’s framework to match probability distributions of cells can be used to capture fine-grained cell dynamics.
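The $k$-nearest neighbor construction mentioned above can be sketched as follows, assuming that dimensionality reduction has already produced a low-dimensional coordinate for each cell. The cell names and coordinates are hypothetical, Euclidean distance is only one of the possible distance measures, and the symmetrization rule (keep an edge if either cell lists the other) is an assumption of this sketch.

```python
from math import dist

def knn_graph(embedding, k=2):
    """Build an undirected kNN graph from {cell: coordinates} using Euclidean distance."""
    edges = set()
    for cell, point in embedding.items():
        # Rank all other cells by distance; ties are broken by cell name.
        others = sorted(
            (c for c in embedding if c != cell),
            key=lambda c: (dist(point, embedding[c]), c),
        )
        for neighbor in others[:k]:
            edges.add(tuple(sorted((cell, neighbor))))
    return edges

# Hypothetical 2-D embedding of four cells forming two well-separated groups.
embedding = {
    "c1": (0.0, 0.0),
    "c2": (0.0, 1.0),
    "c3": (5.0, 5.0),
    "c4": (5.0, 6.0),
}
graph = knn_graph(embedding, k=1)
```

With $k=1$, each cell links only to its closest neighbor, so the two groups remain disconnected; larger $k$ would start to bridge them.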

Pathway reconstruction for interpretation of multi-omic data. Heterogeneous networks offer a powerful framework to integrate, interpret, and reconcile missing and noisy measurements commonly seen in multi-omic experiments haque2017practical ; peck2021boosting . The task of pathway reconstruction takes as input multi-omic measurements of different biomolecules represented as node types and a large background molecular network. It outputs a sparse subnetwork with high-quality connections among the relevant biomolecules garrido2022integrating (Fig. 3C). The background networks typically contain PPIs and may also include protein-DNA, protein-RNA, or protein-metabolite interactions to match the available -omic data. Paths from one relevant biomolecule to another in the background network can help prune irrelevant biomolecules and identify those that may play critical roles in the overall biological process but were missed in the -omic measurements paull2013discovering ; pirhaji2016revealing ; tuncbag2016network ; winkler2022novo . Note that this task also relates to condition-specific network inference discussed in Section 2 and multi-omic module discovery discussed in Section 6 for discovery of dysregulated pathways in diseases such as cancer.

The sparse subnetwork obtained depends on the choice of optimization algorithm and its parameters. Some pathway reconstruction algorithms are computationally efficient, based on shortest paths ritz2016pathways or network flow yeger2009bridging . Despite their algorithmic simplicity, these methods can still effectively prioritize biologically relevant nodes and interactions. Network flow-based methods can scale across multiple experiments by relying on the multicommodity flow approach, which identifies nodes and edges that are unique and shared across conditions gosline2012samnet . General integer linear programming approaches chasman2014inferring ; ourfali2007spine support arbitrary node, edge, and path constraints. These provide the greatest customization for a particular multi-omic dataset but less scalability and reusability across applications. Intermediate approaches such as the Prize-Collecting Steiner Forest tuncbag2013simultaneous are computationally difficult to solve exactly but can be approximated efficiently. For instance, the Omics Integrator software tuncbag2016network based on the Prize-Collecting Steiner Forest algorithm adds prizes to nodes that should be included in the sparse subnetwork and costs to edges based on their reliability. Omics Integrator also includes a module to estimate prizes for active TFs based on chromatin accessibility, gene expression, and DNA-binding motifs. Its parameters control the tradeoff between node prizes and edge costs, a penalty for including nodes with high degree, and a penalty for the number of connected components in the subnetwork.
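To make the simplest of these strategies concrete, the sketch below reconstructs a pathway as the union of cheapest paths from source nodes to target nodes in a weighted background network. It is a simplified stand-in rather than an implementation of any of the cited methods; the node names and edge costs (lower cost meaning a more reliable interaction) are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbor: cost}. Returns a node list or None."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None

def reconstruct_pathway(graph, sources, targets):
    """Union of the edges on one cheapest path for every (source, target) pair."""
    edges = set()
    for s in sources:
        for t in targets:
            path = shortest_path(graph, s, t)
            if path:
                edges.update(zip(path, path[1:]))
    return edges

# Hypothetical directed background network with edge costs.
background = {
    "receptor": {"kinase": 0.1, "adaptor": 1.0},
    "kinase": {"tf": 0.2},
    "adaptor": {"tf": 1.0},
}
pathway = reconstruct_pathway(background, ["receptor"], ["tf"])
```

Here the intermediate node "kinase" is recovered even though it is neither a source nor a target, which mirrors how paths in the background network can surface biomolecules missed by the measurements.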

Heterogeneous pathway reconstruction is especially powerful because network connections between different types of biomolecules can be combined to reveal more complete and explanatory pathways. For instance, a TF that activates differentially expressed genes detected with RNA-seq may be inferred to be regulated by an upstream phosphorylated kinase detected with mass spectrometry. A study of Kaposi’s Sarcoma-associated Herpesvirus infection sychev2017integrated illustrates the data types and algorithms involved, and biological insights gained in multi-omic pathway reconstruction. The authors profiled the proteomic and phosphoproteomic changes in endothelial cells induced by viral infection using mass spectrometry and gene expression changes with RNA-seq. They used TF binding motifs and a statistical enrichment test with the gene expression data to identify potentially relevant transcriptional regulators. Then, they applied Omics Integrator tuncbag2016network to combine the transcriptional regulators, proteomic changes, phosphoproteomic changes, and a PPI background network in order to obtain a holistic view of the endothelial cell response to infection. Ultimately, this analysis revealed peroxisome-related proteins to be an important part of the response. This network-based insight was supported with follow-up wet laboratory experiments sychev2017integrated .

Network alignment. In network biology, network alignment has traditionally been used to compare species’ PPI networks Emmert-Streib2016 ; faisal2015post ; Guzzi2017 ; ma2022heuristics ; Sharan2006 ; PNAMNA . In this context, network alignment aims to find a node (protein) mapping between the compared networks that uncovers regions of high topological (and often sequence) conservation, with the hypothesis that the resulting aligned nodes and network regions are evolutionarily conserved or functionally similar. Finding such a node mapping is closely related to the NP-complete subgraph isomorphism problem, making the network alignment problem NP-hard faisal2015post .

Even when comparing PPI networks, which are homogeneous, network alignment can be viewed as a multimodal data integration task. This is because an alignment (i.e., node mapping) in a “composed view” results in a heterogeneous (specifically, multiplex) network whose “supernodes” contain mapped nodes from the individual homogeneous networks and whose edges are of distinct types, indicating which one(s) of the compared networks the given edge is present in under the given node mapping (Fig. 3D). More recently, approaches have been proposed for aligning heterogeneous networks in biology gu2018HNA ; LHetNetAligner and other domains chen_fascinate_2016 ; yan_dissecting_2022 . Below, we discuss algorithmic principles of traditional alignment of homogeneous networks and then comment on the alignment of heterogeneous networks.

Analogous to sequence alignment, alignment of homogeneous networks can be local or global Meng2016 . Both have (dis)advantages Guzzi2017 . Also, network alignment can be pairwise (between exactly two networks) or multiple (between more than two networks) multiMAGNA++ . The latter has traditionally been expected to lead to deeper biological insights as it aligns all considered networks simultaneously as opposed to one pair at a time; however, a recent evaluation showed that this is not always the case PNAMNA . At the same time, multiple network alignment is computationally more complex multiMAGNA++ .

Network alignment has two main algorithmic components faisal2014global . First, topological similarity between nodes across the compared networks is computed via some measure of node conservation; graphlet-based measures (Section 4) are among the state of the art gu2018HNA ; newaz2019bookchapter . Second, an alignment strategy quickly identifies alignments that optimize some objective function accounting for total node and ideally also edge conservation under the given node mapping. That is, a good alignment should both map similar nodes to each other and conserve many edges. Original alignment strategies were of the seed-and-extend type GRAAL ; singh2008global ; WAVE . The extension around highly similar “seed” nodes, by adding mapped nodes incrementally to build the alignment one step at a time, is intended to explicitly improve node conservation of the resulting alignment, but edge conservation only implicitly. To improve edge conservation explicitly as the alignment is constructed, rather than only evaluating it after the fact, another type of alignment strategy – a search algorithm – was introduced. Here, entire alignments are explored, and the one that scores the highest based on the given (e.g., edge conservation-based) objective function is returned, using, e.g., genetic algorithms MAGNA ; multiMAGNA++ ; DynaMAGNA++ ; MAGNA++ or simulated annealing SANA .
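The idea of scoring entire alignments by edge conservation can be illustrated with a toy sketch. The function `edge_conservation` below computes the fraction of edges of the first network that are mapped onto edges of the second network, and `best_alignment` exhaustively enumerates all injective node mappings. The exhaustive search is an assumption made only to keep the example short and is feasible only for tiny networks; the cited methods use genetic algorithms or simulated annealing instead. All node names are hypothetical.

```python
from itertools import permutations

def edge_conservation(edges1, edges2, mapping):
    """Fraction of edges in network 1 whose mapped endpoints are adjacent in network 2."""
    target = {frozenset(e) for e in edges2}
    conserved = sum(
        1 for u, v in edges1 if frozenset((mapping[u], mapping[v])) in target
    )
    return conserved / len(edges1)

def best_alignment(nodes1, nodes2, edges1, edges2):
    """Exhaustive search over injective mappings (toy-sized inputs only)."""
    best = max(
        permutations(nodes2, len(nodes1)),
        key=lambda p: edge_conservation(edges1, edges2, dict(zip(nodes1, p))),
    )
    return dict(zip(nodes1, best))

# Network 1 is the path a-b-c; network 2 is the path x-z-y.
nodes1, edges1 = ["a", "b", "c"], [("a", "b"), ("b", "c")]
nodes2, edges2 = ["x", "y", "z"], [("x", "z"), ("z", "y")]

alignment = best_alignment(nodes1, nodes2, edges1, edges2)
score = edge_conservation(edges1, edges2, alignment)
```

Any optimal mapping must send the middle node b to the middle node z, which conserves both edges; which of the two end nodes a and c goes to x or y is arbitrary.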

A recent algorithmic shift in network alignment has been from unsupervised to supervised, data-driven alignment TARA ; TARA++ . Traditional network alignment uses the notion of topological similarity to quantify how close to isomorphic two nodes’ extended network neighborhoods are. A major issue is that regardless of the considered similarity measure, aligned nodes often do not correspond to nodes that should actually be mapped, i.e., that are functionally related TARA . Specifically, when comparing species’ PPI networks, aligned nodes do not correspond to proteins that are involved in the same biological processes. This is why a move was made from optimizing topological similarity to learning from the data what kind of topological relatedness corresponds to functional relatedness, without assuming that topological relatedness means topological similarity TARA . For example, topological similarity will aim to match a triangle in one network to a triangle in another network, and a square in the former to a square in the latter. Yet, due to biological variation or noise in PPI data, perhaps it is the triangle in the first network that is functionally related and should thus be matched to the square rather than the triangle in the second network, which is what topological relatedness would aim to learn from the data. This resulted in moving from traditional unsupervised alignment (functional labels of nodes, e.g., biological processes of proteins in PPI networks, being used to evaluate an alignment only after it is produced) to supervised, data-driven alignment (functional labels of nodes being used during the process of constructing an alignment, to learn patterns of topological relatedness). 
A pioneering data-driven network alignment method used traditional machine learning, i.e., user-predefined (graphlet-based) features TARA ; TARA++ and standard classifiers (e.g., logistic regression), while a follow-up effort used deep learning and specifically GNNs ding2023supervised .

Finally, going back to alignment of heterogeneous networks, an earlier attempt in biology was still to align homogeneous networks to each other, where the heterogeneity came from the fact that the individual homogeneous networks being compared were of different types: one was a human PPI network whose nodes were proteins, and the other was a disease-disease association network whose nodes were diseases Wu2009 . Then, the goal of aligning the two networks was to identify causative genes/proteins and their pathways underlying disease families. But, because each of the compared networks was homogeneous, a homogeneous network alignment approach sufficed for their comparison. A more recent effort towards actually aligning one heterogeneous network to another, each with different node and edge types (or colors), was extending the existing notions of homogeneous graphlet-based node similarity/conservation as well as homogeneous edge conservation (discussed above) into their heterogeneous (or colored) counterparts, and then extending the existing seed-and-extend or search alignment strategies (discussed above) to find high-scoring alignments with respect to the new heterogeneous conservation measures gu2018HNA . In evaluations on synthetic and real biological networks, the heterogeneous methods led to higher-quality alignments and better robustness to noise in the data than their homogeneous counterparts gu2018HNA . Two types of heterogeneous biological networks were considered: first, PPI networks were aligned to each other, where nodes (proteins) were colored according to whether they were involved in aging, cancer, and/or Alzheimer’s disease; second, protein-GO term networks were aligned to each other, where such a network had two types of nodes – proteins and GO terms – and three types of edges – PPIs, protein-GO term annotations, and GO term-GO term semantic similarity associations gu2018HNA . This effort gu2018HNA aligned heterogeneous networks globally. 
In parallel, an approach for their local alignment was proposed LHetNetAligner .

Ideas from machine learning-based embedding of heterogeneous networks (Section 5) in biology pio2021multiverse and other domains wang2022multiplex ; wang2022survey could be extended to heterogeneous network alignment. However, to our knowledge, such extension has not yet been carried out in biology but it has been carried out in other domains such as social, information, or technological networks cai2023resolving ; wang2022network ; xiong2021contrastive ; zhang2020nettrans ; zhang2019origin ; zheng2018heterogeneous . Note that in zhang2020nettrans ; zhang2019origin , the heterogeneity of considered networks came from node/edge attributes rather than explicit node/edge types. In these two studies, GNNs were used to first find an embedding of nodes of the compared networks, and then the network alignment problem was viewed as a point registration problem zhang2019origin or a neural network transformation problem zhang2020nettrans .

Inference of and reasoning on biomedical knowledge graphs. Biomedical knowledge graphs (BKGs), which describe semantic relationships between biomedical entities, are among the richest examples of heterogeneous networks nicholson2020constructing . BKGs aim to combine facts about diverse biomedical entities, which can range from genes to individual patients as well as measurements associated with them. BKGs represent biological facts using “subject-predicate-object” triples as the fundamental unit, with the subject and object corresponding to nodes in the graph and the predicate (also called a relation) corresponding to a directed edge, possibly of different types, between the nodes. For example, Chlorin e6-PDT (subject) reduced (predicate) cell proliferation (object); Fig. 3E. Exemplar active BKG projects, each taking a unique approach, include Scalable Precision Medicine Knowledge Engine (SPOKE; https://spoke.rbvi.ucsf.edu) morris2010ceres , BioThings Explorer (https://explorer.biothings.io) fecho2022progress ; lelong2022biothings , biomedical “corner” of Wikidata (https://www.wikidata.org) manske2019genedb ; page2022wikidata ; waagmeester2020wikidata , and PrimeKG chandak2023building .

BKGs have emerged as powerful frameworks for diverse biomedical applications nicholson2020constructing including drug repurposing (e.g., Hetionet himmelstein2017systematic and SPOKE morris2010ceres ), rare disease diagnosis alsentzer2022deep , and biomarker discovery (e.g., SPOKE himmelstein2015heterogeneous ). BKGs leverage graph databases like Neo4j and Virtuoso, and semantic web standards like the Resource Description Framework for their backend. BKGs leverage over a hundred years of graph theory to enable operations on first neighbors, paths, centralities, and other network components, as well as semantics, inference, and reasoning. There are a number of computational challenges that emerge to maximally extract the information encoded in BKGs for diverse biomedical applications ranging from construction of BKGs to reasoning with BKGs peng2023knowledge . For example, advanced, multi-hop queries specifying node and edge types are essential to navigating heterogeneous network representations of biomedical knowledge; “multi-hop” refers to having to traverse at least two edges in the graph. Many of these challenges have been approached using similar methods of network inference as previously described (e.g., link prediction) as well as more recently with graph representation learning approaches discussed in Section 5.
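A typed multi-hop query of the kind described above can be sketched over a plain list of subject-predicate-object triples. This is an in-memory illustration only and does not use the query language of any graph database. The first triple is the example from the text; the second triple and the predicate names are hypothetical and were added solely so that a two-hop path exists.

```python
from collections import defaultdict

def build_index(triples):
    """Index triples as (subject, predicate) -> set of objects."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index

def multi_hop(index, start, predicates):
    """Follow a fixed sequence of predicate types from `start`; return the reached nodes."""
    frontier = {start}
    for predicate in predicates:
        frontier = {
            obj for node in frontier for obj in index.get((node, predicate), ())
        }
    return frontier

triples = [
    ("Chlorin e6-PDT", "reduces", "cell proliferation"),      # example from the text
    ("cell proliferation", "associated_with", "tumor growth"),  # hypothetical triple
]
index = build_index(triples)
reached = multi_hop(index, "Chlorin e6-PDT", ["reduces", "associated_with"])
```

Because the query names an edge type at every hop, it only returns nodes reachable through that exact sequence of relations, which is what distinguishes it from an untyped neighborhood search.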

Equally important is the question of representation of biomedical and biological literature to enable advanced queries and reasoning. Traditional BKGs assume that all knowledge can be represented as subject-predicate-object tuples and are constructed using tuple extraction techniques based on machine learning. A simple postprocessing algorithm can extract the tuples from any sentence and represent them as links between nodes on the BKGs. However, traditional BKGs have ignored the conditions (e.g., patient age or environment) of the facts, which capture essential contexts for knowledge exploration and inference. Recently, a new type of BKG, Condition-aware BKG (CondBKG jiang2020biomedical ), has been introduced, which considers both facts and their conditions in the biomedical statements. Unlike traditional BKGs, which have only one layer of subject-predicate-object tuples, CondBKG is a three-layered information-lossless representation of BKGs. The first layer has biomedical concept and attribute nodes; the second layer represents both biomedical fact and condition tuples by nodes of the predicate phrases, connecting to the subjects and objects in the first layer; the third layer has nodes that represent statement sentences as their textual attributes and connect to fact and/or condition tuples in the second layer (Fig. 3E). CondBKG is constructed from a machine learning model’s output tuples. Given a statement sentence and its context (e.g., nearby sentences) in a scientific article, the model learns from multiple types of input signals from the sentence (e.g., word embeddings and part-of-speech tags) and predicts one or multiple tuples. CondBKG has 18.1 million fact tuples, 7.5 million condition tuples, 10.9 million concept nodes, and 703 thousand attribute nodes. 
CondBKG preserves more knowledge from unstructured text than traditional flat BKGs and can be used to answer tailored queries, such as what factors increase or reduce cell proliferation and their conditions (Fig. 3E). CondBKG can provide a good understanding of biomedical and biological statements and supports diverse applications for biomedical knowledge discovery.

Network-of-networks analysis. Biological systems function at different scales of organization. Thus, network-of-networks analysis (Fig. 3A) is an exciting, still relatively unexplored area of research. This topic has received an increasing amount of attention only in recent years. This is likely because it has been increasingly recognized that network-of-networks representations of various biological data can be obtained: (1) given that different diseases tend to manifest in different tissues, nodes (diseases) in a disease similarity network can be represented as their associated tissue-specific PPI networks ni2016disease ; (2) nodes in a PPI network can be represented as protein structure networks gao2023hierarchical ; gu2022modeling ; (3) nodes in a network of interacting molecules can be represented as molecular graphs wang2022powerful ; wang2020gognn ; (4) nodes in a bipartite graph containing interactions between drugs and their target proteins can be represented as drug molecule graphs and target protein structure networks, respectively chu2022hierarchical . Note that not all existing network-of-networks studies originate in the biology domain. Some have been proposed and evaluated in other domains, such as on text and social network datasets li2022semi .

The studies that have analyzed biological network-of-networks data typically perform different network analysis and application tasks, as follows. The task of node ranking was applied to candidate disease gene prioritization from the network-of-networks of type (1) above ni2016disease . The task of link prediction was applied to predicting interactions between proteins from the network-of-networks of type (2) above gao2023hierarchical , between molecules such as drugs from the network-of-networks of type (3) above wang2022powerful ; wang2020gognn , or between drugs and their target proteins from the network-of-networks of type (4) above chu2022hierarchical . A new task was introduced – that of entity label prediction – which merges the two traditionally isolated tasks of node (protein) classification at the higher scale containing a PPI network and graph (also protein) classification at the lower scale containing protein structure networks gu2022modeling . This task was applied to predicting protein functions from the network-of-networks of type (2) above gu2022modeling . Given that the different approaches were proposed for different tasks/applications, they have typically not been evaluated against each other. It remains unclear whether the different approaches can be effectively used in tasks/applications other than those they were proposed for, as well as what the (dis)advantages of each approach are on the methodological level. With the increasing availability of network-of-networks data and the increasing number of approaches for network-of-networks analysis, the need for proper method evaluation will only continue to gain importance. This will require all studies to make their data and code publicly available and easy to use. According to our exploration of the existing network-of-networks studies discussed above, this is not always the case.

4. Higher-order network analysis

Need for higher-order graph representations of biological systems. This paper, unless explicitly noted otherwise, deals with traditional pairwise graphs (or simply graphs). Such a graph represents the organization of a biological system as a network of pairwise interactions between biomolecules (e.g., a PPI is represented as an edge connecting two proteins, and a transcriptional regulatory interaction is represented as a directed edge from a TF to a gene). However, these interactions often involve additional components and the interactions themselves can be regulated by other components battiston2020networks . In other words, there is often a need to capture interactions between multiple (two or more) nodes rather than between exactly two nodes (as is the case with pairwise graphs). Several higher-order graph ideas have been proposed in the literature to overcome the limitations of pairwise graphs. There are two general categories of such ideas.

The first category still works with pairwise graphs but relies on either higher-order dependencies between two nodes xu2016representing or small subgraphs newaz2019bookchapter , as follows. Regarding higher-order dependencies, it was shown that when representing sequential data such as global shipping traffic as networks, assuming the first-order dependency, i.e., that the next movement of traffic depends only on the current node, and thus discounting the fact that the movement may depend on several previous steps, can yield inaccurate network analysis results xu2016representing . This is because data derived from many complex systems can show up to fifth-order dependencies between two nodes. Consequently, an approach was proposed for capturing variable orders of dependencies between pairs of nodes xu2016representing . Regarding subgraphs, these can be viewed as “higher-order coordinated patterns” between two or more nodes of a pairwise graph battiston2020networks ; a subgraph captures first-order dependencies (as discussed above and defined in xu2016representing ) between multiple nodes in a pairwise graph. Examples of subgraph types are cycles (e.g., a triangle or a square) or cliques (the densest of all subgraph types, containing all possible edges between their nodes) battiston2020networks . Two general categories of subgraphs exist: graphlets Przulj2007 and network motifs Milo04 . Two key differences exist between them: graphlets are induced subgraphs while network motifs are not, and network motifs need to be statistically significantly over-represented in a pairwise graph compared to a null (i.e., random graph) model while graphlets do not rely on a null model.

Both higher-order dependencies and subgraphs in pairwise graphs from the first category fail to directly account for interactions between more than two nodes in a network. An alternative is the second category of higher-order graph ideas – to explicitly consider higher-order graph structures. Here, while simplicial complexes are a theoretical possibility, they have assumptions that are practically too strong in some systems battiston2020networks . The next most general idea of higher-order interactions, which is at the same time less constraining and thus more practical, is that of hypergraphs battiston2020networks .

Higher-order dependencies (as discussed above and defined in xu2016representing ) have not yet received attention in the biology domain, which is why we do not discuss this idea further. Graphlets in pairwise graphs (or simply graphlets), hypergraphs, and graphlets in hypergraphs (i.e., hypergraphlets) have received significant attention in the biology domain, which is why the following sections discuss these topics in more detail. While network motifs have also received attention, it remains unclear which random graph model fits real-world networks the best and should thus be used for network motif identification ArtzyRandrup04 ; newaz2019bookchapter , which is why we do not discuss network motifs further.

Graphlets. Graphlets, small subgraphs, are Lego-like building blocks of a network. More formally, they are connected, non-isomorphic, induced subgraphs of a graph Przulj2004 . Because counting of large graphlets in a large network is time-consuming, in practice, graphlets on up to five nodes have typically been studied. Graphlets were originally proposed as subgraphs of undirected, homogeneous, static, unordered, and pairwise graphs newaz2019bookchapter . More recently, they were extended to their directed Lugo2014 ; Sarajlic2016 , heterogeneous gu2018HNA , dynamic hulovatyy2015exploring , ordered GRAFENE ; GRALIGN , or hypergraph Gaudelet2017HG ; Lugo-Martinez2021 counterparts, respectively; the latter are called hypergraphlets and are discussed more below after hypergraphs are introduced. The following concepts are discussed for original graphlets, but they generalize to the more data-rich counterparts as well.

In a graphlet, nodes can correspond to different symmetry groups called automorphism orbits (or just orbits for simplicity) Przulj2007 . For example, in a graphlet corresponding to the 3-node path (e.g., $a-b-c$ ), the two outer nodes ( $a$ and $c$ in our illustration) are symmetric to each other and thus belong to the same orbit, while the middle node ( $b$ ) is in its own orbit. As another example, in a clique, all nodes are symmetric to each other and thus belong to the same orbit. There are 15 orbits for 2-4-node graphlets and 73 for 2-5-node graphlets. This concept of graphlet orbits can be used to quantify a node’s extended network neighborhood into a 15- or 73-dimensional embedding, often called the node’s graphlet degree vector (GDV) milenkovic2008uncovering . This vector counts how many times a node of interest touches (or participates in) each of the considered graphlets at each of their orbits. By computing the GDV for each node in a network, one can obtain the network’s GDV matrix, whose entry $(i,j)$ contains the information of how many times node $i$ touches orbit $j$ milenkovic2008uncovering ; newaz2019bookchapter . Note that there exist analogous concepts of edge (rather than node) as well as node pair orbits, GDVs, and GDV matrices Hulovatyy2014 ; Solava2012 .
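The orbit counting described above can be sketched for the smallest case, i.e., graphlets on 2 and 3 nodes, which yield four orbits. The sketch assumes the commonly used numbering in which orbit 0 is an endpoint of an edge, orbit 1 is an end of an induced 3-node path, orbit 2 is the middle of an induced 3-node path, and orbit 3 is a node of a triangle. Extending this to the 15 or 73 orbits mentioned above requires dedicated counting algorithms and is not attempted here.

```python
from itertools import combinations

def graphlet_degree_vectors(adj):
    """Return {node: [orbit0, orbit1, orbit2, orbit3]} for an undirected graph.

    `adj` maps each node to the set of its neighbors.
    """
    gdv = {}
    for v, nbrs in adj.items():
        # Orbit 3: pairs of neighbors of v that are themselves adjacent (triangles).
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Orbit 2: pairs of neighbors of v that are not adjacent (v is a path middle).
        path_middles = len(nbrs) * (len(nbrs) - 1) // 2 - triangles
        # Orbit 1: induced paths v-u-w, where w is adjacent to u but not to v.
        path_ends = sum(len(adj[u] - nbrs - {v}) for u in nbrs)
        gdv[v] = [len(nbrs), path_ends, path_middles, triangles]
    return gdv

# The 3-node path a-b-c used as the running example in the text.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
path_gdv = graphlet_degree_vectors(path)

# A triangle, i.e., the 3-node clique.
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
triangle_gdv = graphlet_degree_vectors(triangle)
```

Consistent with the text, the two outer nodes of the path receive identical vectors while the middle node differs, and all three nodes of the triangle receive the same vector.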

GDV matrices of networks have been used as features to compare extended neighborhoods of nodes (edges, node pairs) in the same network, extended neighborhoods of nodes (edges, node pairs) across different networks, or structures of entire networks newaz2019bookchapter . These, in turn, have been used in numerous computational tasks, such as network alignment, alignment-free network comparison, graph classification, node classification, network de-noising via link prediction, inference of a condition-specific network or pathway reconstruction, network clustering, and node centrality computation, as well as for various application problems, such as studying human aging, protein folding and function, cancer and other diseases, pathogenicity, or mental health (e.g., depression and anxiety), as briefly discussed in other sections arici2023unveiling ; Liu2020 ; liu2021heterogeneous ; magnano2021automating ; newaz2019bookchapter ; newaz2022multi ; Solava2012 .

Hypergraphs. Hypergraphs provide powerful representations by generalizing edges between exactly two nodes to hyperedges that involve multiple nodes Berge1973 . For example, protein complexes, which involve simultaneous interactions among multiple proteins that carry out function only as a group, are effectively represented using undirected hypergraphs, where each node is a protein and each undirected hyperedge (a set of nodes) is a complex klamt2009hypergraphs . Under this representation, complexes that share interactors can be disambiguated, thus allowing more flexibility to capture multiple functionalities on the same set of nodes. Signaling pathways, on the other hand, are represented using directed hypergraphs in which proteins are represented by nodes and reactions are represented by directed hyperedges ritz2014signaling .

Fig. 4 shows an example of nine reactions from the transforming growth factor-beta (TGF $\beta$ ) signaling pathway gillespie2022reactome and their representation using higher-order graph frameworks. In this example, TGF $\beta$ 1 binds to the TGF $\beta$ receptor and phosphorylates SMAD2/3, which in turn binds to SMAD4; SMAD2/3 are subsequently dephosphorylated by MTMR4. The signaling reactions are captured by a directed hypergraph with nine hyperedges connecting proteins (which may be phosphorylated) and protein complexes (Fig. 4A). Without the directed hyperedges, we have a series of overlapping protein complexes, the structure of which provides some insights into how the protein complexes form (Fig. 4B). Directed and undirected hypergraphs offer more information than a graph that only captures pairwise physical interactions in this cascade (Fig. 4C). If dealing with the pairwise graph representation in Fig. 4C, graphlets can help characterize the local topology of a specific node (Fig. 4D) or an entire network, as discussed above. If dealing with the hypergraph representation from Fig. 4A-B, hypergraphlets, discussed below, can be used to quantify topology (Fig. 4E).

A shortcoming of pairwise graphs in representing multi-component interactions is that some paths may be lost murgas2022hypergraph or ghost paths can be created pandey2007functional when a multi-way interaction is contracted into a set of pairwise interactions. For example, as seen in Fig. 4A, the interaction between TGF $\beta$ 1 and SMAD2/3 occurs when TGF $\beta$ 1 is part of the TGF $\beta$ complex that is phosphorylated, but this information is lost in the pairwise graph representation shown in Fig. 4C. In addition, contracting multi-way interactions into pairwise interactions replicates interactions between multiple components, which inflates subgraph density, multiplicity of paths, and node degrees, while also shortening paths. Generalization of notions such as density or centrality to hypergraphs can therefore provide more reliable insights into the topology and dynamics of biological networks feng2021hypergraph .

In addition to reducing representation loss, hypergraphs also offer meaningful algorithmic advantages. Owing to the graph duality property where each graph can be represented as a hypergraph by inverting nodes and edges of the original graph into hyperedges and nodes, respectively, of a dual graph, hypergraph representations offer a possibility to unify methodology. For example, node classification, edge classification, and link prediction on pairwise graphs can all be seen as node classification on (extended) dual hypergraphs Lugo-Martinez2021 . This allows for the development of general methodologies and software that could support statistical inference tasks on biological networks.
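The duality described above can be sketched in a few lines of Python. In this minimal example, each edge of a pairwise graph becomes a node of the dual, and each original node becomes a hyperedge containing all of its incident edges. The function name, the edge identifiers, and the toy graph are assumptions introduced only for illustration.

```python
def dual_hypergraph(edges):
    """Build the dual hypergraph of a pairwise graph.

    edges: dict mapping an edge id to its two endpoints (hypothetical format).
    Returns a dict mapping each original node to a frozenset of incident edge ids;
    each such set is one hyperedge of the dual, whose nodes are the original edges.
    """
    dual = {}
    for edge_id, (u, v) in edges.items():
        dual.setdefault(u, set()).add(edge_id)
        dual.setdefault(v, set()).add(edge_id)
    return {node: frozenset(ids) for node, ids in dual.items()}


# Toy star graph: node b interacts with a, c, and d.
toy_edges = {"e1": ("a", "b"), "e2": ("b", "c"), "e3": ("b", "d")}
toy_dual = dual_hypergraph(toy_edges)
for node in sorted(toy_dual):
    print(node, sorted(toy_dual[node]))
```

In the dual, labeling the nodes e1, e2, and e3 corresponds to classifying the edges of the original graph, which is the sense in which edge-level tasks can be recast as node-level tasks.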

To date, the application of hypergraphs in biological network analysis is limited because of constraints posed by the availability of data and annotations (or lack thereof). In cellular signaling, post-translational modifications play a central role in multi-way interactions among cellular components, yet only a small fraction of post-translational modifications are well-characterized needham2019illuminating . As biotechnology advances and more data are generated, the availability of algorithms that solve fundamental problems on hypergraph representations, therefore, has the potential to guide data generation and curation of annotations.

Hypergraph algorithms. In the broader computer science community, hypergraph algorithms exist for several problems including shortest paths, random walks, and clustering ausiello2017directed ; cambini1997flows ; ducournau2014random ; gao2014dynamic ; zhou2006learning . Within the context of network biology, hypergraphs have been used to study metabolic networks klamt2009hypergraphs , clusters in PPI networks ramadan2004hypergraph , and shortest paths in signaling pathways. This final application is the best developed use of directed hypergraphs in network biology. Hence, we focus our discussion on it.

Defining reachability in directed hypergraphs is significantly more complex than in pairwise graphs. A key principle is that the nodes in the head of a hyperedge are reachable from some source only if all the nodes in the tail are themselves reachable from that source. This principle expresses the natural concept that for any product of a reaction to form, all the reactants must be present. The notion of B-reachability formalizes this idea ausiello2017directed ; ritz2014signaling . The challenge is that computing a B-hyperpath with the smallest number of hyperedges is an NP-complete problem, even when the tail and head of each hyperedge contain at most two nodes and we are interested only in acyclic hyperpaths ritz2014signaling . An initial approach proposed a mixed-integer linear program to compute optimal hyperpaths ritz2014signaling , applying it with success to the Wnt signaling pathway in the NCI Pathway Interaction Database. In practice, a drawback of this method was that a very large number of nodes without any incoming hyperedge had to be included among the sources for any meaningful hyperpath to exist. A later technique relaxed the definition of B-hyperpath franzese2019hypergraph to address this problem. As another alternative, an efficient heuristic approach can handle cyclic hyperpaths and computes optimal ones in practice krieger2022heuristic . An exact cutting-plane algorithm can also compute shortest hyperpaths with cycles while being efficient in practice on both the NCI Pathway Interaction Database and Reactome krieger2023computing . Finally, similar problems have been studied in the context of metabolic networks. Here, the notion of shortest path is generalized to a factory, which also takes reaction stoichiometry into account. A mixed-integer linear program can find factories with the fewest reactions and accommodate negative regulation krieger2022computing .
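The reachability principle stated above (a head becomes reachable only once every tail node is reachable) can be sketched as a simple fixed-point computation. The Python example below is only an illustration of that principle: it computes which nodes are reachable, not a shortest hyperpath, and the function name, the (tail, head) input format, and the node labels loosely modeled on the TGF $\beta$ example are assumptions.

```python
def b_reachable(hyperedges, sources):
    """Return the set of nodes reachable from `sources` under the all-tails rule.

    hyperedges: list of (tail, head) pairs, each a set of nodes.
    A hyperedge fires only when every node in its tail has been reached.
    """
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if tail <= reached and not head <= reached:
                reached |= head
                changed = True
    return reached


# Hypothetical three-reaction cascade (labels are illustrative only).
toy_reactions = [
    ({"ligand", "receptor"}, {"ligand:receptor"}),
    ({"ligand:receptor", "SMAD2/3"}, {"pSMAD2/3"}),
    ({"pSMAD2/3", "SMAD4"}, {"pSMAD2/3:SMAD4"}),
]
without_smad4 = b_reachable(toy_reactions, {"ligand", "receptor", "SMAD2/3"})
with_smad4 = b_reachable(toy_reactions, {"ligand", "receptor", "SMAD2/3", "SMAD4"})
print(sorted(without_smad4))
print(sorted(with_smad4))
```

The first call never reaches the final complex because SMAD4 is not among the sources. A path search on the pairwise contraction of the same reactions would not enforce this requirement.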

Statistical learning on hypergraphs. Hypergraphs can be approximated by pairwise graphs (e.g., star expansion, clique expansion Agarwal2006 ), but such approximations do not retain all properties of the original hypergraphs (e.g., the cut properties Ihler1993 ). Therefore, methods directly developed for learning on hypergraph data can offer practical advantages. A number of such approaches have emerged Antelmi2023 ; chitra2019random ; Cong1991 ; Leordeanu2012 ; Lugo-Martinez2021 ; Maleki2022 ; Wachman2007 ; however, accurate learning on hypergraphs is often hindered by NP-hardness issues Gartner2003 ; Hein2013 ; Purkait2017 and, thus, methods developed to directly deal with hypergraph data often trade accuracy for scalability.
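To make the information loss concrete, the following minimal Python sketch implements the clique and star expansions and shows two different hypergraphs that collapse to the same pairwise graph under clique expansion. The function names and the encoding of hyperedge nodes as `("he", index)` tuples are assumptions made for this illustration.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Replace every hyperedge by all pairwise edges among its nodes."""
    edges = set()
    for hyperedge in hyperedges:
        edges.update(combinations(sorted(hyperedge), 2))
    return edges


def star_expansion(hyperedges):
    """Add one auxiliary node per hyperedge and link it to each member node."""
    return {(node, ("he", i)) for i, hyperedge in enumerate(hyperedges) for node in hyperedge}


# One three-protein complex versus three separate binary interactions.
one_complex = [{"a", "b", "c"}]
three_pairs = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

print(clique_expansion(one_complex) == clique_expansion(three_pairs))
print(len(star_expansion(one_complex)), len(star_expansion(three_pairs)))
```

The clique expansion cannot tell the single complex from the three binary interactions. The star expansion keeps them apart, but at the cost of introducing auxiliary nodes that downstream methods must treat differently from proteins.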

A common theme in statistical learning on hypergraphs is finding a typically high-dimensional representation, or an embedding, of the data, and subsequently applying traditional machine learning to learn some concept; see Section 5 for more details. These methods can work at the level of entire graphs for graph classification, or at the level of nodes (edges), for node (edge) classification and link prediction. Well-known graph classification problems include predicting the toxicity of chemical molecules Vishwanathan2010 , where the nodes are atoms and the edges are bonds, both of different types, and predicting protein function Borgwardt2005 . Examples of popular node/edge classification problems are function prediction for proteins/protein complexes in PPI networks or for amino acid residues in protein structure networks Lugo-Martinez2016 ; Vacic2010 . An example of a link prediction problem is the task of de-noising and completion of the PPI network itself, as also discussed in Section 2.

Embeddings are often formalized via kernel-based approaches or representation learning (Section 5), thus allowing practitioners to use both finite- and infinite-dimensional representations. Kernels are symmetric, positive semi-definite similarity functions defined on pairs of objects that allow efficient learning Shawe2004 . Well-performing kernel approaches include random walks Wachman2007 and hypergraphlet counting Lugo-Martinez2021 . Hypergraphlets are typically defined as small, connected, (rooted) hypergraphs, often with a finite number of node and edge types Lugo-Martinez2021 . They are a non-trivial extension of (pairwise) graphlets discussed above Lugo2014 ; milenkovic2008uncovering ; Przulj2007 ; Przulj2004 ; Shervashidze2009 ; Vacic2010 , with both illustrated in Fig. 4D-E. As with graphlets, the appeal of counting hypergraphlets derives from the graph reconstruction conjecture Bondy1977 . Though proved only for certain types of graphs (e.g., trees), the graph reconstruction conjecture postulates that a large graph of size $n$ can be reconstructed up to isomorphism from the counts of all subgraphs up to the size of $n-1$ . A stronger version of the conjecture allows for such reconstruction for subgraphs up to the size of some $k<n-1$ . Under these conditions, hypergraphlet counting approaches can lead to embeddings that allow universal approximation on hypergraph data. Another approach, relying on neural-network graph embeddings, allows for scaling hypergraph-based approaches to very large graphs Maleki2022 .

Additional approaches for hypergraphs exist, which are based on deep learning gui2016large ; tu2018structural . Among these, a prominent example utilizes a GNN based on self-attention to effectively learn embeddings of the nodes and predict hyperedges for non- $k$ -uniform heterogeneous hypergraphs, enhancing the generalizability zhang2020hypersagnn . This approach and its extensions have been applied to studying chromatin biology zhang2020matcha ; zhang2022multiscale and predicting genetic interactions for a group of genes, specifically trigenic interactions, thereby significantly expanding the quantitative characterization of higher-order interactions zhang2020dango .

Limitations. Three major issues confront the wide adoption of hypergraph-based representations in network biology. Databases such as Reactome gillespie2022reactome contain well-curated reaction networks that are amenable to representations as generalizations of directed hypergraphs. The first issue is that these resources remain incomplete and rely on manual curation. One promising direction of research is to analyze pairwise graphs to automatically infer reactions. An elegant example is an approach that uses properties of chordal graphs to convert a graph representation of a signaling pathway into a nested tree of protein complexes zotenko2006decomposition . A graph is chordal if every cycle of length four or more has a chord, i.e., an edge connecting two nodes that are not adjacent in the cycle. Since PPI networks are not necessarily chordal, the authors augment them with additional edges, e.g., those that connect weak siblings, i.e., pairs of nodes that have identical neighbor sets but are themselves not connected by an edge. If the resulting graph is chordal, it admits a representation as a tree of cliques, which can be converted into a tree of complexes in the original graph by deleting the artificially-added edges. This method was applied to the TNF- $\alpha$ /NF- $\kappa$ B and pheromone signaling pathways zotenko2006decomposition . To further the use of hypergraphs in network biology, it will be important to generalize this method to apply to larger classes of graphs and to unify these methods of automated reconstruction with the results of manual curation. It may also be valuable to formulate hybrid network representations that combine the features of pairwise graphs and hypergraphs. A caveat here is that the need to develop a novel set of algorithms for every new representation might prevent its wide adoption in the community.
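The two graph-theoretic notions used in the chordal-graph approach above, chordality and weak siblings, can be illustrated with a short Python sketch. This is not the published decomposition method: the chordality test below simply removes simplicial nodes (nodes whose neighbors form a clique) one at a time, which succeeds exactly when the graph is chordal, and the function names and toy graph are assumptions.

```python
from itertools import combinations


def weak_siblings(adj):
    """Pairs of non-adjacent nodes with identical neighbor sets."""
    return [(u, v) for u, v in combinations(sorted(adj), 2)
            if v not in adj[u] and adj[u] == adj[v]]


def is_chordal(adj):
    """Test chordality by greedily eliminating simplicial nodes."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    while g:
        simplicial = next(
            (v for v in g if all(b in g[a] for a, b in combinations(g[v], 2))),
            None,
        )
        if simplicial is None:
            return False  # some cycle of length >= 4 has no chord
        for u in g[simplicial]:
            g[u].discard(simplicial)
        del g[simplicial]
    return True


# Toy 4-cycle a-b-c-d-a: not chordal, with two weak-sibling pairs.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
siblings = weak_siblings(cycle)

# Connecting one weak-sibling pair (a, c) adds a chord to the cycle.
augmented = {v: set(nbrs) for v, nbrs in cycle.items()}
augmented["a"].add("c")
augmented["c"].add("a")

print(is_chordal(cycle), siblings, is_chordal(augmented))
```

In the toy example, adding a single edge between weak siblings is enough to make the graph chordal; in real PPI networks, augmentation may or may not produce a chordal graph, as noted above.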

The second issue is that the theory for (directed) hypergraphs is much less well-developed than for pairwise graphs. Problems that have well-established and simple polynomial-time solutions on pairwise graphs, e.g., shortest paths, turn out to be computationally intractable on directed hypergraphs ritz2014signaling , as discussed above. Incorporating regulation into the definitions of shortest paths continues to be challenging krieger2022computing . Moreover, graph-theoretic concepts such as clusters, flows, random walks, or convolutions that have been employed fruitfully in network biology are either challenging to generalize to hypergraphs or have found limited applications in biology.

The third issue is that it is not clear under what conditions or for which applications a higher-order representation is better than a pairwise graph representation. Arguments often appeal to visual and qualitative reasoning (Fig. 4). We encourage the community to come forward with well-established datasets, evaluation measures, and benchmark frameworks that can pose these questions formally and develop generalizable standards.

5. Machine learning on networks

Overview. Machine learning has emerged as a powerful paradigm for creating predictive models specified as parameterized functions with tunable parameters that operate on structured data, such as graphs, spatial geometries, relational structures, and manifolds. Applying machine learning methods to network data has demonstrated potential in a myriad of biological network analysis tasks hetzel2021graph ; li2022graph ; theodoris2023transfer ; yue2020graph . Recent methods are designed to produce graph representations as compact numerical vectors (or embeddings) corresponding to various graph elements, such as nodes, edges, subgraphs, and entire graphs, and capture essential information about the topology of these elements. These learned representations can be fed into models trained toward a vast array of downstream analytic tasks.

Predictive models on graphs include models for predicting node labels (node classification), edge-level relationships (link prediction), subgraph-level labels (subgraph classification), and graph-level labels (graph classification) (Fig. 5). These models can be created through unsupervised, self-supervised, and supervised learning on all types of networks, including homogeneous, heterogeneous, temporal, and spatial networks, and with additional constraints and domain knowledge imposed on the models. By leveraging deep graph learning models pretrained on large-scale general graph datasets, it is possible to adapt (or fine-tune) pretrained representations for diverse use cases in predictive and generative modeling gainza2020deciphering ; gainza2023novo . As machine learning on graphs continues to be developed, appropriate model benchmarking is necessary to ensure that task-specific evaluation measures are well-defined and predictions are fair and robust. The rest of this section discusses these topics, which are also summarized in Fig. 5.

Table 1: Prominent open-source benchmark datasets for machine learning on biological networks. Databases are categorized by data type. The table is organized alphabetically by data type and database names.

Data type	Database	Task type	Prediction tasks
General	Long Range Graph Benchmark dwivedi2022LRGB	Edge-level	Molecular bond
		Graph-level	Peptide function, peptide structure
General	Open Biomedical Network Benchmark liu2023nleval	Node-level	Protein function
		Edge-level	Disease-gene association
General	Open Graph Benchmark hu2020open	Node-level	Protein function
		Edge-level	Protein-protein association, drug-drug interaction, heterogeneous interaction, vessels in mouse brain
		Graph-level	Molecular property, species-specific protein association
General	SubGNN Benchmarks alsentzer2020subgraph	Subgraph-level	Proteins associated with biological process, rare neurological disorders phenotype-based diagnosis, and rare metabolic disorders phenotype-based diagnosis
General	Temporal Graph Benchmark huang2024temporal	Node-level	Dynamic node affinity prediction
		Edge-level	Dynamic link prediction
Knowledge graph	PrimeKG chandak2023building	Node-level	Identity of protein/gene, disease, drug, biological process, pathway, phenotype, molecular function, cellular component, exposure, and anatomical region
		Edge-level	Protein-protein interaction, disease-drug indication, disease-drug contraindication, disease-drug off-label use, disease-phenotype association, disease-disease association, disease-protein association, disease-exposure association, phenotype-protein association, pathway-gene association, etc.
Knowledge graph	Phenotype Knowledge Translator callahan2024open	Node-level	Identity of tissue, cell, DNA, RNA, gene, miRNA, variant, protein, disease, biological process, pathway, phenotype, molecular function, cellular component, and chemical
		Edge-level	Tissue-/cell-specific gene expression, gene-variant association, variant-disease association, chemical-disease association, chemical-pathway association, etc.
Molecular design	Protein sEquence undERstanding xu2022peer	Edge-level	Protein-protein interaction, contact prediction
		Graph-level	Molecular property (e.g., fold classification, secondary structure prediction)
Molecular design	Tasks Assessing Protein Embeddings tape2019	Edge-level	Protein-protein interaction, contact prediction
		Graph-level	Molecular property (e.g., fold classification, secondary structure prediction)
Molecular design	Graph Explainability Library agarwal2023evaluating	Graph-level	Molecular mutagenic property, molecular functional group (e.g., benzene rings, fluoride carbonyl)
Neurology	NeuroGraph said2023neurograph	Graph-level	Donor demographics (age and gender), task states (emotion processing, gambling, language, motor, relational processing, social cognition, and working memory), cognitive traits (working memory, fluid intelligence)
Therapeutic discovery	AVIDa-hIL6 tsuruta2024avida	Edge-level	Antigen-antibody interaction
Therapeutic discovery	Therapeutic Data Commons huang2021therapeutics	Edge-level	Drug-target interaction, drug-drug interaction, protein-protein interaction, disease-gene association, drug-response prediction, drug-synergy prediction, peptide-MHC binding, antibody-antigen affinity, miRNA-target prediction, catalyst prediction, TCR-epitope binding, and clinical trial outcomes
		Graph-level	Molecular property (e.g., synthesizability, drug-likeness)

Unsupervised, self-supervised, and supervised graph learning. Unsupervised learning of graph representations involves optimizing parameterized strategies, such as GNNs, graph transformers, or multi-layer neural message-passing models, to aggregate information from a node’s (e.g., a gene in a gene co-expression network or a patient in a patient similarity network) neighbors in the network. The goal is to optimize the representations so that the proximity between entities in the embedding space mirrors their proximity in the network atz2021geometric ; cao2020comprehensive . Prevalent strategies for sampling neighbors in the network vicinity of nodes that get embedded in the latent space include biased and unbiased random walks as well as adaptive neighbor sampling hamilton2017inductive ; velivckovic2018deep . Objective functions of these methods aim to maximize embedding similarity in the latent space for neighboring nodes in the network hamilton2020graph ; hamilton2017representation ; perozzi2014deepwalk ; tang2015line . For instance, nodes connected by edges should be embedded closer together in the latent space (i.e., have more similar embeddings) than nodes that are not connected grover2016node2vec ; liu2022graph ; wu2021self ; xie2022self .
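As a rough illustration of the neighbor-sampling step described above, the following Python sketch generates an unbiased random walk and extracts (node, context) pairs within a window. An embedding model would then be trained so that such pairs receive similar embeddings. The function names, the window parameter, and the toy co-expression graph are assumptions made for this example, and no embedding model is trained here.

```python
import random


def random_walk(adj, start, length, rng):
    """Unbiased random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(adj[walk[-1]])  # sorted for reproducibility
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


def context_pairs(walk, window):
    """(node, context) pairs for nodes at most `window` steps apart on the walk."""
    pairs = []
    for i, node in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((node, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical gene co-expression graph.
coexpr = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3"},
}
rng = random.Random(0)
walk = random_walk(coexpr, "g1", 6, rng)
pairs = context_pairs(walk, window=2)
print(walk)
print(len(pairs))
```

Biased walks, as used by some of the cited methods, differ only in how the next neighbor is chosen.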

Self-supervised graph representation learning, the predominant approach for machine learning on graphs, leverages not only the network structure but also additional context or auxiliary tasks to generate informative embeddings. Unlike unsupervised methods that solely rely on the network structure for optimization, self-supervised techniques utilize auxiliary (pretext) tasks, such as predicting node attributes or reconstructing graph substructures, to enhance the learning process and create more robust embeddings hassani2020contrastive ; li2022graph ; zitnik2018modeling ; zitnik2017predicting . An example of a self-supervised node-level auxiliary task is predicting each node’s degree. Link prediction is a self-supervised edge-level task that predicts whether an edge exists between a pair of nodes kipf2016variational ; li2022graph based on a self-supervised objective liu2021self , which can be formulated using contrastive learning you2020graph , node or edge masking agarwal2023evaluating , and generative denoising yi2024graph . Examples of self-supervised subgraph and graph tasks include predicting subgraph and graph properties, such as distributional statistics of shortest path lengths, network diameter, and the presence or absence of specific higher-order structures and graphlets alsentzer2020subgraph ; luo2022clear ; you2020graph .
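One way to set up link prediction as a self-supervised task is to hide a fraction of the observed edges, keep the remainder as the visible graph, and pair the hidden edges with sampled non-edges as negatives. The Python sketch below illustrates this data preparation only; the function name, the masking fraction, and the toy graph are assumptions, and negative sampling is done by enumerating all non-edges, which is feasible only for small graphs.

```python
import random
from itertools import combinations


def make_link_prediction_split(nodes, edges, frac_masked, rng):
    """Split edges into visible and masked (positive) sets and sample negatives."""
    canonical = sorted({tuple(sorted(e)) for e in edges})
    shuffled = canonical[:]
    rng.shuffle(shuffled)
    n_mask = max(1, int(frac_masked * len(shuffled)))
    positives, visible = shuffled[:n_mask], shuffled[n_mask:]
    edge_set = set(canonical)
    # Candidate negatives: node pairs with no observed edge.
    non_edges = [p for p in combinations(sorted(nodes), 2) if p not in edge_set]
    negatives = rng.sample(non_edges, min(n_mask, len(non_edges)))
    return visible, positives, negatives


toy_nodes = ["a", "b", "c", "d", "e"]
toy_links = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c")]
visible, positives, negatives = make_link_prediction_split(
    toy_nodes, toy_links, frac_masked=0.4, rng=random.Random(1)
)
print(visible, positives, negatives)
```

A model would be trained on the visible edges and scored on how well it separates the masked edges from the negatives. In incomplete biological networks, some sampled non-edges may be true but unobserved interactions, so these negatives are noisy.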

Graph representation learning, whether unsupervised or self-supervised, can be applied to any type of network, including but not limited to homogeneous, heterogeneous, temporal, spatial, and physical networks. For example, in heterogeneous networks, GNN and graph transformer models leverage node- and edge-based attention weights to aggregate neighborhood information depending on node and edge types fu2022mvgcn ; kesimoglu2023graf ; wang2019heterogeneous ; xie_mgat_2020 ; Zhang2019HetGNN . Other approaches treat each edge type as a homogeneous graph, apply a graph representation learning model to it, and then integrate edge-type specific node representations into final representations fu2022mvgcn ; kesimoglu2023graf ; kesimoglu2022supreme ; wang2021mogonet . In a heterogeneous network, subgraphs can be sampled via metapaths sun2011pathsim , which are defined by sequences of relationships (or edge types) connecting different types of nodes to model semantic nuances underlying the network in a self-supervised manner, such as through contrastive learning dong2017metapath2vec ; zhao2021multi . These advancements in graph representation learning have impacted areas like cancer biology, drug discovery, and disease diagnosis esteva2019guide ; huang2023zero ; huang2022artificial ; morselli2021network ; stokes2020deep .
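A metapath-guided walk can be sketched as follows. For simplicity, this Python example specifies the metapath as a sequence of node types rather than edge types, which is an assumption of the sketch; the function name, node names, and toy network are likewise hypothetical.

```python
import random


def metapath_walk(adj, node_type, start, metapath, rng):
    """Walk from `start`, visiting node types in the order given by `metapath`."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node does not match the first metapath type")
    walk = [start]
    for next_type in metapath[1:]:
        candidates = sorted(n for n in adj[walk[-1]] if node_type[n] == next_type)
        if not candidates:
            break  # the metapath cannot be completed from this node
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical drug-protein-disease network.
het_type = {"drugA": "drug", "drugB": "drug", "p1": "protein",
            "p2": "protein", "dis1": "disease"}
het_adj = {
    "drugA": {"p1"},
    "drugB": {"p1", "p2"},
    "p1": {"drugA", "drugB", "dis1"},
    "p2": {"drugB", "dis1"},
    "dis1": {"p1", "p2"},
}
mp_walk = metapath_walk(het_adj, het_type, "drugA",
                        ("drug", "protein", "disease"), random.Random(0))
print(mp_walk)
```

Walks sampled this way can serve as the subgraphs or contexts on which a contrastive objective is defined.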

Supervised graph representation learning uses networks with additional expert-curated or experimentally-derived labeled data to directly optimize models for specific prediction tasks (Fig. 5A). In this paradigm, nodes, edges, subgraphs, or entire graphs are associated with labels, and the learning process minimizes the discrepancy between the model’s predictions and these labels Schlichtkrull2018 ; velivckovic2017graph . Common applications include node classification, where individual nodes are assigned to predefined categories, and graph classification, wherein entire graphs are categorized based on their topological features eyuboglu2022mutual ; gilmer2017neural . Unlike unsupervised and self-supervised models, supervised graph learning directly uses label information, often leading to more task-specific and accurate representations, albeit at the cost of requiring labeled data.

Incorporating knowledge into machine learning models through knowledge graphs, spatial constraints, equivariances, and symmetries. In numerous biological and medical applications, standard graph representation learning often falls short of requirements. In these cases, the model’s predictive accuracy can be enhanced by imposing constraints drawn from pre-existing knowledge. Typical strategies encompass incorporating multimodal data into BKGs, augmenting GNNs with bespoke architectures, and applying domain-specific invariances.

BKGs help model heterogeneous relationships between biomedical entities, as already discussed in Section 3. The resulting latent space, which reflects the topology of the underlying knowledge graph, can be operated on to make inferences about existing and novel relationships. Jointly modeling diverse types of relationships in a BKG, such as integrative modeling of transcription regulation and metabolism chandrasekaran2010probabilistic ; niu2021trimer , can present unique challenges due to the BKG’s incompleteness and potential high-order relationships involving heterogeneous entities. Incorporating pathway knowledge, either implicitly as constraints that regularize network embeddings niu2021trimer or directly as a prior placed on the BKG structure and parameters in a Bayesian fashion boluki2017incorporating , has been shown to improve predictive performance. Supervised machine learning methods often require many samples to identify biologically meaningful patterns, which can limit their applicability in areas such as rare diseases that are inherently limited in clinical cases, leading to few samples to analyze banerjee2023machine . Advances in self-supervised graph learning applied to BKGs have shown promise for rare disease research alsentzer2022deep and will likely be informative for applications beyond rare diseases for which few samples exist with high-dimensional data.

Temporal and spatial data can be represented as networks, but specialized neural architectures are necessary to learn optimally on temporal/dynamic networks. Temporal graph representation learning methods typically involve two main components: a GNN architecture to generate embeddings for each time point and a sequence model, such as a long short-term memory network or a transformer, to perform sequence learning by leveraging temporal relationships between elements in the sequence. Existing approaches use GNNs as feature extractors of nodes and the underlying topology, and recurrent neural networks for temporal learning and to include additional metadata information li2017diffusion ; manessi2020dynamic ; pareja2020evolvegcn ; peng2020spatial ; zhao2019t . Recently, static GNNs have been extended to handle dynamic graphs by treating time points as hierarchical states you2022roland or applied to irregular time series data by propagating neural messages between time intervals of each sensor as well as between sensors zhang2022raindrop . Protein molecular configurations can be depicted as protein structure networks where amino acid nodes are linked by the 3D physical proximity of their residues, and the amino acid spatial coordinate information is encoded as node attributes. Deep learning models, particularly equivariant GNNs, can attain high performance while behaving consistently under translation, reflection, and rotation of protein networks in 3D space batzner20223 ; gong2023general ; jumper2021highly . For instance, for a model to remain invariant to molecular spatial orientation, constraints enforcing rotation invariance need to be built into it jumper2021highly . Methodologies derived from equivariant neural networks, such as AlphaFold jumper2021highly , can complement sequence-based language models lin2023evolutionary by harnessing evolutionary data to infer protein structures from primary amino acid sequences, and potentially generate realistic molecular formations.
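A simple way to see why symmetry matters for protein structure networks is that a contact graph built from inter-residue distances does not change when the structure is rotated or translated. The Python sketch below checks this on made-up coordinates. The distance cutoff, coordinates, and function names are assumptions, and the example illustrates invariance of distance-based edges only, not an equivariant GNN.

```python
import math
from itertools import combinations


def contact_edges(coords, cutoff):
    """Edges between residues whose Euclidean distance is at most `cutoff`."""
    return {(i, j) for i, j in combinations(range(len(coords)), 2)
            if math.dist(coords[i], coords[j]) <= cutoff}


def rotate_z_and_translate(coords, angle, shift):
    """Rotate points about the z-axis by `angle` (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]


# Hypothetical residue coordinates (arbitrary units) and cutoff.
residues = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
moved = rotate_z_and_translate(residues, angle=1.1, shift=(5.0, -2.0, 9.0))

original_graph = contact_edges(residues, cutoff=6.0)
moved_graph = contact_edges(moved, cutoff=6.0)
print(sorted(original_graph))
print(original_graph == moved_graph)
```

Raw coordinates used as node attributes do change under such transformations, which is why architectures that handle them explicitly are needed when coordinates, rather than distances alone, are given to the model.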

Generative graph models. Generative graph models are a class of machine learning models specifically designed to generate new graphs, or parts of graphs, that resemble a given set of training graphs in some way. These models learn to capture the underlying patterns and structures in the training graphs and can then be used to produce new graphs with similar properties as the training graphs. For example, in molecular biology, the inherently graph-like nature of molecular structures has made GNNs an ideal tool for generating drug-like molecules, guiding the generation process by learning the underlying patterns and properties from real molecular data bilodeau2022generative . One such method is a variational graph autoencoder that learns embeddings of molecular structures and uses them to generate novel molecular graphs jin2018junction ; kipf2016variational ; li2018learning . Other generative models, such as GraphVAE, GraphRNN, and MolGAN, have also been developed to generate realistic graphs de2018molgan ; simonovsky2018graphvae ; you2018graphrnn . Inspired by generative adversarial networks for image generation, MolGAN pits a generator model (which produces graphs) against a discriminator model (which tries to distinguish between real and generated graphs). Additionally, graph transformer networks have recently been proposed for molecular graph generation, demonstrating the ability to generate molecules with desired properties by training on extensive chemical databases bagal2021molgpt .

When applied to protein design, GNNs have demonstrated impressive results in designing protein sequences that fold into specific structures ingraham2019generative . Graph-based methods like PotentialNet have shown promise for protein-ligand binding prediction feinberg2018potentialnet . Similarly, DeepSite uses 3D convolutional neural networks to predict protein-ligand binding sites jimenez2017deepsite . Moreover, recent generative models, such as ProteinMPNN dauparas2022robust , utilize a message-passing neural network architecture to generate protein sequences and structures, further expanding the range of possibilities for protein design.

Diffusion models have recently emerged as powerful tools in protein and drug design abramson2024accurate ; corso2023diffdock ; watson2023novo ; yim2024diffusion , leveraging their capability to model complex distributions for generating novel molecular and protein structures. In protein design, diffusion models operate by gradually denoising a random configuration towards a target protein structure, learning the distribution of protein conformations. A notable example is RFDiffusion watson2023novo , a diffusion model that generates protein structures by conditioning on both sequence and structural information, achieving enhanced accuracy in structure prediction. In drug design, these models are adapted to generate molecular graphs by iteratively refining a random molecular graph into a drug-like molecule with desired properties through a learned diffusion process o20243d .

Transfer learning. The quality of representations generated by graph representation learning methods is contingent upon the availability of labels. Nevertheless, in the realm of network biology, labels are often in short supply due to the substantial resources required for their curation and validation. A potent solution to this challenge is transfer learning. This approach involves initially training a graph representation learning model on a large reference network via self-supervised pretraining hu2020strategies ; li2022graph ; xie2022self ; you2020graph , followed by adapting the resulting model or its outputs to a different task of interest, typically through supervised learning on a small set of labeled examples (fine-tuning). Pretraining a model on a large network and then fine-tuning it on a small labeled dataset allows the model to harness extant information about a network entity (i.e., from the large network utilized for pretraining) in service of diverse tasks with limited task-specific labels.

Transfer learning has shown considerable potential for developing predictive models on condition-specific networks that vary with biological conditions. Networks are typically constructed from context-unaware data (e.g., the human reference PPI network luck2020reference ) or data generated under specific conditions (e.g., a gene co-expression network for a particular disease). Biomedical entities and their interactions can vary across biological conditions, such as tissues, cell types, and disease states. Nevertheless, generalizing knowledge from context-unaware networks to context-specific problems presents considerable challenges. For instance, modeling tissue- or cell type-specific interactions from the human reference PPI network requires the construction of tissue- and cell type-specific networks and the development of multi-scale network models greene2015understanding ; ietswaart2021genewalk ; li2021deep ; zitnik2017predicting . One approach to this challenge involves constructing context-specific networks (as discussed in Sections 2 and 3) and applying independent shallow network embedding layers to learn node representations based on network topology and tissue hierarchical structure greene2015understanding ; zitnik2017predicting . An alternative strategy is to learn shallow network embeddings on a context-unaware network, such that the embeddings of nodes operating in the same context are more similar to each other than nodes operating in different contexts ietswaart2021genewalk . Recent methods incorporate context in a data-driven manner, constructing cell type-specific PPI networks using single-cell transcriptomic data li2023contextualizing ; li2021deep . Unified by a network of cell type and tissue hierarchy, these networks can be harnessed to learn unique protein representations tailored to each cell type context li2023contextualizing ; li2021deep .

Understanding predictive models, benchmarking, and rigorous evaluation across diverse tasks. With the rapid evolution of graph learning methodologies, the need to construct rigorous benchmarks for effectively assessing the performance of these novel techniques is becoming increasingly urgent hu2020open ; shchur2018pitfalls . Open-science evaluation platforms such as the Benchmarking GNN dwivedi2020benchmarking , Open Graph Benchmark hu2021ogb ; hu2020open , and others (Table 1) serve as significant assets for general graph benchmarking, while other resources are being curated explicitly for the domain of network biology liu2023nleval .

To provide a comprehensive evaluation, these resources ought to be expanded to include tasks defined at various levels of graphs, including node classification, link prediction, subgraph classification and clustering, and whole-graph classification and regression. In addition to benchmarking models for predictive tasks, evaluation frameworks are needed for generative graph models. Benchmarking resources should also encompass diverse types of biological graphs, such as heterogeneous, spatial, and temporal ones. A critical element in this regard is benchmarking the performance of network-based machine learning techniques across multiple dimensions of evaluation beyond accuracy, including robustness, generalizability, and computational efficiency.

Moreover, the explainability of graph-based learning can offer significant insights in the biomedical domain agarwal2023evaluating ; xie2022task ; ying2019gnnexplainer ; yuan2021explainability . Consequently, it is equally important to interrogate learned models by examining pretrained graph representations forster2022bionic and mapping attention mechanisms in attention-based deep learning models elmarakeby2021biologically . As we move towards the broader application of machine learning models in network biology, proper quantification of the uncertainty, error, and utility associated with these models is indispensable. Given the potential for considerable uncertainty in these models, effective techniques for uncertainty quantification are required to fully comprehend the predictive capabilities and limitations of a given model abdar2021review .

When the model’s objective is specific, such as treatment recommendation, disease diagnosis and prognosis, and steady-state or transient network behavior prediction, an objective-driven approach to uncertainty quantification can be beneficial yoon2013quantifying . This approach allows us to quantify uncertainty based on its impact on the expected performance of prediction and intervention tasks. Ultimately, this can pave the way for optimal experimental design techniques dehghannasiri2014optimal ; dehghannasiri2015efficient that prioritize experiments to generate the most informative data points selected by active learning strategies, effectively reducing model uncertainty.

6. Network-based personalized medicine

Overview. The overarching goal of precision medicine is to develop diagnostic and treatment strategies tailored to individual patients aronson2015building ; kaiser2015nih ; NMD18 , while also taking into account the desired level of precision for each treatment. Personalized characterization of an individual or a group can encompass various data types, including molecular, healthcare, environmental, lifestyle, and behavioral information, commonly modeled and analyzed as networks PrzuljSci2016 . By assimilating data from different modalities, precision therapeutics can amplify their potential and bolster resilience against diverse data noise Glig2016 ; huang2021therapeutics ; huang2022artificial . Fusing data from multiple sources has proven effective in advancing precision medicine Thomas21 ; PSB2016 ; iCell2019 ; wang2014similarity .

Patient stratification. Precision medicine aims to provide individualized diagnostic and treatment strategies. Developing treatments tailored to specific patient groups based on distinct disease subtypes (Fig. 6A) is poised to transform a prevailing one-size-fits-all approach used in healthcare. Network methods can integrate multimodal data to identify patient groups with coherent genetic, genomic, physiological, and clinical profiles ektefaie2023multimodal ; PSB2016 ; petti2023network , even when the underlying data are incomplete and noisy pai2018patient . The methods assume that patients with similar clinical signatures and similar -omics profiles have similar clinical outcomes. Similarities between patients can be efficiently represented through patient similarity networks; in these networks, nodes symbolize patients, and weighted edges denote the degree of similarity derived from clinical and biomolecular patient attributes. Each patient data attribute, such as age, sex, mutation status, or gene expression profile, can be used to create a network of pairwise patient similarities. Then, the set of all such networks can be viewed as a multiplex network, with a layer for each of the attributes. Various similarity measures can be employed to assess patient similarity across different datasets corresponding to different attributes. After the multiplex patient similarity network is constructed, patient subtypes can be identified by examining the community (clustering) structure within the network. Communities are characterized as subsets of nodes that are densely connected to each other and loosely connected to nodes in different communities Fortunato2010 . Communities in a patient similarity network are thus densely/strongly linked patient groups and can shed light on distinct disease subtypes.
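As a minimal illustration of the workflow described above, the following Python sketch builds one similarity layer per patient attribute, fuses the layers, and groups patients. The toy cohort, the similarity functions, the averaging of layers, the 0.6 threshold, and the use of connected components as a crude stand-in for community detection are all assumptions made for illustration; they are not the procedure of any method cited here.

```python
from itertools import combinations

# Hypothetical toy cohort: patient -> (age, mutation status, expression profile).
patients = {
    "p1": (62, 1, [2.1, 0.3, 1.8]),
    "p2": (65, 1, [2.0, 0.4, 1.7]),
    "p3": (60, 1, [1.9, 0.2, 1.9]),
    "p4": (34, 0, [0.2, 2.2, 0.1]),
    "p5": (31, 0, [0.3, 2.0, 0.2]),
    "p6": (36, 0, [0.1, 2.4, 0.3]),
}

def age_similarity(a, b, scale=50.0):
    """Similarity in [0, 1] that decays linearly with the age gap (assumed scale)."""
    return max(0.0, 1.0 - abs(a - b) / scale)

def mutation_similarity(a, b):
    """1 if both patients share the same mutation status, else 0."""
    return 1.0 if a == b else 0.0

def cosine_similarity(x, y):
    """Cosine similarity between two expression profiles."""
    dot = sum(u * v for u, v in zip(x, y))
    norm_x = sum(u * u for u in x) ** 0.5
    norm_y = sum(v * v for v in y) ** 0.5
    return dot / (norm_x * norm_y)

# One layer per attribute: together they form the multiplex network.
layer_functions = [age_similarity, mutation_similarity, cosine_similarity]
pairs = list(combinations(sorted(patients), 2))
layers = [
    {(p, q): f(patients[p][i], patients[q][i]) for p, q in pairs}
    for i, f in enumerate(layer_functions)
]

# Fuse the layers by averaging edge weights (an illustrative choice).
fused = {pair: sum(layer[pair] for layer in layers) / len(layers) for pair in pairs}

def connected_components(nodes, edges):
    """Group nodes that are linked, directly or indirectly, by the given edges."""
    neighbors = {v: set() for v in nodes}
    for p, q in edges:
        neighbors[p].add(q)
        neighbors[q].add(p)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbors[v] - component)
        seen |= component
        components.append(component)
    return components

THRESHOLD = 0.6  # assumed cut-off for a "strong" similarity edge
strong_edges = [pair for pair, weight in fused.items() if weight >= THRESHOLD]
communities = connected_components(sorted(patients), strong_edges)
print(communities)
```

In practice, a dedicated community detection algorithm applied to the weighted multiplex network would replace the thresholding step; the sketch only conveys how attribute-level layers combine into patient groups.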

Network methods offer distinct advantages over non-network approaches, which often grapple with the complexities of integrated datasets gligorijevic2015methods . Patient stratification has increasingly benefited from network-based methodologies, which can elucidate intricate biological interactions, especially within disease mutation landscapes, such as cancer PSB2016 or rare hereditary diseases Thromb23 . By studying different types of gene-gene interactions, encompassing aspects like mutual exclusivity, co-occurrence, and both physical and functional associations, and analyzing personalized gene regulatory networks rogers2022network , one can better understand inter-individual variation in disease driven by differences in interactions caused by each patient’s genetic background, environmental exposures, and the proportions of specific cell types involved in disease van2018integrative . Such insights can elevate the accuracy of patient stratification, which is typically measured as the ability to classify patients as belonging to known disease subtypes pai2019netdx or the ability to identify disease biomarkers that generalize (maintain performance) when applied to new data that have not yet been seen by the model alsentzer2022deep ; kong2022network . These insights can also guide the refinement of therapeutic strategies, ensuring they are optimally tailored to specific patient groups dao2017bewith ; PSB2016 ; huang2023zero .

Identification of pathways associated with disease subtypes and patient groups. Identifying group-specific mutations provides valuable insights into the underlying biochemical pathways associated with the disease (Fig. 6B). These pathways can be conceptualized as networks, laying the foundation for an in-depth understanding of disease mechanisms. Incorporating individual mutation or expression data into pathway-based (i.e., network-based) methods aids in identifying targetable mutations park2019pathway . This approach is especially pertinent in determining functional pathways that play roles in expression responses to disease-propagating mutations, leveraging the concept of pathway centrality Sam22A ; Sam22 .

For instance, by integrating genomic, clinical, and therapeutic data through networks, physicians can categorize patients with treatment-resistant prostate cancer based on specific gene mutations like AR, PTEN, and BRCA2. Recognizing these mutations facilitates the adoption of personalized therapies, targeting the aberrant pathways distinctive to each patient’s tumor profile. As a result, this tailored treatment strategy offers the potential for safer and more effective treatments mateo2020accelerating .

Furthermore, recent research has illuminated the importance of tissue-specific regulatory networks and the pathways they encompass, which frequently manifest genetic mutations in particular patient cohorts. This understanding emerged from the combined analysis of expression and chromatin accessibility data, unveiling a previously unidentified tissue-specific stem-cell-like subtype of treatment-resistant prostate cancer that may be a target for intervention tang2022chromatin . Similarly, a comparative structural analysis of the chromatin structure network in chronic lymphocytic leukemia and control tissue of origin revealed that genes driving this cancer type are characterized by specific local wiring patterns not only in the chromatin structure network of chronic lymphocytic leukemia cells but also of healthy cells CLL20 . This allows for the successful prediction of new DNA elements related to this cancer type, and importantly, it shows that cancer-related DNA elements can be identified in other cancer types by investigating the chromatin structure network of the healthy cell of origin, a critical new insight paving the road to new therapeutic strategies CLL20 .

Identification of disease-dysregulated functional modules. Studying disease-dysregulated functional modules of genes can advance the understanding of disease beyond isolated mutations or pathway dysregulations. Disease-associated behaviors can materialize in clusters of tightly interacting proteins forming functional modules (Fig. 6B) agrawal2018large ; menche2015uncovering rather than exclusively via singular gene mutations or perturbed gene expression schadt2009molecular .

The quest to uncover disease-associated functional gene modules from molecular networks is a long-standing challenge with implications for precision medicine barabasi2011network ; choobdar2019assessment ; eyuboglu2022mutual ; Thomas20 ; mitra2013integrative ; gysi2022non . Prevailing approaches for finding disease modules rely on the assumption that interacting genes tend to associate with similar phenotypes. For instance, gene co-expression network analysis has been employed to pinpoint modules of genes that exhibit analogous co-expression patterns in breast cancer. Notably, these clusters of genes correlate with distinct metastasis progression patterns in patients chuang2007network . Multi-omic module detection in cancer can consider mutation mutual exclusivity, transcriptional regulation, and gene co-expression alongside PPI connections silverbush2019simultaneous .

Given the complexity of disease circuits in many complex diseases, concentrated efforts have been directed toward identifying disease-associated gene modules that correlate with patient phenotypes choobdar2019assessment ; saelens2018comprehensive . Disease-associated gene modules, identified through computational approaches and various types of gene networks, have been used to refine disease diagnosis morselli2020whole . They can also forecast the response of individual cell lines to specific anticancer agents and potentially suggest patient-tailored drug combinations kim2020identifying ; Salazar21 . Supplementing these techniques, differential network analysis (Section 2) can reveal differential connections or rewiring of a molecular network under varying conditions. This complements traditional differential gene expression analyses, giving a robust framework to investigate diverse conditions and, by extension, different patient groups gysi2020construction ; morselli2020whole ; tu2021differential .
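The idea of differential network analysis can be sketched as follows: build a co-expression network per condition and report the gene pairs whose edge weight changes most. The expression values, the use of Pearson correlation as the edge weight, and the `min_change` threshold are hypothetical choices for illustration only.

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def coexpression_network(expression):
    """Weighted network: one correlation-weighted edge per gene pair."""
    genes = sorted(expression)
    return {
        (g, h): pearson(expression[g], expression[h])
        for g, h in combinations(genes, 2)
    }

def differential_edges(net_a, net_b, min_change=1.0):
    """Edges whose weight differs between conditions by at least min_change."""
    changes = {edge: net_b[edge] - net_a[edge] for edge in net_a}
    return {edge: d for edge, d in changes.items() if abs(d) >= min_change}

# Hypothetical expression of three genes across five samples per condition.
healthy = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 10], "C": [5, 3, 4, 1, 2]}
disease = {"A": [1, 2, 3, 4, 5], "B": [10, 8, 6, 4, 2], "C": [5, 3, 4, 1, 2]}

rewired = differential_edges(
    coexpression_network(healthy), coexpression_network(disease)
)
for edge, change in sorted(rewired.items()):
    print(edge, round(change, 2))
```

Here gene B reverses its relationship with A and C between conditions, so both of its edges are flagged as rewired, whereas the A-C edge is unchanged; no single gene's mean expression needs to differ for this to be detected.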

Precision medicine’s applications in identifying candidate anticancer therapeutics have broadened its scope to probe molecular shifts linked with other diseases and aging. Recent research endeavors have used multi-omics strategies to pinpoint innovative therapeutic targets for ulcerative colitis voitalov2022module and rheumatoid arthritis li2023contextualizing . As another example, complementing the above discussion of detecting disease-associated modules of genes from a molecular network, modules of diseases have been detected from a heterogeneous disease-disease similarity network halu2019multiplex . Other studies have delved into molecular biomarkers, their regulatory pathways, and age-related modifications tseng2018peripheral . These studies aim to formulate therapies adeptly tailored to diverse age demographics. Complementing the focus on aging, there is a burgeoning interest in discerning patient sex-specific disparities. These lines of inquiry draw motivation from epidemiological data, which delineate differential patterns in the incidence, progression, and prognosis of complex diseases across gender and age brackets cannistraci2021age .

Drug repurposing and pharmacogenomics. Compared to traditional drug development, drug repurposing (Fig. 6C) offers significant advantages such as low cost, reduced risk, and faster drug development timelines cheng2018network ; langhauser2018diseasome ; pushpakom2019drug ; unsal2023nmsdr . While early examples of successfully repurposed drugs have been identified through serendipitous discoveries, the availability of massive amounts of -omics and knowledge data and advances in computational techniques have provided opportunities for systematic in silico inference of novel indications for existing drugs guney2016network ; huang2023zero ; wen2023multimodal ; Xenos23 ; Carme21 . Network science and machine learning models have demonstrated impressive capabilities, but the bar for clinical applications is high. For example, an ensemble network approach has been used to identify drug candidates for repurposing against COVID-19 viral replication morselli2021network ; patten2022identification . As another example, a heterogeneous network approach revealed diseases that are most similar to COVID-19, thus reflecting conditions that are risk factors in patients and suggesting the suitability of this approach for use in drug repurposing Verstraete2020CovMulNet19 . Validation of the most promising computational predictions in the laboratory yielded an order of magnitude more potent candidates than non-guided experimental screening. In pharmacogenomics, graph convolutional neural networks trained on heterogeneous networks of drug-drug interactions identified adverse events due to polypharmacy and concomitant use of medications zitnik2018modeling . Furthermore, deciphering drug-cell connectivity data, indispensable for patient-specific drug repositioning, gains momentum by embedding PPI networks using tensor completion algorithms bumin2022fit .

The role of medical imaging in precision medicine. In addition to -omics data, medical images have emerged as an important new data modality that can facilitate precision medicine, including disease detection, diagnosis, and therapeutic interventions comaniciu2016shaping ; lambin2017radiomics . Often, medical images encompass distinct topological patterns of target entities that can serve as diagnostic signatures or biomarkers, such as the dendritic structure of the trachea or clustering behaviors of immune cells. Combining these topological signatures with deep learning algorithms offers a substantial advantage in various medical image analysis endeavors, including segmentation, classification, registration, and tracking, and can help with the interpretability of deep learning models. Building tools to compute topological and deep learning representations of imaging data inaugurates new avenues for nuanced analysis, unveiling hidden patterns and intricate correlations within multifaceted datasets edelsbrunner2002topological . These developments have catalyzed the birth of topology-infused deep learning techniques for myriad applications, spanning from segmenting retinal vessels hu2019topology ; shit2021cldice to discerning retinal arteries/veins mishra2021vtg and forecasting protein semantic similarities wang2022tango .

An important application of network-based precision medicine lies in brain disorders, where medical image analysis intertwines with network and -omics data (Fig. 6D). Specifically, procuring multimodal neuroimaging, neural network configurations, genetic markers, and other biomolecular signatures could allow for gaining insights into the neural architectures of the human brain, the modulation of its functionalities by network topographies, and the genetic interplays that correspond to disease-specific cerebral patterns. An emergent discipline, dubbed connectome genetics, heralds the meticulous delineation of human neural connectivity, unraveling its ties to cognition, behavior, and the genetic underpinnings of individual neural circuit variances arnatkeviciute2021genome . Graph mining techniques combined with data science methods have been devised, geared towards personalizing diagnosis and therapy by leveraging the multifaceted data from connectome genetics arnatkeviciute2021genetic ; jahanshad2013genome ; sha2023genetic . The recent advent of GNN-driven deep learning models further deepens our grasp on the intricate shifts within this data, advancing our understanding of neurological diseases and their heterogeneity across patient populations zhang2019brain ; zhang2021disentangled ; zhao2022revealing .

The role of social and contact networks in healthcare. Biological networks hold significant promise for advancing personalized medicine. In tandem, social, support, and contact networks correlate with individual health outcomes (Fig. 6E), providing valuable insights into patient behaviors and sentiments smith2008social . Such networks offer real-time perspectives on patient inclinations, such as therapy adherence preferences. Moreover, they can model patient behaviors associated with medication consumption, enabling the formulation of individualized intervention strategies guinazu2020employing . The confluence of health and social networks has been harnessed to forecast individual health outcomes, including mental health parameters like anxiety and depression. These predictions emerge from a rich tapestry of data sources, including combinations of heterogeneous social network data and wearable health measures liu2021heterogeneous , and dynamic social network interactions Liu2020 .

In global health emergencies, networks detailing interpersonal contacts have been pivotal in predicting disease transmission. The COVID-19 pandemic spurred the creation of composite models that integrate contact information with individual patient attributes hiram2022disease . Within such models, nodes signify individuals, while links (static or temporal/dynamic) depict inter-individual interactions. Distinct individual features, such as health status (e.g., healthy or recovered), are encapsulated as node-associated feature vectors. Grounded in theoretical foundations of susceptible-infectious-recovered models hiram2022disease , these approaches are nuanced and can account for real-world contact patterns. They allow for simulation and evaluation of public health response strategies, from containment measures to vaccination campaigns alguliyev2021graph ; bryant2020modelling ; stegehuis2016epidemic . For example, designing a vaccination strategy targeting individuals based on contact behaviors could preempt outbreaks. Since a tailored vaccination strategy may save lives and help control epidemic spread, we believe that more work should be done to improve these models by designing novel simulation algorithms that require less computational power. Indeed, many simulation models require the inspection of all nodes and edges in each simulation run, making them difficult to run on very large graphs Fortunato2010 ; hiram2022disease .
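To make the modeling setup concrete, the following Python sketch simulates a discrete-time susceptible-infectious-recovered process on a small contact network and compares no vaccination with vaccinating the most connected individual. The star-shaped network, the parameter values, and the function name `simulate_sir` are hypothetical, and setting both probabilities to 1 makes the toy run deterministic; this is a sketch under those assumptions, not the model of any cited study.

```python
import random

def simulate_sir(adj, seeds, beta, gamma, vaccinated=(), rng=None, max_steps=1000):
    """Discrete-time SIR on a contact network.

    adj: node -> set of contacts; beta: per-contact infection probability;
    gamma: per-step recovery probability; vaccinated nodes are immune.
    Returns the number of individuals that were ever infected.
    """
    rng = rng or random.Random(0)
    state = {v: "S" for v in adj}
    for v in vaccinated:
        state[v] = "V"
    for v in seeds:
        if state[v] == "S":
            state[v] = "I"
    ever_infected = sum(1 for s in state.values() if s == "I")
    for _ in range(max_steps):
        # Every step scans all nodes and the edges of infectious nodes.
        infectious = [v for v, s in state.items() if s == "I"]
        if not infectious:
            break
        newly_infected = set()
        for v in infectious:
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    newly_infected.add(u)
        for v in infectious:
            if rng.random() < gamma:
                state[v] = "R"
        for u in newly_infected:
            state[u] = "I"
        ever_infected += len(newly_infected)
    return ever_infected

# Hypothetical contact network: individual 0 is in contact with everyone else.
contacts = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}

# Targeted strategy: vaccinate the individual with the most contacts.
hub = max(contacts, key=lambda v: len(contacts[v]))

baseline = simulate_sir(contacts, seeds=[1], beta=1.0, gamma=1.0)
targeted = simulate_sir(contacts, seeds=[1], beta=1.0, gamma=1.0, vaccinated=[hub])
print(baseline, targeted)
```

With these settings the outbreak reaches all six individuals without vaccination and stays confined to the initial case when the hub is vaccinated. The per-step scan over nodes and edges also shows why such simulations become expensive on very large graphs.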

Open questions for network-based precision medicine. Despite notable advancements in network methods for precision medicine, several challenges remain. These include model benchmarking and comparison, integration of multimodal data from individual patients, and strategies to achieve the intricate equilibrium between preserving patient confidentiality and maximizing the utility of these approaches. Evaluating new methods is complex because establishing ground-truth, i.e., gold-standard or “correct”, benchmarks against which various network strategies can be compared guo2021challenges remains challenging. Evaluating precision therapeutics in vivo presents even greater challenges, given the impossibility of retroactively altering treatment modalities for the same individual at a specific temporal junction. Garnering multimodal data about a single patient presents its own difficulties, as diverse data types vary in quality and completeness wang2014similarity ; zitnik2019machine . In light of these complexities, there is a need for graph learning algorithms tailored for data-intensive multimodal networks. Importantly, new network embedding methodologies may provide simplification of these complexities into new modeling paradigms that are easier to comprehend and compute on Doria23 ; Doria23A ; Xenos21 . Furthermore, it is imperative to foster computational paradigms adept at handling patient data in a manner that safeguards privacy while not compromising on scientific robustness and safety hunter2012reporting .

Precision medicine stands poised to enable transformative shifts in disease diagnosis, therapeutic interventions, and overall patient care. Network methods and multimodal data integration are instrumental to these ambitions. Addressing intrinsic challenges related to small-sample datasets, which lack statistical power and magnify methods’ susceptibility to misinterpretation and unstable performance, is paramount for furthering the nascent triumphs of precision medicine. Surmounting these obstacles requires interdisciplinary research involving network biology scientists, clinicians, and healthcare policymakers to ensure that precision medicine evolves as a paradigm for disease diagnosis, prevention, and treatment that works equally well for all patients by taking into account individual differences in lifestyle, socioeconomic factors, environment, and biological characteristics all2019all .

7. Research discussion and future outlook

Even the well-established network biology research topics/problems, such as network inference (Section 2), have many known limitations and thus open questions associated with them. The emerging research problems, such as network-of-networks analysis (Section 3) or determining how the explosion of large language models can benefit network biology, will have even more challenges associated with them, as expected, given that these problems have started to receive attention only recently; such challenges are discussed below. The emerging problems also bring exciting new opportunities. In the following sections, we build upon the discussion about limitations and open questions from the previous sections, link together common themes from the earlier sections, and complement the previous sections by introducing additional open problems and opportunities.

7.1 On methodological paradigms and empirical evaluation

The need to compare different categories of approaches designed for the same purpose. For several topics discussed thus far, a common theme has been that it remains unclear how specific categories of approaches for a given purpose compare to each other in terms of methodological (dis)advantages, as well as in which network analysis tasks or biological/biomedical applications they might be (in)appropriate to use. For example, with network alignment, methods from biological and other (e.g., social) network domains are rarely evaluated against each other (as discussed more below); with network-of-networks analysis, the existing approaches were proposed for different network analysis tasks or biological/biomedical applications and have not yet been compared to each other (Section 3); with hypergraph versus pairwise graph analyses, it remains unclear to what extent different tasks actually benefit from hypergraph-based methods (Section 4).

Focusing more on network alignment, methods for this purpose introduced for biological networks have typically been thoroughly compared to each other (Section 3), including fair comparison of different approach categories, such as global versus local network alignment Guzzi2017 ; Meng2016 , pairwise versus multiple network alignment PNAMNA , or alignment of static versus dynamic networks DynaMAGNA++ . On the other hand, network alignment methods introduced in network biology have rarely been compared to those introduced in other domains such as social networks, and vice versa, despite having similar if not the same goals, namely mapping related nodes or network regions across compared networks. This could be because biological networks have significantly fewer nodes and are likely noisier than other (e.g., social) networks eyuboglu2022mutual . This could also be because networks in different domains contain different types of data, which makes the methods customized to their specific data types, rendering their comparison challenging or requiring methodological extensions and new developments. Or, it could be because developers of methods in different domains are from different scientific communities and may thus be unaware of each other’s scientific discoveries (Section 8). In any case, it is critical to understand the methodological (dis)advantages of approaches from different domains. Their comprehensive and fair comparison could be a step in this direction, guiding the development of more powerful and possibly more generalizable network alignment approaches.

Network biology has traditionally relied on approaches that work directly on graph topology. In contrast, in recent years, the field has seen an increasing interest in network embedding methods (be they earlier spectral-based or diffusion/propagation/random-walk based methods or more recent deep learning methods), which first transform graph topology into compact numerical representation vectors, i.e., embeddings, and then work on these graph representations (Section 5). A comparative study of non-embedding approaches that work directly on graph topology against network embedding methods was performed in a broad set of contexts: network alignment, graph clustering (i.e., community detection), protein function prediction, network de-noising, and pharmacogenomics Nelson2019embed . The finding was that in terms of accuracy, depending on the context and evaluation measures used, direct, graph-based methods sometimes outperformed network embedding ones and at other times the results were reversed; regarding computational complexity/running time, embedding methods outperformed direct, graph-based methods most of the time Nelson2019embed . These findings indicate the need for a deeper combination of these approaches.

Also, network biology has traditionally relied on combinatorial or graph-theoretic techniques, i.e., on manually engineered or user-predefined topological features of nodes or graphs (the field has also relied on additional method types, e.g., those from the physics community within the field of network science, but these are not the focus of discussion here). For example, a prominent research problem of the graph-theoretic type that has revolutionized the field of network biology is counting graphlets/subgraphs in a graph; various node-, edge-, or network-level features based on these counts are then applicable to many downstream computational tasks and biological/biomedical applications, as discussed in Section 4. More recently, network biology has benefited from the boom in deep learning (e.g., GNNs), which can automatically generate relevant network topological features prominently via graph representation learning (Section 5). It remains unclear whether graph-theoretic or deep learning approaches (i.e., manually engineered versus automatically generated network topological features) perform better, and in which contexts. In other words, both approach categories seem to have merits depending on the context. Again, the question is how to combine them for improved performance.
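As a small example of manually engineered topological features, the Python sketch below counts, for each node, how often it touches the 2- and 3-node graphlets, distinguishing its position (orbit) within each: orbit 0 is an edge endpoint, orbit 1 is the end of an induced 3-node path, orbit 2 is the middle of such a path, and orbit 3 is a triangle corner. The toy graph and the function name are hypothetical, and practical graphlet counters extend this idea to larger graphlets with far more efficient algorithms.

```python
from itertools import combinations

def graphlet_degree_vectors(adj):
    """Per-node counts for orbits 0-3 (2- and 3-node graphlets).

    adj: node -> set of neighbors of an undirected simple graph.
    """
    gdv = {v: [0, 0, 0, 0] for v in adj}
    for v in adj:
        gdv[v][0] = len(adj[v])  # orbit 0: node degree
        for a, b in combinations(sorted(adj[v]), 2):
            if b in adj[a]:
                gdv[v][3] += 1   # v, a, b form a triangle
            else:
                gdv[v][2] += 1   # v is the middle of the induced path a-v-b
                gdv[a][1] += 1   # a and b are the ends of that path
                gdv[b][1] += 1
    return gdv

# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

for node, vector in sorted(graphlet_degree_vectors(graph).items()):
    print(node, vector)
```

Nodes 1 and 2 receive identical vectors because they occupy symmetric positions, whereas nodes 3 and 4 are distinguished by their orbit counts; such vectors can then feed downstream tasks as fixed, interpretable node features.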

As an example, graphlet-based and GNN-based analyses of protein structure networks were shown to outperform traditional non-network-based analyses of protein sequences and 3D structures in the tasks of protein structure comparison/classification and protein function prediction, respectively GRAFENE ; gligorijevic2021structure ; NETPCLASS . Only recently, the graphlet and GNN approaches were evaluated against each other when comparing protein structures, by the authors who proposed using GNNs for studying 3D structures gligorijevic2021structure . They found that graphlet-based analyses greatly outperformed GNN-based analyses in accuracy, although they found the latter to scale better to denser protein structure networks berenburgyoutube1 .

The relatively inferior performance of GNNs compared to graphlet-based approaches in that particular network-based protein structure comparison berenburgyoutube1 can potentially be elucidated as follows. Given that network comparison represents an NP-hard undertaking, a viable computational strategy that balances feasibility and efficacy involves the comparison of network substructures. Graphlets, by design, embody such an approach. Early GNNs were not designed for modeling subgraphs. So, it might not be surprising that popular GNN architectures cannot count graphlets and subgraphs and thus might not be the right methodological choice for specific scientific problems chen2020can . Nevertheless, recent advancements in the field have yielded a spectrum of novel GNN methodologies tailored to subgraph modeling and enumeration. Theoretical underpinnings have emerged that show the expressive capacity of GNNs, delineating which classes of GNN architectures are proficient or deficient in quantifying specific subgraph structures bouritsas2022improving ; chen2020can ; tahmasebi2020counting ; tahmasebi2023power ; yu2023learning . For example, while message-passing GNNs have been popular architectures for learning on graphs, recent research has revealed important shortcomings in their expressive power. In response, higher-order GNNs have been developed that substantially increase the expressive power, although at a high computational cost tahmasebi2020counting . These techniques demonstrate the potential to enumerate subgraphs, thus circumventing the established limitations of low-order (message-passing) GNNs while exploiting sparsity to reduce the computational complexity relative to higher-order GNNs tahmasebi2020counting . Further, recent recursive pooling methods centered on local neighborhoods and dynamically rewired message-passing techniques gutteridge2023drew improve performance for tasks relying on long-range interactions. Finally, innovative methods based on graph transformers ying2021transformers ; zhang2022hierarchical afford a spectrum of trade-offs between expressive capability and efficiency of machine learning models.
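The expressiveness limitation discussed above can be illustrated with color refinement (the 1-dimensional Weisfeiler-Leman test), which is commonly used as a proxy for what standard message-passing aggregation can distinguish. In the Python sketch below, a 6-node cycle and two disjoint triangles receive identical color histograms even though their triangle counts differ. The graphs, function names, and number of rounds are illustrative assumptions; the code is not a GNN, only the refinement procedure.

```python
from collections import Counter
from itertools import combinations

def wl_color_histogram(adj, rounds=3):
    """Color refinement: each node's color is repeatedly replaced by the pair
    (own color, sorted multiset of neighbor colors). Colors are kept as nested
    tuples so that histograms are comparable across graphs."""
    colors = {v: () for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

def count_triangles(adj):
    """Number of triangles; each one is seen once from each of its 3 corners."""
    closed = sum(
        1
        for v in adj
        for a, b in combinations(sorted(adj[v]), 2)
        if b in adj[a]
    )
    return closed // 3

# Two graphs on six nodes in which every node has exactly two neighbors.
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1},
    3: {4, 5}, 4: {3, 5}, 5: {3, 4},
}

same_colors = wl_color_histogram(cycle6) == wl_color_histogram(two_triangles)
print(same_colors, count_triangles(cycle6), count_triangles(two_triangles))
```

Because every node sees the same multiset of neighbor colors in both graphs, refinement never separates them, while an explicit subgraph count does; this is the kind of gap that subgraph-aware and higher-order architectures aim to close.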

Related to the above discussion, recent developments have highlighted the emergence of state-of-the-art geometric deep learning models trained on protein 3D structures baek2021accurate ; abramson2024accurate . Many models focus on proteins’ structural surfaces and some explicitly incorporate the underlying protein sequence or structural fold information zhang2023full ; dauparas2022robust . Notably, these models have enhanced performance in various tasks associated with predicting interactions between proteins and other biomolecules gainza2020deciphering ; gainza2023novo ; baek2024accurate . These tasks encompass critical areas such as protein pocket-ligand prediction, prediction of PPI residues, ultrafast scanning of protein surfaces to forecast protein complexes, and the design of novel protein binders gainza2020deciphering ; gainza2023novo . Geometric deep learning methods that model protein 3D structures as networks are promising. Such approaches were shown to outperform existing scientific methods traditionally used in a variety of tasks related to structure-based modeling and prediction of protein properties; the existing methods included network approaches that are not based on geometric deep learning stark2022equibind ; wang2022learning ; zhang2023protein . The tasks in question included drug binding, PPI prediction, and protein fold, function, or reaction prediction/classification stark2022equibind ; wang2022learning ; zhang2023protein .

A potential avenue to handling different approach categories/paradigms, such as those discussed above, each with its own merits depending on the context, is to propose algorithmic improvements toward reconciling them. Another is to carry out empirical evaluation of different approaches in a variety of different contexts: at various levels of graph structure (e.g., node, edge, subgraph, or entire network), for diverse types of graphs (e.g., heterogeneous, dynamic, spatial), in different computational tasks (e.g., node classification, graph classification, link prediction), and different biological/biomedical applications (e.g., protein function prediction, cancer, aging, drug repurposing). The following sections discuss these two avenues in more detail.

Algorithmic improvements towards reconciling diverse methodological paradigms. An algorithmic solution to handling different approach categories for the same purpose is to design hybrid methods that employ techniques from all associated disciplines. For example, deep learning methods can be combined with a network propagation approach to improve the embedding of multiple networks nasser2023bertwalk . Alternatively, a theory that would unify different approach categories could be proposed. For instance, the field of neural algorithmic reasoning focuses on developing deep learning models that emulate combinatorial algorithms velivckovic2021neural . As a case in point, the transformer neural architecture, which was initially devised for natural language processing, has been repurposed to tackle the combinatorial traveling salesperson problem on networks bresson2021transformer and to learn from graph-structured datasets yun2019graph . A primary objective of this discipline is to investigate the capacity of (graph) neural networks to learn novel combinatorial algorithms, particularly for NP-hard challenges that necessitate heuristic approaches. Put differently, the aim is to ascertain whether deep learning can extract heuristics from data more effectively, potentially superseding human-crafted heuristic methods that could demand years of dedicated research to formulate for NP-hard problems bresson2021transformer .

Another potential solution on the methodological level relies on the fact that current GNN approaches mainly adopt deep learning from other domains outside of network biology. As such, it is necessary to understand the correct inductive biases within a deep learning model that are representative of a biological mechanism under consideration. For example, can and should the hierarchical structures of ontologies, such as the GO or Disease Ontology, be incorporated into the GNN structure used for predicting proteins’ functions or disease associations, respectively? Existing work on visible neural networks shows that such an attempt to incorporate a cell’s hierarchical structure and function into the architecture of the deep learning model is effective and facilitates interpretability as the model’s components naturally correspond to biological entities Thomas20 ; ma2018using . Even the hierarchical network-of-networks idea is not only useful as a potent new way to represent and analyze multiscale biological data as discussed in Section 3, but also as a novel graph representation learning methodology for popular network analysis tasks that are not necessarily of the multiscale nature. For example, there exist studies that take multiple networks as input, all at the same scale, and then perform the well-established tasks of graph embedding du2019mrmine or classification wang2022imbalanced via novel hierarchical approaches, e.g., a graph-of-graphs neural network wang2022imbalanced , or matrix-factorization based data fusion iCell2019 .

Another relevant question is how generalizable versus specific an approach should be. One frequent issue is selecting a suitable similarity measure. For instance, this issue arises when deciding which property of a graph should indicate the proximity of its nodes in an embedding produced by a GNN, or when discerning relationships between biomolecules for inferring correlation or regulatory networks by linking nodes with edges. Selecting an optimal similarity measure for a specific task or application often requires extensive empirical assessment, evaluating multiple measures against one another. It remains a challenge to discern whether a universal, principled similarity measure exists. The answer could potentially be specific to individual tasks or applications or to broad categories of analogous tasks. The emphasis on generalizability also raises the question of its desirability; sometimes, the focus should be finely tuned to the specific task, application, or audience ektefaie2024evaluating . Furthermore, in some contexts, dissimilarity (or distance) might be more pertinent than similarity. For example, proteins can have opposing effects on each other despite working toward the same functional goal weber2020recent ; badia2023gene ; szklarczyk2023string . As another example, neighboring edges might mean different things, such as up- versus down-regulation of genes. An essential consideration is the selection of distances with theoretical underpinnings that facilitate efficient optimization Cao2013 , including distances that provably uphold the triangle inequality ding2006transitive and distances specified on smooth manifolds that yield symmetric positive semi-definite distance matrices wang2018network . Moreover, in typically high-dimensional spaces, the compromises entailed when our chosen distances forsake theoretical properties can be significant, potentially distorting interpretations and downstream analyses Beyer1999 ; Radovanovic2010 .
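
To illustrate why the triangle inequality is worth checking, the following minimal sketch compares two dissimilarities derived from the same similarity score. Cosine similarity is used here as a simple stand-in for a correlation-type measure, and the three 2D "profiles" and all function names are hypothetical. The commonly used dissimilarity 1 - s violates the triangle inequality on this toy input, whereas sqrt(2(1 - s)), which equals the Euclidean distance between the unit-normalized vectors, does not.

```python
import math
from itertools import permutations


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def one_minus_cos(u, v):
    """Popular dissimilarity that is not a metric in general."""
    return 1.0 - cosine_similarity(u, v)


def chord(u, v):
    """Euclidean distance between the unit-normalized vectors (a metric)."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine_similarity(u, v))))


def triangle_violations(points, dist, tol=1e-12):
    """Return ordered triples (i, j, k) with d(i, k) > d(i, j) + d(j, k)."""
    return [
        (i, j, k)
        for i, j, k in permutations(range(len(points)), 3)
        if dist(points[i], points[k])
        > dist(points[i], points[j]) + dist(points[j], points[k]) + tol
    ]


# Hypothetical profiles at angles of 0, 45, and 90 degrees.
profiles = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(triangle_violations(profiles, one_minus_cos))  # [(0, 1, 2), (2, 1, 0)]
print(triangle_violations(profiles, chord))          # []
```

A brute-force check of this kind scales cubically with the number of items, so in practice it would likely be run on a sample of triples rather than on a full network.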

Uncertainty quantification and confidence estimation. Uncertainty quantification presents a unique set of challenges. The inherent structure and complexity of network datasets introduce nuances not observed in other data modalities. The primary challenge lies in distinguishing between aleatoric (data-related) and epistemic (model-related) uncertainties while effectively mitigating potential biases that can distort predictive performance hullermeier2021aleatoric ; zhao2020uncertainty . Aleatoric uncertainty, stemming from inherent biological variation and limitations of experimental technology, encompasses variability arising from naturally random effects and natural variation intrinsic to the data hullermeier2021aleatoric . For instance, in PPI networks, inherent biological variability can lead to uncertainties in node or edge properties. On the other hand, epistemic uncertainty is engendered by a lack of knowledge or limited modeling assumptions. This type of uncertainty is particularly pronounced in graph-based tasks due to the myriad ways graphs can be represented, processed, and interpreted. For instance, different choices in GNN model architectures or graph pooling strategies can introduce varying degrees of epistemic uncertainty hullermeier2021aleatoric . Effectively quantifying and addressing these uncertainties is paramount for ensuring reliable and robust findings, especially when making critical decisions based on such models.
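
One common heuristic for separating the two kinds of uncertainty, sketched below under stated assumptions, uses an ensemble of models (for example, GNNs with different architectures or random seeds). The entropy of the averaged prediction is treated as total uncertainty, the average entropy of the individual predictions as the aleatoric part, and their difference as the epistemic part. This is only one of several possible decompositions; the function names and the toy class probabilities are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return sum(-q * math.log2(q) for q in p if q > 0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty for one prediction.

    member_probs: one class-probability list per ensemble member.
    Returns (total, aleatoric, epistemic) in bits.
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n for c in range(k)]
    total = entropy(mean)                                 # entropy of the mean
    aleatoric = sum(entropy(m) for m in member_probs) / n  # mean of entropies
    return total, aleatoric, total - aleatoric


# Hypothetical predictions for "does this edge exist?" from two models.
agree_but_unsure = [[0.5, 0.5], [0.5, 0.5]]   # members agree, data ambiguous
confident_but_split = [[1.0, 0.0], [0.0, 1.0]]  # members disagree

print(decompose_uncertainty(agree_but_unsure))     # epistemic part is zero
print(decompose_uncertainty(confident_but_split))  # aleatoric part is zero
```

In the first case all uncertainty is attributed to the data, and in the second case all of it is attributed to disagreement among models, which mirrors the distinction drawn in the paragraph above.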

Additional considerations for proper empirical method evaluation: benchmark data, performance measures, code and data sharing, best practices. Establishing appropriate benchmark data (including ground-truth data for training and testing/evaluating a predictive model), evaluation measures, and benchmark frameworks is critical to allow for systematic, fair, and unbiased method comparison. Valuable efforts already exist (Table 1). Nonetheless, notably, such frameworks must allow for continuous evaluation as new methods and algorithms will continue to appear. Best practices and guidelines on assessment in network biology are needed.

Lessons learned from challenges in biomedicine such as Critical Assessment of protein Structure Prediction (CASP) kryshtafovych2023new ; Kryshtafovych2021 ; Moult1995 , Dialogue on Reverse Engineering Assessment and Methods (DREAM) meyer2021advances ; saez2016crowdsourcing ; stolovitzky2007dialogue , and Critical Assessment of protein Function Annotation (CAFA) Jiang2016 ; Radivojac2013 ; Zhou2019 can perhaps help guide the development of best evaluation practices specific to network biology. Such challenges are a paradigm for unbiased and robust evaluation of algorithms for analysis of biological and biomedical data, which crowdsources data analysis to large communities of expert volunteers Costello2013 ; saez2016crowdsourcing . Challenges are done in the form of collaborative scientific competitions. Through these, rigorous validation and reproducibility of methods are promoted, open innovation is encouraged, collaborative communities are fostered to solve diverse and critical biomedical problems and accelerate scientific discovery, the creation and dissemination of well-curated data repositories are enabled, and the integration of predictions from different methods submitted by challenge participants provides a robust solution that often outperforms the best individual solution saez2016crowdsourcing .

CASP is the earliest formal method assessment initiative in computational biology Moult1995 . While network biology approaches can be used for CASP’s protein structure prediction and CAFA’s protein function prediction problems, DREAM was explicitly initiated in response to a network biology need: to reverse-engineer biological networks from high-throughput data stolovitzky2007dialogue . Since then, numerous DREAM Challenges have been conducted spanning a variety of additional computational (not necessarily network) biology topics, including TF binding, gene regulation, signaling networks, dynamical network models, disease module identification, scRNA-seq and scATAC-seq data analysis, single-cell transcriptomics, and drug combinations meyer2021advances (https://dreamchallenges.org/). Note that in addition to these initiatives focused solely on computational biology tasks, there exist community benchmark frameworks for general graph-based machine learning that also handle some computational biology tasks, which could thus also serve as significant assets. An example is Open Graph Benchmark hu2021ogb ; hu2020open (Section 5), which includes the task of predicting protein function from PPI network data with fully reproducible results and directly comparable approaches using the same datasets (https://ogb.stanford.edu/docs/leader_nodeprop/#ogbn-proteins). Other examples are shown in Table 1.

Interestingly, some of the common themes that emerged from the original 2006 DREAM initiative stolovitzky2007dialogue still hold to this date. The current biological network data may not be mechanistically accurate, yet they can still help understand cellular functioning. Exploring condition-specific biological networks is important because network properties can differ in different conditions. While there exist some highly trusted biological data (e.g., the reference HuRI PPI network for humans luck2020reference ) that may serve as ground truth for understanding (dis)advantages of network algorithms, synthetic network data that are much easier to generate will continue to be necessary for evaluating algorithm performance. However, experimentalists are unlikely to trust any scientific findings from synthetic data or computational approaches evaluated only on such data. Further, regarding ground-truth data for training and testing/evaluating a predictive model, it is critical to have available knowledge on both positive and negative instances in ground-truth data. Examples of the latter are PPIs or protein-functional associations that do not exist in cells. However, such negative instance data are hard to obtain in biology.

To add to the discussion about ground-truth data, using the aging process as an example, ground-truth data about human aging have been obtained in one of two ways: via sequence-based homology from model species Magalhaes2009a or via differential gene expression analyses in humans berchtold2008gene ; jia2018analysis . In a recent study li2021improved , only 17 genes were shared between the 185 sequence-based and 347 expression-based human aging-related genes. This poses several questions. How do we resolve such discrepancies with datasets on the same biological process resulting from different modalities/technologies, which likely exist in other applications as well? Given their high complementarity, perhaps integrating the different data types could yield more comprehensive insights into the biological process under consideration. However, if any of the other datasets are noisy, or if the different data types have different “signatures” (i.e., features) in a biological network, their integration could decrease the chances of detecting meaningful biological signals from the network compared to analyzing the different data types individually. Moreover, because different types of biological data collected via biotechnologies (e.g., genomic sequence data versus transcriptomic gene expression data versus interactomic PPI data) are likely to capture complementary functional slices of the given biological process, is it appropriate to use some of these datasets as the ground-truth data to validate predictions obtained via computational analyses of the other datasets? In our example of the aging process, is it appropriate to use sequence-based or expression-based aging-related knowledge to validate network-based aging-related gene predictions? Is this appropriate, especially because sequence-based and expression-based “knowledge” are also computational predictions, i.e., the result of sequence alignment and differential gene expression analysis, respectively? 
Also, is this appropriate given that sequence-based knowledge about human aging consists of sequence orthologs of aging-related genes in model species? If so, would any aspects of the aging process that are unique to humans be missed by the knowledge originally collected in the model species?

Another challenge with empirical evaluation is accurately estimating the absolute and relative performance of machine learning models and quantifying the uncertainty of performance estimates. Network data is inherently relational, thus inevitably violating the assumptions of independent and identically distributed data Neville2009 ; Neville2012 . Even further, the problems with long-tailed degree distribution in biological networks and homology between nodes require careful selection of training and test data when evaluating performance accuracy Hamp2015 ; Lugo-Martinez2021 ; Park2012 .
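
As one simple illustration of careful train/test selection, the sketch below splits proteins by homology group rather than individually, so that no group contributes members to both the training and the test set. The protein identifiers, the family assignments, and the function name are hypothetical; in practice the groups might come from sequence clustering, and splitting edges for link prediction would require additional care that this sketch does not cover.

```python
import random


def group_split(items, group_of, test_fraction=0.25, seed=0):
    """Split items so that each group lands entirely in train or in test.

    items: list of node identifiers.
    group_of: dict mapping each item to its group (e.g., a homology cluster).
    """
    groups = sorted({group_of[x] for x in items})
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    rng.shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    test_groups = set(groups[:n_test])
    train = [x for x in items if group_of[x] not in test_groups]
    test = [x for x in items if group_of[x] in test_groups]
    return train, test


# Hypothetical proteins and homology families.
family = {
    "P1": "famA", "P2": "famA", "P3": "famB", "P4": "famB",
    "P5": "famC", "P6": "famD", "P7": "famD", "P8": "famD",
}
proteins = sorted(family)

train, test = group_split(proteins, family)
shared = {family[p] for p in train} & {family[p] for p in test}
print(len(train), len(test), shared)  # shared is always the empty set
```

A purely random split of the same proteins could place close homologs on both sides, which tends to inflate performance estimates.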

Also, to allow for proper method evaluation, the authors of original methods must publicly release complete and easy-to-use code and data from their papers to allow for reproducing the initial studies and applying and evaluating a given method on new data heil2021reproducibility . Journals and other publication venues should and typically do establish requirements for data and code sharing. Consequently, scientific communities have shown remarkable improvements regarding releasing open-source software and data. Yet, ensuring compliance remains an issue. For example, while code or data might be released, they are sometimes incomplete or not easy to use. Or, there are instances when there might be a link (e.g., to GitHub) provided in the corresponding publication to meet the publication venue requirements, but the link might point to a page that says “under construction”, to an empty directory, or to a directory containing some files but without a transparent readme file on how to use the information provided. Who should ensure compliance with publication venue requirements, i.e., that complete and easy-to-use code and data are provided to ensure easy reproducibility? The editors of a venue publishing a given paper? The reviewers already volunteering their virtually non-existent “free” time to evaluate the paper’s scientific merits for publication should thus probably not be expected to invest even more effort to verify that the code and data can be run correctly. The authors? The future readers of the paper who might be interested in using the method? If the latter two, what should be the repercussions if it is found that the code or data do not exist or are not possible or easy to use? On a related note, how long after publication should the authors be required to maintain the project code and data and respond to related email inquiries? Hosting of the code and data is not an issue for authors due to availability of archival data repositories such as Zenodo. 
However, actively maintaining the code and data is an issue, and this is directly related to whether and how long after the project completion the funding by the federal agencies and others might be available for this purpose.

Complete transparency in all decisions (from graph construction to analysis) is crucial. Workflow management systems, such as Nextflow di2017nextflow and Snakemake koster2012snakemake , can enable rapid prototyping and deployment of computational workflows by combining software packages and various tools. Clear documentation, open-source sharing of code and algorithms, and making raw and processed data available can ensure that results are not just a one-off finding but can be consistently reproduced and built upon by the broader scientific community.

7.2 On missing data

Network completeness and interaction causality. Much of network biology relies on aging technologies with notable limitations. Focusing on physical PPIs, biotechnologies such as yeast two-hybrid systems Fields1989 , cross-linking mass-spectrometry Piersimoni2022 , and structural determination of protein complexes Jacobsen2007 ; Rhodes1993 ; Saibil2022 have collectively generated systems-level data that have led to critical methodological advances in network biology. Of course, these efforts to obtain the physical interactome have been complemented by valuable data collection and network inference efforts related to systems-level correlation networks. However, as computational methods are now maturing, the data are starting to lag. High-resolution, high-throughput data-generating technologies, capable of directly identifying pathways and order of molecular events in various experimental and clinical contexts, are the next frontier for deeper understanding of molecular systems.

There is a need to expand from physical and correlation networks toward causal relationships belyaeva2021causal or simulatable kinetic models karr2012whole . For this, biotechnologies for data collection need to be improved to allow for higher-quality data to build better causal networks and more complete networks. This will also require the development of new (categories of) approaches that can handle the captured causality. Even if/when we have high-quality causal networks and efficient and accurate methods for their analysis, will this suffice to understand biochemical mechanisms? When one knows biochemical mechanisms, one can infer causality. However, causality might not necessarily allow for fully understanding biochemical mechanisms.

Algorithmic research to guide data generation efforts. It will likely be beneficial to integrate multi-omic network data with BKGs to offer precise and targeted treatments for rare diseases alsentzer2022deep . Such network data with richer semantics will more directly help suggest biological hypotheses sanghvi2013accelerated ; wang2023scientific or support iterative data generation and analyses through active learning sverchkov2017review ; zhang2022active . Informing laboratory experiments using predictions from computational studies could be a path forward to build more complete and accurate data, which could lead to developing new, more advanced network analysis methods to further inform and improve laboratory experiments.

How network biology (primarily algorithmic research) can best support the collection and analysis of multimodal data is quite an important question, especially when collecting multimodal data for the same individuals, including building personalized (i.e., individual-specific) networks. An answer here could be to first figure out what question will be asked in which task/application and then design a data collection strategy. One might want to define optimal datasets. Or, one might want to find unifying factors within data modalities; this is precisely why there is a need for multimodal data for the same individuals, at least some of the data/individuals. This might require systematic, comprehensive, and well-funded consortia efforts. Perhaps algorithmic approaches such as active learning can help prioritize what data should be collected, e.g., from specific populations or about particular biological functions. As success in experimentally collecting or computationally inferring various types of biological networks continues to improve, research efforts likely should shift towards obtaining a predictive understanding of personalized networks. Moreover, even within a single individual, molecular networks vary across tissues and cell types, posing additional challenges in defining an individual-specific network.

Network dynamics. Another data component that is currently missing or is very scarce is network dynamics. Various types of time-dependent perturbation data could help infer dynamic biological networks. Examples of tasks/applications that have benefited from dynamic network analysis in biology are as follows.

One example is the task of network alignment: unlike traditional network alignment that has compared static networks (Section 3), recently, the problem of aligning dynamic networks has been defined, and several algorithms have been proposed for solving the newly defined problem GoTWAVE ; DynaMAGNA++ ; DynaWAVE . The challenge here is the lack of experimentally obtained dynamic biological network data, which is why such methods have been evaluated on synthetic networks, computationally inferred dynamic biological networks, or dynamic networks from other domains GoTWAVE ; DynaMAGNA++ ; DynaWAVE .

Another example is a recent network-based study of the dynamics of the protein folding process newaz2022multi . A key challenge is the lack of large-scale data on protein folding intermediates, i.e., 3D conformations of a protein as it undergoes folding to attain its native structure. Experimental data of this type are lacking even on the small scale newaz2022multi . Traditional computational, simulation-based studies, as well as the recent network-based effort newaz2022multi , all approximate the folding intermediates of a protein from the protein’s final (or native) 3D structure. Obtaining the actual protein folding intermediates experimentally is unlikely to happen any time soon, especially at a large scale, so computational efforts will be needed. With recent breakthroughs in protein structure prediction, e.g., AlphaFold jumper2021highly , this need represents an excellent opportunity for computational research to help obtain, model, and analyze the resulting dynamic data.

A further example is a dynamic network analysis of the aging process, i.e., predicting new aging-related genes from a dynamic aging-specific PPI network (Section 2). Here, a key challenge is that, surprisingly, using newer aging-related gene expression and PPI network data, obtained via newer and thus presumably higher-quality biotechnologies, to infer a dynamic aging-specific network does not yield more accurate aging-related gene predictions than using older data of the same type from over a decade ago, when dynamic network analyses of aging were pioneered li2022towards . It was also observed in a different study on active module identification that using newer network data typically did not lead to more biologically meaningful results lazareva2021limits . Going back to aging, it remains unclear whether the issue is with gene expression data, PPI network data, methods for integrating the two to computationally infer a dynamic aging-specific network, network methods used for feature extraction from the aging-specific network, ground-truth data on which genes are aging- versus non-aging-related, or something else entirely li2022towards .

As our final example, we discuss quantitative and qualitative mathematical modeling of network dynamics from the systems biology perspective le2015quantitative ; kestler2008network . Quantitative formalisms provide a precise description of the evolution of the system, including its temporal aspects; they are strongly dependent on the availability and precision of the required parameters. At the other end of the spectrum, qualitative (logic) frameworks have the advantage of being simpler, with no requirement for quantitative parameters, allowing analytical analyses. Logical models allow coarse-grained descriptions of the properties of the biological network and bring out key actors and mechanisms controlling the dynamics of the system maheshwari2017framework . Recent efforts use -omics data, including single-cell transcriptomes, to construct or contextualize Boolean models herault2023novel ; montagud2022patient ; schwab2021reconstructing .
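
To make the qualitative (logic) formalism concrete, the following minimal sketch enumerates the attractors of a toy Boolean network under synchronous updating. The two-gene mutual-inhibition "toggle switch" and all names in the code are illustrative assumptions and do not correspond to any model from the cited studies. No kinetic parameters are needed: the logical rules alone determine the long-term behaviours.

```python
from itertools import product

# Hypothetical rules: each gene is active only if its repressor is inactive.
toggle_rules = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
}


def find_attractors(rules):
    """Enumerate attractors of a synchronous Boolean network.

    Every initial state is simulated until a state repeats; the repeating
    part of the trajectory is an attractor (a fixed point or a cycle).
    """
    names = sorted(rules)
    found = set()
    for start in product([False, True], repeat=len(names)):
        state, seen = start, []
        while state not in seen:
            seen.append(state)
            current = dict(zip(names, state))
            # Synchronous update: all genes read the same previous state.
            state = tuple(bool(rules[n](current)) for n in names)
        cycle = seen[seen.index(state):]
        # Rotate the cycle to start at its smallest state so that the same
        # attractor reached from different initial states is stored once.
        i = cycle.index(min(cycle))
        found.add(tuple(cycle[i:] + cycle[:i]))
    return found


attractors = find_attractors(toggle_rules)
fixed_points = {a[0] for a in attractors if len(a) == 1}
print(sorted(fixed_points))  # [(False, True), (True, False)]
print(len(attractors))       # 3
```

The two fixed points correspond to the two stable expression states of the switch, while the remaining attractor is a two-state oscillation that arises from the synchronous updating assumption. Exhaustive enumeration is feasible only for small networks, since the state space doubles with every added node.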

Towards inclusive and equitable precision medicine. Progress in computational (including network) biology and biomedicine has been hindered by a lack of -omics data encompassing vast human diversity cruz2023importance . Underrepresentation of human genetic diversity has drastically weakened the biological discoveries that would benefit all populations, leading to health disparities. The traditional one-size-fits-all healthcare model meant for a “typical” patient may not work well for everyone. In response, the National Institutes of Health has aimed to invite one million people across the United States to help build one of the most diverse health databases in history, welcoming participants from all backgrounds through the “All of Us” program (https://allofus.nih.gov/). Inclusivity is at the core of the program: participants are diverse in terms of their races, ethnicities, age groups, regions of the country, gender identity, sexual orientation, socioeconomic status, education, disability, and health status. The data collected through the program is expected to lead to discoveries on how our biology, environment, and lifestyle affect our health. Unlike traditional research that has focused on a particular disease or group of people, this program aims to build a diverse database that can inform thousands of studies on a variety of health conditions. Availability of inclusive and diverse -omics data, design of research studies that intentionally and carefully account for such data, and development of computational methods and evaluation frameworks that handle such data in a fair and unbiased manner will be critical for advancing computational biology and biomedicine for all populations and reaching health equity.

Beyond the issue of underrepresentation, certain populations are intrinsically limited in size, such as patients with rare diseases, for which the number of clinical cases is inherently small banerjee2023machine . Studying a substantial fraction of a small population may still result in data that do not yield health outcomes comparable to those from larger populations. In such scenarios, amassing more data may not be feasible, leading to small-sample datasets that can lack statistical power and magnify the susceptibility of computational models to misinterpretation and unstable performance. Network analysis techniques can play a pivotal role in addressing this challenge. Techniques such as few-shot machine learning alsentzer2022deep and domain adaptation he2023domain for network methods are instrumental in enabling computational models to learn patterns from small datasets and generalize to newly acquired data. Such models can adapt and generalize across diverse populations, thereby enhancing the robustness and applicability of health outcomes derived from datasets with small numbers of samples.

7.3 Other major future research advancements

The interface between network biology and large language models. Large language models (LLMs), such as ChatGPT and GPT-4, create opportunities to unify natural language processing and knowledge graph reasoning fatemi2023talk ; pan2024unifying , owing to their wide-ranging applicability. Nevertheless, LLMs often serve as black-box models, presenting limitations in comprehensively capturing and accessing factual knowledge. In contrast, BKGs are structured knowledge models that systematically store extensive factual information. BKGs have the potential to enhance LLMs by providing external knowledge that aids in inference and bolstering interpretability. However, constructing BKGs is intricate and dynamic, posing challenges to existing methods in generating novel facts and representing previously unseen knowledge. Thus, an approach integrating LLMs and BKGs could emerge as a valuable strategy, harnessing their strengths in tandem pan2024unifying .

The potential synergies between traditional text and structured knowledge graphs are becoming increasingly evident. Language model pretraining has proven invaluable in extracting knowledge from text corpora to bolster various downstream tasks. Yet, these models predominantly focus on single documents, often overlooking inter-document dependencies or broader knowledge scopes. Recent advances mcdermott2023structure ; yasunaga2022linkbert address this limitation by conceptualizing text corpora as interconnected document graphs. By placing linked documents in shared contexts and adopting self-supervised objectives combining masked language modeling and document relation prediction, such methods can achieve considerable progress in tasks like multi-hop reasoning and few-shot question answering. On a parallel front, while text-based language models have garnered substantial attention, knowledge graphs can complement text data, offering structured background knowledge that provides a useful scaffold for reasoning. In an emerging line of inquiry, studies yasunaga2022deep explore self-supervised paradigms to construct a unified foundation model, intertwining text and knowledge graphs. These approaches pretrain models by unifying two self-supervised reasoning tasks, masked language modeling, and link prediction, marking an exciting direction for future advancements in network biology.

LLMs, traditionally associated with the processing of natural language, possess a flexibility that extends their utility beyond text data luo2022biogpt . The underlying architectures, especially transformer-based designs like BERT and GPT variants, can be adapted to learn from any sequential data. In biology, this adaptability implies that LLMs can be trained on biological sequences, such as DNA, RNA, and proteins lin2023evolutionary ; rao2019evaluating ; xu2022peer . Rather than processing words or sentences, these models can assimilate nucleotide or amino acid sequences, thereby capturing intricate patterns and dependencies in genomic and proteomic data dauparas2022robust ; lin2023evolutionary ; mcdermott2023structure ; meier2021language . These cross-disciplinary advances in LLMs highlight their potential to advance the frontiers of computational biology. In addition to large sequence-based pretrained models like LLMs, an emerging area of structure-based pretrained models is concerned with generating new network structures, such as protein and small molecule networks bennett2023improving ; gainza2023novo ; rodrigues2022csm ; townshend2021geometric ; wang2022scaffolding .

Interpretability. Interpretability in network biology involves elucidating mechanisms of disease and health, such as tumor growth and immune responses. However, deep graph learning models are black-box systems with limited immediate interpretability as they produce outputs through a series of complex, non-linear transformations of input data points. This poses challenges in domains where clear insights are imperative. For instance, while dimensionality reduction techniques and graph representation learning algorithms produce compact latent feature representations of high-dimensional data and graphs, they often sacrifice the interpretability of the features they produce. Conversely, graph-theoretic signatures, which capture network motifs, graphlets, or other substructures, can amplify understanding of networks by identifying relevant structural patterns.

Future research directions in interpretability must focus on integrating domain-specific knowledge into model training and evaluation. By directly incorporating biological constraints and prior knowledge into model architectures, we can enhance interpretability without compromising predictive performance. Additionally, developing explainable techniques tailored explicitly for network biology is crucial. Exploring hybrid models combining interpretable statistical models with deep learning approaches is another promising avenue. Such models can leverage the strengths of both types to produce interpretable and accurate predictions. Likewise, creating advanced visualization tools that effectively convey complex model outputs and biological insights to researchers and clinicians is essential. These tools should be intuitive and enable interactive exploration of model predictions and features.

Reproducibility. Reproducibility in network biology research is a multifaceted challenge for several reasons. (1) Graph construction: How a graph is constructed can drastically impact the insights drawn from it. For example, consider the problem of inferring an association PPI network. The decision to include only direct interactions versus both direct and indirect interactions can lead to vastly different network topologies. Choosing a threshold to determine an edge (e.g., a particular strength of interaction or confidence level) can also significantly alter the graph. (2) Edge definitions: What constitutes an edge can be subjective and is often based on the specific context. In a gene co-expression network, for instance, the definition of an edge might be based on a particular correlation coefficient threshold. A slight variation in this threshold can lead to including or excluding numerous interactions, thus changing the network’s structure and potentially its inferred properties. (3) Latent embeddings: Graph-based machine learning methods used to compute embeddings can have a significant effect on the results. Different embedding techniques capture different types of structural and feature-based information, leading to variations in performance on tasks like node classification or link prediction. (4) Dynamic nature of biological networks: Biological systems are inherently dynamic. A PPI network at one point in time or under one set of conditions might differ from the network under another state. Thus, reproducing results requires not only the same methodology but also the same or equivalent biological conditions. (5) Graph sampling: In many cases, a subgraph or sample is taken due to the massive size of networks or computational constraints. The method and randomness inherent in this sampling can lead to non-reproducible results if not carefully controlled.
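Two of these points, the edge threshold in a co-expression network and the randomness of sampling, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the expression data are synthetic, and the gene names, thresholds, seeds, and helper functions (`pearson`, `coexpression_edges`, `sample_nodes`) are hypothetical and not taken from any method discussed in this paper.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr, threshold):
    """Return gene pairs whose absolute correlation is at least `threshold`."""
    return {
        (g, h)
        for g, h in combinations(sorted(expr), 2)
        if abs(pearson(expr[g], expr[h])) >= threshold
    }


def sample_nodes(nodes, k, seed):
    """Draw k nodes with a dedicated, seeded generator so the draw is repeatable."""
    return sorted(random.Random(seed).sample(sorted(nodes), k))


# Synthetic data (assumption): 30 genes, 20 samples, one shared latent factor.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(20)]
expr = {
    f"gene{i}": [0.6 * z + rng.gauss(0, 1) for z in latent]
    for i in range(30)
}

# The same data yield a different network for each threshold.
for t in (0.3, 0.4, 0.5):
    print(t, len(coexpression_edges(expr, t)))

# Reporting the seed makes the sampled node set reproducible.
print(sample_nodes(expr, 5, seed=42))
```

Reporting the threshold and the seed alongside the results is what lets another group rebuild the same graph and the same sample from the same data.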

Towards wide adoption and translation of algorithmic innovation into practical and societal impact. The recommended method evaluation and data generation improvements discussed above are needed not just for method developers (typically, computational scientists) to be able to properly evaluate their new approaches against existing ones, but even more importantly, for adoption by end users: experimental scientists and, in the long run, clinicians, healthcare workers, and patients (Section 8 comments more on this topic, including the training needed for non-computational researchers to use network approaches). The disconnect between computational and experimental scientists, even those dedicated to common scientific goals Ramola2022 , suggests that efforts are necessary to overcome both technical and social challenges in interdisciplinary research fields. Computational scientists might need to consider not only traditional algorithmic evaluation measures, such as precision, recall, and other performance criteria, but also measures that evaluate the utility and feasibility of integrating methods into scientific and clinical workflows huang2023zero ; huang2022artificial . Additionally, computational scientists are primarily incentivized to develop new algorithms and prototype software. In contrast, experimental and clinical scientists expect tools that are robust, trustworthy, and exhibit few glitches in practice. Authoritative evaluations, carried out by independent and interdisciplinary researchers on tasks directly relevant to downstream applications, are essential choobdar2019assessment ; marbach2012wisdom . Rapid and broad dissemination of these evaluations, recommendations, and guidelines for best practices should be prioritized in network biology.

Major milestones in network biology. The pinnacle of success for network biology would likely be a comprehensive and dynamic understanding of the entire cellular or organismal interactome across different conditions and life stages. This would include PPIs, gene regulation, metabolic pathways, cell signaling, and more. We can imagine a complete map of every biological interaction in an organism, from the level of genes and molecules up to tissues and organs, with the ability to zoom in on details and see dynamic changes over time or under different conditions. Another significant milestone would be the seamless integration of network biology with other disciplines to provide a holistic understanding of life. This means connecting the molecular interactome with tissue-level networks, organ systems, and inter-organismal interactions, such as those seen in symbiosis or ecosystems. From a practical standpoint, a significant success measure would be the application of network biology insights to develop novel and more effective therapeutic interventions. This could mean identifying critical network nodes or interactions to target diseases, leading to innovative treatments.

Drawing parallels from the reference human genome, the equivalent for network biology could be a reference interactome—a standardized and comprehensive map of all known biological interactions within a human cell. This would serve as a baseline for studying disease, development, aging, and other biological processes. Any deviations from this reference in specific cell types, conditions, or diseases could be studied in detail.

Just as AlphaFold jumper2021highly has made waves in predicting protein structures, a comparable success in network biology might be the development of tools that can accurately predict the emergent properties of a biological system from its underlying network. Given a set of interactions, this would mean the tool could foresee the system’s response to a drug, its behavior under certain conditions, or its evolution over time.

8 Additional discussion on scientific communities, education, and diversity

The question of who counts as a network biologist or a computational biologist is hard to answer. Ideally, a computational biologist would have the interest and knowledge to both develop core computational methods and understand fundamental biological mechanisms. That raises the question of how to properly train more such researchers to advance computational biology, including its subarea of network biology, which models and analyzes biological systems as networks. For example, based on the personal experience of some of the authors of this paper, in a network biology course, computationally-focused students might enjoy computational but not biological aspects (e.g., in a general network science course, students typically choose a non-biology domain to work on, such as technological or social networks). In contrast, biology students might enjoy biological but not computational aspects. So, efforts might be needed to convince students to be genuinely excited about both developing computational approaches and understanding biological mechanisms. Systematically identifying and addressing gaps in current computational biology training programs or starting new interdisciplinary training programs might be needed, along with appropriate support and resources from funding agencies.

Some of these gaps are as follows. An essential part of efficient training would be to have robust, well-known, and trustworthy software tools that are readily available and easy to use, especially by those who are not proficient in computing; clearly, both developing and sustaining such software requires resources. The same holds for building and sharing datasets that are easily accessible to people who are not proficient in biology, to help them get involved. Another important part would be exposing students to interdisciplinary collaborative teams to train them to work together on the same research questions with scientists from different disciplines.

Another vital part of training relates to hiring and promoting computational biology faculty who would offer the training. A challenge here, based on the personal experience of some of the authors of this paper, seems to be as follows. When hiring a computational biologist in a traditional computationally-focused department (e.g., computer science, applied mathematics, statistics, or physics), someone who is more trained in biology may be viewed as not enough of a computational scientist, even when they are proficient in using existing computational methods to uncover new biological knowledge and possibly also at least occasionally develop new computational methods for studying biological systems. Similarly, in a traditional biology-focused department, a more computationally trained person may be viewed as not enough of a biological scientist, even when they evaluate their new computational methods on biological data and possibly at least occasionally yield new knowledge about biological systems. Yet, both kinds of candidates can be great for both department types. Hence, hiring and promotion groups might need to think differently about interdisciplinary computational biology research. This is especially true in departments where these groups do not have computational biologists or where there are no specific, interdisciplinary departments like biomedical data science or computational biology.

There exists an additional challenge even when focusing on computationally-oriented researchers within computational biology. Scientific communities that could benefit (from) the field of network biology include graph theory, network science, data mining, machine learning, and artificial intelligence. These communities often use different terminology for the same concepts (e.g., network alignment versus graph matching or graph clustering versus network community detection). Distinct scientific communities may all analyze biological network data, or address identical computational challenges across various application domains, such as biological versus social networks. However, they often do not attend the same research forums. For instance, attendees of the prominent computational biology conference, Intelligent Systems for Molecular Biology (ISMB), might not necessarily participate in data mining conferences like Knowledge Discovery and Data Mining (KDD) or artificial intelligence conferences such as Neural Information Processing Systems (NeurIPS), and vice versa. Consequently, advancements in one domain might remain obscure in another. Organizing scientific symposia to convene computational scientists from traditionally distinct network biology communities, focusing on universally relevant topics, could help bridge this gap.

The above discussion items can be seen as diversity-focused, be it diversity in one’s training and skills or in the scientific communities one belongs to nielsen2018making . Many other aspects of diversity exist in science, and we focus on some of them here. The International Society for Computational Biology (ISCB) is a globally recognized entity advocating for and advancing scholarship, research, training, outreach, and inclusive community building in computational biology and its professions. This is why we rely on ISCB’s demographic statistics to represent the current state of the computational biology field. According to a demographic survey of the ISCB membership, whose results are publicly available in the 2022/2023 ISCB Equity, Diversity, and Inclusion (EDI) report (https://www.iscb.org/edi-resources), among those who responded, 32.8% indicated “female”, 60% indicated “male”, 0.4% indicated “non-binary”, and 6.8% indicated “prefer not to declare”. Regarding ethnic origin, in the same report, 53% of those who responded with anything but “prefer not to declare” indicated a non-European descent. Some additional EDI statistics are as follows. At the time of the 2020/2021 ISCB EDI report (the latest report that offered this type of information), 41% of the ISCB Board of Directors were female, and 57% of the Executive Committee (elected officers) were female; since 2016, 61% of selected keynote speakers at the Intelligent Systems for Molecular Biology (ISMB) conference, ISCB’s flagship and most prestigious conference, were female. Regarding ISCB awards, fellows election, and other honors, the final selection shows a good gender balance that reflects the membership. However, during the nomination stage, in 2022/2023, for the innovator award, senior scientist award, and fellows election, 22%, 28%, and 25% of the nominees were female, respectively, compared to 32.8% of the entire ISCB membership being female. ISCB does not yet have such data on ethnicity.

Enhancing awareness and mitigating biases when nominating candidates for honors or inviting candidates as conference speakers is a pathway to improving diversity in the computational biology field. Another more ambitious goal is to achieve diversity statistics in the field that mirror those of the general population. This should be accomplished for undergraduate students, graduate students, postdoctoral fellows, and faculty (across various ranks) alike, not only by addressing the ‘leaky pipeline’ issue alper1993pipeline ; sarraju2023leaky , but also by identifying and eliminating institutional barriers to establish an inclusive support infrastructure stevens2021fund . This might only be achievable over a longer period. Also, biology-focused subfields of computational biology are currently more gender-diverse than its computationally-focused subfields. Thus, diversity in computational biology might be more readily achieved by recruiting trainees from biology-focused subfields and equipping them with the requisite computational skills rather than the reverse. However, sourcing from computational subfields remains essential. Yet, disciplines like computer science, mathematics, and physics can act as gatekeepers, and entering these fields without the appropriate background can be challenging mervis2022fix ; torbey2020algebra . Because innovative concepts can emerge from diverse sources and all individuals, it is imperative to eliminate gatekeeping barriers.

Additional diversity-related challenges include the need to recognize and mitigate potential implicit biases; limited access to registration and travel funds for conferences depending on their locations, especially for those in low- and middle-income countries; the current lack of ethnicity data to evaluate diversity efforts of computational biology conferences and communities, including ISCB; and the need for empirical research into equity in science. Systematic and properly funded initiatives by universities and professional societies are necessary to address these challenges, as are individual efforts by the members of the scientific community. Everyone should be responsible for contributing to joint diversity efforts for the field to make significant and sufficient progress.

9 Software and data availability

Not applicable.

10 Conflict of interest

In Section 8, we rely on ISCB’s diversity statistics. These statistics are publicly available, and so there is no conflict of interest. Yet, to remedy any potential perceived conflict of interest, we declare that Predrag Radivojac is the President of ISCB and currently serves on the Board of Directors of ISCB. In addition, Tijana Milenković currently serves on the ISCB Board of Directors and the ISCB EDI Committee. The remaining authors have no conflicts of interest to declare.

11 Acknowledgements

This work was initiated at the Workshop on Future Directions in Network Biology held at the University of Notre Dame during June 12-14, 2022. The workshop was supported by the U.S. National Science Foundation [grant number CCF-1941447]. This targeted meeting brought together 39 active researchers in various aspects of network biology to present and discuss a short- and long-term vision for computational research in this field. Thirty-one of the workshop participants attended the meeting in person. Due to difficulties with international travel related to the COVID-19 pandemic, all in-person workshop participants were from institutions in the United States. To draw on a combination of distinct ideas and experiences, when inviting participants, an effort was made to balance diversity among the attendees along multiple axes, including seniority (full, associate, or assistant professors, postdocs, and PhD students), affiliation (representation from academia, industry, and government), and gender (42% of the in-person participants were female). The workshop participants presented their views of important research directions, open problems, and challenges that would propel computational and algorithmic advances in network biology. Presentation slides for the scientific sessions at the workshop are linked to the workshop website (https://www3.nd.edu/~tmilenko/NetworkBiologyWorkshop/), and videos of the presentations are publicly available on YouTube (https://www.youtube.com/playlist?list=PLy8BJXti_TvYaL7frFJz2mf38e8o0NaFN).

Thanks to Siyu Yang, a Ph.D. student in the Department of Computer Science and Engineering at the University of Notre Dame, for carrying out the literature search on network-of-networks analysis.

Pacific Northwest National Laboratory is operated by Battelle for the U.S. Department of Energy under Contract No. DE-AC05-76RLO 1830.

The work of Teresa M. Przytycka was supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health [grant number LM200887-16].

References

[1] M. Abdar, F. Pourpanah, S. Hussain, D. Rezazadegan, L. Liu, M. Ghavamzadeh, P. Fieguth, X. Cao, A. Khosravi, U. R. Acharya, et al. A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges. Information Fusion, 76:243–297, 2021.
[2] J. Abramson, J. Adler, J. Dunger, R. Evans, T. Green, A. Pritzel, O. Ronneberger, L. Willmore, A. J. Ballard, J. Bambrick, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature, pages 1–3, 2024.
[3] C. Agarwal, O. Queen, H. Lakkaraju, and M. Zitnik. Evaluating explainability for graph neural networks. Scientific Data, 10(1):144, 2023.
[4] S. Agarwal, K. Branson, and S. Belongie. Higher Order Learning with Graphs. In Proceedings of the International Conference on Machine Learning, pages 17–24, 2006.
[5] M. Agrawal, M. Zitnik, and J. Leskovec. Large-scale analysis of disease pathways in the human interactome. In Proceedings of the Pacific Symposium on Biocomputing, pages 111–122, 2018.
[6] R. Alguliyev, R. Aliguliyev, and F. Yusifov. Graph modelling for tracking the COVID-19 pandemic spread. Infectious Disease Modelling, 6:112–122, 2021.
[7] J. Alper. The pipeline is leaking women all the way along. Science, 260(5106):409–411, 1993.
[8] E. Alsentzer, S. Finlayson, M. Li, and M. Zitnik. Subgraph neural networks. In Proceedings of the Advances in Neural Information Processing Systems, volume 33, pages 8017–8029, 2020.
[9] E. Alsentzer, M. M. Li, S. N. Kobren, U. D. Network, I. S. Kohane, and M. Zitnik. Deep learning for diagnosing patients with rare genetic diseases. medRxiv 2022.12.07.22283238, 2022.
[10] A. Antelmi, G. Cordasco, M. Polato, V. Scarano, C. Spagnuolo, and D. Yang. A Survey on Hypergraph Representation Learning. ACM Computing Surveys, 2023.
[11] D. Aparicio, P. Ribeiro, T. Milenković, and F. Silva. Temporal network alignment via GoT-WAVE. Bioinformatics, 35(18):3527–3529, 2019.
[12] K. M. Arici and N. Tuncbag. Unveiling hidden connections in omics data via pyPARAGON: an integrative hybrid approach for disease network construction. bioRxiv 2023.07.13.547583, 2023.
[13] A. Arnatkeviciute, B. D. Fulcher, M. A. Bellgrove, and A. Fornito. Where the genome meets the connectome: understanding how genes shape human brain connectivity. Neuroimage, 244:118570, 2021.
[14] A. Arnatkeviciute, B. D. Fulcher, S. Oldham, J. Tiego, C. Paquola, Z. Gerring, K. Aquino, Z. Hawi, B. Johnson, G. Ball, et al. Genetic influences on hub connectivity of the human connectome. Nature Communications, 12(1):4237, 2021.
[15] S. J. Aronson and H. L. Rehm. Building the foundation for genomics in precision medicine. Nature, 526(7573):336–342, 2015.
[16] Y. Artzy-Randrup, S. J. Fleishman, N. Ben-Tal, and L. Stone. Comment on “Network motifs: Simple building blocks of complex networks” and “Superfamilies of evolved and designed networks”. Science, 305:1107c, 2004.
[17] K. Atz, F. Grisoni, and G. Schneider. Geometric deep learning on molecular representations. Nature Machine Intelligence, 3(12):1023–1032, 2021.
[18] G. Ausiello and L. Laura. Directed hypergraphs: Introduction and fundamental algorithms—A survey. Theoretical Computer Science, 658:293–306, 2017.
[19] J. S. Bader, A. Chaudhuri, J. M. Rothberg, and J. Chant. Gaining confidence in high-throughput protein interaction networks. Nature Biotechnology, 22(1):78–85, 2004.
[20] P. Badia-i-Mompel, L. Wessels, S. Müller-Dott, R. Trimbour, R. O. Ramirez Flores, R. Argelaguet, and J. Saez-Rodriguez. Gene regulatory network inference in the era of single-cell multi-omics. Nature Reviews Genetics, 24(11):739–754, 2023.
[21] M. Baek, F. DiMaio, I. Anishchenko, J. Dauparas, S. Ovchinnikov, G. R. Lee, J. Wang, Q. Cong, L. N. Kinch, R. D. Schaeffer, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science, 373(6557):871–876, 2021.
[22] M. Baek, R. McHugh, I. Anishchenko, H. Jiang, D. Baker, and F. DiMaio. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nature Methods, 21(1):117–121, 2024.
[23] V. Bagal, R. Aggarwal, P. Vinod, and U. D. Priyakumar. MolGPT: molecular generation using a transformer-decoder model. Journal of Chemical Information and Modeling, 62(9):2064–2076, 2021.
[24] A. K. Bajpai, S. Davuluri, K. Tiwary, S. Narayanan, S. Oguru, K. Basavaraju, D. Dayalan, K. Thirumurugan, and K. K. Acharya. Systematic comparison of the protein-protein interaction databases from a user’s perspective. Journal of Biomedical Informatics, 103:103380, 2020.
[25] J. Banerjee, J. N. Taroni, R. J. Allaway, D. V. Prasad, J. Guinney, and C. Greene. Machine learning in rare disease. Nature Methods, pages 1–12, 2023.
[26] A.-L. Barabási. Network Science. Cambridge University Press, 2016.
[27] A.-L. Barabási, N. Gulbahce, and J. Loscalzo. Network medicine: a network-based approach to human disease. Nature Reviews Genetics, 12(1):56–68, 2011.
[28] A. Baryshnikova, M. Costanzo, C. L. Myers, B. Andrews, and C. Boone. Genetic Interaction Networks: Toward an Understanding of Heritability. Annual Review of Genomics and Human Genetics, 14:111–133, 2013.
[29] O. Basha, C. Argov, R. Artzy, Y. Zoabi, I. Hekselman, L. Alfandari, V. Chalifa-Caspi, and E. Yeger-Lotem. Differential network analysis of multiple human tissue interactomes highlights tissue-selective processes and genetic disorder genes. Bioinformatics, 36(9):2821–2828, 2020.
[30] O. Basha, R. Shpringer, C. M. Argov, and E. Yeger-Lotem. The DifferentialNet database of differential protein–protein interactions in human tissues. Nucleic Acids Research, 46(D1):D522–D526, 2018.
[31] F. Battiston, G. Cencetti, I. Iacopini, V. Latora, M. Lucas, A. Patania, J.-G. Young, and G. Petri. Networks beyond pairwise interactions: Structure and dynamics. Physics Reports, 874:1–92, 2020.
[32] S. Batzner, A. Musaelian, L. Sun, M. Geiger, J. P. Mailoa, M. Kornbluth, N. Molinari, T. E. Smidt, and B. Kozinsky. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 13(1):2453, 2022.
[33] B. Baur, J. Shin, S. Zhang, and S. Roy. Data integration for inferring context-specific gene regulatory networks. Current Opinion in Systems Biology, 23:38–46, 2020.
[34] A. Belyaeva, L. Cammarata, A. Radhakrishnan, C. Squires, K. D. Yang, G. Shivashankar, and C. Uhler. Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing. Nature Communications, 12(1):1024, 2021.
[35] N. R. Bennett, B. Coventry, I. Goreshnik, B. Huang, A. Allen, D. Vafeados, Y. P. Peng, J. Dauparas, M. Baek, L. Stewart, et al. Improving de novo protein binder design with deep learning. Nature Communications, 14(1):2625, 2023.
[36] T. Bepler and B. Berger. Learning the protein language: Evolution, structure, and function. Cell Systems, 12(6):654–669, 2021.
[37] N. C. Berchtold, D. H. Cribbs, P. D. Coleman, J. Rogers, E. Head, R. Kim, T. Beach, C. Miller, J. Troncoso, J. Q. Trojanowski, et al. Gene expression changes in the course of normal brain aging are sexually dimorphic. Proceedings of the National Academy of Sciences, 105(40):15605–15610, 2008.
[38] D. Berenberg, V. Gligorijević, and R. Bonneau. Graph embeddings for protein structural comparison. Talk in the 3DSIG track at the Intelligent Systems For Molecular Biology and European Conference on Computational Biology, 2021. https://www.youtube.com/watch?v=1SuojEkR6ZA.
[39] C. Berge. Graphs and Hypergraphs. Elsevier Science, 1985.
[40] K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is “Nearest Neighbor” Meaningful? In Proceedings of the International Conference on Database Theory, pages 217–235, 1999.
[41] C. Bilodeau, W. Jin, T. Jaakkola, R. Barzilay, and K. F. Jensen. Generative models for molecular discovery: Recent advances and challenges. Wiley Interdisciplinary Reviews: Computational Molecular Science, 12(5):e1608, 2022.
[42] S. Boluki, M. S. Esfahani, X. Qian, and E. R. Dougherty. Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors. BMC Bioinformatics, 18(14):61–80, 2017.
[43] J. A. Bondy and R. L. Hemminger. Graph reconstruction—a survey. Journal of Graph Theory, 1(3):227–268, 1977.
[44] R. Bonneau, D. J. Reiss, P. Shannon, M. Facciotti, L. Hood, N. S. Baliga, and V. Thorsson. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biology, 7:1–16, 2006.
[45] K. M. Borgwardt, C. S. Ong, S. Schonauer, S. V. Vishwanathan, A. J. Smola, and H. P. Kriegel. Protein function prediction via graph kernels. Bioinformatics, 21 Suppl 1:i47–56, 2005.
[46] G. Bouritsas, F. Frasca, S. Zafeiriou, and M. M. Bronstein. Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):657–668, 2022.
[47] X. Bresson and T. Laurent. The transformer network for the Traveling Salesman Problem. arXiv:2103.03012, 2021.
[48] P. Bryant and A. Elofsson. Modelling the dispersion of SARS-CoV-2 on a dynamic network graph. medRxiv 2020.10.19.20215046, 2020.
[49] A. Bumin, A. Ritz, D. K. Slonim, T. Kahveci, and K. Huang. FiT: fiber-based tensor completion for drug repurposing. In Proceedings of the ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 1–10, 2022.
[50] A. Butler, P. Hoffman, P. Smibert, E. Papalexi, and R. Satija. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology, 36(5):411–420, 2018.
[51] Y. Cai, X. Jiang, Y. Li, X. He, and C. Lin. Resolving Power Equipment Data Inconsistency via Heterogeneous Network Alignment. IEEE Access, 11:23980–23988, 2023.
[52] T. J. Callahan, I. J. Tripodi, A. L. Stefanski, L. Cappelletti, S. B. Taneja, J. M. Wyrwa, E. Casiraghi, N. A. Matentzoglu, J. Reese, J. C. Silverstein, et al. An open source knowledge graph ecosystem for the life sciences. Scientific Data, 11(1):363, 2024.
[53] R. Cambini, G. Gallo, and M. G. Scutellà. Flows on hypergraphs. Mathematical Programming, 78:195–217, 1997.
[54] C. V. Cannistraci, M. G. Valsecchi, and I. Capua. Age-sex population adjusted analysis of disease severity in epidemics as a tool to devise public health policies for COVID-19. Scientific Reports, 11(1):11787, 2021.
[55] M. Cao, H. Zhang, J. Park, N. M. Daniels, M. E. Crovella, L. J. Cowen, and B. Hescott. Going the Distance for Protein Function Prediction: A New Distance Metric for Protein Interaction Networks. PLOS One, 8(10):e76339, 2013.
[56] W. Cao, Z. Yan, Z. He, and Z. He. A Comprehensive Survey on Geometric Deep Learning. IEEE Access, 8:35929–35949, 2020.
[57] D. E. Carlin, S. H. Fong, Y. Qin, T. Jia, J. K. Huang, B. Bao, C. Zhang, and T. Ideker. A fast and flexible framework for network-assisted genomic association. iScience, 16:155–161, 2019.
[58] P. Chandak, K. Huang, and M. Zitnik. Building a knowledge graph to enable precision medicine. Scientific Data, 10(1):67, 2023.
[59] S. Chandrasekaran and N. D. Price. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences, 107(41):17845–17850, 2010.
[60] D. Chasman, B. Gancarz, L. Hao, M. Ferris, P. Ahlquist, and M. Craven. Inferring host gene subnetworks involved in viral replication. PLOS Computational Biology, 10(5):e1003626, 2014.
[61] C. Chen, H. Tong, L. Xie, L. Ying, and Q. He. FASCINATE: Fast Cross-Layer Dependency Inference on Multi-layered Networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 765–774, 2016.
[62] M. Chen, C. J.-T. Ju, G. Zhou, X. Chen, T. Zhang, K.-W. Chang, C. Zaniolo, and W. Wang. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics, 35(14):i305–i314, 2019.
[63] S. Chen, D. M. Witten, and A. Shojaie. Selection and estimation for mixed graphical models. Biometrika, 102(1):47–64, 2014.
[64] Z. Chen, L. Chen, S. Villar, and J. Bruna. Can graph neural networks count substructures? In Proceedings of the Advances in Neural Information Processing Systems, volume 33, pages 10383–10395, 2020.
[65] F. Cheng, R. J. Desai, D. E. Handy, R. Wang, S. Schneeweiss, A.-L. Barabasi, and J. Loscalzo. Network-based approach to prediction and population-based validation of in silico drug repurposing. Nature Communications, 9(1):2691, 2018.
[66] F. Cheng, J. Zhao, Y. Wang, W. Lu, Z. Liu, Y. Zhou, W. R. Martin, R. Wang, J. Huang, T. Hao, et al. Comprehensive characterization of protein–protein interactions perturbed by disease mutations. Nature Genetics, 53(3):342–353, 2021.
[67] U. Chitra, T. Y. Park, and B. J. Raphael. NetMix2: A Principled Network Propagation Algorithm for Identifying Altered Subnetworks. Journal of Computational Biology, 29(12):1305–1323, 2022.
[68] U. Chitra and B. Raphael. Random walks on hypergraphs with edge-dependent vertex weights. In Proceedings of the International Conference on Machine Learning, pages 1172–1181, 2019.
[69] S. Choobdar, M. E. Ahsen, J. Crawford, M. Tomasoni, T. Fang, D. Lamparter, J. Lin, B. Hescott, X. Hu, J. Mercer, et al. Assessment of network module identification across complex diseases. Nature Methods, 16(9):843–852, 2019.
[70] R. G. Christensen, M. S. Enuameh, M. B. Noyes, M. H. Brodsky, S. A. Wolfe, and G. D. Stormo. Recognition models to predict DNA-binding specificities of homeodomain proteins. Bioinformatics, 28(12):i84–i89, 2012.
[71] J.-h. Chu, C. P. Hersh, P. J. Castaldi, M. H. Cho, B. A. Raby, N. Laird, R. Bowler, S. Rennard, J. Loscalzo, J. Quackenbush, et al. Analyzing networks of phenotypes in complex diseases: methodology and applications in COPD. BMC Systems Biology, 8(1):1–12, 2014.
[72] Z. Chu, F. Huang, H. Fu, Y. Quan, X. Zhou, S. Liu, and W. Zhang. Hierarchical graph representation learning for the prediction of drug-target binding affinity. Information Sciences, 613:507–523, 2022.
[73] H.-Y. Chuang, E. Lee, Y.-T. Liu, D. Lee, and T. Ideker. Network-based classification of breast cancer metastasis. Molecular Systems Biology, 3(1):140, 2007.
[74] D. Comaniciu, K. Engel, B. Georgescu, and T. Mansi. Shaping the future through innovations: From medical imaging to precision medicine. Medical Image Analysis, 33:19–26, 2016.
[75] J. Cong, L. Hagen, and A. Kahng. Random walks for circuit clustering. In Proceedings of the IEEE International ASIC Conference and Exhibit, pages P14–2, 1991.
[76] G. Corso, H. Stärk, B. Jing, R. Barzilay, and T. S. Jaakkola. DiffDock: Diffusion steps, twists, and turns for molecular docking. In Proceedings of the International Conference on Learning Representations, 2023.
[77] M. Coşkun and M. Koyutürk. Node similarity-based graph convolution for link prediction in biological networks. Bioinformatics, 37(23):4501–4508, 2021.
[78] M. Costanzo, B. VanderSluis, E. N. Koch, A. Baryshnikova, C. Pons, G. Tan, W. Wang, M. Usaj, J. Hanchard, S. D. Lee, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science, 353(6306):aaf1420, 2016.
[79] J. C. Costello and G. Stolovitzky. Seeking the wisdom of crowds through challenge-based competitions in biomedical research. Clinical Pharmacology & Therapeutics, 93(5):396–398, 2013.
[80] L. Cowen, T. Ideker, B. J. Raphael, and R. Sharan. Network propagation: a universal amplifier of genetic associations. Nature Reviews Genetics, 18(9):551–562, 2017.
[81] J. Crawford and T. Milenković. ClueNet: Clustering a temporal network based on topological similarity rather than denseness. PLOS One, 13(5):e0195993, 2018.
[82] L. A. Cruz, J. N. Cooke Bailey, and D. C. Crawford. Importance of Diversity in Precision Medicine: Generalizability of Genetic Associations Across Ancestry Groups Toward Better Identification of Disease Susceptibility Variants. Annual Review of Biomedical Data Science, 6:339–356, 2023.
[83] P. Dao, Y.-A. Kim, D. Wojtowicz, S. Madan, R. Sharan, and T. M. Przytycka. BeWith: A Between-Within method to discover relationships between cancer modules via integrated analysis of mutual exclusivity, co-occurrence and functional interactions. PLOS Computational Biology, 13(10):e1005695, 2017.
[84] J. Dauparas, I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. Wicky, A. Courbet, R. J. de Haas, N. Bethel, et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615):49–56, 2022.
[85] N. De Cao and T. Kipf. MolGAN: An implicit generative model for small molecular graphs. In Proceedings of the International Conference on Machine Learning Workshop on Theoretical Foundations and Applications of Deep Generative Models, 2018.
[86] M. De Domenico. More is different in real-world multilayer networks. Nature Physics, 19:1247–1262, 2023.
[87] M. De Domenico, A. Solé-Ribalta, E. Cozzo, M. Kivelä, Y. Moreno, M. A. Porter, S. Gómez, and A. Arenas. Mathematical Formulation of Multilayer Networks. Physical Review X, 3:041022, 2013.
[88] J. P. de Magalhães, A. Budovsky, G. Lehmann, J. Costa, Y. Li, V. Fraifeld, and G. Church. The Human Ageing Genomic Resources: online databases and tools for biogerontologists. Aging Cell, 8(1):65–72, 2009.
[89] R. Dehghannasiri, B.-J. Yoon, and E. R. Dougherty. Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(4):938–950, 2014.
[90] R. Dehghannasiri, B.-J. Yoon, and E. R. Dougherty. Efficient experimental design for uncertainty reduction in gene regulatory networks. In Proceedings of the MidSouth Computational Biology and Bioinformatics Society Conference, volume 16, pages 1–18, 2015.
[91] P. Demetci, R. Santorella, B. Sandstede, W. S. Noble, and R. Singh. SCOT: Single-Cell Multi-Omics Alignment with Optimal Transport. Journal of Computational Biology, 29(1):3–18, 2022.
[92] K. Devkota, J. M. Murphy, and L. J. Cowen. GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks. Bioinformatics, 36:i464–i473, 2020.
[93] P. Di Tommaso, M. Chatzou, E. W. Floden, P. P. Barja, E. Palumbo, and C. Notredame. Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4):316–319, 2017.
[94] C. Ding, X. He, H. Xiong, H. Peng, and S. R. Holbrook. Transitive closure and metric inequality of weighted graphs: detecting protein interaction modules using cliques. International Journal of Data Mining and Bioinformatics, 1(2):162–177, 2006.
[95] K. Ding, S. Wang, and Y. Luo. Supervised biological network alignment with graph neural networks. Bioinformatics, 39:i465–i474, 2023.
[96] Y. Dong, N. V. Chawla, and A. Swami. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 135–144, 2017.
[97] S. Doria-Belenguer, A. Xenos, G. Ceddia, N. Malod-Dognin, and N. Pržulj. A functional analysis of omic network embedding spaces reveals key altered functions in cancer. Bioinformatics, 39(5):281, 2023.
[98] S. Doria-Belenguer, A. Xenos, G. Ceddia, N. Malod-Dognin, and N. Pržulj. The axes of biology: a novel axes-based network embedding paradigm to decipher the functional mechanisms of the cell. Bioinformatics Advances, page vbae075, 2024.
[99] B. Du and H. Tong. MrMine: Multi-resolution Multi-network Embedding. In Proceedings of the ACM International Conference on Information and Knowledge Management, pages 479–488, 2019.
[100] A. Ducournau and A. Bretto. Random walks in directed hypergraphs and application to semi-supervised image segmentation. Computer Vision and Image Understanding, 120:91–102, 2014.
[101] V. P. Dwivedi, C. K. Joshi, A. T. Luu, T. Laurent, Y. Bengio, and X. Bresson. Benchmarking graph neural networks. Journal of Machine Learning Research, 2022.
[102] V. P. Dwivedi, L. Rampášek, M. Galkin, A. Parviz, G. Wolf, A. T. Luu, and D. Beaini. Long range graph benchmark. In Proceedings of the Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.
[103] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological Persistence and Simplification. Discrete & Computational Geometry, 28:511–533, 2002.
[104] Y. Ektefaie, G. Dasoulas, A. Noori, M. Farhat, and M. Zitnik. Multimodal learning with graphs. Nature Machine Intelligence, 5(4):340–350, 2023.
[105] Y. Ektefaie, A. Shen, D. Bykova, M. Marin, M. Zitnik, and M. R. Farhat. Evaluating generalizability of artificial intelligence models for molecular datasets. bioRxiv, 2024.
[106] H. A. Elmarakeby, J. Hwang, R. Arafeh, J. Crowdis, S. Gang, D. Liu, S. H. AlDubayan, K. Salari, S. Kregel, C. Richter, et al. Biologically informed deep neural network for prostate cancer discovery. Nature, 598(7880):348–352, 2021.
[107] F. Emmert-Streib, M. Dehmer, and Y. Shi. Fifty years of graph matching, network alignment and network comparison. Information Sciences, 346(C):180–197, 2016.
[108] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean. A guide to deep learning in healthcare. Nature Medicine, 25(1):24–29, 2019.
[109] R. Evans, M. O’Neill, A. Pritzel, N. Antropova, A. Senior, T. Green, A. Žídek, R. Bates, S. Blackwell, J. Yim, et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021.10.04.463034, 2021.
[110] S. Eyuboglu, M. Zitnik, and J. Leskovec. Mutual interactors as a principle for phenotype discovery in molecular interaction networks. In Proceedings of the Pacific Symposium on Biocomputing, pages 61–72, 2023.
[111] F. Faisal, K. Newaz, J. Chaney, J. Li, S. Emrich, P. Clark, and T. Milenković. GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison. Scientific Reports, 7(14890), 2017.
[112] F. E. Faisal, L. Meng, J. Crawford, and T. Milenković. The post-genomic era of biological network alignment. EURASIP Journal on Bioinformatics and Systems Biology, 2015(1):3, 2015.
[113] F. E. Faisal and T. Milenković. Dynamic networks reveal key players in aging. Bioinformatics, 30(12):1721–1729, 2014.
[114] F. E. Faisal, H. Zhao, and T. Milenković. Global network alignment in the context of aging. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(1):40–52, 2014.
[115] J. J. Faith, B. Hayete, J. T. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel, S. Kasif, J. J. Collins, and T. S. Gardner. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLOS Biology, 5(1):e8, 2007.
[116] B. Fatemi, J. Halcrow, and B. Perozzi. Talk like a graph: Encoding graphs for large language models. arXiv:2310.04560, 2023.
[117] K. Fecho, A. E. Thessen, S. E. Baranzini, C. Bizon, J. J. Hadlock, S. Huang, R. T. Roper, N. Southall, C. Ta, P. B. Watkins, et al. Progress toward a universal biomedical data translator. Clinical and Translational Science, 15(8):1838–1847, 2022.
[118] E. N. Feinberg, D. Sur, Z. Wu, B. E. Husic, H. Mai, Y. Li, S. Sun, J. Yang, B. Ramsundar, and V. S. Pande. PotentialNet for Molecular Property Prediction. ACS Central Science, 4(11):1520–1530, 2018.
[119] S. Feng, E. Heath, B. Jefferson, C. Joslyn, H. Kvinge, H. D. Mitchell, B. Praggastis, A. J. Eisfeld, A. C. Sims, L. B. Thackray, et al. Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinformatics, 22(1):1–21, 2021.
[120] S. Fields and O. Song. A novel genetic system to detect protein-protein interactions. Nature, 340(6230):245–246, 1989.
[121] D. T. Forster, S. C. Li, Y. Yashiroda, M. Yoshimura, Z. Li, L. A. V. Isuhuaylas, K. Itto-Nakama, D. Yamanaka, Y. Ohya, H. Osada, et al. BIONIC: biological network integration using convolutions. Nature Methods, 19(10):1250–1261, 2022.
[122] S. Fortunato. Community detection in graphs. Physics Reports, 486:75–174, 2010.
[123] N. Franzese, A. Groce, T. Murali, and A. Ritz. Hypergraph-based connectivity measures for signaling pathway topologies. PLOS Computational Biology, 15(10):e1007384, 2019.
[124] N. Friedman, I. Nachman, and D. Pe'er. Learning Bayesian Network Structure from Massive Datasets: The “Sparse Candidate” Algorithm. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 206–215, 1999.
[125] H. Fu, F. Huang, X. Liu, Y. Qiu, and W. Zhang. MVGCN: data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks. Bioinformatics, 38(2):426–434, 2022.
[126] P. Gainza, F. Sverrisson, F. Monti, E. Rodola, D. Boscaini, M. Bronstein, and B. Correia. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nature Methods, 17(2):184–192, 2020.
[127] P. Gainza, S. Wehrle, A. Van Hall-Beauvais, A. Marchand, A. Scheck, Z. Harteveld, S. Buckley, D. Ni, S. Tan, F. Sverrisson, et al. De novo design of protein interactions with learned surface fingerprints. Nature, pages 1–9, 2023.
[128] J. Gao, Q. Zhao, W. Ren, A. Swami, R. Ramanathan, and A. Bar-Noy. Dynamic Shortest Path Algorithms for Hypergraphs. IEEE/ACM Transactions on Networking, 23(6):1805–1817, 2014.
[129] M. Gao, H. Zhou, and J. Skolnick. Insights into disease-associated mutations in the human proteome through protein structural analysis. Structure, 23(7):1362–1369, 2015.
[130] Z. Gao, C. Jiang, J. Zhang, X. Jiang, L. Li, P. Zhao, H. Yang, Y. Huang, and J. Li. Hierarchical graph learning for protein–protein interaction. Nature Communications, 14(1):1093, 2023.
[131] M. Garrido-Rodriguez, K. Zirngibl, O. Ivanova, S. Lobentanzer, and J. Saez-Rodriguez. Integrating knowledge and omics to decipher mechanisms via large-scale models of signaling networks. Molecular Systems Biology, 18(7):e11036, 2022.
[132] T. Gärtner, P. Flach, and S. Wrobel. On Graph Kernels: Hardness Results and Efficient Alternatives. In Proceedings of the Conference on Learning Theory, pages 129–143, 2003.
[133] T. Gaudelet, N. Malod-Dognin, J. Lugo-Martinez, P. Radivojac, and N. Pržulj. Hypergraphlets Give Insight into Multi-Scale Organisation of Molecular Networks. In Proceedings of the International Conference on Complex Networks and Their Applications, page 41, 2017.
[134] T. Gaudelet, N. Malod-Dognin, and N. Pržulj. Unveiling new disease, pathway, and gene associations via multi-scale neural network. PLOS One, 15(4):e0231059, 2020.
[135] T. Gaudelet, N. Malod-Dognin, and N. Pržulj. Integrative data analytic framework to enhance cancer precision medicine. Network and Systems Medicine, 4(1):60–73, 2021.
[136] D. Ghersi and M. Singh. Interaction-based discovery of functionally important genes in cancers. Nucleic Acids Research, 42(3):e18–e18, 2014.
[137] M. Gillespie, B. Jassal, R. Stephan, M. Milacic, K. Rothfels, A. Senff-Ribeiro, J. Griss, C. Sevilla, L. Matthews, C. Gong, et al. The Reactome pathway knowledgebase 2022. Nucleic Acids Research, 50(D1):D687–D692, 2022.
[138] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, pages 1263–1272, 2017.
[139] K. Glass, C. Huttenhower, J. Quackenbush, and G.-C. Yuan. Passing messages between biological networks to refine predicted interactions. PLOS One, 8(5):e64832, 2013.
[140] K. Glass, J. Quackenbush, D. Spentzos, B. Haibe-Kains, and G.-C. Yuan. A network model for angiogenesis in ovarian cancer. BMC Bioinformatics, 16(1):1–17, 2015.
[141] V. Gligorijevic, N. Malod-Dognin, and N. Pržulj. Integrative methods for analyzing big data in precision medicine. Proteomics, 16(5):741–758, 2016.
[142] V. Gligorijevic, N. Malod-Dognin, and N. Pržulj. Patient-specific data fusion for cancer stratification and personalised treatment. In Proceedings of the Pacific Symposium on Biocomputing, 2016.
[143] V. Gligorijević and N. Pržulj. Methods for biological data integration: perspectives and challenges. Journal of the Royal Society Interface, 12(112):20150571, 2015.
[144] V. Gligorijević, P. D. Renfrew, T. Kosciolek, J. K. Leman, D. Berenberg, T. Vatanen, C. Chandler, B. C. Taylor, I. M. Fisk, H. Vlamakis, et al. Structure-based protein function prediction using graph convolutional networks. Nature Communications, 12(1):3168, 2021.
[145] X. Gong, H. Li, N. Zou, R. Xu, W. Duan, and Y. Xu. General framework for E(3)-equivariant neural network representation of density functional theory Hamiltonian. Nature Communications, 14(1):2848, 2023.
[146] S. J. C. Gosline, S. J. Spencer, O. Ursu, and E. Fraenkel. SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets. Integrative Biology, 4(11):1415–1427, 2012.
[147] C. S. Greene, A. Krishnan, A. K. Wong, E. Ricciotti, R. A. Zelaya, D. S. Himmelstein, R. Zhang, B. M. Hartmann, E. Zaslavsky, S. C. Sealfon, et al. Understanding multicellular function and disease with human tissue-specific networks. Nature Genetics, 47(6):569–576, 2015.
[148] A. Greenfield, C. Hafemeister, and R. Bonneau. Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks. Bioinformatics, 29(8):1060–1067, 2013.
[149] A. Grover and J. Leskovec. node2vec: Scalable Feature Learning for Networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016.
[150] S. Gu, M. Jiang, P. H. Guzzi, and T. Milenković. Modeling multi-scale data via a network of networks. Bioinformatics, 38(9):2544–2553, 2022.
[151] S. Gu, J. Johnson, F. Faisal, and T. Milenković. From homogeneous to heterogeneous network alignment via colored graphlets. Scientific Reports, 8(1):12524, 2018.
[152] S. Gu and T. Milenković. Data-driven network alignment. PLOS One, 15(7):e0234978, 2020.
[153] S. Gu and T. Milenković. Data-driven biological network alignment that uses topological, sequence, and functional information. BMC Bioinformatics, 22(1):1–24, 2021.
[154] H. Gui, J. Liu, F. Tao, M. Jiang, B. Norick, and J. Han. Large-scale embedding learning in heterogeneous event data. In Proceedings of the IEEE International Conference on Data Mining, pages 907–912, 2016.
[155] M. F. Guiñazú, V. Cortés, C. F. Ibáñez, and J. D. Velásquez. Employing online social networks in precision-medicine approach using information fusion predictive model to improve substance use surveillance: A lesson from Twitter and marijuana consumption. Information Fusion, 55:150–163, 2020.
[156] E. Guney, J. Menche, M. Vidal, and A.-L. Barabási. Network-based in silico drug efficacy screening. Nature Communications, 7(1):10331, 2016.
[157] M. G. Guo, D. N. Sosa, and R. B. Altman. Challenges and opportunities in network-based solutions for biological questions. Briefings in Bioinformatics, 23(1), 2021.
[158] B. Gutteridge, X. Dong, M. M. Bronstein, and F. Di Giovanni. DRew: dynamically rewired message passing with delay. In Proceedings of the International Conference on Machine Learning, pages 12252–12267, 2023.
[159] P. H. Guzzi and T. Milenković. Survey of local and global biological network alignment: the need for reconciling the two sides of the same coin. Briefings in Bioinformatics, 19(3):472–481, 2017.
[160] D. M. Gysi and K. Nowick. Construction, comparison and evolution of networks in life sciences and other disciplines. Journal of the Royal Society Interface, 17(166):20190610, 2020.
[161] D. M. Gysi, A. Voigt, T. d. M. Fragoso, E. Almaas, and K. Nowick. wTO: an R package for computing weighted topological overlap and a consensus network with integrated visualization tool. BMC Bioinformatics, 19(1):1–16, 2018.
[162] P. H. Guzzi, F. Petrizzelli, and T. Mazza. Disease spreading modeling and analysis: A survey. Briefings in Bioinformatics, 23(4), 2022.
[163] A. Halu, M. De Domenico, A. Arenas, and A. Sharma. The multiplex network of human diseases. NPJ Systems Biology and Applications, 5(1):15, 2019.
[164] W. Hamilton, P. Bajaj, M. Zitnik, D. Jurafsky, and J. Leskovec. Embedding logical queries on knowledge graphs. In Proceedings of the Advances in Neural Information Processing Systems, volume 31, 2018.
[165] W. Hamilton, Z. Ying, and J. Leskovec. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems, volume 30, 2017.
[166] W. L. Hamilton. Graph representation learning. Morgan & Claypool Publishers, 2020.
[167] W. L. Hamilton, R. Ying, and J. Leskovec. Representation Learning on Graphs: Methods and Applications. IEEE Data Engineering Bulletin, 40:52–74, 2017.
[168] T. Hamp and B. Rost. More challenges for machine-learning protein interactions. Bioinformatics, 31(10):1521–1525, 2015.
[169] A. Haque, J. Engel, S. A. Teichmann, and T. Lönnberg. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Medicine, 9(1):1–12, 2017.
[170] S. Hashemifar, B. Neyshabur, A. A. Khan, and J. Xu. Predicting protein–protein interactions through sequence-based deep learning. Bioinformatics, 34(17):i802–i810, 2018.
[171] K. Hassani and A. H. Khasahmadi. Contrastive multi-view representation learning on graphs. In Proceedings of the International Conference on Machine Learning, pages 4116–4126, 2020.
[172] J. S. Hawe, F. J. Theis, and M. Heinig. Inferring interaction networks from multi-omics data. Frontiers in Genetics, 10:535, 2019.
[173] H. He, O. Queen, T. Koker, C. Cuevas, T. Tsiligkaridis, and M. Zitnik. Domain adaptation for time series under feature and label shifts. In Proceedings of the International Conference on Machine Learning, 2023.
[174] D. Heckerman, D. M. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1:49–75, 2000.
[175] B. J. Heil, M. M. Hoffman, F. Markowetz, S.-I. Lee, C. S. Greene, and S. C. Hicks. Reproducibility standards for machine learning in the life sciences. Nature Methods, 18(10):1132–1135, 2021.
[176] M. Hein, S. Setzer, L. Jost, and S. S. Rangapuram. The total variation on hypergraphs–learning on hypergraphs revisited. In Proceedings of the Advances in Neural Information Processing Systems, pages 2427–2435, 2013.
[177] L. Hérault, M. Poplineau, E. Duprez, and É. Remy. A novel Boolean network inference strategy to model early hematopoiesis aging. Computational and Structural Biotechnology Journal, 21:21–33, 2023.
[178] L. Hetzel, D. S. Fischer, S. Günnemann, and F. J. Theis. Graph representation learning for single-cell biology. Current Opinion in Systems Biology, 28:100347, 2021.
[179] L. Heumos, A. C. Schaar, C. Lance, A. Litinetskaya, F. Drost, L. Zappia, M. D. Lücken, D. C. Strobl, J. Henao, F. Curion, et al. Best practices for single-cell analysis across modalities. Nature Reviews Genetics, pages 1–23, 2023.
[180] D. S. Himmelstein and S. E. Baranzini. Heterogeneous network edge prediction: a data integration approach to prioritize disease-associated genes. PLOS Computational Biology, 11(7):e1004259, 2015.
[181] D. S. Himmelstein, A. Lizee, C. Hessler, L. Brueggeman, S. L. Chen, D. Hadley, A. Green, P. Khankhanian, and S. E. Baranzini. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife, 6:e26726, 2017.
[182] W. Hu, M. Fey, H. Ren, M. Nakata, Y. Dong, and J. Leskovec. OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs. In Proceedings of the Conference on Neural Information Processing Systems, volume 35, 2021.
[183] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec. Open graph benchmark: Datasets for machine learning on graphs. In Proceedings of the Advances in Neural Information Processing Systems, volume 33, pages 22118–22133, 2020.
[184] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec. Strategies for pre-training graph neural networks. In Proceedings of the International Conference on Learning Representations, 2020.
[185] X. Hu, F. Li, D. Samaras, and C. Chen. Topology-preserving deep image segmentation. In Proceedings of the Advances in Neural Information Processing Systems, volume 32, 2019.
[186] K. Huang, P. Chandak, Q. Wang, S. Havaldar, A. Vaid, J. Leskovec, G. Nadkarni, B. S. Glicksberg, N. Gehlenborg, and M. Zitnik. Zero-shot prediction of therapeutic use with geometric deep learning and clinician centered design. medRxiv 2023.03.19.23287458, 2023.
[187] K. Huang, T. Fu, W. Gao, Y. Zhao, Y. Roohani, J. Leskovec, C. W. Coley, C. Xiao, J. Sun, and M. Zitnik. Therapeutics data commons: Machine learning datasets and tasks for drug discovery and development. In Proceedings of the Conference on Neural Information Processing Systems, 2021.
[188] K. Huang, T. Fu, W. Gao, Y. Zhao, Y. Roohani, J. Leskovec, C. W. Coley, C. Xiao, J. Sun, and M. Zitnik. Artificial intelligence foundation for therapeutic science. Nature Chemical Biology, 18(10):1033–1036, 2022.
[189] K. Huang, Y. Jin, E. Candes, and J. Leskovec. Uncertainty quantification over graph with conformalized graph neural networks. In Proceedings of the Advances in Neural Information Processing Systems, volume 36, pages 26699–26721, 2023.
[190] K. Huang, C. Xiao, L. M. Glass, M. Zitnik, and J. Sun. SkipGNN: predicting molecular interactions with skip-graph networks. Scientific Reports, 10(1):1–16, 2020.
[191] S. Huang, F. Poursafaei, J. Danovitch, M. Fey, W. Hu, E. Rossi, J. Leskovec, M. Bronstein, G. Rabusseau, and R. Rabbany. Temporal graph benchmark for machine learning on temporal graphs. In Proceedings of the Advances in Neural Information Processing Systems, volume 36, 2024.
[192] T. Huang, K. Glass, O. A. Zeleznik, J. H. Kang, K. L. Ivey, A. R. Sonawane, B. M. Birmann, C. P. Hersh, F. B. Hu, and S. S. Tworoger. A Network Analysis of Biomarkers for Type 2 Diabetes. Diabetes, 68(2):281–290, 2019.
[193] E. Hüllermeier and W. Waegeman. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning, 110(3):457–506, 2021.
[194] Y. Hulovatyy, H. Chen, and T. Milenković. Exploring the structure and function of temporal networks with dynamic graphlets. Bioinformatics, 31(12):i171–i180, 2015.
[195] Y. Hulovatyy and T. Milenković. SCOUT: simultaneous time segmentation and community detection in dynamic networks. Scientific Reports, 6(37557), 2016.
[196] Y. Hulovatyy, R. Solava, and T. Milenković. Revealing Missing Parts of the Interactome via Link Prediction. PLOS One, 9(3):e90073, 2014.
[197] L. E. Hunter, C. Hopfer, S. F. Terry, and M. E. Coors. Reporting actionable research results: shared secrets can save lives. Science Translational Medicine, 4(143):143cm8, 2012.
[198] V. A. Huynh-Thu, A. Irrthum, L. Wehenkel, and P. Geurts. Inferring regulatory networks from expression data using tree-based methods. PLOS One, 5(9):e12776, 2010.
[199] R. Ietswaart, B. M. Gyori, J. A. Bachman, P. K. Sorger, and L. S. Churchman. GeneWalk identifies relevant gene functions for a biological context using network representation learning. Genome Biology, 22:1–35, 2021.
[200] E. Ihler, D. Wagner, and F. Wagner. Modeling hypergraphs by graphs with the same mincut properties. Information Processing Letters, 45(4):171–175, 1993.
[201] J. Ingraham, V. Garg, R. Barzilay, and T. Jaakkola. Generative models for graph-based protein design. In Proceedings of the Advances in Neural Information Processing Systems, volume 32, 2019.
[202] N. E. Jacobsen. NMR spectroscopy explained: simplified theory, applications and examples for organic chemistry and structural biology. John Wiley & Sons, 2007.
[203] N. Jahanshad, P. Rajagopalan, X. Hua, D. P. Hibar, T. M. Nir, A. W. Toga, C. R. Jack Jr, A. J. Saykin, R. C. Green, M. W. Weiner, et al. Genome-wide scan of healthy human connectome discovers SPON1 gene variant influencing dementia severity. Proceedings of the National Academy of Sciences, 110(12):4768–4773, 2013.
[204] K. Jia, C. Cui, Y. Gao, Y. Zhou, and Q. Cui. An analysis of aging-related genes derived from the Genotype-Tissue Expression project (GTEx). Cell Death Discovery, 5(1):26, 2018.
[205] T. Jiang, Q. Zeng, T. Zhao, B. Qin, T. Liu, N. V. Chawla, and M. Jiang. Biomedical knowledge graphs construction from conditional statements. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 18(3):823–835, 2020.
[206] Y. Jiang, T. R. Oron, W. T. Clark, A. R. Bankapur, D. D’Andrea, R. Lepore, C. S. Funk, I. Kahanda, K. M. Verspoor, A. Ben-Hur, et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biology, 17(1):1–19, 2016.
[207] J. Jiménez, S. Doerr, G. Martínez-Rosell, A. S. Rose, and G. De Fabritiis. DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics, 33(19):3036–3042, 2017.
[208] W. Jin, R. Barzilay, and T. Jaakkola. Junction tree variational autoencoder for molecular graph generation. In Proceedings of the International Conference on Machine Learning, pages 2323–2332, 2018.
[209] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
[210] J. Kaiser. NIH plots million-person megastudy. Science, 347(6224):817, 2015.
[211] J. R. Karr, J. C. Sanghvi, D. N. Macklin, M. V. Gutschow, J. M. Jacobs, B. Bolival, N. Assad-Garcia, J. I. Glass, and M. W. Covert. A whole-cell computational model predicts phenotype from genotype. Cell, 150(2):389–401, 2012.
[212] Z. N. Kesimoglu and S. Bozdag. GRAF: Graph attention-aware fusion networks. arXiv:2303.16781, 2023.
[213] Z. N. Kesimoglu and S. Bozdag. SUPREME: multiomics data integration using graph convolutional networks. NAR Genomics and Bioinformatics, 5(2):lqad063, 2023.
[214] H. A. Kestler, C. Wawra, B. Kracher, and M. Kühl. Network modeling of signal transduction: establishing the global view. Bioessays, 30(11-12):1110–1125, 2008.
[215] C. Y. Kim, S. Baek, J. Cha, S. Yang, E. Kim, E. Marcotte, T. Hart, and I. Lee. HumanNet v3: an improved database of human gene networks for disease research. Nucleic Acids Research, 50(D1):D632–D639, 2021.
[216] S. Kim, S.-Y. Shin, I.-H. Lee, S.-J. Kim, R. Sriram, and B.-T. Zhang. PIE: an online prediction system for protein–protein interactions from text. Nucleic Acids Research, 36:W411–W415, 2008.
[217] S. Kim, J. Sung, M. Foo, Y.-S. Jin, and P.-J. Kim. Uncovering the nutritional landscape of food. PLOS One, 10(3):e0118697, 2015.
[218] Y.-A. Kim, R. S. Basso, D. Wojtowicz, A. S. Liu, D. S. Hochbaum, F. Vandin, and T. M. Przytycka. Identifying drug sensitivity subnetworks with NETPHIX. iScience, 23(10):101619, 2020.
[219] A. C. Kinsley, G. Rossi, M. J. Silk, and K. VanderWaal. Multilayer and multiplex networks: An introduction to their use in veterinary epidemiology. Frontiers in Veterinary Science, 7:596, 2020.
[220] T. N. Kipf and M. Welling. Variational graph auto-encoders. In Proceedings of the Neural Information Processing Systems Workshop on Bayesian Deep Learning, 2016.
[221] M. Kivelä, A. Arenas, M. Barthelemy, J. P. Gleeson, Y. Moreno, and M. A. Porter. Multilayer networks. Journal of Complex Networks, 2(3):203–271, 2014.
[222] S. Klamt, U.-U. Haus, and F. Theis. Hypergraphs and cellular networks. PLOS Computational Biology, 5(5):e1000385, 2009.
[223] S. N. Kobren, B. Chazelle, and M. Singh. PertInInt: an integrative, analytical approach to rapidly uncover cancer driver genes with perturbed interactions and functionalities. Cell Systems, 11(1):63–74, 2020.
[224] S. N. Kobren and M. Singh. Systematic domain-based aggregation of protein structures highlights DNA-, RNA- and other ligand-binding positions. Nucleic Acids Research, 47(2):582–593, 2019.
[225] D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009.
[226] J. Kong, D. Ha, J. Lee, I. Kim, M. Park, S.-H. Im, K. Shin, and S. Kim. Network-based machine learning approach to predict immunotherapy response in cancer patients. Nature Communications, 13(1):3703, 2022.
[227] J. Köster and S. Rahmann. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics, 28(19):2520–2522, 2012.
[228] I. A. Kovács, K. Luck, K. Spirohn, Y. Wang, C. Pollis, S. Schlabach, W. Bian, D.-K. Kim, N. Kishore, T. Hao, et al. Network-based prediction of protein interactions. Nature Communications, 10(1):1240, 2019.
[229] S. Krieger and J. Kececioglu. Computing optimal factories in metabolic networks with negative regulation. Bioinformatics, 38:i369–i377, 2022.
[230] S. Krieger and J. Kececioglu. Heuristic shortest hyperpaths in cell signaling hypergraphs. Algorithms for Molecular Biology, 17(1):12, 2022.
[231] S. Krieger and J. Kececioglu. Computing shortest hyperpaths for pathway inference in cellular reaction networks. In Proceedings of the International Conference on Research In Computational Molecular Biology, pages 155–173, 2023.
[232] A. Kryshtafovych, M. Antczak, M. Szachniuk, T. Zok, R. C. Kretsch, R. Rangan, P. Pham, R. Das, X. Robin, G. Studer, et al. New prediction categories in CASP15. PROTEINS: Structure, Function, and Bioinformatics, 2023.
[233] A. Kryshtafovych, T. Schwede, M. Topf, K. Fidelis, and J. Moult. Critical assessment of methods of protein structure prediction (CASP)-Round XIV. PROTEINS: Structure, Function, and Bioinformatics, 89(12):1607–1617, 2021.
[234] W. L. Ku, G. Duggal, Y. Li, M. Girvan, and E. Ott. Interpreting patterns of gene expression: Signatures of coregulation, the data processing inequality, and triplet motifs. PLOS One, 7(2):e31969, 2012.
[235] O. Kuchaiev, T. Milenković, V. Memišević, W. Hayes, and N. Pržulj. Topological network alignment uncovers biological function and phylogeny. Journal of the Royal Society Interface, 7:1341–1354, 2010.
[236] M. L. Kuijjer, P.-H. Hsieh, J. Quackenbush, and K. Glass. lionessR: single sample network inference in R. BMC Cancer, 19:1–6, 2019.
[237] M. L. Kuijjer, M. G. Tung, G. Yuan, J. Quackenbush, and K. Glass. Estimating sample-specific regulatory networks. iScience, 14:226–240, 2019.
[238] P. Lambin, R. T. Leijenaar, T. M. Deist, J. Peerlings, E. E. De Jong, J. Van Timmeren, S. Sanduleanu, R. T. Larue, A. J. Even, A. Jochems, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nature Reviews Clinical Oncology, 14(12):749–762, 2017.
[239] F. Langhauser, A. I. Casas, V.-T.-V. Dao, E. Guney, J. Menche, E. Geuss, P. W. Kleikers, M. G. López, A.-L. Barabási, C. Kleinschnitz, et al. A diseasome cluster-based drug repurposing of soluble guanylate cyclase activators from smooth muscle relaxation to direct neuroprotection. NPJ Systems Biology and Applications, 4(1):8, 2018.
[240] S. J. Larsen, R. Röttger, H. H. H. W. Schmidt, and J. Baumbach. E. coli gene regulatory networks are inconsistent with gene expression data. Nucleic Acids Research, 47(1):85–92, 2019.
[241] O. Lazareva, J. Baumbach, M. List, and D. B. Blumenthal. On the limits of active module identification. Briefings in Bioinformatics, 2021.
[242] N. Le Novere. Quantitative and logic modelling of molecular and gene networks. Nature Reviews Genetics, 16(3):146–158, 2015.
[243] B. Lee, S. Zhang, A. Poleksic, and L. Xie. Heterogeneous multi-layered network model for omics data integration and analysis. Frontiers in Genetics, 10:1381, 2020.
[244] Y. Lee, A. H. Kim, E. Kim, S. Lee, K.-S. Yu, I.-J. Jang, J.-Y. Chung, and J.-Y. Cho. Changes in the gut microbiome influence the hypoglycemic effect of metformin through the altered metabolism of branched-chain and nonessential amino acids. Diabetes Research and Clinical Practice, 178:108985, 2021.
[245] M. Leiserson, F. Vandin, H. Wu, J. Dobson, J. Eldridge, and J. Thomas. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature Genetics, 47:106–114, 2015.
[246] S. Lelong, X. Zhou, C. Afrasiabi, Z. Qian, M. A. Cano, G. Tsueng, J. Xin, J. Mullen, Y. Yao, R. Avila, et al. BioThings SDK: a toolkit for building high-performance data APIs in biomedical research. Bioinformatics, 38(7):2077–2079, 2022.
[247] M. Leordeanu and C. Sminchisescu. Efficient hypergraph clustering. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 676–684, 2012.
[248] J. Li, Y. Huang, H. Chang, and Y. Rong. Semi-supervised hierarchical graph classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6265–6276, 2022.
[249] M. M. Li, K. Huang, and M. Zitnik. Graph representation learning in biomedicine and healthcare. Nature Biomedical Engineering, pages 1–17, 2022.
[250] M. M. Li, Y. Huang, M. Sumathipala, M. Q. Liang, A. Valdeolivas, A. N. Ananthakrishnan, K. Liao, D. Marbach, and M. Zitnik. Contextualizing protein representations using deep learning on protein networks and single-cell data. bioRxiv 2023.07.18.549602, 2023.
[251] M. M. Li and M. Zitnik. Deep contextual learners for protein networks. In Proceedings of the International Conference on Machine Learning Workshop on Computational Biology, 2021.
[252] Q. Li, K. A. Button-Simons, M. A. Sievert, E. Chahoud, G. F. Foster, K. Meis, M. T. Ferdig, and T. Milenković. Enhancing gene co-expression network inference for the malaria parasite Plasmodium falciparum. bioRxiv 2023.05.31.543171, 2023.
[253] Q. Li and T. Milenković. Supervised prediction of aging-related genes from a context-specific protein interaction subnetwork. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021.
[254] Q. Li, K. Newaz, and T. Milenković. Improved supervised prediction of aging-related genes via weighted dynamic network analysis. BMC Bioinformatics, 22(1):1–26, 2021.
[255] Q. Li, K. Newaz, and T. Milenković. Towards future directions in data-integrative supervised prediction of human aging-related genes. Bioinformatics Advances, 2(1):vbac081, 2022.
[256] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia. Learning deep generative models of graphs. In Proceedings of the International Conference on Machine Learning, 2018.
[257] Y. Li, R. Yu, C. Shahabi, and Y. Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In Proceedings of the International Conference on Learning Representations, 2018.
[258] Y. Lichtblau, K. Zimmermann, B. Haldemann, D. Lenze, M. Hummel, and U. Leser. Comparative assessment of differential network analysis methods. Briefings in Bioinformatics, 18(5):837–850, 2017.
[259] Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, A. dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv 2022.07.20.500902, 2022.
[260] Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, 2023.
[261] R. Liu and A. Krishnan. Open biomedical network benchmark: A python toolkit for benchmarking datasets with biomedical networks. In Proceedings of the Machine Learning in Computational Biology, pages 23–59, 2024.
[262] S. Liu, D. Hachen, O. Lizardo, C. Poellabauer, A. Striegel, and T. Milenković. The power of dynamic social networks to predict individuals’ mental health. In Proceedings of the Pacific Symposium on Biocomputing, volume 25, pages 635–646, 2020.
[263] S. Liu, F. Vahedian, D. Hachen, O. Lizardo, C. Poellabauer, A. Striegel, and T. Milenković. Heterogeneous network approach to predict individuals’ mental health. ACM Transactions on Knowledge Discovery From Data, 15(2):1–26, 2021.
[264] X. Liu, Y. Wang, H. Ji, K. Aihara, and L. Chen. Personalized characterization of diseases using sample-specific networks. Nucleic Acids Research, 44(22):e164–e164, 2016.
[265] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, and J. Tang. Self-supervised learning: Generative or contrastive. IEEE Transactions on Knowledge and Data Engineering, 35(1):857–876, 2021.
[266] Y. Liu, M. Jin, S. Pan, C. Zhou, Y. Zheng, F. Xia, and P. S. Yu. Graph self-supervised learning: A survey. IEEE Transactions on Knowledge and Data Engineering, 35(6):5879–5900, 2022.
[267] C. M. Lopes-Ramos, C.-Y. Chen, M. L. Kuijjer, J. N. Paulson, A. R. Sonawane, M. Fagny, J. Platig, K. Glass, J. Quackenbush, and D. L. DeMeo. Sex differences in gene expression and regulatory networks across 29 human tissues. Cell Reports, 31(12):107795, 2020.
[268] C. M. Lopes-Ramos, M. L. Kuijjer, S. Ogino, C. S. Fuchs, D. L. DeMeo, K. Glass, and J. Quackenbush. Gene regulatory network analysis identifies sex-linked differences in colon cancer drug metabolism. Cancer Research, 78(19):5538–5547, 2018.
[269] K. Luck, D.-K. Kim, L. Lambourne, K. Spirohn, B. E. Begg, W. Bian, R. Brignall, T. Cafarelli, F. J. Campos-Laborie, B. Charloteaux, et al. A reference map of the human binary protein interactome. Nature, 580(7803):402–408, 2020.
[270] J. Lugo-Martinez, V. Pejaver, K. A. Pagel, S. Jain, M. Mort, D. N. Cooper, S. D. Mooney, and P. Radivojac. The loss and gain of functional amino acid residues is a common mechanism causing human inherited disease. PLOS Computational Biology, 12(8):e1005091, 2016.
[271] J. Lugo-Martinez and P. Radivojac. Generalized graphlet kernels for probabilistic inference in sparse graphs. Network Science, 2(2):254–276, 2014.
[272] J. Lugo-Martinez, D. Zeiberg, T. Gaudelet, N. Malod-Dognin, N. Pržulj, and P. Radivojac. Classification in biological networks with hypergraphlet kernels. Bioinformatics, 37(7):1000–1007, 2021.
[273] R. Luo, L. Sun, Y. Xia, T. Qin, S. Zhang, H. Poon, and T.-Y. Liu. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 23(6):bbac409, 2022.
[274] X. Luo, W. Ju, M. Qu, Y. Gu, C. Chen, M. Deng, X.-S. Hua, and M. Zhang. CLEAR: Cluster-enhanced contrast for self-supervised graph representation learning. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[275] J. Ma, M. K. Yu, S. Fong, K. Ono, E. Sage, B. Demchak, R. Sharan, and T. Ideker. Using deep learning to model the hierarchical structure and function of a cell. Nature Methods, 15(4):290–298, 2018.
[276] L. Ma, Z. Shao, L. Li, J. Huang, S. Wang, Q. Lin, J. Li, M. Gong, and A. K. Nandi. Heuristics and metaheuristics for biological network alignment: A review. Neurocomputing, 491:426–441, 2022.
[277] C. S. Magnano and A. Gitter. Automating parameter selection to avoid implausible biological pathway models. NPJ Systems Biology and Applications, 7(1):12, 2021.
[278] P. Maheshwari and R. Albert. A framework to find the logic backbone of a biological network. BMC Systems Biology, 11(1):1–18, 2017.
[279] S. Maleki, D. Saless, D. P. Wall, and K. Pingali. HyperNetVec: Fast and scalable hierarchical embedding for hypergraphs. In Proceedings of the International Conference on Network Science, pages 169–183, 2022.
[280] N. Malod-Dognin, G. Ceddia, M. Gvozdenov, B. Tomić, S. Dunjić Manevski, V. Djordjević, and N. Pržulj. A phenotype driven integrative framework uncovers molecular mechanisms of a rare hereditary thrombophilia. PLOS One, 18(4):e0284084, 2023.
[281] N. Malod-Dognin, V. Pancaldi, A. Valencia, and N. Pržulj. Chromatin network markers of leukemia. Bioinformatics, 36:i455–i463, 2020.
[282] N. Malod-Dognin, J. Petschnigg, and N. Pržulj. Precision medicine―a promising, yet challenging road lies ahead. Current Opinion in Systems Biology, 7:1–7, 2018.
[283] N. Malod-Dognin, J. Petschnigg, S. Windels, J. Povh, H. Hemingway, R. Ketteler, and N. Pržulj. Towards a data-integrated cell. Nature Communications, 10(1):805, 2019.
[284] N. Malod-Dognin and N. Pržulj. GR-Align: fast and flexible alignment of protein 3D structures using graphlet degree similarity. Bioinformatics, 30(9):1259–1265, 2014.
[285] N. Mamano and W. Hayes. SANA: simulated annealing far outperforms many other search algorithms for biological network alignment. Bioinformatics, 33(14):2156–2164, 2017.
[286] F. Manessi, A. Rozza, and M. Manzo. Dynamic graph convolutional networks. Pattern Recognition, 97:107000, 2020.
[287] M. Manske, U. Böhme, C. Püthe, and M. Berriman. GeneDB and Wikidata. Wellcome Open Research, 4:114, 2019.
[288] D. Marbach, J. C. Costello, R. Küffner, N. M. Vega, R. J. Prill, D. M. Camacho, K. R. Allison, M. Kellis, J. J. Collins, and G. Stolovitzky. Wisdom of crowds for robust gene network inference. Nature Methods, 9(8):796–804, 2012.
[289] D. Marbach, S. Roy, F. Ay, P. E. Meyer, R. Candeias, T. Kahveci, C. A. Bristow, and M. Kellis. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Research, 22(7):1334–1349, 2012.
[290] A. A. Margolin, I. Nemenman, K. Basso, C. Wiggins, G. Stolovitzky, R. D. Favera, and A. Califano. ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7:1–15, 2006.
[291] J. Mateo, R. McKay, W. Abida, R. Aggarwal, J. Alumkal, A. Alva, F. Feng, X. Gao, J. Graff, M. Hussain, et al. Accelerating precision medicine in metastatic prostate cancer. Nature Cancer, 1(11):1041–1053, 2020.
[292] M. B. McDermott, B. Yap, P. Szolovits, and M. Zitnik. Structure-inducing pre-training. Nature Machine Intelligence, pages 1–10, 2023.
[293] J. Meier, R. Rao, R. Verkuil, J. Liu, T. Sercu, and A. Rives. Language models enable zero-shot prediction of the effects of mutations on protein function. Proceedings of the Advances in Neural Information Processing Systems, 34:29287–29303, 2021.
[294] J. Menche, A. Sharma, M. Kitsak, S. D. Ghiassian, M. Vidal, J. Loscalzo, and A.-L. Barabási. Uncovering disease-disease relationships through the incomplete interactome. Science, 347(6224):1257601, 2015.
[295] L. Meng, A. Striegel, and T. Milenković. Local versus global biological network alignment. Bioinformatics, 32(20):3155–3164, 2016.
[296] J. Mervis. Fix the system, not the students. Science, 375(6584), 2022.
[297] M. J. Meyer, J. F. Beltrán, S. Liang, R. Fragoza, A. Rumack, J. Liang, X. Wei, and H. Yu. Interactome INSIDER: a structural interactome browser for genomic studies. Nature Methods, 15(2):107–114, 2018.
[298] P. Meyer and J. Saez-Rodriguez. Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges. Cell Systems, 12(6):636–653, 2021.
[299] M. Milano, T. Milenković, M. Cannataro, and P. H. Guzzi. L-HetNetAligner: a novel algorithm for local alignment of heterogeneous biological networks. Scientific Reports, 10(1):3901, 2020.
[300] T. Milenković and N. Pržulj. Uncovering biological network function via graphlet degree signatures. Cancer Informatics, 6:257–273, 2008.
[301] R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon. Superfamilies of evolved and designed networks. Science, 303:1538–1542, 2004.
[302] E. R. Miraldi, M. Pokrovskii, A. Watters, D. M. Castro, N. De Veaux, J. A. Hall, J.-Y. Lee, M. Ciofani, A. Madar, N. Carriero, et al. Leveraging chromatin accessibility for transcriptional regulatory network inference in T Helper 17 Cells. Genome Research, 29(3):449–463, 2019.
[303] S. Mishra, Y. X. Wang, C. C. Wei, D. Z. Chen, and X. S. Hu. VTG-Net: a CNN based vessel topology graph network for retinal artery/vein classification. Frontiers in Medicine, page 2124, 2021.
[304] K. Mitra, A.-R. Carvunis, S. K. Ramesh, and T. Ideker. Integrative approaches for finding modular structure in biological networks. Nature Reviews Genetics, 14(10):719–732, 2013.
[305] A. Montagud, J. Béal, L. Tobalina, P. Traynard, V. Subramanian, B. Szalai, R. Alföldi, L. Puskás, A. Valencia, E. Barillot, et al. Patient-specific Boolean models of signalling networks guide personalised treatments. eLife, 11:e72626, 2022.
[306] J. E. Moore, M. J. Purcaro, H. E. Pratt, C. B. Epstein, N. Shoresh, J. Adrian, T. Kawli, C. A. Davis, A. Dobin, et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature, 583(7818):699–710, 2020.
[307] N. Moris, C. Pina, and A. M. Arias. Transition states and cell fate decisions in epigenetic landscapes. Nature Reviews Genetics, 17(11):693–703, 2016.
[308] R. T. Morris, T. R. O’Connor, and J. J. Wyrick. CERES: software for the integrated analysis of transcription factor binding sites and nucleosome positions in Saccharomyces cerevisiae. Bioinformatics, 26(2):168–174, 2010.
[309] D. Morselli Gysi and A.-L. Barabási. Non-coding RNAs improve the predictive power of network medicine. Proceedings of the National Academy of Sciences, 120(45):e2301342120, 2023.
[310] D. Morselli Gysi, T. de Miranda Fragoso, F. Zebardast, W. Bertoli, V. Busskamp, E. Almaas, and K. Nowick. Whole transcriptomic network analysis using co-expression differential network analysis (CoDiNA). PLOS One, 15(10):e0240523, 2020.
[311] D. Morselli Gysi, Í. Do Valle, M. Zitnik, A. Ameli, X. Gan, O. Varol, S. D. Ghiassian, J. Patten, R. A. Davey, J. Loscalzo, et al. Network medicine framework for identifying drug-repurposing opportunities for COVID-19. Proceedings of the National Academy of Sciences, 118(19):e2025581118, 2021.
[312] R. Mosca, A. Céol, and P. Aloy. Interactome3D: adding structural details to protein networks. Nature Methods, 10(1):47–53, 2013.
[313] J. Moult, J. T. Pedersen, R. Judson, and K. Fidelis. A large-scale experiment to assess protein structure prediction methods. PROTEINS: Structure, Function, and Bioinformatics, 23(3):ii–iv, 1995.
[314] P. Mucha, T. Richardson, K. Macon, M. Porter, and J. Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980):876–878, 2010.
[315] K. A. Murgas, E. Saucan, and R. Sandhu. Hypergraph geometry reflects higher-order dynamics in protein interaction networks. Scientific Reports, 12(1):20879, 2022.
[316] R. Nasser and R. Sharan. BERTwalk for integrating gene networks to predict gene-to pathway-level properties. Bioinformatics Advances, 3(1):vbad086, 2023.
[317] N. Natarajan and I. S. Dhillon. Inductive matrix completion for predicting gene-disease associations. Bioinformatics, 30(12):i60–i68, 2014.
[318] E. J. Needham, B. L. Parker, T. Burykin, D. E. James, and S. J. Humphrey. Illuminating the dark phosphoproteome. Science Signaling, 12(565):eaau8645, 2019.
[319] W. Nelson, M. Žitnik, B. Wang, J. Leskovec, A. Goldenberg, and R. Sharan. To embed or not: network embedding as a paradigm in computational biology. Frontiers in Genetics, 10:381, 2019.
[320] S. Neph, A. B. Stergachis, A. Reynolds, R. Sandstrom, E. Borenstein, and J. A. Stamatoyannopoulos. Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150(6):1274–1286, 2012.
[321] J. Neville, B. Gallagher, and T. Eliassi-Rad. Evaluating statistical tests for within-network classifiers of relational data. In Proceedings of the IEEE International Conference on Data Mining, pages 397–406, 2009.
[322] J. Neville, B. Gallagher, T. Eliassi-Rad, and T. Wang. Correcting evaluation bias of relational classifiers with network cross validation. Knowledge and Information Systems, 30:31–55, 2012.
[323] K. Newaz, M. Ghalehnovi, A. Rahnama, P. Antsaklis, and T. Milenković. Network-based protein structural classification. Royal Society Open Science, 7(6):191461, 2020.
[324] K. Newaz and T. Milenković. Graphlets in network science and computational biology. Analyzing Network Data in Biology and Medicine: An Interdisciplinary Textbook for Biological, Medical and Computational Scientists, pages 193–240, 2019.
[325] K. Newaz and T. Milenković. Inference of a Dynamic Aging-related Biological Subnetwork via Network Propagation. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 19(2):974–988, 2022.
[326] K. Newaz, J. Piland, P. L. Clark, S. J. Emrich, J. Li, and T. Milenković. Multi-layer sequential network analysis improves protein 3D structural classification. PROTEINS: Structure, Function, and Bioinformatics, 90(9):1721–1731, 2022.
[327] M. Newman. Networks. Oxford University Press, 2018.
[328] M. E. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577–8582, 2006.
[329] J. Ni, M. Koyuturk, H. Tong, J. Haines, R. Xu, and X. Zhang. Disease gene prioritization by integrating tissue-specific molecular networks using a robust multi-network model. BMC Bioinformatics, 17:1–13, 2016.
[330] D. N. Nicholson and C. S. Greene. Constructing knowledge graphs and their biomedical applications. Computational and Structural Biotechnology Journal, 18:1414–1428, 2020.
[331] M. W. Nielsen, C. W. Bloch, and L. Schiebinger. Making gender diversity work for scientific discovery and innovation. Nature Human Behaviour, 2(10):726–734, 2018.
[332] R. Nishihara, K. Glass, K. Mima, T. Hamada, J. A. Nowak, Z. R. Qian, P. Kraft, E. L. Giovannucci, C. S. Fuchs, A. T. Chan, et al. Biomarker correlation network in colorectal carcinoma by tumor anatomic location. BMC Bioinformatics, 18(1):1–14, 2017.
[333] P. Niu, M. J. Soto, B.-J. Yoon, E. R. Dougherty, F. J. Alexander, I. Blaby, and X. Qian. TRIMER: transcription regulation integrated with metabolic regulation. iScience, 24(11):103218, 2021.
[334] M. Nykter, N. D. Price, M. Aldana, S. A. Ramsey, S. A. Kauffman, L. E. Hood, O. Yli-Harja, and I. Shmulevich. Gene expression dynamics in the macrophage exhibit criticality. Proceedings of the National Academy of Sciences, 105(6):1897–1900, 2008.
[335] P. O. Pinheiro, J. Rackers, J. Kleinhenz, M. Maser, O. Mahmood, A. Watkins, S. Ra, V. Sresht, and S. Saremi. 3D molecule generation by denoising voxel grids. Advances in Neural Information Processing Systems, 36, 2024.
[336] All of Us Research Program Investigators. The “All of Us” research program. New England Journal of Medicine, 381(7):668–676, 2019.
[337] O. Ourfali, T. Shlomi, T. Ideker, E. Ruppin, and R. Sharan. SPINE: a framework for signaling-regulatory pathway inference from cause-effect experiments. Bioinformatics, 23(13):i359–i366, 2007.
[338] M. Padi and J. Quackenbush. Detecting phenotype-driven transitions in regulatory network structure. NPJ Systems Biology and Applications, 4(1):16, 2018.
[339] R. D. Page. Wikidata and the bibliography of life. PeerJ, 10:e13712, 2022.
[340] S. Pai and G. D. Bader. Patient similarity networks for precision medicine. Journal of Molecular Biology, 430(18):2924–2938, 2018.
[341] S. Pai, S. Hui, R. Isserlin, M. A. Shah, H. Kaka, and G. D. Bader. netDx: interpretable patient classification using integrated patient similarity networks. Molecular Systems Biology, 15(3):e8497, 2019.
[342] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, and X. Wu. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 2024.
[343] J. Pandey, M. Koyutürk, Y. Kim, W. Szpankowski, S. Subramaniam, and A. Grama. Functional annotation of regulatory pathways. Bioinformatics, 23(13):i377–i386, 2007.
[344] N. Papanikolaou, G. A. Pavlopoulos, T. Theodosiou, and I. Iliopoulos. Protein–protein interaction predictions using text mining methods. Methods, 74:47–53, 2015.
[345] A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. Schardl, and C. Leiserson. EvolveGCN: evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5363–5370, 2020.
[346] J. Park, B. J. Hescott, and D. K. Slonim. Pathway centrality in protein interaction networks identifies putative functional mediating pathways in pulmonary disease. Scientific Reports, 9(1):5863, 2019.
[347] Y. Park and E. M. Marcotte. Flaws in evaluation schemes for pair-input computational predictions. Nature Methods, 9(12):1134–1136, 2012.
[348] J. Patten, P. T. Keiser, D. Morselli-Gysi, G. Menichetti, H. Mori, C. J. Donahue, X. Gan, I. do Valle, K. Geoghegan-Barek, M. Anantpadma, et al. Identification of potent inhibitors of SARS-CoV-2 infection by combined pharmacological evaluation and cellular network prioritization. iScience, 25(9), 2022.
[349] E. O. Paull, D. E. Carlin, M. Niepel, P. K. Sorger, D. Haussler, and J. M. Stuart. Discovering causal pathways linking genomic events to transcriptional states using tied diffusion through interacting events (TieDIE). Bioinformatics, 29(21):2757–2764, 2013.
[350] S. A. Peck Justice, N. A. McCracken, J. F. Victorino, G. D. Qi, A. B. Wijeratne, and A. L. Mosley. Boosting detection of low-abundance proteins in thermal proteome profiling experiments by addition of an isobaric trigger channel to TMT multiplexes. Analytical Chemistry, 93(18):7000–7010, 2021.
[351] C. Peng, F. Xia, M. Naseriparsa, and F. Osborne. Knowledge graphs: opportunities and challenges. Artificial Intelligence Review, pages 1–32, 2023.
[352] H. Peng, H. Wang, B. Du, M. Z. A. Bhuiyan, H. Ma, J. Liu, L. Wang, Z. Yang, L. Du, S. Wang, et al. Spatial temporal incidence dynamic graph neural networks for traffic flow forecasting. Information Sciences, 521:277–290, 2020.
[353] L. Perez De Souza, S. Alseekh, Y. Brotman, and A. R. Fernie. Network-based strategies in metabolomics data analysis and interpretation: from molecular networking to biological interpretation. Expert Review of Proteomics, 17(4):243–255, 2020.
[354] B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: online learning of social representations. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710, 2014.
[355] A. V. Persikov and M. Singh. De novo prediction of DNA-binding specificities for Cys₂His₂ zinc finger proteins. Nucleic Acids Research, 42(1):97–108, 2014.
[356] M. Petti and L. Farina. Network medicine for patients’ stratification: from single-layer to multi-omics. WIREs Mechanisms of Disease, page e1623, 2023.
[357] L. Piersimoni, P. L. Kastritis, C. Arlt, and A. Sinz. Cross-linking mass spectrometry for investigating protein conformations and protein-protein interactions—a method for all seasons. Chemical Reviews, 122(8):7500–7531, 2022.
[358] E. Pierson, GTEx Consortium, D. Koller, A. Battle, and S. Mostafavi. Sharing and specificity of co-expression networks across 35 human tissues. PLOS Computational Biology, 11(5):e1004220, 2015.
[359] L. Pio-Lopez, A. Valdeolivas, L. Tichit, É. Remy, and A. Baudot. MultiVERSE: a multiplex and multiplex-heterogeneous network embedding approach. Scientific Reports, 11(1):8794, 2021.
[360] L. Pirhaji, P. Milani, M. Leidl, T. Curran, J. Avila-Pacheco, C. B. Clish, F. M. White, A. Saghatelian, and E. Fraenkel. Revealing disease-associated pathways by network integration of untargeted metabolomics. Nature Methods, 13(9):770–776, 2016.
[361] N. Pržulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):e177–e183, 2007.
[362] N. Pržulj, D. G. Corneil, and I. Jurisica. Modeling interactome: scale-free or geometric? Bioinformatics, 20(18):3508–3515, 2004.
[363] N. Pržulj and N. Malod-Dognin. Network analytics in the age of big data. Science, 353(6295):123–124, 2016.
[364] P. F. Przytycki and M. Singh. Differential allele-specific expression uncovers breast cancer genes dysregulated by cis noncoding mutations. Cell Systems, 10(2):193–203, 2020.
[365] P. Purkait, T. J. Chin, A. Sadri, and D. Suter. Clustering with hypergraphs: the case for large hyperedges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9):1697–1711, 2017.
[366] S. Pushpakom, F. Iorio, P. A. Eyers, K. J. Escott, S. Hopper, A. Wells, A. Doig, T. Guilliams, J. Latimer, C. McNamee, et al. Drug repurposing: progress, challenges and recommendations. Nature Reviews Drug Discovery, 18(1):41–58, 2019.
[367] P. Radivojac, W. T. Clark, T. R. Oron, A. M. Schnoes, T. Wittkop, A. Sokolov, K. Graim, C. Funk, K. Verspoor, A. Ben-Hur, et al. A large-scale evaluation of computational protein function prediction. Nature Methods, 10(3):221–227, 2013.
[368] M. Radovanović, A. Nanopoulos, and M. Ivanović. Hubs in space: popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11:2487–2531, 2010.
[369] E. Ramadan, A. Tarafdar, and A. Pothen. A hypergraph model for the yeast protein complex network. In Proceedings of the International Parallel and Distributed Processing Symposium, page 189, 2004.
[370] R. Ramola, I. Friedberg, and P. Radivojac. The field of protein function prediction as viewed by different domain scientists. Bioinformatics Advances, 2(1):vbac057, 2022.
[371] R. Rao, N. Bhattacharya, N. Thomas, Y. Duan, P. Chen, J. Canny, P. Abbeel, and Y. Song. Evaluating protein transfer learning with TAPE. In Proceedings of the Advances in Neural Information Processing Systems, volume 32, 2019.
[372] R. Rao, N. Bhattacharya, N. Thomas, Y. Duan, X. Chen, J. Canny, P. Abbeel, and Y. S. Song. Evaluating protein transfer learning with TAPE. In Advances in Neural Information Processing Systems, 2019.
[373] D. N. Reshef, Y. A. Reshef, H. K. Finucane, S. R. Grossman, G. McVean, P. J. Turnbaugh, E. S. Lander, M. Mitzenmacher, and P. C. Sabeti. Detecting novel associations in large data sets. Science, 334(6062):1518–1524, 2011.
[374] M. A. Reyna, U. Chitra, R. Elyanow, and B. J. Raphael. NetMix: a network-structured mixture model for reduced-bias estimation of altered subnetworks. Journal of Computational Biology, 28(5):469–484, 2021.
[375] G. Rhodes. Crystallography Made Crystal Clear, Third Edition: A Guide for Users of Macromolecular Models. Elsevier, 2010.
[376] A. Rider, T. Milenković, G. Siwo, R. Pinapati, S. Emrich, M. Ferdig, and N. Chawla. Networks are important for systems biology. Network Science, 2(2):139–161, 2014.
[377] M. E. Ritchie, B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi, and G. K. Smyth. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47–e47, 2015.
[378] A. Ritz, C. L. Poirel, A. N. Tegge, N. Sharp, K. Simmons, A. Powell, S. D. Kale, and T. M. Murali. Pathways on demand: automated reconstruction of human signaling networks. NPJ Systems Biology and Applications, 2(1):1–9, 2016.
[379] A. Ritz, A. N. Tegge, H. Kim, C. L. Poirel, and T. M. Murali. Signaling hypergraphs. Trends in Biotechnology, 32(7):356–362, 2014.
[380] C. H. Rodrigues and D. B. Ascher. CSM-Potential: mapping protein interactions and biological ligands in 3D space using geometric deep learning. Nucleic Acids Research, 50(W1):W204–W209, 2022.
[381] J. D. Rogers, B. A. Aguado, K. M. Watts, K. S. Anseth, and W. J. Richardson. Network modeling predicts personalized gene expression and drug responses in valve myofibroblasts cultured with patient sera. Proceedings of the National Academy of Sciences, 119(8):e2117323119, 2022.
[382] T. Rolland, M. Taşan, B. Charloteaux, S. J. Pevzner, Q. Zhong, N. Sahni, S. Yi, I. Lemmens, C. Fontanillo, R. Mosca, et al. A proteome-scale map of the human interactome network. Cell, 159(5):1212–1226, 2014.
[383] S. Roy, S. Lagree, Z. Hou, J. A. Thomson, R. Stewart, and A. P. Gasch. Integrated module and gene-specific regulatory inference implicates upstream signaling networks. PLOS Computational Biology, 9(10):e1003252, 2013.
[384] D. Ruan, A. Young, and G. Montana. Differential analysis of biological networks. BMC Bioinformatics, 16(1):327, 2015.
[385] W. Saelens, R. Cannoodt, and Y. Saeys. A comprehensive evaluation of module detection methods for gene expression data. Nature Communications, 9(1):1090, 2018.
[386] J. Saez-Rodriguez, J. C. Costello, S. H. Friend, M. R. Kellen, L. Mangravite, P. Meyer, T. Norman, and G. Stolovitzky. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nature Reviews Genetics, 17(8):470–486, 2016.
[387] N. Sahni, S. Yi, M. Taipale, J. I. F. Bass, J. Coulombe-Huntington, F. Yang, J. Peng, J. Weile, G. I. Karras, Y. Wang, et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell, 161(3):647–660, 2015.
[388] H. R. Saibil. Cryo-EM in molecular and cellular biology. Molecular Cell, 82(2):274–284, 2022.
[389] A. Said, R. Bayrak, T. Derr, M. Shabbir, D. Moyer, C. Chang, and X. Koutsoukos. Neurograph: Benchmarks for graph machine learning in brain connectomics. Advances in Neural Information Processing Systems, 36:6509–6531, 2023.
[390] D. Salazar, C. Valencia, and N. Pržulj. Multi-project and multi-profile joint non-negative matrix factorization for cancer omic datasets. Bioinformatics, 37(24):4801–4809, 2021.
[391] C. Samieri, A. R. Sonawane, S. Lefèvre-Arbogast, C. Helmer, F. Grodstein, and K. Glass. Using network science tools to identify novel diet patterns in prodromal dementia. Neurology, 94(19):e2014–e2025, 2020.
[392] J. C. Sanghvi, S. Regot, S. Carrasco, J. R. Karr, M. V. Gutschow, B. Bolival Jr, and M. W. Covert. Accelerated discovery via a whole-cell model. Nature Methods, 10(12):1192–1195, 2013.
[393] A. Sarajlić, N. Malod-Dognin, Ö. Yaveroğlu, and N. Pržulj. Graphlet-based characterization of directed networks. Scientific Reports, 6, 2016.
[394] V. Saraph and T. Milenković. MAGNA: maximizing accuracy in global network alignment. Bioinformatics, 30(20):2931–2940, 2014.
[395] A. Sarraju, S. Ngo, and F. Rodriguez. The leaky pipeline of diverse race and ethnicity representation in academic science and technology training in the United States, 2003–2019. PLOS One, 18(4):e0284945, 2023.
[396] E. E. Schadt. Molecular networks as sensors and drivers of common human diseases. Nature, 461(7261):218–223, 2009.
[397] M. Schaefer, L. Serrano, and M. Andrade-Navarro. Correcting for the study bias associated with protein–protein interaction measurements reveals differences between protein degree distributions from different cancer types. Frontiers in Genetics, 6:137790, 2015.
[398] G. Schiebinger, J. Shu, M. Tabaka, B. Cleary, V. Subramanian, A. Solomon, J. Gould, S. Liu, S. Lin, P. Berube, et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, 176(4):928–943, 2019.
[399] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In Proceedings of the International Semantic Web Conference, pages 593–607, 2018.
[400] F. Schmidt, N. Gasparoni, G. Gasparoni, K. Gianmoena, C. Cadenas, J. K. Polansky, P. Ebert, K. Nordström, M. Barann, A. Sinha, et al. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Research, 45(1):54–66, 2017.
[401] F. Schmidt, F. Kern, P. Ebert, N. Baumgarten, and M. H. Schulz. TEPIC 2—an extended framework for transcription factor binding prediction and integrative epigenomic analysis. Bioinformatics, 35(9):1608–1609, 2019.
[402] M. Schmidt, A. Niculescu-Mizil, and K. Murphy. Learning graphical model structure using L1-regularization paths. In Proceedings of the National Conference on Artificial Intelligence, pages 1278–1283, 2007.
[403] J. D. Schwab, N. Ikonomi, S. D. Werle, F. M. Weidner, H. Geiger, and H. A. Kestler. Reconstructing Boolean network ensembles from single-cell data for unraveling dynamics in the aging of human hematopoietic stem cells. Computational and Structural Biotechnology Journal, 19:5321–5332, 2021.
[404] A. J. Sedgewick, K. Buschur, I. Shi, J. D. Ramsey, V. K. Raghu, D. V. Manatakis, Y. Zhang, J. Bon, D. Chandra, C. Karoleski, et al. Mixed graphical models for integrative causal analysis with application to chronic lung disease diagnosis and prognosis. Bioinformatics, 35(7):1204–1212, 2018.
[405] E. Segal, D. Pe’er, A. Regev, D. Koller, N. Friedman, and T. Jaakkola. Learning module networks. Journal of Machine Learning Research, 6(4), 2005.
[406] Y. Sha, S. Wang, P. Zhou, and Q. Nie. Inference and multiscale model of epithelial-to-mesenchymal transition via single-cell transcriptomic data. Nucleic Acids Research, 48(17):9505–9520, 2020.
[407] Z. Sha, D. Schijven, S. E. Fisher, and C. Francks. Genetic architecture of the white matter connectome of the human brain. Science Advances, 9(7):eadd2870, 2023.
[408] R. Sharan and T. Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology, 24(4):427–433, 2006.
[409] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[410] O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann. Pitfalls of graph neural network evaluation. In Proceedings of the Relational Representation Learning Workshop, 2018.
[411] N. Shervashidze, S. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt. Efficient graphlet kernels for large graph comparison. In Proceedings of the Artificial Intelligence and Statistics, pages 488–495, 2009.
[412] S. Shit, J. C. Paetzold, A. Sekuboyina, I. Ezhov, A. Unger, A. Zhylka, J. P. Pluim, U. Bauer, and B. H. Menze. clDice - a novel topology-preserving loss function for tubular structure segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16560–16569, 2021.
[413] A. F. Siahpirani and S. Roy. A prior-based integrative framework for functional transcriptional regulatory network inference. Nucleic Acids Research, 45(4):e21–e21, 2017.
[414] D. Silverbush, S. Cristea, G. Yanovich-Arad, T. Geiger, N. Beerenwinkel, and R. Sharan. Simultaneous integration of multi-omics data improves the identification of cancer driver modules. Cell Systems, 8(5):456–466, 2019.
[415] M. Simonovsky and N. Komodakis. GraphVAE: towards generation of small graphs using variational autoencoders. In Proceedings of the Artificial Neural Networks and Machine Learning, pages 412–422, 2018.
[416] R. Singh, K. Devkota, S. Sledzieski, B. Berger, and L. Cowen. Topsy-Turvy: integrating a global view into sequence-based PPI prediction. Bioinformatics, 38(Supplement_1):i264–i272, 2022.
[417] R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences, 105(35):12763–12768, 2008.
[418] S. Sledzieski, K. Devkota, R. Singh, L. Cowen, and B. Berger. TT3D: Leveraging precomputed protein 3D sequence models to predict protein–protein interactions. Bioinformatics, 39(11):btad663, 2023.
[419] S. Sledzieski, R. Singh, L. Cowen, and B. Berger. D-SCRIPT translates genome to phenome with sequence-based, structure-aware, genome-scale predictions of protein-protein interactions. Cell Systems, 12(10):969–982, 2021.
[420] K. P. Smith and N. A. Christakis. Social networks and health. Annual Review of Sociology, 34:405–429, 2008.
[421] R. Solava, R. Michaels, and T. Milenković. Graphlet-based edge clustering reveals pathogen-interacting proteins. Bioinformatics, 28(18):i480–i486, 2012.
[422] A. R. Sonawane, D. L. DeMeo, J. Quackenbush, and K. Glass. Constructing gene regulatory networks using epigenetic data. NPJ Systems Biology and Applications, 7(1):45, 2021.
[423] A. R. Sonawane, J. Platig, M. Fagny, C.-Y. Chen, J. N. Paulson, C. M. Lopes-Ramos, D. L. DeMeo, J. Quackenbush, K. Glass, and M. L. Kuijjer. Understanding tissue-specific gene regulation. Cell Reports, 21(4):1077–1088, 2017.
[424] A. R. Sonawane, S. T. Weiss, K. Glass, and A. Sharma. Network medicine in the age of biomedical big data. Frontiers in Genetics, 10:445334, 2019.
[425] E. Sprinzak, S. Sattath, and H. Margalit. How reliable are experimental protein–protein interaction data? Journal of Molecular Biology, 327(5):919–923, 2003.
[426] C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers. BioGRID: a general repository for interaction datasets. Nucleic Acids Research, 34(suppl_1):D535–D539, 2006.
[427] H. Stärk, O. Ganea, L. Pattanaik, R. Barzilay, and T. Jaakkola. EquiBind: Geometric deep learning for drug binding structure prediction. In Proceedings of the International Conference on Machine Learning, pages 20503–20521, 2022.
[428] C. Stegehuis, R. Van Der Hofstad, and J. S. Van Leeuwaarden. Epidemic spreading on complex networks with community structures. Scientific Reports, 6(1):1–7, 2016.
[429] K. R. Stevens, K. S. Masters, P. Imoukhuede, K. A. Haynes, L. A. Setton, E. Cosgriff-Hernandez, M. A. L. Bell, P. Rangamani, S. E. Sakiyama-Elbert, S. D. Finley, et al. Fund Black scientists. Cell, 184(3):561–565, 2021.
[430] J. M. Stokes, K. Yang, K. Swanson, W. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, et al. A deep learning approach to antibiotic discovery. Cell, 180(4):688–702, 2020.
[431] G. Stolovitzky, D. Monroe, and A. Califano. Dialogue on reverse-engineering assessment and methods: the DREAM of high-throughput pathway inference. Annals of the New York Academy of Sciences, 1115(1):1–22, 2007.
[432] Y. Sun, J. Crawford, J. Tang, and T. Milenković. Simultaneous optimization of both node and edge conservation in network alignment via WAVE. In Proceedings of the Workshop on Algorithms In Bioinformatics, pages 16–39, 2015.
[433] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. PathSim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11):992–1003, 2011.
[434] Y. Sverchkov and M. Craven. A review of active learning approaches to experimental design for uncovering biological networks. PLOS Computational Biology, 13(6):e1005466, 2017.
[435] Z. E. Sychev, A. Hu, T. A. DiMaio, A. Gitter, N. D. Camp, W. S. Noble, A. Wolf-Yadlin, and M. Lagunoff. Integrated systems biology analysis of KSHV latent infection reveals viral induction and reliance on peroxisome mediated lipid metabolism. PLOS Pathogens, 13(3):e1006256, 2017.
[436] D. Szklarczyk, R. Kirsch, M. Koutrouli, K. Nastou, F. Mehryary, R. Hachilif, A. L. Gable, T. Fang, N. T. Doncheva, S. Pyysalo, et al. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Research, 51(D1):D638–D646, 2023.
[437] B. Tahmasebi, D. Lim, and S. Jegelka. Counting substructures with higher-order graph neural networks: Possibility and impossibility results. arXiv:2012.03174, 2020.
[438] B. Tahmasebi, D. Lim, and S. Jegelka. The power of recursion in graph neural networks for counting substructures. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 11023–11042, 2023.
[439] F. Tang, D. Xu, S. Wang, C. K. Wong, A. Martinez-Fundichely, C. J. Lee, S. Cohen, J. Park, C. E. Hill, K. Eng, et al. Chromatin profiles classify castration-resistant prostate cancers suggesting therapeutic targets. Science, 376(6596):eabe1505, 2022.
[440] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale information network embedding. In Proceedings of the International Conference on World Wide Web, pages 1067–1077, 2015.
[441] A. E. Teschendorff and A. P. Feinberg. Statistical mechanics meets single-cell biology. Nature Reviews Genetics, 22(7):459–476, 2021.
[442] C. V. Theodoris, L. Xiao, A. Chopra, M. D. Chaffin, Z. R. Al Sayed, M. C. Hill, H. Mantineo, E. M. Brydon, Z. Zeng, X. S. Liu, et al. Transfer learning enables predictions in network biology. Nature, pages 1–9, 2023.
[443] R. Torbey, N. D. Martin, J. R. Warner, and C. L. Fletcher. Algebra I before high school as a gatekeeper to computer science participation. In Proceedings of the ACM Technical Symposium on Computer Science Education, pages 839–844, 2020.
[444] R. J. Townshend, S. Eismann, A. M. Watkins, R. Rangan, M. Karelina, R. Das, and R. O. Dror. Geometric deep learning of RNA structure. Science, 373(6558):1047–1051, 2021.
[445] P.-T. Tseng, Y.-S. Cheng, C.-F. Yen, Y.-W. Chen, B. Stubbs, P. Whiteley, A. F. Carvalho, D.-J. Li, T.-Y. Chen, W.-C. Yang, et al. Peripheral iron levels in children with attention-deficit hyperactivity disorder: a systematic review and meta-analysis. Scientific Reports, 8(1):1–11, 2018.
[446] H. Tsuruta, H. Yamazaki, R. Maeda, R. Tamura, J. Wei, Z. E. Mariet, P. Phloyphisut, H. Shimokawa, J. R. Ledsam, L. Colwell, et al. AVIDa-hIL6: a large-scale VHH dataset produced from an immunized alpaca for predicting antigen-antibody interactions. Advances in Neural Information Processing Systems, 36, 2024.
[447] J.-J. Tu, L. Ou-Yang, Y. Zhu, H. Yan, H. Qin, and X.-F. Zhang. Differential network analysis by simultaneously considering changes in gene interactions and gene expression. Bioinformatics, 37(23):4414–4423, 2021.
[448] K. Tu, P. Cui, X. Wang, F. Wang, and W. Zhu. Structural deep embedding for hyper-networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[449] N. Tuncbag, A. Braunstein, A. Pagnani, S.-S. C. Huang, J. Chayes, C. Borgs, R. Zecchina, and E. Fraenkel. Simultaneous reconstruction of multiple signaling pathways via the prize-collecting Steiner forest problem. Journal of Computational Biology, 20(2):124–136, 2013.
[450] N. Tuncbag, S. J. Gosline, A. Kedaigle, A. R. Soltis, A. Gitter, and E. Fraenkel. Network-based interpretation of diverse high-throughput datasets through the Omics Integrator software package. PLOS Computational Biology, 12(4):e1004879, 2016.
[451] Ü. Ünsal, A. Cüvitoğlu, K. Turhan, and Z. Işık. Nmsdr: Drug repurposing approach based on transcriptome data and network module similarity. Molecular Informatics, 42(3):2200077, 2023.
[452] V. Vacic, L. M. Iakoucheva, S. Lonardi, and P. Radivojac. Graphlet kernels for prediction of functional residues in protein structures. Journal of Computational Biology, 17(1):55–72, 2010.
[453] M. G. Van Der Wijst, D. H. de Vries, H. Brugge, H.-J. Westra, and L. Franke. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Medicine, 10(1):1–15, 2018.
[454] H. H. van Haagen, P. A. ’t Hoen, A. Botelho Bovo, A. de Morree, E. M. van Mulligen, C. Chichester, J. A. Kors, J. T. den Dunnen, G.-J. B. van Ommen, S. M. van der Maarel, et al. Novel protein-protein interactions inferred from literature context. PLOS One, 4:e7894, 2009.
[455] O. Vanunu, O. Magger, E. Ruppin, T. Shlomi, and R. Sharan. Associating genes and protein complexes with disease via network propagation. PLOS Computational Biology, 6(1):e1000641, 2010.
[456] S. V. Vasaikar, P. Straub, J. Wang, and B. Zhang. LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Research, 46(D1):D956–D963, 2018.
[457] P. Veličković and C. Blundell. Neural algorithmic reasoning. Patterns, 2(7), 2021.
[458] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In Proceedings of the International Conference on Learning Representations, 2018.
[459] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. Hjelm. Deep Graph Infomax. In Proceedings of the International Conference on Learning Representations, 2019.
[460] N. Verstraete, G. Jurman, G. Bertagnolli, A. Ghavasieh, V. Pancaldi, and M. De Domenico. CovMulNet19, Integrating Proteins, Diseases, Drugs, and Symptoms: A Network Medicine Approach to COVID-19. Network and Systems Medicine, 3(1):130–141, 2020.
[461] V. Vijayan, D. Critchlow, and T. Milenković. Alignment of dynamic networks. Bioinformatics, 33(14):i180–i189, 2017.
[462] V. Vijayan, E. Krebs, and T. Milenković. Pairwise versus multiple global network alignment. IEEE Access, 8:41961–41974, 2020.
[463] V. Vijayan and T. Milenković. Multiple network alignment via multiMAGNA++. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(5):1669–1682, 2017.
[464] V. Vijayan and T. Milenković. Aligning dynamic networks with DynaWAVE. Bioinformatics, 34(10):1795–1798, 2018.
[465] V. Vijayan, V. Saraph, and T. Milenković. MAGNA++: Maximizing Accuracy in Global Network Alignment via both node and edge conservation. Bioinformatics, 31(14):2409–2411, 2015.
[466] S. V. N. Vishwanathan, N. N. Schraudolph, R. I. Kondor, and K. M. Borgwardt. Graph kernels. Journal of Machine Learning Research, 11:1201–1242, 2010.
[467] I. Voitalov, L. Zhang, C. Kilpatrick, J. B. Withers, A. Saleh, V. R. Akmaev, and S. D. Ghiassian. The module triad: A novel network biology approach to utilize patients’ multi-omics data for target discovery in ulcerative colitis. Scientific Reports, 12(1):21685, 2022.
[468] C. Von Mering, R. Krause, B. Snel, M. Cornell, S. G. Oliver, S. Fields, and P. Bork. Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417(6887):399–403, 2002.
[469] A. Waagmeester, G. Stupp, S. Burgstaller-Muehlbacher, B. M. Good, M. Griffith, O. L. Griffith, K. Hanspers, H. Hermjakob, T. S. Hudson, K. Hybiske, et al. Wikidata as a knowledge graph for the life sciences. eLife, 9:e52614, 2020.
[470] G. Wachman and R. Khardon. Learning from interpretations: a rooted kernel for ordered hypergraphs. In Proceedings of the International Conference on Machine Learning, pages 943–950, 2007.
[471] B. Wang, A. M. Mezlini, F. Demir, M. Fiume, Z. Tu, M. Brudno, B. Haibe-Kains, and A. Goldenberg. Similarity network fusion for aggregating data types on a genomic scale. Nature Methods, 11(3):333–337, 2014.
[472] B. Wang, A. Pourshafeie, M. Zitnik, J. Zhu, C. D. Bustamante, S. Batzoglou, and J. Leskovec. Network enhancement as a general method to denoise weighted biological networks. Nature Communications, 9(1):3108, 2018.
[473] H. Wang, T. Fu, Y. Du, W. Gao, K. Huang, Z. Liu, P. Chandak, S. Liu, P. Van Katwyk, A. Deac, et al. Scientific discovery in the age of artificial intelligence. Nature, 620(7972):47–60, 2023.
[474] H. Wang, D. Lian, W. Liu, D. Wen, C. Chen, and X. Wang. Powerful graph of graphs neural network for structured entity analysis. World Wide Web, 25(2):609–629, 2022.
[475] H. Wang, D. Lian, Y. Zhang, L. Qin, and X. Lin. GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity Interactions. In Proceedings of the International Joint Conference on Artificial Intelligence, 2021.
[476] H. Wang, H. Zheng, and D. Z. Chen. TANGO: A GO-term embedding based method for protein semantic similarity prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 20(1):694–706, 2023.
[477] J. Wang, S. Lisanza, D. Juergens, D. Tischer, J. L. Watson, K. M. Castro, R. Ragotte, A. Saragovi, L. F. Milles, M. Baek, et al. Scaffolding protein functional sites using deep learning. Science, 377(6604):387–394, 2022.
[478] L. Wang, H. Liu, Y. Liu, J. Kurtin, and S. Ji. Learning hierarchical protein representations via complete 3D graph networks. In Proceedings of the International Conference on Learning Representations, 2022.
[479] Q. Wang, H. Jiang, Y. Jiang, S. Yi, Q. Nie, and G. Zhang. Multiplex network infomax: Multiplex network embedding via information fusion. Digital Communications and Networks, 2022.
[480] T. Wang, W. Shao, Z. Huang, H. Tang, J. Zhang, Z. Ding, and K. Huang. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nature Communications, 12(1):3445, 2021.
[481] X. Wang, D. Bo, C. Shi, S. Fan, Y. Ye, and P. S. Yu. A survey on heterogeneous graph embedding: methods, techniques, applications and sources. IEEE Transactions on Big Data, 9(2):415–436, 2022.
[482] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu. Heterogeneous graph attention network. In Proceedings of the World Wide Web Conference, pages 2022–2032, 2019.
[483] X. Wang, L. Madeddu, K. Spirohn, L. Martini, A. Fazzone, L. Becchetti, T. Wytock, I. Kovács, O. Balogh, B. Benczik, and M. Pétervári. Assessment of community efforts to advance network-based prediction of protein–protein interactions. Nature Communications, 14(1):1582, 2023.
[484] Y. Wang, H. Lee, J. M. Fear, I. Berger, B. Oliver, and T. M. Przytycka. NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Communications Biology, 5(1):1282, 2022.
[485] Y. Wang, Q. Peng, W. Wang, X. Guo, M. Shao, H. Liu, W. Liang, and L. Pan. Network alignment enhanced via modeling heterogeneity of anchor nodes. Knowledge-Based Systems, 250:109116, 2022.
[486] Y. Wang, Y. Zhao, N. Shah, and T. Derr. Imbalanced graph classification via graph-of-graph neural networks. In Proceedings of the ACM International Conference on Information and Knowledge Management, pages 2067–2076, 2022.
[487] J. L. Watson, D. Juergens, N. R. Bennett, B. L. Trippe, J. Yim, H. E. Eisenach, W. Ahern, A. J. Borst, R. J. Ragotte, L. F. Milles, et al. De novo design of protein structure and function with RFdiffusion. Nature, 620(7976):1089–1100, 2023.
[488] A. N. Weber, Z. A. Bittner, S. Shankar, X. Liu, T.-H. Chang, T. Jin, and A. Tapia-Abellán. Recent insights into the regulatory networks of NLRP3 inflammasome activation. Journal of Cell Science, 133(23):jcs248344, 2020.
[489] D. Weighill, M. Ben Guebila, K. Glass, J. Platig, J. J. Yeh, and J. Quackenbush. Gene targeting in disease networks. Frontiers in Genetics, 12:649942, 2021.
[490] D. Weighill, M. B. Guebila, K. Glass, J. Quackenbush, and J. Platig. Predicting genotype-specific gene regulatory networks. Genome Research, 32(3):524–533, 2022.
[491] J. Wen, X. Zhang, E. Rush, V. A. Panickan, X. Li, T. Cai, D. Zhou, Y.-L. Ho, L. Costa, E. Begoli, et al. Multimodal representation learning for predicting molecule–disease relations. Bioinformatics, 39(2):btad085, 2023.
[492] J. L. Wetzel, K. Zhang, and M. Singh. Learning probabilistic protein–DNA recognition codes from DNA-binding specificities using structural mappings. Genome Research, 32(9):1776–1786, 2022.
[493] S. Windels, N. Malod-Dognin, and N. Pržulj. Graphlet eigencentralities capture novel central roles of genes in pathways. PLOS One, 17(1):e0261676, 2022.
[494] S. Windels, N. Malod-Dognin, and N. Pržulj. Identifying cellular cancer mechanisms through pathway-driven data integration. Bioinformatics, 38(18):4344–4351, 2022.
[495] S. Winkler, I. Winkler, M. Figaschewski, T. Tiede, A. Nordheim, and O. Kohlbacher. De novo identification of maximally deregulated subnetworks based on multi-omics data with DeRegNet. BMC Bioinformatics, 23(1):1–28, 2022.
[496] S. N. Wright, S. Colton, L. V. Schaffer, R. T. Pillich, C. Churas, D. Pratt, and T. Ideker. State of the interactomes: an evaluation of molecular networks for generating biological insights. bioRxiv 2024.04.26.587073, 2024.
[497] L. Wu, H. Lin, C. Tan, Z. Gao, and S. Z. Li. Self-supervised learning on graphs: Contrastive, generative, or predictive. IEEE Transactions on Knowledge and Data Engineering, 35:4216–4235, 2023.
[498] R. Wu, F. Ding, R. Wang, R. Shen, X. Zhang, S. Luo, C. Su, Z. Wu, Q. Xie, B. Berger, et al. High-resolution de novo structure prediction from primary sequence. bioRxiv, 2022.
[499] X. Wu, Q. Liu, and R. Jiang. Align human interactome with phenome to identify causative genes and networks underlying disease families. Bioinformatics, 25(1):98–104, 2009.
[500] A. Xenos, N. Malod-Dognin, S. Milinković, and N. Pržulj. Linear functional organization of the omic embedding space. Bioinformatics, 37(21):3839–3847, 2021.
[501] A. Xenos, N. Malod-Dognin, C. Zambrana, and N. Pržulj. Integrated data analysis uncovers new COVID-19 related genes and potential drug re-purposing candidates. International Journal of Molecular Sciences, 24(2):1431, 2023.
[502] Y. Xie, S. Katariya, X. Tang, E. Huang, N. Rao, K. Subbian, and S. Ji. Task-agnostic graph explanations. In Proceedings of the Advances in Neural Information Processing Systems, volume 35, pages 12027–12039, 2022.
[503] Y. Xie, Z. Xu, J. Zhang, Z. Wang, and S. Ji. Self-supervised learning of graph neural networks: A unified review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2):2412–2429, 2022.
[504] Y. Xie, Y. Zhang, M. Gong, Z. Tang, and C. Han. MGAT: Multi-view Graph Attention Networks. Neural Networks, 132:180–189, 2020.
[505] H. Xiong, J. Yan, and L. Pan. Contrastive multi-view multiplex network embedding with applications to robust network alignment. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1913–1923, 2021.
[506] J. Xu, T. L. Wickramarathne, and N. V. Chawla. Representing higher-order dependencies in networks. Science Advances, 2(5):e1600028, 2016.
[507] M. Xu, Z. Zhang, J. Lu, Z. Zhu, Y. Zhang, M. Chang, R. Liu, and J. Tang. PEER: a comprehensive and multi-task benchmark for protein sequence understanding. In Proceedings of the Advances in Neural Information Processing Systems, volume 35, pages 35156–35173, 2022.
[508] Y. Yan, Q. Zhou, J. Li, T. Abdelzaher, and H. Tong. Dissecting Cross-Layer Dependency Inference on Multi-Layered Inter-Dependent Networks. In Proceedings of the International Conference on Information and Knowledge Management, pages 2341–2351, 2022.
[509] X. H. Yang, A. Goldstein, Y. Sun, Z. Wang, M. Wei, I. P. Moskowitz, and J. M. Cunningham. Detecting critical transition signals from single-cell transcriptomes to infer lineage-determining transcription factors. Nucleic Acids Research, 50(16):e91–e91, 2022.
[510] M. Yasunaga, A. Bosselut, H. Ren, X. Zhang, C. D. Manning, P. S. Liang, and J. Leskovec. Deep bidirectional language-knowledge graph pretraining. In Proceedings of the Advances in Neural Information Processing Systems, volume 35, pages 37309–37323, 2022.
[511] M. Yasunaga, J. Leskovec, and P. Liang. LinkBERT: Pretraining language models with document links. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2022.
[512] Ö. Yaveroğlu, T. Milenković, and N. Pržulj. Proper evaluation of alignment-free network comparison methods. Bioinformatics, 31(16):2697–2704, 2015.
[513] E. Yeger-Lotem, L. Riva, L. J. Su, A. D. Gitler, A. G. Cashikar, O. D. King, P. K. Auluck, M. L. Geddie, J. S. Valastyan, D. R. Karger, et al. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nature Genetics, 41(3):316–323, 2009.
[514] K. Yi, B. Zhou, Y. Shen, P. Liò, and Y. Wang. Graph denoising diffusion for inverse protein folding. Advances in Neural Information Processing Systems, 36, 2024.
[515] J. Yim, H. Stärk, G. Corso, B. Jing, R. Barzilay, and T. S. Jaakkola. Diffusion models in protein structure and docking. Wiley Interdisciplinary Reviews: Computational Molecular Science, 14(2):e1711, 2024.
[516] W. Yin, L. Mendoza, J. Monzon-Sandoval, A. O. Urrutia, and H. Gutierrez. Emergence of co-expression in gene regulatory networks. PLOS One, 16(4):e0247671, 2021.
[517] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, and T.-Y. Liu. Do transformers really perform badly for graph representation? In Proceedings of the Advances in Neural Information Processing Systems, volume 34, pages 28877–28888, 2021.
[518] Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec. GNNExplainer: Generating explanations for graph neural networks. In Proceedings of the Advances in Neural Information Processing Systems, volume 32, 2019.
[519] B.-J. Yoon, X. Qian, and E. R. Dougherty. Quantifying the objective cost of uncertainty in complex dynamical systems. IEEE Transactions on Signal Processing, 61(9):2256–2266, 2013.
[520] J. You, T. Du, and J. Leskovec. ROLAND: graph learning framework for dynamic graphs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2358–2366, 2022.
[521] J. You, R. Ying, X. Ren, W. Hamilton, and J. Leskovec. GraphRNN: Generating Realistic Graphs with Deep Auto-Regressive Models. In Proceedings of the International Conference on Machine Learning, pages 5708–5717, 2018.
[522] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen. Graph contrastive learning with augmentations. In Proceedings of the Advances in Neural Information Processing Systems, volume 33, pages 5812–5823, 2020.
[523] X. Yu, Z. Liu, Y. Fang, and X. Zhang. Learning to count isomorphisms with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
[524] H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji. On explainability of graph neural networks via subgraph explorations. In Proceedings of the International Conference on Machine Learning, pages 12241–12252, 2021.
[525] X. Yue, Z. Wang, J. Huang, S. Parthasarathy, S. Moosavinasab, Y. Huang, S. M. Lin, W. Zhang, P. Zhang, and H. Sun. Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics, 36(4):1241–1251, 2020.
[526] H. Y. Yuen and J. Jansson. Better link prediction for protein-protein interaction networks. In Proceedings of the IEEE International Conference on Bioinformatics and Bioengineering, pages 53–60, 2020.
[527] S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim. Graph transformer networks. Advances in Neural Information Processing Systems, 32, 2019.
[528] C. Zambrana, A. Xenos, R. Bottcher, N. Malod-Dognin, and N. Pržulj. Network neighbors of viral targets and differentially expressed genes in COVID-19 are drug target candidates. Scientific Reports, 11(1):18985, 2021.
[529] B. Zhang and S. Horvath. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), 2005.
[530] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla. Heterogeneous graph neural network. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 793–803, 2019.
[531] J. Zhang, L. Cammarata, C. Squires, T. P. Sapsis, and C. Uhler. Active learning for optimal intervention design in causal models. Nature Machine Intelligence, 5(10):1066–1075, 2023.
[532] L. Zhang, G. Yu, M. Guo, and J. Wang. Predicting protein-protein interactions using high-quality non-interacting pairs. BMC Bioinformatics, 19(19):105–124, 2018.
[533] R. Zhang and J. Ma. Matcha: probing multi-way chromatin interaction with hypergraph representation learning. Cell Systems, 10(5):397–407, 2020.
[534] R. Zhang, J. Ma, and J. Ma. DANGO: Predicting higher-order genetic interactions. bioRxiv 2020.11.26.400739, 2020.
[535] R. Zhang, T. Zhou, and J. Ma. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nature Biotechnology, 40(2):254–261, 2022.
[536] R. Zhang, Y. Zou, and J. Ma. Hyper-SAGNN: a self-attention based graph neural network for hypergraphs. In Proceedings of the International Conference on Learning Representations, 2020.
[537] S. Zhang, H. Tong, Y. Xia, L. Xiong, and J. Xu. NetTrans: Neural Cross-Network Transformation. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 986–996, 2020.
[538] S. Zhang, H. Tong, J. Xu, Y. Hu, and R. Maciejewski. ORIGIN: Non-Rigid Network Alignment. In Proceedings of the IEEE International Conference on Big Data, pages 998–1007, 2019.
[539] X. Zhang, M. Zeman, T. Tsiligkaridis, and M. Zitnik. Graph-guided network for irregularly sampled multivariate time series. In Proceedings of the International Conference on Learning Representations, 2022.
[540] Y. Zhang and H. Huang. Brain connectome based complex brain disorder prediction via novel graph-blind convolutional network. In Proceedings of the International Conference on Information Processing In Medical Imaging, 2019.
[541] Y. Zhang, L. Zhan, S. Wu, P. Thompson, and H. Huang. Disentangled and proportional representation learning for multi-view brain connectomes. In Proceedings of the Medical Image Computing and Computer Assisted Intervention, pages 508–518, 2021.
[542] Z. Zhang, Q. Liu, Q. Hu, and C.-K. Lee. Hierarchical graph transformer with adaptive node sampling. In Proceedings of the Advances in Neural Information Processing Systems, volume 35, pages 21171–21183, 2022.
[543] Z. Zhang, Z. Lu, H. Zhongkai, M. Zitnik, and Q. Liu. Full-atom protein pocket design via iterative refinement. Advances in Neural Information Processing Systems, 36:16816–16836, 2023.
[544] Z. Zhang, M. Xu, A. R. Jamasb, V. Chenthamarakshan, A. Lozano, P. Das, and J. Tang. Protein representation learning by geometric structure pretraining. In Proceedings of the International Conference on Learning Representations, 2023.
[545] C. Zhao, L. Zhan, P. M. Thompson, and H. Huang. Revealing Continuous Brain Dynamical Organization with Multimodal Graph Transformer. In Proceedings of the Medical Image Computing and Computer Assisted Intervention, pages 346–355, 2022.
[546] J. Zhao, Q. Wen, S. Sun, Y. Ye, and C. Zhang. Multi-view Self-supervised Heterogeneous Graph Embedding. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery In Databases, pages 319–334, 2021.
[547] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, and H. Li. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Transactions on Intelligent Transportation Systems, 21(9):3848–3858, 2019.
[548] X. Zhao, F. Chen, S. Hu, and J.-H. Cho. Uncertainty aware semi-supervised learning on graph data. Advances in Neural Information Processing Systems, 33:12827–12836, 2020.
[549] V. W. Zheng, M. Sha, Y. Li, H. Yang, Y. Fang, Z. Zhang, K.-L. Tan, and K. C.-C. Chang. Heterogeneous embedding propagation for large-scale e-commerce user alignment. In Proceedings of the IEEE International Conference on Data Mining, pages 1434–1439, 2018.
[550] D. Zhou, J. Huang, and B. Schölkopf. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the Advances in Neural Information Processing Systems, volume 19, 2006.
[551] G. Zhou, J. Ewald, and J. Xia. OmicsAnalyst: a comprehensive web-based platform for visual analytics of multi-omics data. Nucleic Acids Research, 49(W1):W476–W482, 2021.
[552] J. Zhou and O. G. Troyanskaya. Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12(10):931–934, 2015.
[553] N. Zhou, Y. Jiang, T. R. Bergquist, A. J. Lee, B. Z. Kacsoh, A. W. Crocker, K. A. Lewis, G. Georghiou, H. N. Nguyen, M. N. Hamid, et al. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biology, 20(1):1–23, 2019.
[554] P. Zhou, S. Wang, T. Li, and Q. Nie. Dissecting transition cells from single-cell transcriptome data through multiscale stochastic dynamics. Nature Communications, 12(1):5609, 2021.
[555] L. Zhu, Y. Ding, C.-Y. Chen, L. Wang, Z. Huo, S. Kim, C. Sotiriou, S. Oesterreich, and G. Tseng. MetaDCN: meta-analysis framework for differential co-expression network detection with an application in breast cancer. Bioinformatics, 33(8):1121–1129, 2016.
[556] M. Zitnik, M. Agrawal, and J. Leskovec. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics, 34(13):i457–i466, 2018.
[557] M. Zitnik and J. Leskovec. Predicting multicellular function through multi-layer tissue networks. Bioinformatics, 33(14):i190–i198, 2017.
[558] M. Zitnik, F. Nguyen, B. Wang, J. Leskovec, A. Goldenberg, and M. M. Hoffman. Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities. Information Fusion, 50:71–91, 2019.
[559] M. Zitnik, R. Sosič, M. W. Feldman, and J. Leskovec. Evolution of resilience in protein interactomes across the tree of life. Proceedings of the National Academy of Sciences, 116(10):4426–4433, 2019.
[560] E. Zotenko, K. S. Guimarães, R. Jothi, and T. M. Przytycka. Decomposition of overlapping protein complexes: A graph theoretical method for analyzing static and dynamic protein associations. Algorithms for Molecular Biology, 1(1):1–11, 2006.