
Prediction of Oncogenic Interactions and Cancer-Related Signaling Networks Based On Network Topology




Marcio Luis Acencio*, Luiz Augusto Bovolenta, Esther Camilo, Ney Lemke
Department of Physics and Biophysics, Botucatu Biosciences Institute, UNESP – Univ Estadual Paulista, Botucatu, São Paulo, Brazil

Abstract
Cancer has been increasingly recognized as a systems biology disease since many investigators have demonstrated that this
malignant phenotype emerges from abnormal protein-protein, regulatory and metabolic interactions induced by
simultaneous structural and regulatory changes in multiple genes and pathways. Therefore, the identification of oncogenic
interactions and cancer-related signaling networks is crucial for better understanding cancer. As experimental techniques
for determining such interactions and signaling networks are labor-intensive and time-consuming, the development of a
computational approach capable of accomplishing this task would be of great value. For this purpose, we present here a novel
computational approach based on network topology and machine learning capable of predicting oncogenic interactions and
extracting relevant cancer-related signaling subnetworks from an integrated network of human gene interactions (INHGI). This
approach, called graph2sig, is twofold: first, it assigns oncogenic scores to all interactions in the INHGI and then these
oncogenic scores are used as edge weights to extract oncogenic signaling subnetworks from INHGI. Regarding the
prediction of oncogenic interactions, we showed that graph2sig is able to recover 89% of known oncogenic interactions
with a precision of 77%. Moreover, the interactions that received high oncogenic scores are enriched in genes for which
mutations have been causally implicated in cancer. We also demonstrated that graph2sig is potentially useful in extracting
oncogenic signaling subnetworks: more than 80% of constructed subnetworks contain more than 50% of original
interactions in their corresponding oncogenic linear pathways present in the KEGG PATHWAY database. In addition, the
potential oncogenic signaling subnetworks discovered by graph2sig are supported by experimental evidence. Taken
together, these results suggest that graph2sig can be a useful tool for cancer researchers interested in
detecting the signaling networks most likely to contribute to the emergence of the malignant phenotype.

Citation: Acencio ML, Bovolenta LA, Camilo E, Lemke N (2013) Prediction of Oncogenic Interactions and Cancer-Related Signaling Networks Based on Network
Topology. PLoS ONE 8(10): e77521. doi:10.1371/journal.pone.0077521
Editor: Julio Vera, University of Erlangen-Nuremberg, Germany
Received April 1, 2013; Accepted September 3, 2013; Published October 25, 2013
Copyright: © 2013 Acencio et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work has been supported by grants #2010/20684-3, #2012/13450-1, #2012/00741-8 and #2013/02018-4 from the São Paulo Research
Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: mlacencio@ibb.unesp.br

Introduction

The cancer phenotype is driven by the simultaneous expression of six biological capabilities: self-sufficiency in
growth signals, insensitivity to antigrowth signals, avoidance of apoptosis, sustained angiogenesis, limitless
replicative potential, and tissue invasion and metastasis [1]. All these "hallmarks of cancer" emerge as a result of
the complex interplay among oncogenic signals, which are sets of sequential physical and biochemical reactions (e.g.,
phosphorylation, dephosphorylation, binding and dissociation) that are triggered by oncogenes or tumor suppressor
genes and culminate in the expression of the fundamental changes in cell physiology associated with the malignant
phenotype.

In general, oncogenic signals disturb normal interactions as they propagate through the signaling network. For
example, the overexpression of CCND1, a gene that is an important regulator of cell cycle progression, is the result
of the constitutive oncogenic signaling triggered by mutated KRAS in many cancer cells [2]. The interactions
downstream of KRAS and upstream of CCND1 are disturbed and, as a consequence, CCND1 is overexpressed. However,
overexpression of CCND1 alone is not sufficient to drive oncogenic transformation through the self-sufficiency in
growth signals supported by mutated KRAS. Instead, additional oncogenic signals altering nuclear trafficking and
ubiquitin-mediated proteolysis are required to promote the nuclear retention of the overexpressed CCND1 [3], a
condition under which continued cell proliferation, one of the features necessary for full malignant transformation,
can be sustained.

The above-mentioned example reinforces the fact that a normal cell will be transformed into a cancer cell only if
multiple normal interactions are simultaneously disturbed by multiple oncogenic signals. In this regard, determining
the oncogenic role of individual genes or proteins is insufficient to decipher the intricacies of the signaling
pathways involved in cancer. Determining the oncogenic role of genes and proteins at a systems level, on the other
hand, is preferable to this end and, as a matter of fact, systems biology-based approaches have been convincingly
shown to be successful in uncovering the functioning of cancer signaling pathways (for reviews on cancer systems
biology, see [4] and [5]).

The combination of machine learning and graph theory is one of the systems biology-based approaches used to
determine and predict how phenotypes emerge from the interactions among biological entities. We have previously used
this approach to predict essential genes on a genome-wide scale and determine cellular rules for essentiality in
Escherichia coli [6] and Saccharomyces

PLOS ONE | www.plosone.org 1 October 2013 | Volume 8 | Issue 10 | e77521


Oncogenic Interactions and Signaling Networks

cerevisiae [7]. Moreover, we have also used the combination of machine learning and graph theory to predict morbid
and druggable genes and determine rules for morbidity and druggability in human [8]. Besides attaining successful
prediction rates, we also obtained biologically plausible cellular rules in these cases. These findings prompted us to
investigate whether the combination of machine learning and graph theory would also be useful to reveal, at a systems
level, how cancer signaling pathways act in concert to generate the malignant phenotype.

For this purpose, we present in this paper a novel computational method based on machine learning and graph theory,
graph2sig, that (1) determines the oncogenic potential of an interaction, i.e., its capacity to transmit oncogenic
signals in an integrated network of human gene interactions (INHGI), and (2) extracts from INHGI potential
cancer-related signaling subnetworks, given two genes of interest, by using the oncogenic potential scores assigned to
the interactions. Using graph2sig, we were able to reliably predict the oncogenic potential of interactions as well as
to extract from INHGI subnetworks containing known and potential oncogenic pathways supported by experimental
evidence. To the best of our knowledge, this is the first time that the combination of machine learning and graph
theory has been used to predict both the oncogenic potential of interactions and potential cancer-related signaling
subnetworks.

Materials and Methods

The aim of graph2sig is twofold: prediction of the oncogenic potential of interactions (Figure 1) and extraction of
potential oncogenic signaling subnetworks from the INHGI (Figure 2). The first step of graph2sig is the construction
of the INHGI and the computation of network centralities of genes in INHGI (Table 1). The second step uses these
computed network centralities as training data for machine learning algorithms (or learners) to generate prediction
models for assigning oncogenic potential to interactions. The third step is the assignment of an "oncogenic
potential" (p_can) to each interaction by these prediction models (Figure 1).

The fourth step is to find the paths between two genes of interest, g_i and g_j, in the INHGI with the highest p_can
values by using the recursive enumeration algorithm (REA) [9], a path finding algorithm that lists the paths in the
order of their weights (in this case, the p_can). The final step is the selection and merging of paths found by REA
to build the potential cancer-related signaling subnetwork containing the pathways with the highest oncogenic
potential linking g_i and g_j (Figure 2). These steps were implemented in a bash script available at
http://www.lbbc.ibb.unesp.br/graph2sig.

First step: INHGI construction and computation of network centralities

INHGI construction. The INHGI, which contains only experimentally verified interactions, was constructed based on
the assumption that two genes, g1 and g2, coding respectively for proteins p1 and p2, are interacting genes if (i) p1
and p2 interact physically (protein physical interaction), (ii) the transcription factor p1 directly regulates the
transcription of gene g2, i.e., p1 binds to the promoter region of g2 (transcriptional regulation interaction), or
(iii) the enzymes p1 and p2 share metabolites, i.e., a product generated by a reaction catalyzed by enzyme p1 is used
as a reactant by a reaction catalyzed by enzyme p2, or the enzyme p1 generates a metabolite that interacts with a
non-enzymatic p2 (metabolic interaction). The experimentally verified human interactions were obtained from different
sources according to the type of interaction, as described below.

Protein-protein physical interaction data were obtained from version 1.3 of the Human Integrated Protein-Protein
Interaction rEference (HIPPIE), a database dedicated to the collection of experimentally verified and scored human
protein-protein interactions integrated from multiple sources [10]. We collected from HIPPIE only interactions
detected by experimental techniques that received scores of 5 or more, i.e., techniques that were considered by
HIPPIE expert curators as those with high reliability and low error rate [10]. Protein-protein interactions from
HIPPIE (and, in fact, from all other similar databases) are considered undirected because this type of interaction is
assumed to be non-directional. However, as the extraction of potential oncogenic signaling subnetworks from INHGI
depends on the directionality of interactions, i.e., the direction of signal flow between proteins, and the
interactions provided by our source of training data, KEGG PATHWAY [11], are directed (see more details in the
section "Construction of training datasets"), each protein-protein interaction p1-p2 was transformed into two
distinct directed interactions: p1 → p2 and p2 → p1.

Human transcriptional regulation interactions were obtained from the current version of the Human Transcriptional
Regulation Interaction database (HTRIdb; [12]). Created by our group, HTRIdb is a repository of experimentally
verified interactions between human transcription factors and their target genes detected by 14 distinct
experimental techniques, embracing both small and large-scale techniques. We collected from HTRIdb all transcription
factor/target gene interactions.

Metabolic interactions were extracted from the human metabolic model Recon 1 [13] by code implemented in Mathematica
7.0 (Wolfram Research, Inc.). We excluded those metabolic interactions generated by the so-called "currency
metabolites", abundant molecular species present throughout the cell most of the time and, therefore, unlikely to
impose any constraints on the dynamics of metabolic reactions [14]. We considered as currency metabolites the eight
most connected metabolites (ADP, ATP, H+, H2O, NADP+, NADPH, orthophosphate and pyrophosphate) in the original
metabolic model Recon 1. In addition, we added to the set of metabolic interactions some important interactions that
are missing in Recon 1: PIK3CA → PDPK1, PIK3CA → ILK, PIK3CA → AKT3, PIK3CA → AKT2, PIK3CA → AKT1, PIK3CB → PDPK1,
PIK3CB → ILK, PIK3CB → AKT3, PIK3CB → AKT2, PIK3CB → AKT1, PIK3CD → PDPK1, PIK3CD → ILK, PIK3CD → AKT3,
PIK3CD → AKT2, PIK3CD → AKT1 and PTEN → AKT1.

The final INHGI is a directed network formed by the integration of the protein physical, metabolic and
transcriptional regulation interactions through genes common to these data sets (see Dataset S1). Before performing
the integration, we converted all human gene names to their GeneID, as provided by the Entrez Gene database [15], to
avoid the creation of false interactions due to gene name ambiguity.

Computation of network centralities. For each gene g in INHGI, we computed 4 network centrality measures as listed
in Table 1. Briefly, degree centrality (deg) is defined as the number of links to a node (in our case, a gene). The
clustering coefficient (cluster) of a node (in our case, a gene) quantifies how close the node and its neighbors are
to being a clique, i.e., all nodes connected to all nodes. For the INHGI, cluster is defined as the number of links
between the genes within the neighborhood of g divided by the number of links that could exist between them.
Betweenness centrality (bet) reflects the role played by a node (in our case, a gene) in the global network
architecture and, for the INHGI, is defined as the fraction of shortest paths between g_i and g_j passing through g.
Closeness centrality (clo) measures how close a node (in


Figure 1. Initial steps of graph2sig. After building the INHGI and calculating the network centralities, balanced training groups are constructed
and presented to the selected machine learning algorithm (bagged J48) that, in turn, generates the prediction models as depicted in (A). These
prediction models are combined in one final prediction model by the Vote algorithm. This final model is then used to assign oncogenic scores to
interactions in INHGI originating the wINHGI as shown in (B).
doi:10.1371/journal.pone.0077521.g001
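
The balanced training groups mentioned in this caption can be illustrated with a minimal sketch. It assumes interactions are represented by opaque identifiers and that negatives are drawn without replacement from the pool of interactions not known to be oncogenic; the function names, the seed handling and the representation of a dataset as a list of (interaction, label) pairs are hypothetical and are not taken from the graph2sig script. The paper itself uses 265 known oncogenic interactions and 1000 such datasets.

```python
import random


def build_normal_datasets(positives, candidates, n_datasets, seed=0):
    """Build balanced datasets: every positive plus an equal-sized random draw of negatives.

    `positives` are the known oncogenic interactions, `candidates` the remaining
    interactions (treated as non-oncogenic). Names are illustrative assumptions.
    """
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_datasets):
        negatives = rng.sample(candidates, len(positives))
        rows = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
        datasets.append(rows)
    return datasets


def shuffle_labels(dataset, rng):
    """Return a 'shuffled' copy: same interactions, class labels randomly permuted."""
    labels = [label for _, label in dataset]
    rng.shuffle(labels)
    return [(x, label) for (x, _), label in zip(dataset, labels)]
```

Permuting the labels keeps the class balance intact while breaking any link between features and class, which is what makes the shuffled datasets usable as a baseline for the statistical comparison.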

our case, a gene) is to all others in the network and, for the INHGI, is defined as the mean shortest path between g
and all other genes reachable from it. All these network centrality measures were calculated by the Python package
NetworkX 1.6 [16].

Second step: generation of prediction models

Construction of training datasets. We constructed two groups of balanced training datasets, i.e., datasets
containing the same number of positive (in our case, known oncogenic interactions) and negative (in our case,
non-oncogenic interactions) examples: "normal datasets" and "shuffled datasets". These training data are available at
http://www.lbbc.ibb.unesp.br/graph2sig.

To construct the training datasets, we first gathered a list of oncogenic interactions (interactions known to
transmit oncogenic signals) from the cancer pathway maps provided by the KEGG PATHWAY database [11] and then mapped
them to the INHGI. The final list of oncogenic interactions used as positive examples to train our machine learning
algorithm comprises 265 oncogenic interactions present in the INHGI (see Dataset S1).

Regarding the negative examples, we considered as "non-oncogenic interactions" the remaining interactions present in
the INHGI because currently it is not possible to build a list of interactions known not to transmit oncogenic
signals. We randomly selected 1000 different sets of 265 of these non-oncogenic interactions and combined them with
the list of 265 known oncogenic interactions to build 1000 different training datasets containing 530 interactions
each. These are the "normal datasets". From these normal datasets, we generated 10000 different "shuffled datasets"
by randomly shuffling the class labels (oncogenic and non-oncogenic) among interactions (Figure 1).

Construction of prediction models. We employed version 3.7.5 of the WEKA (Waikato Environment for Knowledge
Analysis) software package, a collection of machine learning algorithms for data mining tasks [17], to generate the
prediction models. We used the training data described in the previous section to train bootstrap aggregating
(bagging), a machine learning ensemble meta-algorithm that combines multiple base learners [18]. In our case, we
selected as the base learner the J48 algorithm, WEKA's implementation of the C4.5 decision tree [19], with the
default parameters.

Usually, the generation of prediction models by bagging is conducted as follows: (1) n bootstrap replicates of the
training dataset are created; (2) each replicate is presented to the base learner


Figure 2. Final steps of graph2sig. (A) The application of REA on the wINHGI generates a list of m paths along with their costs for each pair of
genes and these costs are converted to weights and normalized so that the minimum weight is zero and the maximum weight is 1. (B) Twenty
subnetworks are generated from this list of paths and the subnetwork with the highest average clustering coefficient is selected. (C) For each pair of
genes, 41 subnetworks are generated and, among these subnetworks, the one with the highest average clustering coefficient is selected as the final
potential cancer-related subnetwork.
doi:10.1371/journal.pone.0077521.g002
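
The selection procedure summarized in panels (A) and (B) can be sketched as follows. This is an illustration under stated assumptions, not the authors' bash implementation: paths are lists of gene names, path costs are strictly positive, the subnetwork is treated as undirected when computing clustering, and the threshold rule (keep paths whose normalized weight is at most 1 - m, for m = 0, 0.05, ..., 0.95) is taken from the Methods description of the final step. All function names are hypothetical.

```python
from itertools import combinations


def normalized_weights(costs):
    """Convert path costs C to weights W = 1/C, then min-max normalize to [0, 1]."""
    weights = [1.0 / c for c in costs]  # assumes every cost is > 0
    lo, hi = min(weights), max(weights)
    if hi == lo:  # degenerate case (all costs equal): an assumption of this sketch
        return [1.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


def average_clustering(paths):
    """Average clustering coefficient of the undirected graph formed by merging paths."""
    adj = {}
    for path in paths:
        for u, v in zip(path, path[1:]):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k > 1:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)


def select_subnetwork(paths, costs, steps=20):
    """Build one candidate subnetwork per threshold m and keep the most clustered one."""
    norm = normalized_weights(costs)
    best = None
    for i in range(steps):
        m = i * 0.05
        chosen = [p for p, w in zip(paths, norm) if w <= 1 - m + 1e-12]
        score = average_clustering(chosen)
        if best is None or score > best[0]:
            best = (score, m, chosen)
    return best  # (average clustering, threshold m, selected paths)
```

For example, `select_subnetwork([["s", "a", "t"], ["s", "a", "b", "t"], ["s", "b", "t"]], [0.5, 1.0, 2.0])` evaluates twenty candidate subnetworks between the hypothetical genes `s` and `t` and returns the one with the highest average clustering coefficient. Ties are resolved in favor of the smallest m, which is a choice of this sketch.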

that then builds n prediction models; and (3) these n prediction models are eventually combined in a single model. In
our case, bagging was configured to produce 20 bootstrap replicates of each training dataset and these replicates
were then presented to J48 that, in turn, generated 20 prediction models for each training dataset. These models were
finally combined in a single model for each training dataset, totaling 1000 combined "normal" models (generated from
the normal datasets) and 10000 combined "shuffled" models (generated from the shuffled datasets).

Performance of constructed prediction models. We assessed the performance of our prediction models by estimating
their recall, precision and area under the receiver operating characteristic (ROC) curve (AUC). Recall is the
proportion of actual oncogenic interactions which are correctly predicted as such against all actual cancer-related
interactions:

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \]


Table 1. Network centrality measures used as training features in graph2sig.

Centrality measure | Function | Description
Degree centrality | $\mathrm{deg}(g)$ | Number of links to gene g, representing the number of interactions.
Clustering coefficient | $\mathrm{clu}(g) = \frac{2 n_g}{k_g (k_g - 1)}$ | $n_g$ is the number of links connecting the neighbors of g and $k_g$ is the number of links connecting g to its neighbors.
Betweenness centrality | $\mathrm{bet}(g) = \sum_{g_i \neq g \neq g_j} \frac{\sigma_{g_i g_j}(g)}{\sigma_{g_i g_j}}$ | $\sigma_{g_i g_j}$ is the number of shortest paths between $g_i$ and $g_j$ and $\sigma_{g_i g_j}(g)$ is the number of shortest paths between $g_i$ and $g_j$ passing through g.
Closeness centrality | $\mathrm{clo}(g) = \frac{n}{\sum_{g_j} d(g, g_j)}$ | $d(g, g_j)$ is the shortest distance between genes g and $g_j$; n is the number of genes in the network.

doi:10.1371/journal.pone.0077521.t001
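
The four formulas in Table 1 can be made concrete with a small pure-Python sketch. It is an illustration only: the paper computes these measures with NetworkX on the directed INHGI, whereas this sketch assumes an undirected, unweighted graph given as an edge list, sums betweenness over unordered gene pairs, and sets the clustering coefficient to 0 for genes with fewer than two neighbors. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (gene, gene) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def centralities(edges):
    """Compute deg, clu, bet and clo for every gene, following the Table 1 formulas."""
    adj = build_adjacency(edges)
    n = len(adj)
    info = {g: bfs(adj, g) for g in adj}
    result = {}
    for g in adj:
        k = len(adj[g])
        # clu(g) = 2 * n_g / (k_g * (k_g - 1))
        links = sum(1 for a, b in combinations(adj[g], 2) if b in adj[a])
        clu = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        # bet(g) = sum over pairs (s, t), both different from g, of sigma_st(g) / sigma_st
        dist_g, sigma_g = info[g]
        bet = 0.0
        for s, t in combinations(adj, 2):
            if g in (s, t):
                continue
            dist_s, sigma_s = info[s]
            if t not in dist_s or g not in dist_s or t not in dist_g:
                continue  # some node of the triple is unreachable
            if dist_s[g] + dist_g[t] == dist_s[t]:
                bet += sigma_s[g] * sigma_g[t] / sigma_s[t]
        # clo(g) = n / sum of shortest distances from g to the genes reachable from it
        total = sum(d for node, d in dist_g.items() if node != g)
        clo = n / total if total else 0.0
        result[g] = {"deg": k, "clu": clu, "bet": bet, "clo": clo}
    return result
```

On a toy graph with a triangle a-b-c and a pendant gene d attached to c, `centralities([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])` gives gene c a degree of 3 and makes it the only gene with non-zero betweenness, since every shortest path to d passes through it.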

TP (true positive) denotes the number of actual cancer-related interactions correctly predicted as such and FN (false
negative) denotes the number of actual cancer-related interactions incorrectly predicted as not known to be related to
cancer. Precision is the proportion of actual cancer-related interactions which are correctly predicted as such
against all interactions predicted as related to cancer:

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

FP (false positive) denotes the number of interactions not known to be related to cancer that are incorrectly
predicted as cancer-related interactions.

The AUC is a summary measure of the ROC curve, a plot of the true positive rate versus the false positive rate that
indicates the probability of a true positive prediction as a function of the probability of a false positive
prediction for all possible threshold values [20]. It is equivalent to the probability that a randomly chosen
negative example (in our case, a non-oncogenic interaction) will have a smaller estimated probability of belonging to
the positive class than a randomly chosen positive example (in our case, an oncogenic interaction) [21].

Using WEKA, we estimated the above-mentioned performance measures by performing a 10-fold cross-validation to test
the 1000 combined normal and 10000 combined shuffled prediction models. The 10-fold cross-validation works as follows:
each dataset is randomly partitioned into 10 subsets. Of the 10 subsets, a single subset is retained as the
validation data for testing the model, and the remaining 9 subsets are used as training data. The cross-validation
process is then repeated 10 times, with each of the 10 subsets used exactly once as the validation data. The 10
results from the folds are then averaged to produce a single estimate of each performance measure for each
prediction model. In our case, each performance measure of each prediction model is an average of 200 results since
each model is a combination of 20 other models. Finally, we reported the performance measures estimated by the
10-fold cross-validation as medians of the 1000 combined normal and 10000 combined shuffled prediction models.

The statistical comparisons of the performance measures estimated by our prediction models generated by normal and
shuffled datasets were performed by the Mann-Whitney-U test [22]. According to established conventions in the machine
learning community, we used this test since it makes no assumptions about the underlying distribution of the
performance measures used to evaluate the prediction models [23]. Differences between performance measures estimated
by our prediction models generated by normal and shuffled datasets with a p-value < 0.005 were considered
statistically significant.

Third step: prediction of potential oncogenic interactions

We assembled the 1000 combined normal prediction models constructed in the previous step into one single model
(available at http://www.lbbc.ibb.unesp.br/graph2sig) by using "Vote", WEKA's implementation of the voting
meta-algorithm that combines the output predictions of each prediction model by different rules [24]. We then applied
this single prediction model, which contains 20000 models as a result of the combination of the 1000 combined models
that, in turn, contain 20 models each, to assign p_can values, i.e., the potential to transmit oncogenic signals, to
the entire set of interactions in INHGI. The final p_can value is an average of the 20000 values individually
assigned by each model within the single prediction model.

Fourth step: execution of the recursive enumeration algorithm (REA)

To find the paths with the highest p_can values between two genes g_i and g_j in the INHGI, graph2sig uses REA [9].
This algorithm enumerates k paths between a start and an end node in increasing order of their costs, C, so that the
path with minimum C is ranked first among the k paths. Before executing REA, p_can values in INHGI are converted into
costs ($1 - p_{can}$) since REA considers the weights of edges as costs. In this way, the path with the maximum
$\sum_{i=1}^{I} p_{can}(i)$, where I is the total number of interactions in the path, corresponds to the path with
minimum C for REA.

In REA, besides selecting a start node (in our case, a gene g_i that triggers the oncogenic signal) and an end node
(in our case, a gene g_j of interest that receives the oncogenic signal triggered by the start gene), it is also
possible to define k up to a maximum value predetermined for each size of network. For INHGI, for instance, REA allows
a maximum k of $3 \times 10^{6}$ paths. For each (g_i, g_j) pair, graph2sig runs REA with 41 different values of k:
100 to 1000 in increments of 100 paths, 2000 to 10000 in increments of 1000 paths, 20000 to 100000 in increments of
10000 paths, 200000 to 1000000 in increments of 100000 paths and 1500000 to 3000000 in increments of 500000 paths.

From the 41 groups of paths returned by REA, 41 potential cancer-related signaling subnetworks are constructed for
each (g_i, g_j) pair as shown in the next section.

Final step: extraction of potential cancer-related signaling subnetworks

In this final step of graph2sig, from each group of paths returned by REA (e.g., the group with 1000 paths or 100000
paths) for each (g_i, g_j)


pair, the potential cancer-related signaling subnetwork is constructed as follows:

1. For each path, C is converted to a weight, W, where $W = \frac{1}{C}$;
2. W values are normalized so that $\max(W) = 1$ and $\min(W) = 0$ as follows:

\[ \gamma(k) = \frac{W(k) - \min(W)}{\max(W) - \min(W)} \qquad (1) \]

where $\gamma(k)$ is the normalized W for path k and $W(k)$ is the weight calculated in step 1 for path k;
3. Twenty subnetworks are constructed such that each subnetwork comprises the set of paths with $\gamma \leq 1 - m$,
where m ranges from 0 to 0.95 in increments of 0.05 (Figure 2);
4. The subnetwork with the highest average clustering coefficient among all 20 subnetworks is selected as the
potential cancer-related signaling subnetwork (Figure 2).

At this level, graph2sig contains a collection of 41 potential cancer-related signaling subnetworks for each
(g_i, g_j) pair. The ultimate potential cancer-related signaling subnetwork for each (g_i, g_j) pair is the
subnetwork with the highest average clustering coefficient among the 41 subnetworks (Figure 2).

Results and Discussion

INHGI: general features

The construction of the INHGI is fundamental to graph2sig since the use of network centrality measures of genes as
training features in the machine learning approach proposed here is the core of the whole process. In addition, the
extraction of a signaling subnetwork makes sense only in a network context. Thus, it is important to be aware of some
general features of the INHGI as these features can serve as useful resources for the analysis and interpretation of
results.

The INHGI is a directed network comprising 19789 genes and 318332 interactions. Of these 19789 genes, 13932 interact
with each other via 242716 protein physical interactions (considered here as directed interactions; see details in
"Methods"), 1166 via 24299 metabolic interactions and 18310 via 51317 transcriptional regulation interactions.
Moreover, 896 genes interact with each other via protein physical and metabolic interactions, 12508 via protein
physical and transcriptional regulation interactions and 1042 via metabolic and transcriptional regulation
interactions (see Dataset S1).

The INHGI is certainly far from complete if we consider, for example, the estimates calculated by Stumpf and
colleagues [25]: they have estimated that the size of the human network of protein-protein interactions is about
650000 interactions. Therefore, INHGI contains ≈19% of the total number of estimated human protein-protein
interactions, as 121358 undirected protein-protein interactions are present in this network. Moreover, INHGI contains
approximately 46% of the already identified 43059 human genes (according to the EntrezGene database [15] accessed on
September 10th, 2012). The remaining 23211 genes absent from INHGI are transcriptionally regulated by at least one
transcription factor, implying that, in the future, INHGI will be increased by the addition of at least 23211
transcriptional regulation interactions.

Due to the incompleteness of the INHGI discussed above (in fact a noticeable characteristic of all networks
constructed exclusively from experimentally validated interactions), the results described in the next sections are
valid only for the current INHGI. Any alteration in the structure of INHGI will also change the network centrality
measures and, as a consequence, the construction of prediction models as well as the assignment of p_can values.

Evaluation of the performance of prediction models

The second and third steps of graph2sig concern, respectively, the generation of prediction models and the
assignment of oncogenic potential scores, p_can, to interactions in INHGI. Prior to the assignment of p_can values
(as described in detail in "Methods"), we sought to estimate the performance of the generated prediction models in
recovering known oncogenic interactions and distinguishing non-oncogenic from oncogenic interactions. For this
purpose, we assessed their performance by measuring their median recall, precision and AUC across the 1000 normal
models (see "Methods" for more details).

Before analyzing the performance measures of our prediction models, we estimated the performance measures of the
prediction models generated from the shuffled datasets and then compared them with those of the prediction models
generated from the normal datasets. This was done to check whether the prediction models built by training the bagged
J48 on non-shuffled datasets learned the traits actually associated with cancer instead of traits associated with any
random subset of genes. For this comparison, we used the Mann-Whitney-U test [22] as described in "Methods". For
shuffled models, the recall ranged from 0.22 to 0.81 with a median of 0.49, the precision ranged from 0.39 to 0.69
with a median of 0.5 and the AUC ranged from 0.38 to 0.62 with a median of 0.49. All these values are statistically
different from the performance measures of normal models (p-value < $2 \times 10^{-16}$ for all measures), thereby
indicating that the traits actually associated with cancer were learned by our normal prediction models.

After confirming that the prediction models generated from normal datasets are likely to learn the traits actually
associated with cancer, we analyzed their performance measures. As shown in Figure 3, the recall of the prediction
models ranged from 0.83 to 0.94 with a median of 0.89 and their precision ranged from 0.71 to 0.83 with a median of
0.77. Thus, the prediction models correctly recovered 89% of known oncogenic interactions with a precision of 77%.
Furthermore, the probability that a randomly chosen known oncogenic interaction receives a higher score than a
randomly chosen non-oncogenic interaction ranged from 84% to 93% with a median of 89%, as indicated by the AUC
(Figure 3).

While our prediction models are able to recover most of the known oncogenic interactions, as revealed by their high
recall (median of 89%), their ability to distinguish oncogenic from non-oncogenic interactions is less pronounced, as
revealed by their moderate precision (median of 77%). This indicates a certain level of noise in the training data
that is likely associated with the existence of shared common features between oncogenic and non-oncogenic
interactions, which induced our prediction models to yield a moderate performance in discriminating oncogenic from
non-oncogenic interactions. This can be partially due to the strategy used to select non-oncogenic interactions: since
it is impossible at present to compile a list of non-oncogenic interactions, we selected interactions not known to
transmit oncogenic signals, i.e., all interactions in INHGI except the known oncogenic interactions, as non-oncogenic
interactions. Thus, some of these non-oncogenic interactions may actually be existing oncogenic interactions not yet
present in the cancer pathway maps provided by the KEGG PATHWAY database.

Our strategy for selecting the oncogenic interactions could also have contributed to the existence of shared common
features between oncogenic and non-oncogenic interactions. As previously mentioned in the section "Materials and
Methods", we considered

PLOS ONE | www.plosone.org 6 October 2013 | Volume 8 | Issue 10 | e77521


Oncogenic Interactions and Signaling Networks

as “oncogenic” those interactions present in the cancer pathway maps provided by the KEGG PATHWAY database. We cannot guarantee the real “oncogenicity” of these interactions since these cancer pathway maps are inferred from literature by KEGG expert curators through the combination of experimental data obtained from different research articles. No experimental evidence has been reported so far to show that these pathways, at least in their entirety, are actually utilized in cancer cells. Although we did not try to construct training groups by using oncogenic interactions collected from sources other than KEGG PATHWAY, we believe that it is difficult to avoid this uncertainty about the real oncogenicity of interactions. For example, the pathways present both in the NetPath database [26] and in the oncogenic signaling map constructed by Cui and colleagues [27], two alternatives to KEGG PATHWAY for gathering oncogenic interactions, are also collected by using the same strategy as in the case of KEGG PATHWAY.

Another contributing factor for the existence of shared common features between non-oncogenic and oncogenic interactions can be the incompleteness of INHGI as previously discussed. Since our network contains about 121000 undirected protein-protein interactions in comparison to the estimated 650000 human protein-protein interactions [25], we can envisage that the values of all network centrality measures might change with the enlargement of network size and, therefore, some of the network centralities-related shared common features between oncogenic and non-oncogenic interactions might disappear as a consequence.

Do pcan values reliably express the oncogenic nature of interactions?

As the final goal of graph2sig is to use the pcan values as edge weights for the extraction of oncogenic signaling subnetworks between any two genes of interest in the INHGI, it is important to check whether these values reliably express the oncogenic nature of interactions. For this purpose, the prediction models evaluated in the previous section were merged into a single model that, in turn, was used to assign pcan values to all interactions in the INHGI (see details in “Methods”). This weighted INHGI will be hereafter denoted by wINHGI.

Our prediction model seems indeed to express the oncogenic nature of interactions: known oncogenic interactions clearly received high oncogenic potential scores as shown by Figure 4. In fact, by using a hypergeometric test – a statistical test that calculates the likelihood, in a p-value form, that the overrepresentation of a certain category in a sample occurs by chance – we showed that the 257 known oncogenic interactions, which represent 0.08% of interactions in wINHGI and 0.8% of the 30395 interactions that received pcan values greater than 0.7 (onco net), are significantly overrepresented in the onco net with a p-value = 10^-237. Furthermore, 252 (95%) of the 265 known oncogenic interactions were assigned values of pcan > 0.7.
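The overrepresentation test described above can be sketched in a few lines. This is only a minimal illustration, assuming the upper-tail form of the hypergeometric test (the probability of observing at least k category members in the sample); the function names and the toy counts are hypothetical and are not taken from graph2sig. The sum is done in log space because p-values of the magnitude reported here underflow ordinary floating-point numbers.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_log10_upper_tail(k, population, successes, draws):
    """Return log10 of P(X >= k) for a hypergeometric variable X.

    population: all interactions in the network
    successes:  interactions belonging to the category of interest
    draws:      interactions in the sample (e.g. those above a score cutoff)
    k:          category members observed in the sample
    """
    # Restrict the sum to the values X can actually take
    lowest = max(k, 0, draws - (population - successes))
    highest = min(successes, draws)
    log_terms = [
        log_comb(successes, i)
        + log_comb(population - successes, draws - i)
        - log_comb(population, draws)
        for i in range(lowest, highest + 1)
    ]
    if not log_terms:
        return float("-inf")  # observing k or more is impossible
    peak = max(log_terms)
    # log-sum-exp keeps very small tail probabilities from underflowing
    log_tail = peak + log(sum(exp(t - peak) for t in log_terms))
    return log_tail / log(10)


# Hypothetical toy counts, not the values reported in the text
print(hypergeom_log10_upper_tail(k=15, population=1000, successes=20, draws=100))
```

Returning the base-10 logarithm rather than the p-value itself makes results such as 10^-237 directly readable as -237.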
The fact that known oncogenic interactions are overrepresented
in onco net is not surprising since these interactions were used as
training examples for constructing the prediction model. To
convincingly demonstrate that pcan values reliably express the
oncogenic nature of interactions, we asked whether putative
oncogenic interactions – interactions that seem to be involved in
cancer and are currently absent from the KEGG PATHWAY
database – also received high oncogenic potential scores by our
prediction model and are also significantly overrepresented in
onco net. To achieve this goal, we considered as putative
oncogenic interactions the ‘‘oncogenic signal transduction events’’
defined by Cui and colleagues [27]. According to these
investigators, oncogenic signal transduction events are interactions
in which the upstream and downstream nodes get altered either
genetically or epigenetically and, therefore, they are most likely to
be selected and used in cancer signaling. Our oncogenic
transduction events are interactions in the wINHGI in which both
genes are those for which mutations have been causally implicated
in cancer. These genes were collected from the Cancer Gene
Census (http://www.sanger.ac.uk/genetics/CGP/Census/; [28]).
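The selection rule above (keep an interaction only if both of its genes are in a reference gene list) and the subsequent check against the pcan > 0.7 cutoff can be illustrated as follows. This is a hedged sketch: the gene names, the scores and the reference set are placeholders invented for the example, and the helper names are not part of graph2sig.

```python
def both_in_gene_set(edges, gene_set):
    """Keep the interactions whose two genes both belong to gene_set."""
    genes = set(gene_set)
    return [(a, b) for a, b in edges if a in genes and b in genes]


def fraction_above(edges, scores, cutoff=0.7):
    """Fraction of the given interactions whose score exceeds the cutoff."""
    if not edges:
        return 0.0
    return sum(scores[e] > cutoff for e in edges) / len(edges)


# Hypothetical toy network: gene names and scores are placeholders
scores = {
    ("G1", "G2"): 0.91,
    ("G2", "G3"): 0.40,
    ("G3", "G4"): 0.75,
    ("G1", "G4"): 0.55,
}
census_like = {"G1", "G2", "G4"}  # stand-in for a curated cancer gene list

putative = both_in_gene_set(scores, census_like)
print(putative)
print(fraction_above(putative, scores))
```

In this toy case two interactions have both genes in the reference set, and one of the two scores above the cutoff.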
As shown in Figure 5, our prediction model tended to assign
high oncogenic potential scores to these oncogenic transduction
events although this assignment is not as clear as in the case of
known oncogenic interactions. However, by using a hypergeometric test, we showed that the oncogenic signal transduction events, which represent 0.3% of interactions in wINHGI and 1.5% of interactions in onco net, are significantly overrepresented in onco net with a p-value = 3 × 10^-187. Moreover, 470 (43%) of the 1066 oncogenic signal transduction events in wINHGI were assigned values of pcan > 0.7.

Figure 3. Boxplot showing the predictive performance measures for prediction models. Boxplot showing the distribution of recall, precision and AUC values for 1000 prediction models generated from normal datasets (red boxes) and 10000 prediction models generated from shuffled datasets (blue boxes). The distributions of performance values for models generated from normal and shuffled datasets are statistically different according to the Mann-Whitney-U test (p-value < 2 × 10^-16 for all measures).
doi:10.1371/journal.pone.0077521.g003

Determination of oncogenic signaling subnetworks in the wINHGI

As shown in the previous section, the oncogenic scores assigned by the first step of graph2sig seem indeed to reflect the oncogenic nature of interactions. However, as mentioned in “Introduction”,


a normal cell will be transformed into a cancer cell only if multiple normal regulatory interactions are simultaneously disturbed by multiple oncogenic signals. This prompted us to proceed to the last steps of graph2sig: to use the oncogenic scores as edge weights in the extraction of oncogenic signaling subnetworks between any two genes of interest in the INHGI (Figure 2).

Figure 4. Frequency distribution of known oncogenic interactions per intervals of oncogenic scores. The blue and red bars show, respectively, the frequency distributions of known oncogenic interactions and all interactions in the wINHGI per 0.2 intervals of oncogenic scores.
doi:10.1371/journal.pone.0077521.g004

To evaluate the performance of graph2sig in extracting cancer signaling subnetworks between genes of interest, oncogenic linear pathways (OLPs) extracted from cancer pathway maps provided by the KEGG PATHWAY database were checked for their presence within the extracted subnetworks. As currently there is no database dedicated to the collection of experimentally validated cancer signaling subnetworks, we were forced to use the OLPs as surrogates for assessing the performance of graph2sig. For this purpose, we selected OLPs from which all interactions could be mapped to INHGI and the initial gene was an oncogene or tumor suppressor gene. In addition, we selected OLPs from which the oncogenic signal triggered by the initial gene reaches the target genes only through direct interactions. Using this strategy, we obtained 52 OLPs with the number of interactions ranging from 3 to 8 (Table S1). We then used graph2sig to extract from INHGI the cancer signaling subnetworks between the initial and target genes from each OLP.

From the 52 pairs of genes collected from the above-mentioned OLPs, graph2sig extracted subnetworks with size ranging from 10 to 3273 interactions (Table S1 and Dataset S2). Thirty-two subnetworks (≈61%) contain all interactions from their corresponding OLPs and 43 subnetworks (≈83%) contain 50% or more interactions from their corresponding OLPs (Table S1).

Before proceeding to the analysis of subnetworks per se, we checked whether the success rate of graph2sig, i.e. the ratio between the number of interactions of the OLP in the subnetwork and the actual number of interactions in the OLP, was dependent on factors other than the availability of pathways with high oncogenic scores linking the selected initial and target genes.

First, we examined the apparent dependence of the success rate of graph2sig on OLP size. At first glance, the success rate of graph2sig seems to rely on the OLP size: as we can observe in Table S1, all subnetworks constructed from OLPs with 3 interactions and 80% of subnetworks constructed from OLPs with 4 interactions contain all interactions from their corresponding OLPs. On the other hand, only ≈23% of the subnetworks constructed from OLPs with more than 4 interactions contain all interactions from their corresponding OLPs. To ascertain whether there is indeed a dependence between the success rate of graph2sig and OLP size, we applied Kendall’s rank correlation test to assess the correlation strength between these variables. According to this test, the success rate of graph2sig and OLP size correlate moderately with each other (Kendall correlation coefficient τB = -0.56, p-value = 6.2 × 10^-9). Therefore, the performance of graph2sig is not strongly influenced by OLP size.

Second, we attempted to determine whether the success rate of graph2sig could be dependent on the ratio between the sizes of the subnetwork and the corresponding OLP (subnet:OLP ratio). According to Table S1, ≈68% of subnetworks with subnet:OLP ratio greater than or equal to 10 contain all interactions of the corresponding OLPs while ≈40% of subnetworks with subnet:OLP ratio less than 10 contain all interactions of the corresponding OLPs. We performed a Kendall’s rank correlation test that showed a weak correlation (Kendall correlation coefficient τB = 0.31, p-value = 0.001) between the success rate of graph2sig and subnet:OLP ratio. Thus, the performance of graph2sig is also not strongly influenced by the subnet:OLP ratio.

It is worth pointing out, however, that these correlations between the success rate of graph2sig and the OLP size and the subnet:OLP ratio, as well as the success rate of graph2sig itself, should be interpreted cautiously. As already discussed in the section “Evaluation of the performance of prediction models”, OLPs are pathways inferred from literature by KEGG expert curators through the combination of experimental data obtained from different research articles [29]. To the best of our knowledge, so far no experimental evidence has been reported to show that these OLPs, at least in their entirety, are actually utilized in cancer cells. This limitation regarding the usage of OLPs as references is thus likely to underestimate the performance of graph2sig due to the uncertainty about the real role of these OLPs in transmitting


oncogenic signals as demonstrated in the maps provided by the KEGG PATHWAY database. However, on the other hand, graph2sig can be evaluated by its performance in detecting known oncogenic pathways that are not currently visible in the KEGG PATHWAY maps. Below, we give some examples that illustrate this point.

Figure 5. Frequency distribution of potential oncogenic interactions per intervals of oncogenic scores. The blue and red bars show, respectively, the frequency distributions of potential oncogenic interactions and all interactions in the wINHGI per 0.4 intervals of oncogenic scores.
doi:10.1371/journal.pone.0077521.g005

The ABL1 → NFKB1 subnetwork contains 18 interactions and, among these interactions, only one, specifically the physical interaction between proteins NFKBIA and NFKB1, is also present in its corresponding OLP (Figure 6 and Dataset S2). Despite this, further analysis of the ABL1 → NFKB1 subnetwork revealed the presence of an oncogenic pathway not described in KEGG PATHWAY: ABL1 → CTNNB1 → NFKB1 (Figure 6). The existence of this pathway in cancer cells is supported by experimental evidence reported in two research articles [30,31]. The cancer-related ABL1/CTNNB1 interaction has been demonstrated by Coluccia and colleagues, who showed that ABL1 phosphorylates CTNNB1 and that this phosphorylation is responsible for stabilization and nuclear translocation of CTNNB1 in chronic myeloid leukemia [30]. The cancer-related CTNNB1/NFKB1 interaction, in turn, has been reported by Deng and colleagues, who demonstrated that CTNNB1 interacts with and inhibits NFKB1 in human colon and breast cancers [31]. Therefore, graph2sig disclosed a potential pathway in which the activity of NFKB1 is disrupted by oncogenic signals received by ABL1 via CTNNB1.

The MET → JUN subnetwork contains 116 interactions and, among these interactions, only two, specifically the protein physical interactions MET/GRB2 and MAPK1/JUN, are also present in its corresponding OLP (Figure S1). Despite this, further analysis of the MET → JUN subnetwork allowed us to find an oncogenic pathway absent from KEGG PATHWAY: MET → STAT3 → JUN (Figure S1). Experimental evidence supporting this pathway comes from two research articles [32,33]. The oncogenic MET/STAT3 interaction has been detected by Syed and colleagues, who demonstrated that the disruption of this interaction can block tumor cell invasion in an in vivo model [32]. The involvement of the STAT3/JUN interaction in tumor progression has been demonstrated by Ivanov and colleagues [33]: they reported that a cooperation between STAT3 and JUN downregulates FAS surface expression and that its downregulation underlies the resistance of melanoma and possibly other tumor types to therapy. Hence, by using graph2sig, we found a potential oncogenic pathway by which the oncogenic signals triggered by MET can reach JUN.

The ERBB2 → VEGFA subnetwork contains 24 interactions and, among these interactions, all three interactions of its corresponding OLP are also present in this subgraph (Figure 7 and Dataset S2). Regardless of the presence of a complete known oncogenic pathway, further analysis of the ERBB2 → VEGFA subnetwork allowed us to find a potential oncogenic pathway absent from KEGG PATHWAY: ERBB2 → EGFR → STAT3 → VEGFA (Figure 7). While STAT3/VEGFA is a known oncogenic transcriptional regulation interaction present in KEGG PATHWAY, the other two interactions are oncogenic interactions supported by experimental evidence as shown by two research articles [34,35]. The oncogenic ERBB2/EGFR interaction has been detected by Wang and colleagues, who demonstrated that ERBB2 associates with and activates the EGFR in lung cancer cells [34]. The involvement of the EGFR/STAT3 interaction in tumor progression, in turn, has been demonstrated by Jaganathan and colleagues [35]: they reported that the EGFR/STAT3 interaction supports the pancreatic cancer phenotype and explains in part the insensitivity of pancreatic cancer cells to the inhibition of EGFR or STAT3 alone. Thus, by using graph2sig, we found a potential oncogenic pathway by which the oncogenic signals triggered by ERBB2 alter the expression of VEGFA via the EGFR-STAT3 interaction.

As a final example, we checked the KRAS → CCND1 subnetwork (Figure S2) for the presence of novel potential oncogenic pathways. This subnetwork contains 134 interactions


including all five interactions of its corresponding OLP (Figure S2). The analysis of the KRAS → CCND1 subnetwork revealed a potential oncogenic pathway that is partially shown in KEGG PATHWAY: KRAS → PIK3CA → AKT1 → GSK3B → MYC → CCND1 (Figure S2). All interactions, except for GSK3B/MYC, can be found in other OLPs (e.g. in KRAS → BCL2L1; see Dataset S2). The oncogenic role of the GSK3B/MYC interaction, in turn, has been demonstrated elsewhere [36]. Therefore, we can hypothesize that, in cancer cells, the expression of CCND1 promoted by MYC can be a result of oncogenic signals triggered by KRAS that eventually protect MYC from degradation by GSK3B.

Figure 6. The ABL1 → NFKB1 subnetwork. This subnetwork contains 18 interactions. The highlighted solid edge represents the interaction present in the corresponding OLP. The highlighted dashed edges represent the interactions of the potential oncogenic pathway (ABL1 → CTNNB1 → NFKB1). Blue edges represent protein physical interactions and orange nodes represent genes participating in the known or potential oncogenic pathways.
doi:10.1371/journal.pone.0077521.g006

Figure 7. The ERBB2 → VEGFA subnetwork. This subnetwork contains 24 interactions. The highlighted solid edges represent the interactions present in the corresponding OLP. The highlighted dashed edges represent the interactions of the potential oncogenic pathway (ERBB2 → EGFR → STAT3 → VEGFA). Blue and red edges represent, respectively, protein physical and transcriptional regulation interactions; orange nodes represent genes participating in the known or potential oncogenic pathways.
doi:10.1371/journal.pone.0077521.g007

Taken together, these results, i.e. the high fraction (≈83%) of constructed subnetworks containing 50% or more interactions from their corresponding OLPs and the discovery of oncogenic pathways experimentally reported in literature, are compelling enough to suggest that the oncogenic scores assigned to interactions in the first step of graph2sig can reliably be used as edge weights in the extraction of oncogenic signaling subnetworks between any two genes of interest in the INHGI in the second step of graph2sig.

Concluding remarks

In an effort to accelerate the pace of discovery of cancer-related interactions and subnetworks, we designed a network topology-based machine learning computational approach, graph2sig, that uses network centralities as training attributes to construct prediction models capable of assigning oncogenic scores to interactions that, in turn, are the basis for the extraction of cancer-related signaling subnetworks.

We could demonstrate that the combination of machine learning and graph theory is promising in prioritizing (1) interactions capable of transmitting oncogenic signals and (2) cancer-related signaling subnetworks. Similarly to the predictive performance of models constructed to predict essential genes in


Escherichia coli [6] and Saccharomyces cerevisiae [7] and morbid and druggable genes in human [8], the prediction model constructed in the first steps of graph2sig presented a predictive performance reliable enough (median recall of 0.89, median precision of 0.77 and median AUC of 0.89) to assign oncogenic scores to the interactions of the INHGI. From this finding we can conclude that network centralities are predictive of the oncogenic nature of interactions.

Regarding the utilization of oncogenic scores as edge weights for the extraction of oncogenic signaling subnetworks, we reason that network centralities, indirectly through the oncogenic scores, are also predictive of cancer-related signaling subnetworks since more than 80% of constructed subnetworks contain more than 50% of original interactions in their corresponding OLPs. In addition, the novel potential oncogenic pathways originally absent from KEGG PATHWAY but embedded in the constructed oncogenic signaling subnetworks seem to be biologically plausible as demonstrated by experimental evidence taken from the biomedical literature.

To the best of our knowledge, this is the first time that the combination of machine learning and graph theory is used to predict both the oncogenic potential of interactions and potential cancer-related signaling subnetworks. We envisage that graph2sig itself and the weighted integrated network of human genes interactions, a network created in the first steps of graph2sig, will serve as platforms for elucidating the relationship between interactions and the expression of the malignant phenotype. Furthermore, as part of an integrative systems biology framework to facilitate the interpretation of cancer genome sequencing data [37], graph2sig could be used in two ways: the selection of the most relevant mutated genes according to their presence in high oncogenic interactions and the discovery of subnetworks most likely to be affected by these most relevant mutated genes. Finally, we also expect that graph2sig can be used to predict and extract signaling pathways related to phenotypes other than cancer.

Supporting Information

Figure S1 The MET → JUN subnetwork. This subnetwork contains 116 interactions. The highlighted solid edges represent the interactions present in the corresponding OLP. The highlighted dashed edges represent the interactions of the potential oncogenic pathway (MET → STAT3 → JUN). Blue edges represent protein physical interactions and orange nodes represent genes participating in the known or potential oncogenic pathways.
(PDF)

Figure S2 The KRAS → CCND1 subnetwork. This subnetwork contains 134 interactions. The highlighted solid edges represent the interactions present in the corresponding OLP. The highlighted dashed edges represent the interactions of the potential oncogenic pathway (KRAS → PIK3CA → AKT1 → GSK3B → MYC → CCND1). Blue, red and green edges represent, respectively, protein physical, transcriptional regulation and metabolic interactions; orange nodes represent genes participating in the known or potential oncogenic pathways.
(PDF)

Dataset S1 Complete data for the wINHGI. This is a tab-delimited file that includes a table containing all interactions of wINHGI with their type of interaction (protein physical, transcriptional regulation and metabolic interactions), calculated network centralities and oncogenic scores. Furthermore, it is also possible to find whether interactions are present in KEGG PATHWAY as oncogenic interactions (interactions used as positive examples in the training step) and whether interactions belong to the set of putative oncogenic interactions (oncogenic signal transduction events). Interactors are identified by EntrezGene IDs. The list of interactions is ordered by oncogenic score.
(ZIP)

Dataset S2 The 52 subnetworks constructed by graph2sig. Tab-delimited file containing all 52 subnetworks constructed by graph2sig. For each subnetwork, the file shows the interactions (interactors are identified by the official gene symbol), the type of interaction (protein physical, transcriptional regulation and metabolic interactions) and the oncogenic scores.
(TXT)

Table S1 Statistics for the 52 constructed subnetworks. This spreadsheet shows the statistics for the 52 constructed subnetworks including the initial and target genes of the OLPs, the cancer type from which the OLPs were extracted, the sizes of OLPs and constructed subnetworks and the ratio between the number of interactions of OLPs in the subnetworks and the actual number of interactions in OLPs.
(PDF)

Author Contributions

Conceived and designed the experiments: MLA NL. Performed the experiments: MLA. Analyzed the data: MLA. Contributed reagents/materials/analysis tools: LAB EC. Wrote the paper: MLA NL.

References
1. Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144: 646–674.
2. Filmus J, Robles AI, Shi W, Wong MJ, Colombo LL, et al. (1994) Induction of cyclin D1 overexpression by activated ras. Oncogene 9: 3627–3633.
3. Kim JK, Diehl JA (2009) Nuclear cyclin D1: an oncogenic driver in human cancer. J Cell Physiol 220: 292–296.
4. Wang E (2010) Cancer systems biology. Taylor & Francis.
5. Wang E, Lenferink A, O’Connor-McCourt M (2007) Cancer systems biology: exploring cancer-associated genes on cellular networks. Cell Mol Life Sci 64: 1752–62.
6. Da Silva JPM, Acencio ML, Mombach JCM, Vieira R, da Silva J, et al. (2008) In silico network topology-based prediction of gene essentiality. Physica A 387: 1049–1055.
7. Acencio ML, Lemke N (2009) Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information. BMC Bioinformatics 10: 290.
8. Costa PR, Acencio ML, Lemke N (2010) A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data. BMC Genomics 11: S9.
9. Jimenez VM, Marzal A (1999) Computing the K shortest paths: a new algorithm and an experimental comparison. Lecture Notes in Computer Science 1668: 15–29.
10. Schaefer MH, Fontaine JF, Vinayagam A, Porras P, Wanker EE, et al. (2012) HIPPIE: Integrating protein interaction networks with experiment based quality scores. PLoS One 7: e31826.
11. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, et al. (2008) KEGG for linking genomes to life and the environment. Nucleic Acids Res 36: D480–4.
12. Bovolenta LA, Acencio ML, Lemke N (2012) HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions. BMC Genomics 13: 405.
13. Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, et al. (2007) Global reconstruction of the human metabolic network based on genomic and bibliomic data. PNAS 104: 1777–1782.
14. Huss M, Holme P (2007) Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks. IET Syst Biol 1: 280–285.
15. Maglott D, Ostell J, Pruitt KD, Tatusova T (2007) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research 35: D26–D31.


16. Hagberg AA, Schult DA, Swart PJ (2008) Exploring network structure, dynamics, and function using NetworkX. In: Proceedings of the 7th Python in Science Conference (SciPy2008). Pasadena, CA USA, 11–15.
17. Witten IH, Frank E (2000) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann.
18. Breiman L (1996) Bagging predictors. Machine Learning 24: 123.
19. Quinlan JR (1993) C4.5: programs for machine learning. San Francisco: Morgan Kaufmann.
20. Huang J, Ling CX (2005) Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans on Knowl and Data Eng 17: 299–310.
21. Hand DJ, Till RJ (2001) A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. Mach Learn 45: 171–186.
22. Wilcoxon F (1947) Probability tables for individual comparisons by ranking methods. Biometrics 3: 119–22.
23. Demsar J (2006) Statistical Comparisons of Classifiers over Multiple Data Sets. J Mach Learn Res 7: 1–30.
24. Kittler J, Hatef M, Duin RP, Matas J (1998) On Combining Classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 20: 226–239.
25. Stumpf MPH, Thorne T, de Silva E, Stewart R, An HJ, et al. (2008) Estimating the size of the human interactome. Proc Natl Acad Sci U S A 105: 6959–64.
26. Kandasamy K, Mohan SS, Raju R, Keerthikumar S, Kumar GSS, et al. (2010) NetPath: a public resource of curated signal transduction pathways. Genome Biol 11: R3.
27. Cui Q, Ma Y, Jaramillo M, Bari H, Awan A, et al. (2007) A map of human cancer signaling. Mol Syst Biol 3: 152.
28. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, et al. (2004) A census of human cancer genes. Nat Rev Cancer 4: 177–183.
29. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40: D109–D114.
30. Coluccia AML, Vacca A, Duñach M, Mologni L, Redaelli S, et al. (2007) Bcr-Abl stabilizes beta-catenin in chronic myeloid leukemia through its tyrosine phosphorylation. EMBO J 26: 1456–1466.
31. Deng J, Miller SA, Wang HY, Xia W, Wen Y, et al. (2002) beta-catenin interacts with and inhibits NF-kappa B in human colon and breast cancer. Cancer Cell 2: 323–334.
32. Syed ZA, Yin W, Hughes K, Gill JN, Shi R, et al. (2011) HGF/c-met/Stat3 signaling during skin tumor cell invasion: indications for a positive feedback loop. BMC Cancer 11: 180.
33. Ivanov VN, Bhoumik A, Krasilnikov M, Raz R, Owen-Schaub LB, et al. (2001) Cooperation between STAT3 and c-jun suppresses Fas transcription. Mol Cell 7: 517–528.
34. Wang SE, Narasanna A, Perez-Torres M, Xiang B, Wu FY, et al. (2006) HER2 kinase domain mutation results in constitutive phosphorylation and activation of HER2 and EGFR and resistance to EGFR tyrosine kinase inhibitors. Cancer Cell 10: 25–38.
35. Jaganathan S, Yue P, Paladino DC, Bogdanovic J, Huo Q, et al. (2011) A functional nuclear epidermal growth factor receptor, SRC and Stat3 heteromeric complex in pancreatic cancer cells. PLoS One 6: e19605.
36. Malempati S, Tibbitts D, Cunningham M, Akkari Y, Olson S, et al. (2006) Aberrant stabilization of c-Myc protein in some lymphoblastic leukemias. Leukemia 20: 1572–1581.
37. Wang E (2012) Understanding genomic alterations in cancer genomes using an integrative network approach. Cancer Lett.
