Bioinformatic analysis of differential expression
www.ijcep.com /ISSN:1936-2625/IJCEP0069951
Original Article
Bioinformatic analysis of differential expression and core genes in breast cancer
Hongchang Dong1*, Shuai Zhang1*, Yu Wei2*, Chunyan Liu2, Na Wang1, Pan Zhang1, Jingling Zhu1, Jin Huang1
1The Key Laboratory of Xinjiang Endemic & Ethnic Diseases and Department of Biochemistry, Shihezi University School of Medicine, Shihezi, Xinjiang, China; 2The First Affiliated Hospital of Medical College of Shihezi University, Shihezi, Xinjiang, China. *Equal contributors.
Received November 27, 2017; Accepted December 22, 2017; Epub March 1, 2018; Published March 15, 2018
Abstract: Breast cancer (BRCA) is one of the most common malignancies in women. To find key genes involved in the occurrence and development of BRCA, the gene expression profile GSE103512 was downloaded from the GEO database. A total of 75 samples, comprising 65 cancer and 10 normal samples, were included in this analysis. Differentially expressed genes (DEGs) between BRCA patients and healthy people were identified using R. We next performed gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID). Moreover, Cytoscape with the Search Tool for the Retrieval of Interacting Genes (STRING) was used to visualize the protein-protein interaction (PPI) network of these DEGs. Genes and medicines related to the hub genes were predicted with cBioPortal. We screened a total of 357 DEGs, including 77 up-regulated and 280 down-regulated genes. A series of BRCA-related GO terms and pathways were identified by analysis of these DEGs. Insulin-like growth factor 1 (IGF1), epidermal growth factor receptor (EGFR), v-jun avian sarcoma virus 17 oncogene homolog (JUN), and estrogen receptor 1 (ESR1) were screened from the DEGs by construction of the PPI network and by degree of connectivity. IGF1 and ESR1 were finally selected as potential hub genes and treatment targets of BRCA. In conclusion, this bioinformatics analysis indicated that DEGs and hub genes, such as IGF1, might regulate the development of breast cancer. These DEGs could be used as new biomarkers for diagnosis and to guide combination therapy for BRCA.
Keywords: Breast cancer, bioinformatics analysis, differential expression genes, biomarker, therapeutic
py, and targeted therapy using antibodies recognizing cancer biomarkers; however, they are not always successful and may cause adverse effects and drug tolerance [11, 12]. Therefore, there is a constant need to investigate the mechanisms of breast cancer development to improve diagnosis and therapy.

High-throughput sequencing is increasingly widely used and has become an important tool in the life sciences, for example in cancer grading, early cancer diagnosis, and prognosis prediction [12]. In this study, to further understand breast cancer, the GSE103512 dataset was downloaded from GEO and analyzed with R/Bioconductor to identify DEGs. The biological process (BP), molecular function (MF), cellular component (CC), and KEGG pathways of the DEGs and of three modules were analyzed. The regulatory network of these DEGs was constructed using STRING, and the key genes with a high degree of connectivity were selected using Cytoscape. Finally, the correlation between genes, survival, and disease was confirmed using the Kaplan Meier plotter online database (http://kmplot.com/analysis/) [13] and the TCGA database.

Materials and methods

Data source and DEGs identification

The GSE103512 dataset, which includes 65 breast cancers with 10 matched normal samples, 57 colorectal cancers with 12 matched normal samples, 60 non-small cell lung cancers with 9 matched normal samples, and 60 prostate cancers with 7 matched normal samples, was downloaded from GEO. The breast cancer data were chosen for differential expression analysis. The Bioconductor package of R was applied to detect DEGs with the cut-off criteria adjusted P value < 0.01 and |logFC| ≥ 1 [14]. A total of 357 genes with significant differential expression were selected after the analysis of GSE103512, of which 77 were up-regulated and 280 were down-regulated (Table 1).

Gene ontology and KEGG pathway enrichment analysis

Gene ontology (GO) analysis, which draws on an online resource containing numerous databases, is a common method for annotating genes and gene products and for identifying characteristic biological alterations in high-throughput genome and transcriptome data [15]. The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances [16]. DEGs from GSE103512 were uploaded to DAVID for biological process (BP), cellular component (CC), molecular function (MF), and pathway enrichment analysis. P < 0.05 was set as the cut-off criterion.

PPI network construction and core genes analysis

The STRING database (http://string-db.org) aims to provide a critical assessment and integration of protein-protein interactions, including direct (physical) as well as indirect (functional) associations [17]. DEGs from GSE103512 were uploaded to STRING for analysis of known and predicted interactions (confidence score ≥ 0.15, maximum number of interactors = 0). Next, the result was imported into Cytoscape to screen for core genes with a degree cut-off of 25 [18].

Survival analysis of key genes

The Kaplan Meier plotter assesses the effect of 54,675 genes on survival using 10,461 cancer samples, comprising 5,143 breast, 1,816 ovarian, 2,437 lung, and 1,065 gastric cancer patients with mean follow-up of 69, 40, 49, and 33 months, respectively [19]. The primary purpose of the tool is meta-analysis based biomarker assessment [19]. In this study, the key genes (ESR1, IGF1, EGFR, JUN) were assessed for their effect on survival, with 95% confidence intervals and a log rank P value < 0.01 as the significance criterion.

Correlation between hub genes and breast cancer

The correlation between hub genes and breast cancer was studied with the online tool GEPIA (http://gepia.cancer-pku.cn/index.html). GEPIA is a web server for analyzing the RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA and the GTEx projects, using a standard processing pipeline [20].

Medicine and regulatory factors of hub genes prediction

The cBioPortal website (http://www.cbioportal.org/) integrates data from 126 tumor genome studies, including TCGA and ICGC, covering
Figure 1. DEG screening of the GSE103512 dataset. Box plots before (A) and after (B) data standardization. (C) Hierarchical clustering of the first 150 DEGs. The color scale shown at the top illustrates the relative expression level of an mRNA. Red represents a high relative expression level and green represents a low relative expression level.
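As a minimal sketch of the DEG screening step, the following Python snippet applies the cut-off criteria stated in this study (adjusted P value < 0.01 and |logFC| ≥ 1) to a table of per-gene statistics. The study itself used the Bioconductor package of R; the `GeneStat` record, the `classify_degs` function, and the example genes and values are hypothetical and serve only to illustrate the filter. The sketch also assumes that a positive logFC means higher expression in tumor than in normal tissue.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """Per-gene summary statistics (hypothetical record for illustration)."""
    gene: str
    log_fc: float  # log fold change, assumed tumor vs. normal
    adj_p: float   # multiple-testing-adjusted P value


def classify_degs(stats, adj_p_cutoff=0.01, log_fc_cutoff=1.0):
    """Split genes into up- and down-regulated DEGs using the stated cut-offs."""
    up, down = [], []
    for s in stats:
        if s.adj_p < adj_p_cutoff and abs(s.log_fc) >= log_fc_cutoff:
            (up if s.log_fc > 0 else down).append(s.gene)
    return up, down


# Hypothetical values for illustration only; not taken from GSE103512.
example = [
    GeneStat("GENE_A", 2.3, 0.0004),   # passes both cut-offs, up-regulated
    GeneStat("GENE_B", -1.8, 0.002),   # passes both cut-offs, down-regulated
    GeneStat("GENE_C", 0.6, 0.0001),   # fails |logFC| >= 1
    GeneStat("GENE_D", -2.5, 0.03),    # fails adjusted P < 0.01
]

up_genes, down_genes = classify_degs(example)
print("up-regulated:", up_genes)
print("down-regulated:", down_genes)
```

Applied to the full result table, a filter of this kind would yield the up-regulated and down-regulated gene lists summarized in Table 1.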
the data of twenty-eight thousand specimens. Here, we used this website to predict medicines and regulatory factors of the hub genes, drawing on databases including PhosphoSite, KEGG Drugs, PID, HumanCyc, Reactome, PANTHER, and DrugBank [21].

Results

Screening of DEGs

To observe the alterations between breast cancer and normal tissues at the transcriptome level, the mRNA expression values of 65 breast cancer and 10 matched normal samples from the GSE103512 dataset were analyzed with the R Bioconductor package. Box plots before (Figure 1A) and after (Figure 1B) data standardization are shown. We set adjusted P value < 0.01 and |logFC| ≥ 1 as the cut-off criteria. The alignment of the black dots on the same line indicates good standardization. Analysis of the mRNA expression fold-change of the DEGs showed a highly significant difference between BRCA and normal tissues (Figure 1C). These results indicated that the data could be used directly for further analysis. Finally, a total of 357 differentially expressed genes were detected after the analysis of GSE103512, of which 77 were up-regulated and 280 were down-regulated (Table 1).

Identification of GO terms and pathways related to DEGs

The Database for Annotation, Visualization and Integrated Discovery is an online bioinformatics
1149 Int J Clin Exp Pathol 2018;11(3):1146-1156
Core genes in breast cancer development
Figure 2. GO and KEGG pathway analysis of DEGs, plotted as -log p. A. Molecular function enrichment. B. Biological process enrichment. C. Cellular component enrichment. D. KEGG pathway enrichment.
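The enrichment P values plotted as -log p in Figure 2 come from DAVID. As a simplified illustration of how over-representation of a term among the DEGs can be tested, the sketch below computes an upper-tail hypergeometric probability and its negative base-10 logarithm. This is a stand-in under stated assumptions: DAVID reports a modified Fisher exact P value (EASE score) rather than this plain test, the base of the logarithm in Figure 2 is not stated, and the background size and term counts are hypothetical. Only the 357 DEGs and the P < 0.05 cut-off come from the study.

```python
import math


def hypergeom_upper_tail(k, population, annotated, drawn):
    """Return P(X >= k) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` belong to the term."""
    total = math.comb(population, drawn)
    upper = min(annotated, drawn)
    favourable = sum(
        math.comb(annotated, i) * math.comb(population - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


N_BACKGROUND = 20000  # hypothetical number of background genes
N_TERM = 200          # hypothetical number of background genes in the term
N_DEGS = 357          # number of DEGs reported in this study
N_OVERLAP = 15        # hypothetical number of DEGs annotated with the term

p_value = hypergeom_upper_tail(N_OVERLAP, N_BACKGROUND, N_TERM, N_DEGS)
neg_log10_p = -math.log10(p_value)
is_enriched = p_value < 0.05  # cut-off criterion stated in the methods

print(f"P = {p_value:.3g}, -log10(P) = {neg_log10_p:.2f}, enriched: {is_enriched}")
```

Larger values of -log p correspond to smaller P values, so the longest bars in a plot such as Figure 2 mark the most strongly enriched terms.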
Figure 3. PPI network construction of DEGs. The yellow node in the network represents the core node with degree
≥ 25.
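The hub genes highlighted in Figure 3 were chosen by degree of connectivity in Cytoscape. A minimal sketch of that idea, assuming the PPI network is available as a list of undirected gene pairs, is shown below. The function names and the toy edge list are hypothetical; the default cut-off of 25 is the one stated in this study, and the toy example lowers it to 3 only because the illustrative network is tiny.

```python
def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hub_nodes(edges, degree_cutoff=25):
    """Return nodes whose degree is at least `degree_cutoff`, highest first."""
    degrees = node_degrees(edges)
    hubs = [node for node, degree in degrees.items() if degree >= degree_cutoff]
    return sorted(hubs, key=lambda node: (-degrees[node], node))


# Hypothetical toy network; the node names are placeholders, not real genes.
toy_edges = [
    ("HUB", "P1"),
    ("HUB", "P2"),
    ("HUB", "P3"),
    ("P1", "P2"),
    ("P3", "P4"),
    ("P2", "HUB"),  # duplicate of an existing edge, counted once
]

toy_degrees = node_degrees(toy_edges)
toy_hubs = hub_nodes(toy_edges, degree_cutoff=3)
print(toy_degrees)
print(toy_hubs)
```

Storing partners in a set means that duplicated or reversed edges do not inflate the degree, which matters when an edge list is exported from more than one evidence channel.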
resource that provides tools for the functional enrichment of large lists of genes or proteins [22]. To better understand breast cancer development, alterations in molecular function (Figure 2A), biological process (Figure 2B), cellular component (Figure 2C), and signaling pathway (Figure 2D) in the disease were analyzed with DAVID.

PPI network analysis reveals the hub genes among DEGs

The potential main regulators may be the focal point for therapeutic and drug design. In this analysis, 77 up-regulated and 280 down-regulated genes were selected based on our criteria. To obtain the core genes in breast cancer development, we constructed the regulatory network by PPI analysis, and IGF1 (degree = 36), EGFR (degree = 36), JUN (degree = 33), and ESR1 (degree = 34) were selected as the hub genes (Figure 3).

IGF1 and ESR1 alterations may play an important role in breast cancer patients' survival

Survival analysis is widely used in clinical and epidemiological research, in randomized clinical trials for comparing the efficacy of treatments, and in observational (non-randomized) research to determine and test the existence of epidemiological associations [22]. In our study, survival analysis was performed to assess the role of the hub genes in patient survival. The results suggest that low expression of IGF1 (HR = 0.75 (0.68-0.84), log rank P = 4.2e-07) (Figure 4A) and ESR1 (HR = 0.67 (0.6-0.74), log rank P = 2.5e-13) (Figure 4B), but not of JUN (HR = 0.88 (0.79-0.98), log rank P = 0.02) (Figure 4C) or EGFR (HR = 0.87 (0.74-1.01), log rank P = 0.074) (Figure 4D), was associated with worse overall survival (OS) for breast cancer patients. To determine the correlation between hub genes and breast cancer, IGF1 and ESR1 expression was evaluated with GEPIA. Indeed, expression of IGF1 (Figure 5A) was down-regulated and expression of ESR1 (Figure 5B) was up-regulated in breast cancer.

Figure 4. Prognostic value of four genes (IGF1 (A), ESR1 (B), JUN (C), EGFR (D)) in BRCA patients. Affymetrix probe IDs: 209541_at (IGF1), 201464_x_at (ESR1), 202311_s_at (JUN), 1565483_at (EGFR). HR: hazard ratio, CI: confidence interval.

Various genes and medicines specific to IGF1 and ESR1 reveal IGF1 and ESR1 as potential targets for breast cancer treatment

To evaluate whether IGF1 and ESR1 could be targets for breast cancer therapeutics, we used a website that integrates the PhosphoSite, KEGG Drugs, PID, HumanCyc, Reactome, PANTHER, and DrugBank databases to analyze genes and drugs. In our results, 37 FDA approved drugs and 41 non-FDA approved drugs were selected for ESR1. Only Deoxy-Bigchap was selected for IGF1 (Figure 6). There were also many proteins that could regulate IGF1 and ESR1 through transport, phosphorylation, expression, and other potential mechanisms (Figure 6).

Discussion

Approximately 232,340 new cases of invasive breast cancer and 39,620 breast cancer deaths were expected to occur among US women in 2013. Although exemestane is currently approved by the US Food and Drug Administration to prevent breast cancer recurrence, promising
Figure 6. Predicted medicines and regulatory factors of IGF1 and ESR1. Pink circles: genes; white hexagons: drugs not approved by the FDA; yellow hexagons: FDA approved drugs; pink arrows: transport controls; turquoise arrows: phosphorylation controls; green arrows: expression controls; yellow lines: targeted by drug.
and therapeutics. Here, we demonstrate that the mRNA level of IGF1 was significantly decreased in BRCA tumor tissues compared with normal tissues, and IGF1 seemed to be the core factor among these DEGs. Many studies have shown that mutation of ESR1 is essential for breast cancer, and this gene has been applied as a breast cancer biomarker [2, 3, 29]. This also indicates the effectiveness of our analysis. In addition, the breast cancer patients in our study are high ESR1 expressers, but it is interesting that breast cancer patients with high ESR1 expression show better survival than those with low ESR1 expression. We suspect that this phenomenon reflects a self-protection mechanism against breast cancer, which deserves experimental verification.

We also predicted potential drugs for breast cancer therapeutics according to our analysis, which further highlights the potential of IGF1 and ESR1 as targets of breast cancer therapeutics. Screening of these drugs should be performed. The genes that regulate ESR1 through transport, phosphorylation, expression, and other potential mechanisms may also participate in human self-protection against BRCA.

Our bioinformatics analysis identified DEGs that might play a central role in the occurrence, development, and prognosis of breast cancer. In this study, a total of 357 DEGs were selected, and EGFR, JUN, IGF1, and ESR1 might be the core genes of breast cancer. To obtain more accurate correlation results, we subsequently performed a series of verification experiments to confirm these predictions. Finally, IGF1 and ESR1 were selected as hub genes for diagnosis and treatment of breast cancer. Overall, this study provides evidence to support future genomic individualized diagnosis and treatment of breast cancer.
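The survival findings discussed above can be checked against the stated significance criterion with a short sketch. The hazard ratios, 95% confidence intervals, and log rank P values below are those reported in the Results; the `SurvivalResult` record and the helper functions are hypothetical, and the additional check that the confidence interval excludes 1 is an illustrative assumption rather than a criterion stated in this study.

```python
from typing import NamedTuple


class SurvivalResult(NamedTuple):
    """Reported survival statistics for one gene (hypothetical record type)."""
    gene: str
    hr: float
    ci_low: float
    ci_high: float
    logrank_p: float


# Values as reported in the Results section of this study.
REPORTED = [
    SurvivalResult("IGF1", 0.75, 0.68, 0.84, 4.2e-07),
    SurvivalResult("ESR1", 0.67, 0.60, 0.74, 2.5e-13),
    SurvivalResult("JUN", 0.88, 0.79, 0.98, 0.02),
    SurvivalResult("EGFR", 0.87, 0.74, 1.01, 0.074),
]


def passes_p_cutoff(result, p_cutoff=0.01):
    """Apply the log rank P value < 0.01 criterion stated in the methods."""
    return result.logrank_p < p_cutoff


def ci_excludes_one(result):
    """Illustrative extra check: does the 95% CI of the HR exclude 1?"""
    return result.ci_high < 1.0 or result.ci_low > 1.0


significant_genes = [r.gene for r in REPORTED if passes_p_cutoff(r)]
ci_clear_genes = [r.gene for r in REPORTED if ci_excludes_one(r)]
print("log rank P < 0.01:", significant_genes)
print("95% CI excludes 1:", ci_clear_genes)
```

Under these inputs, JUN has a confidence interval below 1 but does not meet the stricter P < 0.01 criterion, and the EGFR interval spans 1, which is consistent with only IGF1 and ESR1 being carried forward as hub genes.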
identifying contextually relevant hubs in biological networks. F1000Res 2016; 5: 1745.
[19] Hou GX, Liu P, Yang J and Wen S. Mining expression and prognosis of topoisomerase isoforms in non-small-cell lung cancer by using Oncomine and Kaplan-Meier plotter. PLoS One 2017; 12: e0174515.
[20] Tang Z, Li C, Kang B, Gao G, Li C and Zhang Z. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res 2017; [Epub ahead of print].
[21] Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, Cerami E, Sander C and Schultz N. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal 2013; 6: pl1.
[22] Flynn R. Survival analysis. J Clin Nurs 2012; 21: 2789-2797.
[23] Landi MT, Dracheva T, Rotunno M, Figueroa JD, Liu H, Dasgupta A, Mann FE, Fukuoka J, Hames M, Bergen AW, Murphy SE, Yang P, Pesatori AC, Consonni D, Bertazzi PA, Wacholder S, Shih JH, Caporaso NE and Jen J. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS One 2008; 3: e1651.
[24] Hasan AN, Ahmad MW, Madar IH, Grace BL and Hasan TN. An in silico analytical study of lung cancer and smokers datasets from gene expression omnibus (GEO) for prediction of differentially expressed genes. Bioinformation 2015; 11: 229-235.
[25] Cooper CS, Campbell C and Jhavar S. Mechanisms of disease: biomarkers and molecular targets from microarray gene expression studies in prostate cancer. Nat Clin Pract Urol 2007; 4: 677-687.
[26] Scherzer CR, Eklund AC, Morse LJ, Liao Z, Locascio JJ, Fefer D, Schwarzschild MA, Schlossmacher MG, Hauser MA, Vance JM, Sudarsky LR, Standaert DG, Growdon JH, Jensen RV and Gullans SR. Molecular markers of early Parkinson’s disease based on gene expression in blood. Proc Natl Acad Sci U S A 2007; 104: 955-960.
[27] Wairagu PM, Phan AN, Kim MK, Han J, Kim HW, Choi JW, Kim KW, Cha SK, Park KH and Jeong Y. Insulin priming effect on estradiol-induced breast cancer metabolism and growth. Cancer Biol Ther 2015; 16: 484-492.
[28] De Santi M, Annibalini G, Barbieri E, Villarini A, Vallorani L, Contarelli S, Berrino F, Stocchi V and Brandi G. Human IGF1 pro-forms induce breast cancer cell proliferation via the IGF1 receptor. Cell Oncol (Dordr) 2016; 39: 149-159.
[29] O’Brien KM, Cole SR, Engel LS, Bensen JT, Poole C, Herring AH and Millikan RC. Breast cancer subtypes and previously established genetic risk factors: a bayesian approach. Cancer Epidemiol Biomarkers Prev 2014; 23: 84-97.