Abstract
Biomedical research has advanced considerably in recent years, largely owing to the ability to identify biological or genetic markers that uniquely match a given disease. Despite several success stories, most diseases still lack an effective treatment, or even an effective means of diagnosis. While the emergence of omics technologies has enabled the screening of a whole cell at the molecular level, the large quantities of data produced have restricted the ability to extract valid conclusions.
In this paper, we propose an optimization model, based on mixed-integer linear programming, capable of identifying a combination of biomarkers that distinguishes between healthy and diseased samples. The model takes the gene expression profiles of several individuals, identifies the genes most relevant for differentiation, and finds the combination of biomarkers that best explains the difference between the two states. The model was validated on two different datasets through sampling analysis, achieving an out-of-sample accuracy of up to 93%.
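The abstract does not give the formulation itself, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a simple rule that is not stated in the source: each gene votes "diseased" when its expression lies on the diseased side of the midpoint between the two class means, and a sample is labelled by majority vote of the selected genes. The binary choice of which genes enter the panel is searched by exhaustive enumeration here, standing in for a mixed-integer solver on a toy problem. All function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def fit_gene_rules(X, y):
    """Per gene: (threshold, direction) from the midpoint of the two class means."""
    rules = []
    for j in range(len(X[0])):
        healthy = mean(row[j] for row, lab in zip(X, y) if lab == 0)
        diseased = mean(row[j] for row, lab in zip(X, y) if lab == 1)
        rules.append(((healthy + diseased) / 2, 1 if diseased >= healthy else -1))
    return rules

def predict(row, genes, rules):
    """Majority vote of the selected genes; 1 = diseased, 0 = healthy."""
    votes = sum(1 for j in genes if rules[j][1] * (row[j] - rules[j][0]) > 0)
    return 1 if 2 * votes > len(genes) else 0

def count_errors(X, y, genes, rules):
    """Number of samples whose predicted label differs from the true label."""
    return sum(predict(row, genes, rules) != lab for row, lab in zip(X, y))

def select_panel(X, y, max_genes):
    """Search every panel of up to max_genes genes for the fewest training errors.

    This enumerates the binary selection variables directly (an assumption made
    for the toy example); ties keep the first panel found, so smaller panels win.
    """
    rules = fit_gene_rules(X, y)
    best, best_err = None, len(y) + 1
    for k in range(1, max_genes + 1):
        for genes in combinations(range(len(X[0])), k):
            err = count_errors(X, y, genes, rules)
            if err < best_err:
                best, best_err = genes, err
    return best, best_err, rules

# Hypothetical toy data: rows are samples, columns are genes; label 1 = diseased.
X_train = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.5],
    [0.9, 4.0, 1.5],
    [3.0, 4.5, 2.1],
    [3.3, 3.5, 1.9],
    [2.9, 4.0, 2.4],
]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 4.2, 2.0], [3.1, 3.9, 2.2]]
y_test = [0, 1]

panel, train_errors, rules = select_panel(X_train, y_train, max_genes=2)
test_errors = count_errors(X_test, y_test, panel, rules)
accuracy = 1 - test_errors / len(y_test)  # out-of-sample accuracy on held-out samples
print(panel, train_errors, accuracy)
```

On this toy data only the first gene separates the two classes, so the search returns a single-gene panel. Enumeration grows combinatorially with the number of genes, which is one reason a realistic gene expression study would rely on a mixed-integer solver and a prior filtering of candidate genes.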
Acknowledgments
This work is co-funded by the North Portugal Regional Operational Programme, under the “Portugal 2020”, through the European Regional Development Fund (ERDF), within project SISBI (Ref.ª NORTE-01-0247-FEDER-003381). This study was also supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684) and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte. Joel P. Arrais is funded by CISUC - Center for Informatics and Systems of the University of Coimbra.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Santiago, A.M., Rocha, M., Dourado, A., Arrais, J.P. (2017). Mixed-Integer Programming Model for Profiling Disease Biomarkers from Gene Expression Studies. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_6
DOI: https://doi.org/10.1007/978-3-319-56154-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56153-0
Online ISBN: 978-3-319-56154-7
eBook Packages: Computer Science, Computer Science (R0)