A mixed integer programming-based global optimization framework for analyzing gene expression data

Felici, Giovanni; Tripathi, Kumar Parijat; Evangelista, Daniela; Guarracino, Mario Rosario

doi:10.1007/s10898-017-0530-0

Published: 09 May 2017

Journal of Global Optimization, Volume 69, pages 727–744 (2017)

Giovanni Felici¹, Kumar Parijat Tripathi², Daniela Evangelista² & Mario Rosario Guarracino²

Abstract

The analysis of high-throughput gene expression experiments comparing patients and controls is based on identifying differentially expressed genes by means of standard statistical tests. A typical bioinformatics approach to this problem consists of two separate steps: first, a subset of genes with altered expression levels is identified; then the pathways that are statistically enriched by those genes are selected, on the assumption that they play a relevant role in the biological condition under study. Often, the set of selected pathways contains elements that are not related to the condition, because statistical significance is not sufficient for biological relevance. To overcome this problem, we propose a method based on a large mixed integer program that implements a new feature selection model to simultaneously identify the genes whose over- and under-expression, combined, discriminate different cancer subtypes, as well as the pathways that are enriched by these genes. The innovation in this model is that the solutions are driven towards the enrichment of pathways. This may introduce a bias in the search; such a bias is counterbalanced by a wide exploration of the solution space, obtained by varying the parameters involved over their feasible region and then using a global optimization approach. The joint analysis of the pool of solutions obtained by this exploration should provide a robust final set of genes and pathways, overcoming the potential drawbacks of relying solely on statistical significance. Experimental results on transcriptomes for different types of cancer from the Cancer Genome Atlas are presented. The method is able to identify crisp relations between the considered subtypes of cancer and a small number of selected pathways, eventually validated by the biological analysis.
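The abstract does not state the formulation, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes binarized expression calls, a set-covering style requirement that every pair of samples from different subtypes differs on at least one selected gene, and an objective that rewards enriched pathways. The data, the pathway memberships, the names `solve` and `explore`, and the parameters `weight` and `min_hits` are all hypothetical, and brute-force enumeration stands in for a mixed integer programming solver on this toy instance.

```python
from itertools import combinations, product

# Hypothetical toy data: each tuple is one sample, each position one gene;
# 1 means the gene is called over-expressed in that sample, 0 means it is not.
SAMPLES = {
    "subtype_A": [(1, 0, 1, 0, 1), (1, 1, 1, 0, 0)],
    "subtype_B": [(0, 0, 1, 1, 1), (0, 1, 0, 1, 0)],
}
# Hypothetical pathway membership, given as sets of gene indices.
PATHWAYS = {"P1": {0, 3}, "P2": {1, 2, 4}}


def discriminates(selected, samples):
    """True if every cross-subtype sample pair differs on >= 1 selected gene."""
    for a, b in product(samples["subtype_A"], samples["subtype_B"]):
        if not any(a[j] != b[j] for j in selected):
            return False
    return True


def enriched(selected, pathways, min_hits):
    """Names of pathways containing at least min_hits selected genes."""
    chosen = set(selected)
    return sorted(p for p, members in pathways.items()
                  if len(members & chosen) >= min_hits)


def solve(samples, pathways, n_genes, weight, min_hits=2):
    """Minimize (#selected genes - weight * #enriched pathways) by enumeration.

    Returns (cost, selected gene indices, enriched pathway names).
    """
    best = None
    for size in range(1, n_genes + 1):
        for selected in combinations(range(n_genes), size):
            if not discriminates(selected, samples):
                continue
            hit = enriched(selected, pathways, min_hits)
            cost = size - weight * len(hit)
            if best is None or cost < best[0]:
                best = (cost, selected, hit)
    return best


def explore(samples, pathways, n_genes, weights):
    """Re-solve for several weights and count how often each gene is selected."""
    counts = {j: 0 for j in range(n_genes)}
    for w in weights:
        _, selected, _ = solve(samples, pathways, n_genes, w)
        for j in selected:
            counts[j] += 1
    return counts


if __name__ == "__main__":
    print(solve(SAMPLES, PATHWAYS, 5, weight=0))
    print(solve(SAMPLES, PATHWAYS, 5, weight=1.5))
    print(explore(SAMPLES, PATHWAYS, 5, weights=[0, 1.5]))
```

With `weight=0` the sketch returns the single most discriminating gene. A positive weight pulls in a second gene so that pathway `P1` becomes enriched, which is the kind of bias the abstract describes. Counting selections across several weights, as `explore` does, is one simple way to pool solutions from a parameter sweep.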

Acknowledgements

This work was funded by the INTEROMICS Italian flagship project (PON02-00612-3461281 and PON02-00619-3470457) and by the SysBioNet project, a MIUR initiative from the Italian Roadmap Research Infrastructures 2012. Mario R. Guarracino's work was conducted at the National Research University Higher School of Economics and supported by RSF Grant 14-41-00039.

Author information

Authors and Affiliations

IASI-CNR, Via dei Taurini, 19, 00185, Rome, Italy
Giovanni Felici
ICAR-CNR, Via Pietro Castellino 111, 80131, Naples, Italy
Kumar Parijat Tripathi, Daniela Evangelista & Mario Rosario Guarracino

Corresponding author

Correspondence to Kumar Parijat Tripathi.

About this article

Cite this article

Felici, G., Tripathi, K.P., Evangelista, D. et al. A mixed integer programming-based global optimization framework for analyzing gene expression data. J Glob Optim 69, 727–744 (2017). https://doi.org/10.1007/s10898-017-0530-0

Received: 09 June 2015
Accepted: 29 April 2017
Published: 09 May 2017
Issue Date: November 2017
DOI: https://doi.org/10.1007/s10898-017-0530-0
