research-article

Bi-level and Bi-objective p-Median Type Problems for Integrative Clustering: Application to Analysis of Cancer Gene-Expression and Drug-Response Data

Published: 01 January 2018

    Abstract

    Recent advances in high-throughput technologies have made it possible to collect large amounts of multidimensional heterogeneous data that provide diverse information on the same biological samples. Integrative analysis of such multisource datasets may reveal new insights into complex biological mechanisms and therefore remains an important research field in systems biology. Most modern integrative clustering approaches rely either on independent analysis of each dataset followed by consensus clustering or on probabilistic or statistical modeling, while flexible distance-based integrative clustering techniques have received little attention. We propose two distance-based integrative clustering frameworks based on bi-level and bi-objective extensions of the p-median problem. A hybrid branch-and-cut method is developed to find globally optimal solutions to the bi-level p-median model. For the bi-objective problem, an $\varepsilon$-constraint algorithm is proposed to generate an approximation to the Pareto optimal set. Every solution found by either framework corresponds to an integrative clustering. We present an application of our approaches to integrative analysis of NCI-60 human tumor cell lines characterized by gene expression and drug activity profiles. We demonstrate that the proposed mathematical optimization-based approaches outperform some state-of-the-art and traditional distance-based integrative and non-integrative clustering techniques.
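
To make the $\varepsilon$-constraint idea concrete, here is a minimal Python sketch on a toy instance. It is not the authors' algorithm: the paper solves its models with mathematical programming, whereas this sketch simply enumerates every clustering of four points, which is only feasible for tiny inputs. It assumes, for illustration, that both objectives share one assignment of points to medians and that each objective is the total point-to-median distance under one of two dissimilarity matrices. The function names and the toy data are hypothetical.

```python
from itertools import combinations, product


def enumerate_clusterings(dist_a, dist_b, p):
    """Yield (cost_a, cost_b, assignment) for every choice of p medians and
    every assignment of the remaining points to one of those medians."""
    n = len(dist_a)
    for medians in combinations(range(n), p):
        others = [i for i in range(n) if i not in medians]
        for choice in product(medians, repeat=len(others)):
            assign = {m: m for m in medians}  # each median serves itself
            assign.update(zip(others, choice))
            cost_a = sum(dist_a[i][assign[i]] for i in range(n))
            cost_b = sum(dist_b[i][assign[i]] for i in range(n))
            yield cost_a, cost_b, tuple(assign[i] for i in range(n))


def epsilon_constraint_front(dist_a, dist_b, p):
    """Repeatedly minimize cost_a subject to cost_b < bound, then tighten
    the bound to the cost_b of the solution just found."""
    solutions = list(enumerate_clusterings(dist_a, dist_b, p))
    front, bound = [], float("inf")
    while True:
        feasible = [s for s in solutions if s[1] < bound]
        if not feasible:
            return front
        # Break ties on cost_a by cost_b so each point kept is non-dominated.
        best = min(feasible, key=lambda s: (s[0], s[1]))
        front.append(best)
        bound = best[1]


# Toy data (assumed): the same four samples described in two 1-D "views".
view_a = [0, 1, 10, 11]
view_b = [0, 10, 1, 11]
dist_a = [[abs(x - y) for y in view_a] for x in view_a]
dist_b = [[abs(x - y) for y in view_b] for x in view_b]

front = epsilon_constraint_front(dist_a, dist_b, p=2)
for cost_a, cost_b, assignment in front:
    print(cost_a, cost_b, assignment)
```

Each tuple in `front` is one trade-off between the two views, and its `assignment` entry lists, for every sample, the index of the median it is grouped with. On this toy instance the objective pairs found are (2, 20), (10, 19), (11, 11), (19, 10) and (20, 2).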

        Published In

        IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 15, Issue 1
        January 2018
        352 pages

        Publisher

        IEEE Computer Society Press

        Washington, DC, United States
