Abstract
Microarray technology has been developed and applied in various biological contexts, especially for monitoring the expression levels of thousands of genes simultaneously. Analysis of such data requires sophisticated computational tools. We therefore propose a tool for the analysis of microarray data. For this purpose, a feature selection scheme is integrated separately with each of the classical supervised classifiers Support Vector Machine, K-Nearest Neighbor, Decision Tree and Naive Bayes to improve classification performance; the resulting classifiers are called Integrated Classifiers. The feature selection scheme generates bootstrap samples, which are used to create diverse and informative features using Principal Component Analysis. These features are then multiplied with the original data in order to create training and testing data for the classifiers. Final classification results are obtained on the test data by computing posterior probabilities. The performance of the proposed integrated classifiers relative to their conventional counterparts is demonstrated on 12 microarray datasets. The results show that the integrated classifiers improve performance by up to 25.90% on a single dataset, with an average gain of 9.74% over the conventional classifiers. The superiority of the results has also been established through a statistical significance test.
S.S. Bhowmick and I. Saha—Contributed equally.
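The pipeline outlined in the abstract (bootstrap samples, Principal Component Analysis, multiplication of the original data by the resulting features, classification by posterior probability) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions and not the authors' implementation: the function names (`bootstrap_loadings`, `transform`, `knn_posterior`), the number of bootstrap samples and components, the power-iteration PCA, the use of K-Nearest Neighbor as the base classifier with the posterior estimated as the fraction of neighbour votes, and the synthetic toy data are all hypothetical choices made for the example.

```python
import random
from collections import Counter


def covariance(rows):
    """Sample covariance matrix of `rows` (samples x genes)."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    c = [[r[j] - mu[j] for j in range(d)] for r in rows]
    return [[sum(r[i] * r[j] for r in c) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=200):
    """Leading k eigenvectors of `cov` via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # fixed, deterministic start
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]  # make sure the loading has unit length
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: subtract the component just found before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return comps


def bootstrap_loadings(train, n_boot, k, rng):
    """PCA loadings from `n_boot` bootstrap resamples of the training rows."""
    loadings = []
    for _ in range(n_boot):
        sample = [rng.choice(train) for _ in train]  # draw with replacement
        loadings.extend(top_components(covariance(sample), k))
    return loadings


def transform(rows, loadings):
    """Multiply the original data by the loadings to obtain new features."""
    return [[sum(a * b for a, b in zip(r, v)) for v in loadings] for r in rows]


def knn_posterior(train_x, train_y, x, k=3):
    """Class posterior estimated as the vote share among the k nearest rows."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, x))
    nearest = sorted(zip(train_x, train_y), key=lambda t: dist(t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / k for label, count in votes.items()}


# Toy data (assumption): two classes, 4 "genes", 20 samples per class.
rng = random.Random(0)
train = [[rng.gauss(c * 5.0, 0.5) for _ in range(4)]
         for c in (0, 1) for _ in range(20)]
labels = [c for c in (0, 1) for _ in range(20)]

loadings = bootstrap_loadings(train, n_boot=3, k=2, rng=rng)
train_f = transform(train, loadings)
test_f = transform([[0.2] * 4, [4.8] * 4], loadings)
for features in test_f:
    posterior = knn_posterior(train_f, labels, features)
    print(max(posterior, key=posterior.get), posterior)
```

The same loadings are applied to both training and test rows, so the classifier sees the two sets in one shared feature space. Any of the other base classifiers named in the abstract could take the place of `knn_posterior` in this sketch.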
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhowmick, S.S., Saha, I., Rato, L., Bhattacharjee, D. (2017). Integrated Classifier: A Tool for Microarray Analysis. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 776. Springer, Singapore. https://doi.org/10.1007/978-981-10-6430-2_3
DOI: https://doi.org/10.1007/978-981-10-6430-2_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6429-6
Online ISBN: 978-981-10-6430-2
eBook Packages: Computer Science (R0)