Abstract
Tumor classification is one of the most vital technologies for cancer diagnosis. Because gene expression data are high-dimensional, gene selection (finding a small set of closely related genes that classifies tumors accurately) is an important step for improving classification performance. The traditional rough set model, a classical attribute reduction method, handles discrete data only. Gene expression data containing real-valued or noisy values are therefore usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, a novel gene selection method for tumor classification is proposed, based on neighborhood rough sets and entropy measures combined with the Fisher score; it can handle real-valued data while preserving the original classification information of the genes. First, the Fisher score method is employed to eliminate irrelevant genes and thereby significantly reduce computational complexity. Next, several neighborhood entropy-based uncertainty measures are investigated for handling the uncertainty and noise of gene expression data. Moreover, some of their properties are derived and the relationships among these measures are established. Finally, a joint neighborhood entropy-based gene selection algorithm with the Fisher score is presented to improve the classification performance of gene expression data. Experimental results on an illustrative instance and several public gene expression data sets show that the proposed method is effective at selecting the most relevant genes with high classification accuracy.
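The first two steps named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook Fisher score (between-class scatter divided by within-class scatter, per gene) and one common definition of neighborhood entropy from the neighborhood rough set literature (the mean negative log of the relative neighborhood size, here with base-2 logarithm and Euclidean distance). The function names `fisher_scores` and `neighborhood_entropy`, the radius parameter `delta`, and the toy data are hypothetical; the paper's own joint neighborhood entropy measures and selection algorithm are not reproduced here.

```python
import numpy as np


def fisher_scores(X, y):
    """Textbook Fisher score for each column (gene) of X given class labels y.

    Assumption: score_j = sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj,
    using the population variance within each class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant avoids division by zero for genes that are constant per class.
    return between / (within + 1e-12)


def neighborhood_entropy(X, delta):
    """Neighborhood entropy of the samples in X for neighborhood radius delta.

    Assumption: H = -(1/n) * sum_i log2(|N_delta(x_i)| / n), where N_delta(x_i)
    is the set of samples within Euclidean distance delta of x_i.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=2))       # pairwise Euclidean distances
    sizes = (dist <= delta).sum(axis=1)           # each sample counts itself
    return float(-np.mean(np.log2(sizes / n)))


# Toy data (hypothetical): 4 samples, 2 genes, 2 classes.
# Gene 0 separates the classes; gene 1 does not.
X_toy = np.array([[1.0, 5.0],
                  [1.2, 1.0],
                  [3.0, 5.2],
                  [3.2, 0.8]])
y_toy = np.array([0, 0, 1, 1])

scores = fisher_scores(X_toy, y_toy)
ranking = np.argsort(-scores)                     # best gene first
top_gene = int(ranking[0])

entropy_coarse = neighborhood_entropy(X_toy, delta=100.0)  # one big neighborhood
entropy_fine = neighborhood_entropy(X_toy, delta=0.0)      # singleton neighborhoods
```

In this sketch a gene with a low Fisher score would be dropped before any neighborhood computation, which is the source of the complexity reduction the abstract mentions. The radius `delta` controls granularity: a large radius merges all samples into one neighborhood and gives zero entropy, while a radius of zero isolates every distinct sample and gives the maximum value.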
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grants 61772176, 61402153, 61672332, 61370169, and 61472042), the China Postdoctoral Science Foundation (Grant 2016M602247), the Plan for Scientific Innovation Talent of Henan Province (Grant 184100510003), the Key Project of Science and Technology Department of Henan Province (Grant 182102210362), the Young Scholar Program of Henan Province (Grant 2017GGJS041), the Key Scientific and Technological Project of Xinxiang City (Grant CXGG17002), and the Ph.D. Research Foundation of Henan Normal University (Grants qd15132 and qd15129).
Cite this article
Sun, L., Zhang, XY., Qian, YH. et al. Joint neighborhood entropy-based gene selection method with fisher score for tumor classification. Appl Intell 49, 1245–1259 (2019). https://doi.org/10.1007/s10489-018-1320-1
DOI: https://doi.org/10.1007/s10489-018-1320-1