Gene selection using locality sensitive Laplacian score

Published: 01 November 2014

Abstract

Gene selection based on microarray data is highly important for classifying tumors accurately. Existing gene selection schemes are mainly based on ranking statistics. From a manifold-learning standpoint, local geometrical structure is more important than global information for characterizing features. In this study, we propose a supervised gene selection method called locality sensitive Laplacian score (LSLS), which incorporates discriminative information into the local geometrical structure by simultaneously minimizing local within-class information and maximizing local between-class information. Variance information is also incorporated into the algorithmic framework. Finally, to find better gene subsets, which is important for biomarker discovery, we present a two-stage feature selection method that combines LSLS with a wrapper method (sequential forward selection or sequential backward selection). Experimental results on six publicly available gene expression profile data sets demonstrate the effectiveness of the proposed approach compared with a number of state-of-the-art gene selection methods.
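
The abstract describes LSLS only at a high level, so the following Python/NumPy code is a hedged sketch of the general two-stage idea under stated assumptions, not the authors' algorithm. Assumptions: local structure is a symmetric k-nearest-neighbour graph built on all genes, whose edges are split into within-class and between-class edges by the labels; each gene is scored as its local between-class spread divided by its local within-class spread (higher is better), and the variance term mentioned in the abstract is not modelled; the wrapper stage is sequential forward selection over the top-ranked genes, scored by leave-one-out 1-nearest-neighbour accuracy as a stand-in classifier. The function names `pairwise_sq_dists`, `knn_graphs`, `locality_score`, `loo_1nn_accuracy` and `two_stage_select`, and the parameters `k`, `top_m` and `max_genes`, are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows (samples) of X."""
    sq = np.sum(X ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * X @ X.T


def knn_graphs(X, y, k=5):
    """Hypothetical helper: split a symmetric k-NN graph into
    within-class (Ww) and between-class (Wb) 0/1 adjacency matrices."""
    n = X.shape[0]
    d2 = pairwise_sq_dists(X)
    np.fill_diagonal(d2, np.inf)               # a sample is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]       # k nearest neighbours of each sample
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                     # symmetrise the graph
    same = (y[:, None] == y[None, :]).astype(float)
    return A * same, A * (1.0 - same)


def locality_score(X, y, k=5, eps=1e-12):
    """Assumed score (not the paper's exact LSLS formula):
    local between-class spread / local within-class spread, per gene.
    Higher means the gene separates neighbouring samples of different
    classes while keeping neighbouring samples of the same class close."""
    Ww, Wb = knn_graphs(X, y, k)
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        diff2 = (f[:, None] - f[None, :]) ** 2  # (f_i - f_j)^2 for all sample pairs
        scores[r] = np.sum(diff2 * Wb) / (np.sum(diff2 * Ww) + eps)
    return scores


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (stand-in)."""
    d2 = pairwise_sq_dists(X)
    np.fill_diagonal(d2, np.inf)               # leave each sample out of its own lookup
    return float(np.mean(y[np.argmin(d2, axis=1)] == y))


def two_stage_select(X, y, top_m=20, k=5, max_genes=10):
    """Stage 1: keep the top_m genes by locality_score.
    Stage 2: sequential forward selection (SFS) on that shortlist,
    greedily adding the gene that most improves LOO 1-NN accuracy."""
    shortlist = [int(g) for g in np.argsort(-locality_score(X, y, k))[:top_m]]
    selected, best_acc = [], 0.0
    while len(selected) < max_genes:
        candidates = [g for g in shortlist if g not in selected]
        if not candidates:
            break
        accs = [loo_1nn_accuracy(X[:, selected + [g]], y) for g in candidates]
        i = int(np.argmax(accs))
        if accs[i] <= best_acc:                # stop when no gene improves accuracy
            break
        selected.append(candidates[i])
        best_acc = accs[i]
    return selected, best_acc


# Usage on synthetic data: 40 samples, 50 genes, only gene 3 is informative.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[:, 3] += 6.0 * y
genes, acc = two_stage_select(X, y, top_m=10, max_genes=5)
print("selected genes:", genes, "LOO 1-NN accuracy:", acc)
```

Sequential backward selection, the other wrapper named above, would instead start from the whole shortlist and greedily remove the gene whose removal hurts accuracy least; the forward variant is shown here because it is usually cheaper when only a few genes are kept.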

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 11, Issue 6
November/December 2014
290 pages
ISSN: 1545-5963
Editor: Ying Xu

Publisher

IEEE Computer Society Press, Washington, DC, United States

Publication History

Published: 01 November 2014
Accepted: 22 May 2014
Revised: 13 April 2014
Received: 25 January 2014
Published in TCBB Volume 11, Issue 6

Author Tags

1. feature selection
2. gene expression profile analysis
3. local margin maximization
4. manifold learning

Article Metrics

• Downloads (last 12 months): 4
• Downloads (last 6 weeks): 0
Reflects downloads up to 15 Feb 2025

Cited By

• (2024) "A Machine Learning-Based Wrapper Method for Feature Selection," International Journal of Data Warehousing and Mining, vol. 20, no. 1, pp. 1-33. DOI: 10.4018/IJDWM.352041. Online publication date: 24-Sep-2024.
• (2021) "Methods to transform microarray data for cancer prediction," 2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 1-7. DOI: 10.1109/CIBCB.2016.7758104. Online publication date: 10-Mar-2021.
• (2017) "hMuLab," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 14, no. 5, pp. 1173-1180. DOI: 10.1109/TCBB.2016.2603507. Online publication date: 1-Sep-2017.
• (2017) "Significance and Functional Similarity for Identification of Disease Genes," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 14, no. 6, pp. 1419-1433. DOI: 10.1109/TCBB.2016.2598163. Online publication date: 1-Nov-2017.
• (2016) "Supervised, Unsupervised, and Semi-Supervised Feature Selection," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 13, no. 5, pp. 971-989. DOI: 10.1109/TCBB.2015.2478454. Online publication date: 1-Sep-2016.
• (2015) "On Efficient Feature Ranking Methods for High-Throughput Data Analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 12, no. 6, pp. 1374-1384. DOI: 10.1109/TCBB.2015.2415790. Online publication date: 1-Nov-2015.
• (2015) "Gene selection for microarray data classification using a novel ant colony optimization," Neurocomputing, vol. 168, no. C, pp. 1024-1036. DOI: 10.1016/j.neucom.2015.05.022. Online publication date: 30-Nov-2015.
