A New Test System for Stability Measurement of Marker Gene Selection in DNA Microarray Data Analysis

Xiong, Fei; Huang, Heng; Ford, James; Makedon, Fillia S.; Pearlman, Justin D.

doi:10.1007/11573036_41

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3746)

Included in the following conference series:

Panhellenic Conference on Informatics

Abstract

Microarray gene expression data contain informative features that reflect the critical processes controlling prominent biological functions. Feature selection algorithms have been used in previous biomedical research to find the “marker” genes whose changes in expression value correspond to the most prominent differences between specimen classes. One problem encountered in such analysis is the imbalance between the very large number of genes and the relatively small number of specimen samples. A common concern, therefore, is “overfitting” the data and deriving a set of marker genes with low stability over the entire set of possible specimens. To address this problem, we propose a new test environment in which synthetic data are perturbed to simulate possible variations in gene expression values. The goal is for the generated data to have properties that match natural data and that make them suitable for testing the sensitivity of feature selection algorithms and validating the robustness of selected marker genes. In this paper, we evaluate a statistically based resampling approach and a Principal Components Analysis (PCA)-based linear noise distribution approach. Our results show that both methods generate reasonable synthetic data and that the signal-to-noise ratio (with variation weights of 5%, 10%, 20% and 30%) measurably affects the classification accuracy and the marker genes selected. Based on these results, we identify the most appropriate marker gene selection and classification techniques for each type and level of noise we modeled.
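
The kind of stability test the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the noise models, the selection method, or the stability score, so everything here is an assumption made for illustration. The function names (`per_gene_noise`, `pca_noise`, `top_k_genes`, `jaccard`, `stability`), the t-like ranking score, the Jaccard overlap measure, and the toy data are all hypothetical. Only the variation weights (5%, 10%, 20%, 30%) come from the abstract.

```python
import numpy as np

def per_gene_noise(X, weight, rng):
    """Assumed noise model: Gaussian noise scaled by each gene's standard deviation."""
    sd = X.std(axis=0, keepdims=True)
    return X + weight * sd * rng.standard_normal(X.shape)

def pca_noise(X, weight, rng):
    """Assumed noise model: Gaussian noise along the principal axes of X,
    scaled by the spread of the data along each axis."""
    mean = X.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    n = X.shape[0]
    comp_sd = s / np.sqrt(max(n - 1, 1))
    score_noise = weight * comp_sd * rng.standard_normal((n, s.size))
    return X + score_noise @ Vt

def top_k_genes(X, y, k):
    """Rank genes by a simple two-class t-like score; return the top-k indices."""
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return set(np.argsort(score)[-k:].tolist())

def jaccard(s1, s2):
    """Overlap between two gene sets: |intersection| / |union|."""
    return len(s1 & s2) / len(s1 | s2)

def stability(X, y, perturb, weight, k=10, trials=20, seed=0):
    """Mean Jaccard overlap between genes selected on X and on perturbed copies of X."""
    rng = np.random.default_rng(seed)
    base = top_k_genes(X, y, k)
    overlaps = [jaccard(base, top_k_genes(perturb(X, weight, rng), y, k))
                for _ in range(trials)]
    return float(np.mean(overlaps))

# Toy data (hypothetical): 40 samples x 200 genes, first 10 genes differ between classes.
rng = np.random.default_rng(42)
X = rng.standard_normal((40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :10] += 2.0

# Variation weights taken from the abstract.
for w in (0.05, 0.10, 0.20, 0.30):
    print(w, stability(X, y, per_gene_noise, w), stability(X, y, pca_noise, w))
```

A score of 1.0 means the same genes were selected after every perturbation; lower scores indicate that the selected marker set is sensitive to the simulated variation. Any feature selection method can be substituted for `top_k_genes` to compare methods at each noise level.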

Author information

Authors and Affiliations

Department of Computer Science, Dartmouth College, Hanover, NH, 03755, USA
Fei Xiong, Heng Huang, James Ford & Fillia S. Makedon
Advanced Imaging Center, Dartmouth-Hitchcock Medical Center, Lebanon, NH, 03766, USA
Justin D. Pearlman

Editor information

Editors and Affiliations

Department of Computer and Communication Engineering, University of Thessaly, Glavani 37, 382 21, Volos, Greece
Panayiotis Bozanis
Department of Computer and Communication Engineering, University of Thessaly, 382 21, Volos, Greece
Elias N. Houstis

Cite this paper

Xiong, F., Huang, H., Ford, J., Makedon, F.S., Pearlman, J.D. (2005). A New Test System for Stability Measurement of Marker Gene Selection in DNA Microarray Data Analysis. In: Bozanis, P., Houstis, E.N. (eds) Advances in Informatics. PCI 2005. Lecture Notes in Computer Science, vol 3746. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573036_41

DOI: https://doi.org/10.1007/11573036_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29673-7
Online ISBN: 978-3-540-32091-3
eBook Packages: Computer Science, Computer Science (R0)
