
Robust SVM-Based Biomarker Selection with Noisy Mass Spectrometric Proteomic Data

  • Conference paper
Applications of Evolutionary Computing (EvoWorkshops 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3907)

Abstract

Computational analysis of mass spectrometric (MS) proteomic data from sera is of potential relevance for diagnosis, prognosis, choice of therapy, and study of disease activity. To this end, feature selection techniques based on machine learning can be applied to detect potential biomarkers and biomarker patterns. A key issue concerns the interpretability and robustness of the results produced by such techniques. In this paper we propose a robust method for feature selection with MS proteomic data. The method consists of the sequential application of a filter feature selection algorithm, RELIEF, followed by multiple runs of a wrapper feature selection technique based on support vector machines (SVM), where each run is obtained by changing the class label of one support vector. The frequencies with which features are selected over the runs are used to identify features that are robust with respect to perturbations of the data. The method is tested on a dataset produced by a specific MS technique, MALDI-TOF MS. Two classes were generated artificially by spiking, and the samples were collected at different storage durations. Leave-one-out cross-validation (LOOCV) applied to the resulting dataset indicates that the proposed feature selection method is capable of identifying highly discriminatory proteomic patterns.
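The pipeline summarized in the abstract (RELIEF filter, then one SVM-wrapper run per support vector with that vector's label flipped, then feature frequencies) can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not taken from the paper. It uses the basic two-class RELIEF of Kira and Rendell, visiting every sample once. It uses a Pegasos-style linear SVM trained by stochastic subgradient descent. It substitutes SVM-RFE (Guyon et al.) for the wrapper, because the abstract does not name the exact technique. It approximates support vectors as points whose functional margin is at most 1 + `tol`. All function names, the parameters `n_filter`, `k`, `tol`, `min_freq`, and the toy data are hypothetical. The sketch also leaves out the LOOCV evaluation.

```python
import random
from collections import Counter


def relief(X, y):
    """Two-class RELIEF weights (after Kira & Rendell). Assumption: every sample
    is visited once instead of m random draws; each class needs >= 2 samples."""
    n, d = len(X), len(X[0])
    # Range-normalise so each per-feature difference lies in [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def dist(a, b):
        return sum(((a[j] - b[j]) / span[j]) ** 2 for j in range(d))

    w = [0.0] * d
    for i in range(n):
        others = [k for k in range(n) if k != i]
        hit = min((k for k in others if y[k] == y[i]), key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in others if y[k] != y[i]), key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward separation from the nearest miss, penalise it for the nearest hit.
            w[j] += (abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])) / (span[j] * n)
    return w


def train_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Primal linear SVM via Pegasos-style subgradient descent (labels +1/-1).
    A constant 1.0 is appended to each sample, so the last weight is a bias."""
    rng = random.Random(seed)
    Z = [list(x) + [1.0] for x in X]
    w, order, t = [0.0] * len(Z[0]), list(range(len(Z))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * sum(a * b for a, b in zip(w, Z[i])) < 1
            w = [(1 - eta * lam) * a for a in w]  # shrink step from the L2 penalty
            if violated:  # hinge-loss subgradient step
                w = [a + eta * y[i] * b for a, b in zip(w, Z[i])]
    return w


def margins(w, X, y):
    """Functional margins y_i * (w . x_i + bias) for weights from train_svm."""
    return [y[i] * (sum(a * b for a, b in zip(w, x)) + w[-1]) for i, x in enumerate(X)]


def svm_rfe(X, y, feats, k):
    """SVM-RFE: repeatedly drop the feature with the smallest squared weight."""
    feats = list(feats)
    while len(feats) > k:
        w = train_svm([[x[j] for j in feats] for x in X], y)
        feats.pop(min(range(len(feats)), key=lambda p: w[p] ** 2))
    return feats


def robust_select(X, y, n_filter=3, k=1, tol=0.1, min_freq=0.5):
    """RELIEF filter, then one SVM-RFE run per (approximate) support vector with
    that vector's label flipped; keep features chosen in >= min_freq of runs."""
    rw = relief(X, y)
    kept = sorted(range(len(rw)), key=lambda j: -rw[j])[:n_filter]
    Xf = [[x[j] for j in kept] for x in X]
    svs = [i for i, m in enumerate(margins(train_svm(Xf, y), Xf, y)) if m <= 1 + tol]
    counts = Counter()
    for i in svs:
        y_pert = list(y)
        y_pert[i] = -y_pert[i]  # perturb the data by flipping one support vector's label
        counts.update(svm_rfe(X, y_pert, kept, k))
    selected = sorted(j for j, c in counts.items() if c >= min_freq * len(svs))
    return selected, counts, len(svs)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [1 if i % 2 == 0 else -1 for i in range(20)]
    # Hypothetical toy data: feature 0 carries the class signal, features 1-4 are noise.
    X = [[2.0 * yi + rng.uniform(-0.1, 0.1)] + [rng.uniform(-1, 1) for _ in range(4)] for yi in y]
    selected, counts, runs = robust_select(X, y)
    print("runs:", runs, "frequencies:", dict(counts), "robust:", selected)
```

Counting selection frequencies across label-flip runs, instead of trusting a single wrapper run, is what gives the intended robustness. A feature that survives only when one particular support vector carries its original label will have a low frequency and fall below `min_freq`. On this toy data, feature 0 should survive essentially every run. On real MALDI-TOF spectra, the filter size, `k`, and the frequency threshold would need to be tuned, for example inside a LOOCV loop.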

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marchiori, E., Jimenez, C.R., West-Nielsen, M., Heegaard, N.H.H. (2006). Robust SVM-Based Biomarker Selection with Noisy Mass Spectrometric Proteomic Data. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_8

  • DOI: https://doi.org/10.1007/11732242_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33237-4

  • Online ISBN: 978-3-540-33238-1

  • eBook Packages: Computer Science, Computer Science (R0)
