Abstract
Pattern recognition on protein mass spectrometry (MS) data has recently emerged as a new method for cancer diagnosis. However, the very high dimensionality of the data can degrade classification performance. This paper investigates Random Projection (RP) for reducing the dimensionality of protein MS data. The effectiveness of RP is analyzed and compared against Principal Component Analysis (PCA) using three classification algorithms: Support Vector Machine, Feed-forward Neural Network, and K-Nearest Neighbour. Three real-world cancer data sets are used to evaluate the performance of RP and PCA. In these experiments, RP achieved classification performance better than, or at least comparable to, that of PCA, provided the dimensionality of the projection matrix is sufficiently large. This paper also explores the use of RP as a pre-processing step prior to PCA. The results show that performing RP prior to PCA significantly reduces computation time without sacrificing classification accuracy.
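The two reduction schemes compared in the abstract, and the RP-then-PCA pipeline, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian projection matrix, the synthetic data, and all sizes (60 spectra, 5000 bins, 500 intermediate dimensions, 20 components) are assumptions chosen only for illustration, and the helper names `random_projection` and `pca` are hypothetical.

```python
import numpy as np

def random_projection(X, k, rng):
    """Project the rows of X (n x d) to k dimensions with a random matrix.

    Assumption: Gaussian entries; the paper may use a different distribution.
    """
    d = X.shape[1]
    # Entries are i.i.d. N(0, 1/k), so squared distances are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R

def pca(X, k):
    """Project the rows of X onto its top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
# Synthetic stand-in for MS data: 60 spectra with 5000 intensity bins each.
X = rng.normal(size=(60, 5000))

X_rp = random_projection(X, 500, rng)  # RP alone
X_pca = pca(X, 20)                     # PCA alone, on the full-dimensional data
X_rp_pca = pca(X_rp, 20)               # RP as a pre-processing step prior to PCA
```

In the last line PCA operates on a 60 x 500 matrix rather than a 60 x 5000 one, which is where a saving in computation time would come from. Any of the three reduced matrices could then be passed to a classifier.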
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Loy, C.C., Lai, W.K., Lim, C.P. (2006). Dimensionality Reduction of Protein Mass Spectrometry Data Using Random Projection. In: King, I., Wang, J., Chan, LW., Wang, D. (eds) Neural Information Processing. ICONIP 2006. Lecture Notes in Computer Science, vol 4233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893257_86
DOI: https://doi.org/10.1007/11893257_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46481-5
Online ISBN: 978-3-540-46482-2
eBook Packages: Computer Science, Computer Science (R0)