
Fast Protein Superfamily Classification Using Principal Component Null Space Analysis

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3501)


Abstract

The protein family classification problem, which consists of determining the family memberships of given unknown protein sequences, is very important to biologists for many practical reasons, such as drug discovery, prediction of molecular functions, and medical diagnosis. Neural networks and Bayesian methods have performed well on the protein classification problem, achieving accuracy ranging from 90% to 98%, but they run relatively slowly in the learning stage. In this paper, we apply a principal component null space analysis (PCNSA) linear classifier to the problem and report excellent results compared to those of neural networks and support vector machines. The two main parameters of PCNSA are linked to the high dimensionality of the dataset used, and were optimized exhaustively to maximize accuracy.
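To make the two-stage idea behind PCNSA concrete, here is a minimal NumPy sketch of the general technique, not the authors' implementation. It first projects the data onto the top principal components of the pooled training set. It then keeps, for each class, the directions along which that class varies least (its approximate null space). A sample is assigned to the class whose mean is closest when distance is measured only in that class's null space. Several details are assumptions made for illustration: sequences are taken to be already encoded as fixed-length numeric feature vectors (the abstract does not say how), the names `m` (PCA dimension) and `k` (per-class null-space dimension) stand in for the "two main parameters", and the exhaustive search is written as a plain grid over a hypothetical validation split.

```python
import numpy as np


class PCNSA:
    """Sketch of a principal component null space analysis classifier.

    Assumed (illustrative) parameters:
      m -- number of principal components kept in the PCA stage (m <= min(n, d))
      k -- number of smallest-variance directions kept per class (k <= m)
    """

    def __init__(self, m, k):
        self.m = m
        self.k = k

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = X.mean(axis=0)
        # Stage 1: PCA on the pooled data; keep the top-m principal directions.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.P_ = vt[: self.m].T                      # shape (d, m)
        Z = (X - self.mean_) @ self.P_                # training data in PCA space
        self.mu_, self.W_ = [], []
        for c in self.classes_:
            Zc = Z[y == c]
            cov = np.atleast_2d(np.cov(Zc, rowvar=False, bias=True))
            # eigh returns eigenvalues in ascending order, so the first k
            # eigenvectors span this class's approximate null space.
            _, vecs = np.linalg.eigh(cov)
            self.mu_.append(Zc.mean(axis=0))
            self.W_.append(vecs[:, : self.k])         # shape (m, k)
        return self

    def predict(self, X):
        Z = (np.asarray(X, dtype=float) - self.mean_) @ self.P_
        # Distance from each class mean, measured only in that class's null space.
        dists = np.stack(
            [np.linalg.norm((Z - mu) @ W, axis=1) for mu, W in zip(self.mu_, self.W_)],
            axis=1,
        )
        return self.classes_[np.argmin(dists, axis=1)]


def grid_search(X_tr, y_tr, X_val, y_val, m_values, k_values):
    """Exhaustively try every (m, k) pair and keep the most accurate one."""
    best = (-1.0, None, None)
    for m in m_values:
        for k in k_values:
            if k > m:  # the null space cannot exceed the PCA dimension
                continue
            pred = PCNSA(m, k).fit(X_tr, y_tr).predict(X_val)
            acc = float(np.mean(pred == y_val))
            if acc > best[0]:
                best = (acc, m, k)
    return best  # (accuracy, m, k)
```

Training reduces to one SVD plus one small eigendecomposition per class, with no iterative optimization. This is consistent with the abstract's emphasis on speed relative to neural networks. The nested loop in `grid_search` is the simplest form of the exhaustive parameter optimization the abstract describes. How the paper actually bounds and steps the two parameters is not stated here.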



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

French, L., Ngom, A., Rueda, L. (2005). Fast Protein Superfamily Classification Using Principal Component Null Space Analysis. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science, vol 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_17


  • DOI: https://doi.org/10.1007/11424918_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25864-3

  • Online ISBN: 978-3-540-31952-8

  • eBook Packages: Computer Science, Computer Science (R0)
