Fast Protein Superfamily Classification Using Principal Component Null Space Analysis

French, Leon; Ngom, Alioune; Rueda, Luis

doi:10.1007/11424918_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3501)

Included in the following conference series:

Conference of the Canadian Society for Computational Studies of Intelligence

Abstract

The protein family classification problem, which consists of determining the family memberships of given unknown protein sequences, is important to biologists for many practical reasons, such as drug discovery, prediction of molecular function, and medical diagnosis. Neural networks and Bayesian methods have performed well on this problem, achieving accuracies ranging from 90% to 98%, but they run relatively slowly in the learning stage. In this paper, we apply a principal component null space analysis (PCNSA) linear classifier to the problem and report excellent results compared to those of neural networks and support vector machines. The two main parameters of PCNSA are linked to the high dimensionality of the dataset used, and were optimized exhaustively to maximize accuracy.
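
The abstract does not spell out the classifier, so the following is only a minimal sketch of how a PCNSA-style classifier and its exhaustive two-parameter search might look. It assumes the commonly described formulation: a global PCA projection, then, per class, an approximate null space spanned by the lowest-variance directions of that class, with a sample assigned to the class whose mean is nearest within that class's null space. The function names and the parameter names `n_pca` and `n_null` are hypothetical and are not taken from the paper; the protein sequence encoding step is not shown.

```python
import numpy as np


def fit_pcnsa(X, y, n_pca, n_null):
    """Fit a PCNSA-style model (illustrative sketch, not the authors' code).

    n_pca  -- number of global principal components kept (assumed parameter 1)
    n_null -- size of each class's approximate null space (assumed parameter 2)
    """
    mean = X.mean(axis=0)
    # Global PCA: rows of vt are principal directions, strongest first.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:n_pca].T                      # shape (n_features, n_pca)
    Z = (X - mean) @ W                    # data in PCA space
    classes = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        mu = Zc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Zc, rowvar=False))
        # eigh sorts eigenvalues in ascending order, so the first n_null
        # eigenvectors are the class's lowest-variance directions.
        _, vecs = np.linalg.eigh(cov)
        classes[c] = (mu, vecs[:, :n_null])
    return mean, W, classes


def predict_pcnsa(model, X):
    """Assign each row of X to the class with the smallest null-space distance."""
    mean, W, classes = model
    Z = (X - mean) @ W
    labels = np.array(list(classes))
    dists = np.stack(
        [np.linalg.norm((Z - mu) @ N, axis=1) for mu, N in classes.values()],
        axis=1,
    )
    return labels[dists.argmin(axis=1)]


def search_params(X_tr, y_tr, X_va, y_va, pca_grid, null_grid):
    """Exhaustively try every (n_pca, n_null) pair; return (accuracy, n_pca, n_null)."""
    results = []
    for n_pca in pca_grid:
        for n_null in null_grid:
            if n_null > n_pca:            # null space cannot exceed PCA space
                continue
            model = fit_pcnsa(X_tr, y_tr, n_pca, n_null)
            acc = float((predict_pcnsa(model, X_va) == y_va).mean())
            results.append((acc, n_pca, n_null))
    return max(results)
```

Training here needs only one singular value decomposition and one small eigendecomposition per class, which is consistent with the abstract's emphasis on a fast learning stage. In practice `search_params` would be given a held-out validation split rather than the training data.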

Author information

Authors and Affiliations

School of Computer Science, University of Windsor, 401 Sunset Avenue, Windsor, ON, N9B 3P4, Canada
Leon French, Alioune Ngom & Luis Rueda

Editor information

Editors and Affiliations

Département d’informatique et de recherche opérationnelle, CP 6128 succ. Centre-Ville, Université de Montréal, H3C 3J7, Montréal, Canada
Balázs Kégl
Département d’informatique et de recherche opérationnelle, Université de Montréal,
Guy Lapalme

About this paper

Cite this paper

French, L., Ngom, A., Rueda, L. (2005). Fast Protein Superfamily Classification Using Principal Component Null Space Analysis. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science, vol 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_17

DOI: https://doi.org/10.1007/11424918_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25864-3
Online ISBN: 978-3-540-31952-8
eBook Packages: Computer Science, Computer Science (R0)
