DOI: 10.1145/882082.882100
Article

Weave amino acid sequences for protein secondary structure prediction

Published: 13 June 2003

Abstract

Given a known protein sequence, predicting its secondary structure helps in understanding its three-dimensional (tertiary) structure, i.e., its folding. In this paper, we present an approach for predicting protein secondary structure. Unlike existing prediction methods, our approach proposes an encoding schema that weaves physicochemical information into the encoded vectors, and a prediction framework that combines context information with secondary structure segments. We trained a Support Vector Machine (SVM) on the CB513 and RS126 data sets, which are collections of protein secondary structure sequences, using sevenfold cross-validation to uncover the structural differences among protein secondary structures. We then apply the sliding window technique to test a set of protein sequences based on the group classification learned from the training set. Our approach achieves 77.8% segment overlap accuracy (SOV) and 75.2% three-state overall per-residue accuracy (Q3), outperforming other prediction methods.
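
To make the sliding window technique and the Q3 measure concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify the weaving encoding, so a plain one-hot encoding stands in for it, and the SVM training step is omitted. The names `sliding_windows`, `one_hot`, `q3_accuracy`, the padding symbol `-`, and the toy sequence are all assumptions made for illustration.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # hypothetical padding symbol for positions beyond the chain ends


def sliding_windows(sequence: str, width: int) -> List[str]:
    """Return one window of `width` residues centred on each position."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + width] for i in range(len(sequence))]


def one_hot(window: str) -> List[int]:
    """Plain one-hot encoding (a stand-in, not the paper's weaving schema).

    Each residue gets 20 slots; padding positions stay all zeros.
    """
    vector: List[int] = []
    for residue in window:
        slot = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            slot[index] = 1
        vector.extend(slot)
    return vector


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and equal in length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    windows = sliding_windows("MKVLA", 3)  # toy sequence, not from CB513 or RS126
    print(windows)
    print(len(one_hot(windows[0])))
    print(q3_accuracy("HHECC", "HHEEC"))
```

In a full pipeline, each encoded window would be one feature vector for the classifier, labelled with the secondary structure state of its central residue. Q3 would then be computed over the predicted and observed label strings of the test proteins.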

Published In

DMKD '03: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
June 2003
103 pages
ISBN: 9781450374224
DOI: 10.1145/882082

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SVM
  2. encoding schema
  3. protein secondary structure
  4. protein structure prediction

Qualifiers

  • Article

Conference

DMKD03

Cited By

  • (2011) Peptidase Detection and Classification Using Enhanced Kernel Methods with Feature Selection. 5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011), pp. 23-30. DOI: 10.1007/978-3-642-19914-1_4. Online publication date: 2011.
  • (2008) SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences. BMC Bioinformatics, 9:1. DOI: 10.1186/1471-2105-9-226. Online publication date: 1-May-2008.
  • (2007) Prediction of protein structural class for the twilight zone sequences. Biochemical and Biophysical Research Communications, 357:2, pp. 453-460. DOI: 10.1016/j.bbrc.2007.03.164. Online publication date: Jun-2007.
  • (2007) Use of Artificial Neural Networks and Effects of Amino Acid Encodings in the Membrane Protein Prediction Problem. Progress in Pattern Recognition, pp. 37-46. DOI: 10.1007/978-1-84628-945-3_4. Online publication date: 2007.
  • (2007) Prediction of protein secondary structure content for the twilight zone sequences. Proteins: Structure, Function, and Bioinformatics, 69:3, pp. 486-498. DOI: 10.1002/prot.21527. Online publication date: 10-Jul-2007.
  • (2006) Prediction of the Number of Helices for the Twilight Zone Proteins. 2006 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, pp. 1-7. DOI: 10.1109/CIBCB.2006.330972. Online publication date: Sep-2006.
  • (2005) Prediction of secondary protein structure content from primary sequence alone: a feature selection based approach. Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition, pp. 334-345. DOI: 10.1007/11510888_33. Online publication date: 9-Jul-2005.
