Clustering Support Vector Machines and Its Application to Local Protein Tertiary Structure Prediction

He, Jieyue; Zhong, Wei; Harrison, Robert; Tai, Phang C.; Pan, Yi

doi:10.1007/11758525_96

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3992)

Included in the following conference series:

International Conference on Computational Science

Abstract

Support Vector Machines (SVMs) are a new generation of machine learning techniques and have shown strong generalization capability in many data mining tasks. SVMs handle nonlinear classification by implicitly mapping input samples from the input feature space into a higher-dimensional feature space through a nonlinear kernel function. However, SVMs scale poorly to huge datasets containing millions of samples. Granular computing decomposes information into aggregates (granules) and solves the targeted problem within each granule. We therefore propose a novel computational model, Clustering Support Vector Machines (CSVMs), for complex classification problems on huge datasets. Drawing on both the theory of granular computing and advanced statistical learning methodology, a separate CSVM is built for each information granule, with the granules partitioned intelligently by a clustering algorithm. This makes the learning task for each CSVM more specific and simpler. Moreover, because each CSVM is built for a single granule, training can be easily parallelized, so CSVMs can handle huge datasets efficiently. We apply the CSVM model to the prediction of local protein tertiary structure. Compared with the conventional clustering method, the new CSVM model noticeably improves prediction accuracy for local protein tertiary structure. These encouraging experimental results indicate that the new computational model opens a new way to solve complex classification problems for huge datasets.
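
The cluster-then-classify idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`kmeans`, `train_linear_svm`, `fit_csvm`, `predict_csvm`) is hypothetical. Several simplifications are assumed: plain k-means with farthest-point seeding stands in for the clustering step, a linear SVM trained by Pegasos-style subgradient descent stands in for the kernel SVMs, features are centred on the granule centroid before training, and a granule containing a single class falls back to a constant predictor.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, p):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_linear_svm(xs, ys, lam=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM via Pegasos-style subgradient steps on the hinge loss."""
    if len(set(ys)) == 1:  # single-class granule: constant predictor
        return [0.0] * len(xs[0]), float(ys[0])
    rng = random.Random(seed)
    w, b, t = [0.0] * len(xs[0]), 0.0, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = ys[i] * (sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regulariser
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
                b += eta * ys[i]
    return w, b


def fit_csvm(xs, ys, k, seed=0):
    """Cluster samples into k granules and train one SVM per non-empty granule."""
    centroids = kmeans(xs, k)
    granules = [([], []) for _ in range(k)]
    for x, y in zip(xs, ys):
        c = nearest(centroids, x)
        granules[c][0].append([xi - ci for xi, ci in zip(x, centroids[c])])
        granules[c][1].append(y)
    # The granules are independent, so these fits could be dispatched in parallel.
    return [(cen, train_linear_svm(gx, gy, seed=seed))
            for cen, (gx, gy) in zip(centroids, granules) if gx]


def predict_csvm(model, x):
    """Route x to its nearest granule and apply that granule's SVM (labels are +1/-1)."""
    cen, (w, b) = model[nearest([c for c, _ in model], x)]
    score = sum(wj * (xj - cj) for wj, xj, cj in zip(w, x, cen)) + b
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    rng = random.Random(1)
    xs, ys = [], []
    # Two synthetic granules; inside each, the label depends on the first coordinate.
    for cx in (0.0, 10.0):
        for _ in range(50):
            x = [cx + rng.uniform(-1, 1), cx + rng.uniform(-1, 1)]
            xs.append(x)
            ys.append(1 if x[0] > cx else -1)
    model = fit_csvm(xs, ys, k=2)
    acc = sum(predict_csvm(model, x) == y for x, y in zip(xs, ys)) / len(xs)
    print(f"granules: {len(model)}, training accuracy: {acc:.2f}")
```

Because each granule's model depends only on that granule's samples, the list comprehension in `fit_csvm` is the natural place to distribute training across workers, which is the property the abstract relies on for scaling to huge datasets.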

Author information

Authors and Affiliations

Department of Computer Science, Southeast University, Nanjing, 210096, China
Jieyue He
Department of Computer Science, USA
Wei Zhong, Robert Harrison & Yi Pan
Department of Biology, Georgia State University, Atlanta, GA, 30303-4110, USA
Robert Harrison & Phang C. Tai
GCC Distinguished Cancer Scholar, USA
Robert Harrison

Editor information

Editors and Affiliations

Advanced Computing and Emerging Technologies Centre, The School of Systems Engineering, University of Reading, Reading, RG6 6AY, United Kingdom
Vassil N. Alexandrov
Department of Mathematics and Computer Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Geert Dick van Albada
Faculty of Sciences, Section of Computational Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands
Peter M. A. Sloot
Computer Science Department, University of Tennessee, Knoxville, TN 37996-3450, USA
Jack Dongarra

Cite this paper

He, J., Zhong, W., Harrison, R., Tai, P.C., Pan, Y. (2006). Clustering Support Vector Machines and Its Application to Local Protein Tertiary Structure Prediction. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science – ICCS 2006. ICCS 2006. Lecture Notes in Computer Science, vol 3992. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11758525_96

DOI: https://doi.org/10.1007/11758525_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34381-3
Online ISBN: 978-3-540-34382-0
eBook Packages: Computer Science, Computer Science (R0)
