Abstract
This paper describes a machine learning system that discovered a “negative motif” in transmembrane domain identification from amino acid sequences, and reports experiments on protein data from the PIR database. We introduce decision trees whose nodes are labeled with regular patterns. As a hypothesis, the system produces such a decision tree from a small number of randomly chosen positive and negative examples from PIR. Experiments show that our system finds reasonable hypotheses very successfully. As a theoretical foundation, we show that the class of languages defined by decision trees of depth at most d over k-variable regular patterns is polynomial-time learnable in the sense of probably approximately correct (PAC) learning for any fixed d, k ≥ 0.
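To make the hypothesis class concrete, the following is a minimal Python sketch of a decision tree whose internal nodes are labeled with regular patterns. It is not the authors' implementation. The list encoding of patterns, the names `matches`, `Node`, and `classify`, the assumption that each variable is replaced by a nonempty string, and the toy tree itself are all illustrative assumptions, not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# A regular pattern is a string of constants and variables in which each
# variable occurs at most once. Assumed encoding: None stands for a fresh
# variable, so [None, "D", None] encodes the pattern x1 D x2.
Pattern = List[Optional[str]]


def matches(pattern: Pattern, s: str) -> bool:
    """Return True if s is in the language of the regular pattern.

    Assumption: every variable is substituted by a NONEMPTY string.
    Because no variable repeats, the test reduces to a regular expression.
    """
    regex = "".join(".+" if tok is None else re.escape(tok) for tok in pattern)
    return re.fullmatch(regex, s, flags=re.DOTALL) is not None


@dataclass
class Node:
    """Internal node: test a pattern, then follow one of two subtrees."""
    pattern: Pattern
    if_match: "Tree"
    if_no_match: "Tree"


# A tree is either a leaf (True = positive, False = negative) or a Node.
Tree = Union[bool, Node]


def classify(tree: Tree, s: str) -> bool:
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, Node):
        tree = tree.if_match if matches(tree.pattern, s) else tree.if_no_match
    return tree


# Hypothetical depth-2 tree (NOT the tree reported in the paper): a sequence
# is labeled positive only if it matches neither x1 D x2 nor x1 E x2.
toy_tree = Node([None, "D", None], False,
                Node([None, "E", None], False, True))

if __name__ == "__main__":
    for seq in ["LLVALLA", "LLDLL", "AAEAA"]:
        print(seq, classify(toy_tree, seq))
```

In this toy tree a sequence is classified as positive because it fails to match certain patterns, which is one way to read the idea of a "negative motif". The depth of the tree and the number of variables per pattern correspond to the parameters d and k in the learnability result.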
Author information
Setsuo Arikawa, Ph. D.: He is a Professor and the Director of the Research Institute of Fundamental Information Science, Kyushu University. He received the B. S. degree in 1964, the M. S. degree in 1966 and the Dr. Sci. degree in 1969, all in Mathematics, from Kyushu University. He has been working on algorithmic learning theory, logic and inference in AI, and information retrieval systems.
Satoru Kuhara, Ph. D.: He is an Associate Professor at the Graduate School of Genetic Resources Technology, Kyushu University. He received the B. A. degree in 1974, the M. Agr. degree in 1976 and the Dr. Agr. degree in 1980 from Kyushu University. His present interests include computer analysis of genetic information and protein structures.
Satoru Miyano, Ph. D.: He received the B. S. degree in 1977, the M. S. degree in 1979 and the Dr. Sci. degree in 1984 from Kyushu University. Presently, he is a Professor at the Research Institute of Fundamental Information Science, Kyushu University. He has been conducting research on computational complexity, parallel algorithms, algorithmic learning theory, and genome informatics.
Yasuhito Mukouchi: He is currently a doctoral student in the Department of Information Systems, Kyushu University. He received the B. E. and the M. A. degrees from the University of Osaka Prefecture in 1987 and 1991, respectively. His research interests are inductive inference and computational learning theory.
Ayumi Shinohara: He received the B. S. degree in Mathematics in 1988 and the M. S. degree in Information Systems in 1990 from Kyushu University. Currently, he is an Assistant at the Research Institute of Fundamental Information Science, Kyushu University. He has been working on computational learning theory and genome informatics.
Takeshi Shinohara, Ph. D.: He is an Associate Professor in the Department of Artificial Intelligence, Kyushu Institute of Technology. He received the B. S. degree from Kyoto University in 1980, and the M. S. and the Dr. Sci. degrees from Kyushu University in 1982 and 1986, respectively. His present interests are information retrieval, string pattern matching algorithms and computational learning theory.
About this article
Cite this article
Arikawa, S., Miyano, S., Shinohara, A. et al. A machine discovery from amino acid sequences by decision trees over regular patterns. New Gener Comput 11, 361–375 (1993). https://doi.org/10.1007/BF03037183