Abstract
This paper presents a novel paradigm for learning languages: strings are mapped to an appropriate high-dimensional feature space, and a separating hyperplane is learned in that space. It initiates the study of the linear separability of automata and languages by examining the rich class of piecewise-testable languages. It introduces a high-dimensional feature map and proves that piecewise-testable languages are linearly separable in that space; the proof relies on combinatorial results on words concerning subsequences. It further shows that the positive definite kernel associated with this embedding can be computed in quadratic time, and it examines the use of support vector machines with this kernel to determine a separating hyperplane, together with the corresponding learning guarantees. Finally, it proves that every language that is linearly separable under a regular finite cover embedding, a generalization of the embedding used here, is regular.
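The abstract does not spell out the feature map. The sketch below assumes it is the subsequence embedding, which sends a string x to the 0/1 indicator vector of its distinct subsequences. Under that assumption the kernel K(x, y) is the number of distinct strings, the empty string included, that are subsequences of both x and y. The function names `subsequence_features`, `_next_occurrence` and `common_subsequence_kernel` are hypothetical, and the dynamic program is an illustration rather than the authors' algorithm. It runs in O(|Σ|·|x|·|y|) time, which is quadratic in the string lengths for a fixed alphabet Σ.

```python
from itertools import combinations


def subsequence_features(s: str) -> set[str]:
    """Support of the (assumed) explicit embedding phi(s): the set of distinct
    subsequences of s, including the empty string. Exponential in len(s);
    used here only to check the kernel on short strings."""
    return {
        "".join(s[k] for k in idx)
        for r in range(len(s) + 1)
        for idx in combinations(range(len(s)), r)
    }


def _next_occurrence(s: str) -> list[dict[str, int]]:
    """nxt[i][c] = smallest k >= i with s[k] == c (key absent if none)."""
    nxt: list[dict[str, int]] = [dict() for _ in range(len(s) + 1)]
    for i in range(len(s) - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])
        nxt[i][s[i]] = i
    return nxt


def common_subsequence_kernel(x: str, y: str) -> int:
    """K(x, y) = <phi(x), phi(y)>: number of distinct strings (empty string
    included) that are subsequences of both x and y.

    K[i][j] counts distinct common subsequences of x[i:] and y[j:]. A nonempty
    one, c + v, can be matched greedily at the first c in each suffix, so
    K[i][j] = 1 + sum over shared letters c of K[a + 1][b + 1], where a and b
    are the first positions of c in x[i:] and y[j:].
    """
    alphabet = set(x) & set(y)
    nx, ny = _next_occurrence(x), _next_occurrence(y)
    n, m = len(x), len(y)
    # Row n and column m stay 1: an empty suffix shares only the empty string.
    K = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            total = 1  # the empty subsequence
            for c in alphabet:
                a, b = nx[i].get(c), ny[j].get(c)
                if a is not None and b is not None:
                    total += K[a + 1][b + 1]
            K[i][j] = total
    return K[0][0]


if __name__ == "__main__":
    x, y = "abc", "acb"
    print(common_subsequence_kernel(x, y))  # 6
    print(len(subsequence_features(x) & subsequence_features(y)))  # 6
```

The two printed values agree because K is the inner product of 0/1 indicator vectors, i.e. the size of the intersection of the two subsequence sets. For the same reason, any Gram matrix built from K is positive semidefinite and can be handed to an SVM solver that accepts precomputed kernels.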
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kontorovich, L., Cortes, C., Mohri, M. (2006). Learning Linearly Separable Languages. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds) Algorithmic Learning Theory. ALT 2006. Lecture Notes in Computer Science, vol. 4264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11894841_24
DOI: https://doi.org/10.1007/11894841_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46649-9
Online ISBN: 978-3-540-46650-5
eBook Packages: Computer Science, Computer Science (R0)