Learning Linearly Separable Languages

Kontorovich, Leonid; Cortes, Corinna; Mohri, Mehryar

doi:10.1007/11894841_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4264)

Included in the following conference series:

International Conference on Algorithmic Learning Theory

Abstract

This paper presents a novel paradigm for learning languages that consists of mapping strings to an appropriate high-dimensional feature space and learning a separating hyperplane in that space. It initiates the study of the linear separability of automata and languages by examining the rich class of piecewise-testable languages. It introduces a high-dimensional feature map and proves that piecewise-testable languages are linearly separable in that space. The proof makes use of word-combinatorial results relating to subsequences. The paper also shows that the positive definite kernel associated with this embedding can be computed in quadratic time, and it examines the use of support vector machines in combination with this kernel to determine a separating hyperplane, together with the corresponding learning guarantees. Finally, it proves that all languages linearly separable under a regular finite cover embedding, a generalization of the embedding used here, are regular.
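
The abstract does not spell out the feature map, so the following is only a sketch under a stated assumption: each string is embedded as an indicator vector with one coordinate per string u, set to 1 when u occurs as a (not necessarily contiguous) subsequence. Under that assumption, the kernel value for two strings is the number of distinct subsequences they share, counting the empty string. The function names and the dynamic program below are illustrative and are not taken from the paper; the running time is O(|alphabet| * |x| * |y|), which is quadratic in the string lengths for a fixed alphabet.

```python
def next_occurrence(s):
    """nxt[i][c] = smallest index k >= i with s[k] == c (key absent if none)."""
    n = len(s)
    nxt = [dict() for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])  # inherit occurrences further to the right
        nxt[i][s[i]] = i           # position i is now the nearest s[i]
    return nxt


def subsequence_kernel(x, y):
    """Count distinct common subsequences of x and y (empty string included).

    Assumption (not stated in the abstract): this equals the inner product of
    the subsequence-indicator embeddings of x and y.
    """
    nx, ny = next_occurrence(x), next_occurrence(y)
    n, m = len(x), len(y)
    # f[i][j] = number of distinct common subsequences of x[i:] and y[j:].
    # When either suffix is empty, only the empty string is shared.
    f = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            total = 1  # the empty subsequence
            # A nonempty common subsequence starting with c is matched
            # greedily at the first c in each suffix, so each distinct
            # string is counted exactly once.
            for c, k in nx[i].items():
                l = ny[j].get(c)
                if l is not None:
                    total += f[k + 1][l + 1]
            f[i][j] = total
    return f[0][0]


if __name__ == "__main__":
    # Shared subsequences of "ab" and "ab": "", "a", "b", "ab"
    print(subsequence_kernel("ab", "ab"))  # → 4
```

A Gram matrix built from such a function could be handed to any kernel method that accepts precomputed kernels; which solver to use, and with what settings, is left open here.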

Author information

Authors and Affiliations

Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
Leonid Kontorovich
Google Research, 1440 Broadway, New York, NY, 10018, USA
Corinna Cortes & Mehryar Mohri
Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY, 10012, USA
Mehryar Mohri

Editor information

Editors and Affiliations

Departament de Llenguatges i Sistemes Informàtics Laboratori d’Algorísmica Relacional, Complexitat i Aprenentatge, Universitat Politècnica de Catalunya, Barcelona
José L. Balcázar
Google, 1600 Amphitheatre Parkway, 94043, Mountain View, CA, USA
Philip M. Long
Department of Computer Science and Department of Mathematics, National University of Singapore, 117543, Singapore, Republic of Singapore
Frank Stephan

About this paper

Cite this paper

Kontorovich, L., Cortes, C., Mohri, M. (2006). Learning Linearly Separable Languages. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds) Algorithmic Learning Theory. ALT 2006. Lecture Notes in Computer Science, vol 4264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11894841_24

DOI: https://doi.org/10.1007/11894841_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46649-9
Online ISBN: 978-3-540-46650-5
eBook Packages: Computer Science, Computer Science (R0)
