Abstract
Language models are used in a variety of fields to support other tasks such as classification, next-symbol prediction and pattern analysis. To compare language models, to measure the quality of a learned model with respect to an empirical distribution, or to evaluate the progress of a learning process, we propose using distances based on the L2 norm, or quadratic distances. We prove that these distances can not only be estimated through sampling but can also be effectively computed when both distributions are represented by stochastic deterministic finite automata. A set of experiments shows fast convergence of the sampled distance and good scalability, which allows the distance to be used to decide whether two distributions are equal when only samples are available, or to classify texts.
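The abstract states that the quadratic distance d2(D, D') = sqrt(Σ_w (p(w) − p'(w))²) can be computed exactly for two stochastic deterministic automata and also estimated from samples. The sketch below is an illustration under stated assumptions, not the authors' implementation. It expands the square into Σp² + Σp'² − 2Σpp' and computes each cross term Σ_w p(w)q(w) by solving one linear system over the reachable state pairs of the product automaton, which is one standard way to obtain such sums. It then compares the exact value with a sampling estimate. The `PDFA` encoding and the names `cross_sum`, `l2_distance`, `sample` and `empirical_l2` are hypothetical. The automata are assumed to be consistent, meaning every string is generated with probability summing to 1 and every state eventually stops.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PDFA:
    """Hypothetical encoding of a stochastic deterministic finite automaton."""
    initial: int
    final: Dict[int, float]                          # F(q): probability of stopping in q
    trans: Dict[Tuple[int, str], Tuple[int, float]]  # (q, a) -> (next state, P(q, a))

    def outgoing(self, q: int) -> List[Tuple[str, int, float]]:
        return [(s, d, p) for (src, s), (d, p) in self.trans.items() if src == q]


def _solve(m: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting; solves m x = b."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def cross_sum(a: PDFA, b: PDFA) -> float:
    """Sum over all strings w of p_a(w) * p_b(w), via the product automaton.

    eta(qa, qb) = F_a(qa) F_b(qb) + sum_s P_a(qa, s) P_b(qb, s) eta(next pair)
    """
    start = (a.initial, b.initial)
    index, pairs, edges = {start: 0}, [start], []
    k = 0
    while k < len(pairs):  # breadth-first walk over reachable state pairs
        qa, qb = pairs[k]
        out_b = {s: (d, p) for s, d, p in b.outgoing(qb)}
        for s, da, pa in a.outgoing(qa):
            if s in out_b:  # only symbols both automata can emit contribute
                db, pb = out_b[s]
                if (da, db) not in index:
                    index[(da, db)] = len(pairs)
                    pairs.append((da, db))
                edges.append((k, index[(da, db)], pa * pb))
        k += 1
    n = len(pairs)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        m[i][j] -= w  # builds (I - M) eta = f
    f = [a.final.get(qa, 0.0) * b.final.get(qb, 0.0) for qa, qb in pairs]
    return _solve(m, f)[0]


def l2_distance(a: PDFA, b: PDFA) -> float:
    sq = cross_sum(a, a) + cross_sum(b, b) - 2.0 * cross_sum(a, b)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding error


def sample(a: PDFA, rng: random.Random) -> str:
    """Draw one string: stop in q with prob F(q), else follow a transition."""
    q, out = a.initial, []
    while True:
        r = rng.random()
        acc = a.final.get(q, 0.0)
        if r < acc:
            return "".join(out)
        for sym, dst, p in a.outgoing(q):
            acc += p
            if r < acc:
                out.append(sym)
                q = dst
                break
        else:
            return "".join(out)  # floating-point slack: treat as stopping


def empirical_l2(a: PDFA, b: PDFA, n: int, seed: int = 0) -> float:
    """L2 distance between the empirical distributions of n samples each."""
    rng = random.Random(seed)
    ca = Counter(sample(a, rng) for _ in range(n))
    cb = Counter(sample(b, rng) for _ in range(n))
    return math.sqrt(sum((ca[w] / n - cb[w] / n) ** 2 for w in set(ca) | set(cb)))


A = PDFA(0, {0: 0.5}, {(0, "a"): (0, 0.5)})  # p(a^n) = 0.5^(n+1)
B = PDFA(0, {0: 0.5}, {(0, "b"): (0, 0.5)})  # p(b^n) = 0.5^(n+1)
print(l2_distance(A, B))          # exact: sqrt(1/6), about 0.408
print(empirical_l2(A, B, 20000))  # sampled estimate, close to the exact value
```

In this toy example the two automata share only the empty string. That gives Σp² = Σp'² = 1/3 and Σpp' = 1/4, so d2² = 1/6. Solving a linear system instead of enumerating strings is what makes the exact value computable even though both distributions have infinite support.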
Keywords
- Speech Recognition
- Language Model
- Automatic Speech Recognition
- Finite Automaton
- Deterministic Finite Automaton
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Murgue, T., de la Higuera, C. (2004). Distances between Distributions: Comparing Language Models. In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2004. Lecture Notes in Computer Science, vol 3138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27868-9_28
DOI: https://doi.org/10.1007/978-3-540-27868-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22570-6
Online ISBN: 978-3-540-27868-9
eBook Packages: Springer Book Archive