Abstract
This paper presents a new grammar induction algorithm for probabilistic context-free grammars (PCFGs). One approach to PCFG induction is based on parameter estimation. Following this approach, we apply variational Bayes (VB), an approximation of Bayesian learning, to PCFGs. VB has been shown empirically to be less prone to overfitting, and its free energy has been used successfully for model selection. Our algorithm can be seen as a generalization of previously proposed PCFG induction algorithms. In experiments, the induced grammars achieve better parsing results than those of other PCFG induction algorithms. We also give examples of recursive grammatical structures found by the proposed algorithm.
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kurihara, K., Sato, T. (2006). Variational Bayesian Grammar Induction for Natural Language. In: Sakakibara, Y., Kobayashi, S., Sato, K., Nishino, T., Tomita, E. (eds) Grammatical Inference: Algorithms and Applications. ICGI 2006. Lecture Notes in Computer Science, vol 4201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11872436_8
DOI: https://doi.org/10.1007/11872436_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45264-5
Online ISBN: 978-3-540-45265-2
eBook Packages: Computer Science (R0)