DOI: 10.1145/1102351.1102471
Article

Exploiting syntactic, semantic and lexical regularities in language modeling via directed Markov random fields

Published: 07 August 2005

Abstract

We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) for the purpose of statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context-sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified EM method, the generalized inside-outside algorithm, which extends the inside-outside algorithm to incorporate the effects of the n-gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n-gram counts in the presence of hidden variables. We also derive an analogous algorithm to calculate the probability of an initial subsequence of a sentence generated by the composite language model. Our experimental results on the Wall Street Journal corpus show significant reductions in perplexity compared to the state-of-the-art baseline trigram model with Good-Turing and Kneser-Ney smoothing.
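The cubic-time estimation mentioned in the abstract builds on the classical inside-outside algorithm. As background only, the following Python sketch shows the standard inside pass for a toy PCFG in Chomsky normal form, which computes a sentence's probability by summing over all parses in O(n^3) time; the grammar, rule tables and function name are hypothetical, and the paper's generalized inside-outside algorithm, which additionally incorporates the n-gram and PLSA components, is not reproduced here.

```python
from collections import defaultdict

def inside_probability(words, lexical_rules, binary_rules, start="S"):
    """Standard inside (CKY-style) pass for a PCFG in Chomsky normal form.

    lexical_rules: dict mapping (nonterminal, word) -> P(nonterminal -> word)
    binary_rules:  dict mapping (parent, left, right) -> P(parent -> left right)
    Returns P(start =>* words), summing over all parses, in O(n^3) time.
    """
    n = len(words)
    # beta[i][j][A] = probability that A derives words[i:j]
    beta = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1 are covered by lexical rules A -> w.
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w:
                beta[i][i + 1][A] += p

    # Recursion: longer spans are built from two adjacent sub-spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary_rules.items():
                    if beta[i][k][B] > 0.0 and beta[k][j][C] > 0.0:
                        beta[i][j][A] += p * beta[i][k][B] * beta[k][j][C]

    return beta[0][n][start]

# Toy (hypothetical) grammar and sentence, just to exercise the routine.
lexical = {("N", "time"): 0.5, ("N", "flies"): 0.5, ("V", "flies"): 1.0}
binary = {("S", "N", "V"): 1.0}
print(inside_probability(["time", "flies"], lexical, binary))  # 0.5 * 1.0 * 1.0 = 0.5
```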

Published In

ICML '05: Proceedings of the 22nd international conference on Machine learning
August 2005
1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

Publisher

Association for Computing Machinery

New York, NY, United States
