DOI: 10.1145/1102351.1102471
Article

Exploiting syntactic, semantic and lexical regularities in language modeling via directed Markov random fields

Published: 07 August 2005

Abstract

We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) for the purpose of statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context-sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified EM method, the generalized inside-outside algorithm, which extends the inside-outside algorithm to incorporate the effects of the n-gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n-gram counts in the presence of hidden variables. We also derive an analogous algorithm to calculate the probability of an initial subsequence of a sentence generated by the composite language model. Our experimental results on the Wall Street Journal corpus show significant reductions in perplexity compared to the state-of-the-art baseline trigram model with Good-Turing and Kneser-Ney smoothing.
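The cubic-time estimation mentioned in the abstract builds on the classical inside-outside algorithm. As background only, the following Python sketch shows the standard inside pass for a toy PCFG in Chomsky normal form, which computes a sentence's probability by summing over all parses in O(n^3) time; the grammar, rule tables and function name are hypothetical, and the paper's generalized inside-outside algorithm, which additionally incorporates the n-gram and PLSA components, is not reproduced here.

```python
from collections import defaultdict

def inside_probability(words, lexical_rules, binary_rules, start="S"):
    """Standard inside (CKY-style) pass for a PCFG in Chomsky normal form.

    lexical_rules: dict mapping (nonterminal, word) -> P(nonterminal -> word)
    binary_rules:  dict mapping (parent, left, right) -> P(parent -> left right)
    Returns P(start =>* words), summing over all parses, in O(n^3) time.
    """
    n = len(words)
    # beta[i][j][A] = probability that A derives words[i:j]
    beta = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Base case: spans of length 1 are covered by lexical rules A -> w.
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w:
                beta[i][i + 1][A] += p

    # Recursion: longer spans are built from two adjacent sub-spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary_rules.items():
                    if beta[i][k][B] > 0.0 and beta[k][j][C] > 0.0:
                        beta[i][j][A] += p * beta[i][k][B] * beta[k][j][C]

    return beta[0][n][start]

# Toy (hypothetical) grammar and sentence, just to exercise the routine.
lexical = {("N", "time"): 0.5, ("N", "flies"): 0.5, ("V", "flies"): 1.0}
binary = {("S", "N", "V"): 1.0}
print(inside_probability(["time", "flies"], lexical, binary))  # 0.5 * 1.0 * 1.0 = 0.5
```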

Published In

ICML '05: Proceedings of the 22nd international conference on Machine learning
August 2005
1113 pages
ISBN: 1595931805
DOI: 10.1145/1102351

Publisher

Association for Computing Machinery

New York, NY, United States
