
A large scale distributed syntactic, semantic and lexical language model for machine translation

Authors:

Shaojun Wang

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Pages 201 - 210

Published: 19 June 2011

Abstract

This paper presents an attempt at building a large-scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm. The composite language model is trained on corpora of up to a billion tokens and stored on a supercomputer. Training uses a convergent N-best list approximate EM algorithm with linear time complexity, followed by a second EM algorithm that improves word prediction power. The resulting model yields a drastic perplexity reduction relative to n-gram models. When applied to re-ranking the N-best list of a state-of-the-art parsing-based machine translation system, it also achieves significantly better translation quality, as measured by BLEU score and "readability".
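The abstract reports two kinds of evaluation: perplexity against n-gram baselines, and re-ranking an MT system's N-best list. The Python sketch below shows both computations generically. It is not the paper's composite model or decoder. The `LogProbFn` interface, the toy `unigram_lm`, the `(words, decoder_score)` hypothesis format, and the single interpolation weight `lm_weight` are all hypothetical stand-ins chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (not from the paper): a language model is any function
# returning the natural-log probability log P(word | history).
LogProbFn = Callable[[Sequence[str], str], float]

# Hypothetical N-best entry: (target words, decoder model score in log domain).
Hypothesis = Tuple[List[str], float]


def sentence_logprob(lm: LogProbFn, words: Sequence[str]) -> float:
    """Sum of log P(w_i | w_1 .. w_{i-1}) over the words of one sentence."""
    return sum(lm(words[:i], w) for i, w in enumerate(words))


def perplexity(lm: LogProbFn, corpus: Sequence[Sequence[str]]) -> float:
    """PP = exp(-(1/N) * sum of log P(w_i | h_i)) over all N tokens in the corpus."""
    total_logprob = sum(sentence_logprob(lm, sent) for sent in corpus)
    n_tokens = sum(len(sent) for sent in corpus)
    return math.exp(-total_logprob / n_tokens)


def rerank(nbest: Sequence[Hypothesis], lm: LogProbFn, lm_weight: float) -> List[Hypothesis]:
    """Sort an N-best list best-first by decoder_score + lm_weight * LM log-probability.

    Assumes decoder scores are log-domain and higher-is-better.
    """
    def combined(hyp: Hypothesis) -> float:
        words, decoder_score = hyp
        return decoder_score + lm_weight * sentence_logprob(lm, words)

    return sorted(nbest, key=combined, reverse=True)


# Toy unigram model, used only to make the example runnable.
_UNIGRAM = {"the": 0.5, "cat": 0.25, "sat": 0.25}


def unigram_lm(history: Sequence[str], word: str) -> float:
    # Ignores the history; unseen words get a small floor probability.
    return math.log(_UNIGRAM.get(word, 1e-6))


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "cat"]]
    print(f"perplexity: {perplexity(unigram_lm, corpus):.3f}")
    nbest = [(["cat", "sat"], -1.0), (["the", "cat"], -1.2)]
    print(rerank(nbest, unigram_lm, lm_weight=1.0)[0][0])
```

With `lm_weight=1.0`, the toy model's higher probability for "the" lifts the second hypothesis above the first despite its lower decoder score; with `lm_weight=0.0` the decoder's own ordering is kept. Lower perplexity corresponds to a higher geometric-mean per-token probability on the evaluation text. In practice such a weight would be tuned on held-out data rather than fixed by hand.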

Cited By

Pauls A, Klein D, Li H, Lin C, Osborne M (2012). Large-scale syntactic language modeling with treelets. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pp. 959-968. DOI: 10.5555/2390524.2390654. Online publication date: 8-Jul-2012.
https://dl.acm.org/doi/10.5555/2390524.2390654

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Machine translation

Recommendations

An incremental syntactic language model for statistical phrase-based machine translation
Syntactic discriminative language model rerankers for statistical machine translation

This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language ...
Preordering using a Target-Language Parser via Cross-Language Syntactic Projection for Statistical Machine Translation

When translating between languages with widely different word orders, word reordering can present a major challenge. Although some word reordering methods do not employ source-language syntactic structures, such structures are inherently useful for word ...


Published In

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

June 2011

1696 pages

ISBN:9781932432879

General Chair:
Dekang Lin
Google

Publisher

Association for Computational Linguistics

United States

Qualifiers

Research-article

Acceptance Rates

Overall Acceptance Rate 240 of 768 submissions, 31%

Bibliometrics

Total Citations: 1
Total Downloads: 178
Downloads (last 12 months): 24
Downloads (last 6 weeks): 9

Reflects downloads up to 22 Sep 2024
