DOI: 10.5555/2002472.2002499
research-article

A large scale distributed syntactic, semantic and lexical language model for machine translation

Published: 19 June 2011

Abstract

This paper presents an attempt at building a large-scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm. The composite language model is trained on corpora of up to a billion tokens, stored on a supercomputer, using a convergent N-best list approximate EM algorithm with linear time complexity, followed by an EM algorithm that improves word prediction power. The large-scale distributed composite language model yields drastic perplexity reductions relative to n-gram models and achieves significantly better translation quality, as measured by BLEU score and "readability", when applied to re-ranking the N-best list of a state-of-the-art parsing-based machine translation system.
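The evaluation described in the abstract depends on two standard operations. The first is measuring perplexity, PPL = exp(-(1/N) Σ log p(w_i | h_i)) over N predicted tokens. The second is re-ranking an N-best list by adding a weighted language-model score to each hypothesis's decoder score. The sketch below shows both under explicit assumptions. A toy bigram model with Jelinek-Mercer interpolation stands in for the paper's composite model, which additionally conditions on syntactic structure and document topics. The names `InterpolatedBigramLM`, `perplexity`, and `rerank`, the parameter `lm_weight`, and the `(hypothesis, decoder_score)` input format are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


class InterpolatedBigramLM:
    """Toy stand-in for a language model (hypothetical, NOT the paper's
    composite model): Jelinek-Mercer interpolation of a maximum-likelihood
    bigram estimate with an add-one smoothed unigram estimate."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam  # weight on the bigram estimate
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = sum(self.unigrams.values())
        # +1 reserves probability mass for a single unseen-word class
        self.vocab_size = len(self.unigrams) + 1

    def prob(self, prev, word):
        """P(word | prev); always > 0 thanks to the smoothed unigram term."""
        p_uni = (self.unigrams[word] + 1) / (self.total + self.vocab_size)
        prev_count = self.unigrams[prev]
        p_bi = self.bigrams[(prev, word)] / prev_count if prev_count else 0.0
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def log_prob(self, sentence):
        """Natural-log probability of a whitespace-tokenized sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


def perplexity(lm, sentences):
    """exp of the average negative log-probability per predicted token."""
    log_sum, n_predicted = 0.0, 0
    for sent in sentences:
        log_sum += lm.log_prob(sent)
        n_predicted += len(sent.split()) + 1  # words plus the </s> event
    return math.exp(-log_sum / n_predicted)


def rerank(nbest, lm, lm_weight=1.0):
    """Sort (hypothesis, decoder_score) pairs, best first, by
    decoder_score + lm_weight * LM log-probability (log-linear combination)."""
    return sorted(
        nbest,
        key=lambda pair: pair[1] + lm_weight * lm.log_prob(pair[0]),
        reverse=True,
    )


if __name__ == "__main__":
    train = ["the cat sat on the mat", "the dog sat on the rug", "a cat sat on a mat"]
    lm = InterpolatedBigramLM(train)
    # the scrambled hypothesis has the better decoder score but a far worse LM score
    nbest = [("mat the on sat cat the", -4.0), ("the cat sat on the mat", -4.2)]
    print(rerank(nbest, lm)[0][0])  # prints: the cat sat on the mat
    print(perplexity(lm, ["the dog sat on a mat"]))
```

In this sketch the LM weight is fixed at 1.0 for simplicity. In a real system, weights of this kind are typically tuned on held-out data, for instance with minimum error rate training against BLEU.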

Cited By

  • (2012) Large-scale syntactic language modeling with treelets. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pp. 959-968. DOI: 10.5555/2390524.2390654. Online publication date: 8 July 2012.

Published In

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
June 2011
1696 pages
ISBN: 9781932432879

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 240 of 768 submissions, 31%

Article Metrics

• Downloads (last 12 months): 24
• Downloads (last 6 weeks): 9
Reflects downloads up to 22 Sep 2024
