Abstract
We present a Chinese-English Statistical Machine Translation (SMT) system based on dependency tree mappings. We parse the English translation of the Penn Chinese Treebank with a state-of-the-art dependency parser, which makes the treebank bilingual, and then learn a tree-to-tree dependency mapping model from it. We also train a phrase-based translation model and collect a bilingual phrase lexicon to bootstrap a treelet translation model. For decoding, we apply the same dependency parser to the Chinese input, integrate the learned translation model with a variety of dependency-tree-based probability models in a log-linear framework, and then find the best English dependency tree by dynamic programming. Finally, the English tree is flattened to produce the translation. We evaluate our system on the 863 and NIST 2005 Chinese-English MT test data and find that the dependency-based model significantly outperforms Caravan, our phrase-based SMT system, which participated in NIST 2006 and IWSLT 2006.
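The last two decoding steps described in the abstract, log-linear scoring of candidate English dependency trees and flattening the chosen tree into a word sequence, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DepNode` class, the feature names `translation` and `dependency_lm`, the weights, and the probabilities are hypothetical and are not taken from the paper. The sketch also picks the best tree by exhaustively comparing two hand-built candidates, whereas the system described here searches by dynamic programming.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DepNode:
    """A node in a hypothetical English dependency tree."""
    word: str
    left: List["DepNode"] = field(default_factory=list)   # dependents before the head
    right: List["DepNode"] = field(default_factory=list)  # dependents after the head


def flatten(node: DepNode) -> List[str]:
    """In-order traversal: left dependents, then the head word, then right dependents."""
    words: List[str] = []
    for child in node.left:
        words.extend(flatten(child))
    words.append(node.word)
    for child in node.right:
        words.extend(flatten(child))
    return words


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of log feature values: sum_i lambda_i * log h_i."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


# Two hand-built candidate trees with made-up model probabilities.
cand_a = DepNode("likes", left=[DepNode("he")], right=[DepNode("tea")])
cand_b = DepNode("likes", left=[DepNode("tea")], right=[DepNode("he")])
candidates = [
    (cand_a, {"translation": 0.4, "dependency_lm": 0.5}),
    (cand_b, {"translation": 0.4, "dependency_lm": 0.1}),
]
weights = {"translation": 1.0, "dependency_lm": 0.8}  # illustrative, untuned weights

# Exhaustive comparison stands in for the dynamic-programming search.
best_tree, _ = max(candidates, key=lambda c: log_linear_score(c[1], weights))
translation = " ".join(flatten(best_tree))
print(translation)
```

Storing left and right dependents separately keeps word order implicit in the tree, so flattening reduces to a single in-order traversal.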
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shi, X., Chen, Y., Jia, J. (2007). Dependency-Based Chinese-English Statistical Machine Translation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_34
DOI: https://doi.org/10.1007/978-3-540-70939-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science, Computer Science (R0)