HPSG-Based Preprocessing for English-to-Japanese Translation

Authors: Hideki Isozaki, Katsuhito Sudoh, Hajime Tsukada, and Kevin Duh

ACM Transactions on Asian Language Information Processing (TALIP), Volume 11, Issue 3

Article No.: 8, Pages 1 - 16

https://doi.org/10.1145/2334801.2334802

Published: 01 September 2012

Abstract

Japanese sentences have completely different word orders from their corresponding English sentences. Typical phrase-based statistical machine translation (SMT) systems such as Moses search for the best word permutation within a given distance limit (the distortion limit). For English-to-Japanese translation, a large distortion limit is needed to obtain acceptable translations, and the number of translation candidates becomes extremely large. As a result, SMT systems often fail to find acceptable translations within a limited time. To solve this problem, some researchers use rule-based preprocessing approaches, which reorder English words into a Japanese-like order by using dozens of rules. Our idea is based on the following two observations: (1) Japanese is a typical head-final language, and (2) we can detect the heads of English sentences with a head-driven phrase structure grammar (HPSG) parser. The main contributions of this article are twofold. First, we demonstrate how an off-the-shelf, state-of-the-art HPSG parser enables us to write the reordering rules at an abstract level and can easily improve the quality of English-to-Japanese translation. Second, we show that syntactic heads achieve better results than semantic heads. The proposed method outperforms the best system of the NTCIR-7 PATMT EJ task.
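
The two observations in the abstract suggest a very compact reordering rule: if a parser marks which child of each phrase is the head, moving every head after its sibling yields a head-final word order. The following Python sketch illustrates that general idea only. It is not the authors' implementation: the `Leaf` and `Node` classes, the `head` field, and the hand-built example tree are hypothetical stand-ins for real HPSG parser output, and any exceptions or extra rules a practical system would need are omitted.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """A single word (hypothetical stand-in for a parser terminal)."""
    word: str


@dataclass
class Node:
    """A binary phrase; `head` says which child is the head: "left" or "right"."""
    left: "Tree"
    right: "Tree"
    head: str


Tree = Union[Leaf, Node]


def head_finalize(tree: Tree) -> List[str]:
    """Return the words of `tree` with each head placed after its sibling."""
    if isinstance(tree, Leaf):
        return [tree.word]
    left_words = head_finalize(tree.left)
    right_words = head_finalize(tree.right)
    if tree.head == "left":
        # The head precedes its sibling in English; swap so the head comes last.
        return right_words + left_words
    # The head is already final; keep the original order.
    return left_words + right_words


# Hand-built tree for "John hit a ball" (assumed head marks, not parser output):
# the sentence is headed by the verb phrase, the verb phrase by the verb,
# and the noun phrase by the noun.
example = Node(
    left=Leaf("John"),
    right=Node(
        left=Leaf("hit"),
        right=Node(left=Leaf("a"), right=Leaf("ball"), head="right"),
        head="left",
    ),
    head="right",
)

reordered = head_finalize(example)
print(" ".join(reordered))  # John a ball hit
```

In this toy example the verb moves to the end of the sentence, which resembles the subject-object-verb order of Japanese. Because the rule refers only to head marks rather than to specific categories, it can be stated once instead of as dozens of construction-specific rules.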

Index Terms

HPSG-Based Preprocessing for English-to-Japanese Translation
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published In

ACM Transactions on Asian Language Information Processing Volume 11, Issue 3

September 2012

93 pages

ISSN:1530-0226

EISSN:1558-3430

DOI:10.1145/2334801

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2012

Accepted: 01 August 2011

Revised: 01 June 2011

Received: 01 March 2011

Published in TALIP Volume 11, Issue 3

Qualifiers

Research-article
Research
Refereed
