
HPSG-Based Preprocessing for English-to-Japanese Translation

Published: 01 September 2012 Publication History
    Abstract

    Japanese word order differs radically from that of the corresponding English sentences. Typical phrase-based statistical machine translation (SMT) systems such as Moses search for the best word permutation within a given distance limit (the distortion limit). For English-to-Japanese translation, a large distortion limit is needed to obtain acceptable translations, which makes the number of translation candidates extremely large. As a result, SMT systems often fail to find acceptable translations within a limited time. To solve this problem, some researchers use rule-based preprocessing approaches, which apply dozens of rules to reorder English words into a Japanese-like order. Our idea is based on two observations: (1) Japanese is a typical head-final language, and (2) the heads of English sentences can be detected with a head-driven phrase structure grammar (HPSG) parser. The main contributions of this article are twofold. First, we demonstrate that an off-the-shelf, state-of-the-art HPSG parser lets us write the reordering rules at an abstract level and can easily improve the quality of English-to-Japanese translation. Second, we show that syntactic heads achieve better results than semantic heads. The proposed method outperforms the best system of the NTCIR-7 PATMT EJ task.
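The head-final idea in the abstract can be illustrated with a minimal sketch. It assumes a binary parse tree in which each internal node records which child is the head; the `Leaf` and `Node` classes, the `head` field, and the hand-built example tree are hypothetical and are not the output format of any particular HPSG parser. The sketch also omits any exception rules the article may define, so it should be read as an illustration of the general principle rather than as the authors' method.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    word: str


@dataclass
class Node:
    left: "Tree"
    right: "Tree"
    head: str  # "left" or "right": which child is the head (hypothetical field)


Tree = Union[Leaf, Node]


def head_finalize(tree: Tree) -> List[str]:
    """Return the words of `tree` with every head child placed after its sibling."""
    if isinstance(tree, Leaf):
        return [tree.word]
    left = head_finalize(tree.left)
    right = head_finalize(tree.right)
    # A head on the left is moved to the end; a head on the right stays put.
    if tree.head == "left":
        return right + left
    return left + right


# Hand-built tree for "John hit a ball" (head choices are assumptions):
# S -> NP VP (head: VP), VP -> V NP (head: V), NP -> Det N (head: N).
example = Node(
    Leaf("John"),
    Node(Leaf("hit"), Node(Leaf("a"), Leaf("ball"), head="right"), head="left"),
    head="right",
)

print(" ".join(head_finalize(example)))  # John a ball hit
```

The reordered string follows the subject-object-verb order of Japanese, so a phrase-based decoder would need far less long-distance reordering to translate it.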



      Published In

      ACM Transactions on Asian Language Information Processing, Volume 11, Issue 3
      September 2012
      93 pages
      ISSN: 1530-0226
      EISSN: 1558-3430
      DOI: 10.1145/2334801
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 September 2012
      Accepted: 01 August 2011
      Revised: 01 June 2011
      Received: 01 March 2011
      Published in TALIP Volume 11, Issue 3

      Author Tags

      1. English
      2. HPSG
      3. Japanese
      4. Machine translation
      5. SOV
      6. SVO

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2022) Improving Thai-Lao neural machine translation with similarity lexicon. Journal of Intelligent & Fuzzy Systems 42:4 (4005-4014). DOI: 10.3233/JIFS-212236. Online publication date: 4-Mar-2022
      • (2021) Word reordering on multiple pivots for the Japanese and Indonesian language pair. Machine Translation 35:4 (611-636). DOI: 10.1007/s10590-021-09288-8. Online publication date: 1-Dec-2021
      • (2021) Transfer Learning for Chinese-Lao Neural Machine Translation with Linguistic Similarity. Machine Translation (1-10). DOI: 10.1007/978-981-33-6162-1_1. Online publication date: 14-Jan-2021
      • (2019) Exploiting Dependency-based Pre-ordering for English-Myanmar Statistical Machine Translation. 2019 23rd International Computer Science and Engineering Conference (ICSEC) (192-196). DOI: 10.1109/ICSEC47112.2019.8974760. Online publication date: Oct-2019
      • (2019) Reordering Techniques in Japanese and English Machine Translation. Advances in Empirical Translation Studies (164-176). DOI: 10.1017/9781108525695.009. Online publication date: 10-Jun-2019
      • (2016) Inter-, Intra-, and Extra-Chunk Pre-Ordering for Statistical Japanese-to-English Machine Translation. ACM Transactions on Asian and Low-Resource Language Information Processing 15:3 (1-28). DOI: 10.1145/2818381. Online publication date: 9-Jan-2016
      • (2016) Syntax-Based Pre-reordering for Chinese-to-Japanese Statistical Machine Translation. Hybrid Approaches to Machine Translation (77-108). DOI: 10.1007/978-3-319-21311-8_4. Online publication date: 13-Jul-2016
      • (2015) Media Processing Technology for Achieving Hospitality while on the Go. NTT Technical Review 13:4 (28-34). DOI: 10.53829/ntr201504fa4. Online publication date: Apr-2015
      • (2015) Preordering using a Target-Language Parser via Cross-Language Syntactic Projection for Statistical Machine Translation. ACM Transactions on Asian and Low-Resource Language Information Processing 14:3 (1-23). DOI: 10.1145/2699925. Online publication date: 12-Jun-2015
      • (2014) Incremental Word Re-Ordering and Article Generation: Its Application to Japanese-to-English Machine Translation. Journal of Natural Language Processing 21:5 (1037-1057). DOI: 10.5715/jnlp.21.1037. Online publication date: 2014
