research-article
Open access

How to Choose the Best Pivot Language for Automatic Translation of Low-Resource Languages

Published: 01 October 2013

Abstract

Recent research on multilingual statistical machine translation focuses on the use of pivot languages to overcome language resource limitations for certain language pairs. Because of the richness of its available language resources, English is generally the pivot language of choice. However, factors such as language relatedness can also affect the choice of pivot language for a given language pair, especially for Asian languages, where language resources are currently quite limited. In this article, we provide new insights into what makes a pivot language effective and investigate the impact of these factors on overall pivot translation performance for translation between 22 Indo-European and Asian languages. Experimental results using state-of-the-art statistical machine translation techniques revealed that the translation quality of 54.8% of the language pairs improved when a non-English pivot language was chosen. Moreover, 81.0% of the variation in system performance can be explained by a combination of factors such as language family, vocabulary, sentence length, language perplexity, translation model entropy, reordering, monotonicity, and engine performance.
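As a minimal illustration of what translating through a pivot language means, the sketch below chains two translators, source-to-pivot and pivot-to-target, in a simple cascade. It is not the article's system: the word-by-word lexicon translators, the function names, and the tiny Spanish/English/German lexicons are all hypothetical stand-ins for trained statistical translation engines.

```python
from typing import Callable, Dict, List

# A translator maps a list of source tokens to a list of target tokens.
Translator = Callable[[List[str]], List[str]]


def lexicon_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator (hypothetical stand-in for an SMT engine).

    Words missing from the lexicon are passed through unchanged.
    """
    def translate(tokens: List[str]) -> List[str]:
        return [lexicon.get(token, token) for token in tokens]
    return translate


def pivot_translate(
    tokens: List[str],
    src_to_pivot: Translator,
    pivot_to_tgt: Translator,
) -> List[str]:
    """Cascade: translate source -> pivot, then pivot -> target."""
    pivot_tokens = src_to_pivot(tokens)
    return pivot_to_tgt(pivot_tokens)


# Hypothetical toy lexicons, for illustration only.
ES_TO_EN = {"gato": "cat", "negro": "black"}
EN_TO_DE = {"cat": "Katze", "black": "schwarz"}

es_to_en = lexicon_translator(ES_TO_EN)
en_to_de = lexicon_translator(EN_TO_DE)

result = pivot_translate(["gato", "negro"], es_to_en, en_to_de)
print(result)  # ['Katze', 'schwarz']
```

Because the second step only sees the output of the first, any error made on the source-to-pivot side carries over to the final translation, which is one reason the choice of pivot language matters.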



    Published In

    ACM Transactions on Asian Language Information Processing  Volume 12, Issue 4
    October 2013
    86 pages
    ISSN:1530-0226
    EISSN:1558-3430
    DOI:10.1145/2523057

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 October 2013
    Accepted: 01 July 2013
    Revised: 01 May 2013
    Received: 01 January 2013
    Published in TALIP Volume 12, Issue 4

    Author Tags

    1. Asian languages
    2. Machine translation
    3. pivot language selection
    4. translation quality indicators

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2024) Mixture-of-languages Routing for Multilingual Dialogues. ACM Transactions on Information Systems. DOI: 10.1145/3676956. Online publication date: 5-Aug-2024
    • (2023) All Translation Tools Are Not Equal: Investigating the Quality of Language Translation for Forced Migration. 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA), 1-10. DOI: 10.1109/DSAA60987.2023.10302481. Online publication date: 9-Oct-2023
    • (2023) Research on Chinese-Lao Neural Machine Translation Based on Multi-Pivot. 2023 2nd International Conference on Artificial Intelligence and Computer Information Technology (AICIT), 1-5. DOI: 10.1109/AICIT59054.2023.10277799. Online publication date: 15-Sep-2023
    • (2022) Low-resource Neural Machine Translation: Methods and Trends. ACM Transactions on Asian and Low-Resource Language Information Processing 21:5, 1-22. DOI: 10.1145/3524300. Online publication date: 15-Mar-2022
    • (2021) Word reordering on multiple pivots for the Japanese and Indonesian language pair. Machine Translation 35:4, 611-636. DOI: 10.1007/s10590-021-09288-8. Online publication date: 1-Dec-2021
    • (2019) European Rural Development Policy Approaching Health Issues: An Exploration of Programming Schemes. International Journal of Environmental Research and Public Health 16:16, 2973. DOI: 10.3390/ijerph16162973. Online publication date: 18-Aug-2019
    • (2019) Toward any-language zero-shot topic classification of textual documents. Artificial Intelligence. DOI: 10.1016/j.artint.2019.02.002. Online publication date: Feb-2019
    • (2018) Minimum Bayes-Risk Phrase Table Pruning for Pivot-Based Machine Translation in Internet of Things. IEEE Access 6, 55754-55764. DOI: 10.1109/ACCESS.2018.2872773. Online publication date: 2018
    • (2017) Twitter communication of agri‐food chain actors on palm oil environmental, socio‐economic, and health sustainability. Journal of Consumer Behaviour 17:1, 75-93. DOI: 10.1002/cb.1699. Online publication date: 12-Dec-2017
