Integrating Shallow Syntactic Labels in the Phrase-Boundary Translation Model

Published: 14 February 2018

Abstract

This article proposes a hierarchical model for statistical machine translation based on a novel rule-labeling method. The model labels translation rules by matching the boundaries of target-side phrases against shallow syntactic labels, namely POS tags and chunk labels, on the target side of the training corpus. If no single label covers the whole target span, the labels at the two boundaries are concatenated. Labeling rules with the classes of the boundary words of target-side phrases was previously proposed as the phrase-boundary model, which can be regarded as the base form of our model. In the extended model, the labeler falls back to a POS tag when a boundary has no chunk label. By using chunks as phrase labels, the proposed model generalizes the rules and so reduces model sparseness. Sparseness matters more for language pairs whose word orders differ substantially, because such pairs yield fewer aligned phrase pairs from which rules can be extracted. The extended phrase-boundary model is also applicable to low-resource languages that have no syntactic parser. We compare the proposed model with the base phrase-boundary model and variants of Syntax Augmented Machine Translation (SAMT) on translation from Persian and German into English, language pairs whose source and target differ in word order. The results show that the proposed model improves both translation quality and decoding time. Measured by BLEU, it achieves a statistically significant improvement of about 0.5 points over the base phrase-boundary model.
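
The labeling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_span`, the span-to-label dictionary for chunks, the `+` separator, and the reading of "a chunk label at a boundary" as a chunk that starts (or ends) exactly at the span boundary and lies inside the span are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive word indices on the target side


def label_span(span: Span, pos_tags: List[str], chunks: Dict[Span, str]) -> str:
    """Label a target-side phrase span from chunk labels and POS tags.

    Hypothetical helper: the data layout and the "+" separator are
    assumptions, not details taken from the article.
    """
    start, end = span
    # A chunk covering the whole span gives a single label.
    if span in chunks:
        return chunks[span]
    # Left boundary: a chunk starting at `start` and lying inside the span,
    # otherwise fall back to the POS tag of the first word.
    left = next(
        (lab for (s, e), lab in chunks.items() if s == start and e <= end),
        pos_tags[start],
    )
    # Right boundary: a chunk ending at `end` and lying inside the span,
    # otherwise fall back to the POS tag of the last word.
    right = next(
        (lab for (s, e), lab in chunks.items() if e == end and s >= start),
        pos_tags[end],
    )
    # No label for the whole span: concatenate the two boundary labels.
    return f"{left}+{right}"


# Toy target sentence: "the big dog chased a cat"
POS = ["DT", "JJ", "NN", "VBD", "DT", "NN"]
CHUNKS = {(0, 2): "NP", (3, 3): "VP", (4, 5): "NP"}

print(label_span((0, 2), POS, CHUNKS))  # NP
print(label_span((0, 3), POS, CHUNKS))  # NP+VP
print(label_span((1, 3), POS, CHUNKS))  # JJ+VP
```

In the toy sentence, the span "big dog chased" has no chunk starting at "big", so its left boundary falls back to the POS tag `JJ`, while its right boundary matches the `VP` chunk.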

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 17, Issue 3
    September 2018
    196 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3184403
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 February 2018
    Accepted: 01 December 2017
    Revised: 01 August 2017
    Received: 01 January 2017
    Published in TALLIP Volume 17, Issue 3

    Author Tags

    1. Chunk label
    2. Hierarchical models
    3. POS tag
    4. Statistical machine translation

    Qualifiers

    • Note
    • Research
    • Refereed
