Abstract
The combination of translation memories (TMs) and statistical machine translation (SMT) has been demonstrated to be beneficial. In this paper, we present a combination approach which integrates TMs into SMT by using sparse features extracted at run-time during decoding. These features can be used on both phrase-based SMT and syntax-based SMT. We conducted experiments on a publicly available English–French data set and an English–Spanish industrial data set. Our experimental results show that these features significantly improve our phrase-based and syntax-based SMT baselines on both language pairs.
Notes
Henceforth, this method is referred to as “TM combination”.
An HD fragment is composed of a head node and all of its dependents in a dependency tree.
In our experiments, we use the training corpus of our SMT experiments as a TM.
Unfortunately, due to confidentiality agreements, the data used in these experiments cannot be publicly released.
The three probabilities in Model III of Wang et al. (2013), which gives the best performance in their paper, are: \(p(TCM\mid SCM,NLN,LTC,SPL,SEP,Z)\), \(p(LTC\mid CSS,SCM,NLN,SEP,Z)\), \(p(CPM\mid TCM,SCM,NLN,Z)\). Note that each of our features is the combination of a feature name and one of its values in Wang et al. (2013). For example, the feature TCM\(_L\) in our system means that the value of the feature TCM in Wang et al. (2013) is L.
A qualitative analysis of our test set is under way to determine the real impact of our approach.
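The naming convention described in the note on Wang et al. (2013), where a feature name and its observed value are joined into one sparse indicator feature such as TCM\(_L\), can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the paper: the function names, the weights, and every value other than L for TCM are hypothetical placeholders.

```python
from collections import Counter


def sparse_feature_names(observations):
    """Join each (feature name, value) pair into one indicator name, e.g. 'TCM_L'."""
    return [f"{name}_{value}" for name, value in observations]


def sparse_score(observations, weights):
    """Sum the weights of the indicator features that fire.

    Features without a learned weight contribute 0, which is what makes
    the representation sparse.
    """
    counts = Counter(sparse_feature_names(observations))
    return sum(weights.get(feature, 0.0) * n for feature, n in counts.items())


# Hypothetical observations for one hypothesis; only ("TCM", "L") comes
# from the example in the notes, the other values are placeholders.
observations = [("TCM", "L"), ("LTC", "x"), ("CPM", "y")]

# Hypothetical tuned weights, keyed by the combined feature name.
weights = {"TCM_L": 0.5, "CPM_y": -0.25}

names = sparse_feature_names(observations)
total = sparse_score(observations, weights)
```

Because each name and value pair becomes its own feature, the tuning algorithm can assign a separate weight to every value of a feature instead of one weight per feature.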
References
Biçici E, Dymetman M (2008) Dynamic translation memory: using statistical machine translation to improve translation memory fuzzy matches. In: Proceedings of the 9th international conference on computational linguistics and intelligent text processing, Haifa, Israel, pp 454–465
Bilmes JA, Kirchhoff K (2003) Factored language models and generalized parallel backoff. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology—short papers, Edmonton, Canada, pp 4–6
Chen SF, Goodman J (1996) An empirical study of smoothing techniques for language modeling. In: Proceedings of the 34th annual meeting on association for computational linguistics, Santa Cruz, California, pp 310–318
Cherry C (2013) Improved reordering for phrase-based translation using sparse features. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, Atlanta, Georgia, pp 22–31
Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: human language technologies, Montreal, Canada, pp 427–436
Chiang D (2005) A hierarchical phrase-based model for statistical machine translation. In: Proceedings of the 43rd annual meeting on association for computational linguistics, Ann Arbor, Michigan, pp 263–270
Clark JH, Dyer C, Lavie A, Smith NA (2011) Better hypothesis testing for statistical machine translation: controlling for optimizer instability. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers, Portland, Oregon, vol 2, pp 176–181
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Denkowski M, Lavie A (2011) Meteor 1.3: automatic metric for reliable optimization and evaluation of machine translation systems. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, Scotland, pp 85–91
Galley M, Manning CD (2008) A simple and effective hierarchical phrase reordering model. In: Proceedings of the conference on empirical methods in natural language processing, Honolulu, Hawaii, pp 848–856
He Y, Ma Y, van Genabith J, Way A (2010a) Bridging SMT and TM with translation recommendation. In: Proceedings of the 48th annual meeting of the association for computational linguistics, Uppsala, Sweden, pp 622–630
He Y, Ma Y, Way A, van Genabith J (2010b) Integrating N-best SMT outputs into a TM system. In: Proceedings of the 23rd international conference on computational linguistics: posters, Beijing, China, pp 374–382
Kirchhoff K, Bilmes J, Duh K (2007) Factored language models tutorial. In: UWEE technical report, Department of Electrical Engineering, University of Washington
Koehn P, Senellart J (2010) Convergence of translation memory and statistical machine translation. In: Proceedings of AMTA workshop on MT research and the translation industry, Denver, Colorado, USA, pp 21–31
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, Edmonton, Canada, vol 1, pp 48–54
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, Prague, Czech Republic, pp 177–180
Li L, Xie J, Way A, Liu Q (2014) Transformation and decomposition for efficiently implementing and improving dependency-to-string model in Moses. In: Proceedings of SSST-8, eighth workshop on syntax, semantics and structure in statistical translation, Doha, Qatar, pp 122–131
Ma Y, He Y, Way A, van Genabith J (2011) Consistent translation using discriminative learning—a translation memory-inspired approach. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, Portland, Oregon, USA, pp 1239–1248
Och FJ, Ney H (2002) Discriminative training and maximum entropy models for statistical machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Philadelphia, Pennsylvania, pp 295–302
Och FJ, Ney H (2004) The alignment template approach to statistical machine translation. Comput Linguist 30(4):417–449
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on association for computational linguistics, Philadelphia, Pennsylvania, pp 311–318
Parra Escartín C, Arcedillo M (2015) Living on the edge: productivity gain thresholds in machine translation evaluation metrics. In: Proceedings of the fourth workshop on post-editing technology and practice, Miami, Florida, pp 46–56
Schwartz L (2008) Multi-source translation methods. In: Proceedings of the 8th conference of the association for machine translation in the Americas, Waikiki, Hawaii
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of association for machine translation in the Americas, Cambridge, Massachusetts, USA, pp 223–231
Stolcke A (2002) SRILM - an extensible language modeling toolkit. In: Proceedings of the 7th international conference on spoken language processing, Denver, Colorado, USA, pp 257–286
Wang K, Zong C, Su KY (2013) Integrating translation memory into phrase-based machine translation during decoding. In: Proceedings of the 51st annual meeting of the association for computational linguistics (long papers), Sofia, Bulgaria, vol 1, pp 11–21
Xie J, Mi H, Liu Q (2011) A novel dependency-to-string model for statistical machine translation. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, United Kingdom, pp 216–226
Ziemski M, Junczys-Dowmunt M, Pouliquen B (2016) The United Nations parallel corpus v1.0. In: Proceedings of the tenth international conference on language resources and evaluation (LREC 2016), European Language Resources Association (ELRA), Paris, France
Acknowledgements
This research has received funding from the People Programme (Marie Curie Actions) of the European Union’s Framework Programme (FP7/2007-2013) under REA Grant agreement no. 317471. The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Cite this article
Li, L., Parra Escartín, C., Way, A. et al. Combining translation memories and statistical machine translation using sparse features. Machine Translation 30, 183–202 (2016). https://doi.org/10.1007/s10590-016-9187-6
DOI: https://doi.org/10.1007/s10590-016-9187-6